Find Orphaned Source Files Using LINQ
On a project I am working on, a growing number of files are in source control but are not actually referenced by any .csproj file. I decided to write a quick and dirty command-line program to find these files and, at the same time, learn a bit of LINQ to XML.
During development I ran into a couple of tricky issues. The first was how to combine some foreach loops into a LINQ statement, and the second was how to construct the regex for source file matching. I could probably have solved both myself with a bit of time reading books, but I decided to throw them out onto Stack Overflow, and both were answered within a couple of minutes of asking. I have to say this site is incredible. Rather than treating it as a last resort for questions I have exhausted my own resources on, I now think of it more like a super-knowledgeable co-worker whom you can ask a quick question and get a pointer in the right direction.
Here's the final code. I'm sure it could easily be turned into one nested LINQ query and improved on a little, but it does what I need. Feel free to suggest refactorings and enhancements in the comments.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;
namespace SolutionChecker
{
    public class Program
    {
        // Matches names ending in ".cs", except generated ".g.cs" files.
        public const string SourceFilePattern = @"(?<!\.g)\.cs$";

        static void Main(string[] args)
        {
            // Scan the folder given on the command line, or the folder
            // containing this executable if no argument is given.
            string path = (args.Length > 0) ? args[0] : GetWorkingFolder();
            Regex regex = new Regex(SourceFilePattern);

            // Every source file on disk under the root folder.
            var allSourceFiles = from file in Directory.GetFiles(path, "*.cs", SearchOption.AllDirectories)
                                 where regex.IsMatch(file)
                                 select file;

            // Every source file referenced by some project file.
            var projects = Directory.GetFiles(path, "*.csproj", SearchOption.AllDirectories);
            var activeSourceFiles = FindCSharpFiles(projects);

            // Orphans are on disk but not referenced by any project.
            var orphans = from sourceFile in allSourceFiles
                          where !activeSourceFiles.Contains(sourceFile)
                          select sourceFile;

            int count = 0;
            foreach (var orphan in orphans)
            {
                Console.WriteLine(orphan);
                count++;
            }
            Console.WriteLine("Found {0} orphans", count);
        }

        // Returns the folder that contains this executable.
        static string GetWorkingFolder()
        {
            return Path.GetDirectoryName(typeof(Program).Assembly.CodeBase.Replace("file:///", String.Empty));
        }

        // Collects the full path of each <Compile Include="..."> item
        // ending in ".cs" from the given project files.
        static IEnumerable<string> FindCSharpFiles(IEnumerable<string> projectPaths)
        {
            string xmlNamespace = "{http://schemas.microsoft.com/developer/msbuild/2003}";
            return from projectPath in projectPaths
                   let xml = XDocument.Load(projectPath)
                   let dir = Path.GetDirectoryName(projectPath)
                   from c in xml.Descendants(xmlNamespace + "Compile")
                   let inc = c.Attribute("Include").Value
                   where inc.EndsWith(".cs")
                   select Path.Combine(dir, inc);
        }
    }
}
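To make the source file pattern a little more concrete, here is a small sketch of how the negative lookbehind in SourceFilePattern behaves. The class name PatternDemo, the helper IsSourceFile, and the sample file names are made up for illustration and are not part of the program above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Same pattern as SourceFilePattern in the program above.
    public const string SourceFilePattern = @"(?<!\.g)\.cs$";

    // Hypothetical helper: true when the name ends in ".cs" but not ".g.cs".
    public static bool IsSourceFile(string fileName)
    {
        return Regex.IsMatch(fileName, SourceFilePattern);
    }

    public static void Main()
    {
        // Hypothetical file names, for illustration only.
        string[] samples = { "Program.cs", "Window1.g.cs", "Form1.Designer.cs", "notes.csv" };
        foreach (string name in samples)
        {
            Console.WriteLine("{0}: {1}", name, IsSourceFile(name));
        }
    }
}
```

Program.cs and Form1.Designer.cs match, while Window1.g.cs is rejected by the lookbehind and notes.csv is rejected because the name does not end in ".cs". Note that the pattern is case-sensitive as written, so a file named FOO.CS would not match unless RegexOptions.IgnoreCase were added.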
Comments
Using a HashSet as the result of FindCSharpFiles() provides a significant speed improvement. Our 1000+ file project is analyzed much, much faster.
Joel
Thanks for the tip, Joel. I haven't really got into the HashSet class, as our project is still on .NET 2.0. Looks like it's a useful class.
Mark H
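As a rough sketch of the HashSet suggestion from the comments: in the program above, activeSourceFiles appears to be a deferred LINQ query, so each Contains call re-enumerates it (reloading project files along the way). Copying the results into a HashSet once avoids that. The class name OrphanFinder, the method FindOrphans, and the sample paths are hypothetical, and the case-insensitive comparer is an assumption that suits Windows paths rather than something taken from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrphanFinder
{
    // Returns the files in allSourceFiles that are missing from activeSourceFiles.
    public static List<string> FindOrphans(
        IEnumerable<string> allSourceFiles,
        IEnumerable<string> activeSourceFiles)
    {
        // Enumerate the active files exactly once. The case-insensitive
        // comparer is an assumption made for Windows-style paths.
        var active = new HashSet<string>(activeSourceFiles, StringComparer.OrdinalIgnoreCase);

        // Each lookup is now a hash probe instead of a full re-enumeration.
        return allSourceFiles.Where(f => !active.Contains(f)).ToList();
    }

    public static void Main()
    {
        // Hypothetical paths, for illustration only.
        var all = new[] { @"C:\src\A.cs", @"C:\src\B.cs", @"C:\src\Old.cs" };
        var active = new[] { @"C:\src\a.cs", @"C:\src\B.cs" };

        foreach (var orphan in FindOrphans(all, active))
        {
            Console.WriteLine(orphan); // prints C:\src\Old.cs
        }
    }
}
```

In the original program this would amount to wrapping the result of FindCSharpFiles(projects) in a HashSet before the orphans query runs.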