C# Basics: If-Else If-Else vs Switch vs Dictionary<K,T> in Dealing with Multiple Choices

In this fourth installment of C# Basics, let’s take a look at how you handle multiple choice questions in your code. When you need to decide on a course of action based on the multiple possible values a variable may have, you have three essential choices.

  1. if | else if … | else
  2. switch
  3. Dictionary<K,T>

The code below shows you an example of each:

public class DownloadViewModel
{
  public Byte[] Contents { get; set; }
  public string FileName { get; set; }

  public string ContentType1
  {
    get
    {
      string ext = Path.GetExtension(FileName).ToLower();
      if (ext == ".pdf")
        return "application/pdf";
      else if (ext == ".txt")
        return "application/txt";
      else  
        return "application/octet-stream";
    }
  }

  public string ContentType2
  {
    get
    {
      switch (Path.GetExtension(FileName).ToLower())
      {
        case ".pdf":
          return "application/pdf";
        case ".txt":
          return "application/txt";
        default:
          return "application/octet-stream";
      }
    }
  }

  public string ContentType3
  {
    get
    {
      if (map.Count == 0) LoadMap();
      string val = string.Empty;
      if (!map.TryGetValue(Path.GetExtension(FileName).ToLower(), out val))
      {
        val = "application/octet-stream";
      }
      return val;
    }
  }

  private void LoadMap()
  {
    map.Add(".pdf", "application/pdf");
    map.Add(".txt", "application/txt");
    map.Add("*.*", "application/octet-stream");
  }

  private Dictionary<string, string> map = new Dictionary<string, string>();
}

So how do you pick which one to use? For me, the choice is easy. If I have only a few possible values that are not likely to change, I’ll use the if|else if|else construct. If I have a fair number (more than 3 and less than 16) and these values are not likely to change, I’ll use the switch statement. But if the values are likely to change or be data driven or if the number of values is greater than 15, I prefer to use a Dictionary.

Another highly valuable use of the Dictionary is when the values will kick off some action that may be lengthy or complicated. In that case, I prefer a Dictionary<T, ActionForT>. I can then retrieve the ActionForT and call the Execute method. Like this:

public class DownloadModel
{
  public Byte[] Contents { get; set; }
  public string FileName { get; set; }

  public string ContentType
  {
    get
    {
      string ext = Path.GetExtension(FileName).ToLower();
      if (ext == ".pdf")
        return "application/pdf";
      else if (ext == ".txt")
        return "application/txt";
      else  
        return "application/octet-stream";
    }
  }

  public void Save()
  {
    if (map.Count == 0) LoadMap();
    Action action = null;
    if (map.TryGetValue(Path.GetExtension(FileName).ToLower(), out action))
    {
      action(); //execute the action stored in our map
    }
    else
    {
      map["*.*"](); //execute the default action
    }
  }

  private void SavePdf()
  {
    throw new NotImplementedException();
  }

  private void SaveTxt()
  {
    throw new NotImplementedException();
  }

  private void SaveOther()
  {
    throw new NotImplementedException();
  }

  private void LoadMap()
  {
    map.Add(".pdf", new Action(SavePdf));
    map.Add(".txt", new Action(SaveTxt));
    map.Add("*.*", new Action(SaveOther));
  }

  private Dictionary<string, Action> map = new Dictionary<string, Action>();
}

C# Basics: Beware of Catch (Exception e)

In this third installment of C# Basics, let's take a look at exception handling. It is one the easiest things to do poorly in an application. In C# you can capture an error and do something with it as easily as this:

try
{
  //do something that may raise an exception
}
catch (Exception e)
{
  //do something with e--log it, modify a return value, etc.
  //if you want the caller of your code to have a crack at the same exception
  throw; //raise the exception again with throw; (almost NEVER throw e;)
}

But do try your best to NEVER catch the base Exception class. Be more specific, especially where you have to do something specific with a particular type of exception and allows others to rise up the call stack. Here’s just one example of being a bit more specific:

try
{
  //call a service that reads data from SQL Server
}
catch (SqlException se)
{
  if (se.Message.Contains("Timeout"))
  {
    //send a timeout log message to the DBA service
  }
  throw; //raise it up the stack to be handled by the caller
}

Of course, you can and ought to handle different exceptions differently where required. Here’s a brief example:

try
{
  //call a service that reads data from SQL Server
  //call another service that writes to a file
}
catch (SqlException se)
{
  if (se.Message.Contains("Timeout"))
  {
    //send a timeout log message to the DBA service
  }
  throw; //raise it up the stack to be handled by the caller
}
catch (IOException iox)
{
  //special handling or logging of IO exceptions
  Console.WriteLine(iox.Message);
}
catch (Exception e)
{
  //log a general exception and rethrow
  throw;
}

And don’t forget, you can create your own specialized exception classes. In fact, with Visual Studio you can use the code snippet for Exception. Just put your cursor in a C# file, outside of a class declaration, and start typing Exception and when you see the Intellisense prompt just hit TAB TAB. And you will have the following:

[Serializable]
public class MyException : Exception
{
	public MyException() { }
	public MyException(string message) : base(message) { }
	public MyException(string message, Exception inner) : base(message, inner) { }
	protected MyException(
	 System.Runtime.Serialization.SerializationInfo info,
	 System.Runtime.Serialization.StreamingContext context)
		: base(info, context) { }
}

Flesh out your own custom exception and raise and catch it just like a pro.

public void ShowExample(int count)
{
  if (count < 0) throw new MyException("Count cannot be less than zero.");
  // . . .
}

public void CallExample()
{
  try
  {
    ShowExample(-4);
  }
  catch (MyException me)
  {
    Console.WriteLine(me.Message);
  }
}

Of course, in the above example, you would more likely want to just throw a new ArgumentException. There is a very long list of commonly used exception classes in the .NET base class libraries. Spend some time on MSDN and get familiar with them. Here’s a brief but incomplete list of common exception classes you may want to catch or throw and then catch.

  • System.ArgumentException  
  • System.ArgumentNullException  
  • System.ArgumentOutOfRangeException  
  • System.ArithmeticException  
  • System.DivideByZeroException  
  • System.FormatException  
  • System.IndexOutOfRangeException  
  • System.InvalidOperationException  
  • System.IO.IOException  
  • System.IO.DirectoryNotFoundException  
  • System.IO.EndOfStreamException  
  • System.IO.FileLoadException  
  • System.IO.FileNotFoundException  
  • System.IO.PathTooLongException  
  • System.NotImplementedException  
  • System.NotSupportedException  
  • System.NullReferenceException  
  • System.OutOfMemoryException  
  • System.Security.SecurityException  
  • System.Security.VerificationException  
  • System.StackOverflowException  
  • System.Threading.SynchronizationLockException  
  • System.Threading.ThreadAbortException  
  • System.Threading.ThreadStateException  
  • System.TypeInitializationException  
  • System.UnauthorizedAccessException

C# Basics: The Conditional or Ternary Operator

For the second installment of C# Basics, let’s take a look at the conditional operator, sometimes known as the ternary operator. How many times have you written code like this:

if (a == b)
{
  x = y;
}
else
{
  x = z;
}

Probably never, at least not quite this simplistic or with such ancient style variable names, or one would hope anyway. But the example serves its purpose. Now use the conditional operator and convert that code to this:

x = (a == b) ? y : z;

You can read more about the conditional operator on MSDN here.

C# Basics: The Proper Use of a Constructor

This is the first in what I hope will be a long series of quick posts covering the basics of programming in C#. For this first installment, I’ll review the proper use and improper use of a class or struct constructor.

What is a Constructor
In a C# class, a constructor is a special code block for a class or struct called when an instance of that class or struct is created. If a constructor is not defined in code, the compiler will create a default constructor without parameters. Constructors can be overridden and can call a base class constructor.

Proper Use of a Constructor
Use a constructor to set or initialize class or struct fields. If this initialization code is lengthy, such as in the construction of a Windows Form class, move the initialization code to a partial class file and call it something like InitializeComponents. Here are a few examples:

public partial class Apple : Fruit
{
  //same as a default constructor if not constructor defined
  public Apple()
  {
  }
  
  //initialize a local class member
  public Apple(string name)
  {
    _name = name;
  }
  
  //initialize and call base class constructor
  public Apple(string name, double weight) : base(weight)
  {
    _name = name;
  }

  //constructor with conditional param to control construction behavior
  public Apple(bool initializeAll)
  {
    if (initializeAll) InitializeComponents();
  }

  //shown here but often hidden away in another file of defining the partial class
  void InitializeComponents()
  {
    //do some lengthy but simple initialization of class members
  }  
  
  private string _name;
}

Improper Use of a Constructor
Do not use a constructor for error prone or complex code. Do not use a constructor to execute tasks, business logic. I’m not going to include code examples of bad constructors but here are a few anti-patterns I’ve seen in my travels:

  • The constructor makes a call to an external service or database – BAD
  • The constructor creates child objects and calls methods on those objects to perform some business function – BAD BAD
  • The constructor asks for user input or reads from a file – BAD BAD BAD

Alternatives to a Constructor
If you require lengthy, long running or potentially error prone code to produce an instance object of a class or struct, use either a factory class or a static create method. Here’s how:

public class BookService : IBookService
{
  //a private default construtor prevents use of it outside of the class itself
  private BookService()
  {
  }
  
  public static BookService Create()
  {
    BookService instance = new BookService(); //calls private constructor
	
	//put lenghty or error prone code here that sets up the service
	
	return instance;
  }  
}

//use a factory for even more complex object creation
public static class BookServiceFactory
{
  public static IBookService Create(BookOptions options)
  {
    IBookService instance;
	
	//based on the BookOptions provide create the proper
	//concrete instance of an IBookService interface
	
	return instance;
  } 
}

Of course this is not a comprehensive treatment. It’s a blog post. If you want more, buy a book. If you have topics you would like to see addressed in future installments, please let me know.

System.Data.SQLite Write Comparison of SQLite 3 vs Berkeley DB Engines

In my last post, I covered getting the Berkeley DB SQL API running under the covers of the System.Data.SQLite library for .NET. Another weekend has come and I’ve had some time to run some tests. The whitepaper Oracle Berkeley DB SQL API vs. SQLite API indicates that the Berkeley engine will have relatively the same read speed as SQLite but better write speed due to it’s page locking vs. file locking.

Berkeley Read Bug
Originally, my code would write a certain number of rows in a given transaction on one or more threads and then read all rows on one or more threads, but I had to give up on my read code because the Berkeley implementation of System.Data.SQLite threw a "malloc() failed out of memory" exception whenever I ran the read test with more than one thread. You can read the code below and decide for yourself whether I made a mistake or not, but the test ran fine with the SQLite assembly from the sqlite.org site.

The Tests
There were four tests, each inserting similar but different data into a single table with a column representing each of the main data types that might be used in a typical application. Two single threaded tests, the first inserting 6,000 rows in a single transaction and the second 30,000 rows. The third and fourth tests were multithreaded: 6 threads inserting 1,000 rows each and then 10 threads inserting 500 rows each. 

Here’s the SQL for the table:

CREATE TABLE [TT] 
(
  [src] TEXT, 
  [td] DATETIME, 
  [tn] NUMERIC, 
  [ti] INTEGER, 
  [tr] REAL, 
  [tb] BLOB
);

The Results
The results do not show a vast difference in insert speed for Berkeley as I might have expected. The tests were run on an AMD 6 core machine with 8GB of RAM. I’m writing to an SSD drive which may or may not affect results as well.

sqlitevbdb

You may choose to interpret the results differently, but I don’t see enough difference to make me abandon the more stable SQLite 3 engine at this time.

The Code
I invite you to review the code. If I’ve missed something, please let me know. It’s pretty straightforward. With each test, I wipe out the database environment entirely, starting with a fresh database file. I use the System.Threading.Tasks library.

class Program
{
  static void Main(string[] args)
  {
    Console.WriteLine("SQLite Tests");
    var tester = new Tester();
    tester.DoTest("1 thread with 6000 rows", 1, 6000);
    tester.DoTest("6 threads with 1000 rows each", 6, 1000);

    tester.DoTest("1 thread with 30000 rows", 1, 30000);
    tester.DoTest("10 threads with 500 rows each", 10, 500);

    Console.WriteLine("tests complete");
    Console.ReadLine();
  }
}

class Tester
{
  public void DoTest(string testName, int threadCount, int rowCount)
  {
    DataMan.Create();
    Console.WriteLine("");
    Console.WriteLine(testName + " started:");

    List<Task> insertTasks = new List<Task>();
    DateTime beginInsert = DateTime.Now;
    for (int i = 0; i < threadCount; i++)
    {
      insertTasks.Add(Task.Factory.StartNew(() => InsertTest(i, rowCount), TaskCreationOptions.None));
    }

    Task.WaitAll(insertTasks.ToArray());
    DateTime endInsert = DateTime.Now;
    TimeSpan tsInsert = endInsert - beginInsert;

    //List<Task> readTasks = new List<Task>();
    //DateTime beginRead = DateTime.Now;
    //for (int i = 0; i < threadCount; i++)
    //{
    //   readTasks.Add(Task.Factory.StartNew(() => ReadTest(i), TaskCreationOptions.None));
    //}

    //Task.WaitAll(readTasks.ToArray());
    //DateTime endRead = DateTime.Now;
    //TimeSpan tsRead = endRead - beginRead;

    double rowsPerSecondInsert = (threadCount * rowCount) / tsInsert.TotalSeconds;
    //double rowsPerSecondRead = (threadCount * rowCount) / tsRead.TotalSeconds;
    Console.WriteLine("{0} threads each insert {1} rows in {0} ms", threadCount, rowCount, tsInsert.TotalMilliseconds);
    //Console.WriteLine("{0} threads each read all rows in {1} ms", threadCount, tsRead.TotalMilliseconds);
    Console.WriteLine("{0} rows inserted per second", rowsPerSecondInsert);
    //Console.WriteLine("{0} rows read per second", rowsPerSecondRead);
  }

  void InsertTest(int threadId, int rowCount)
  {
    DateTime begin = DateTime.Now;
    DataMan.InsertRows(rowCount);
    DateTime end = DateTime.Now;
    TimeSpan ts = end - begin;
    //Console.WriteLine("{0} lines inserted in {1} ms on thread {2}", rowCount, ts.TotalMilliseconds, threadId);
  }

  void ReadTest(int threadId)
  {
    DateTime begin = DateTime.Now;
    int count = DataMan.ReadAllRows();
    DateTime end = DateTime.Now;
    TimeSpan ts = end - begin;
    //Console.WriteLine("{0} rows read in {1} ms on thread {2}", count, ts.TotalMilliseconds, threadId);
  }
}

internal static class DataMan
{
  private const string DbFileName = @"C:\temp\bdbvsqlite.db";
  private const string DbJournalDir = @"C:\temp\bdbvsqlite.db-journal";

  public static void Create()
  {
    if (File.Exists(DbFileName)) File.Delete(DbFileName);
    if (Directory.Exists(DbJournalDir)) Directory.Delete(DbJournalDir, true);

    //Console.WriteLine(SQLiteConnection.SQLiteVersion);
    using (SQLiteConnection conn = new SQLiteConnection())
    {
      conn.ConnectionString = "Data Source=" + DbFileName + ";";
      conn.Open();
      using (SQLiteCommand cmd = conn.CreateCommand())
      {
        cmd.CommandTimeout = 180;
        cmd.CommandText = "DROP TABLE IF EXISTS [TT]; CREATE TABLE [TT] ([src] TEXT, [td] DATETIME, [tn] NUMERIC, [ti] INTEGER, [tr] REAL, [tb] BLOB);";
        cmd.CommandType = System.Data.CommandType.Text;
        int result = cmd.ExecuteNonQuery();
      }
      conn.Close();
    }
  }

  public static void InsertRows(int rowCount)
  {
    SQLiteParameter srParam = new SQLiteParameter();
    SQLiteParameter tdParam = new SQLiteParameter();
    SQLiteParameter tnParam = new SQLiteParameter();
    SQLiteParameter tiParam = new SQLiteParameter();
    SQLiteParameter trParam = new SQLiteParameter();
    SQLiteParameter tbParam = new SQLiteParameter();
    Random rnd = new Random(DateTime.Now.Millisecond);

    using (SQLiteConnection conn = new SQLiteConnection())
    {
      conn.ConnectionString = "Data Source=" + DbFileName + ";";
      conn.Open();
      using (SQLiteTransaction trans = conn.BeginTransaction())
      using (SQLiteCommand cmd = conn.CreateCommand())
      {
        cmd.CommandTimeout = 180;
        cmd.CommandType = CommandType.Text;
        cmd.CommandText = "INSERT INTO [TT] ([src],[td],[tn],[ti],[tr],[tb]) VALUES (?,?,?,?,?,?);";
        cmd.Parameters.Add(srParam);
        cmd.Parameters.Add(tdParam);
        cmd.Parameters.Add(tnParam);
        cmd.Parameters.Add(tiParam);
        cmd.Parameters.Add(trParam);
        cmd.Parameters.Add(tbParam);

        int count = 0;
        while (count < rowCount)
        {
          var data = MakeRow(rnd);
          srParam.Value = data.src;
          tdParam.Value = data.td;
          tnParam.Value = data.tn;
          tiParam.Value = data.ti;
          trParam.Value = data.tr;
          tbParam.Value = data.tb;
          int result = cmd.ExecuteNonQuery();
          count++;
        }
        trans.Commit();
      }
      conn.Close();
    }
  }

  public static int ReadAllRows()
  {
    List<DRow> list = new List<DRow>();
    using (SQLiteConnection conn = new SQLiteConnection())
    {
      conn.ConnectionString = "Data Source=" + DbFileName + ";";
      conn.Open();
      using (SQLiteCommand cmd = conn.CreateCommand())
      {
        cmd.CommandTimeout = 180;
        cmd.CommandText = "SELECT [rowid],[src],[td],[tn],[ti],[tr],[tb] FROM [TT];";
        using (SQLiteDataReader reader = cmd.ExecuteReader())
        {
          while (reader.Read())
          {
            var row = new DRow
            {
              rowid = reader.GetInt64(0),
              src = reader.GetString(1),
              td = reader.GetDateTime(2),
              tn = reader.GetDecimal(3),
              ti = reader.GetInt64(4),
              tr = reader.GetDouble(5)
              //tb = (byte[])reader.GetValue(6)
            };
            list.Add(row);
          }
          reader.Close();
        }
      }
      conn.Close();
    }
    return list.Count;
  }

  private static DRow MakeRow(Random rnd)
  {
    int start = rnd.Next(0, 250);
    byte[] b = new byte[512];
    rnd.NextBytes(b);
    return new DRow
    {
      src = spear.Substring(start, 250),
      td = DateTime.Now,
      tn = rnd.Next(2999499,987654321),
      ti = rnd.Next(3939329,982537553),
      tr = rnd.NextDouble(),
      tb = b
    };
  }

  private const string spear = @"FROM fairest creatures we desire increase,That thereby beauty's rose might never die,But as the riper should by time decease,His tender heir might bear his memory:But thou, contracted to thine own bright eyes,Feed'st thy light'st flame with self-substantial fuel,Making a famine where abundance lies,Thyself thy foe, to thy sweet self too cruel.Thou that art now the world's fresh ornamentAnd only herald to the gaudy spring,Within thine own bud buriest thy contentAnd, tender churl, makest waste in niggarding.Pity the world, or else this glutton be,To eat the world's due, by the grave and thee.";
}

internal class DRow
{
  public long rowid { get; set; }
  public string src { get; set; }
  public DateTime td { get; set; }
  public decimal tn { get; set; }
  public long ti { get; set; }
  public double tr { get; set; }
  public byte[] tb { get; set; }
}

Upgrade to System.Data.SQLite 1.0.74.0 in Visual Studio 2010

I wrote about getting SQLite up and working in Visual Studio 2010 and your .NET 4.0 projects in June. Then I got distracted with work and didn’t have a chance to come back to my exploration of SQLite until today. So I decided to look to see if we have a new version. Yes, the new System.Data.SQLite install for 1.0.74.0 (SQLite version 3.7.7.1 which includes a few bug fixes) is ready to download and install.

Unfortunately, the designers are still not available, so if you want any designer, however buggy, in Visual Studio, you’re still stuck with my original path in my previous post. (See the readme once you finish the 1.0.74.0 install.)

After a variety of experiments with the new installer, I ended up taking the following steps that led to successfully running my budding test/prototype application. Follow these steps and you won’t go too wrong:

  1. Uninstall all SQLite related items from Control Panel | Program and Features
  2. Download and install my favorite free SQLite admin tool from http://osenxpsuite.net/?xp=3.
  3. Download the latest from http://system.data.sqlite.org/index.html/doc/trunk/www/downloads.wiki. In my case, it was the sqlite-netFx40-setup-bundle-x64-2010-1.0.74.0.exe.
  4. This will have removed some GAC items and NOT put 1.0.74.0 into GAC. At least on my machine. I could not get the assemblies into the GAC to save my life.
  5. Make sure your project references are pointed to the correct version and that you have Copy Local set to true.
  6. Change your config settings to remove useLegacyV2RuntimeActivationPolicy=true from the startup node like this:
      <startup>
        <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.0" />
      </startup>
  7. Change your config settings in DbProviderFactories to remove strong name reference like this:
      <system.data>
        <DbProviderFactories>
          <remove invariant="System.Data.SQLite" />
          <add name="SQLite Data Provider"
              invariant="System.Data.SQLite"
              description=".Net Framework Data Provider for SQLite"
              type="System.Data.SQLite.SQLiteFactory, System.Data.SQLite" />
        </DbProviderFactories>
      </system.data>
  8. Test to taste.

SQLite on Visual Studio 2010 Setup Instructions

Obsolete

Go to the new SQLite on Visual Studio with NuGet post instead.


Be sure to read upgrade instructions I've just posted as a follow-up to this post.

I’m just starting to play with SQLite and the System.Data.SQLite library created by Robert Simpson and taken over by the SQLite.org folks. In attempting to get things working in Visual Studio 2010, I ran into a few issues, so this post is as much a reminder for myself as it is a help for other .NET developers who wish to use SQLite from within Visual Studio 2010.

The current binary installers from sqllite.org for the System.Data.SQLite .NET assembly do not include the Visual Studio designers or the ADO.NET driver required for use of SQLite in the Server Explorer. So here’s the winning combination I’ve found for getting my environment set up properly.

  1. Download and install version 1.0.66.0 from sourceforge.net because current binary installs on SQLite.org at system.data.sqlite.org do not include the Visual Studio designer or ADO.NET driver install.
     
  2. Download and install the latest versions (x86 and x64) from system.data.sqlite.org (currently 1.0.73.0 which contains the SQLite 3.7.6.3 engine).
    Important Note: If you have Visual Studio 2008 and 2010, be sure to choose both when prompted as I found reports from others who had problems otherwise.
     
  3. Copy the x86 binaries into and overwriting file at C:\Program Files (x86)\SQLite.NET\bin (assuming you used the default install location for version 1.0.66.0 and you're running x64 Windows). And if you are on an x64 box, copy the x64 binaries to the C:\Program Files (x86)\SQLite.NET\bin\x64 directory, overwriting existing files.

Now you can open Visual Studio 2010 and navigate to the Server Explorer, right-click the Data Connections node, choose Add Connection and click the Change button for Data source. You can then select the SQLite Database file source like this:

dsource

Click the new button and navigate to your desired directory and supply a file name and you’ll end up with something like this:

dconn

With a new data connection, you can use the table designer, but it has limitations and it’s probably not the best approach with an embedded database engine from a development perspective anyway, since generally you’re going to want your app to be able to create the database from script embedded in code.

So now all that’s left to get started writing code against your database is to add your references like this:

dref

And since Entity Framework supports SQLite, you can add your ADO.NET Entity Data Model to your project and it will produce a nice connection string like this for you (VB-like line _ breaks added):

<connectionStrings>
  <add name="testEntities"
    connectionString="metadata=res://*/Model1.csdl|res://*/Model1.ssdl| _
    res://*/Model1.msl;provider=System.Data.SQLite; _
    provider connection string=&quot;data source=C:\Code\Projects\test.db&quot;"
    providerName="System.Data.EntityClient" />
</connectionStrings>

or a non EF 4 connection string of

<connectionStrings>
  <add name="mytest"
    connectionString="data source=C:\Code\Projects\test.db;"
    providerName="System.Data.SQLite" />
</connectionStrings>

Now remember, model first is not currently supported, so you need to create your data and then update your model from data. You also need to make sure that you add the following to your config file or you’ll get a nasty runtime error:

<startup useLegacyV2RuntimeActivationPolicy="true">
  <supportedRuntime version="v4.0" />
</startup>

UPDATE (30 minutes later) Just a bit more testing reveals that I missed one step. Very important to know and use.

First, I reinstalled the x86 and then x64 latest installs and checked the checkboxes to install to the GAC and modify the path. Then I ran “test” from the command line and after modifying the connection string by giving the file a path like C:\temp\test.db, the tests ran fine. Looking at the chm help file and the test.exe.config revealed the trick I needed.

The 1.0.66.0 install adds a factory reference to your .NET machine.config file like this:

</DbProviderFactories>
  ...
  <add name="SQLite Data Provider"
    invariant="System.Data.SQLite"
    description=".Net Framework Data Provider for SQLite"
    type="System.Data.SQLite.SQLiteFactory,
      System.Data.SQLite,
      Version=1.0.66.0,
      Culture=neutral,
      PublicKeyToken=db937bc2d44ff139" />
</DbProviderFactories>

While this entry makes the Data Connection and designer possible, it will cause you problems when trying to run using the latest 1.0.73.0 referenced assemblies until you add this to your own config file:

<system.data>
  <DbProviderFactories>
    <remove invariant="System.Data.SQLite" />
    <add name="SQLite Data Provider"
      invariant="System.Data.SQLite"
      description=".Net Framework Data Provider for SQLite"
      type="System.Data.SQLite.SQLiteFactory, System.Data.SQLite" />
  </DbProviderFactories>
</system.data>

Next step: experiment with updating my machine.config in its many incarnations to point to the latest and greatest x64 version 1.0.73.0 with token db937bc2d44ff139 like this:

</DbProviderFactories>
  ...
  <add name="SQLite Data Provider"
    invariant="System.Data.SQLite"
    description=".Net Framework Data Provider for SQLite"
    type="System.Data.SQLite.SQLiteFactory,
    System.Data.SQLite,
    Version=1.0.73.0,
    Culture=neutral,
    PublicKeyToken=db937bc2d44ff139" />
</DbProviderFactories>

Note that this will move the machine.config reference to the MSIL (Any CPU) version rather than the x86 version. For my machine, this means making changes in (and remember to open as Administrator so you can save the file):

C:\Windows\Microsoft.NET\Framework\v2.0.50727\CONFIG\machine.config
C:\Windows\Microsoft.NET\Framework\v4.0.30319\Config\machine.config
C:\Windows\Microsoft.NET\Framework64\v2.0.50727\CONFIG\machine.config
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\Config\machine.config

Rebooting, just to be paranoid.

And now, run the same test after commenting out the local app.config’s <system.data><DbProviderFactories> node and everything works as expected. No more bad image exception. And the Data Connection designer continues to operate as expected. Now on to some real coding.

Four Functions for Finding Fuzzy String Matches in C# Extensions

How similar are these two strings? Does string X sound like string Y? Could they be duplicates? Is the difference between string N and string M just a typo? There are many scenarios in enterprise software development where the answers to these questions can be highly significant.

In my search to answer such questions with code, the most helpful resource I’ve ever found was presented by George M. Brown on www.codeguru.com who implemented four well known and powerful fuzzy string matching algorithms in VBA for Access a few years ago. In an effort to convert these algorithms to C#, I found two alternatives that saved me some time (see below).

The four algorithms, with requisite Wikipedia links, are:

Each of the algorithms have been implemented here as extensions to the string and string array. Before we get to the code, let’s take a look at some results. Here are two very simplistic tests. The algorithms are not combined in any way. You can experiment with them and create your own secret sauce. 

Name Matching for (mispelled deliberately): "Jensn"
The first test result set presents the raw output of the algorithms on a mispelled surname (mine) against a list of other surnames. I’ve highlighted the best score.

Dice Coefficient for Jensn:
    .00000 against Adams
    .46154 against Benson
    .00000 against Geralds
    .37500 against Johannson
    .42857 against Johnson
    .76923 against Jensen
    .30769 against Jordon
    .30769 against Madsen
    .00000 against Stratford
    .14286 against Wilkins

Levenshtein Edit Distance for Jensn:
    4 against Adams
    2 against Benson
    5 against Geralds
    5 against Johannson
    3 against Johnson
    1 against Jensen
    4 against Jordon
    4 against Madsen
    8 against Stratford
    6 against Wilkins

Longest Common Subsequence for Jensn:
    .04000, s against Adams
    .33333, ensn against Benson
    .05714, es against Geralds
    .08889, jnsn against Johannson
    .17143, jnsn against Johnson
    .56667, jensn against Jensen
    .06667, jn against Jordon
    .13333, en against Madsen
    .02222, s against Stratford
    .11429, ns against Wilkins

Double Metaphone for Jensn: ANSN
    ATMS metaphone for Adams
    PNSN metaphone for Benson
    JRLT metaphone for Geralds
    AHNS metaphone for Johannson
    ANSN metaphone for Johnson
    ANSN metaphone for Jensen
    ARTN metaphone for Jordon
    MTSN metaphone for Madsen
    STTR metaphone for Stratford
    FLKN metaphone for Wilkins

Address Matching for "2130 South Fort Union Blvd."
The second test is the same code run against multiple addresses trying to match a given address. Let’s see how it turned out.

Dice Coefficient for 2130 South Fort Union Blvd.:
    .16000 against 2689 East Milkin Ave.
    .10000 against 85 Morrison
    .27273 against 2350 North Main
    .07843 against 567 West Center Street
    .66667 against 2130 Fort Union Boulevard
    .61538 against 2310 S. Ft. Union Blvd.
    .42553 against 98 West Fort Union
    .12245 against Rural Route 2 Box 29
    .05000 against PO Box 3487
    .04444 against 3 Harvard Square

Levenshtein Edit Distance for 2130 South Fort Union Blvd.:
    18 against 2689 East Milkin Ave.
    22 against 85 Morrison
    18 against 2350 North Main
    22 against 567 West Center Street
    11 against 2130 Fort Union Boulevard
    8 against 2310 S. Ft. Union Blvd.
    14 against 98 West Fort Union
    19 against Rural Route 2 Box 29
    22 against PO Box 3487
    22 against 3 Harvard Square

Longest Common Subsequence for 2130 South Fort Union Blvd.:
    .02116, 2 st in v. against 2689 East Milkin Ave.
    .02020,  son against 85 Morrison
    .04444, 230 oth in against 2350 North Main
    .01010,  st t  against 567 West Center Street
    .25481, 2130 fort union blvd against 2130 Fort Union Boulevard
   .25765, 230 s ft union blvd. against 2310 S. Ft. Union Blvd.
    .25514,  st fort union against 98 West Fort Union
    .02593,  out  o  against Rural Route 2 Box 29
    .01347, o o  against PO Box 3487
    .01389, 3 hrvd against 3 Harvard Square

Double Metaphone for 2130 South Fort Union Blvd.: STFR
    STML metaphone for 2689 East Milkin Ave.
    MRSN metaphone for 85 Morrison
    NRTM metaphone for 2350 North Main
    SSNT metaphone for 567 West Center Street
    FRTN metaphone for 2130 Fort Union Boulevard
    SFTN metaphone for 2310 S. Ft. Union Blvd.
    STFR metaphone for 98 West Fort Union
    RRLR metaphone for Rural Route 2 Box 29
    PPKS metaphone for PO Box 3487
    RFRT metaphone for 3 Harvard Square

As you can see, the double metaphone algorithm may not be as useful on its own as the other algorithms. But when you put them together in creative ways, you’ll get a very powerful result.

Here’s how easy the algorithms, as extension methods, are to use.

private static void NameMatching()
{
	//name matching
	string input = "Jensn";
	string[] surnames = new string[] { 
		"Adams",
		"Benson",
		"Geralds",
		"Johannson",
		"Johnson",
		"Jensen",
		"Jordon",
		"Madsen",
		"Stratford",
		"Wilkins"
		};

	Console.WriteLine("Dice Coefficient for Jensn:");
	foreach (var name in surnames)
	{
		double dice = input.DiceCoefficient(name);
		Console.WriteLine("\t{0} against {1}", 
			dice.ToString("###,###.00000"), name);
	}

	Console.WriteLine();
	Console.WriteLine("Levenshtein Edit Distance for Jensn:");
	foreach (var name in surnames)
	{
		int leven = input.LevenshteinDistance(name);
		Console.WriteLine("\t{0} against {1}", leven, name);
	}

	Console.WriteLine();
	Console.WriteLine("Longest Common Subsequence for Jensn:");
	foreach (var name in surnames)
	{
		var lcs = input.LongestCommonSubsequence(name);
		Console.WriteLine("\t{0}, {1} against {2}", 
			lcs.Item2.ToString("###,###.00000"), lcs.Item1, name);
	}

	Console.WriteLine();
	string mp = input.ToDoubleMetaphone();
	Console.WriteLine("Double Metaphone for Jensn: {0}", mp);
	foreach (var name in surnames)
	{
		string nameMp = name.ToDoubleMetaphone();
		Console.WriteLine("\t{0} metaphone for {1}", nameMp, name);
	}
}

Source References
In researching and finding these algorithms, I poured over what seemed like hundreds of articles, blogs and open source resources. In the end, I settled on three primary sources and made certain modifications to suite my own needs. Here they are.

Additional references:

I recommend you spend time with these sources and doing your own research. Undoubtedly you will come up with even better algorithms. When you do, please share them with us here.

Update: code now on GitHub