Keyword Extraction in C# with Word Co-occurrence Algorithm

by Tyler Jensen 14. March 2010 22:56

A few years ago I worked on a project called Atrax which, among other things, included an implementation of the keyword extraction work of Yutaka Matsuo of the National Institute of Advanced Industrial Science and Technology in Tokyo and Mitsuru Ishizuka of the University of Tokyo.

I decided to revisit the keyword extraction algorithm, update it a bit, and isolate it from the overall Atrax code to make it easier for anyone to use. You can download the code here: Keyword.zip (17.87 KB).

Here are the top ten keywords the code returns for the Gettysburg Address and for Scott Gu’s most recent blog post:

Gettysburg Address:
   dedicated
   nation
   great
   gave
   dead
   rather
   people
   devotion
   people people
   lives

Scott Gu’s blog post:
   ASP.NET MVC
   VS 2010 and Visual Web Developer
   ASP.NET 3.5
   ASP.NET
   MVC
   Web Developer 2008 Express
   VS 2010
   VS 2008
   release
   Improved Visual Studio

Let me know if you end up using this implementation of the algorithm, and especially if you make improvements to it.

Tags: Code
