Let me know you are alive!

Coordinator
Mar 27, 2009 at 11:19 PM
Edited Jun 11, 2009 at 6:19 AM

Please let me know if you tried, liked or disliked this project.

Jun 11, 2009 at 2:03 AM

Your algorithm is pretty fast and very useful.

I tried it and ran into an issue.

In my database I have regions called Docklands, Dock Five, Rocklyn, Mocland, Rocklands and Bucklands.

I tried typing a misspelled word like:

docklansd -> the result is empty. I expected it to show me Docklands instead.

dockland -> the results are Docklands, Rocklyn, Mocland, Rocklands, Bucklands. I expected it to show me only Docklands, not the others.

The code I used in the program:

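string suburb = "dockladsn";
List<string> results = new List<string>();

foreach (string suburbName in SuburbList)
{
    // Keep every suburb whose distance to the typed text is at most 2.
    if (Utilities.DistanceBetweenString(suburb, suburbName, 3) <= 2.0F)
        results.Add(suburbName); // add the matching name, not the search text
}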

where DistanceBetweenString is a method that uses your algorithm.

Could you please show me what I've done wrong and how to fix it?

Thanks

Coordinator
Jun 11, 2009 at 6:19 AM
Edited Jun 11, 2009 at 6:20 AM

First of all, Sift is case sensitive. That means that Docklands and docklands are at a distance of 1, since the capital D is replaced by a lowercase d. It also means that Docklands and docklansd are at a distance of 3 (both in StringSift3.Distance and in StringMetrics.LevensteinDistance): one for the D and two for the swapped s and d. The other words might look different, but they are only about two letters apart from your input: the first letter and some other letter in the middle.
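To illustrate the case issue, here is a minimal sketch. It uses a plain textbook Levenshtein distance as a stand-in, not the actual Sift3 code, and the class and method names (CaseDemo, Levenshtein) are made up for this example. It only shows how lowercasing both strings before comparing removes the extra point of distance.

```csharp
using System;

static class CaseDemo
{
    // Plain Levenshtein distance: insert, delete and substitute each cost 1.
    // This is a stand-in for illustration, not the Sift3 implementation.
    public static int Levenshtein(string a, string b)
    {
        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1, curr[j - 1] + 1),
                                   prev[j - 1] + cost);
            }
            // Swap the two rows so prev always holds the last completed row.
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    public static void Main()
    {
        // Case-sensitive: the capital D counts as a difference (distance 3).
        Console.WriteLine(Levenshtein("Docklands", "docklansd"));

        // Lowercase both sides first: only the swapped letters remain (distance 2).
        Console.WriteLine(Levenshtein("Docklands".ToLowerInvariant(),
                                      "docklansd".ToLowerInvariant()));
    }
}
```

With both strings lowercased, docklansd falls within the <= 2 threshold used in the loop above, at least under this stand-in metric.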

The second problem is that you have used Distance. The distance between XYZ and ABC is 3. The distance between 123456789XYZ and 123456789ABC is also 3. But the latter pair is more similar than the former, isn't it? So you might be better off with the Similarity method, which expresses the distance between two strings as a percentage.
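As a rough sketch of why a relative measure helps, the snippet below turns a raw distance into a similarity between 0 and 1 by dividing by the length of the longer string. This formula is an assumption made for illustration, and the names (SimilarityDemo, Normalize) are hypothetical. The library's own Similarity method may compute its value differently.

```csharp
using System;

static class SimilarityDemo
{
    // Hypothetical normalization: 1 - distance / length of the longer string.
    // 1.0 means identical; 0.0 means every position differs.
    public static double Normalize(int distance, string a, string b)
    {
        int maxLen = Math.Max(a.Length, b.Length);
        if (maxLen == 0) return 1.0; // two empty strings are identical
        return 1.0 - (double)distance / maxLen;
    }

    public static void Main()
    {
        // Same raw distance of 3 in both cases, very different similarity.
        Console.WriteLine(Normalize(3, "XYZ", "ABC"));                   // value 0
        Console.WriteLine(Normalize(3, "123456789XYZ", "123456789ABC")); // value 0.75
    }
}
```

A filter based on such a value would compare against a fraction (for example, keep names above some chosen similarity) instead of a fixed number of edits, so short and long names are treated consistently.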

Also, don't overlook the parameter 3 you passed to DistanceBetweenString. Is that the parameter you would pass to StringMetrics.FastDistance, or is it the one you pass to the StringSift3 constructor?