Removed fuzzywuzzy dependency #497

Chipe1 · 2017-04-14T11:12:54Z

I replaced the function fuzz.ratio() with an implementation similar to mean_boolean_error().

BesanHalwa · 2017-04-14T15:10:08Z

There is a typo in line 618.
def fitness_ration(str_1, str_2): should be def fitness_ratio(str_1, str_2):

Chipe1 · 2017-04-14T16:21:24Z

@Agent-Pandit Could you add a few basic test cases for the GA (for example, checking that the algorithm gives a valid output without errors)? I'm not sure my implementation does what fuzz.ratio() does.

antmarakis · 2017-04-14T18:06:45Z

I took a look at the code, and I think I'm missing something. The fitness function in the genetic algorithm measures how good an individual is. The fuzzywuzzy function does not do that. From my understanding, fuzzywuzzy (what a beautiful name) calculates some sort of Edit/Levenshtein distance. It is a similarity function and does not indicate which individual is better, since we don't know the solution to compare it to.

Also, I have to say that the implementation of the genetic algorithm we have right now does not follow the pseudocode. It doesn't follow the same structure, nor does it work the same way. For one, in the pseudocode the fitness function is given as input; it is not a fixed method, as it is in the implementation.
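
To make the structural point concrete, here is a minimal sketch of a genetic algorithm that receives its fitness function as an argument. It is not the code in search.py and not a transcription of the book's pseudocode; the names genetic_algorithm, gene_pool, generations, mutation_rate and rng are hypothetical and chosen only for illustration.

```python
import random


def genetic_algorithm(population, fitness_fn, gene_pool, generations=200,
                      mutation_rate=0.1, rng=random):
    """Evolve `population` and return its fittest individual.

    Assumptions of this sketch: individuals are equal-length lists with at
    least two genes, and fitness_fn returns a non-negative number.
    """
    for _ in range(generations):
        # Tiny constant keeps the weights valid even if every fitness is 0.
        weights = [fitness_fn(ind) + 1e-9 for ind in population]
        next_generation = []
        for _ in range(len(population)):
            # Fitness-proportional selection of two parents.
            x, y = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, len(x))  # single-point crossover
            child = x[:cut] + y[cut:]
            if rng.random() < mutation_rate:
                # Mutation: overwrite one random gene.
                child[rng.randrange(len(child))] = rng.choice(gene_pool)
            next_generation.append(child)
        population = next_generation
    return max(population, key=fitness_fn)


# Usage: the caller decides what "fit" means; here, the number of 1s.
rng = random.Random(0)
start = [[rng.choice([0, 1]) for _ in range(8)] for _ in range(10)]
best = genetic_algorithm(start, fitness_fn=sum, gene_pool=[0, 1], rng=rng)
```

Because the loop never looks inside fitness_fn, the same function can be reused for any problem whose individuals can be scored.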

Personally I feel the implementation needs to be rewritten, or even have #480 mostly reverted, since it veers far away from the pseudocode in the book.

@Agent-Pandit, is there something I'm missing? Do you have examples of usage or tests we can take a look at to see the implementation in use?

BesanHalwa · 2017-04-14T18:47:13Z

Errors and GA are very closely related. Throughout the process (in GA) we try to minimise the error (or maximise the fitness). There may be cases in which one does not get the exact solution but only a nearly correct one (which might also differ on each run). Thus, I believe adding a test case may not be very useful.

As far as removing the dependency on fuzzywuzzy is concerned, I believe it is a good idea. But I have some concerns…

After correcting the typo this works well, but it is not a very general approach and might fail in some cases.
I compared fuzz.ratio() and fitness_ratio(); here are the results:

| (str1, str2) | fuzz.ratio(str1, str2) | fitness_ratio(str_1, str_2) | Result of fitness_ratio() |
| --- | --- | --- | --- |
| (Eshan, Eshan) | 100 | 100 | acceptable |
| (Eshak, Eshan) | 80 | 80 | acceptable |
| (Eshak1, Eshan) | 73 | 66.66666666666667 | acceptable* |
| (Eshak11, Eshan) | 67 | 57.142857142857146 | acceptable* |
| (Eshan, EshanABCD) | 71 | 100 | not acceptable |

In the last case we see that fitness_ratio function fails.
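
The fitness_ratio() column above is consistent with counting position-by-position matches and dividing by the length of the first string. The sketch below is a reconstruction under that assumption, not the code from this PR; fitness_ratio_length_aware is a hypothetical variant showing one way the last case could be penalised.

```python
def fitness_ratio(str_1, str_2):
    """Percentage of positions in str_1 that match str_2.

    Reconstruction for illustration; assumes str_1 is non-empty.
    zip() stops at the shorter string, so extra characters in str_2
    are silently ignored.
    """
    matches = sum(a == b for a, b in zip(str_1, str_2))
    return 100 * matches / len(str_1)


def fitness_ratio_length_aware(str_1, str_2):
    """Hypothetical variant: divide by the longer length instead.

    Assumes at least one of the strings is non-empty.
    """
    matches = sum(a == b for a, b in zip(str_1, str_2))
    return 100 * matches / max(len(str_1), len(str_2))
```

With this reading, ("Eshan", "EshanABCD") scores 100 in the first function because only the first five positions are compared, while the length-aware variant scores it 5 matches out of 9.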

The particular approach used in this PR works fine as far as the implementation of GA in search.py is concerned, because there the strings have the same length. It might seem to produce the correct result (as expected from GA), but it is not the correct way to do it. (This is again a personal point of view.)

Difference in fitness value is another issue, and I don’t understand the reason. However it is not a major concern. As long as we get the extreme values (0 and 100) accurate, the mid range does not matter much because we make a relative comparison.
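
A possible explanation for the mid-range difference, sketched under the assumption (not verified in this thread) that fuzz.ratio() behaves like Python's difflib.SequenceMatcher: that ratio is 2*M/T, where M is the number of matched characters and T is the combined length of both strings, so a length mismatch lowers the score from both sides. The helper name sequence_ratio is hypothetical.

```python
from difflib import SequenceMatcher


def sequence_ratio(str_1, str_2):
    """Similarity in 0..100, computed as round(100 * 2*M / T) via difflib."""
    return round(100 * SequenceMatcher(None, str_1, str_2).ratio())


pairs = [("Eshan", "Eshan"), ("Eshak", "Eshan"), ("Eshak1", "Eshan"),
         ("Eshak11", "Eshan"), ("Eshan", "EshanABCD")]
scores = [sequence_ratio(a, b) for a, b in pairs]
```

For ("Eshak1", "Eshan") this gives 2*4/11, which rounds to 73, whereas 4 matches out of 6 positions gives about 66.67.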

@Chipe1 If you address the issue it will be great. Till then I believe that using an external module is not that bad an idea. It makes the approach more general and easier to understand.

antmarakis · 2017-04-14T19:20:02Z

@Agent-Pandit: I'm not saying errors and genetic algorithms aren't connected, I'm just saying your implementation only works for this one case, and in general GA does not work the way your implementation does.

In GAs, we don't know the solution and we want to approximate it to an adequate score (or get it spot on). Your implementation takes the solution as input (in_str), and approximates that. It is not the same thing. For fitness you are comparing an individual with the given solution.

Take for example graph coloring. We don't know the solution beforehand. We do know though how a solution should look. Namely, edges should not connect nodes of the same color. So, for a fitness function we can count how many acceptable edges an individual has. The more the merrier. We keep working this way until we find the solution, or a very good approximation.
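
A minimal sketch of such a fitness function, assuming a coloring is a dict from node to color and the graph is a list of edges. The name coloring_fitness and the small example graph are hypothetical.

```python
def coloring_fitness(coloring, edges):
    """Count edges whose two endpoints have different colors (higher is better)."""
    return sum(coloring[u] != coloring[v] for u, v in edges)


# Hypothetical graph: a triangle A-B-C plus a node D attached to C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

good = {"A": "red", "B": "green", "C": "blue", "D": "red"}
bad = {"A": "red", "B": "red", "C": "blue", "D": "blue"}

print(coloring_fitness(good, edges))  # 4: every edge is acceptable
print(coloring_fitness(bad, edges))   # 2: A-B and C-D clash
```

Note that the function never sees a target coloring; it only checks the property a solution must have, and a score equal to len(edges) means a valid coloring has been found.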

Your implementation, unfortunately, cannot solve problems other than string matching (and for those, you already know the solution/target). It is not bad as an introduction to how the algorithm works, but it is just a toy example.

Later tonight I will try to get a complete implementation with a running example, so I can better showcase what I'm talking about.

antmarakis · 2017-04-14T23:51:19Z

I have made the PR in #501.

Chipe1 · 2017-04-15T04:51:43Z

#501 fixes the error and the algorithm.

Removed fuzzywuzzy dependency

116db6d

Chipe1 mentioned this pull request Apr 14, 2017

Build Fail due to dependency #498

Closed

antmarakis mentioned this pull request Apr 14, 2017

Fix build #500

Closed

Chipe1 closed this Apr 15, 2017

Chipe1 deleted the fuzz branch April 15, 2017 04:51
