Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

@f-j-r
Copy link

@f-j-r f-j-r commented Jul 28, 2021

Fixed FSelector filter bug: Chi Squared was calculated instead of Information Gain due to wrong filter name.

Code to reproduce / investigate the problem the long way:

Gain ratio is different for FSelector and FSelectorRcpp:

generateFilterValuesData(
  makeClassifTask(data = iris, target = "Species"),
  method = c(
    "FSelector_gain.ratio",
    "FSelectorRcpp_gain.ratio"
    )
)
FilterValues:
Task: iris
           name    type                   filter     value
1:  Petal.Width numeric     FSelector_gain.ratio 0.9432359
2: Petal.Length numeric     FSelector_gain.ratio 0.9346311
3: Sepal.Length numeric     FSelector_gain.ratio 0.6288067
4:  Sepal.Width numeric     FSelector_gain.ratio 0.4922162
5:  Petal.Width numeric FSelectorRcpp_gain.ratio 0.8713692
6: Petal.Length numeric FSelectorRcpp_gain.ratio 0.8584937
7: Sepal.Length numeric FSelectorRcpp_gain.ratio 0.4196464
8:  Sepal.Width numeric FSelectorRcpp_gain.ratio 0.2472972

Docs look like it should be the same:

Indeed Gain Ratio is the same using base packages:

FSelector::gain.ratio(formula = Species~., data = iris)
FSelectorRcpp::information_gain(formula = Species~., data = iris, type = "gainratio")
            attr_importance
Sepal.Length       0.4196464
Sepal.Width        0.2472972
Petal.Length       0.8584937
Petal.Width        0.8713692

    attributes importance
1 Sepal.Length  0.4196464
2  Sepal.Width  0.2472972
3 Petal.Length  0.8584937
4  Petal.Width  0.8713692

mlr Docs hint that there is maybe a problem with the FSelector Chi Squared call, as it has the same description as the Info Gain:
listFilterMethods() (or see filters docs)

Indeed the results are the same:

generateFilterValuesData(
  makeClassifTask(data = iris, target = "Species"),
  method = c(
    "FSelector_gain.ratio",
    "FSelector_chi.squared"
  )
)
FilterValues:
Task: iris
           name    type                filter     value
1:  Petal.Width numeric  FSelector_gain.ratio 0.9432359
2: Petal.Length numeric  FSelector_gain.ratio 0.9346311
3: Sepal.Length numeric  FSelector_gain.ratio 0.6288067
4:  Sepal.Width numeric  FSelector_gain.ratio 0.4922162
5:  Petal.Width numeric FSelector_chi.squared 0.9432359
6: Petal.Length numeric FSelector_chi.squared 0.9346311
7: Sepal.Length numeric FSelector_chi.squared 0.6288067
8:  Sepal.Width numeric FSelector_chi.squared 0.4922162

After the fix it seems to work correctly now (Info Gain is the same for FSelector and FSelectorRcpp):

FilterValues:
Task: iris
            name    type                   filter     value
 1:  Petal.Width numeric FSelectorRcpp_gain.ratio 0.8713692
 2: Petal.Length numeric FSelectorRcpp_gain.ratio 0.8584937
 3: Sepal.Length numeric FSelectorRcpp_gain.ratio 0.4196464
 4:  Sepal.Width numeric FSelectorRcpp_gain.ratio 0.2472972
 5:  Petal.Width numeric     FSelector_gain.ratio 0.8713692
 6: Petal.Length numeric     FSelector_gain.ratio 0.8584937
 7: Sepal.Length numeric     FSelector_gain.ratio 0.4196464
 8:  Sepal.Width numeric     FSelector_gain.ratio 0.2472972
 9:  Petal.Width numeric    FSelector_chi.squared 0.9432359
10: Petal.Length numeric    FSelector_chi.squared 0.9346311
11: Sepal.Length numeric    FSelector_chi.squared 0.6288067
12:  Sepal.Width numeric    FSelector_chi.squared 0.4922162

@f-j-r
Copy link
Author

f-j-r commented Jul 28, 2021

As requested I would like to request a review by @pat-s @larskotthoff @mllg @berndbischl (not sure if I should link only one or all). Thank you.

@larskotthoff
Copy link
Member

Thanks for this -- yes, this is simply a typo.

Copy link
Member

@pat-s pat-s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

Ugly bug - might be worth getting out a patch release soon.

@pat-s pat-s merged commit 4b3430d into mlr-org:main Jul 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants