Conversation
…orest with an extra option to turn off resampling function. The original is designed to deal with imbalanced datasets.
|
Hi Ethan, Cheers, |
| "Defines how m, defined by mFeaturesPerTreeSize, is interpreted. M represents the total number of features.", | ||
| new String[]{"Specified m (integer value)", "sqrt(M)+1", "M-(sqrt(M)+1)", | ||
| "Percentage (M * (m / 100))"}, | ||
| new String[]{"SpecifiedM", "SqrtM1", "MSqrtM1", "Percentage"}, 1); |
There was a problem hiding this comment.
The default value should be 3 in this option
| "Percentage (M * (m / 100))"}, | ||
| new String[]{"SpecifiedM", "SqrtM1", "MSqrtM1", "Percentage"}, 3); | ||
| "The number of trees.", 10, 1, Integer.MAX_VALUE); |
There was a problem hiding this comment.
The default option should be 100 for the number of trees
| @Override | ||
| public String getPurposeString() { | ||
| return "Adaptive Random Forest algorithm for evolving data streams from Gomes et al."; | ||
| return "Adaptive Random Forest with Resampling --- beta version"; |
There was a problem hiding this comment.
Please revert the purpose string, there is not need to change it
| new String[]{"SpecifiedM", "SqrtM1", "MSqrtM1", "Percentage"}, 1); | ||
|
|
||
| public IntOption mFeaturesPerTreeSizeOption = new IntOption("mFeaturesPerTreeSize", 'm', | ||
| "Number of features allowed considered for each split. Negative values corresponds to M - m", 60, Integer.MIN_VALUE, Integer.MAX_VALUE); |
There was a problem hiding this comment.
similar to the other comments, leave these default values as they are
| "The lambda parameter for bagging.", 6.0, 1.0, Float.MAX_VALUE); | ||
| "The lambda parameter for bagging.", 6.0, 1.0, Float.MAX_VALUE); | ||
|
|
||
| public MultiChoiceOption streamBalanceOption = new MultiChoiceOption("streamBalanceOption", 'r', "The option of stream balance", new String[]{"Imbalance", "Balance"}, |
There was a problem hiding this comment.
This should be the only option added / changed for this PR
There was a problem hiding this comment.
The name would be better if it were classBalancingOption. The options that are displayed on the GUI should be None (No balance) and Balanced and not Imbalance and Balance
| private void init(int indexOriginal, ARFHoeffdingTree instantiatedClassifier, BasicClassificationPerformanceEvaluator evaluatorInstantiated, | ||
| long instancesSeen, boolean useBkgLearner, boolean useDriftDetector, ClassOption driftOption, ClassOption warningOption, boolean isBackgroundLearner) { | ||
| //Option stream/dataset balance or not. Resampling for imbalance, No resampling for balance dataset | ||
| public MultiChoiceOption resampling = streamBalanceOption; |
There was a problem hiding this comment.
Why not use the Option directly? It is not clear why the attribute resampling is needed
I see now why you did it. However, perhaps use a boolean attribute in ARFBaseLearner instead of a MultiChoiceOption. Also, instead of calling it resampling, what about calling balancing, which can be on (true) or off (false).
| @Override | ||
| public void trainOnInstanceImpl(Instance instance) { | ||
| ++this.instancesSeen; | ||
| if(this.ensemble == null) |
There was a problem hiding this comment.
avoid changes to the format of the code, such as adding spaces or extra lines
| this.evaluator.reset(); | ||
| } | ||
|
|
||
| private void populateCounterArray(Instance instnc) { |
There was a problem hiding this comment.
You can call it instance instead of instnc :)
…on as a FlagOption. Modified some default option values.
Comparing with the original AdaptiveRandomForest, New_AdaptiveRandomForest with an extra option to turn off resampling function. The original is designed to deal with imbalanced datasets.