Using Reinforcement Learning To Select An Optimal
2024 © Yassine Akhiat et al. This is an open access article licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0)
(http://creativecommons.org/licenses/by-nc-nd/4.0/)
Journal of Automation, Mobile Robotics and Intelligent Systems VOLUME 18, N∘ 1 2024
[Figure: feature selection based on the evaluation method: feature evaluation versus performance evaluation with an ML algorithm, each yielding an optimal subset]
[Figure: the feature space contains all features (irrelevant, noisy, redundant, and relevant)]
First, for each feature, they train different models using different classification algorithms and store them in a library of models. Second, they use a selection-with-replacement technique [30] to find the optimal subset of models that, when averaged together, achieves excellent performance. Another wrapper method, based on a graph representation, is proposed in [14], where the node degree is used as a criterion to select the best feature subset from the whole feature space. This algorithm consists of two phases: (1) choosing the features to be used in graph construction; (2) constructing a graph in which each node corresponds to a feature and each edge carries a weight equal to the pairwise score between the features it connects. Finally, the best features are the nodes with the highest degree. In [31], a pairwise feature selection method (FS-P) is introduced in which features are evaluated in pairs using a decision tree classifier: first, it ranks the features individually; second, it uses the machine learning algorithm (a decision tree) to evaluate pairs of features. In [32, 33], a well-known wrapper approach is presented: Recursive Feature Elimination using Random Forest (RFE). RFE performs feature selection recursively. At the first iteration, the model (a random forest) is trained on the whole set of attributes. After ranking the features according to the model's importance scores, the least important features are eliminated. As the iterations proceed, the considered set of features becomes smaller and smaller until the desired number of features is reached.

Random forests are among the most popular machine learning algorithms [34]. Thanks to their performance, robustness, and interpretability, RFs have repeatedly proved their applicability; in particular, they can select informative variables [11]. RF performs feature selection using the mean decrease impurity and mean decrease accuracy criteria [35]. Mean decrease impurity measures the decrease in the weighted impurity of the trees attributable to each feature, and the features are ranked according to this measure. Mean decrease accuracy measures a feature's impact on model accuracy: the values of each feature are permuted first, and we then measure how much this permutation decreases the model's accuracy. Informative features decrease the model accuracy significantly, while unimportant features do not.

In contrast to the traditional feature selection (FS) formalization, and inspired by the reinforcement learning approach, the feature selection problem can be handled reliably by our proposed system. Under our approach, the feature space can be seen as a Markov decision process (MDP) [36, 37], where each subset of features is represented by a state (a decision tree branch). Our system explores the state space while exploiting the experience gathered so far through the proposed transition similarity measure (TSM). In [38], the authors proposed a method based on reinforcement learning (RL) for selecting the best subset. First, they use an AOR (average of rewards) criterion to identify the effectiveness of a given feature under different conditions.
AOR is the average of the difference between two consecutive states over several iterations. Second, they introduce an optimal graph search to reduce the complexity of the problem.

The way our system traverses from one state to another relies on decision tree branches to represent each state, as mentioned before. Overall, this technique is similar to the way RF creates branches. The RF method creates multiple trees, and for each tree only a random subset of input variables is used at each splitting node. Therefore, the final trees of RF are independent of each other and do not learn from the previously created trees. Our system, on the other hand, can learn from prior attempts: at each iteration, it explores new branches and exploits the assimilated knowledge to create higher-quality ones in the subsequent iteration.

3. Feedback Feature Selection System

This paper brings a new feature selection system based on reinforcement learning; the proposed system comprises three main parts. First, decision tree branches are used to traverse the search space (the feature space), create new rules (branches, i.e., feature subsets), and select the best feature subset. Second, a transition similarity measure (TSM) is introduced to ensure that the system keeps exploring the state space by creating new branches while exploiting what it has learned so far, thereby avoiding the drawbacks of redundancy. Finally, the relevant features are those most involved in constructing the branches of high quality. The next section reviews the general framework of reinforcement learning and explains how our system draws on this powerful approach.

3.1. Reinforcement Learning Problem

RL is among the most active and fast-developing areas in machine learning and is one of the three basic machine learning approaches, alongside supervised and unsupervised learning. RL involves the following concepts: agent, environment, actions, and reward. The agent takes an action A and interacts with an environment in order to maximize the total received reward R. At iteration t, the agent observes the state St from the environment and receives a reward Rt. The agent then takes action At; in response, the environment provides the next state St+1 and a reward. The process continues until the agent is able to take the right actions that maximize the total reward. The agent must balance exploiting what has been learned so far against continuously exploring the environment to gather more information that may help maximize the total reward.

‐ Agent: the agent takes actions; in our case, the agent is the proposed feature selection system.
‐ Actions: the set of all possible moves the agent can make; for our system, the actions are the nodes that may be used to create a branch.
‐ Environment: the feature space through which the system moves. It receives the system's current state and action as input and returns the reward and the system's next state.
‐ State: the current situation in which the agent finds itself; in our context, this is the current node of the branch.

With the reinforcement learning concepts in place, the following steps describe the technical core of our proposed algorithm. The feature selection system (the agent) observes the environment and starts from a single node chosen arbitrarily, without any prior knowledge (exploration phase); this node branches into possible outcomes, each of which leads to the next nodes (actions). To indicate how effective the chosen action is, the difference between two consecutive states is computed. While the target depth has not yet been reached, the system keeps adding one node at a time in order to create a branch. As the iterations proceed, the system accumulates experience and becomes able to take actions that maximize the overall reward R; as a result, branches of high quality are created. The transition similarity measure is proposed to balance exploiting what has been learned so far, to choose the next action that maximizes rewards, against continuously exploring the feature space to achieve long-term benefits. The branch is constructed in the same way as a decision tree (C4.5); the difference is that when we add a node to the branch, we retain only the best branch with the highest threshold. The following steps give more precise information about creating a branch.

3.2. Steps to Create a DT Branch

We start with a random feature as the root of the branch. As long as the branch has not yet reached the desired depth or the minimum samples per leaf, the system keeps adding one node at a time. The added node is the one obtained from the feature and threshold that produce the highest AUC (Area Under the ROC Curve) score. The idea behind using depth and min samples leaf as stopping criteria is to avoid overfitting as much as possible. The most common stopping criterion is min samples leaf, the minimum number of samples assigned to each leaf node: if the number is less than a given value, no further split is made and the node is considered a final leaf. The depth of the branch is also very useful for controlling overfitting, because the deeper the branch, the more information it captures from the data and the more splits it has, which leads to good predictions on the training data but poor generalization to unseen data.

3.3. Reward Function

A reward function R [38] is used to calculate the score at each level of the branch by computing the difference between the score of the current branch and its score after a new node is added (DS). The DS indicates how useful the newly added feature is.
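The branch-growing procedure of Section 3.2 can be sketched as follows. This is an illustrative reconstruction, not the authors' exact implementation: the AUC of a single one-node split stands in for the full scoring, and stopping uses the depth and min-samples-leaf criteria just described:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score

def best_split(X, y, used):
    """Find the (feature, threshold) whose one-node split maximizes AUC."""
    best_f, best_t, best_auc = None, None, -1.0
    for f in range(X.shape[1]):
        if f in used:
            continue
        for t in np.unique(X[:, f])[:-1]:           # exclude max: split is non-trivial
            pred = (X[:, f] > t).astype(float)
            auc = max(roc_auc_score(y, pred), roc_auc_score(y, 1 - pred))
            if auc > best_auc:
                best_f, best_t, best_auc = f, t, auc
    return best_f, best_t, best_auc

def grow_branch(X, y, max_depth=3, min_samples_leaf=5, seed=0):
    """Start from a random root, then greedily append the best node until
    max_depth or min_samples_leaf stops the branch."""
    rng = np.random.default_rng(seed)
    branch = [int(rng.integers(X.shape[1]))]        # random root feature
    used = set(branch)
    idx = np.arange(len(y))
    while len(branch) < max_depth and len(idx) > min_samples_leaf:
        if len(np.unique(y[idx])) < 2:
            break                                   # pure node: nothing left to score
        f, t, _ = best_split(X[idx], y[idx], used)
        if f is None:
            break
        branch.append(f)
        used.add(f)
        idx = idx[X[idx, f] > t]                    # follow one side of the split
    return branch

X, y = make_classification(n_samples=150, n_features=8, n_informative=3,
                           random_state=0)
branch = grow_branch(X, y)
```

Each returned `branch` is a list of feature indices, i.e., one candidate feature subset (state) in the MDP view.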
[Figure: the reinforcement learning loop: the agent receives observation Ot and reward Rt from the environment and takes action At]
This function is defined as follows:

R = (AUCnext − AUCcurrent) × log(‖Subsetcurrent‖)    (1)

where AUCcurrent is the score of the current branch, AUCnext is its score after adding a new node, and ‖Subsetcurrent‖ is the number of samples used to split the internal node.

3.4. Transition Similarity Measure

Definition (Transition). A transition is the process in which something changes from one state to another. In our system, a transition is the link between two successive nodes of the same branch.

Transition Similarity Measure. We propose a transition similarity measure (TSM) to ensure that our system keeps exploring the state space, learning new rules, and avoiding redundant branches. For each branch, we store all transitions together with the samples used to split each internal node. Since the algorithm is iterative, different branches may share the same transitions, which by itself is not a problem. However, when the majority of the samples (more than a given threshold) are shared by transitions of different branches, those transitions are deemed similar, which is a serious problem: allowing similar transitions in different branches can lead to constructing redundant and useless branches. The system would then keep learning the same rules and branches, which is expensive in execution time, whereas the system should consume few resources (run time and storage) and the branches should be strong and diverse.

The similarity between two transitions is computed by the following formula:

TSM = |S1 ∩ S2| / ‖Subsetcurrent‖    (2)

where |S1 ∩ S2| is the number of samples shared by the two transitions.

3.5. The Proposed FS Method

Since the proposed algorithm is iterative, the number of iterations N is given as input, and the reward function is initialized to zero. Our system starts with an empty set F, and at each iteration it creates a new branch and adds it to F. If the next subset (branch) has already been experienced (seen) by the system, the system reuses the gathered experience in the upcoming iterations; otherwise, it keeps exploring new rules, new patterns, and new branches.

3.6. Starting Example

To explain the proposed algorithm further, consider the following example. Suppose we have a dataset of 10 features. The figure below (Fig. 4) contains the whole feature space.
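The reward function of Eq. (1) and the transition similarity measure of Eq. (2) above translate directly into code (a small sketch; sample subsets are represented as Python sets, and the function names are illustrative):

```python
import math

def reward(auc_next, auc_current, subset_size):
    """Eq. (1): gain in AUC from adding a node, scaled by the log of the
    number of samples used to split the internal node."""
    return (auc_next - auc_current) * math.log(subset_size)

def tsm(samples_1, samples_2, subset_size):
    """Eq. (2): samples shared by two transitions, relative to the current
    subset size; values above the similarity threshold mark redundancy."""
    return len(set(samples_1) & set(samples_2)) / subset_size
```

For example, `tsm({1, 2, 3}, {2, 3, 4}, 4)` gives 0.5; with a similarity threshold of 0.8, those two transitions would not be deemed similar.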
[Figure 4: branches built over the 10-feature space across iterations; panels (a)–(d)]
The purpose is to select the best subset of features using the proposed system.

1) First iteration, 4(b): The system traverses the feature space and creates the first branch without any prior knowledge. At each level of the branch, it stores the AUC score using the reward function R. Moreover, it stores each transition (2 ← 3, 3 ← 9, 9 ← 5, 5 ← 6) together with its corresponding subset of samples.

2) Second iteration, 4(c): In the second iteration, the transition (3 ← 9) appears for the second time; here the TSM (transition similarity measure) comes into play. If two transitions of different branches are similar (nodes in green), the system must not allow them into subsequent branches (the current branch included). The system has to explore the state environment to find new rules and prevent redundancy in creating branches.

3) Iteration N, 4(d): After N iterations, the system is capable of identifying the best branches using the experience gathered during each iteration. The top-ranked branches constructed by the system are illustrated in subfigure 4(d).
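The iteration scheme above can be sketched as a main loop. This is an illustrative reconstruction, not the authors' implementation: a random branch generator stands in for the AUC-guided builder, and the final ranking counts how often each feature appears in the retained branches:

```python
import random

def transition_similarity(shared, subset_size):
    # TSM (Eq. 2): fraction of the current subset's samples shared by two transitions
    return shared / subset_size

def run_selection(n_features=10, n_iter=20, depth=4, threshold=0.8, seed=0):
    """FBS main-loop sketch: build branches, veto TSM-similar transitions,
    then rank features by involvement in the retained branches."""
    rng = random.Random(seed)
    seen = {}                    # transition (a, b) -> samples that used it
    counts = [0] * n_features
    for _ in range(n_iter):
        branch = rng.sample(range(n_features), depth)  # placeholder for a DT branch
        samples = set(rng.sample(range(100), 50))      # samples used along the branch
        ok = True
        for a, b in zip(branch, branch[1:]):           # successive-node transitions
            prev = seen.get((a, b))
            if prev is not None and transition_similarity(
                    len(prev & samples), len(samples)) > threshold:
                ok = False                             # too similar: discard branch
                break
        if ok:
            for a, b in zip(branch, branch[1:]):
                seen.setdefault((a, b), set()).update(samples)
            for f in branch:
                counts[f] += 1
    # the most-involved features form the selected subset
    return sorted(range(n_features), key=counts.__getitem__, reverse=True)

ranking = run_selection()
```

The first elements of `ranking` play the role of the subset [3, 5, 10] in the worked example.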
From Figure 4, it is clear that the top subset of features is [3, 5, 10], because these features are the most involved in creating the best branches.

4. Experimental Results and Discussion

This experimental section demonstrates the efficiency of the proposed feedback feature selection system (FBS) in selecting the best features. Two benchmarks have been conducted, and the usefulness of our system is appraised by comparing it with two feature selection algorithms. The first is the popular wrapper algorithm Recursive Feature Elimination (RFE-RF); the second is the pairwise feature selection algorithm (FS-P), which was recently proposed and has proved its effectiveness in identifying the best features [31].

4.1. Benchmarking Datasets

In this paper, nine binary classification datasets are employed in different experimental designs to evaluate the performance of the proposed feature selection method. The datasets are chosen to differ in class distribution (balanced or imbalanced), linearity, dataset shift, and number of instances and variables. The datasets are publicly available and were collected from the UCI repository and the Kaggle platform [39]. An overview of the main characteristics of each dataset is given in Table 1.

4.2. Experiments Settings

Two experiments are undertaken to assess the practical benefits of our proposed system.
Figure 6. Performance of our system compared with the features selected by the pairwise method on nine benchmark datasets (panels include (a) credit card, (b) sonar, (c) spambase)
Initially, we empirically evaluate the proposed algorithms on the datasets listed in Table 1 in terms of Area Under the ROC Curve (AUC), comparing FBS with the pairwise method FS-P and with RFE. In the subsequent stage, the second benchmark demonstrates the capability of the FBS system to reach a practical subset as quickly as possible using only a few features.

All datasets are split into two subsets: one subset is used for training and testing the branches with 3-fold cross-validation, while the other is held out, and the performance of the final selected feature subset is evaluated on it. For a fair comparison, the final subset selected by FBS, FS-P, and RFE is evaluated using a Random Forest with a grid-search strategy for the hyper-parameters. The AUC score is calculated using the out-of-bag (OOB) score of the random forest classifier. Since the benchmarking datasets used in this paper are unbalanced, the AUC metric is considered the best choice; moreover, AUC can generally be viewed as a better measure than accuracy [40].

4.3. Feedback System Parameters

The feedback system has three adjustable parameters, tuned for each dataset:

‐ S is the similarity value.
‐ D is the depth of the branches.
‐ N is the number of iterations.

To illustrate how these parameters vary: for large datasets, the N and D values should be higher, since the best branches in this case should be deeper. The following table gives an overview of the best parameters used for each dataset.

As noted above, the choice of parameters is crucial. The following graph shows the influence of the depth parameter (D) on the quality of the constructed branches on the sonar dataset: it plots the train and test AUC scores as D varies from 1 to 15.
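The evaluation protocol described above (holdout split, 3-fold cross-validation, grid-searched random forest, AUC score) might look like the following with scikit-learn. This is a sketch of one plausible reading of the setup, not the authors' code; the selected subset is a hypothetical placeholder:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
selected = [0, 3, 5, 7, 9]   # hypothetical subset returned by FBS / FS-P / RFE

# Holdout split: the final subset is scored only on the quarantined part
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

# 3-fold CV grid search over RF hyper-parameters (OOB scores also available)
grid = GridSearchCV(RandomForestClassifier(oob_score=True, random_state=0),
                    {"n_estimators": [100, 200], "max_depth": [None, 5]},
                    cv=3, scoring="roc_auc")
grid.fit(X_tr[:, selected], y_tr)

auc = roc_auc_score(y_ho, grid.predict_proba(X_ho[:, selected])[:, 1])
```

AUC is used rather than accuracy because it remains informative under the class imbalance present in most of the benchmark datasets.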
The results recorded on the sonar dataset clearly show that the branches are prone to overfitting at large depth values, because they predict all of the training data perfectly (the blue line) but fail to generalize to unseen data (the red line). As can be visually observed, the best depth for the sonar dataset is three (D=3).

4.4. Conducted Experiments

The proposed method is compared to the RFE and FS-P approaches in terms of prediction AUC score. Two experiments are conducted in this manuscript.

1) First experiment: To evaluate our proposed approach FBS, we compare the performance obtained (in terms of AUC score) by FBS with the wrapper method (Recursive Feature Elimination with random forest, RFE) and with the pairwise algorithm FS-P.

2) Second experiment: This experiment shows the ability of the proposed system FBS to achieve maximum performance using just a few features. For a fair comparison between FBS, FS-P, and RFE, we fix the generated subset size for all compared algorithms as follows: a subset of size 5 (FBS5, FS-P5, RFE5), a subset of size 10 (FBS10, FS-P10, RFE10), and a subset of size 15 (FBS15, FS-P15, RFE15).

4.5. Results and Discussion

After selecting the feature subset, the same classifier (RF) is used to calculate the AUC score: the random forest determines the test performance for the top-ranked features of each dataset. The comparison between FBS, FS-P, and RFE is presented in Figure 6 (first experiment).

As stated, our feature selection algorithm FBS outperforms FS-P and RFE considerably on almost all datasets, such as SPECT (Figure 6(f)), credit card (Figure 6(a)), ionosphere (Figure 6(i)), musk (Figure 6(e)), caravan (Figure 6(g)), and sonar (Figure 6(b)), except for the spambase dataset (Figure 6(c)).

For the numerai dataset (Figure 6(h)), our method has limited, if not degraded, performance at the beginning compared to RFE and FS-P, as our method does not select just the best-ranked feature as a starting point, in order to avoid a suboptimal subset, but also attempts to maximize the overall performance of the selected subset taking into account
[Figure 7 panels: (a) ionosphere, Spambase, and Musk (AUC using the top-5 features); (b) sonar, Eye, and Credit card (top-10 features); (c) Caravan, SPECT, and Numerai (top-15 features)]
Figure 7. The performance of FBS, RFE and FS‐P using feature subsets of 5, 10, and 15 features
and bioinformatics, vol. 13, no. 5, pp. 971–989, 2015.

[16] L. A. Belanche and F. F. González, "Review and evaluation of feature selection algorithms in synthetic problems," arXiv preprint arXiv:1101.2320, 2011.

[17] G. Chandrashekar and F. Sahin, "A survey on feature selection methods," Computers & Electrical Engineering, vol. 40, no. 1, pp. 16–28, 2014.

[18] B. Nithya and V. Ilango, "Evaluation of machine learning based optimized feature selection approaches and classification methods for cervical cancer prediction," SN Applied Sciences, vol. 1, no. 6, pp. 1–16, 2019.

[19] A. Bommert, X. Sun, B. Bischl, J. Rahnenführer, and M. Lang, "Benchmark for filter methods for feature selection in high-dimensional classification data," Computational Statistics & Data Analysis, vol. 143, p. 106839, 2020.

[20] Y. Akhiat, M. Chahhou, and A. Zinedine, "Ensemble feature selection algorithm," International Journal of Intelligent Systems and Applications, vol. 11, no. 1, p. 24, 2019.

[21] L. Čehovin and Z. Bosnić, "Empirical evaluation of feature selection methods in classification," Intelligent Data Analysis, vol. 14, no. 3, pp. 265–281, 2010.

[22] Y. Asnaoui, Y. Akhiat, and A. Zinedine, "Feature selection based on attributes clustering," in 2021 Fifth International Conference on Intelligent Computing in Data Sciences (ICDS). IEEE, 2021, pp. 1–5.

[23] Y. Bouchlaghem, Y. Akhiat, and S. Amjad, "Feature selection: A review and comparative study," in E3S Web of Conferences, vol. 351. EDP Sciences, 2022, p. 01046.

[24] A. Destrero, S. Mosci, C. D. Mol, A. Verri, and F. Odone, "Feature selection for high-dimensional data," Computational Management Science, vol. 6, pp. 25–40, 2009.

[25] V. Fonti and E. Belitser, "Feature selection using lasso," VU Amsterdam Research Paper in Business Analytics, vol. 30, pp. 1–25, 2017.

[26] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, no. Mar, pp. 1157–1182, 2003.

[27] R. Zebari, A. Abdulazeez, D. Zeebaree, D. Zebari, and J. Saeed, "A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction," Journal of Applied Science and Technology Trends, vol. 1, no. 2, pp. 56–70, 2020.

[28] J. Miao and L. Niu, "A survey on feature selection," Procedia Computer Science, vol. 91, pp. 919–926, 2016.

[29] L. C. Molina, L. Belanche, and À. Nebot, "Feature selection algorithms: A survey and experimental evaluation," in 2002 IEEE International Conference on Data Mining, Proceedings. IEEE, 2002, pp. 306–313.

[30] R. Caruana, A. Niculescu-Mizil, G. Crew, and A. Ksikes, "Ensemble selection from libraries of models," in Proceedings of the Twenty-First International Conference on Machine Learning, 2004, p. 18.

[31] A. Yassine, C. Mohamed, and A. Zinedine, "Feature selection based on pairwise evaluation," in 2017 Intelligent Systems and Computer Vision (ISCV). IEEE, 2017, pp. 1–6.

[32] B. Gregorutti, B. Michel, and P. Saint-Pierre, "Correlation and variable importance in random forests," Statistics and Computing, vol. 27, no. 3, pp. 659–678, 2017.

[33] J. Kacprzyk, J. W. Owsinski, and D. A. Viattchenin, "A new heuristic possibilistic clustering algorithm for feature selection," Journal of Automation, Mobile Robotics and Intelligent Systems, vol. 8, 2014.

[34] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.

[35] H. Han, X. Guo, and H. Yu, "Variable selection using mean decrease accuracy and mean decrease gini based on random forest," in 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS). IEEE, 2016, pp. 219–224.

[36] R. Sutton and A. Barto, "Reinforcement learning: An introduction," 2017. UCL Computer Science Department, Reinforcement Learning Lectures, 2018.

[37] Y. Fenjiro and H. Benbrahim, "Deep reinforcement learning: overview of the state of the art," Journal of Automation, Mobile Robotics and Intelligent Systems, pp. 20–39, 2018.

[38] S. M. H. Fard, A. Hamzeh, and S. Hashemi, "Using reinforcement learning to find an optimal set of features," Computers & Mathematics with Applications, vol. 66, no. 10, pp. 1892–1904, 2013.

[39] M. Lichman, "UCI machine learning repository," Irvine, CA: University of California, School of Information and Computer Science, http://archive.ics.uci.edu/ml, 2013.

[40] F. Provost, T. Fawcett, and R. Kohavi, "The case against accuracy estimation for comparing induction algorithms," in Proceedings of the Fifteenth International Conference on Machine Learning, 1998.