Journal of Automation, Mobile Robotics and Intelligent Systems, Volume 18, No. 1, 2024

USING REINFORCEMENT LEARNING TO SELECT AN OPTIMAL FEATURE SET


Submitted: 23rd February 2022; accepted: 25th January 2023

Yassine Akhiat, Ahmed Zinedine, Mohamed Chahhou

DOI: 10.14313/JAMRIS/1-2024/6

Abstract:
Feature Selection (FS) is an essential research topic in the area of machine learning. FS, which is the process of identifying the relevant features and removing the irrelevant and redundant ones, is meant to deal with high dimensionality problems and to select the best performing feature subset. In the literature, many feature selection techniques approach the task as a search problem, where each state in the search space is a possible feature subset. In this paper, we introduce a new feature selection method based on reinforcement learning. First, decision tree branches are used to traverse the search space. Second, a transition similarity measure is proposed so as to ensure the exploit-explore trade-off. Finally, the informative features are the ones most involved in constructing the best branches. The performance of the proposed approach is evaluated on nine standard benchmark datasets. The results using the AUC score show the effectiveness of the proposed system.

Keywords: Feature selection, Data mining, Decision tree, Reinforcement learning, Dimensionality reduction

1. Introduction

With the advent of high-dimensional data, many features are typically irrelevant, redundant, or noisy for a given learning task, with harmful consequences in terms of performance and/or computational cost. Moreover, a large number of features requires a large amount of memory or storage space. Applying data mining and machine learning algorithms to high-dimensional data usually degrades their performance due to the over-fitting problem [1, 2]. With a large number of features, machine learning models also become harder to interpret as their complexity increases, which restricts their generalizability. Therefore, reducing the dimensionality of data has become indispensable in real-world scenarios in order to build understandable and accurate models that improve data-mining performance and enhance model interpretability. Data mining can take advantage of dimensionality reduction tools, which are central to data pre-processing [3]. Dimensionality reduction can be categorized into feature extraction and feature selection (see Figure 1) [4-6]. Feature extraction aims at transforming the original feature space into a new, reduced one, where features lose their meaning due to the transformation [7-10]. In contrast to feature extraction, feature selection is the process of identifying the relevant features and removing the irrelevant and redundant ones, with the objective of obtaining the best performing subset of the original features without any transformation [11-13]. Thus, learning models constructed using the selected subset of features are more interpretable and readable. This makes feature selection a reliable and often preferred alternative to feature extraction in many real-world datasets. The major reasons for applying feature selection are the following:
- Making models easier to interpret.
- Reducing resource requirements (shorter training time, smaller storage capacity, etc.).
- Avoiding the curse of dimensionality.
- Avoiding the over-fitting problem and thus obtaining a better model.
- Improving accuracy: less noise in the data means improved modeling accuracy.

In general, feature selection algorithms are categorized into supervised, semi-supervised, and unsupervised feature selection [12, 14-18]. In this paper, we put more emphasis on supervised feature selection, which follows a threefold approach: Filter, Wrapper [19-23], and Embedded [24-26] (see Fig. 1). Filter methods rely on the relationship between features and the class label (such as distance, dependency, or correlation) to assess the importance of features. This category is a pre-processing step that is independent of the induction algorithm. Filters are known for their ease of use and low computational cost. In contrast, the Wrapper approach generates models with subsets of features and then uses prediction performance as a criterion function to guide the search for the best feature subset. This approach takes into account the interactions between features; generally, Wrappers achieve better performance than some Filter methods. The Embedded approach performs feature selection implicitly while simultaneously constructing the model, which makes it less costly in terms of execution time than Wrappers.

Figure 1. Feature selection categorization. The figure contrasts the Filter, Wrapper, and Embedded methods: each starts from the initial feature set and generates candidate subsets, which are evaluated (by feature evaluation alone for Filters, or together with a machine learning algorithm for Wrappers and Embedded methods) until the optimal subset is obtained.

1.1. Research Objectives

In this paper, we introduce a new feedback system based on reinforcement learning to solve the feature selection problem. The system keeps exploring the state space while it moves through the available space of features to select the best subset. In this system, we use decision tree branches; therefore, each subset is represented by a branch. The main idea of the proposed feature selection algorithm is to select the subset of features that are most involved in constructing efficient branches. At the outset, the system builds the first branch without any prior knowledge (exploring the environment). As iterations proceed, the system accumulates experiences that allow it to construct better branches (diverse, relevant, etc.) using the proposed Transition Similarity Measure (TSM). From the best branches, we select the features most frequently used in creating them (see Fig. 2). The contributions of this study are fourfold:
1) A reinforcement learning-based method is developed for selecting the best subset of features.
2) The proposed system traverses the state space to select the informative subset using a modified version of decision tree branches. Since the transition between states (feature subsets) is controlled using decision tree branches, the proposed system is straightforward to follow; as a result, the selected solution is comprehensively interpretable.
3) The transition similarity measure (TSM) is intended to keep the system exploring the environment by creating new branches while simultaneously exploiting what it has learned, so as to avoid redundancy and maximize diversity.
4) The proposed system can be adapted to any problem (it is not dependent on a specific dataset), because the feature selection problem is formulated as reinforcement learning.

The remainder of this paper is organized as follows: Section 2 presents the related works. Section 3 is devoted to the problem and our contributions. Section 4 presents the results of the proposed system. The last section concludes this work.

2. Related Works

Extensive research has been carried out in the domain of feature selection, an ever-evolving field of study [3, 13, 19, 27]. In this section, some wrapper algorithms similar to the one proposed in this paper are briefly reviewed. The first algorithm, ubiquitous in the FS state of the art, is forward selection [28, 29]: (1) it starts with an empty subset; (2) it adds to the subset the feature that increases its performance the most; (3) it repeats Step 2 until all features have been examined or until no better performance is possible; (4) it returns the subset of features that yields maximum performance on the validation set [30]. This method is fast and effective, but it tends to over-fit the validation set. In [20], the authors proposed a new algorithm entitled ensemble feature selection, which significantly reduces this problem.
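As a concrete illustration of the forward selection procedure reviewed above, the following minimal Python sketch adds one feature at a time as long as the validation AUC improves. The choice of classifier, the validation split, and the `evaluate` helper are illustrative assumptions made for this example, not details taken from the cited works.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def forward_selection(X, y, random_state=0):
    """Greedy forward selection: grow the subset while the validation AUC improves."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=random_state)

    def evaluate(cols):
        # Score a candidate subset with a fixed classifier (illustrative choice).
        clf = RandomForestClassifier(random_state=random_state).fit(X_tr[:, cols], y_tr)
        return roc_auc_score(y_val, clf.predict_proba(X_val[:, cols])[:, 1])

    remaining = list(range(X.shape[1]))
    selected, best_score = [], 0.0
    while remaining:
        # Step 2: try each remaining feature and keep the best addition.
        scores = {f: evaluate(selected + [f]) for f in remaining}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:        # Step 3: stop when no improvement.
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best_score = scores[f_best]
    return selected, best_score                 # Step 4: best subset found.
```

Because the subset is tuned directly against the validation split, this sketch also makes visible why the method tends to over-fit the validation set, as noted above.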


Figure 2. Flow-chart of the proposed feedback feature selection system. The feature space contains all features (irrelevant, noisy, redundant, and relevant); the search space is traversed by creating decision tree branches (Algorithm 1); the transition similarity measure (TSM) is used to ensure the exploit/explore trade-off; the best constructed branches (best feature subsets) are identified according to the reinforcement learning reward function; and the optimal subset includes the features most involved in constructing the best branches.

First, for each feature, they train different models using different classification algorithms and store them in a library of models. Second, they use a selection-with-replacement technique [30] to find the optimal subset of models that, when averaged together, achieves excellent performance. Another wrapper method, based on a graph representation, is proposed in [14], where the node degree is used as a criterion to select the best feature subset from the whole feature space. This algorithm consists of two phases: (1) choosing the features to be used in graph construction; (2) constructing a graph in which each node corresponds to a feature and each edge carries a weight corresponding to the pairwise score of the features it connects. Finally, the best features are the nodes with the highest degree. In [31], a pairwise feature selection method (FS-P) is introduced, in which features are evaluated in pairs using a decision tree classifier: first, it ranks features individually; second, it uses the learning algorithm (decision tree) to evaluate pairs of features. In [32, 33], a well-known wrapper approach is presented: Recursive Feature Elimination using Random Forest (RFE). RFE performs feature selection recursively. At the first iteration, the model (Random Forest) is trained on the whole set of attributes. After ranking the features according to the model's importance, the least important features are eliminated. As iterations take place, the considered set of features becomes smaller and smaller until the desired number of features is reached.

Random forests are among the most popular machine learning algorithms [34]. Thanks to their performance, robustness, and interpretability, RFs have repeatedly proved useful and can select informative variables [11]. RF performs feature selection using the mean decrease impurity and mean decrease accuracy criteria [35]. Mean decrease impurity measures the decrease in the weighted impurity of the trees attributable to each feature, and the features are ranked according to this measure. Mean decrease accuracy measures the impact of a feature on model accuracy: the values of each feature are permuted first, and we then measure how much this permutation decreases the model accuracy. Informative features decrease the model accuracy significantly, while unimportant features do not.
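The mean decrease accuracy (permutation importance) criterion described above can be sketched as follows. This is a minimal illustration, not the implementation used in the cited works; it assumes a fitted scikit-learn classifier and uses the library's `permutation_importance` helper, and the toy data are invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Illustrative data: X (n_samples, n_features), y binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)   # only features 0 and 3 are informative

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Permute each feature in turn and measure the resulting drop in accuracy.
result = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
print("features ranked by mean decrease accuracy:", ranking)
```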
In contrast to the traditional feature selection (FS) formalization, and inspired by the reinforcement learning approach, the feature selection problem can be handled reliably by our proposed system. The feature space in our approach can be seen as a Markov decision process (MDP) [36, 37], where each subset of features is represented by a state (a decision tree branch). Our system explores the state space while exploiting the experiences gathered so far using the proposed transition similarity measure (TSM). In [38], the authors proposed a method based on reinforcement learning (RL) for selecting the best subset. First, they use an AOR (average of rewards) criterion to identify the effectiveness of a given feature under different conditions; the AOR is the average of the difference between two consecutive states over several iterations. Second, they introduce an optimal graph search to reduce the complexity of the problem.

The way our system traverses from one state to another is handled using decision tree branches to represent each state, as mentioned before. Overall, this technique is similar to the way RF creates branches. The RF method creates multiple trees, and for each tree only a random subset of input variables is considered at each splitting node. Therefore, the final trees of an RF are independent of each other and do not learn from the previously created trees. In contrast, our system can learn from prior attempts: at each iteration, it explores new branches and exploits the accumulated knowledge to create higher-quality ones in the subsequent iterations.

3. Feedback Feature Selection System

This paper brings a new feature selection system based on reinforcement learning; the proposed system principally comprises three parts. First, decision tree branches are used to traverse the search space (feature space), to create new rules (branches, i.e., feature subsets), and to select the best feature subset. Second, a transition similarity measure (TSM) is introduced to ensure that the system keeps exploring the state space by creating new branches while exploiting what it has learned so far, so as to avoid the drawbacks of redundancy. Finally, the relevant features are the ones most involved in constructing the branches of high quality. The following subsections recall the general framework of reinforcement learning and explain how our system builds on this approach.

3.1. Reinforcement Learning Problem

RL is one of the most active and fast-developing areas in machine learning and is one of the three basic machine learning approaches, alongside supervised learning and unsupervised learning. RL involves the following concepts: agent, environment, actions, and reward. The agent takes an action A and interacts with an environment to maximize the total received reward R. At iteration t, the agent observes state St from the environment and receives a reward Rt. The agent takes action At; in response, the environment provides the next state St+1 and the next reward. The process continues until the agent is able to take the actions that maximize the total reward. The agent must balance exploiting what has been learned so far and continuously exploring the environment to gather more information that may help in maximizing the total reward.
- Agent: the agent takes actions. In our case, the agent is the proposed feature selection system.
- Actions: the set of all possible moves the agent can make. For our system, the actions are the nodes that may be used to create a branch.
- Environment: the feature space through which the system moves. It receives the system's current state and action as input and returns the reward and the next state.
- State: the current situation in which the agent finds itself. In our context, this is the current node of the branch.
A minimal sketch of how these concepts can be represented for the feature selection problem is given below.
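The following Python sketch shows one possible, purely illustrative way to represent the state and transition objects implied by this mapping. The field names and data structures are assumptions made for the example, not definitions from the paper.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transition:
    """Link between two successive nodes of the same branch."""
    parent_feature: int          # feature used at the parent node
    child_feature: int           # feature added by the chosen action
    samples: frozenset = field(default_factory=frozenset)  # sample indices used at the split

@dataclass
class BranchState:
    """State = the (partial) decision tree branch built so far."""
    features: list               # ordered features used so far
    transitions: list            # Transition objects along the branch
    auc: float = 0.5             # AUC of the branch built so far

# An action is simply the next feature (node) to append to the branch; the
# environment (the dataset) returns the new BranchState and the reward, i.e.,
# the AUC difference produced by adding that node.
```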
Having recalled the reinforcement learning concepts, the following paragraphs describe the technical core of our proposed algorithm.

The feature selection system (agent) scrutinizes the environment and starts with a single node chosen arbitrarily, without any pre-stored knowledge (exploration phase). This node branches into possible outcomes, and each outcome leads to the next nodes (actions). To indicate how effective the chosen action is, the difference between two consecutive states is computed. As long as the desired depth is not reached, the system keeps adding one node at a time in order to create a branch. As iterations take place, the system gathers experience and becomes able to take actions that maximize the overall reward R; as a result, branches of high quality are created. A transition similarity measure is proposed to establish a balance between exploiting what has been learned so far, in order to choose the next action that maximizes the reward, and continuously exploring the feature space to achieve long-term benefits. The branch is constructed in the same way as a decision tree (C4.5); the difference is that when we add a node to the branch, we retain only the best branch with the highest-scoring threshold. The following steps give more precise information about creating a branch.

3.2. Steps to Create a DT Branch

We start with a random feature as the root of the branch. As long as the branch has not yet reached the desired depth or minimum samples per leaf, the system keeps adding one node at a time to the branch. The added node is the one obtained using the feature and threshold that produce the highest AUC score (Area Under the ROC Curve). The idea behind using the depth and min-samples-leaf parameters as stopping criteria is to avoid the over-fitting problem as much as possible. The most common stopping criterion is min samples leaf, the minimum number of samples assigned to each leaf node: if the number is less than a given value, no further split is made and the node is considered a final leaf node. The depth of the branch is also very useful in controlling over-fitting, because the deeper the branch is, the more information it captures from the data and the more splits it has, which leads to good predictions on the training data but fails to generalize to unseen data.

3.3. Reward Function

A reward function R [38] is used to calculate the score at each level of the branch by computing the difference between the score of the current branch and its score after a new node is added (DS). The DS indicates how useful the newly added feature is.


Figure 3. Reinforcement learning framework: at each step the agent receives an observation Ot and a reward Rt from the environment and responds with an action At.

Algorithm 1: Create a DT branch

1: Create the root node and choose the split feature. The first feature is chosen randomly.
2: Compute the best threshold of the chosen feature.
3: Split the data on this feature into subsets in order to define the node.
4: Compute the AUC score on the left and on the right of the node, then keep the branch with the best AUC score.
5: Add the child node to the root node.
6: Choose the next best feature.
7: Repeat STEP 2 to STEP 5 until the desired depth or min samples leaf of the branch is reached.
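The following Python sketch is one possible reading of Algorithm 1, assuming a binary classification task and a simple threshold split scored by AUC. Helper names such as `best_threshold_auc` and the tie-breaking choices are illustrative assumptions, not taken from the paper; the dictionaries stored per node (feature, threshold, AUC, samples) are reused by the later sketches.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def best_threshold_auc(x, y):
    """Best one-split ("stump") AUC obtainable on a single feature, and its threshold."""
    best_t, best_auc = None, 0.0
    if len(np.unique(y)) < 2:
        return best_t, best_auc
    for t in np.unique(x)[:-1]:
        pred = (x > t).astype(float)
        auc = max(roc_auc_score(y, pred), roc_auc_score(y, 1 - pred))
        if auc > best_auc:
            best_t, best_auc = t, auc
    return best_t, best_auc

def best_feature(X, y, idx):
    """Step 6: the feature whose stump AUC on the samples `idx` is highest."""
    scores = [best_threshold_auc(X[idx, f], y[idx])[1] for f in range(X.shape[1])]
    return int(np.argmax(scores)), max(scores)

def create_branch(X, y, depth=4, min_samples_leaf=20, rng=None):
    """Grow a single decision-tree branch as a chain of (feature, threshold) nodes."""
    rng = rng or np.random.default_rng()
    idx = np.arange(len(y))                       # samples reaching the current node
    feature = int(rng.integers(X.shape[1]))       # Step 1: random root feature
    branch = []
    while len(branch) < depth and len(idx) >= min_samples_leaf:
        t, auc = best_threshold_auc(X[idx, feature], y[idx])   # Steps 2-3
        if t is None:
            break
        branch.append({"feature": feature, "threshold": float(t),
                       "auc": auc, "samples": idx})
        # Steps 4-5: split, then keep the child side on which a further split scores best.
        left, right = idx[X[idx, feature] <= t], idx[X[idx, feature] > t]
        candidates = [(s, *best_feature(X, y, s)) for s in (left, right)
                      if len(s) >= min_samples_leaf]
        if not candidates:
            break
        idx, feature, _ = max(candidates, key=lambda c: c[2])   # Step 6: next best feature
    return branch
```

The AUC values recorded at successive nodes of the returned branch are what the reward function of the next subsection operates on.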

This function is de ined as follows: Therefore, the system keeps learning the same
rules and branches. This means that the system will
(𝐴𝑈𝐶𝑛𝑒𝑥𝑡 − 𝐴𝑈𝐶𝑐𝑢𝑟𝑟𝑒𝑛𝑡 ) × log (‖𝑆𝑢𝑏𝑠𝑒𝑡𝑐𝑢𝑟𝑟𝑒𝑛𝑡 ‖) be expensive in terms of execution time, while the
(1) system should be less resources consuming (run time
Where 𝐴𝑈𝐶𝑛𝑒𝑥𝑡 and 𝐴𝑈𝐶𝑐𝑢𝑟𝑟𝑒𝑛𝑡 is the score of the and storage requirement), and the branches should be
current branch and the score after adding a new node, strong and diverse.
𝑆𝑢𝑏𝑠𝑒𝑡𝑐𝑢𝑟𝑟𝑒𝑛𝑡 is the length of samples used to split an The similarity between two transitions is com‐
internal node. puted by the following formula:
3.4. Transition Similarity Measure |𝑆1 ∩ 𝑆2|
Definition (Transition) 𝑇𝑆𝑀 = (2)
‖𝑆𝑢𝑏𝑠𝑒𝑡𝑐𝑢𝑟𝑟𝑒𝑛𝑡 ‖
A transition is the process in which something
changes from one state to another. In our system, the Where ‖𝑆1 ∩ 𝑆2‖ is the number of shared samples
transition is the link between two successive nodes of between two transitions.
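A direct transcription of Eq. (1) into Python might look as follows; this is a sketch, and the natural logarithm is assumed since the paper does not specify the base.

```python
import math

def reward(auc_current, auc_next, n_samples_at_split):
    """Eq. (1): AUC gain of the newly added node, weighted by the (log) size
    of the sample subset used to split the internal node."""
    return (auc_next - auc_current) * math.log(n_samples_at_split)

# Example: adding a node raises the branch AUC from 0.70 to 0.78
# on an internal node split over 150 samples.
print(reward(0.70, 0.78, 150))   # ≈ 0.40
```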
3.4. Transition Similarity Measure

Definition (Transition). A transition is the process in which something changes from one state to another. In our system, a transition is the link between two successive nodes of the same branch.

Transition Similarity Measure. We propose a transition similarity measure (TSM) to ensure that our system keeps exploring the state space, learning new rules, and preventing redundant branches. For each branch, we store all transitions together with the corresponding samples used to split each internal node. Since the algorithm is iterative, different branches may share the same transitions, which in itself is not a problem. However, when the majority of the samples (more than a given threshold) are shared by transitions of different branches, those two transitions are deemed similar, which is a serious problem: allowing similar transitions in different branches leads to constructing redundant and useless branches, so the system keeps learning the same rules and branches. This makes the system expensive in terms of execution time, whereas it should consume few resources (run time and storage), and the branches should be strong and diverse.

The similarity between two transitions is computed by the following formula:

TSM = ‖S1 ∩ S2‖ / ‖Subset_current‖    (2)

where ‖S1 ∩ S2‖ is the number of samples shared by the two transitions.
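Representing each transition by the set of sample indices used at its split, Eq. (2) can be sketched as below. The normalization by the current subset size follows the equation; representing a transition as a Python set of indices is an assumption made for the example.

```python
def transition_similarity(samples_1, samples_2, subset_current_size):
    """Eq. (2): fraction of the current subset that two transitions share."""
    shared = len(set(samples_1) & set(samples_2))
    return shared / subset_current_size

# Two transitions splitting largely the same 100-sample subset are "similar"
# and the new one should be rejected when TSM exceeds the threshold S.
t1 = range(0, 100)           # sample indices used by transition 1
t2 = range(10, 110)          # sample indices used by transition 2
print(transition_similarity(t1, t2, 100))   # 0.9
```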
3.5. The Proposed FS Method

Since the proposed algorithm is iterative, the number of iterations N is given as input. The reward function is set to zero at the beginning. Our system starts with an empty set F, and at each iteration it creates a new branch and adds it to F. If the next subset (branch) has already been experienced (seen) by the system, the system reuses the gathered experience in the upcoming iterations; otherwise, the system keeps exploring new rules, new patterns, and new branches.

3.6. Starting Example

To explain the proposed algorithm further, we consider the following example. Suppose we have a dataset of 10 features. Figure 4 below shows the whole space of features. The purpose is to select the best subset of features using the proposed system.


Algorithm 2: Feedback feature selection system pseudo-code

1: Input:
2:   N: number of iterations
3:   S: similarity threshold
4: Output:
5:   R: Reward
6: for iteration = 1 to N do:
7:   F = {} to store subsets (branches)
8:   Step 1: Create the root node (Algorithm 1)
9:   Step 2: Find all possible transitions (Pt)
10:  Add the created node to F
11:  for Ti in Pt do:
12:    if Ti exists in F then:
13:      Compute the similarity between the two transitions using TSM
14:      if similarity higher than S then:
15:        f = new node (keep learning and exploring the environment)
16:        R{F} = (AUC_next − AUC_current) × log(‖Subset_current‖)
17:    else:
18:      F ∪ f: add the chosen node to the branch
19:    end
20:  end
21:  Step 3: Repeat until the desired depth and min samples leaf are reached
22: end
23: end
24: Return Reward R
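Putting the pieces together, the overall feedback loop and the final feature ranking (the features most involved in the best branches, as described in Section 1.1 and Figure 2) could be sketched as follows. This sketch reuses the illustrative `create_branch`, `reward`, and `transition_similarity` helpers defined above and is only one plausible reading of Algorithm 2, not the authors' implementation; in particular, the redundancy-rejection and reward-accumulation details are assumptions.

```python
from collections import Counter

def feedback_feature_selection(X, y, n_iterations=100, similarity=0.8,
                               top_branches=10, **branch_kwargs):
    """Iteratively grow branches, reject near-duplicate transitions, and rank features."""
    stored = []                                   # (branch, total_reward) pairs
    seen_transitions = []                         # sample subsets of accepted transitions
    for _ in range(n_iterations):
        branch = create_branch(X, y, **branch_kwargs)
        if len(branch) < 2:
            continue
        total_reward, redundant = 0.0, False
        for prev, node in zip(branch, branch[1:]):
            # Reject the branch if this transition is too similar to an already-seen one.
            if any(transition_similarity(node["samples"], s, len(node["samples"])) > similarity
                   for s in seen_transitions):
                redundant = True
                break
            total_reward += reward(prev["auc"], node["auc"], len(node["samples"]))
        if not redundant:
            seen_transitions.extend(node["samples"] for node in branch[1:])
            stored.append((branch, total_reward))

    # The optimal subset: features most frequently used in the best-rewarded branches.
    best = sorted(stored, key=lambda b: b[1], reverse=True)[:top_branches]
    votes = Counter(node["feature"] for branch, _ in best for node in branch)
    return [f for f, _ in votes.most_common()]
```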

Figure 4. Main steps of the proposed FBS algorithm on a 10-feature example: (a) the whole feature space, (b) the first constructed branch, (c) the second iteration, in which a repeated transition is detected, and (d) the best constructed branches after N iterations.

1) First iteration, Fig. 4(b): The system traverses the feature space and creates the first branch without any prior knowledge. At each level of the branch, the system stores the AUC score using the reward function R. Moreover, it stores each transition (2 ← 3, 3 ← 9, 9 ← 5, 5 ← 6) together with its corresponding subset of samples.
2) Second iteration, Fig. 4(c): In the second iteration, the transition (3 ← 9) appears for the second time. Here the TSM (transition similarity measure) comes into play: if two transitions of different branches are similar (nodes shown in green), the system does not allow them to appear in the next branches (the current branch included). The system has to explore the environment to find new rules and prevent redundancy in creating branches.
3) Iteration N, Fig. 4(d): After N iterations, the system is capable of identifying the best branches using the experience gathered during each iteration. The top-ranked branches constructed by the system are illustrated in subfigure 4(d).


Table 1. Characteristics of the benchmarking datasets.

#No  Dataset      #Features  #Examples  Class distribution  #Classes
1    Spambase     57         4601       39% + / 61% −       2
2    Numerai      22         96320      50% + / 50% −       2
3    Clean        167        6598       15% + / 85% −       2
4    SPECT        32         80         33% + / 67% −       2
5    Caravan      86         5823       6% + / 94% −        2
6    Ionosphere   34         351        64% + / 36% −       2
7    Credit card  24         30000      22% + / 78% −       2
8    Eye          15         14980      45% + / 55% −       2
9    Sonar        61         208        47% + / 53% −       2

Table 2. Best parameters of the feedback system for each dataset.

#No  Dataset      Iterations N  Depth  Similarity
1    Spambase     360           4      0.95
2    Numerai      100           5      0.9
3    Clean        1000          10     0.65
4    SPECT        100           4      0.7
5    Caravan      650           8      0.6
6    Ionosphere   220           4      0.62
7    Credit card  200           7      0.8
8    Eye          110           6      0.9
9    Sonar        600           3      0.65

Figure 5. Over-fitting problem: train and test AUC scores on the sonar dataset as the branch depth D varies from 1 to 15.

From Figure 4 it is clear that the top subset of features is [3, 5, 10], because these features are involved the most in creating the best branches.

4. Experimental Results and Discussion

This experimental section attests to the efficiency of the proposed feedback feature selection system (FBS) in selecting the best features. Two benchmarks have been conducted, and the usefulness of our system is appraised by comparing it with two feature selection algorithms. The first is the popular wrapper algorithm Recursive Feature Elimination with random forest (RFE-RF). The second is the pairwise feature selection algorithm (FS-P), which was recently proposed and has proved its effectiveness in identifying the best features [31].

4.1. Benchmarking Datasets

Nine binary classification datasets have been employed in different experimental designs to evaluate the performance of the proposed feature selection method. The datasets are chosen to differ in class distribution (balanced or imbalanced), linearity, dataset shift, and number of instances and variables. The datasets are publicly available and were collected from the UCI repository and the Kaggle platform [39]. An overview of the main characteristics of each dataset is given in Table 1.

4.2. Experiments Settings

Two experiments are undertaken to assess the proposed system.


Figure 6. Performance of our system compared with the features selected by the pairwise method and by RFE on nine benchmark datasets: (a) credit card, (b) sonar, (c) spambase, (d) Eye, (e) musk, (f) SPECT, (g) Caravan, (h) Numerai, (i) Ionosphere.

First, we empirically evaluate the proposed algorithm on the datasets listed in Table 1 in terms of Area Under the ROC Curve (AUC), comparing FBS with the pairwise method FS-P and with RFE. Second, we demonstrate the capability of the FBS system to reach a good subset as quickly as possible, that is, using only a few features; this is the object of the second benchmark.

All datasets are segmented into two subsets: one subset is employed for training and testing the branches using 3-fold cross-validation, while the other subset is set aside as a holdout set on which the performance of the final selected feature subset is evaluated. For the sake of a fair comparison, the final subsets selected by FBS, FS-P, and RFE are evaluated using a Random Forest with a grid-search strategy for the hyper-parameters. The AUC score is calculated using the out-of-bag (OOB) score of the random forest classifier. Since the benchmarking datasets used in this paper are unbalanced, the AUC metric is considered the best choice; moreover, AUC can generally be viewed as a better measure than accuracy [40]. A sketch of this evaluation protocol is given below.
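The following Python sketch illustrates one way to implement this evaluation protocol with standard scikit-learn components; the hyper-parameter grid, split sizes, and the use of a holdout AUC below are illustrative assumptions, not the exact settings used by the authors.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score

def evaluate_selected_subset(X, y, selected_features, random_state=0):
    """Fit an RF (grid-searched with 3-fold CV) on the selected features, report holdout AUC."""
    X_sel = X[:, selected_features]
    X_train, X_hold, y_train, y_hold = train_test_split(
        X_sel, y, test_size=0.3, stratify=y, random_state=random_state)

    grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}  # illustrative grid
    rf = GridSearchCV(RandomForestClassifier(oob_score=True, random_state=random_state),
                      grid, cv=3, scoring="roc_auc").fit(X_train, y_train)

    holdout_auc = roc_auc_score(y_hold, rf.predict_proba(X_hold)[:, 1])
    return rf.best_params_, holdout_auc
```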
4.3. Feedback System Parameters

The feedback system has three adjustable parameters, which are tuned for each dataset:
- S: the similarity threshold.
- D: the depth of the branches.
- N: the number of iterations.
For large datasets, the N and D values should be higher, since the best branches in this case should be deeper. Table 2 gives an overview of the best parameters used for each dataset.

As articulated above, the choice of parameters is important. Figure 5 shows the influence of the depth parameter D on the quality of the constructed branches for the sonar dataset, plotting the train and test AUC scores as D varies from 1 to 15.


The recorded outcomes on the sonar dataset clearly show that the branches are prone to over-fitting at large depth values: they predict all of the training data perfectly (the blue line in Figure 5) but fail to generalize to unseen data (the red line). As can be observed, the best depth for the sonar dataset is three (D = 3).

4.4. Conducted Experiments

The proposed method is compared to the RFE and FS-P approaches in terms of prediction AUC score. Two experiments are conducted.
1) First experiment: To evaluate our proposed approach FBS, we compare the performance obtained (in terms of AUC score) by FBS with the wrapper method (Recursive Feature Elimination with random forest, RFE) and with the pairwise algorithm FS-P.
2) Second experiment: This experiment shows the ability of the proposed system FBS to achieve maximum performance using just a few features. For a fair comparison between FBS, FS-P, and RFE, we fix the generated subset size for all compared algorithms as follows: a subset of size 5 (FBS5, FS-P5, RFE5), a subset of size 10 (FBS10, FS-P10, RFE10), and a subset of size 15 (FBS15, FS-P15, RFE15).

4.5. Results and Discussion

After selecting the feature subset, the same classifier (RF) is used to calculate the AUC score. The Random Forest is used to determine the test performance for the top-ranked features of each dataset. The comparison between FBS, FS-P, and RFE is presented in Figure 6 (first experiment).

As stated, our feature selection algorithm FBS outperforms FS-P and RFE considerably on almost all datasets, such as SPECT (Figure 6(f)), credit card (Figure 6(a)), ionosphere (Figure 6(i)), musk (Figure 6(e)), caravan (Figure 6(g)), and sonar (Figure 6(b)), with the exception of the spambase dataset (Figure 6(c)).

For the numerai dataset (Figure 6(h)), our method has a limited, even degraded, performance at the beginning compared to RFE and FS-P. This is because our method does not simply pick the best-ranked feature as a starting point (which could lead to a suboptimal subset) but attempts to maximize the overall performance of the selected subset, taking into account the interactions between features.

Figure 7. The performance of FBS, RFE, and FS-P using feature subsets of 5, 10, and 15 features: (a) AUC score using the top 5 features on the Ionosphere, Spambase, and Musk datasets; (b) AUC score using the top 10 features on the Sonar, Eye, and Credit card datasets; (c) AUC score using the top 15 features on the Caravan, SPECT, and Numerai datasets.


After the selection of the numerai dataset's fifth feature (Figure 6(h)), this behavior becomes observable, and FBS shows a clearly improved performance over FS-P and RFE.

Table 2 shows the best parameters used in our feedback system. The bottom-line conclusion we can draw from the table is that the choice of the best parameters for each dataset is crucial; the parameters should be chosen carefully in order to construct branches of high quality.

The purpose of the proposed feature selection method is not only to improve the classification performance but also to yield excellent performance using a minimum number of features (i.e., to select the fewest possible features). Figure 7 shows the AUC scores obtained with selected feature subsets of fixed sizes (5, 10, and 15) on the nine benchmark datasets. As illustrated by this benchmark, FBS selects better features than FS-P and RFE on almost all datasets. One point to mention here is that the proposed feedback system can find the best subset using a minimal number of features, as shown in Figure 7; this implies minimal resource requirements, fast execution, and better generalization.

5. Conclusion

In this paper, we have proposed a new feature selection method based on the decision tree branch concept to represent feature subsets. The proposed system treats the FS problem as a reinforcement learning problem: the system tries to find a compromise between exploring the search space by experiencing new rules (creating new branches) and exploiting the gathered experiences so as to choose the right actions (relevant features). The exploit/explore trade-off is controlled by the proposed TSM. The proposed system can construct the best branches and hence select the best subset of features.

To assess the effectiveness of the features selected by our proposed method, we have conducted an extensive set of experiments on nine benchmarking datasets. The results confirm that the proposed feedback feature selection system is not only effective at selecting the best performing subsets of features but also chooses the fewest number of features.

AUTHORS
Yassine Akhiat∗ – Department of Informatics, Faculty of Sciences Dhar El Mahraz, USMBA, Fez, Morocco, e-mail: [email protected].
Ahmed Zinedine – Department of Informatics, Faculty of Sciences Dhar El Mahraz, USMBA, Fez, Morocco, e-mail: [email protected].
Mohamed Chahhou – Department of Informatics, Faculty of Sciences, UAE, Tetouan, Morocco, e-mail: [email protected].
∗Corresponding author

References
[1] R. Roelofs, S. Fridovich-Keil, J. Miller, V. Shankar, M. Hardt, B. Recht, and L. Schmidt, "A meta-analysis of overfitting in machine learning," in Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019, pp. 9179–9189.
[2] X. Ying, "An overview of overfitting and its solutions," in Journal of Physics: Conference Series, vol. 1168, no. 2. IOP Publishing, 2019, p. 022022.
[3] M. Li, H. Wang, L. Yang, Y. Liang, Z. Shang, and H. Wan, "Fast hybrid dimensionality reduction method for classification based on feature selection and grouped feature extraction," Expert Systems with Applications, vol. 150, p. 113277, 2020.
[4] H. Liu, H. Motoda, and L. Yu, "A selective sampling approach to active feature selection," Artificial Intelligence, vol. 159, no. 1–2, pp. 49–74, 2004.
[5] Y. Akhiat, Y. Asnaoui, M. Chahhou, and A. Zinedine, "A new graph feature selection approach," in 2020 6th IEEE Congress on Information Science and Technology (CiSt). IEEE, 2021, pp. 156–161.
[6] D. M. Atallah, M. Badawy, and A. El-Sayed, "Intelligent feature selection with modified k-nearest neighbor for kidney transplantation prediction," SN Applied Sciences, vol. 1, no. 10, pp. 1–17, 2019.
[7] I. Guyon, S. Gunn, M. Nikravesh, and L. A. Zadeh, Feature Extraction: Foundations and Applications. Springer, 2008, vol. 207.
[8] I. Guyon and A. Elisseeff, "An introduction to feature extraction," in Feature Extraction. Springer, 2006, pp. 1–25.
[9] A. Yassine, "Feature selection methods for high dimensional data," 2021.
[10] Y. Manzali, Y. Akhiat, M. Chahhou, M. Elmohajir, and A. Zinedine, "Reducing the number of trees in a forest using noisy features," Evolving Systems, pp. 1–18, 2022.
[11] Y. Akhiat, Y. Manzali, M. Chahhou, and A. Zinedine, "A new noisy random forest based method for feature selection," Cybernetics and Information Technologies, vol. 21, no. 2, 2021.
[12] S. Abe, "Feature selection and extraction," in Support Vector Machines for Pattern Classification. Springer, 2010, pp. 331–341.
[13] J. Cai, J. Luo, S. Wang, and S. Yang, "Feature selection in machine learning: A new perspective," Neurocomputing, vol. 300, pp. 70–79, 2018.
[14] Y. Akhiat, M. Chahhou, and A. Zinedine, "Feature selection based on graph representation," in 2018 IEEE 5th International Congress on Information Science and Technology (CiSt). IEEE, 2018, pp. 232–237.
[15] J. C. Ang, A. Mirzal, H. Haron, and H. N. A. Hamed, "Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 13, no. 5, pp. 971–989, 2015.

[16] L. A. Belanche and F. F. González, "Review and evaluation of feature selection algorithms in synthetic problems," arXiv preprint arXiv:1101.2320, 2011.
[17] G. Chandrashekar and F. Sahin, "A survey on feature selection methods," Computers & Electrical Engineering, vol. 40, no. 1, pp. 16–28, 2014.
[18] B. Nithya and V. Ilango, "Evaluation of machine learning based optimized feature selection approaches and classification methods for cervical cancer prediction," SN Applied Sciences, vol. 1, no. 6, pp. 1–16, 2019.
[19] A. Bommert, X. Sun, B. Bischl, J. Rahnenführer, and M. Lang, "Benchmark for filter methods for feature selection in high-dimensional classification data," Computational Statistics & Data Analysis, vol. 143, p. 106839, 2020.
[20] Y. Akhiat, M. Chahhou, and A. Zinedine, "Ensemble feature selection algorithm," International Journal of Intelligent Systems and Applications, vol. 11, no. 1, p. 24, 2019.
[21] L. Čehovin and Z. Bosnić, "Empirical evaluation of feature selection methods in classification," Intelligent Data Analysis, vol. 14, no. 3, pp. 265–281, 2010.
[22] Y. Asnaoui, Y. Akhiat, and A. Zinedine, "Feature selection based on attributes clustering," in 2021 Fifth International Conference on Intelligent Computing in Data Sciences (ICDS). IEEE, 2021, pp. 1–5.
[23] Y. Bouchlaghem, Y. Akhiat, and S. Amjad, "Feature selection: A review and comparative study," in E3S Web of Conferences, vol. 351. EDP Sciences, 2022, p. 01046.
[24] A. Destrero, S. Mosci, C. De Mol, A. Verri, and F. Odone, "Feature selection for high-dimensional data," Computational Management Science, vol. 6, pp. 25–40, 2009.
[25] V. Fonti and E. Belitser, "Feature selection using lasso," VU Amsterdam Research Paper in Business Analytics, vol. 30, pp. 1–25, 2017.
[26] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," Journal of Machine Learning Research, vol. 3, no. Mar, pp. 1157–1182, 2003.
[27] R. Zebari, A. Abdulazeez, D. Zeebaree, D. Zebari, and J. Saeed, "A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction," Journal of Applied Science and Technology Trends, vol. 1, no. 2, pp. 56–70, 2020.
[28] J. Miao and L. Niu, "A survey on feature selection," Procedia Computer Science, vol. 91, pp. 919–926, 2016.
[29] L. C. Molina, L. Belanche, and À. Nebot, "Feature selection algorithms: A survey and experimental evaluation," in 2002 IEEE International Conference on Data Mining, 2002. Proceedings. IEEE, 2002, pp. 306–313.
[30] R. Caruana, A. Niculescu-Mizil, G. Crew, and A. Ksikes, "Ensemble selection from libraries of models," in Proceedings of the Twenty-First International Conference on Machine Learning, 2004, p. 18.
[31] A. Yassine, C. Mohamed, and A. Zinedine, "Feature selection based on pairwise evaluation," in 2017 Intelligent Systems and Computer Vision (ISCV). IEEE, 2017, pp. 1–6.
[32] B. Gregorutti, B. Michel, and P. Saint-Pierre, "Correlation and variable importance in random forests," Statistics and Computing, vol. 27, no. 3, pp. 659–678, 2017.
[33] J. Kacprzyk, J. W. Owsinski, and D. A. Viattchenin, "A new heuristic possibilistic clustering algorithm for feature selection," Journal of Automation, Mobile Robotics and Intelligent Systems, vol. 8, 2014.
[34] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
[35] H. Han, X. Guo, and H. Yu, "Variable selection using mean decrease accuracy and mean decrease gini based on random forest," in 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS). IEEE, 2016, pp. 219–224.
[36] R. Sutton and A. Barto, Reinforcement Learning: An Introduction, 2017; UCL Computer Science Department, Reinforcement Learning Lectures, 2018.
[37] Y. Fenjiro and H. Benbrahim, "Deep reinforcement learning overview of the state of the art," Journal of Automation, Mobile Robotics and Intelligent Systems, pp. 20–39, 2018.
[38] S. M. H. Fard, A. Hamzeh, and S. Hashemi, "Using reinforcement learning to find an optimal set of features," Computers & Mathematics with Applications, vol. 66, no. 10, pp. 1892–1904, 2013.
[39] M. Lichman, "UCI machine learning repository [http://archive.ics.uci.edu/ml]," Irvine, CA: University of California, School of Information and Computer Science, 2013.
[40] F. Provost, T. Fawcett, and R. Kohavi, "The case against accuracy estimation for comparing induction algorithms," in Proceedings of the Fifteenth International Conference on Machine Learning, 1998.
