Active Learning From Imbalanced Data: A Solution of Online
Weighted Extreme Learning Machine
ABSTRACT:
It is well known that active learning can simultaneously improve the quality of the
classification model and decrease the complexity of training instances. However, several
previous studies have indicated that the performance of active learning is easily disrupted by
an imbalanced data distribution. Some existing imbalanced active learning approaches also
suffer from either low performance or high time consumption. To address these problems,
this paper describes an efficient solution based on the extreme learning machine (ELM)
classification model, called active online-weighted ELM (AOW-ELM). The main
contributions of this paper include: 1) the reasons why active learning can be disrupted by an
imbalanced instance distribution and its influencing factors are discussed in detail; 2) the
hierarchical clustering technique is adopted to select initially labeled instances in order to
avoid the missed cluster effect and cold start phenomenon as much as possible; 3) the
weighted ELM (WELM) is selected as the base classifier to guarantee the impartiality of
instance selection during active learning, and an efficient online update mode of
WELM is theoretically derived; and 4) an early stopping criterion that is similar to but more
flexible than the margin exhaustion criterion is presented. The experimental results on 32
binary-class data sets with different imbalance ratios demonstrate that the proposed AOW-
ELM algorithm is more effective and efficient than several state-of-the-art active learning
algorithms that are specifically designed for the class imbalance scenario.
INTRODUCTION:
Active learning is a popular machine learning paradigm that is frequently deployed in
scenarios where large-scale instances are easily collected, but labeling them is expensive
and/or time-consuming [1]. By adopting active learning, a classification model can iteratively
interact with human experts to select only the most significant instances for labeling and
thereby improve its performance as quickly as possible. Therefore, the merits of active
learning lie in decreasing both the burden on human experts and the complexity of the
training instances while still acquiring a classification model that delivers performance
superior or comparable to that of a model trained with all instances labeled.
EXISTING SYSTEM:
There exist a large number of active learning models, and several different taxonomies can
be used to organize them. Based on how the unlabeled data arrive, active learning can be
divided into pool-based and stream-based models: the former collects and prepares all
unlabeled instances in advance, while the latter can only visit a batch of newly arrived
unlabeled data at each specific time point. In addition, several different significance
measures are available to rank unlabeled instances, including uncertainty, representativeness,
inconsistency, variance, and error.
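To make the uncertainty measure concrete, the following minimal sketch ranks a pool of
unlabeled instances by least confidence; the classifier, function name, and batch size are
illustrative assumptions, not part of any of the surveyed algorithms.

import numpy as np
from sklearn.linear_model import LogisticRegression

def least_confidence_query(model, X_pool, batch_size=10):
    # Uncertainty = 1 - max posterior probability: the instances the model
    # is least confident about are the most informative to label next.
    proba = model.predict_proba(X_pool)
    uncertainty = 1.0 - proba.max(axis=1)
    return np.argsort(uncertainty)[-batch_size:]   # most uncertain indices

# Usage (illustrative): fit any probabilistic classifier on the labeled seed
# set, then query the next batch from the unlabeled pool.
# model = LogisticRegression().fit(X_labeled, y_labeled)
# batch = least_confidence_query(model, X_pool)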
In the past decade, active learning has also been deployed in a variety of real-world
applications, such as video annotation, image retrieval, text classification, remote sensing
image annotation, speech recognition, network intrusion detection, and bioinformatics.
Active learning is undoubtedly effective, but several recent studies have indicated that it
tends to fail when applied to data with a skewed class distribution. That is, similar to
traditional supervised learning, active learning also has to confront the class imbalance
problem. Several previous studies have tried to address this problem with different
techniques. In particular, cost-sensitive SVM (CS-SVM) was employed as the base learner,
empirical costs were assigned according to the prior imbalance ratio, and two traditional
stopping criteria, i.e., the minimum error and the maximum confidence, were adopted to find
an appropriate stopping condition for active learning. This method is robust and effective;
however, it is also time-consuming because of the high time complexity of training an SVM
and the absence of online learning.
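A hedged sketch of the cost assignment described above, using scikit-learn's SVC class
weights as a stand-in for CS-SVM; weighting each class by the prior imbalance ratio is one
common choice and an assumption here, not necessarily the exact scheme of that study.

import numpy as np
from sklearn.svm import SVC

def fit_cs_svm(X, y):
    # Give each class an empirical misclassification cost inversely
    # proportional to its frequency, so errors on the minority class
    # are penalized according to the prior imbalance ratio.
    classes, counts = np.unique(y, return_counts=True)
    cost = {c: counts.max() / n for c, n in zip(classes, counts)}
    return SVC(kernel="rbf", class_weight=cost, probability=True).fit(X, y)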
Tomanek and Hahn proposed two methods based on the inconsistency significance measure:
balanced-batch active learning (AL-BAB) and active learning with boosted disagreement
(AL-BOOD). The former selects n labeled instances that are class-balanced from 5n newly
labeled instances on each round of active learning, while the latter modifies the equation of
voting entropy to make instance selection focus on the minority class. AL-BOOD must
deploy many diverse base learners (ensemble learning) to calculate the voting entropy of the
predicted labels, which inevitably increases the computational burden. Ertekin et al.
indicated that near the boundary of two different classes, the imbalance ratio is generally
much lower than the overall ratio, so adopting active learning can effectively alleviate the
negative effects of an imbalanced data distribution; in other words, they consider active
learning to be a specific sampling strategy. In addition, because they selected SVM as the
base learner, a margin exhaustion criterion was proposed as an early stopping criterion to
determine the stopping condition. Summarizing the existing active learning algorithms
applied in the scenario of imbalanced data distributions, we found that they suffer from
either low classification performance or high time consumption.
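For reference, a minimal sketch of the standard vote entropy that AL-BOOD modifies; the
array shapes and function name are assumptions made for illustration.

import numpy as np

def vote_entropy(votes, n_classes):
    # votes: (n_members, n_instances) array of committee predictions.
    # High entropy means strong disagreement among the ensemble, i.e., an
    # informative instance; AL-BOOD reshapes this measure so that selection
    # leans toward the minority class.
    p = np.stack([(votes == c).mean(axis=0) for c in range(n_classes)])
    with np.errstate(divide="ignore", invalid="ignore"):
        return -np.where(p > 0, p * np.log(p), 0.0).sum(axis=0)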
Disadvantages:
1) Active learning tends to fail when it is applied to data with a skewed class distribution.
2) Like traditional supervised learning, active learning must confront the class imbalance problem.
3) Existing remedies suffer from high time consumption.
PROPOSED SYSTEM:
We propose an effective and efficient algorithm named active online-weighted ELM
(AOW-ELM), which is applied in the pool-based batch-mode active learning scenario with
an uncertainty significance measure and an ELM classifier. We select ELM as the base
classifier in active learning based on three observations: 1) it always has generalization
ability and classification performance better than, or at least comparable to, those of SVM
and MLP; 2) it tremendously reduces training time compared to other classifiers; and 3) it
has an effective strategy for conducting active learning.
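As background, a minimal NumPy sketch of standard ELM training under common
assumptions (sigmoid hidden layer, ridge-regularized closed-form output weights); the
hyperparameters and function names are illustrative.

import numpy as np

def train_elm(X, T, n_hidden=100, C=1.0, seed=0):
    # ELM: the hidden layer (W, b) is random and never trained; only the
    # output weights beta are solved in closed form,
    # beta = (I/C + H^T H)^(-1) H^T T, which is why training is so fast.
    # T is the one-hot target matrix of shape (n_samples, n_classes).
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))          # hidden-layer output matrix
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                                  # class scores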
In AOW-ELM, we first take advantage of the idea of cost-sensitive learning by selecting the
weighted ELM (WELM) as the base learner to address the class imbalance problem in the
active learning procedure. Then, we adopt the AL-ELM algorithm to construct an active
learning framework. Next, we theoretically derive an efficient online learning mode of
WELM and design an effective weight update rule. Finally, benefiting from the idea of the
margin exhaustion criterion, we present a more flexible and effective early stopping
criterion.
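A hedged sketch of the batch WELM solution that the online mode builds on, using the
common weighting scheme in which each instance receives weight 1/n_c for its class size
n_c; the paper's actual online update rule is derived in theory there and is not reproduced
here.

import numpy as np

def train_welm(H, T, y, C=1.0):
    # Weighted ELM: a diagonal weight matrix W (stored here as the vector w)
    # raises the cost of minority-class errors. Closed form:
    # beta = (I/C + H^T W H)^(-1) H^T W T,
    # where H is the hidden-layer output matrix and T the one-hot targets.
    classes, counts = np.unique(y, return_counts=True)
    w = np.empty(len(y))
    for c, n_c in zip(classes, counts):
        w[y == c] = 1.0 / n_c                        # minority gets a larger weight
    L = H.shape[1]
    beta = np.linalg.solve(np.eye(L) / C + H.T @ (H * w[:, None]),
                           H.T @ (T * w[:, None]))
    return beta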
Moreover, we briefly discuss why active learning can be disturbed by a skewed instance
distribution, further investigating the influence of three main distribution factors: the class
imbalance ratio, class overlapping, and small disjuncts. Specifically, we suggest adopting
clustering techniques to select the initially labeled seed set in advance, thereby avoiding the
missed cluster effect and the cold start phenomenon as much as possible.
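A minimal sketch of this seed-selection idea using scikit-learn's hierarchical
(agglomerative) clustering; picking the instance nearest each cluster centroid is one
reasonable representative choice and is an assumption here, not necessarily the paper's
exact procedure.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

def select_seed_set(X_pool, n_seeds=10):
    # Cluster the whole unlabeled pool, then take one representative per
    # cluster so every cluster is touched by the initial seed set -- this is
    # what guards against the missed cluster effect and the cold start.
    labels = AgglomerativeClustering(n_clusters=n_seeds).fit_predict(X_pool)
    seeds = []
    for k in range(n_seeds):
        idx = np.flatnonzero(labels == k)
        centroid = X_pool[idx].mean(axis=0)
        dists = np.linalg.norm(X_pool[idx] - centroid, axis=1)
        seeds.append(idx[np.argmin(dists)])
    return np.array(seeds)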
Experiments are conducted on 32 binary-class imbalanced data sets, and the results
demonstrate that the proposed algorithmic framework is generally more effective and
efficient than several state-of-the-art active learning algorithms that were specifically
designed for the class imbalance scenario. The rest of this work is organized as follows:
(1) introduces some prior knowledge related to similar work; (2) constructs several
representative synthetic data sets with different distributions to analyze why active learning
can be disrupted by a skewed instance distribution; and (3) presents our proposed
algorithmic framework in detail.
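Putting the pieces together, the skeleton below sketches one pool-based round with a
margin-exhaustion-style early stop; the hooks query_most_uncertain and oracle_label, and
the threshold value, are hypothetical placeholders rather than the paper's exact interface.

def run_active_learning(query_most_uncertain, oracle_label,
                        stop_threshold=0.05, max_rounds=200):
    # Each round queries the single most uncertain pool instance; once even
    # that instance falls below the uncertainty threshold, the margin is
    # considered "exhausted" and learning stops early instead of labeling
    # the whole pool.
    for _ in range(max_rounds):
        idx, uncertainty = query_most_uncertain()
        if uncertainty < stop_threshold:
            break                      # flexible early stopping criterion
        oracle_label(idx)              # expert labels it; WELM is updated online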
Advantages:
1) Better performance
2) Saves training time
3) More effective and efficient than earlier approaches.
SYSTEM REQUIREMENTS:
Hardware Requirements:
System : Intel Core i5, 3 GHz
Memory : 16 GB
Hard Disk : 250 GB
GPU : NVIDIA GeForce GTX 1050 Ti, 4 GB
Software Requirements:
Operating System : Windows 7 / 8 or above.
Language : Python 3
Tool : Anaconda
CONCLUSION:
In this paper, we explore the problem of active learning in the class imbalance scenario and
present a solution based on online WELM, named the AOW-ELM algorithm. We find that
the harm caused by a skewed data distribution is related to multiple factors and can be seen
as a combination of these factors. Hierarchical clustering can be effectively used to extract
representative instances into an initial seed set in advance, addressing the potential missed
cluster effect and cold start phenomenon. The comparison between the proposed AOW-ELM
algorithm and some other benchmark algorithms indicates that AOW-ELM is an effective
strategy to address the problem of active learning in a class imbalance scenario. The merits of
the AOW-ELM algorithm can be summarized as follows.
1) It has a robust weight update rule.
2) Its running time is fast and scales linearly with the number of training instances.
3) It has a flexible early stopping criterion.
4) It is appropriate for various types of data sets.
In future work, we will focus on the problem of active learning on multiclass imbalanced
data sets. In addition, active learning strategies that address imbalanced, unlabeled data
streams while handling concept drift will also be investigated.