Classification in Machine Learning
By: Loga Aswin
Classification is the process of assigning data or objects to predetermined
categories, or classes, based on their features or attributes.
In machine learning, classification is a type of supervised learning. An
algorithm is trained on a labeled dataset and then used to predict the class
of new, previously unseen data.
➢ For instance, a classification model could be trained on a dataset of images
labeled as either dogs or cats. It can then predict the class of new,
unlabeled images of dogs or cats by analyzing features like color, texture,
and shape.
[Figure: items within a class are similar to one another and dissimilar to
items in the other class]
Types of Classification:
Binary Classification:
• Binary classification aims to categorize input data into one of
two predefined classes or categories.
• For instance, we might assess a person's health data to
determine whether or not they have a specific disease.
Examples:
1. Spam Email Detection:
Binary Outcome: Spam or Not Spam
Given an email, classify it as either spam or not spam based on its
content and characteristics.
2. Medical Diagnosis - Disease Detection:
Binary Outcome: Presence of a Disease or No Disease
Evaluate a patient's medical data to determine whether they have a
specific disease (e.g., diabetes, cancer).
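The spam example can be sketched in a few lines. This is only an illustration of the binary decision itself: the feature names, the weights, the bias, and the 0.5 threshold are all made-up assumptions, where a real model would learn its weights from labeled emails.

```python
import math

# Hypothetical weights for two made-up features; a trained model would
# learn these values from labeled data instead of having them hard-coded.
WEIGHTS = {"suspicious_words": 0.9, "links": 0.6}
BIAS = -3.0


def spam_probability(features):
    """Turn a weighted sum of features into a probability via the sigmoid."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def classify_email(features, threshold=0.5):
    """Return exactly one of two labels, depending on the threshold."""
    return "spam" if spam_probability(features) >= threshold else "not spam"


promo = {"suspicious_words": 5, "links": 3}
note = {"suspicious_words": 0, "links": 1}
print(classify_email(promo))
print(classify_email(note))
```

Raising or lowering the threshold trades missed spam against legitimate mail wrongly flagged, which is why it is exposed as a parameter here.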
Multiclass Classification:
• Multiclass classification aims to categorize input data into
one of three or more classes or categories.
• As an example of multiclass classification, we can use data
related to various species of flowers to identify which species a
given observation belongs to.
Examples:
1. Language Identification:
Multiple Classes: English, French, Spanish, etc.
Determine the language of a given text, classifying it into one of
several language categories.
2. Iris Flower Species Classification:
Multiple Classes: Setosa, Versicolor, Virginica
Using features like sepal length and width, and petal length and width,
classify iris flowers into one of three species.
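As a rough sketch of the iris example, the code below uses a nearest-centroid rule: each species is summarized by the average of its training points, and a new flower gets the label of the closest average. The measurements are made-up toy values (petal length and width in cm), not rows from the real Iris dataset, and nearest-centroid is just one simple choice of multiclass classifier.

```python
import math

# Made-up (petal length, petal width) values in cm, for illustration only.
# These are NOT measurements taken from the real Iris dataset.
TRAINING_DATA = {
    "setosa": [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2)],
    "versicolor": [(4.2, 1.3), (4.5, 1.5), (4.0, 1.2)],
    "virginica": [(5.8, 2.2), (6.0, 2.4), (5.6, 2.0)],
}


def centroid(points):
    """Average a list of equal-length tuples coordinate by coordinate."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


# "Training" here is just computing one centroid per class.
CENTROIDS = {label: centroid(points) for label, points in TRAINING_DATA.items()}


def predict_species(sample):
    """Pick the class whose centroid is closest to the sample."""
    return min(CENTROIDS, key=lambda label: math.dist(sample, CENTROIDS[label]))


for flower in [(1.6, 0.25), (4.4, 1.4), (5.9, 2.1)]:
    print(flower, "->", predict_species(flower))
```

The only difference from the binary case is that the prediction is chosen from three candidate labels instead of two.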
Types of Classification Algorithms:
1. Linear Classifiers:
Linear classifiers establish a linear decision boundary between classes:
a straight line in two dimensions, or a hyperplane in higher dimensions.
Some examples of linear classification models include:
➢ Logistic Regression
➢ Support Vector Machines with a linear kernel
➢ Single-layer Perceptron
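To make the idea of a linear decision boundary concrete, here is a minimal single-layer perceptron written from scratch. The toy data (the logical AND function), the learning rate, and the epoch count are assumptions chosen so the example stays small. The learned weights and bias define the line w1*x1 + w2*x2 + b = 0 that separates the two classes.

```python
def predict(weights, bias, x):
    """Classify x as 1 if it falls on the positive side of the line."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Classic perceptron rule: nudge the boundary after every mistake."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Toy, linearly separable data: the logical AND of two inputs.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]

weights, bias = train_perceptron(X, y)
print("weights:", weights, "bias:", bias)
print("predictions:", [predict(weights, bias, x) for x in X])
```

Because AND is linearly separable, the perceptron rule is guaranteed to find a separating line. On data that no straight line can separate, the same loop would never settle.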
2. Non-linear Classifiers:
Non-linear classifiers define a curved or non-linear decision boundary
between classes. Some non-linear classification models include:
➢ K-Nearest Neighbors
➢ Kernel Support Vector Machines
➢ Naive Bayes
➢ Decision Tree Classification
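A small k-nearest-neighbors sketch shows why a non-linear boundary can be needed. The toy points below follow an XOR pattern (opposite corners share a class), which no single straight line can separate. The data and the choice of k = 3 are assumptions for illustration only.

```python
import math
from collections import Counter

# Toy XOR-style data: class 0 sits near (0, 0) and (1, 1),
# class 1 sits near (0, 1) and (1, 0). Values are made up.
TRAIN = [
    ((0.0, 0.0), 0), ((0.1, 0.1), 0), ((0.0, 0.2), 0),
    ((1.0, 1.0), 0), ((0.9, 0.9), 0), ((1.0, 0.8), 0),
    ((0.0, 1.0), 1), ((0.1, 0.9), 1), ((0.2, 1.0), 1),
    ((1.0, 0.0), 1), ((0.9, 0.1), 1), ((0.8, 0.0), 1),
]


def knn_predict(query, train=TRAIN, k=3):
    """Label the query by majority vote among its k closest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


for point in [(0.05, 0.05), (0.95, 0.05), (0.95, 0.95), (0.05, 0.95)]:
    print(point, "->", knn_predict(point))
```

k-NN never fits an explicit boundary. The boundary emerges from the local neighborhoods, so it can bend around each cluster.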
3. Ensemble Learning Classifiers:
Ensemble classifiers combine multiple base models to make predictions.
They often outperform individual models, offering better accuracy and
robustness.
Some ensemble classifiers include:
➢ Random Forests
➢ AdaBoost
➢ Bagging Classifier
➢ Voting Classifier
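The idea behind a voting classifier can be sketched with plain majority ("hard") voting. The three base classifiers below are hypothetical one-rule models for a fraud-detection setting, and their field names and cutoffs are invented for this example. In practice the base models would themselves be trained.

```python
from collections import Counter


# Three hypothetical base classifiers, each a single hand-written rule.
def by_amount(tx):
    return "fraud" if tx["amount"] > 1000 else "legit"


def by_hour(tx):
    return "fraud" if tx["hour"] < 6 else "legit"


def by_country(tx):
    return "fraud" if tx["foreign"] else "legit"


def majority_vote(classifiers, sample):
    """Hard voting: every base model casts one vote, the top label wins."""
    votes = Counter(clf(sample) for clf in classifiers)
    return votes.most_common(1)[0][0]


ensemble = [by_amount, by_hour, by_country]
night_transfer = {"amount": 2500, "hour": 3, "foreign": False}
small_purchase = {"amount": 40, "hour": 14, "foreign": True}
print(majority_vote(ensemble, night_transfer))
print(majority_vote(ensemble, small_purchase))
```

An odd number of voters over two labels avoids ties. Each rule is wrong on some inputs, but a single dissenting vote is outvoted by the other two.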
How does classification work?
➢ Data Collection:
Gather a dataset that contains labeled examples, where each
example has a set of features and a known class or category.
➢ Data Preprocessing:
Clean and preprocess the data, which may involve handling
missing values, scaling features, and encoding categorical
variables.
➢ Feature Selection/Extraction:
Choose relevant features that are likely to have a strong impact on
the classification task. Feature extraction can also be used to
create new features.
➢ Data Splitting:
Divide the dataset into training, validation, and testing sets. The
training set is used to fit the model, the validation set to tune
hyperparameters, and the testing set to evaluate final performance.
➢ Model Selection:
Choose an appropriate classification algorithm based on the
nature of the problem and the characteristics of the dataset.
Common choices include logistic regression, decision trees,
support vector machines, etc.
➢ Model Training:
Use the training data to fit the chosen classification model. The
model learns to make predictions based on the input features and
class labels.
➢ Hyperparameter Tuning:
Optimize the model's hyperparameters using the validation set.
Techniques like grid search or random search can be employed to
find the best hyperparameters.
➢ Model Evaluation:
Assess the model's performance on the testing set using
evaluation metrics such as accuracy, precision, recall, F1 score, or
area under the ROC curve (AUC).
➢ Model Deployment:
If the model meets the performance requirements, deploy it for
real-world use. This could involve integrating it into an
application or system.
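The core of the workflow above (splitting, training, tuning, and evaluation) can be sketched end to end with the standard library alone. Everything specific here is an assumption made for illustration: the synthetic two-blob dataset, the 60/20/20 split, k-nearest neighbors as the chosen model, and the small list of candidate k values standing in for a grid search. Preprocessing and deployment are left out.

```python
import math
import random
from collections import Counter


def make_toy_dataset(n_per_class=60, seed=0):
    """Data collection stand-in: two synthetic Gaussian blobs with labels."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data


def split(data, train_frac=0.6, val_frac=0.2):
    """Data splitting: training, validation, and testing sets."""
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]


def knn_predict(train, query, k):
    """Model: majority vote among the k nearest training points."""
    neighbors = sorted(train, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def accuracy(train, samples, k):
    return sum(knn_predict(train, x, k) == y for x, y in samples) / len(samples)


def precision_recall_f1(train, samples, k, positive=1):
    """Model evaluation: metrics computed for the class treated as positive."""
    tp = fp = fn = 0
    for x, y in samples:
        pred = knn_predict(train, x, k)
        if pred == positive and y == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif y == positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


data = make_toy_dataset()
train, val, test = split(data)

# Hyperparameter tuning: keep the k that scores best on the validation set.
best_k = max((1, 3, 5, 7), key=lambda k: accuracy(train, val, k))

# Final evaluation uses the testing set, which played no part in tuning.
test_accuracy = accuracy(train, test, best_k)
precision, recall, f1 = precision_recall_f1(train, test, best_k)
print(f"best k={best_k} accuracy={test_accuracy:.2f} "
      f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Keeping the testing set out of the tuning step is the point of the three-way split: the reported metrics then estimate performance on data the model has never influenced.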
Applications of Classification Algorithms:
➢ Credit risk assessment
➢ Medical diagnosis
➢ Image classification
➢ Sentiment analysis
➢ Fraud detection
➢ Recommendation systems