1.2 ML Terminology and Activation Functions

The document provides an overview of machine learning, detailing its basic components such as data, features, labels, models, algorithms, and key terminology. It categorizes data into structured, unstructured, and semi-structured types, and explains various machine learning paradigms including supervised, unsupervised, and reinforcement learning. Additionally, it discusses activation functions used in neural networks, their purposes, and the importance of selecting appropriate functions for different layers.


Machine Learning Basics: Components & Key Terminology

Basic Components of Machine Learning:


1. Data:
Raw information used to train and test ML models. Machine Learning models rely heavily on the quality, format, and type of data fed into them. Data can come from various sources such as sensors, user inputs, transaction logs, social media, images, audio, and more. It forms the foundation of all ML tasks such as classification, prediction, detection, and generation.
Types: structured (tables), unstructured (images, text), semi-structured (JSON, XML).

📃 Types of Data in Machine Learning

1. Structured Data
o Data organized into rows and columns, stored in databases or spreadsheets.
o It follows a fixed schema, making it easy to query and analyze.
o Examples: customer databases, sales transactions, sensor data, student grades, Excel sheets.

2. Unstructured Data
o Data that does not follow a predefined data model or structure.
o It is harder to analyze and process directly but contains valuable insights when
processed with the right tools.
o Examples: Emails, social media posts, audio files, video recordings, images, and
text documents.
3. Semi-Structured Data
o Data that does not reside in a relational database but still has some structure
through tags or markers.
o It combines elements of both structured and unstructured data.
o Examples: JSON files, XML documents, HTML pages, log files.
4. Quantitative (Numerical) Data
o Expressed in numbers; measurable.
o Types:
- Discrete: countable (e.g., number of clicks).
- Continuous: measurable on a scale (e.g., temperature, price).
5. Qualitative (Categorical) Data
o Descriptive; represents categories.
o Types:
- Nominal: no inherent order (e.g., gender, colors).
- Ordinal: ordered (e.g., education level, rating scale).
6. Time Series Data
o Data collected at different time points.
o Example: Stock prices, weather logs.
7. Text Data
o Unstructured data in natural language.
o Example: Tweets, reviews, news articles.
8. Image & Video Data
o Pixels forming visual content.
o Used in computer vision tasks like classification, detection, and segmentation.
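
Each of these types calls for different loading and preprocessing steps. Below is a minimal Python sketch contrasting structured, semi-structured, and unstructured data; the file names (customers.csv, events.json, review.txt) are hypothetical placeholders.

```python
import json

import pandas as pd

# Structured: a fixed schema (rows and columns) makes querying trivial.
customers = pd.read_csv("customers.csv")       # e.g., columns: id, name, age, city
adults = customers[customers["age"] >= 18]

# Semi-structured: no rigid schema, but keys/tags give partial structure.
with open("events.json") as f:
    events = json.load(f)                      # e.g., a list of dicts with varying keys
user_ids = [e.get("user_id") for e in events]  # some keys may be missing

# Unstructured: no inherent structure; needs preprocessing before modeling.
with open("review.txt") as f:
    text = f.read()
tokens = text.lower().split()                  # crude tokenization as a first step
```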

2. Features
Individual measurable properties or characteristics of the data.
Example: In house price prediction, features could include the number of rooms, location, and area.

3. Labels (Target Variables)


The output or result we want to predict (in supervised learning).
Example: House price.
4. Model
A mathematical representation that maps input features to the desired
output.

5. Algorithm
The procedure or method used to train a model.
Example: Linear Regression, Decision Trees, ANN, CNN.

6. Training
The process of feeding data into an algorithm to build the model.

7. Testing/Validation
Assessing the performance of a trained model using unseen data.

8. Prediction
Using a trained model to estimate outputs for new data.

9. Loss/Cost Function
A method to measure the difference between predicted and actual
values.

10. Optimization
Adjusting the model parameters to minimize the loss function (e.g.,
using Gradient Descent).
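
The components above fit together in a single workflow. Here is a minimal sketch using scikit-learn, with synthetic house-price data standing in for a real dataset (the feature names and numbers are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(200, 2))                    # features: rooms, area
y = 50 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 5, 200)   # label: house price

# Testing/validation requires data the model never saw during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LinearRegression()      # model; the fitting algorithm is least squares
model.fit(X_train, y_train)     # training: optimization minimizes the loss

y_pred = model.predict(X_test)                       # prediction on unseen data
print("MSE:", mean_squared_error(y_test, y_pred))    # loss/cost measurement
```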
🔑 Key Terminology in Machine Learning

Term | Definition
Supervised Learning | Learning with labeled data (e.g., classification, regression).
Unsupervised Learning | Learning patterns from unlabeled data (e.g., clustering, dimensionality reduction).
Reinforcement Learning | Learning via rewards and penalties from interactions with an environment.
Overfitting | Model performs well on training data but poorly on unseen (test) data.
Underfitting | Model is too simple to capture the patterns in the data.
Generalization | Model’s ability to perform well on new, unseen data.
Cross-Validation | A method to evaluate model performance by splitting data into multiple subsets.
Hyperparameters | Settings that define the model structure or how it’s trained (e.g., learning rate, number of layers).
Epoch | One complete pass through the entire training dataset.
Bias | Error due to overly simplistic assumptions in the model.
Variance | Error due to model sensitivity to small fluctuations in the training set.
Confusion Matrix | Table used to evaluate the performance of a classification algorithm.
Accuracy, Precision, Recall, F1 Score | Metrics used to evaluate model performance.
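
As a quick illustration of two of these terms, the sketch below runs cross-validation and builds a confusion matrix with scikit-learn (the iris dataset is used purely as a convenient example):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# Cross-validation: evaluate on 5 different train/test splits.
scores = cross_val_score(clf, X, y, cv=5)
print("CV accuracy per fold:", scores, "mean:", scores.mean())

# Confusion matrix: per-class breakdown of predictions on one held-out split.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf.fit(X_tr, y_tr)
print(confusion_matrix(y_te, clf.predict(X_te)))
```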

Types of Activation Functions in Machine Learning


1. Linear Activation Function
o Formula: f(x) = x
o Range: (−∞, ∞)
o Use Case: Regression problems (not ideal for hidden layers).
o Pros: Simple to implement.
o Cons: No non-linearity; cannot model complex patterns.
2. Sigmoid Function
o Formula: f(x) = 1 / (1 + e^(−x))
o Range: (0, 1)
o Use Case: Output layer for binary classification.
o Pros: Smooth gradient; outputs interpretable as probabilities.
o Cons: Vanishing gradient, slow convergence, not zero-centered.
3. Tanh (Hyperbolic Tangent)
o Formula: f(x) = tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
o Range: (−1, 1)
o Use Case: Hidden layers, for better-centered output.
o Pros: Zero-centered; steeper gradient than sigmoid.
o Cons: Still suffers from vanishing gradients.
4. ReLU (Rectified Linear Unit)
o Formula: f(x) = max(0, x)
o Range: [0, ∞)
o Use Case: The most common choice for hidden layers of deep neural networks.
o Pros: Efficient; reduces likelihood of vanishing gradient.
o Cons: Dying ReLU problem (neurons can become inactive).
5. Leaky ReLU
o Formula: f(x) = x if x > 0, else αx (e.g., α = 0.01)
o Range: (−∞, ∞)
o Use Case: Hidden layers, to avoid the dying ReLU issue.
o Pros: Allows a small gradient when x < 0.
o Cons: Outputs not zero-centered.
6. Parametric ReLU (PReLU)
o Extension of Leaky ReLU where α is a learned parameter.
o Use Case: Adaptive activation control in hidden layers.
o Pros: Learns the best slope for negative values.
o Cons: Adds complexity to the model.
7. ELU (Exponential Linear Unit)
o Formula: f(x) = x if x ≥ 0, else α(e^x − 1)
o Range: (−α, ∞)
o Use Case: Advanced neural networks needing negative outputs.
o Pros: Helps maintain zero mean; smooth activation.
o Cons: More computationally intensive.
8. Softmax Function
o Formula: softmax(x_i) = e^(x_i) / Σ_j e^(x_j)
o Range: (0, 1) for each output; outputs sum to 1.
o Use Case: Output layer in multi-class classification.
o Pros: Outputs a probability distribution across classes.
o Cons: Only used in the output layer.
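
The formulas above translate directly into code. Here is a minimal NumPy sketch of each activation (for illustration only; deep learning frameworks ship optimized versions):

```python
import numpy as np

def linear(x):
    return x

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))            # [0. 0. 2.]
print(np.tanh(x))         # tanh is built into NumPy
print(softmax(x).sum())   # 1.0: softmax outputs form a probability distribution
```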

Summary Table

Activation | Range | Use Case | Pros | Cons
Linear | (−∞, ∞) | Regression | Simple | No non-linearity
Sigmoid | (0, 1) | Binary classification | Probabilistic | Vanishing gradient
Tanh | (−1, 1) | Hidden layers | Zero-centered | Vanishing gradient
ReLU | [0, ∞) | Hidden layers | Fast, simple | Dying neuron problem
Leaky ReLU | (−∞, ∞) | Hidden layers | Solves dying ReLU | Not zero-centered
PReLU | (−∞, ∞) | Hidden layers | Learnable slope | More complex
ELU | (−α, ∞) | Hidden layers | Smooth, avoids dead neurons | Expensive
Softmax | (0, 1) | Output layer (multi-class) | Probabilities | Used only in output layer
What is an activation function? What are the different types of
activation functions? Discuss their pros and cons
Mathematical model of a Neuron
[Figure: Zooming into a neuron – how the activation function is applied. Source: V7 Labs]

An activation function is a mathematical function applied to the output of a neuron (or node) in a neural network. When a neural network receives input data and processes it through its layers, each neuron computes a weighted sum of the inputs, adds a bias term, and then applies an activation function to produce the neuron’s output. This output is then passed to the next layer of neurons as input. Using a simple neural network example, the role of the activation function is illustrated in the figure below:

[Figure: Role of the Activation Function in a simple Neural Network. Source: AIML.com Research]
Activation functions serve two main purposes:
1. Introduce Non-linearity: Without non-linearity, a neural network
would behave like a linear model, no matter how deep it is. Activation
functions allow the network to learn and represent complex, non-linear
mappings between inputs and outputs.
2. Control Neuron Activation: Activation functions control the firing
behavior of neurons. Depending on the activation function’s output, a
neuron might become activated (output a non-zero value) or remain
inactive (output zero).
Typically, the same activation function is applied to all the hidden layers, while the output layer uses a different one, based on the type of prediction the model aims to make. How to choose an activation function is explained later in this article.
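
In code, a single neuron’s computation is just a dot product, a bias, and an activation. A minimal sketch (the weights, inputs, and choice of sigmoid are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs arriving from the previous layer
w = np.array([0.8, 0.1, -0.4])   # the neuron's learned weights
b = 0.2                          # bias term

z = np.dot(w, x) + b             # weighted sum of inputs plus bias
a = sigmoid(z)                   # activation function produces the output
print(z, a)                      # 'a' is passed to the next layer as input
```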

Different types of Activation Functions:


Activation functions that are popularly used in neural network models are shown in the figure below.

[Figure: Different types of Activation Functions. Source: Stanford CS231 course]
Other activation functions include Parametric ReLU (PReLU), a parametric version of Leaky ReLU; Scaled Exponential Linear Unit (SELU); Swish; Hard Swish; and Gaussian Error Linear Unit (GELU).

Pros and Cons of different Activation Functions

[Figure: Pros and Cons of Activation Functions. Source: AIML.com Research]

Which activation function should you choose?

Activation function for hidden layers:


In practice, activation functions are chosen in the following order of priority:
- ReLU is the top choice: it is simple, fast, has much lower run time, converges better, and does not suffer from vanishing gradient issues.
- Leaky ReLU, PReLU, Maxout, and ELU
- tanh
- Sigmoid (not preferred anymore)

Activation function for output layers:


- Sigmoid for binary classification and multi-label classification
- Softmax for multi-class classification
- Identity/linear function for regression problems
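
Putting these recommendations together, a typical network uses ReLU in the hidden layers and a task-specific output activation. A minimal Keras sketch for a hypothetical 10-class problem (the layer sizes and input shape are illustrative assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(784,)),              # e.g., flattened 28x28 images
    layers.Dense(128, activation="relu"),    # hidden layer: ReLU (top choice)
    layers.Dense(64, activation="relu"),     # hidden layer: ReLU
    layers.Dense(10, activation="softmax"),  # output: multi-class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```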

Some key terms to know w.r.t. activation functions:


Saturation
Related Question: What do you mean by saturation in neural network
training? Discuss the problems associated with saturation
In the context of neural networks, saturation refers to a situation where the
output of an activation function or neuron becomes very close to the
function’s minimum or maximum value (asymptotic ends), and small
changes in the input have little to no effect on the output. Saturation
becomes a critical issue in neural network training as it leads to
the vanishing gradient problem, limiting the model’s information capacity
and its ability to learn complex patterns in the data. When a unit is
saturated, small changes to its incoming weights will hardly impact the unit’s
output. Consequently, a weight optimization training algorithm will face
difficulty in determining whether this weight change positively or negatively
affected the neural network’s performance. The training algorithm would
ultimately reach a standstill, preventing any further learning from taking
place.

Vanishing gradient
Related Question: What is the vanishing and exploding gradient problem,
and how are they typically addressed?
Vanishing gradient refers to a problem that can occur during the training of
deep neural networks, when the gradients of the loss function with respect to
the model’s parameters become extremely small (close to zero) as they are
backpropagated through the layers of the network during training. This leads
to impairment in learning in deep neural networks (DNN). When the
gradients become too small, it means that the model’s weights are not being
updated effectively. As a result, the network’s training may stagnate or
become extremely slow, making it difficult for the network to learn complex
patterns in the data.
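
A small numeric illustration: the sigmoid’s derivative never exceeds 0.25, and backpropagation multiplies one such factor per layer, so the gradient shrinks geometrically with depth (the inputs below are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)   # derivative of sigmoid; maximum 0.25 at z = 0

print(sigmoid_grad(0.0))   # 0.25, the best case
print(sigmoid_grad(6.0))   # ~0.0025: the saturated regime

# One factor per layer: 10 saturated sigmoid layers shrink the gradient
# by roughly (0.045)^10, i.e. to almost nothing.
print(np.prod([sigmoid_grad(3.0)] * 10))
```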
Zero-centered output
Related Question: Why is Zero-centered output preferred for an activation
function?
Optimization algorithms like gradient descent tend to work more efficiently
when gradients are centered around zero. Zero-centered activations ensure
that the mean activation value is around zero, preventing them from
becoming too small (vanishing gradients) or too large (exploding gradients).
This contributes to smoother and faster convergence during training. In
addition, zero-centered output helps with bias mitigation by ensuring that
neurons start with balanced initial activations. Balanced activations can lead
to more stable and unbiased updates to the model’s weights during training.
However, note that while the zero-centered property is helpful, it is not strictly necessary.
Machine Learning – Supervised, Unsupervised and Reinforcement
Machine Learning is a technology that enables computers
to learn from data and make predictions or decisions
without being explicitly programmed. Making these predictions or
decisions involves training machine learning algorithms on large
datasets so that they recognize patterns and improve over time.
Types of Machine Learning:
1. Supervised,
2. Unsupervised, and
3. Reinforcement Learning

Supervised Learning
Supervised learning involves training a model on a labeled dataset, where
each training example is paired with an output label. The model learns to map
the input to the output, making it possible to predict the output for new,
unseen data.
Types of Supervised Learning:
Classification: Used when the output variable is a category.
- Logistic Regression
- Decision Tree
- Random Forest
- Support Vector Machine (SVM)
- K-Nearest Neighbors (KNN)
- Naive Bayes
- Neural Networks
Regression: Used when the output variable is a real or continuous value.
- Linear Regression
- Polynomial Regression
- Ridge Regression
- Lasso Regression
- Elastic Net
- Support Vector Regression (SVR)
- Neural Networks
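
A minimal sketch contrasting the two supervised tasks with scikit-learn (the synthetic data and thresholds are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))

# Classification: the target is a category (here, a binary label).
y_class = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_class)
print(clf.predict(X[:2]))    # predicted classes

# Regression: the target is a continuous value.
y_reg = 2 * X[:, 0] - X[:, 2] + rng.normal(0, 0.1, 100)
reg = LinearRegression().fit(X, y_reg)
print(reg.predict(X[:2]))    # predicted real values
```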

Unsupervised Learning
Unsupervised learning is used when the dataset does not have labeled
responses. The goal is to infer the natural structure present within a
set of data points.
Types of Unsupervised Learning:

Clustering: Grouping a set of objects in such a way that objects in the same
group are more similar to each other than to those in other groups.
- K-Means Clustering
- Hierarchical Clustering
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- Gaussian Mixture Models (GMM)
Dimensionality Reduction: Reducing the number of random variables under
consideration.
- Principal Component Analysis (PCA)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Linear Discriminant Analysis (LDA)
- Autoencoders
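
A minimal sketch of both unsupervised tasks with scikit-learn, using synthetic blobs as stand-in data:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# 300 unlabeled points in 5 dimensions, drawn from 3 hidden groups.
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)  # clustering
X_2d = PCA(n_components=2).fit_transform(X)              # 5 dims -> 2 dims
print(labels[:10], X_2d.shape)
```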

Reinforcement Learning
Reinforcement learning involves training an agent to make a sequence of
decisions by rewarding it for good decisions and penalizing it for bad ones.
The agent learns to achieve its goal by interacting with the environment.
Types of Reinforcement Learning:
Value-Based Methods: Learning the value of actions in states.
- Q-Learning
- SARSA (State-Action-Reward-State-Action)
Policy-Based Methods: Learning a policy that maps states to actions.
- REINFORCE Algorithm
- Proximal Policy Optimization (PPO)
Model-Based Methods: Learning a model of the environment to simulate
future states.
- Dyna-Q
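
At the heart of value-based methods sits the Q-learning update. A minimal tabular sketch (the state/action sizes, reward, and hyperparameters are illustrative):

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # value table: Q[state, action]
alpha, gamma = 0.1, 0.9               # learning rate, discount factor

def q_update(s, a, r, s_next):
    # Move Q(s, a) toward the reward plus the discounted best future value.
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

q_update(s=0, a=1, r=1.0, s_next=2)   # agent earned reward 1.0 moving 0 -> 2
print(Q[0])                           # [0.  0.1]
```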
