
Introduction to Machine Learning

Machine Learning (ML) is a subset of artificial intelligence (AI) that enables systems to learn from data and make
predictions or decisions without being explicitly programmed. It involves feeding data into algorithms to identify
patterns and make predictions on new data. Machine learning is used in various applications, including image and
speech recognition, natural language processing, and recommender systems.
Types of Machine Learning:
1. Supervised Learning – Models learn from labeled data (techniques: classification, regression). Labeled data is data
that has already been categorized with known outputs, so the model can learn which label each kind of input belongs to.
2. Unsupervised Learning – Models find hidden patterns in unlabeled data (techniques: clustering, dimensionality
reduction). Unlabeled data is raw data that has not been categorized or assigned specific labels. It has no
predefined outputs, so machine learning models must find patterns and relationships on their own.
3. Reinforcement Learning – Models learn by interacting with an external environment and receiving feedback (e.g.,
game playing, robotics).
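The supervised case can be sketched in a few lines. Below is a minimal nearest-centroid classifier in plain Python; the labels and feature values are invented for illustration, and a real project would use a library such as scikit-learn.

```python
# Minimal sketch of supervised learning: a nearest-centroid classifier.
# The "cat"/"dog" labels and feature values are made-up illustrative data.

def train(samples):
    """Compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, features))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Labeled training data: (features, label)
labeled = [([1.0, 1.2], "cat"), ([0.9, 1.0], "cat"),
           ([3.0, 3.1], "dog"), ([3.2, 2.9], "dog")]
model = train(labeled)
print(predict(model, [1.1, 1.1]))  # a point near the "cat" cluster
```

Because the training data carries known labels, the model can map a new, unseen point to the category it most resembles.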

Benefits of Machine Learning


 Saves Time and Effort: Machine learning (ML) can handle repetitive tasks, allowing people to focus on more
important work. It also makes processes faster and more efficient.
 Better Decisions: ML can analyze large amounts of data and find patterns that humans might not notice. This helps
in making smarter decisions based on real information.
 Personalized Experience: ML improves user experiences by customizing recommendations, ads, and content based
on individual preferences.
 Smarter Machines and Robots: ML helps robots and machines perform complex tasks more accurately, which is
transforming industries like manufacturing and logistics.

Scope and Limitations of Machine Learning:

Scope (Where Machine Learning is Used)


 Automation in Different Fields: ML is used in healthcare, finance, marketing, and more to automate tasks and
improve efficiency.
 Better Decision-Making: ML helps predict future trends, making it useful for businesses and research.
 Personalized Experience: ML powers recommendation systems like Netflix and Amazon, showing users content they
are likely to enjoy.
 Fraud Detection: ML helps banks and online platforms detect and prevent fraud.
 Smart Assistants: Virtual assistants like Siri and Google Assistant use ML to understand and respond to voice
commands.
Limitations (Challenges of Machine Learning)
 Needs a Lot of Data: ML works best with large amounts of data. Without enough data, it may not learn properly.
 Can Be Biased: If the training data is biased, ML can make unfair or incorrect decisions.
 High Cost and Time-Consuming: Training ML models requires powerful computers and a lot of time.
 Lacks Human Understanding: ML can recognize patterns but does not truly "understand" like a human does.
 Security Risks: Hackers can trick ML models, making them vulnerable to cyberattacks.

Regression in Machine Learning


Regression in machine learning refers to a supervised learning technique that establishes a relationship between
independent and dependent variables. It helps understand how changes in independent variables affect the dependent
variable.
For example, when buying a mobile phone, the price (dependent variable) depends on factors like RAM, storage, and
camera quality (independent variables). Regression helps find how much each factor influences the price.
Regression is used to predict continuous values based on input data.
Types of Regression:
1. Simple Linear Regression – Establishes a straight-line relationship between one independent variable and one
dependent variable.
2. Multiple Linear Regression – Models the relationship between two or more independent variables and a
dependent variable using a single linear equation.
3. Polynomial Regression – Fits a curved (polynomial) relationship between the independent and dependent
variables, useful when data is not linear.

1. The dependent variable (target) is what we are trying to predict, such as the price of a house.
2. The independent variables (features) are the factors that influence this prediction, like the locality, number of rooms,
and house size.

Advantages of Regression
 Simple to understand and explain.
 Computationally inexpensive and fast to train, even on large datasets.
 Can easily handle straight-line (linear) relationships between variables.
Disadvantages of Regression
 Assumes that the relationship between variables is always a straight line.
 Can give incorrect results if two or more independent variables are too similar (multicollinearity).
 Not the best choice for very complex relationships.
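Simple linear regression has a closed-form least-squares solution, sketched below in plain Python. The RAM and price figures are invented for illustration (a perfectly linear trend, which real data would not have).

```python
# Simple linear regression (y = a + b*x) fitted with the least-squares
# closed form. The phone-price numbers below are invented for illustration.

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x  # intercept
    return a, b

# x = RAM in GB (independent variable), y = price (dependent variable)
ram = [2, 4, 6, 8]
price = [100, 180, 260, 340]
a, b = fit_line(ram, price)
print(a, b)        # intercept 20.0, slope 40.0
print(a + b * 12)  # predicted price for 12 GB -> 500.0
```

The slope tells us how much the price changes per extra GB of RAM, which is exactly the "how much each factor influences the price" question from the example above.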

Data Visualization:
Data Visualization is the process of turning complex data or predictions into clear, often interactive, graphs or
charts, making the results easier to understand.
Visualization is often provided alongside model outputs to help clients interpret the results. Common types of
graphs include:
1. Bar Chart – A bar chart is a graph that uses rectangular bars to show data in a visual format, where the length or
height of each bar represents the value of the data.
2. Line Chart – A line chart is a graph that uses a line to represent data points over time, showing trends or changes in
values.
3. Pie Chart – A pie chart is a circular graph where each slice represents a portion of the whole, showing how different
values compare to each other.
4. Scatter Plot – A scatter plot is a graph that displays data using dots, showing the relationship between two variables.
5. Histogram – Shows the distribution of data over a range.
6. Heatmap – Uses colors to represent values in a matrix or table.
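The core idea of a bar chart — bar length encodes value — can be shown even without a plotting library. The sketch below renders a text-only bar chart in plain Python; in practice a library such as matplotlib would draw the real charts listed above. The sales numbers are made up.

```python
# Text-only sketch of a bar chart: each bar's length is proportional to
# its value. Real charts would use a plotting library such as matplotlib;
# this only illustrates that bar length encodes the data value.

def ascii_bar_chart(data, width=20):
    """Render {label: value} as lines of '#' characters."""
    top = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(width * value / top)
        lines.append(f"{label:<8}|{bar} {value}")
    return "\n".join(lines)

sales = {"Mon": 5, "Tue": 9, "Wed": 3}   # made-up values
print(ascii_bar_chart(sales))
```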

Data Preprocessing:
Data Preprocessing is a process used to clean, integrate, and transform data before feeding it into a machine learning
algorithm as training data. As the name suggests, preprocessing means processing the data before the algorithm uses it.

For Example: Imagine you conducted a survey among students about their study habits and satisfaction levels, but the
collected data has issues. Some responses have missing values, duplicate entries, and inconsistent formats (e.g., "5 hrs"
vs. "Five hours"). There are also outliers like unrealistic study hours ("25 hours per day") and irrelevant data such as
"Email ID." Additionally, different rating scales (1-5 vs. 1-10) and typographical errors ("exelent" instead of "excellent")
can affect analysis. These issues need to be fixed through Data Preprocessing before further use.

Data Preprocessing Steps:


1. Cleaning: Converting data according to the requirements of the training dataset. For example, if there are
null/empty values but the training data should not contain them, we must handle them appropriately. Unwanted/
Irrelevant data is considered noise in machine learning. Therefore, we perform denoising (reducing noise) to
improve data quality.
2. Integration: Gathering data from different sources into a single system. Since different databases may use different
schemas and formats, we standardize them into a unified structure with consistent attributes and then combine
everything in one place. This process is known as data integration.
3. Reduction: Reducing the dimensionality of data using techniques such as Principal Component Analysis (PCA).
Additionally, numeric values can be reduced for storage efficiency, and compression techniques can be applied to
save space. However, data quality may slightly decrease in the process. Therefore, we need to compress data in a
way that minimizes quality loss while maintaining accuracy.
4. Transformation: Modifying data slightly to fit within a specified range. This process, called normalization, helps in
faster processing and ensures consistency in data representation.
5. Data Discretization: Dividing continuous data into discrete intervals (bins) for analysis or display.
[Figure: the data preprocessing cycle — Cleaning → Integration → Reduction → Transformation → Discretization]
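The cleaning step from the survey example can be sketched as below. The field names and cleaning rules (fill missing hours with 0, drop entries over 24 hours/day, normalize "Five hours" to 5) are assumptions made for illustration; a real pipeline would typically use pandas.

```python
# Sketch of the cleaning step: drop duplicates, fill missing values,
# normalize inconsistent formats, and remove impossible outliers.
# Field names and rules are assumptions for illustration.

WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

def parse_hours(raw):
    """Normalize '5 hrs' / 'Five hours' / None into a number, or None."""
    if raw is None:
        return None
    token = str(raw).lower().split()[0]
    if token in WORDS:
        return WORDS[token]
    try:
        return float(token)
    except ValueError:
        return None

def clean(rows):
    seen, result = set(), []
    for row in rows:
        key = row["id"]
        if key in seen:            # remove duplicate entries
            continue
        seen.add(key)
        hours = parse_hours(row.get("study_hours"))
        if hours is None:
            hours = 0              # simple missing-value strategy
        if hours > 24:             # drop impossible outliers ("25 hours/day")
            continue
        result.append({"id": key, "study_hours": hours})
    return result

survey = [
    {"id": 1, "study_hours": "5 hrs"},
    {"id": 2, "study_hours": "Five hours"},
    {"id": 2, "study_hours": "Five hours"},   # duplicate entry
    {"id": 3, "study_hours": "25 hours"},     # unrealistic outlier
    {"id": 4, "study_hours": None},           # missing value
]
print(clean(survey))
```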
Data augmentation:
Data augmentation is a method used to enhance a dataset’s diversity by applying transformations to existing data
instead of collecting new samples. These transformations, such as rotation, scaling, flipping, or noise addition,
create modified versions while preserving the original labels. This technique helps machine learning models
generalize better, improving their performance and robustness.

This technique is particularly beneficial in image processing tasks.

Advantages of Data Augmentation:


 Improves Model Accuracy – Helps machine learning models learn better by providing more varied data.
 Reduces Overfitting – Prevents the model from memorizing the training data and helps it perform well on new
data.
 Saves Time & Cost – No need to collect a lot of new data, as existing data is modified to create more samples.
 Works for Different Data Types – Can be used for images, text, and audio to improve learning.

Disadvantages of Data Augmentation:


 Computational Cost – Requires extra processing power to generate and train on augmented data.
 Not Always Useful – Some types of data may not benefit from augmentation, especially if small changes affect
the meaning.
 Risk of Adding Noise – If not done properly, augmentation can create unrealistic data that confuses the model.
 Slower Training – More data means the model takes longer to train.

How Data Augmentation Works for Images


Data augmentation enhances image datasets by applying transformations to create new training examples.
1. Geometric Transformations – Modifies image shape, including rotation, flipping, scaling, translation, and shearing.
2. Color Adjustments – Alters brightness, contrast, saturation, and hue to change image appearance.
3. Kernel Filters – Applies effects like blurring, sharpening, and edge detection.
4. Random Erasing – Hides parts of an image to help models handle missing data.
5. Combining Techniques – Multiple augmentations are applied together for more diverse training data.
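The geometric transformations above can be sketched on a tiny "image" stored as a nested list. Real pipelines use libraries such as torchvision or albumentations; the key point, which this sketch preserves, is that the label attached to the image does not change.

```python
# Sketch of geometric augmentations on a tiny 3x3 grayscale "image"
# represented as a nested list of pixel values.

def hflip(img):
    """Horizontal flip: reverse each row."""
    return [row[::-1] for row in img]

def vflip(img):
    """Vertical flip: reverse the order of the rows."""
    return img[::-1]

def rotate90(img):
    """Rotate 90 degrees clockwise: reverse rows, then transpose."""
    return [list(col) for col in zip(*img[::-1])]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

print(hflip(image))     # [[3, 2, 1], [6, 5, 4], [9, 8, 7]]
print(rotate90(image))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
```

Each transformed copy is a new training example with the same label as the original, which is how augmentation enlarges a dataset without new data collection.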

Statistics in ML:
Statistics is the branch of mathematics that deals with collecting, organizing, analyzing, and understanding data. In
machine learning, it helps to summarize information, make predictions, and draw conclusions. Statistical methods also
measure uncertainty, allowing researchers to make confident, data-based decisions.

Why Do We Use Statistics in ML?


1. Data Understanding: Helps in summarizing and visualizing data.
2. Feature Selection: Identifies important variables.
3. Model Evaluation: Evaluates performance using statistical metrics.
4. Prediction & Decision Making: Supports making predictions and decisions based on data trends.
Statistical methods are essential for:
 Training Models – Providing data-driven insights for learning algorithms.
 Evaluating Performance – Measuring accuracy, variance, and error rates.
 Feature Selection – Identifying the most relevant data points.
 Probability and Uncertainty – Estimating confidence levels and handling noisy data.
 Hypothesis Testing – Validating assumptions and comparing models.

Key Statistical Concepts in ML


1. Mean (Average)
 Definition: The sum of all values divided by the number of values.
 Formula: Mean=(Σx)/n
 Example:
Data = [2, 4, 6, 8]
Mean = (2+4+6+8)/4 = 5
2. Median
 Definition: The middle value when data is sorted.
 Formula: for an odd number of values, the middle value; for an even number, Median = (xₙ/₂ + xₙ/₂₊₁) / 2
 Example:
Data = [3, 5, 7] → Median = 5
Data = [3, 5, 7, 9] → Median = (5+7)/2 = 6
3. Mode
 Definition: The most frequently occurring value in a dataset.
 Formula: Mode=Value that appears most frequently
 Example:
Data = [1, 2, 2, 3] → Mode = 2
4. Variance
 Definition: Measures how far data points are from the mean.
 Formula: σ² = Σ(x − μ)² / n
 Example:
Data = [2, 4, 4, 4, 5, 5, 7, 9]
Mean = 5, Variance = 4
5. Standard Deviation
 Definition: The square root of the variance. It shows the spread of data.
 Formula: SD = √Variance
 Example:
If Variance = 4 → SD = √4 = 2
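All five statistics above are available in Python's built-in statistics module, shown here on the same example data as in the text. Note that `pvariance`/`pstdev` divide by n (population), matching the formula above; `variance`/`stdev` divide by n − 1 (sample).

```python
# The five summary statistics above, computed with Python's built-in
# statistics module on the same example data used in the text.
import statistics

print(statistics.mean([2, 4, 6, 8]))    # 5
print(statistics.median([3, 5, 7]))     # 5
print(statistics.median([3, 5, 7, 9]))  # 6.0
print(statistics.mode([1, 2, 2, 3]))    # 2

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.pvariance(data))       # population variance -> 4
print(statistics.pstdev(data))          # population std. dev. -> 2.0
```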

Types of Statistics
Statistics is divided into two main types:
1. Descriptive Statistics
o Definition: Descriptive statistics help in organizing, summarizing, and presenting data in a meaningful way.
It makes large amounts of data easier to understand using numbers, tables, and graphs.
o Example: If a teacher calculates the average marks of a class from a test, it helps summarize the
performance of all students.
2. Inferential Statistics
o Definition: Inferential statistics allow us to analyze a small sample of data and make predictions or
conclusions about a larger group (population).
o Example: A survey of 100 people is conducted to predict the opinion of an entire city on a new product.
Convex Optimization:
Convex optimization is a mathematical technique used to minimize a cost or loss function by reducing the difference
between actual values and predicted values. In this context, "optimization" refers to the process of finding the best
possible solution, while "convex" means the function being optimized is bowl-shaped, so any local minimum is also the
global minimum and there is a single best solution. This technique is widely used in machine learning and mathematical
modeling to improve accuracy and efficiency.

Probability:
Machine Learning (ML) is a subset of Artificial Intelligence (AI) that focuses on making predictions or decisions based on data. Since these
predictions often involve uncertainty, probability plays a key role in ML. It helps in modeling uncertainty and making informed guesses about
outcomes, making probability a fundamental concept in machine learning.

🔢 What is Probability? (Simple Definition)


Probability is a way to measure how likely something is to happen.
 It’s always between 0 and 1:
o 0 means impossible

o 1 means certain

o 0.5 means 50-50 chance

Example:
 Tossing a fair coin:
o Probability of heads = 0.5

o Probability of tails = 0.5

🧠 Why Do We Need Probability in Machine Learning?


Machine Learning is all about learning patterns from data and making predictions — but:
 Real-world data is often uncertain and imperfect.
 Models need to guess or estimate outcomes, not just give fixed answers.
✅ So, probability helps in ML:
1. To handle uncertainty – e.g., is this message spam or not?
2. To make better predictions – e.g., what’s the chance it will rain?
3. To learn from data – how likely is this pattern or behavior?
4. To choose between different outcomes – which answer is more probable?

📚 Types of Probability in ML
1. Prior Probability (Before data)
 This is the original belief before seeing any data.
Example:
If you know that 30% of all emails are spam, the prior probability of spam is 0.3.
2. Posterior Probability (After data)
 This is the updated belief after you see some data.
Example:
After reading an email with suspicious words, your belief that it’s spam may rise to 0.9.
3. Conditional Probability
 Probability of something given something else happened.
Formula:
P(A | B) = Probability of A if B is true
Example:
P(spam | "Buy now" in subject line)
= Probability that an email is spam given it contains "Buy now".
4. Joint Probability
 The chance of two things happening together.
Example:
P(x = "Buy now", y = spam)
= Probability that the email has "Buy now" and is spam.
5. Marginal Probability
 The overall probability of one event, regardless of others.
Example:
P("Buy now") = The chance that "Buy now" appears in emails, no matter if it’s spam or not.
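The five probability types above can all be computed by counting over a dataset. The tiny "inbox" below is invented data, chosen so that the prior P(spam) matches the 0.3 used in the examples; the conditional probability it produces also satisfies Bayes' rule.

```python
# The spam example made concrete: count over a tiny made-up "inbox" and
# derive marginal, joint, and conditional probabilities from the counts.

emails = [
    # (contains "Buy now", is spam) -- invented data for illustration
    (True,  True), (True,  True), (True,  False),
    (False, True), (False, False), (False, False),
    (False, False), (False, False), (False, False), (False, False),
]
n = len(emails)

p_spam = sum(1 for _, spam in emails if spam) / n                    # prior / marginal
p_buy = sum(1 for buy, _ in emails if buy) / n                       # marginal
p_buy_and_spam = sum(1 for buy, spam in emails if buy and spam) / n  # joint
p_spam_given_buy = p_buy_and_spam / p_buy                            # conditional / posterior

print(p_spam)                      # 0.3 (the prior from the text)
print(p_buy_and_spam)              # 0.2
print(round(p_spam_given_buy, 3))  # 0.667 -- belief rises after seeing "Buy now"
```

Seeing "Buy now" updates the belief that the email is spam from the prior 0.3 to the posterior ≈ 0.667, which is exactly the prior-to-posterior update described above.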

🧩 Key Probability Topics Used in ML (Explained Simply)


1. Bayes’ Theorem
Helps update our beliefs after seeing data.
Formula:

P(A|B) = P(B|A) · P(A) / P(B)


Real use: Naive Bayes classifier in spam detection.
2. Probability Distributions
Show how probabilities are spread out.
 Bernoulli distribution: for yes/no outcomes (like coin toss).
 Normal (Gaussian) distribution: bell-shaped curve, used for continuous data.
 Multinomial distribution: for categories with more than two outcomes.
3. Random Variables
A variable whose value depends on chance.
Example:
X = result of a dice roll (1 to 6).
4. Expectation (Mean)
Average or expected value of a random variable.
Example:
Expected value of rolling a dice:
E(X) = (1+2+3+4+5+6) / 6 = 3.5
5. Variance
How much the values spread out from the average.
6. Entropy
Measures uncertainty or surprise in data.
Used in Decision Trees to split data.
7. Likelihood & MLE (Maximum Likelihood Estimation)
 Likelihood: how likely is the data given a model?
 MLE: finding the best model that makes the data most likely.
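Entropy (item 6 above) is easy to compute directly from its standard formula, H = −Σ p·log₂(p). The sketch below compares a fair coin, a biased coin, and a certain outcome.

```python
# Entropy of a probability distribution: H = -sum(p * log2(p)).
# Higher entropy means more uncertainty (more "surprise").
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # fair coin -> 1.0 bit (maximum uncertainty)
print(entropy([0.9, 0.1]))  # biased coin -> about 0.469 bits
print(entropy([1.0]))       # certain outcome -> 0.0 bits
```

Decision trees use exactly this quantity: a split is good if it reduces the entropy of the class labels in each branch.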

Loss function:
1. Regression Loss Functions
These are used when the output or prediction is a continuous value, such as price, age, or temperature.
🔹 a. Mean Squared Error (MSE)
Definition:
It calculates the average of the squares of the errors between the predicted and actual values. It gives more weight to larger errors.
Formula: MSE = (1/n) Σ (yᵢ − ŷᵢ)²  (sum over i = 1 to n)
 yᵢ = actual value
 ŷᵢ = predicted value
 n = number of data points
Used In:
 Linear regression
 Neural networks (for regression)
 Any model predicting numeric values

🔹 b. Mean Absolute Error (MAE)


Definition:
It calculates the average of the absolute differences between predicted and actual values. It treats all errors equally.

Formula: MAE = (1/n) Σ |yᵢ − ŷᵢ|  (sum over i = 1 to n)
Used In:
 Robust regression models
 When you want to reduce the effect of outliers
 Forecasting models
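Both regression losses follow directly from their formulas. The predictions below are invented; the point of the example is that the single error of 2 dominates MSE (it is squared) but not MAE.

```python
# MSE and MAE implemented directly from the formulas above, on made-up
# predictions. Note how MSE punishes the one large error much more.

def mse(actual, predicted):
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)

actual    = [3.0, 5.0, 2.0, 7.0]
predicted = [2.5, 5.0, 4.0, 7.0]   # one prediction is off by 2

print(mse(actual, predicted))  # (0.25 + 0 + 4 + 0) / 4 = 1.0625
print(mae(actual, predicted))  # (0.5 + 0 + 2 + 0) / 4  = 0.625
```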

🔴 2. Classification Loss Functions


These are used when the output is a class or category, such as "spam" or "not spam", or selecting among multiple labels.
🔹 a. Binary Cross-Entropy Loss
Definition:
It measures the performance of a classification model where the output is a probability between 0 and 1. It compares the predicted probability
with the actual class label.
Formula: Loss = −[y · log(ŷ) + (1 − y) · log(1 − ŷ)]
 y = actual label (0 or 1)
 ŷ = predicted probability (between 0 and 1)
Used In:
 Logistic regression
 Binary classification tasks (e.g., disease detection, spam detection)

🔹 b. Categorical Cross-Entropy Loss


Definition:
It is used when there are more than two classes. It compares the predicted probability distribution with the actual class label (usually one-hot
encoded).

Formula: Loss = −Σ yᵢ · log(ŷᵢ)  (sum over i = 1 to C)

 C = number of classes
 yᵢ = actual class (1 for correct class, 0 otherwise)
 ŷᵢ = predicted probability for class i
Used In:
 Multi-class classification (e.g., digit recognition, sentiment analysis)
 Softmax-based output layers in neural networks
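Both cross-entropy losses can be written in a few lines of plain Python. The predicted probabilities below are invented model outputs; the example shows that a confident correct prediction gives a small loss and a confident wrong one a large loss.

```python
# Binary and categorical cross-entropy from the formulas above, in plain
# Python. The probabilities are invented model outputs for illustration.
import math

def binary_cross_entropy(y, y_hat):
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def categorical_cross_entropy(y, y_hat):
    # y is one-hot, so only the true class's log-probability contributes
    return -sum(yi * math.log(pi) for yi, pi in zip(y, y_hat))

# Binary: true label is 1 (e.g., spam)
print(round(binary_cross_entropy(1, 0.9), 4))  # 0.1054 -- confident and right
print(round(binary_cross_entropy(1, 0.1), 4))  # 2.3026 -- confident and wrong

# Categorical: 3 classes, true class is the second one (one-hot [0, 1, 0])
print(round(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]), 4))  # 0.3567
```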

Gradient descent:
Gradient Descent is an algorithm used to find the best solution to a problem by making small adjustments in the right direction. It's like trying to
find the lowest point in a hilly area by walking down the slope, step by step, until you reach the bottom. It is used for tasks like training neural
networks, fitting regression lines, and minimizing cost functions in models.
For example:
Imagine you're standing on top of a hill, and your goal is to reach the lowest point in the valley below. But you can’t see the whole valley, only
the ground right around you.
Here’s what you do:
 Start at the Top: You begin your journey from the top of the hill. This is like starting with a random guess for solving a problem.
 Feel the Slope: You check which direction the ground goes downhill. This is like finding the steepest direction using math — called the
“gradient.”
 Take a Step Down: You take a small step in the downhill direction. If the slope is steep, you take a bigger step. If it’s flat, you take a smaller
step.
 Repeat: You keep feeling the slope and walking downhill again and again. Slowly, you reach the lowest point in the valley.
Main Idea:
Just like you walk down to the bottom of the hill step by step, Gradient Descent helps a machine learning model improve step by step — by
reducing the error in its predictions.
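The hill-walking steps above can be sketched on a one-dimensional function. Below, gradient descent minimizes f(x) = (x − 3)², whose minimum is at x = 3; the learning rate and step count are assumptions chosen for illustration.

```python
# Gradient descent on f(x) = (x - 3)^2, whose minimum is at x = 3.
# Each step moves against the gradient f'(x) = 2(x - 3).

def gradient_descent(start, learning_rate=0.1, steps=100):
    x = start
    for _ in range(steps):
        grad = 2 * (x - 3)            # "feel the slope" at the current point
        x = x - learning_rate * grad  # take a step downhill
    return x

x_min = gradient_descent(start=10.0)
print(round(x_min, 4))  # 3.0 -- converges to the valley floor
```

Note that the step is proportional to the gradient: steep slope, bigger step; nearly flat slope, smaller step, just as in the hill analogy.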

Advantages of Gradient Descent


 Scalability: Works with large datasets and models.
 Widely used: Backbone of training deep neural networks.
 Adaptability: Easily enhanced with techniques like momentum, Adam, RMSprop.
 Works well with differentiable functions: Can find minima efficiently if the function is smooth.

Disadvantages of Gradient Descent


 Local Minima: May get stuck in local minima (especially in non-convex functions).
 Choice of Learning Rate:
o Too small → slow convergence

o Too large → may overshoot and never converge

 Requires multiple iterations: Can be time-consuming


 Sensitive to feature scaling: Features must be normalized for faster and better convergence.

Unstable Gradient Problem


Also known as the vanishing or exploding gradient problem, this happens primarily in deep neural networks:
Vanishing Gradients
 In very deep networks, gradients become very small as they backpropagate through layers.
 The weights stop updating effectively.
 Learning slows down or stops.
Exploding Gradients
 Gradients grow too large.
 Causes very large weight updates and may lead to numerical instability (NaN errors).
Solutions:
 Use activation functions like ReLU instead of Sigmoid/Tanh.
 Apply gradient clipping to prevent exploding gradients.
 Use batch normalization to stabilize training.
 Use residual connections (like in ResNet) for deeper networks.
