MACHINE LEARNING 19/5/25
STATISTICS
TYPES -> DESCRIPTIVE (describes the data) & INFERENTIAL (applies operations on sample data to predict outcomes for the population)
DESCRIPTIVE, 4 WAYS -> MEASURES OF DISPERSION (standard deviation, variance), MEASURES OF POSITION (percentiles split the data into 100 equal parts, deciles into 10, quartiles into 4; quantiles split into any number of equal parts), CENTRAL TENDENCY (mean, median, mode), FREQUENCY (sketch below)
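A minimal Python sketch of these four measure groups using pandas; the "price" column and its values are made up for illustration:

import pandas as pd

s = pd.Series([10, 12, 12, 15, 18, 21, 30], name="price")

# Central tendency
print(s.mean(), s.median(), s.mode().tolist())

# Dispersion
print(s.var(), s.std())

# Position: quartiles (4 parts), deciles (10), percentiles (100)
print(s.quantile([0.25, 0.5, 0.75]))   # quartiles Q1, Q2, Q3
print(s.quantile(0.9))                 # 9th decile = 90th percentile

# Frequency
print(s.value_counts())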
INFERENTIAL -> a sample is a subset of the population; it uses the sample dataset to draw conclusions about the whole population.
Data types -> Quantitative, Qualitative
Quantitative -> Discrete (countable values) & Continuous (measurable values, e.g. product price)
Qualitative -> categories
Sampling techniques -> Probabilistic and Non-probabilistic
Probabilistic -> every member has an equal (known) chance of being selected: simple random, stratified, systematic, cluster (sketch below)
Non-probabilistic -> chances of being selected are not equal: convenience, snowball, consecutive, quota
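A hedged sketch of two of the probabilistic techniques in Python with pandas; the DataFrame, its columns, and the step size k are hypothetical:

import pandas as pd

population = pd.DataFrame({"id": range(100), "value": range(100)})

# Simple random sampling: every row has the same chance of selection
random_sample = population.sample(n=10, random_state=42)

# Systematic sampling: pick every k-th row after a fixed start
k = 10
systematic_sample = population.iloc[::k]

print(random_sample.shape, systematic_sample.shape)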
Another way to type data -> structured (e.g. CSV), semi-structured (e.g. JSON), unstructured (e.g. images, free text).
Sampling bias -> occurs when the sample is not representative of the population (some members are more likely to be selected than others).
Skewness & kurtosis -> a distribution can be normal (symmetric), right-skewed, or left-skewed.
Right skew: mode < median < mean (the mean is pulled toward the long right tail)
SciPy = Scientific Python (the scipy library)
Skew levels: highly skewed, moderately skewed, approximately symmetric (normal).
Kurtosis measures how sharp/peaked a distribution is: leptokurtic (sharp peak, heavy tails), mesokurtic (like the normal), platykurtic (flat, light tails).
Empirical rule -> for a normal distribution, ~68% of the data falls within 1 standard deviation of the mean, ~95% within 2, and ~99.7% within 3 (simulation sketch below).
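A quick simulation sketch with numpy (the mean, spread, and sample size are arbitrary) to check the 68/95/99.7 percentages:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=5, size=100_000)
mu, sigma = data.mean(), data.std()

for k in (1, 2, 3):
    within = np.mean((data > mu - k * sigma) & (data < mu + k * sigma))
    print(f"within {k} sd: {within:.3f}")  # ~0.68, ~0.95, ~0.997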
21/05/2025
Variance, Standard Deviation (mostly preferred), covariance (captures direction only, not strength), correlation (relation between two variables; captures both direction and strength): a correlation of 0 between x and y means no linear relation, +1 means highly positively correlated, -1 means highly negatively correlated (sketch below).
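A small numpy sketch contrasting covariance and correlation; the x and y values are made up:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 6], dtype=float)

print(np.cov(x, y)[0, 1])       # covariance: sign gives direction, but scale-dependent
print(np.corrcoef(x, y)[0, 1])  # Pearson correlation: bounded in [-1, 1]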
Right (positive) skew: mean > median > mode; for left (negative) skew the ordering reverses.
CENTRAL LIMIT THEOREM (CLT) => the sampling distribution of the sample mean approaches a normal distribution as the sample size grows, whatever the shape of the population distribution.
Sampling distribution -> the distribution of a sample statistic (e.g. the mean) across many samples; the CLT describes its shape.
Law of Large Numbers -> as the number of samples grows, the sample mean converges to the population mean (simulation sketch below).
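A simulation sketch of both ideas with numpy, assuming a skewed (exponential) population with true mean 2.0; all sizes are arbitrary:

import numpy as np

rng = np.random.default_rng(1)
population = rng.exponential(scale=2.0, size=1_000_000)  # true mean = 2.0

# LLN: the sample mean approaches the population mean as n grows
for n in (10, 1_000, 100_000):
    print(n, rng.choice(population, size=n).mean())

# CLT: means of many samples look ~normal even though the population is skewed
sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]
print(np.mean(sample_means), np.std(sample_means))  # centered near 2.0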
Impute (fill null values) with mean, median, or mode. The median is not sensitive to outliers, so when a column contains outliers, fill the missing values with the median of the column (sketch below).
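A minimal sketch of median imputation; scikit-learn's SimpleImputer is one common way, and the values here (including the planted outlier) are made up:

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0], [2.0], [np.nan], [4.0], [100.0]])  # 100 is an outlier

imputer = SimpleImputer(strategy="median")  # median is robust to the outlier
print(imputer.fit_transform(X).ravel())     # NaN becomes 3.0, not the mean 26.75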
Probability distributions -> Discrete (e.g. Bernoulli, Binomial) and Continuous (e.g. Normal, Uniform).
A uniform distribution implies that all outcomes within a specific range have an equal probability of occurring, while a normal distribution (also known as the Gaussian distribution) describes a probability distribution where most data points cluster around the mean, tapering off symmetrically toward the extremes.
Distribution types:
Discrete -> Bernoulli, Binomial
Continuous -> Normal, Uniform (sketch below)
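A numpy sketch contrasting the two continuous shapes; the ranges and parameters are illustrative only:

import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(low=0, high=10, size=10_000)  # every value in [0, 10) equally likely
g = rng.normal(loc=5, scale=1, size=10_000)   # values cluster around the mean 5

# Uniform: roughly flat histogram; Normal: bell-shaped around the mean
print(np.histogram(u, bins=5, range=(0, 10))[0])
print(np.histogram(g, bins=5, range=(0, 10))[0])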
Transformation -> if data is heavily skewed, many ML models perform poorly on it, so transform the skewed feature toward a normal shape (skewed -> normal transformation). POWER TRANSFORMATION is one common choice (sketch below).
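A sketch of a power transformation using scikit-learn's PowerTransformer (Yeo-Johnson by default; Box-Cox requires strictly positive data); the skewed data is simulated:

import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=(1_000, 1))  # right-skewed feature

pt = PowerTransformer(method="yeo-johnson")
x_t = pt.fit_transform(x)

print(skew(x.ravel()), skew(x_t.ravel()))  # skew drops toward 0 after transforming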
FIVE NUMBER SUMMARY -> Min, Q1, Median, Q3, Max
H/W = Hypothesis testing; Traditional Programming vs Machine Learning.
types of ML -> supervised, unsupervised, reinforcement.
Batch learning, online learning.
Problems we face in ML -> missing values, bias, imbalanced data, choosing the right algorithm, getting labelled data.
ML lifecycle -> business understanding, data collection, model selection, training, evaluation, deployment.
Data drift (tomorrow's H/W)
When to use ML vs DL -> if only limited data points are available, classical ML is preferred; with large amounts of data, DL is used.
Feature Engineering -> better/more accurate model performance.
Techniques: imputation, encoding, scaling, normalization of data, binning (grouping values into bins or buckets).
In feature engineering -> feature construction, feature transformation, feature selection (important), feature extraction.
Feature scaling -> normalizes or standardizes numerical features to a specific range or distribution.
Standardization: transforms features to have a mean of 0 and a standard deviation of 1, AKA z-score.
Normalization: min-max, max-abs, mean normalization, robust scaling (sketch below).
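A minimal sketch of standardization vs min-max normalization with scikit-learn; the values are made up:

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

z = StandardScaler().fit_transform(X)   # mean 0, std 1 (z-scores)
mm = MinMaxScaler().fit_transform(X)    # rescaled into [0, 1]

print(z.ravel())
print(mm.ravel())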
Data -> numerical; categorical -> nominal (no inherent order), ordinal (ordered).
ENCODING -> converts textual/categorical data to numbers, since ML models can't process text directly. One-hot encoding builds a matrix of indicator columns; for n category values keep only n-1 columns to avoid redundancy (the dummy-variable trap). Sketch below.
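A sketch of one-hot encoding with n-1 columns using pandas; the "city" column and its values are hypothetical:

import pandas as pd

df = pd.DataFrame({"city": ["delhi", "mumbai", "pune", "delhi"]})

# 3 categories -> only 2 indicator columns (drop the first)
encoded = pd.get_dummies(df, columns=["city"], drop_first=True)
print(encoded)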
scikit-learn (imported in code as sklearn).
Where do outliers come from? -> mistakes at the time of data collection, natural statistical variation, bias.
Techniques to detect outliers -> IQR (interquartile range): IQR = Q3 - Q1, lower bound = Q1 - 1.5*IQR, upper bound = Q3 + 1.5*IQR; any value below the lower bound or above the upper bound is an outlier (sketch below). Also: Z-scores, sorting, graphing, scatter plots (visual).
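A numpy sketch of the IQR rule; the values are made up, with 102 planted as the outlier:

import numpy as np

x = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 102, 12, 14, 17])

q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = x[(x < lower) | (x > upper)]
print(lower, upper, outliers)  # 102 is flagged as an outlier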
How outliers are treated -> imputation, removal, transformation, capping, binning.
Curse of Dimensionality -> as the number of features grows, it becomes harder for ML to find patterns.
Problems when there are many features -> overfitting, time/complexity costs, performance decreases (inaccurate predictions).
Remedy for too many features -> dimensionality-reduction techniques (PCA for linear data, t-SNE for non-linear data); PCA sketch below.
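A minimal PCA sketch with scikit-learn; the data here is random, just to show the API shape:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))  # 200 samples, 10 features

pca = PCA(n_components=2)       # keep the 2 directions of highest variance
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (200, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component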
Feature Selection -> filter methods (e.g. VIF), wrapper methods, embedded methods (e.g. random forest feature importance).
23/5/25
LINEAR REGRESSION types:
Simple (one independent column, one dependent column),
Multiple (several independent columns, one dependent column),
Polynomial (polynomial terms of the independent columns)
Assumptions of Linear Regression:
Linearity => the dependent variable changes proportionally when an independent variable changes; scatter/residual plots can be used to check this
Normality => the residuals follow a normal distribution; check with quantile-quantile (Q-Q) plots
Independence => errors are independent of each other; check with the ACF (autocorrelation function); ARIMA (Autoregressive Integrated Moving Average) handles autocorrelated series
Homoscedasticity (same variance) => the variance of the error terms (residuals) should be consistent across all levels of the independent variables; ANOVA can be used to test this
To remove non-linearity -> transformations (power transformations; mathematical transformations such as logarithmic)
How to solve linear regression?
1) Closed-form solution -> OLS (ordinary least squares) uses an exact mathematical formula (libraries: statsmodels and scikit-learn)
2) Non-closed-form solution -> Gradient Descent approximates the solution iteratively (commonly used for multiple and polynomial regression on larger datasets). OLS sketch below.
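A sketch of the closed-form (OLS) route using statsmodels; the x and y values are made up:

import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

X = sm.add_constant(x)        # adds the intercept column
model = sm.OLS(y, X).fit()    # solves for m and c analytically

print(model.params)           # [intercept c, slope m]
print(model.summary())        # full regression report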
Simple Linear Regression model working: y = m*x + c
a) calculate x bar (mean of x) and y bar (mean of y)
b) calculate m (slope) and c (intercept): m = sum((x_i - x bar)*(y_i - y bar)) / sum((x_i - x bar)^2), c = y bar - m * x bar
Gradient Descent -> start with a guess, calculate the error, calculate the gradient (go through videos), update the values of m and b, repeat until the cost function reaches its minimum (sketch below).
Gradient descent has three main types: batch, stochastic, and mini-batch.
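A sketch of batch gradient descent for y = m*x + b on toy data, following the steps above; the learning rate and iteration count are arbitrary choices:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

m, b = 0.0, 0.0            # start with a guess
lr = 0.01                  # learning rate
for _ in range(5_000):
    y_pred = m * x + b
    error = y_pred - y                 # residuals
    grad_m = 2 * np.mean(error * x)    # d(MSE)/dm
    grad_b = 2 * np.mean(error)        # d(MSE)/db
    m -= lr * grad_m                   # update toward lower cost
    b -= lr * grad_b

print(m, b)  # should be close to the OLS slope and intercept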
Evaluation metrics for regression: speak of model performance rather than accuracy (accuracy applies only to classification problems).
Mean Absolute Error (MAE),
Mean Squared Error (MSE): gives the result in squared units,
Root Mean Squared Error (RMSE): root of MSE, back in the original units,
R-squared (Coefficient of Determination),
Mean Absolute Percentage Error (MAPE),
Adjusted R-squared (penalizes features that add no value), e.g. 85%. Sketch below.
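A sketch computing these metrics with sklearn.metrics (mean_absolute_percentage_error needs a reasonably recent scikit-learn); y_true and y_pred are made up:

import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, mean_absolute_percentage_error)

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.6, 9.4])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)   # squared units
rmse = np.sqrt(mse)                        # back to original units
r2 = r2_score(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)

print(mae, mse, rmse, r2, mape)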