MACHINE LEARNING 19/5/25
STATISTICS
TYPES -> DESCRIPTIVE (describes the data) & INFERENTIAL (applies operations on sample data to predict outcomes for the population)
DESCRIPTIVE, 4 WAYS -> MEASURES OF DISPERSION (standard deviation, variance), MEASURES OF POSITION (percentiles split the data into 100 equal parts, deciles into 10, quartiles into 4; quantiles split into any number of equal parts), CENTRAL TENDENCY (mean, median, mode), FREQUENCY (sketch below)
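A minimal Python sketch of these four measure groups using pandas; the "price" column and its values are made up for illustration:

import pandas as pd

s = pd.Series([10, 12, 12, 15, 18, 21, 30], name="price")

# Central tendency
print(s.mean(), s.median(), s.mode().tolist())

# Dispersion
print(s.var(), s.std())

# Position: quartiles (4 parts), deciles (10), percentiles (100)
print(s.quantile([0.25, 0.5, 0.75]))   # quartiles Q1, Q2, Q3
print(s.quantile(0.9))                 # 9th decile = 90th percentile

# Frequency
print(s.value_counts())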
INFERENTIAL -> a sample is a subset of the population; it uses the sample dataset to draw conclusions about the whole population.
Data types -> Quantitative, Qualitative
Quantitative -> Discrete (countable values) & Continuous (measurable values, e.g. product price)
Qualitative -> categories
Sampling techniques -> Probabilistic and Non-probabilistic
Probabilistic -> every member has an equal (known) chance of being selected: simple random, stratified, systematic, cluster (sketch below)
Non-probabilistic -> chances of being selected are not equal: convenience, snowball, consecutive, quota
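A hedged sketch of two of the probabilistic techniques in Python with pandas; the DataFrame, its columns, and the step size k are hypothetical:

import pandas as pd

population = pd.DataFrame({"id": range(100), "value": range(100)})

# Simple random sampling: every row has the same chance of selection
random_sample = population.sample(n=10, random_state=42)

# Systematic sampling: pick every k-th row after a fixed start
k = 10
systematic_sample = population.iloc[::k]

print(random_sample.shape, systematic_sample.shape)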
Another way to type data -> structured (e.g. CSV), semi-structured (e.g. JSON), unstructured (e.g. images, free text).
Sampling bias -> occurs when the sample is not representative of the population (some members are more likely to be selected than others).
Skewness & kurtosis -> a distribution can be normal (symmetric), right-skewed, or left-skewed.
Right skew: mode < median < mean (the mean is pulled toward the long right tail)
SciPy = Scientific Python (the scipy library)
Skew levels: highly skewed, moderately skewed, approximately symmetric (normal).
Kurtosis measures how sharp/peaked a distribution is: leptokurtic (sharp peak, heavy tails), mesokurtic (like the normal), platykurtic (flat, light tails).
Empirical rule -> for a normal distribution, ~68% of the data falls within 1 standard deviation of the mean, ~95% within 2, and ~99.7% within 3 (simulation sketch below).
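A quick simulation sketch with numpy (the mean, spread, and sample size are arbitrary) to check the 68/95/99.7 percentages:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=5, size=100_000)
mu, sigma = data.mean(), data.std()

for k in (1, 2, 3):
    within = np.mean((data > mu - k * sigma) & (data < mu + k * sigma))
    print(f"within {k} sd: {within:.3f}")  # ~0.68, ~0.95, ~0.997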
21/05/2025
Variance, Standard Deviation (mostly preferred), covariance (captures direction only, not strength), correlation (relation between two variables; captures both direction and strength): a correlation of 0 between x and y means no linear relation, +1 means highly positively correlated, -1 means highly negatively correlated (sketch below).
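A small numpy sketch contrasting covariance and correlation; the x and y values are made up:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 6], dtype=float)

print(np.cov(x, y)[0, 1])       # covariance: sign gives direction, but scale-dependent
print(np.corrcoef(x, y)[0, 1])  # Pearson correlation: bounded in [-1, 1]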
Right (positive) skew: mean > median > mode; for left (negative) skew the ordering reverses.
CENTRAL LIMIT THEOREM (CLT) => the sampling distribution of the sample mean approaches a normal distribution as the sample size grows, whatever the shape of the population distribution.
Sampling distribution -> the distribution of a sample statistic (e.g. the mean) across many samples; the CLT describes its shape.
Law of Large Numbers -> as the number of samples grows, the sample mean converges to the population mean (simulation sketch below).
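A simulation sketch of both ideas with numpy, assuming a skewed (exponential) population with true mean 2.0; all sizes are arbitrary:

import numpy as np

rng = np.random.default_rng(1)
population = rng.exponential(scale=2.0, size=1_000_000)  # true mean = 2.0

# LLN: the sample mean approaches the population mean as n grows
for n in (10, 1_000, 100_000):
    print(n, rng.choice(population, size=n).mean())

# CLT: means of many samples look ~normal even though the population is skewed
sample_means = [rng.choice(population, size=50).mean() for _ in range(2_000)]
print(np.mean(sample_means), np.std(sample_means))  # centered near 2.0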
Impute (fill null values) with mean, median, or mode. The median is not sensitive to outliers, so when a column contains outliers, fill the missing values with the median of the column (sketch below).
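A minimal sketch of median imputation; scikit-learn's SimpleImputer is one common way, and the values here (including the planted outlier) are made up:

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0], [2.0], [np.nan], [4.0], [100.0]])  # 100 is an outlier

imputer = SimpleImputer(strategy="median")  # median is robust to the outlier
print(imputer.fit_transform(X).ravel())     # NaN becomes 3.0, not the mean 26.75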
Probability distributions -> Discrete (e.g. Bernoulli, Binomial) and Continuous (e.g. Normal, Uniform).
A uniform distribution implies that all outcomes within a specific range have an equal probability of occurring, while a normal distribution (also known as the Gaussian distribution) describes a probability distribution where most data points cluster around the mean, tapering off symmetrically toward the extremes.
Distribution types:
Discrete -> Bernoulli, Binomial
Continuous -> Normal, Uniform (sketch below)
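A numpy sketch contrasting the two continuous shapes; the ranges and parameters are illustrative only:

import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(low=0, high=10, size=10_000)  # every value in [0, 10) equally likely
g = rng.normal(loc=5, scale=1, size=10_000)   # values cluster around the mean 5

# Uniform: roughly flat histogram; Normal: bell-shaped around the mean
print(np.histogram(u, bins=5, range=(0, 10))[0])
print(np.histogram(g, bins=5, range=(0, 10))[0])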
Transformation -> if data is heavily skewed, many ML models perform poorly on it, so transform the skewed feature toward a normal shape (skewed -> normal transformation). POWER TRANSFORMATION is one common choice (sketch below).
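A sketch of a power transformation using scikit-learn's PowerTransformer (Yeo-Johnson by default; Box-Cox requires strictly positive data); the skewed data is simulated:

import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=(1_000, 1))  # right-skewed feature

pt = PowerTransformer(method="yeo-johnson")
x_t = pt.fit_transform(x)

print(skew(x.ravel()), skew(x_t.ravel()))  # skew drops toward 0 after transforming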
FIVE NUMBER SUMMARY -> Min, Q1, Median, Q3, Max
H/W = Hypothesis testing; Traditional Programming vs Machine Learning.
types of ML -> supervised, unsupervised, reinforcement.
Batch learning, online learning.
Problems we face in ML -> missing values, bias, imbalanced data, choosing the right algorithm, getting labelled data.
ML lifecycle -> business understanding, data collection, model selection, training, evaluation, deployment.
Data drift (tomorrow's H/W)
When to use ML vs DL -> if only limited data points are available, classical ML is preferred; with large amounts of data, DL is used.
Feature Engineering -> better/more accurate model performance.
Techniques: imputation, encoding, scaling, normalization of data, binning (grouping values into bins or buckets).
In feature engineering -> feature construction, feature transformation, feature selection (important), feature extraction.
Feature scaling -> normalizes or standardizes numerical features to a specific range or distribution.
Standardization: transforms features to have a mean of 0 and a standard deviation of 1, AKA z-score.
Normalization: min-max, max-abs, mean normalization, robust scaling (sketch below).
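A minimal sketch of standardization vs min-max normalization with scikit-learn; the values are made up:

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

z = StandardScaler().fit_transform(X)   # mean 0, std 1 (z-scores)
mm = MinMaxScaler().fit_transform(X)    # rescaled into [0, 1]

print(z.ravel())
print(mm.ravel())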
Data -> numerical; categorical -> nominal (no inherent order), ordinal (ordered).
ENCODING -> converts textual/categorical data to numbers, since ML models can't process text directly. One-hot encoding builds a matrix of indicator columns; for n category values keep only n-1 columns to avoid redundancy (the dummy-variable trap). Sketch below.
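A sketch of one-hot encoding with n-1 columns using pandas; the "city" column and its values are hypothetical:

import pandas as pd

df = pd.DataFrame({"city": ["delhi", "mumbai", "pune", "delhi"]})

# 3 categories -> only 2 indicator columns (drop the first)
encoded = pd.get_dummies(df, columns=["city"], drop_first=True)
print(encoded)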
scikit-learn (imported in code as sklearn).
Where do outliers come from? -> mistakes at the time of data collection, natural statistical variation, bias.
Techniques to detect outliers -> IQR (interquartile range): IQR = Q3 - Q1, lower bound = Q1 - 1.5*IQR, upper bound = Q3 + 1.5*IQR; any value below the lower bound or above the upper bound is an outlier (sketch below). Also: Z-scores, sorting, graphing, scatter plots (visual).
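A numpy sketch of the IQR rule; the values are made up, with 102 planted as the outlier:

import numpy as np

x = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 102, 12, 14, 17])

q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = x[(x < lower) | (x > upper)]
print(lower, upper, outliers)  # 102 is flagged as an outlier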
How outliers are treated -> imputation, removal, transformation, capping, binning.
Curse of Dimensionality -> as the number of features grows, it becomes harder for ML to find patterns.
Problems when there are many features -> overfitting, time/complexity costs, performance decreases (inaccurate predictions).
Remedy for too many features -> dimensionality-reduction techniques (PCA for linear data, t-SNE for non-linear data); PCA sketch below.
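A minimal PCA sketch with scikit-learn; the data here is random, just to show the API shape:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))  # 200 samples, 10 features

pca = PCA(n_components=2)       # keep the 2 directions of highest variance
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (200, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component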
Feature Selection -> filter methods (e.g. VIF), wrapper methods, embedded methods (e.g. random forest feature importance).
23/5/25
LINEAR REGRESSION types:
Simple (one independent column, one dependent column),
Multiple (several independent columns, one dependent column),
Polynomial (polynomial terms of the independent columns)
Assumptions of Linear Regression:
Linearity => the dependent variable changes proportionally when an independent variable changes; scatter/residual plots can be used to check this
Normality => the residuals follow a normal distribution; check with quantile-quantile (Q-Q) plots
Independence => errors are independent of each other; check with the ACF (autocorrelation function); ARIMA (Autoregressive Integrated Moving Average) handles autocorrelated series
Homoscedasticity (same variance) => the variance of the error terms (residuals) should be consistent across all levels of the independent variables; ANOVA can be used to test this
To remove non-linearity -> transformations (power transformations; mathematical transformations such as logarithmic)
How to solve linear regression?
1) Closed-form solution -> OLS (ordinary least squares) uses an exact mathematical formula (libraries: statsmodels and scikit-learn)
2) Non-closed-form solution -> Gradient Descent approximates the solution iteratively (commonly used for multiple and polynomial regression on larger datasets). OLS sketch below.
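A sketch of the closed-form (OLS) route using statsmodels; the x and y values are made up:

import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

X = sm.add_constant(x)        # adds the intercept column
model = sm.OLS(y, X).fit()    # solves for m and c analytically

print(model.params)           # [intercept c, slope m]
print(model.summary())        # full regression report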
Simple Linear Regression model working: y = m*x + c
a) calculate x bar (mean of x) and y bar (mean of y)
b) calculate m (slope) and c (intercept): m = sum((x_i - x bar)*(y_i - y bar)) / sum((x_i - x bar)^2), c = y bar - m * x bar
Gradient Descent -> start with a guess, calculate the error, calculate the gradient (go through videos), update the values of m and b, repeat until the cost function reaches its minimum (sketch below).
Gradient descent has three main types: batch, stochastic, and mini-batch.
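A sketch of batch gradient descent for y = m*x + b on toy data, following the steps above; the learning rate and iteration count are arbitrary choices:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

m, b = 0.0, 0.0            # start with a guess
lr = 0.01                  # learning rate
for _ in range(5_000):
    y_pred = m * x + b
    error = y_pred - y                 # residuals
    grad_m = 2 * np.mean(error * x)    # d(MSE)/dm
    grad_b = 2 * np.mean(error)        # d(MSE)/db
    m -= lr * grad_m                   # update toward lower cost
    b -= lr * grad_b

print(m, b)  # should be close to the OLS slope and intercept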
Evaluation metrics for regression: speak of model performance rather than accuracy (accuracy applies only to classification problems).
Mean Absolute Error (MAE),
Mean Squared Error (MSE): gives the result in squared units,
Root Mean Squared Error (RMSE): root of MSE, back in the original units,
R-squared (Coefficient of Determination),
Mean Absolute Percentage Error (MAPE),
Adjusted R-squared (penalizes features that add no value), e.g. 85%. Sketch below.
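A sketch computing these metrics with sklearn.metrics (mean_absolute_percentage_error needs a reasonably recent scikit-learn); y_true and y_pred are made up:

import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, mean_absolute_percentage_error)

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.6, 9.4])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)   # squared units
rmse = np.sqrt(mse)                        # back to original units
r2 = r2_score(y_true, y_pred)
mape = mean_absolute_percentage_error(y_true, y_pred)

print(mae, mse, rmse, r2, mape)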