Machine Learning Study Notes - Quick Review Guide
Unit I: Introduction to Machine Learning
What is Machine Learning?
Machine Learning (ML) is a subset of AI that enables computers to learn patterns from data and
improve performance automatically without explicit programming.
Key Difference:
Traditional Programming: Rules + Data → Output
Machine Learning: Data + Output → Learns Rules (Model)
Types of Machine Learning
1. Supervised Learning
Definition: Trained with labeled data (input + correct output)
Examples:
Email spam detection
House price prediction
Medical diagnosis
2. Unsupervised Learning
Definition: Only input data provided, no labels
Goal: Find hidden patterns or groups
Examples:
Customer segmentation
Market basket analysis
Anomaly detection
3. Reinforcement Learning
Definition: Agent learns through trial-and-error with rewards/penalties
Examples:
Self-driving cars
Game playing (Chess, Go)
Robot navigation
4. Semi-Supervised Learning
Definition: Uses small amount of labeled data + large amount of unlabeled data
Examples:
Medical image classification
Speech recognition systems
Data Formats in ML
1. NHWC: (Batch Size, Height, Width, Channels) - Images (TensorFlow)
2. NCHW: (Batch Size, Channels, Height, Width) - Images (PyTorch)
3. NCDHW: (Batch Size, Channels, Depth, Height, Width) - Videos
4. NDHWC: (Batch Size, Depth, Height, Width, Channels) - Videos (TensorFlow)
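As a quick illustration (assuming NumPy; the batch shape is made up), converting between NHWC and NCHW is just an axis permutation:

import numpy as np

# A batch of 8 RGB images of size 32x32 in NHWC layout (TensorFlow-style).
batch_nhwc = np.zeros((8, 32, 32, 3))

# Move the channel axis next to the batch axis to get NCHW (PyTorch-style).
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))
print(batch_nchw.shape)  # (8, 3, 32, 32)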
Unit II: Feature Engineering
Data Preprocessing
Handling Missing Values
1. Deletion Methods:
Row deletion (few missing rows)
Column deletion (too many missing values)
2. Imputation Methods:
Mean/Median/Mode: Replace with statistical measures
Forward/Backward Fill: Use previous/next values
Constant Value: Replace with fixed value (0, "Unknown")
KNN Imputation: Predict using nearest neighbors
Regression Imputation: Use regression models
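A minimal sketch of the common imputation options, assuming pandas and scikit-learn and a made-up toy table:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer, KNNImputer

# Toy data with missing entries (values are illustrative only).
df = pd.DataFrame({"age": [25, np.nan, 40, 35, np.nan],
                   "income": [30, 42, np.nan, 55, 38]})

mean_filled = SimpleImputer(strategy="mean").fit_transform(df)   # mean imputation
knn_filled = KNNImputer(n_neighbors=2).fit_transform(df)         # KNN imputation
ffilled = df.ffill()                                             # forward fill (time series)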
Normalization and Scaling
Min-Max Scaling (Normalization)
Formula: X_scaled = (X - X_min)/(X_max - X_min)
Range: [0, 1]
Example: For values [20, 30, 50, 80, 100]
Min = 20, Max = 100
Scaled: [0, 0.125, 0.375, 0.75, 1]
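A short NumPy check of this worked example (the raw values are inferred from Min = 20, Max = 100 and the scaled results):

import numpy as np

x = np.array([20, 30, 50, 80, 100], dtype=float)
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled)  # [0.    0.125 0.375 0.75  1.   ]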
Z-Score Normalization (Standardization)
Formula: Z = (X - μ)/σ
Result: Mean = 0, Standard Deviation = 1
Example: For AGE values [18, 22, 25, 42, 28, 43, 33, 35, 56, 28]
Mean (μ) = 33
Standard Deviation (σ) = 10.83
Z-scores: [-1.39, -1.02, -0.74, 0.83, -0.46, 0.92, 0.00, 0.18, 2.12, -0.46]
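A NumPy check of this AGE example (the raw ages are inferred from the mean, standard deviation, and z-scores above):

import numpy as np

age = np.array([18, 22, 25, 42, 28, 43, 33, 35, 56, 28], dtype=float)
mu = age.mean()         # 33.0
sigma = age.std()       # population standard deviation, about 10.8
z = (age - mu) / sigma  # e.g. for age 56: (56 - 33) / sigma ≈ 2.12
print(np.round(z, 2))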
Dimensionality Reduction
Principal Component Analysis (PCA)
Steps:
1. Standardize data (scale features)
2. Compute covariance matrix (feature relationships)
3. Find eigenvalues and eigenvectors (principal components)
4. Sort by eigenvalues (importance ranking)
5. Select top k components (desired variance)
6. Transform data (project onto new space)
Benefits:
Reduces dimensionality
Removes redundancy
Reduces noise
Enables visualization
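A minimal scikit-learn sketch of these PCA steps on random toy data (dataset and number of components are arbitrary assumptions):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                # 100 samples, 5 features (toy data)

X_std = StandardScaler().fit_transform(X)    # step 1: standardize
pca = PCA(n_components=2)                    # steps 2-5: covariance, eigendecomposition, sort, select
X_reduced = pca.fit_transform(X_std)         # step 6: project onto the top 2 components

print(X_reduced.shape)                       # (100, 2)
print(pca.explained_variance_ratio_)         # variance captured by each component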
Kernel PCA
Purpose: Handles non-linear data using kernel trick
Steps:
1. Select kernel function (Linear, Polynomial, RBF)
2. Compute kernel matrix K
3. Center the kernel matrix
4. Eigenvalue decomposition
5. Select principal components
6. Transform data
Advantages:
Handles non-linear datasets
Better for complex data patterns
Flexible kernel choices
Disadvantages:
Computationally expensive
Harder to interpret
Kernel parameter selection crucial
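A small scikit-learn sketch of Kernel PCA on a non-linear toy dataset (the RBF kernel and gamma value are illustrative choices):

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Concentric circles: a non-linear structure that standard PCA cannot unfold.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)   # (200, 2)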
Feature Selection
Filter Methods
Statistical measures without ML algorithms
Examples: Correlation, Chi-square, Information gain
Advantages: Fast, simple, algorithm-independent
Disadvantages: Ignores feature interactions
Wrapper Methods
Use ML algorithm to evaluate feature subsets:
1. Forward Selection:
Start with no features
Add features one by one
Keep features that improve performance
2. Backward Elimination:
Start with all features
Remove least important features
Stop when performance degrades
3. Recursive Feature Elimination (RFE):
Train model on all features
Rank by importance
Remove least important, repeat
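A minimal RFE sketch with scikit-learn (the dataset, estimator, and number of features to keep are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = load_breast_cancer(return_X_y=True)

# Repeatedly fit the model, rank features by importance, and drop the weakest.
rfe = RFE(estimator=RandomForestClassifier(n_estimators=100, random_state=0),
          n_features_to_select=5)
rfe.fit(X, y)
print(rfe.support_)   # boolean mask of the 5 selected features
print(rfe.ranking_)   # 1 = selected; larger numbers were eliminated earlier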
Feature Extraction Techniques
Local Binary Pattern (LBP)
Process:
1. Select center pixel
2. Compare with 8 neighbors
3. If neighbor ≥ center → 1, else → 0
4. Form binary pattern (clockwise from top-left)
5. Convert to decimal
Example: For 3x3 patch with center = 35
Neighbors: the first two (clockwise from top-left) are ≥ 35, the remaining six are < 35
Binary: 11000000
Decimal: 192
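A from-scratch sketch of the LBP code for one 3x3 patch (the neighbor values are hypothetical, chosen to reproduce the 11000000 pattern above):

import numpy as np

def lbp_code(patch):
    # Compare the 8 neighbors (clockwise from top-left) with the center pixel.
    center = patch[1, 1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = ["1" if patch[r, c] >= center else "0" for r, c in order]
    return int("".join(bits), 2)

patch = np.array([[40, 38, 20],
                  [30, 35, 22],
                  [25, 18, 10]])   # center = 35
print(lbp_code(patch))             # binary 11000000 -> 192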
Applications:
Face recognition
Texture classification
Medical image analysis
Matrix Factorization
Concept: Decompose user-item matrix into smaller factor matrices
User features: Preferences (likes action, comedy)
Item features: Properties (is action, is comedy)
Goal: Predict missing ratings
Content-Based Filtering:
Recommends based on item features
Example: User likes "Inception" (Sci-Fi, Action) → Recommend "Interstellar" (Sci-Fi)
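A minimal NumPy sketch of matrix factorization with gradient descent on a toy rating matrix (all numbers, the factor size k, and the hyperparameters are illustrative assumptions):

import numpy as np

# Toy user-item ratings; 0 marks a missing rating to be predicted.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

n_users, n_items, k = R.shape[0], R.shape[1], 2
rng = np.random.default_rng(0)
P = rng.random((n_users, k))   # user factors (e.g. "likes action", "likes comedy")
Q = rng.random((n_items, k))   # item factors (e.g. "is action", "is comedy")

lr, reg = 0.01, 0.02
for _ in range(5000):
    for u in range(n_users):
        for i in range(n_items):
            if R[u, i] > 0:                    # train only on observed ratings
                err = R[u, i] - P[u] @ Q[i]
                P[u] += lr * (err * Q[i] - reg * P[u])
                Q[i] += lr * (err * P[u] - reg * Q[i])

print(np.round(P @ Q.T, 2))   # reconstructed matrix; the zeros are now predictions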
Key Formulas to Remember
1. Min-Max Scaling: (X - X_min)/(X_max - X_min)
2. Z-Score: (X - μ)/σ
3. Standard Deviation: σ = √[Σ(X - μ)²/N]
4. Mean: μ = ΣX/N
Quick Tips for Exam
1. PCA vs Kernel PCA: PCA for linear data, Kernel PCA for non-linear
2. Filter vs Wrapper: Filter uses statistics, Wrapper uses ML algorithms
3. Supervised vs Unsupervised: Labeled vs unlabeled data
4. Missing Values: Choose method based on data size and importance
5. Scaling: Min-Max rescales to a fixed range [0, 1]; Z-score gives mean = 0, std = 1 [1]
Common Applications
PCA: Image compression, visualization
LBP: Face recognition, texture analysis
Feature Selection: Model optimization, reducing overfitting
Normalization: Neural networks, KNN, gradient descent algorithms
This comprehensive guide covers all major topics from both units. Focus on understanding
concepts, formulas, and when to apply each technique.
⁂
Machine Learning & Feature Engineering - Complete Study Notes
UNIT I: Machine Learning Fundamentals
What is Machine Learning?
Definition: ML enables computers to learn patterns from data and improve performance
automatically without explicit programming
Key Difference:
Traditional Programming: Rules + Data → Output
Machine Learning: Data + Output → Learns Rules (Model)
Applications in Data Science
1. Data Analysis & Pattern Recognition (fraud detection)
2. Predictive Modeling (stock prices, sales forecasting)
3. Natural Language Processing (chatbots, sentiment analysis)
4. Image & Video Analysis (medical imaging, self-driving cars)
5. Recommendation Systems (Netflix, Amazon)
6. Clustering & Segmentation (customer groups)
7. Automation (data cleaning, feature selection)
Types of Machine Learning
1. Supervised Learning
Definition: Trained with labeled data (input + correct output)
Examples:
Email spam detection
House price prediction
Medical diagnosis
2. Unsupervised Learning
Definition: Only input data provided, finds hidden patterns
Examples:
Customer segmentation
Market basket analysis
Anomaly detection
3. Reinforcement Learning
Definition: Agent learns through trial-and-error with rewards/penalties
Examples:
Self-driving cars
Game playing (Chess, Go)
Robot control
4. Semi-Supervised Learning
Definition: Uses small labeled data + large unlabeled data
Examples:
Medical image classification
Speech recognition
Data Formats in ML
1. NHWC: (Batch Size, Height, Width, Channels) - TensorFlow default
2. NCHW: (Batch Size, Channels, Height, Width) - PyTorch default
3. NCDHW: (Batch Size, Channels, Depth, Height, Width) - 5D videos
4. NDHWC: (Batch Size, Depth, Height, Width, Channels) - 5D videos
UNIT II: Feature Engineering
Principal Component Analysis (PCA)
What is PCA?
Dimensionality reduction technique
Represents high-dimensional data in fewer dimensions while preserving important
information
Steps in PCA:
1. Standardize the data (scale features)
2. Compute covariance matrix (shows how features vary together)
3. Find eigenvalues and eigenvectors (directions of maximum variance)
4. Sort and select top k principal components
5. Transform data to new coordinate system
Benefits:
Removes redundancy
Reduces noise
Enables visualization (2D/3D)
Reduces computational complexity
Kernel PCA
Purpose:
Handles non-linear data where standard PCA fails
Uses kernel trick to work in higher-dimensional space
Process:
1. Select kernel function (Linear, Polynomial, RBF)
2. Compute kernel matrix K(i,j) = k(xi, xj)
3. Center the kernel matrix
4. Eigenvalue decomposition
5. Select top eigenvectors
6. Transform data
Advantages: Handles non-linear relationships
Disadvantages: Computationally expensive, harder to interpret
Feature Selection Techniques
What is Feature Selection?
Process of selecting most important features
Removes irrelevant/redundant features
Filter Methods (Statistical Measures):
Correlation Coefficient: Removes highly correlated features
Chi-Square Test: Tests independence with target
Information Gain: Measures feature's information about target
Advantages: Fast, simple, algorithm-independent
Disadvantages: Considers features individually
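A small scikit-learn sketch of these filter methods on the Iris data (the dataset and k are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_iris(return_X_y=True, as_frame=True)

# Chi-square test: keep the 2 features most dependent on the class label.
selector = SelectKBest(chi2, k=2).fit(X, y)
print(list(X.columns[selector.get_support()]))

# Information gain (mutual information) score per feature.
print(dict(zip(X.columns, mutual_info_classif(X, y, random_state=0).round(2))))

# Correlation filter: absolute pairwise correlations; very high values signal redundancy.
print(X.corr().abs().round(2))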
Wrapper Methods:
1. Forward Selection
Start with no features
Add features one by one that improve performance
2. Backward Elimination
Start with all features
Remove least important features iteratively
3. Recursive Feature Elimination (RFE)
Train model, rank features by importance
Remove least important, repeat
Handling Missing Values
Techniques:
1. Deletion: Remove rows/columns with missing values
2. Imputation:
Mean/Median/Mode replacement
Forward/Backward fill (time-series)
Constant value replacement
3. Advanced Methods:
KNN Imputation: Use similar samples
Regression Imputation: Predict missing values
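A minimal scikit-learn sketch of regression-style imputation with IterativeImputer (the toy numbers are made up; the experimental import line is required by scikit-learn):

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (needed for IterativeImputer)
from sklearn.impute import IterativeImputer

# Toy [age, income] rows with missing entries.
X = np.array([[25, 30000],
              [32, np.nan],
              [40, 52000],
              [np.nan, 61000]], dtype=float)

# Each feature with missing values is modelled as a regression on the other features.
print(np.round(IterativeImputer(random_state=0).fit_transform(X)))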
Data Scaling & Normalization
Min-Max Scaling (Normalization)
Formula: Xscaled = (X - Xmin)/(Xmax - Xmin)
Range: [0, 1]
Example: For data [20, 30, 50, 80, 100]: Scaled = [0, 0.125, 0.375, 0.75, 1]
Z-Score Normalization (Standardization)
Formula: Z = (X - μ)/σ
Result: Mean = 0, Standard Deviation = 1
Example: For AGE data, calculate mean and standard deviation, then apply formula
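The same two scalings done with scikit-learn, reusing the Min-Max example values above (assumed to be 20, 30, 50, 80, 100):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[20.0], [30.0], [50.0], [80.0], [100.0]])   # one feature, five samples
print(MinMaxScaler().fit_transform(X).ravel())            # [0.    0.125 0.375 0.75  1.   ]
print(StandardScaler().fit_transform(X).ravel())          # mean 0, standard deviation 1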
Local Binary Pattern (LBP)
Purpose: Texture description for images
Process:
1. Take center pixel
2. Compare with 8 neighbors
3. Write 1 if neighbor ≥ center, 0 if neighbor < center
4. Convert binary sequence to decimal
Example: Center = 35, neighbors create binary pattern 11000000 → decimal 192
Applications:
Face recognition
Texture classification
Medical image analysis
Matrix Factorization
Concept:
Decomposes User-Item rating matrix into smaller factor matrices
Predicts missing ratings by finding hidden patterns
Content-Based Filtering:
Recommends items based on item features and user preferences
Example: User likes Inception (Sci-Fi, Action) → Recommend Interstellar (Sci-Fi)
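A small content-based filtering sketch using cosine similarity over item feature vectors (the genre encoding and the third movie are illustrative assumptions):

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Item features as [Sci-Fi, Action, Comedy] indicators.
names = ["Inception", "Interstellar", "The Hangover"]
M = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])

# User profile: the features of the item the user liked (Inception).
profile = M[0].reshape(1, -1)
scores = cosine_similarity(profile, M).ravel()
for name, s in sorted(zip(names, scores), key=lambda t: -t[1]):
    print(name, round(s, 2))   # Inception 1.0, Interstellar 0.71, The Hangover 0.0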
Quick Formula Reference
Statistical Measures
Mean (μ): Σx/n
Standard Deviation (σ): √(Σ(x-μ)²/n)
Z-Score: (x-μ)/σ
Min-Max: (x-min)/(max-min)
PCA Key Concepts
Eigenvalues: Amount of variance captured
Eigenvectors: Directions of maximum variance
Principal Components: New axes in transformed space
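A from-scratch NumPy sketch connecting these concepts: covariance matrix, eigendecomposition, and projection (random toy data; the component count is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)                  # center the data

cov = np.cov(X, rowvar=False)           # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues (ascending) and eigenvectors (columns)

order = np.argsort(eigvals)[::-1]       # sort by variance captured, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()     # fraction of total variance per principal component
X_pca = X @ eigvecs[:, :2]              # project onto the top 2 principal components
print(np.round(explained, 2), X_pca.shape)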
Important Points for Exam
1. PCA vs Kernel PCA: Standard PCA for linear data, Kernel PCA for non-linear
2. Filter vs Wrapper: Filter uses statistics, Wrapper uses ML algorithms
3. Missing Value Strategies: Choose based on data type and amount missing
4. Scaling Importance: Essential for distance-based algorithms (KNN, SVM)
5. LBP Applications: Primarily texture and pattern recognition
6. Recommendation Types: Content-based vs Collaborative filtering
Common Exam Calculations
Min-Max scaling with given range
Z-score normalization step-by-step
LBP binary pattern conversion
PCA component selection based on variance
This comprehensive guide covers all major topics from both units with practical examples and
formulas essential for your test preparation.
⁂
1. u-2.pdf
2. ML-1.pdf
3. u-2.pdf
4. ML-1.pdf