Machine Learning Study Notes - Quick Review Guide
Unit I: Introduction to Machine Learning
What is Machine Learning?
Machine Learning (ML) is a subset of AI that enables computers to learn patterns from data and
improve performance automatically without explicit programming.
Key Difference:
Traditional Programming: Rules + Data → Output
Machine Learning: Data + Output → Learns Rules (Model)
Types of Machine Learning
1. Supervised Learning
Definition: Trained with labeled data (input + correct output)
Examples:
Email spam detection
House price prediction
Medical diagnosis
2. Unsupervised Learning
Definition: Only input data provided, no labels
Goal: Find hidden patterns or groups
Examples:
Customer segmentation
Market basket analysis
Anomaly detection
3. Reinforcement Learning
Definition: Agent learns through trial-and-error with rewards/penalties
Examples:
Self-driving cars
Game playing (Chess, Go)
Robot navigation
4. Semi-Supervised Learning
Definition: Uses small amount of labeled data + large amount of unlabeled data
Examples:
Medical image classification
Speech recognition systems
Data Formats in ML
1. NHWC: (Batch Size, Height, Width, Channels) - Images (TensorFlow)
2. NCHW: (Batch Size, Channels, Height, Width) - Images (PyTorch)
3. NCDHW: (Batch Size, Channels, Depth, Height, Width) - Videos
4. NDHWC: (Batch Size, Depth, Height, Width, Channels) - Videos (TensorFlow)
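As a quick illustration (assuming NumPy; the batch shape is made up), converting between NHWC and NCHW is just an axis permutation:

import numpy as np

# A batch of 8 RGB images of size 32x32 in NHWC layout (TensorFlow-style).
batch_nhwc = np.zeros((8, 32, 32, 3))

# Move the channel axis next to the batch axis to get NCHW (PyTorch-style).
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))
print(batch_nchw.shape)  # (8, 3, 32, 32)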
Unit II: Feature Engineering
Data Preprocessing
Handling Missing Values
1. Deletion Methods:
Row deletion (few missing rows)
Column deletion (too many missing values)
2. Imputation Methods:
Mean/Median/Mode: Replace with statistical measures
Forward/Backward Fill: Use previous/next values
Constant Value: Replace with fixed value (0, "Unknown")
KNN Imputation: Predict using nearest neighbors
Regression Imputation: Use regression models
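A minimal sketch of the common imputation options, assuming pandas and scikit-learn and a made-up toy table:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer, KNNImputer

# Toy data with missing entries (values are illustrative only).
df = pd.DataFrame({"age": [25, np.nan, 40, 35, np.nan],
                   "income": [30, 42, np.nan, 55, 38]})

mean_filled = SimpleImputer(strategy="mean").fit_transform(df)   # mean imputation
knn_filled = KNNImputer(n_neighbors=2).fit_transform(df)         # KNN imputation
ffilled = df.ffill()                                             # forward fill (time series)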
Normalization and Scaling
Min-Max Scaling (Normalization)
Formula: X_scaled = (X - X_min)/(X_max - X_min)
Range: [0, 1]
Example: For values [20, 30, 50, 80, 100]
Min = 20, Max = 100
Scaled: [0, 0.125, 0.375, 0.75, 1]
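A short NumPy check of this worked example (the raw values are inferred from Min = 20, Max = 100 and the scaled results):

import numpy as np

x = np.array([20, 30, 50, 80, 100], dtype=float)
x_scaled = (x - x.min()) / (x.max() - x.min())
print(x_scaled)  # [0.    0.125 0.375 0.75  1.   ]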
Z-Score Normalization (Standardization)
Formula: Z = (X - μ)/σ
Result: Mean = 0, Standard Deviation = 1
Example: For AGE values [18, 22, 25, 42, 28, 43, 33, 35, 56, 28]
Mean (μ) = 33
Standard Deviation (σ) = 10.83
Z-scores: [-1.39, -1.02, -0.74, 0.83, -0.46, 0.92, 0.00, 0.18, 2.12, -0.46]
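A NumPy check of this AGE example (the raw ages are inferred from the mean, standard deviation, and z-scores above):

import numpy as np

age = np.array([18, 22, 25, 42, 28, 43, 33, 35, 56, 28], dtype=float)
mu = age.mean()         # 33.0
sigma = age.std()       # population standard deviation, about 10.8
z = (age - mu) / sigma  # e.g. for age 56: (56 - 33) / sigma ≈ 2.12
print(np.round(z, 2))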
Dimensionality Reduction
Principal Component Analysis (PCA)
Steps:
1. Standardize data (scale features)
2. Compute covariance matrix (feature relationships)
3. Find eigenvalues and eigenvectors (principal components)
4. Sort by eigenvalues (importance ranking)
5. Select top k components (desired variance)
6. Transform data (project onto new space)
Benefits:
Reduces dimensionality
Removes redundancy
Reduces noise
Enables visualization
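A minimal scikit-learn sketch of these PCA steps on random toy data (dataset and number of components are arbitrary assumptions):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                # 100 samples, 5 features (toy data)

X_std = StandardScaler().fit_transform(X)    # step 1: standardize
pca = PCA(n_components=2)                    # steps 2-5: covariance, eigendecomposition, sort, select
X_reduced = pca.fit_transform(X_std)         # step 6: project onto the top 2 components

print(X_reduced.shape)                       # (100, 2)
print(pca.explained_variance_ratio_)         # variance captured by each component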
Kernel PCA
Purpose: Handles non-linear data using kernel trick
Steps:
1. Select kernel function (Linear, Polynomial, RBF)
2. Compute kernel matrix K
3. Center the kernel matrix
4. Eigenvalue decomposition
5. Select principal components
6. Transform data
Advantages:
Handles non-linear datasets
Better for complex data patterns
Flexible kernel choices
Disadvantages:
Computationally expensive
Harder to interpret
Kernel parameter selection crucial
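A small scikit-learn sketch of Kernel PCA on a non-linear toy dataset (the RBF kernel and gamma value are illustrative choices):

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Concentric circles: a non-linear structure that standard PCA cannot unfold.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)
print(X_kpca.shape)   # (200, 2)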
Feature Selection
Filter Methods
Statistical measures without ML algorithms
Examples: Correlation, Chi-square, Information gain
Advantages: Fast, simple, algorithm-independent
Disadvantages: Ignores feature interactions
Wrapper Methods
Use ML algorithm to evaluate feature subsets:
1. Forward Selection:
Start with no features
Add features one by one
Keep features that improve performance
2. Backward Elimination:
Start with all features
Remove least important features
Stop when performance degrades
3. Recursive Feature Elimination (RFE):
Train model on all features
Rank by importance
Remove least important, repeat
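A minimal RFE sketch with scikit-learn (the dataset, estimator, and number of features to keep are illustrative assumptions):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

X, y = load_breast_cancer(return_X_y=True)

# Repeatedly fit the model, rank features by importance, and drop the weakest.
rfe = RFE(estimator=RandomForestClassifier(n_estimators=100, random_state=0),
          n_features_to_select=5)
rfe.fit(X, y)
print(rfe.support_)   # boolean mask of the 5 selected features
print(rfe.ranking_)   # 1 = selected; larger numbers were eliminated earlier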
Feature Extraction Techniques
Local Binary Pattern (LBP)
Process:
1. Select center pixel
2. Compare with 8 neighbors
3. If neighbor ≥ center → 1, else → 0
4. Form binary pattern (clockwise from top-left)
5. Convert to decimal
Example: For 3x3 patch with center = 35
Neighbors: the first two (clockwise from top-left) are ≥ 35, the remaining six are < 35
Binary: 11000000
Decimal: 192
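A from-scratch sketch of the LBP code for one 3x3 patch (the neighbor values are hypothetical, chosen to reproduce the 11000000 pattern above):

import numpy as np

def lbp_code(patch):
    # Compare the 8 neighbors (clockwise from top-left) with the center pixel.
    center = patch[1, 1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = ["1" if patch[r, c] >= center else "0" for r, c in order]
    return int("".join(bits), 2)

patch = np.array([[40, 38, 20],
                  [30, 35, 22],
                  [25, 18, 10]])   # center = 35
print(lbp_code(patch))             # binary 11000000 -> 192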
Applications:
Face recognition
Texture classification
Medical image analysis
Matrix Factorization
Concept: Decompose user-item matrix into smaller factor matrices
User features: Preferences (likes action, comedy)
Item features: Properties (is action, is comedy)
Goal: Predict missing ratings
Content-Based Filtering:
Recommends based on item features
Example: User likes "Inception" (Sci-Fi, Action) → Recommend "Interstellar" (Sci-Fi)
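A minimal NumPy sketch of matrix factorization with gradient descent on a toy rating matrix (all numbers, the factor size k, and the hyperparameters are illustrative assumptions):

import numpy as np

# Toy user-item ratings; 0 marks a missing rating to be predicted.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

n_users, n_items, k = R.shape[0], R.shape[1], 2
rng = np.random.default_rng(0)
P = rng.random((n_users, k))   # user factors (e.g. "likes action", "likes comedy")
Q = rng.random((n_items, k))   # item factors (e.g. "is action", "is comedy")

lr, reg = 0.01, 0.02
for _ in range(5000):
    for u in range(n_users):
        for i in range(n_items):
            if R[u, i] > 0:                    # train only on observed ratings
                err = R[u, i] - P[u] @ Q[i]
                P[u] += lr * (err * Q[i] - reg * P[u])
                Q[i] += lr * (err * P[u] - reg * Q[i])

print(np.round(P @ Q.T, 2))   # reconstructed matrix; the zeros are now predictions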
Key Formulas to Remember
1. Min-Max Scaling: (X - X_min)/(X_max - X_min)
2. Z-Score: (X - μ)/σ
3. Standard Deviation: σ = √[Σ(X - μ)²/N]
4. Mean: μ = ΣX/N
Quick Tips for Exam
1. PCA vs Kernel PCA: PCA for linear data, Kernel PCA for non-linear
2. Filter vs Wrapper: Filter uses statistics, Wrapper uses ML algorithms
3. Supervised vs Unsupervised: Labeled vs unlabeled data
4. Missing Values: Choose method based on data size and importance
5. Scaling: Min-Max rescales to a fixed range [0, 1]; Z-score gives mean = 0, std = 1 [1]
Common Applications
PCA: Image compression, visualization
LBP: Face recognition, texture analysis
Feature Selection: Model optimization, reducing overfitting
Normalization: Neural networks, KNN, gradient descent algorithms
This comprehensive guide covers all major topics from both units. Focus on understanding
concepts, formulas, and when to apply each technique.
⁂
Machine Learning & Feature Engineering - Complete Study Notes
UNIT I: Machine Learning Fundamentals
What is Machine Learning?
Definition: ML enables computers to learn patterns from data and improve performance
automatically without explicit programming
Key Difference:
Traditional Programming: Rules + Data → Output
Machine Learning: Data + Output → Learns Rules (Model)
Applications in Data Science
1. Data Analysis & Pattern Recognition (fraud detection)
2. Predictive Modeling (stock prices, sales forecasting)
3. Natural Language Processing (chatbots, sentiment analysis)
4. Image & Video Analysis (medical imaging, self-driving cars)
5. Recommendation Systems (Netflix, Amazon)
6. Clustering & Segmentation (customer groups)
7. Automation (data cleaning, feature selection)
Types of Machine Learning
1. Supervised Learning
Definition: Trained with labeled data (input + correct output)
Examples:
Email spam detection
House price prediction
Medical diagnosis
2. Unsupervised Learning
Definition: Only input data provided, finds hidden patterns
Examples:
Customer segmentation
Market basket analysis
Anomaly detection
3. Reinforcement Learning
Definition: Agent learns through trial-and-error with rewards/penalties
Examples:
Self-driving cars
Game playing (Chess, Go)
Robot control
4. Semi-Supervised Learning
Definition: Uses small labeled data + large unlabeled data
Examples:
Medical image classification
Speech recognition
Data Formats in ML
1. NHWC: (Batch Size, Height, Width, Channels) - TensorFlow default
2. NCHW: (Batch Size, Channels, Height, Width) - PyTorch default
3. NCDHW: (Batch Size, Channels, Depth, Height, Width) - 5D videos
4. NDHWC: (Batch Size, Depth, Height, Width, Channels) - 5D videos
UNIT II: Feature Engineering
Principal Component Analysis (PCA)
What is PCA?
Dimensionality reduction technique
Represents high-dimensional data in fewer dimensions while preserving important
information
Steps in PCA:
1. Standardize the data (scale features)
2. Compute covariance matrix (shows how features vary together)
3. Find eigenvalues and eigenvectors (directions of maximum variance)
4. Sort and select top k principal components
5. Transform data to new coordinate system
Benefits:
Removes redundancy
Reduces noise
Enables visualization (2D/3D)
Reduces computational complexity
Kernel PCA
Purpose:
Handles non-linear data where standard PCA fails
Uses kernel trick to work in higher-dimensional space
Process:
1. Select kernel function (Linear, Polynomial, RBF)
2. Compute kernel matrix K(i,j) = k(xi, xj)
3. Center the kernel matrix
4. Eigenvalue decomposition
5. Select top eigenvectors
6. Transform data
Advantages: Handles non-linear relationships
Disadvantages: Computationally expensive, harder to interpret
Feature Selection Techniques
What is Feature Selection?
Process of selecting most important features
Removes irrelevant/redundant features
Filter Methods (Statistical Measures):
Correlation Coefficient: Removes highly correlated features
Chi-Square Test: Tests independence with target
Information Gain: Measures feature's information about target
Advantages: Fast, simple, algorithm-independent
Disadvantages: Considers features individually
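A small scikit-learn sketch of these filter methods on the Iris data (the dataset and k are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_iris(return_X_y=True, as_frame=True)

# Chi-square test: keep the 2 features most dependent on the class label.
selector = SelectKBest(chi2, k=2).fit(X, y)
print(list(X.columns[selector.get_support()]))

# Information gain (mutual information) score per feature.
print(dict(zip(X.columns, mutual_info_classif(X, y, random_state=0).round(2))))

# Correlation filter: absolute pairwise correlations; very high values signal redundancy.
print(X.corr().abs().round(2))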
Wrapper Methods:
1. Forward Selection
Start with no features
Add features one by one that improve performance
2. Backward Elimination
Start with all features
Remove least important features iteratively
3. Recursive Feature Elimination (RFE)
Train model, rank features by importance
Remove least important, repeat
Handling Missing Values
Techniques:
1. Deletion: Remove rows/columns with missing values
2. Imputation:
Mean/Median/Mode replacement
Forward/Backward fill (time-series)
Constant value replacement
3. Advanced Methods:
KNN Imputation: Use similar samples
Regression Imputation: Predict missing values
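A minimal scikit-learn sketch of regression-style imputation with IterativeImputer (the toy numbers are made up; the experimental import line is required by scikit-learn):

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (needed for IterativeImputer)
from sklearn.impute import IterativeImputer

# Toy [age, income] rows with missing entries.
X = np.array([[25, 30000],
              [32, np.nan],
              [40, 52000],
              [np.nan, 61000]], dtype=float)

# Each feature with missing values is modelled as a regression on the other features.
print(np.round(IterativeImputer(random_state=0).fit_transform(X)))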
Data Scaling & Normalization
Min-Max Scaling (Normalization)
Formula: Xscaled = (X - Xmin)/(Xmax - Xmin)
Range: [0, 1]
Example: For data [20, 30, 50, 80, 100]: Scaled = [0, 0.125, 0.375, 0.75, 1]
Z-Score Normalization (Standardization)
Formula: Z = (X - μ)/σ
Result: Mean = 0, Standard Deviation = 1
Example: For AGE data, calculate mean and standard deviation, then apply formula
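The same two scalings done with scikit-learn, reusing the Min-Max example values above (assumed to be 20, 30, 50, 80, 100):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[20.0], [30.0], [50.0], [80.0], [100.0]])   # one feature, five samples
print(MinMaxScaler().fit_transform(X).ravel())            # [0.    0.125 0.375 0.75  1.   ]
print(StandardScaler().fit_transform(X).ravel())          # mean 0, standard deviation 1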
Local Binary Pattern (LBP)
Purpose: Texture description for images
Process:
1. Take center pixel
2. Compare with 8 neighbors
3. Write 1 if neighbor ≥ center, 0 if neighbor < center
4. Convert binary sequence to decimal
Example: Center = 35, neighbors create binary pattern 11000000 → decimal 192
Applications:
Face recognition
Texture classification
Medical image analysis
Matrix Factorization
Concept:
Decomposes User-Item rating matrix into smaller factor matrices
Predicts missing ratings by finding hidden patterns
Content-Based Filtering:
Recommends items based on item features and user preferences
Example: User likes Inception (Sci-Fi, Action) → Recommend Interstellar (Sci-Fi)
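A small content-based filtering sketch using cosine similarity over item feature vectors (the genre encoding and the third movie are illustrative assumptions):

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Item features as [Sci-Fi, Action, Comedy] indicators.
names = ["Inception", "Interstellar", "The Hangover"]
M = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])

# User profile: the features of the item the user liked (Inception).
profile = M[0].reshape(1, -1)
scores = cosine_similarity(profile, M).ravel()
for name, s in sorted(zip(names, scores), key=lambda t: -t[1]):
    print(name, round(s, 2))   # Inception 1.0, Interstellar 0.71, The Hangover 0.0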
Quick Formula Reference
Statistical Measures
Mean (μ): Σx/n
Standard Deviation (σ): √(Σ(x-μ)²/n)
Z-Score: (x-μ)/σ
Min-Max: (x-min)/(max-min)
PCA Key Concepts
Eigenvalues: Amount of variance captured
Eigenvectors: Directions of maximum variance
Principal Components: New axes in transformed space
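A from-scratch NumPy sketch connecting these concepts: covariance matrix, eigendecomposition, and projection (random toy data; the component count is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = X - X.mean(axis=0)                  # center the data

cov = np.cov(X, rowvar=False)           # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues (ascending) and eigenvectors (columns)

order = np.argsort(eigvals)[::-1]       # sort by variance captured, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()     # fraction of total variance per principal component
X_pca = X @ eigvecs[:, :2]              # project onto the top 2 principal components
print(np.round(explained, 2), X_pca.shape)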
Important Points for Exam
1. PCA vs Kernel PCA: Standard PCA for linear data, Kernel PCA for non-linear
2. Filter vs Wrapper: Filter uses statistics, Wrapper uses ML algorithms
3. Missing Value Strategies: Choose based on data type and amount missing
4. Scaling Importance: Essential for distance-based algorithms (KNN, SVM)
5. LBP Applications: Primarily texture and pattern recognition
6. Recommendation Types: Content-based vs Collaborative filtering
Common Exam Calculations
Min-Max scaling with given range
Z-score normalization step-by-step
LBP binary pattern conversion
PCA component selection based on variance
This comprehensive guide covers all major topics from both units with practical examples and
formulas essential for your test preparation.
⁂
1. u-2.pdf
2. ML-1.pdf
3. u-2.pdf
4. ML-1.pdf