Day 4 - Preprocessing, Model Code

Uploaded by cpusingpython

Feb 25, 2025

Class #4:

—-----------------------------------------------------------------------------------------------------------------------------------------​

📌 Outliers: Outliers are data points that differ significantly from the rest of the data.
Example:
Data: [10, 12, 11, 13, 1000]
Here, 1000 is an outlier because it is far larger than the other values.
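
One common way to flag such a point automatically is the interquartile range (IQR) rule. The sketch below is an illustration rather than part of the class material: the helper name `iqr_outliers` and the conventional multiplier of 1.5 are assumptions.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # method="inclusive" interpolates quartiles between the min and max of the data
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

data = [10, 12, 11, 13, 1000]
print(iqr_outliers(data))  # [1000]
```

For this dataset the quartiles are 11 and 13, so anything outside the interval from 8 to 16 is flagged.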

—-----------------------------------------------------------------------------------------------------------------------------------------​

📌 Normalization
- Normalization is a technique for scaling or transforming data into a specific range.
- It makes different features (variables) comparable and often improves the performance of machine learning algorithms.

✒️ Types of Normalization:
1. Min-Max Normalization: Scales data to a fixed range (usually 0 to 1, or -1 to 1). For the 0 to 1 case: x' = (x - min) / (max - min).

2. Z-Score Standardization: Transforms data to have a mean of 0 and a standard deviation of 1: z = (x - mean) / std.
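
Both techniques can be sketched in a few lines of plain Python. The function names `min_max_scale` and `z_score` are made up for this illustration, and the z-score version assumes the population standard deviation; libraries may make different choices.

```python
import statistics

def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale values so the smallest maps to low and the largest to high."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:  # constant feature: avoid division by zero
        return [low for _ in values]
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]

def z_score(values):
    """Shift and scale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant feature: every value equals the mean
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

data = [10, 20, 30]
print(min_max_scale(data))         # [0.0, 0.5, 1.0]
print(min_max_scale(data, -1, 1))  # [-1.0, 0.0, 1.0]
print(z_score(data))               # symmetric around 0
```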

✒️ Why do we need normalization for multiple features?

- Avoids Dominance: Ensures that no feature disproportionately influences the model just because it has a larger numeric range.
- Speeds up Convergence: Helps gradient descent reach the minimum faster.
- Improves Accuracy: Makes distance-based models (such as k-nearest neighbors) more reliable.
- Prevents Numerical Instability: Avoids overflow and precision errors caused by very large values.
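
The dominance problem is easy to see with Euclidean distance. The following sketch uses invented data (four hypothetical people described by age and income) and a hypothetical helper `min_max_columns`; it only illustrates the point and does not come from the class material.

```python
import math

# Hypothetical people described by (age in years, income in dollars).
people = {
    "A": (25, 50_000),
    "B": (60, 50_100),
    "C": (26, 51_000),
    "D": (40, 90_000),
}

def min_max_columns(rows):
    """Min-max scale each column (feature) of a list of tuples to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

scaled = dict(zip(people, min_max_columns(list(people.values()))))

raw_ab = math.dist(people["A"], people["B"])
raw_ac = math.dist(people["A"], people["C"])
scaled_ab = math.dist(scaled["A"], scaled["B"])
scaled_ac = math.dist(scaled["A"], scaled["C"])

print(raw_ab < raw_ac)        # True: on raw data, B looks closer to A
print(scaled_ac < scaled_ab)  # True: after scaling, C is closer to A
```

On the raw data the 35-year age gap between A and B is drowned out by income differences of a few hundred dollars. Once both features share the 0 to 1 range, the age gap counts again.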

✒️ Normalization with a single feature:

- Reduces the effect of extreme values, which speeds up learning.
- Makes hyperparameter tuning (for example, choosing a learning rate) easier.

✒️ Effects with Activation Functions:

—-----------------------------------------------------------------------------------------------------------------------------------------

📌 Preprocessing
Preprocessing is the process of preparing raw data for analysis by cleaning, transforming, and organizing it so that models train more accurately and efficiently.

📌 Tokenization, Encoding, and Embedding

✒️ Tokenization: The process of breaking text down into smaller units called tokens, such as words or subwords.
Example: "Machine learning is fun" → ['Machine', 'learning', 'is', 'fun']

✒️ Encoding: Converting tokens into numerical representations (integer IDs) that models can process.
Example: ['Machine', 'learning', 'is', 'fun'] → [1, 2, 3, 4]
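
The two examples above can be reproduced with a minimal sketch. It assumes the simplest possible tokenizer (splitting on whitespace) and IDs assigned in order of first appearance; the function names are hypothetical. Real tokenizers also handle punctuation, casing, and subwords.

```python
def tokenize(text):
    """Split text into word tokens on whitespace."""
    return text.split()

def build_vocab(tokens):
    """Give each distinct token an integer ID, starting at 1, in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab) + 1
    return vocab

def encode(tokens, vocab):
    """Replace each token with its integer ID."""
    return [vocab[t] for t in tokens]

tokens = tokenize("Machine learning is fun")
vocab = build_vocab(tokens)
print(tokens)                 # ['Machine', 'learning', 'is', 'fun']
print(encode(tokens, vocab))  # [1, 2, 3, 4]
```

A repeated token reuses its existing ID, so "fun is fun" would encode as [1, 2, 1] with its own vocabulary.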

✒️ Embeddings: Embeddings are continuous vector representations of words or categories that capture their meanings and relationships in a lower-dimensional space.

How it works:
- Words or categories are represented as vectors in a multi-dimensional space.
- Similar words have vectors that are closer together.

Example: Vector('king') - Vector('man') + Vector('woman') ≈ Vector('queen')

Commonly used Embedding Models: Word2Vec, GloVe, BERT
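
The king/queen arithmetic can be tried out on toy data. The three-dimensional vectors below are hand-made for illustration (axes: royalty, male, female) and are an assumption of this sketch. Real embeddings such as Word2Vec are learned from text and have hundreds of dimensions.

```python
import math

# Hand-made toy vectors with axes (royalty, male, female).
vectors = {
    "king":  (1, 1, 0),
    "queen": (1, 0, 1),
    "man":   (0, 1, 0),
    "woman": (0, 0, 1),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# king - man + woman, computed component by component
target = tuple(
    k - m + w
    for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])
)

# The word whose vector points in the direction most similar to the target
nearest = max(vectors, key=lambda word: cosine_similarity(vectors[word], target))
print(target)   # (1, 0, 1)
print(nearest)  # queen
```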

—-----------------------------------------------------------------------------------------------------------------------------------------

✒️ When calculating the loss:


- Binary Cross-Entropy works with targets of 0 and 1, which are common in binary classification.
- MSE and MAE are sensitive to the scale of the target values. If targets span a large range, normalizing them can improve performance and stability.
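
The scale sensitivity of MSE shows up directly in the numbers. The sketch below uses plain-Python versions of the three losses and made-up house prices; the values and the scaling factor of 100,000 are assumptions chosen for illustration.

```python
import math

def mse(targets, preds):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(targets)

def mae(targets, preds):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(targets, preds)) / len(targets)

def binary_cross_entropy(targets, probs, eps=1e-12):
    """Mean BCE; probs are clipped so log(0) is never evaluated."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(targets)

# Hypothetical house prices: each prediction is off by 10,000.
prices = [100_000, 200_000]
preds = [110_000, 190_000]
print(mse(prices, preds))  # 100000000.0
print(mae(prices, preds))  # 10000.0

# The same errors after dividing targets and predictions by 100,000.
scale = 100_000
scaled_mse = mse([t / scale for t in prices], [p / scale for p in preds])
print(scaled_mse)  # roughly 0.01

# BCE with 0/1 targets and predicted probabilities.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # about 0.105
```

The model's relative error is identical in both cases, but the raw MSE is ten orders of magnitude larger, which leads to correspondingly large gradients during training.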

—-----------------------------------------------------------------------------------------------------------------------------------------

📌 Convolution
Convolution is an operation that applies a filter (kernel) to an input (like an image) to extract features
such as edges, textures, or patterns.

✒️ Modes of Convolution:
The examples in this list assume a stride of 1.

- Valid Convolution:
  - No padding is added, so the output is smaller than the input.
  - The filter moves horizontally and vertically without going outside the input's borders.

  Example: A 5x5 input convolved with a 3x3 filter gives a 3x3 output.

- Same Convolution:
  - Padding is added to keep the output size the same as the input.
  - The filter slides horizontally and vertically, including over the padded borders.

  Example: A 5x5 input with a 3x3 filter and padding of 1 gives a 5x5 output.

- Full Convolution:
  - Padding is added so that every element of the input is visited by every element of the filter, resulting in a larger output.
  - The filter starts outside the input boundary and moves horizontally and vertically over the fully padded edges.

  Example: A 5x5 input with a 3x3 filter and padding of 2 gives a 7x7 output.
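
All three output sizes follow from one formula: with stride 1, input size n, filter size k, and padding p on each side, the output size is n + 2p - k + 1. The sketch below checks this with a small pure-Python implementation. The function `conv2d` is written for this illustration only and assumes a square, odd-sized filter and zero padding.

```python
def conv2d(image, kernel, mode="valid"):
    """Slide a square, odd-sized kernel over a 2-D list with stride 1.

    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped).
    """
    k = len(kernel)
    pad = {"valid": 0, "same": (k - 1) // 2, "full": k - 1}[mode]
    rows, cols = len(image), len(image[0])

    # Surround the image with `pad` rows and columns of zeros.
    padded = [[0] * (cols + 2 * pad) for _ in range(rows + 2 * pad)]
    for r in range(rows):
        for c in range(cols):
            padded[r + pad][c + pad] = image[r][c]

    out_rows = rows + 2 * pad - k + 1
    out_cols = cols + 2 * pad - k + 1
    return [
        [
            sum(
                padded[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

image = [[1] * 5 for _ in range(5)]   # 5x5 input of ones
kernel = [[1] * 3 for _ in range(3)]  # 3x3 filter of ones

for mode in ("valid", "same", "full"):
    out = conv2d(image, kernel, mode)
    print(mode, f"{len(out)}x{len(out[0])}")
# valid 3x3
# same 5x5
# full 7x7
```

With an all-ones input and filter, each output value counts how many real (non-padded) input cells the filter covered: 9 everywhere in valid mode, but only 4 at the corners in same mode and 1 at the corners in full mode.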

—-----------------------------------------------------------------------------------------------------------------------------------------

📌 Keras Overview
-​ Keras: High-level API for building neural networks, integrated into TensorFlow.

✒️ Common Keras APIs:

✒️ Keras Components:
✒️ A Simple model Lifecycle:
