Practical 6 - Scaling the data set using Standardization.
Standardization:
Standardization, also known as Z-score normalization, is a data preprocessing
technique that transforms the distribution of a feature (or set of features) so
that the mean of the observed values is 0 and the standard deviation is 1. This
technique is essential when machine learning algorithms are sensitive to the
scale of input features. For instance, Support Vector Machines (SVM), k-Nearest
Neighbors (k-NN), and k-means clustering rely on distances between data points,
while neural networks train faster on standardized inputs; all of these benefit
from features being on the same scale.
Formula for Standardization:
z = (x - μ) / σ
where x is an observed value, μ is the mean of the feature, and σ is its
standard deviation.
After standardization:
- The mean becomes 0.
- The standard deviation becomes 1.
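As a quick check, the standardization formula can be applied directly with NumPy. The values below are made-up illustration data, not the dataset used later in this practical:

```python
import numpy as np

# Made-up illustration values
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# z = (x - mean) / std, using the population standard deviation (ddof=0),
# which is also what scikit-learn's StandardScaler uses
z = (x - x.mean()) / x.std()

print(z.mean())  # effectively 0
print(z.std())   # 1.0
```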
Why Standardization is Important:
1. Equal Feature Contribution: In many machine learning models, features with
larger ranges may dominate the model, making it more challenging for the
algorithm to learn from other features. Standardization helps ensure that each
feature contributes equally.
2. Faster Convergence: In optimization algorithms like gradient descent,
standardizing data can speed up convergence since all features will be on a
similar scale.
3. Better Performance: Many algorithms perform better when features are
centered (mean = 0) and on a unit scale (std = 1). Note that standardization
rescales the data but does not change the shape of its distribution.
When to Use Standardization:
- Distance-Based Algorithms: Algorithms like SVM, k-NN, and k-means
clustering are distance-based and work better when all features are on the same
scale.
- Gradient-Based Algorithms: Neural networks and linear/logistic regression
models are commonly trained with gradient descent. Standardization helps these
algorithms converge faster.
- Features with Different Units: If your features are measured in different units
(e.g., height in centimeters and weight in kilograms), standardization helps
bring them to a common scale.
Example
Let’s consider a dataset where we have two features: age and salary. Age
typically ranges from 20 to 60, while salary can range from $30,000 to
$120,000. Due to the large difference in scales, salary will dominate any
distance calculations. Standardizing both features ensures that age and salary
contribute equally to the model.
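To make this dominance effect concrete, here is a small sketch comparing Euclidean distances before and after rescaling. The two people, their ages and salaries, and the per-feature scales are made-up numbers for illustration only:

```python
import numpy as np

# Two hypothetical people as (age, salary) points
a = np.array([25.0, 50000.0])
b = np.array([60.0, 52000.0])

# On the raw scale, the $2,000 salary gap swamps the 35-year age gap
raw_dist = np.linalg.norm(a - b)

# Dividing each feature by an assumed typical spread (stand-ins for the
# fitted standard deviations) lets both features contribute comparably
scale = np.array([10.0, 20000.0])
scaled_dist = np.linalg.norm((a - b) / scale)

print(raw_dist)     # ~2000: dominated almost entirely by salary
print(scaled_dist)  # ~3.5: now dominated by the (large) age difference
```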
Practical Example in Python:
Below is a step-by-step guide to standardizing a dataset using Python and the
`StandardScaler` class from the `scikit-learn` library.
Step 1: Import Required Libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
Step 2: Create a Dataset
Let’s create a small dataset with two features, `age` and `salary`, to demonstrate
the effect of standardization.
# Create a sample dataset
data = {'Age': [22, 25, 47, 52, 46],
        'Salary': [20000, 32000, 45000, 58000, 79000]}
df = pd.DataFrame(data)
Step 3: Apply Standardization
Use `StandardScaler` to standardize the dataset. This will transform each feature
so that the mean is 0 and the standard deviation is 1.
# Initialize the StandardScaler
scaler = StandardScaler()
# Fit the scaler to the data and transform it
standardized_data = scaler.fit_transform(df)
# Convert the standardized data back to a DataFrame
standardized_df = pd.DataFrame(standardized_data, columns=['Age', 'Salary'])
print("\nStandardized Dataset:\n", standardized_df)
Step 4: View Results
In this step, you'll observe that the transformed data has a mean of
(effectively) 0. Keep in mind that pandas' `.std()` uses the sample formula
(ddof=1), so it reports roughly 1.118 for n = 5 rather than exactly 1;
`StandardScaler` standardizes to a population standard deviation (ddof=0) of
exactly 1.
# Check mean and std deviation after standardization
mean = standardized_df.mean()
std_dev = standardized_df.std()
print("\nMean after Standardization:\n", mean)
print("\nStandard Deviation after Standardization:\n", std_dev)
Full Code Example:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
# Step 1: Create a sample dataset
data = {'Age': [22, 25, 47, 52, 46],
        'Salary': [20000, 32000, 45000, 58000, 79000]}
df = pd.DataFrame(data)
print("Original Dataset:\n", df)
# Step 2: Initialize the StandardScaler
scaler = StandardScaler()
# Step 3: Fit the scaler to the data and transform it
standardized_data = scaler.fit_transform(df)
# Step 4: Convert the standardized data back to a DataFrame
standardized_df = pd.DataFrame(standardized_data, columns=['Age', 'Salary'])
print("\nStandardized Dataset:\n", standardized_df)
# Step 5: Check mean and std deviation after standardization
mean = standardized_df.mean()
std_dev = standardized_df.std()
print("\nMean after Standardization:\n", mean)
print("\nStandard Deviation after Standardization:\n", std_dev)
Output:
1. Original Dataset:
Age Salary
0 22 20000
1 25 32000
2 47 45000
3 52 58000
4 46 79000
2. Standardized Dataset:
        Age    Salary
0 -1.325688 -1.306835
1 -1.083184 -0.721685
2  0.695178 -0.087773
3  1.099351  0.546140
4  0.614343  1.570153
3. Mean and Standard Deviation after Standardization:
Mean after Standardization:
Age 3.330669e-16
Salary 0.000000e+00
dtype: float64
Standard Deviation after Standardization:
Age 1.118034
Salary 1.118034
dtype: float64
Explanation of Results:
- After transformation, each feature has a mean of 0 and a population standard
deviation of 1. The printed standard deviation is about 1.118 rather than 1
only because pandas' `.std()` defaults to the sample formula (ddof=1), giving
sqrt(5/4) ≈ 1.118034 for n = 5.
- Both `Age` and `Salary` are now on a common scale, ensuring that both
features contribute equally to any machine learning algorithm applied to this
data.
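The ≈ 1.118 figure in the printout can be reproduced without scikit-learn; it is simply the gap between the population and sample standard deviation formulas, sqrt(n/(n-1)) for n = 5, shown here on the `Age` column:

```python
import numpy as np

age = np.array([22, 25, 47, 52, 46], dtype=float)

# Standardize with the population std (ddof=0), as StandardScaler does
z = (age - age.mean()) / age.std()

print(np.std(z))           # population std: exactly 1.0
print(np.std(z, ddof=1))   # sample std: sqrt(5/4) ~ 1.118034, as in the output
```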
Conclusion:
Standardization is a crucial preprocessing step, especially when dealing with
features of different scales. By transforming features to have a mean of 0 and a
standard deviation of 1, it ensures that algorithms sensitive to the magnitude of
input values perform better and converge faster.
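One practical follow-up worth noting: in a real project, the scaler should be fitted on the training data only, and the same fitted statistics reused for any new data. The sketch below uses the dataset from this practical plus a single made-up "unseen" row:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

train = pd.DataFrame({'Age': [22, 25, 47, 52, 46],
                      'Salary': [20000, 32000, 45000, 58000, 79000]})
new = pd.DataFrame({'Age': [30], 'Salary': [40000]})  # hypothetical unseen row

scaler = StandardScaler()
scaler.fit(train)                  # learn mean and std from training data only
train_z = scaler.transform(train)
new_z = scaler.transform(new)      # reuse the same statistics; do not refit
print(new_z)
```

Fitting a fresh scaler on new data would leak its statistics into the pipeline and make the standardized values incomparable with the training set.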