Pandas

Pandas is a Python library for data analysis and manipulation, created by Wes McKinney in 2008. It provides data structures like Series and DataFrame for handling one-dimensional and two-dimensional data, respectively, and includes functions for data cleaning, transformation, and visualization. The document covers various functionalities of Pandas, including reading CSV and JSON files, handling missing values, and performing data integration and manipulation.

Uploaded by

rishavranjan1607

Pandas

• Pandas is a Python library for working with data sets.
• It has functions for analyzing, cleaning, exploring, and manipulating data.
• The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.

04/04/2025 Pandas. Prof Himanshu Bhusan Mohapatra 2

What is a Series?
• A Pandas Series is like a column in a table.
• It is a one-dimensional array holding data of any type.
• Example
• Create a simple Pandas Series from a list:
• import pandas as pd

a = [1, 7, 2]

myvar = pd.Series(a)

print(myvar)
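When nothing else is specified, the values in a Series are labeled with their position (0, 1, 2). As a minimal sketch, the index argument of pd.Series can assign custom labels instead; the labels "x", "y", and "z" below are illustrative choices and do not come from the slides.

```python
import pandas as pd

a = [1, 7, 2]

# Without an index argument, the values are labeled 0, 1, 2
myvar = pd.Series(a)
first_value = myvar[0]

# Illustrative labels (an assumption made for this sketch)
labeled = pd.Series(a, index=["x", "y", "z"])
y_value = labeled["y"]

print(first_value)
print(y_value)
```

Once labels are set, a value can be looked up by its label rather than by its position.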
What is a DataFrame?
• A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.
• Example
• Create a simple Pandas DataFrame:
• import pandas as pd

data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}

#load data into a DataFrame object:

df = pd.DataFrame(data)

print(df)
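Since a DataFrame is a table, rows and columns can be picked out individually. The following is a small sketch, reusing the same data, of how the loc attribute selects rows by label and how square brackets select a column; the variable names row0, rows01, and calories are made up for this illustration.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)

# loc with a single label returns that row as a Series
row0 = df.loc[0]

# loc with a list of labels returns a DataFrame
rows01 = df.loc[[0, 1]]

# Selecting a column by name returns a Series
calories = df["calories"]

print(row0)
print(rows01)
```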
Read CSV Files
• A simple way to store big data sets is to use CSV files (comma-separated values files).
• CSV files contain plain text and are a well-known format that can be read by most tools, including Pandas.
• In our examples we will be using a CSV file called 'data.csv'.
• Example
• Load the CSV into a DataFrame:
• import pandas as pd

df = pd.read_csv('data.csv')

print(df.to_string())
• Tip: use to_string() to print the entire DataFrame.
• Example
• Print the DataFrame without the to_string() method:
• import pandas as pd

df = pd.read_csv('data.csv')

print(df)
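To try read_csv() without the file at hand, the text can be supplied from memory. This is only a sketch: the rows and the "Pulse" and "Maxpulse" columns below are made up and stand in for 'data.csv'. It also prints the pandas setting that decides how many rows print(df) shows before the output is shortened.

```python
import io
import pandas as pd

# Stand-in for 'data.csv' (made-up rows, for illustration only)
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
45,109,175,282.4
"""

# read_csv accepts a file path or a file-like object such as StringIO
df = pd.read_csv(io.StringIO(csv_text))

print(df)

# Number of rows pandas prints before it truncates the output
print(pd.options.display.max_rows)
```

A DataFrame with more rows than this setting is printed in shortened form, which is why to_string() is useful for seeing everything.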


Read JSON
• Big data sets are often stored or extracted as JSON.
• JSON is plain text with the structure of an object, and it is widely supported in programming, including by Pandas.
• In our examples we will be using a JSON file called 'data.json'.
• Example
• Load the JSON file into a DataFrame:
• import pandas as pd

df = pd.read_json('data.json')

print(df.to_string())
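The sketch below shows one JSON layout that read_json() can turn into a DataFrame: an object per column that maps row labels to values. The values are made up and stand in for 'data.json', whose actual contents are not shown in the slides.

```python
import io
import pandas as pd

# Stand-in for 'data.json' (made-up values): one JSON object per column,
# mapping row labels to values
json_text = """
{
  "Duration": {"0": 60, "1": 60, "2": 45},
  "Calories": {"0": 409.1, "1": 479.0, "2": 282.4}
}
"""

# read_json accepts a file path or a file-like object such as StringIO
df = pd.read_json(io.StringIO(json_text))

print(df.to_string())
```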
Analyzing DataFrames
• Viewing the Data
• One of the most used methods for getting a quick overview of the DataFrame is the head() method.
• The head() method returns the headers and a specified number of rows, starting from the top.
• Example
• Get a quick overview by printing the first 10 rows of the DataFrame:
• import pandas as pd

df = pd.read_csv('data.csv')

print(df.head(10))
• Example
• Print the first 5 rows of the DataFrame (head() returns 5 rows when no number is given):
• import pandas as pd

df = pd.read_csv('data.csv')

print(df.head())
• Example
• Print the last 5 rows of the DataFrame:
• print(df.tail())
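Besides head() and tail(), the info() method and the shape attribute give a compact summary of a DataFrame. The following sketch uses a few made-up rows, one with an empty "Calories" cell, to show what they report.

```python
import io
import pandas as pd

# Made-up rows; the second row has an empty Calories cell
csv_text = """Duration,Calories
60,409.1
45,
30,195.0
60,300.5
"""
df = pd.read_csv(io.StringIO(csv_text))

# info() prints the column names, non-null counts, and data types
df.info()

# shape is (number of rows, number of columns)
n_rows, n_cols = df.shape

# count() returns the number of non-null values in a column
non_null_calories = df["Calories"].count()
print(n_rows, n_cols, non_null_calories)
```

A non-null count lower than the number of rows is a quick hint that a column contains empty cells.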
Cleaning Data
• Data cleaning means fixing bad data in your data set.
• Bad data could be:
Empty cells
Data in wrong format
Wrong data
Duplicates
• Our Data Set
• The data set contains some empty cells ("Date" in row 22, and "Calories" in rows 18 and 28).
• The data set contains a value in the wrong format ("Date" in row 26).
• The data set contains wrong data ("Duration" in row 7).
• The data set contains duplicates (rows 11 and 12).
Cleaning Empty Cells
• Example
• Return a new DataFrame with no empty cells (the original DataFrame is left unchanged):
• import pandas as pd

df = pd.read_csv('data3.csv')

new_df = df.dropna()

print(new_df.to_string())
• Example
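• Remove all rows with NULL values from the original DataFrame (inplace = True changes df itself instead of returning a new DataFrame):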
• import pandas as pd

df = pd.read_csv('data3.csv')

df.dropna(inplace = True)

print(df.to_string())
• Example
• Replace NULL values with the number 130:
• import pandas as pd

df = pd.read_csv('data.csv')

df.fillna(130, inplace = True)

• Example
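• Replace NULL values in the "Calories" column with the number 130:
• import pandas as pd

df = pd.read_csv('data.csv')

# Assign the result back to the column: calling fillna(inplace = True) on
# df["Calories"] can leave df unchanged in newer pandas versions
df["Calories"] = df["Calories"].fillna(130)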

Replace Using Mean, Median, or Mode
• A common way to replace empty cells is to calculate the mean, median, or mode value of the column.
• Pandas uses the mean(), median(), and mode() methods to calculate the respective values for a specified column:
• Example
• Calculate the MEAN, and replace any empty values with it:
• import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mean()

# Assign the result back so that df itself is updated
df["Calories"] = df["Calories"].fillna(x)

• Example
• Calculate the MEDIAN, and replace any empty values with it:
• import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].median()

# Assign the result back so that df itself is updated
df["Calories"] = df["Calories"].fillna(x)

• Example
• Calculate the MODE, and replace any empty values with it:
• import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mode()[0]

# Assign the result back so that df itself is updated
df["Calories"] = df["Calories"].fillna(x)
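The three statistics can give quite different fill values when a column contains an unusually large number. The sketch below uses made-up values (not taken from 'data.csv') to compare them side by side.

```python
import pandas as pd

# Made-up values with one missing entry and one unusually large value
calories = pd.Series([100, 120, 120, 400, None])

mean_value = calories.mean()        # average of the non-missing values
median_value = calories.median()    # middle value after sorting
mode_value = calories.mode()[0]     # most frequent value

# fillna returns a new Series; the original is not modified
filled_with_median = calories.fillna(median_value)

print(mean_value, median_value, mode_value)
```

Here the single large value pulls the mean up to 185, while the median and the mode both stay at 120, so the median or mode may be the safer choice when outliers are present.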

Cleaning Data of Wrong Format
• Data of Wrong Format
• Cells with data of wrong format can make it difficult, or even impossible, to analyze
data.
• To fix it, you have two options: remove the rows, or convert all cells in the columns into
the same format.
• Convert Into a Correct Format
• In our DataFrame, two cells in the 'Date' column are a problem: row 22 is empty and row 26 uses a different format. Every value in the 'Date' column should be a string that represents a date:
• Example
• Convert to date:
• import pandas as pd

df = pd.read_csv('data.csv')

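# format='mixed' (pandas 2.0+) parses each value on its own, so the
# differently formatted date in row 26 is still converted
df['Date'] = pd.to_datetime(df['Date'], format='mixed')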

print(df.to_string())
• Example
• Remove rows with a NULL value in the "Date" column (after the conversion, an empty date is stored as NaT, which counts as NULL):
• df.dropna(subset=['Date'], inplace = True)
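When a column may also contain text that is not a date at all, the errors argument of to_datetime() can turn such values into NaT instead of raising an error. This is a minimal sketch with made-up values, not the contents of 'data.csv'.

```python
import pandas as pd

# Made-up values: one valid date, one unparseable string, one empty cell
dates = pd.Series(["2020/12/01", "not a date", None])

# errors="coerce" turns values that cannot be parsed into NaT
parsed = pd.to_datetime(dates, errors="coerce")

# NaT counts as a missing value, so dropna() removes those entries
cleaned = parsed.dropna()

print(parsed)
print(cleaned)
```

Coercing hides bad values rather than fixing them, so it is worth checking how many NaT entries appear before dropping them.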
Fixing Wrong Data
• Wrong Data
• "Wrong data" does not have to be "empty cells" or "wrong format", it can just be wrong, like if someone
registered "199" instead of "1.99".
• Sometimes you can spot wrong data by looking at the data set, because you have an expectation of what it
should be.
• If you take a look at our data set, you can see that in row 7, the duration is 450, but for all the other rows the
duration is between 30 and 60.
• It doesn't have to be wrong, but considering that this is the data set of someone's workout sessions, we can conclude that this person did not work out for 450 minutes.
• Replacing Values
• One way to fix wrong values is to replace them with something else.
• In the example, it is most likely a typo, and the value should be "45" instead of "450", and we could just
insert "45" in row 7:
• Example
• Set "Duration" = 45 in row 7:
• df.loc[7, 'Duration'] = 45
• Example
• Loop through all values in the "Duration" column.
• If the value is higher than 120, set it to 120:
• for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120
• Removing Rows
• Another way of handling wrong data is to remove the rows that contain wrong data.
• This way you do not have to find out what to replace them with, and there is a good chance you do not need them for your analyses.
• Example
• Delete rows where "Duration" is higher than 120:
• for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.drop(x, inplace = True)
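The two loops above can also be written without an explicit loop by using a Boolean condition on the whole column. The following is a sketch of that alternative on made-up workout data; it is not the approach shown in the slides.

```python
import pandas as pd

# Made-up workout data with one suspicious duration
df = pd.DataFrame({"Duration": [60, 45, 450, 30],
                   "Calories": [409.1, 282.4, 253.3, 195.1]})

# Option 1: cap values above 120 without an explicit loop
capped = df.copy()
capped.loc[capped["Duration"] > 120, "Duration"] = 120

# Option 2: keep only the rows where Duration is at most 120
filtered = df[df["Duration"] <= 120]

print(capped)
print(filtered)
```

One difference to keep in mind: a row with an empty "Duration" cell fails the comparison in option 2 and is dropped, whereas the loop version would keep it.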
Removing Duplicates
• By taking a look at our test data set, we can assume that rows 11 and 12 are duplicates.
• To discover duplicates, we can use the duplicated() method.
• The duplicated() method returns a Boolean value for each row:
• Example
• Return True for every row that is a duplicate, otherwise False:
• print(df.duplicated())
• Removing Duplicates
• To remove duplicates, use the drop_duplicates() method.
• Example
• Remove all duplicates:
• df.drop_duplicates(inplace = True)
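Both methods take optional arguments that control what counts as a duplicate. The sketch below, on made-up rows, shows the default behavior next to the keep and subset arguments.

```python
import pandas as pd

# Made-up rows: the first two rows are identical
df = pd.DataFrame({"Duration": [60, 60, 45, 60],
                   "Calories": [409.1, 409.1, 282.4, 300.0]})

# Default: the first occurrence is False, later repeats are True
flags = df.duplicated()

# keep=False marks every member of a duplicate group
all_flags = df.duplicated(keep=False)

# subset limits the comparison to the listed columns
by_duration = df.drop_duplicates(subset=["Duration"])

# Without inplace = True, drop_duplicates returns a new DataFrame
deduped = df.drop_duplicates()

print(flags)
print(deduped)
```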
Imputation Techniques (Handling Missing Data)
• In data preprocessing, missing values can cause problems in analysis and modeling. Imputation is the process of filling in missing values with estimated ones.
Checking for Missing Values
• Before imputing, we check for missing values in a DataFrame:

• import pandas as pd

# Sample dataset with missing values
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 30, 22],               # Missing age for Bob
        'Salary': [50000, 60000, None, 55000]}   # Missing salary for Charlie

df = pd.DataFrame(data)

# Check for missing values (number of missing cells per column)
print(df.isnull().sum())
Imputation Techniques in Pandas

• Mean, Median, and Mode Imputation (for Numerical Data)
• # Fill missing values in 'Age' with the mean
df['Age'] = df['Age'].fillna(df['Age'].mean())

# Fill missing values in 'Salary' with the median
df['Salary'] = df['Salary'].fillna(df['Salary'].median())
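Mode imputation follows the same pattern and also works for text columns, where a mean or median does not exist. The sketch below assumes a hypothetical 'City' column that is not part of the sample dataset above.

```python
import pandas as pd

# Hypothetical data with a text column ('City' is not in the original example)
df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"],
                   "City": ["NY", None, "NY", "LA"]})

# mode() returns a Series because several values can tie; take the first
most_common_city = df["City"].mode()[0]

# Fill the missing city with the most frequent one
df["City"] = df["City"].fillna(most_common_city)

print(df)
```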


Data Transformation
• Data transformation is a crucial step in data preprocessing, where
data is converted, modified, or reshaped to improve model
performance or analysis.
Scaling and Normalization
• Used to bring numerical values into a specific range for better model
performance.
Min-Max Scaling (Normalization)
• Brings values between 0 and 1.


• import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Sample dataset
df = pd.DataFrame({'Salary': [40000, 45000, 60000, 100000, 200000]})

# Apply Min-Max Scaling
scaler = MinMaxScaler()
df['Salary_Normalized'] = scaler.fit_transform(df[['Salary']])
print(df)
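With its default settings, min-max scaling applies the formula (x - min) / (max - min) to each value. As a sketch of what happens under the hood, the same result can be computed with pandas alone on the sample salaries.

```python
import pandas as pd

df = pd.DataFrame({'Salary': [40000, 45000, 60000, 100000, 200000]})

salary_min = df['Salary'].min()
salary_max = df['Salary'].max()

# (x - min) / (max - min): the smallest value maps to 0, the largest to 1
df['Salary_Normalized'] = (df['Salary'] - salary_min) / (salary_max - salary_min)

print(df)
```

For example, 60000 becomes (60000 - 40000) / (200000 - 40000) = 0.125.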
Standardization (Z-Score Scaling)

• Converts data to have mean = 0 and standard deviation = 1.
• from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df['Salary_Standardized'] = scaler.fit_transform(df[['Salary']])
print(df)
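The z-score of a value is (x - mean) / standard deviation. The sketch below computes it with pandas alone on the sample salaries. StandardScaler divides by the population standard deviation, so ddof=0 is passed to std() here; pandas would otherwise use the sample standard deviation (ddof=1) and give slightly different numbers.

```python
import pandas as pd

df = pd.DataFrame({'Salary': [40000, 45000, 60000, 100000, 200000]})

mean = df['Salary'].mean()

# ddof=0 gives the population standard deviation; pandas defaults to ddof=1
std = df['Salary'].std(ddof=0)

# (x - mean) / std: the result has mean 0 and standard deviation 1
df['Salary_Standardized'] = (df['Salary'] - mean) / std

print(df)
```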


Data Integration and Manipulation
• Data integration and manipulation are essential for combining
multiple datasets, transforming data, and preparing it for analysis or
machine learning.
• Data Integration (Merging & Joining DataFrames)
• Combining multiple datasets is common in real-world scenarios.
Pandas provides functions like merge(), concat() and join() for this.


• import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Salary': [50000, 60000, 70000]})

# Inner Join (only matching rows)
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)
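The how argument of merge() also accepts other join types. As a sketch on the same two sample DataFrames, a left join keeps every row of df1, and an outer join keeps the rows of both; the indicator argument adds a column that records where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Salary': [50000, 60000, 70000]})

# Left join: every row of df1, with NaN where df2 has no match
left_df = pd.merge(df1, df2, on='ID', how='left')

# Outer join: rows from both sides; indicator=True adds a '_merge' column
outer_df = pd.merge(df1, df2, on='ID', how='outer', indicator=True)

print(left_df)
print(outer_df)
```

In the outer join, ID 3 appears only in df1 and ID 4 only in df2, so the '_merge' column shows left_only and right_only for those rows.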
Concatenating DataFrames
• Row-wise concatenation
• df3 = pd.DataFrame({'ID': [4, 5], 'Name': ['David', 'Eve'], 'Age': [40, 28]})
df_concat = pd.concat([df1, df3], ignore_index=True)
print(df_concat)
• Column-wise concatenation
• df4 = pd.DataFrame({'City': ['NY', 'LA', 'SF']})
df_col_concat = pd.concat([df1, df4], axis=1)
print(df_col_concat)
Joining DataFrames (Index-based Merge)
• # set_index(..., inplace=True) modifies df1 and df2: 'ID' becomes the index
df1.set_index('ID', inplace=True)
df2.set_index('ID', inplace=True)
df_joined = df1.join(df2, how='inner')  # Similar to merge but uses the index
print(df_joined)


Data Manipulation
• Handling Missing Values
• df.fillna(df.mean(numeric_only=True), inplace=True)  # Fill missing numeric values with the column mean
df.dropna(inplace=True)  # Remove rows with missing values
• Changing Data Types
• df['Age'] = df['Age'].astype(float)  # Convert Age column to float
• Renaming Columns
• df.rename(columns={'Name': 'Full_Name'}, inplace=True)

• Filtering & Selecting Data
• df_filtered = df[df['Age'] > 30]  # Select rows where Age > 30
• Grouping & Aggregation
• df.groupby('City')['Salary'].mean()  # Get average salary by city
df.pivot_table(values='Salary', index='City', aggfunc='sum')  # Pivot table
• Adding New Columns
• df['Bonus'] = df['Salary'] * 0.10  # Adding a 10% bonus column
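The one-line snippets above assume a DataFrame with 'Age', 'City', and 'Salary' columns. The following sketch puts them together on a small hypothetical data set so they can be run as a whole; all names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the names and values are made up for this sketch
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 35, 40, 32],
    'City': ['NY', 'LA', 'NY', 'LA'],
    'Salary': [50000, 60000, 70000, 80000],
})

# Select rows where Age > 30
df_filtered = df[df['Age'] > 30]

# Average salary per city (a Series indexed by city)
avg_salary = df.groupby('City')['Salary'].mean()

# Total salary per city as a pivot table (a DataFrame indexed by city)
total_salary = df.pivot_table(values='Salary', index='City', aggfunc='sum')

# New column holding 10% of each salary
df['Bonus'] = df['Salary'] * 0.10

print(df_filtered)
print(avg_salary)
print(total_salary)
```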


Plotting
• Pandas uses the plot() method to create diagrams.
• We can use Pyplot, a submodule of the Matplotlib library, to visualize the diagram on the screen.
• Example
• Import pyplot from Matplotlib and visualize our DataFrame:
• import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot()

plt.show()
• Scatter Plot
• Specify that you want a scatter plot with the kind argument:
• kind = 'scatter'
• A scatter plot needs an x- and a y-axis.
• In the example below we will use "Duration" for the x-axis and "Calories" for the y-axis.
• Include the x and y arguments like this:
• x = 'Duration', y = 'Calories'


• Example
• import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')

df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')

plt.show()
• Histogram
• Use the kind argument to specify that you want a histogram:
• kind = 'hist'
• A histogram needs only one column.
• A histogram shows us the frequency of each interval, e.g. how many workouts
lasted between 50 and 60 minutes?
• In the example below we will use the "Duration" column to create the histogram:
• Example
• df["Duration"].plot(kind = 'hist')
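To see the numbers behind a histogram, the frequency of each interval can be counted directly. This is a sketch using pd.cut() on made-up durations; the bin edges are chosen by hand here and will generally differ from the ones a plot picks automatically.

```python
import pandas as pd

# Made-up workout durations in minutes
durations = pd.Series([30, 45, 45, 60, 60, 60, 20, 55])

# Assign each value to an interval; (50, 60] means 50 < value <= 60
bins = [10, 20, 30, 40, 50, 60]
intervals = pd.cut(durations, bins=bins)

# Count the values per interval, ordered from the lowest interval up
counts = intervals.value_counts().sort_index()

print(counts)
```

In this made-up data, four workouts fall in the interval from 50 to 60 minutes, which is the kind of question a histogram answers visually.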
