Data Wrangling - Jupyter Notebook

The document is a Jupyter Notebook that demonstrates data exploration and manipulation using the pandas library in Python. It includes creating DataFrames, handling missing values, filtering data, merging DataFrames, and removing duplicates. Various operations are performed on student and car sales data, showcasing techniques like grouping and mapping for data wrangling.

Uploaded by

amitdhoundiyal2810


2/6/23, 5:11 PM Untitled10 - Jupyter Notebook

In [7]:

# Data exploration: define the data, then display it in tabular form.

# Import pandas package

import pandas as pd

# Assign data
data = {'Name': ['Jai', 'Princi', 'Gaurav',
                 'Anuj', 'Ravi', 'Natasha', 'Riya'],
        'Age': [17, 17, 18, 17, 18, 17, 17],
        'Gender': ['M', 'F', 'M', 'M', 'M', 'F', 'F'],
        'Marks': [90, 76, 'NaN', 74, 65, 'NaN', 71]}

# Convert into DataFrame

df = pd.DataFrame(data)

# Display data
print(df)

Name Age Gender Marks

0 Jai 17 M 90
1 Princi 17 F 76
2 Gaurav 18 M NaN
3 Anuj 17 M 74
4 Ravi 18 M 65
5 Natasha 17 F NaN
6 Riya 17 F 71
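
The two gaps in `Marks` are entered as the string 'NaN', not as a real missing value, which affects how pandas treats the column. The sketch below rebuilds the same data and inspects it. The names `marks_dtype`, `missing_count` and `placeholder_count` are illustrative and do not appear in the notebook.

```python
import pandas as pd

# Same records as the cell above; 'NaN' is a string here, not a missing value.
data = {'Name': ['Jai', 'Princi', 'Gaurav',
                 'Anuj', 'Ravi', 'Natasha', 'Riya'],
        'Age': [17, 17, 18, 17, 18, 17, 17],
        'Gender': ['M', 'F', 'M', 'M', 'M', 'F', 'F'],
        'Marks': [90, 76, 'NaN', 74, 65, 'NaN', 71]}
df = pd.DataFrame(data)

# A mix of ints and strings is stored with the generic object dtype.
marks_dtype = df['Marks'].dtype

# isna() does not recognise the string 'NaN', so it reports no missing values.
missing_count = int(df['Marks'].isna().sum())

# Counting the placeholder string directly finds the two gaps.
placeholder_count = int((df['Marks'] == 'NaN').sum())

print(marks_dtype, missing_count, placeholder_count)
```

This is why the next cell has to look for the placeholder explicitly instead of relying on pandas' own missing-value checks.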


In [23]:

# Compute average
c = avg = 0
for ele in df["Marks"]:
    if str(ele).isnumeric():
        c += 1
        avg += ele
avg /= c

# Replace missing values

df = df.replace(to_replace="NaN",
                value=avg)

# Display data
print(df)

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3628 try:
-> 3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'Marks'

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)

~\AppData\Local\Temp\ipykernel_103800\4025425810.py in <module>
1 # Compute average
2 #c = avg = 0
----> 3 for ele in df["Marks"]:
4 if str(ele).isnumeric():
5 c += 1

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)

3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)

3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:
-> 3631 raise KeyError(key) from err
3632 except TypeError:
3633 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'Marks'
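
The `KeyError: 'Marks'` means that `df` had no `Marks` column when this cell ran. One plausible reading (an assumption, based only on the `In [23]` execution count) is that the cell was re-run after later cells had rebound `df` to a different table. The sketch below shows a vectorised alternative to the loop that also guards against the missing column. It uses its own variable name, `students`, which is hypothetical and not part of the notebook.

```python
import pandas as pd

# Rebuild just the columns needed, under a name that later cells do not reuse.
students = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Ravi', 'Natasha', 'Riya'],
    'Marks': [90, 76, 'NaN', 74, 65, 'NaN', 71],
})

if 'Marks' in students.columns:
    # to_numeric yields a float column in which the 'NaN' strings are real NaN;
    # errors='coerce' would also turn any other unparseable text into NaN.
    numeric_marks = pd.to_numeric(students['Marks'], errors='coerce')

    # mean() skips NaN by default, so only the five real marks are averaged.
    average = numeric_marks.mean()

    # Fill the gaps with that average.
    students['Marks'] = numeric_marks.fillna(average)

print(students)
```

With the five known marks (90, 76, 74, 65, 71) the average is 75.2, the same value that appears in the `Marks` column of the next cell's output.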

In [13]:

# Categorize gender
df['Gender'] = df['Gender'].map({'M': 0,
                                 'F': 1}).astype(float)

# Display data
print(df)

Name Age Gender Marks

0 Jai 17 NaN 90.0
1 Princi 17 NaN 76.0
2 Gaurav 18 NaN 75.2
3 Anuj 17 NaN 74.0
4 Ravi 18 NaN 65.0
5 Natasha 17 NaN 75.2
6 Riya 17 NaN 71.0
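
Every `Gender` value prints as NaN above. `Series.map` returns NaN for any value that is not a key of the mapping, so this is the result to expect if the column no longer held 'M'/'F' when the cell ran, for instance because the same cell had already been executed once. That explanation is an assumption; the sketch below only reproduces the effect and shows one way to make the step safe to re-run. The `Gender_code` column is a hypothetical name.

```python
import pandas as pd

gender = pd.Series(['M', 'F', 'M', 'M', 'M', 'F', 'F'], name='Gender')
codes = {'M': 0, 'F': 1}

# First pass: every value is a key of `codes`, so the mapping succeeds.
first_pass = gender.map(codes).astype(float)

# Second pass on the already-mapped values: 0.0 and 1.0 are not keys,
# so every entry becomes NaN.
second_pass = first_pass.map(codes).astype(float)

# Writing the codes to a new column leaves the source column untouched,
# so running this line again gives the same result.
df = pd.DataFrame({'Gender': gender})
df['Gender_code'] = df['Gender'].map(codes).astype(float)

print(first_pass.tolist())
print(second_pass.tolist())
```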


In [14]:

# Filter top scoring students

df = df[df['Marks'] >= 75]

# Remove the Age column

df = df.drop(['Age'], axis=1)

# Display data
print(df)

Name Gender Marks

0 Jai NaN 90.0
1 Princi NaN 76.0
2 Gaurav NaN 75.2
5 Natasha NaN 75.2
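
The filter and the column drop can also be written as a single `.loc` selection, which keeps the original table intact. This is a sketch on a rebuilt copy of the data, with the marks as they appear after the gaps were filled. `top_students` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Ravi', 'Natasha', 'Riya'],
    'Age': [17, 17, 18, 17, 18, 17, 17],
    'Marks': [90.0, 76.0, 75.2, 74.0, 65.0, 75.2, 71.0],
})

# Row condition and column list in one step; df itself is not modified.
top_students = df.loc[df['Marks'] >= 75, ['Name', 'Marks']]

# Optional: renumber the rows 0..n-1 instead of keeping the original labels.
top_students = top_students.reset_index(drop=True)

print(top_students)
```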

In [15]:

# Wrangling Data Using Merge Operation

# A merge combines two DataFrames on a shared key column to produce the desired table.
# Syntax for merging: pd.merge(data_frame1, data_frame2, on="field")

# import module
import pandas as pd

# creating DataFrame for Student Details

details = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106,
           107, 108, 109, 110],
    'NAME': ['Jagroop', 'Praveen', 'Harjot',
             'Pooja', 'Rahul', 'Nikita',
             'Saurabh', 'Ayush', 'Dolly', "Mohit"],
    'BRANCH': ['CSE', 'CSE', 'CSE', 'CSE', 'CSE',
               'CSE', 'CSE', 'CSE', 'CSE', 'CSE']})

# printing details
print(details)

ID NAME BRANCH
0 101 Jagroop CSE
1 102 Praveen CSE
2 103 Harjot CSE
3 104 Pooja CSE
4 105 Rahul CSE
5 106 Nikita CSE
6 107 Saurabh CSE
7 108 Ayush CSE
8 109 Dolly CSE
9 110 Mohit CSE

In [16]:

# Import module
import pandas as pd

# Creating Dataframe for Fees_Status

fees_status = pd.DataFrame(
    {'ID': [101, 102, 103, 104, 105,
            106, 107, 108, 109, 110],
     'PENDING': ['5000', '250', 'NIL',
                 '9000', '15000', 'NIL',
                 '4500', '1800', '250', 'NIL']})

# Printing fees_status
print(fees_status)

ID PENDING
0 101 5000
1 102 250
2 103 NIL
3 104 9000
4 105 15000
5 106 NIL
6 107 4500
7 108 1800
8 109 250
9 110 NIL


In [17]:

# WRANGLING DATA USING MERGE OPERATION:

# Creating Dataframe
details = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105,
           106, 107, 108, 109, 110],
    'NAME': ['Jagroop', 'Praveen', 'Harjot',
             'Pooja', 'Rahul', 'Nikita',
             'Saurabh', 'Ayush', 'Dolly', "Mohit"],
    'BRANCH': ['CSE', 'CSE', 'CSE', 'CSE', 'CSE',
               'CSE', 'CSE', 'CSE', 'CSE', 'CSE']})

# Creating Dataframe
fees_status = pd.DataFrame(
    {'ID': [101, 102, 103, 104, 105,
            106, 107, 108, 109, 110],
     'PENDING': ['5000', '250', 'NIL',
                 '9000', '15000', 'NIL',
                 '4500', '1800', '250', 'NIL']})

# Merging Dataframe
print(pd.merge(details, fees_status, on='ID'))

ID NAME BRANCH PENDING

0 101 Jagroop CSE 5000
1 102 Praveen CSE 250
2 103 Harjot CSE NIL
3 104 Pooja CSE 9000
4 105 Rahul CSE 15000
5 106 Nikita CSE NIL
6 107 Saurabh CSE 4500
7 108 Ayush CSE 1800
8 109 Dolly CSE 250
9 110 Mohit CSE NIL
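
In the merge above every ID appears in both tables, so the default inner join keeps all ten rows. The sketch below uses a trimmed sample in which the ID sets differ, to show what `how` and `indicator` change. The reduced tables are an assumption made for illustration, not data from the notebook's run.

```python
import pandas as pd

# Trimmed sample: ID 103 has no fees record, ID 104 has no details record.
details = pd.DataFrame({'ID': [101, 102, 103],
                        'NAME': ['Jagroop', 'Praveen', 'Harjot']})
fees_status = pd.DataFrame({'ID': [101, 102, 104],
                            'PENDING': ['5000', '250', '9000']})

# Default how='inner': only IDs present in both tables survive.
inner = pd.merge(details, fees_status, on='ID')

# how='left' keeps every row of `details`; indicator=True adds a `_merge`
# column recording whether each row matched.
left = pd.merge(details, fees_status, on='ID', how='left', indicator=True)

print(inner)
print(left)
```

Unmatched rows in the left join get NaN in `PENDING`, which is distinct from the 'NIL' string used in the notebook to mean nothing is owed.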

In [18]:

# wrangling data using grouping method

# Creating Data
car_selling_data = {'Brand': ['Maruti', 'Maruti', 'Maruti',
                              'Maruti', 'Hyundai', 'Hyundai',
                              'Toyota', 'Mahindra', 'Mahindra',
                              'Ford', 'Toyota', 'Ford'],
                    'Year': [2010, 2011, 2009, 2013,
                             2010, 2011, 2011, 2010,
                             2013, 2010, 2010, 2011],
                    'Sold': [6, 7, 9, 8, 3, 5,
                             2, 8, 7, 2, 4, 2]}

# Creating Dataframe of car_selling_data

df = pd.DataFrame(car_selling_data)
print(df)

Brand Year Sold

0 Maruti 2010 6
1 Maruti 2011 7
2 Maruti 2009 9
3 Maruti 2013 8
4 Hyundai 2010 3
5 Hyundai 2011 5
6 Toyota 2011 2
7 Mahindra 2010 8
8 Mahindra 2013 7
9 Ford 2010 2
10 Toyota 2010 4
11 Ford 2011 2

In [19]:

# Group the data by Year and select the group where Year == 2010

grouped = df.groupby('Year')
print(grouped.get_group(2010))

Brand Year Sold

0 Maruti 2010 6
4 Hyundai 2010 3
7 Mahindra 2010 8
9 Ford 2010 2
10 Toyota 2010 4
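
`get_group` only pulls out the rows of one group. Grouping becomes more useful when each group is reduced to a summary value. A brief sketch on the same car data follows. The variable names `sold_2010`, `total_by_year` and `total_by_brand` are illustrative.

```python
import pandas as pd

car_selling_data = {'Brand': ['Maruti', 'Maruti', 'Maruti',
                              'Maruti', 'Hyundai', 'Hyundai',
                              'Toyota', 'Mahindra', 'Mahindra',
                              'Ford', 'Toyota', 'Ford'],
                    'Year': [2010, 2011, 2009, 2013,
                             2010, 2011, 2011, 2010,
                             2013, 2010, 2010, 2011],
                    'Sold': [6, 7, 9, 8, 3, 5,
                             2, 8, 7, 2, 4, 2]}
df = pd.DataFrame(car_selling_data)

# A boolean filter selects the same 2010 rows without building groups.
sold_2010 = df[df['Year'] == 2010]

# Total units sold per year and per brand.
total_by_year = df.groupby('Year')['Sold'].sum()
total_by_brand = df.groupby('Brand')['Sold'].sum()

print(total_by_year)
print(total_by_brand)
```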


In [20]:

# Wrangling data by removing duplicates

# DataFrame.duplicated(subset=None, keep='first')
# Initializing Data
student_data = {'Name': ['Amit', 'Praveen', 'Jagroop',
                         'Rahul', 'Vishal', 'Suraj',
                         'Rishab', 'Satyapal', 'Amit',
                         'Rahul', 'Praveen', 'Amit'],
                'Roll_no': [23, 54, 29, 36, 59, 38,
                            12, 45, 34, 36, 54, 23],
                'Email': ['[email protected]', '[email protected]',
                          '[email protected]', '[email protected]',
                          '[email protected]', '[email protected]',
                          '[email protected]', '[email protected]',
                          '[email protected]', '[email protected]',
                          '[email protected]', '[email protected]']}

# Creating Dataframe of Data

df = pd.DataFrame(student_data)

# Printing Dataframe
print(df)

Name Roll_no Email

0 Amit 23 [email protected]
1 Praveen 54 [email protected]
2 Jagroop 29 [email protected]
3 Rahul 36 [email protected]
4 Vishal 59 [email protected]
5 Suraj 38 [email protected]
6 Rishab 12 [email protected]
7 Satyapal 45 [email protected]
8 Amit 34 [email protected]
9 Rahul 36 [email protected]
10 Praveen 54 [email protected]
11 Amit 23 [email protected]

In [21]:

# df.duplicated('Roll_no') marks each repeated Roll_no (after its first occurrence) as True.

# The ~ (NOT) operator inverts that mask, so only the non-duplicate rows are kept.
non_duplicate = df[~df.duplicated('Roll_no')]

# printing non-duplicate values

print(non_duplicate)

Name Roll_no Email
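
The same filtering can be checked on the `Name` and `Roll_no` columns alone (the `Email` column is left out of this sketch because the addresses are redacted in the source). It also shows `drop_duplicates` as an equivalent call and the effect of `keep=False`. The names `mask`, `same_result` and `unique_only` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Amit', 'Praveen', 'Jagroop',
             'Rahul', 'Vishal', 'Suraj',
             'Rishab', 'Satyapal', 'Amit',
             'Rahul', 'Praveen', 'Amit'],
    'Roll_no': [23, 54, 29, 36, 59, 38,
                12, 45, 34, 36, 54, 23],
})

# True for every row whose Roll_no has already appeared earlier (keep='first').
mask = df.duplicated('Roll_no')
non_duplicate = df[~mask]

# drop_duplicates with the same subset gives the same rows in one call.
same_result = df.drop_duplicates(subset='Roll_no')

# keep=False flags every member of a repeated group, so this keeps only
# the roll numbers that occur exactly once.
unique_only = df[~df.duplicated('Roll_no', keep=False)]

print(non_duplicate)
print(unique_only)
```

With the default `keep='first'`, the rows at index 9, 10 and 11 are dropped because roll numbers 36, 54 and 23 already occur earlier in the table.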

In [ ]:
