Data Wrangling in Pandas Cheat Sheet
Python For Data Science
Learn Data Wrangling online at www.DataCamp.com
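The snippets on this sheet reuse a handful of sample objects (df3-df6 are built by the Reshaping and MultiIndexing snippets below). A minimal setup sketch, reconstructed from how the objects are used; the actual values are illustrative assumptions:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'Country': ['Belgium', 'India', 'Brazil'], #Assumed sample data
                       'Capital': ['Brussels', 'New Delhi', 'Brasília'],
                       'Population': [11190846, 1303171035, 207847528]})
>>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd'])
>>> df2 = pd.DataFrame({'Date': pd.date_range('2000-1-1', periods=6, freq='M'),
                        'Type': ['a', 'b', 'c', 'a', 'b', 'c'],
                        'Value': np.random.rand(6)})
>>> data1 = pd.DataFrame({'X1': ['a', 'b', 'c'], 'X2': [11.432, 1.303, 99.906]})
>>> data2 = pd.DataFrame({'X1': ['a', 'b', 'd'], 'X3': [20.784, np.nan, 20.784]})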
> Reshaping Data

Pivot
>>> df3 = df2.pivot(index='Date', #Spread rows into columns
                    columns='Type',
                    values='Value')

Pivot Table
>>> df4 = pd.pivot_table(df2, #Spread rows into columns
                         values='Value',
                         index='Date',
                         columns='Type')

Stack / Unstack
>>> stacked = df5.stack() #Pivot a level of column labels
>>> stacked.unstack() #Pivot a level of index labels

Melt
>>> pd.melt(df2, #Gather columns into rows
            id_vars=["Date"],
            value_vars=["Type", "Value"],
            value_name="Observations")
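Note that pivot() raises an error when an index/column pair repeats, while pivot_table() aggregates the duplicates (mean by default). A quick round-trip sketch on a small, made-up frame:
>>> tiny = pd.DataFrame({'Date': ['2000-01', '2000-01', '2000-02'],
                         'Type': ['a', 'b', 'a'],
                         'Value': [1.0, 2.0, 3.0]})
>>> wide = tiny.pivot(index='Date', columns='Type', values='Value') #One column per Type
>>> wide.reset_index().melt(id_vars='Date', value_name='Value') #Back to long form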
> Advanced Indexing Also see NumPy Arrays

Selecting
>>> df3.loc[:,(df3>1).any()] #Select cols with any vals >1
>>> df3.loc[:,(df3>1).all()] #Select cols with all vals >1
>>> df3.loc[:,df3.isnull().any()] #Select cols with NaN
>>> df3.loc[:,df3.notnull().all()] #Select cols without NaN

Indexing With isin()
>>> df[(df.Country.isin(df2.Type))] #Find same elements
>>> df3.filter(items=["a","b"]) #Filter on values
>>> df.select(lambda x: not x%5) #Select specific elements (deprecated; use .loc with a boolean mask)

Where
>>> s.where(s > 0) #Subset the data

Query
>>> df6.query('second > first') #Query DataFrame
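Note that boolean indexing drops the rows that fail the condition, while where() keeps the shape and fills NaN instead; compare on the sample Series s:
>>> s[s > 0] #Drops non-matching rows: a, c, d remain
>>> s.where(s > 0) #Same shape; b becomes NaN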
Setting/Resetting Index
>>> df.set_index('Country') #Set the index
>>> df4 = df.reset_index() #Reset the index
>>> df = df.rename(index=str, #Rename DataFrame
                   columns={"Country":"cntry",
                            "Capital":"cptl",
                            "Population":"ppltn"})
Reindexing
>>> s2 = s.reindex(['a','c','d','e','b'])
Forward Filling
>>> df.reindex(range(4),
               method='ffill')
   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasília   207847528
3   Brazil   Brasília   207847528

Backward Filling
>>> s3 = s.reindex(range(5),
                   method='bfill')
0    3
1    3
2    3
3    3
4    3
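The fill methods walk an ordered index, so gaps inherit neighboring values; a tiny self-contained sketch with made-up prices:
>>> prices = pd.Series([10.0, 12.0], index=[0, 3]) #Known at positions 0 and 3
>>> prices.reindex(range(5), method='ffill') #10, 10, 10, 12, 12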
MultiIndexing
>>> arrays = [np.array([1,2,3]),
              np.array([5,4,3])]
>>> df5 = pd.DataFrame(np.random.rand(3, 2), index=arrays)
>>> tuples = list(zip(*arrays))
>>> index = pd.MultiIndex.from_tuples(tuples,
                                      names=['first', 'second'])
>>> df6 = pd.DataFrame(np.random.rand(3, 2), index=index)
>>> df2.set_index(["Date", "Type"])
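Rows of a MultiIndexed frame can then be selected level by level; a short sketch on the df6 built above:
>>> df6.loc[1] #Rows whose 'first' level equals 1
>>> df6.xs(3, level='second') #Cross-section on the 'second' level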
> Combining Data

Merge
>>> pd.merge(data1,
             data2,
             how='left',
             on='X1')
>>> pd.merge(data1,
             data2,
             how='right',
             on='X1')
>>> pd.merge(data1,
             data2,
             how='inner',
             on='X1')
>>> pd.merge(data1,
             data2,
             how='outer',
             on='X1')

Join
>>> data1.join(data2, how='right')

Concatenate
Vertical
>>> s.append(s2) #Removed in pandas 2.0; use pd.concat([s, s2])
Horizontal/Vertical
>>> pd.concat([s, s2], axis=1, keys=['One','Two'])
>>> pd.concat([data1, data2], axis=1, join='inner')
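On the illustrative data1/data2 from the setup at the top (X1 holds a, b, c versus a, b, d), the join types differ as follows:
>>> pd.merge(data1, data2, how='inner', on='X1') #Only a and b match
>>> pd.merge(data1, data2, how='outer', on='X1') #a, b, c, d, with NaN where a side is absent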
> Duplicate Data
>>> s3.unique() #Return unique values
>>> df2.duplicated('Type') #Check duplicates
>>> df2.drop_duplicates('Type', keep='last') #Drop duplicates
>>> df.index.duplicated() #Check index duplicates
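By default duplicated() flags every repeat after the first occurrence; keep='last' flips which one survives. A minimal illustration:
>>> pd.Series(['a', 'b', 'a']).duplicated() #False, False, True
>>> pd.Series(['a', 'b', 'a']).duplicated(keep='last') #True, False, False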
> Dates
>>> df2['Date'] = pd.to_datetime(df2['Date'])
>>> df2['Date'] = pd.date_range('2000-1-1',
                                periods=6,
                                freq='M')
>>> from datetime import datetime
>>> dates = [datetime(2012,5,1), datetime(2012,5,2)]
>>> index = pd.DatetimeIndex(dates)
>>> end = datetime(2012,6,1) #Assumed end date for illustration
>>> index = pd.date_range(datetime(2012,2,1), end, freq='BM')
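Once 'Date' is a datetime column it can serve as a sliceable index; partial date strings select whole periods:
>>> ts = df2.set_index('Date')
>>> ts.loc['2000-03'] #All rows falling in March 2000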
> Grouping Data

Aggregation
>>> df2.groupby(by=['Date','Type']).mean()
>>> df4.groupby(level=0).sum()
>>> df4.groupby(level=0).agg({'a': lambda x: sum(x)/len(x),
                              'b': np.sum})

Transformation
>>> customSum = lambda x: (x+x%2)
>>> df4.groupby(level=0).transform(customSum)
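Unlike agg(), transform() returns a result aligned row-for-row with its input; a minimal sketch on made-up data:
>>> g = pd.DataFrame({'key': ['x', 'x', 'y'], 'val': [1, 2, 3]})
>>> g.groupby('key')['val'].transform('mean') #1.5, 1.5, 3.0, one value per row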
> Missing Data
>>> df.dropna() #Drop NaN values
>>> df3.fillna(df3.mean()) #Fill NaN values with a predetermined value
>>> df2.replace("a", "f") #Replace values with others

> Visualization Also see Matplotlib
>>> import matplotlib.pyplot as plt
>>> s.plot()
>>> plt.show()
>>> df2.plot()
>>> plt.show()

> Iteration
>>> df.iteritems() #(Column-index, Series) pairs
>>> df.iterrows() #(Row-index, Series) pairs
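In pandas 2.0 iteritems() was renamed items(); itertuples() is the usual faster alternative for row loops. An illustrative one-liner on df:
>>> [row.Country for row in df.itertuples()] #One namedtuple per row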
Learn Data Skills Online at www.DataCamp.com