Pandas

Pandas is a Python package for data manipulation and analysis that allows users to import, clean, transform and visualize data in the form of labeled data structures called DataFrames that are similar to spreadsheets; common tasks involve importing and exporting CSV files, calculating summary statistics, handling missing values, and merging DataFrames from multiple sources.

Uploaded by

karthikeyan R

Introduction to Pandas

Pandas
• Pandas is a software package written for the Python
programming language for data manipulation and analysis.
• For more information about pandas, visit
http://pandas.pydata.org/.
Travel Time Index Dataset
• We will be utilizing the Travel Time Index (TTI) dataset in this lecture. TTI serves as a metric
for average travel conditions, offering insights into the extent to which travel times are
extended during congestion in comparison to periods of light traffic. For more
comprehensive information about this dataset, please refer to our publication, which can be
accessed at the following link: https://ascelibrary.org/doi/abs/10.1061/9780784484876.040
• Download travel time index (TTI) data from
– https://uh.edu/tech/cm-lab/hourly_tti.csv
• Download weather data from the same city:
– https://uh.edu/tech/cm-lab/weather.csv
• Here is a link to a Google Colab notebook that shows how to use Pandas to work with and
study the TTI dataset:
– https://colab.research.google.com/drive/1IDsXyqocsJzJ42pDxxYKwyzYZ-Fg_Siw?usp=sharing
What is the Travel Time Index (TTI)?
• Travel time index is a metric used to measure the relative travel
time on a road network compared to the ideal or free-flow
travel time.
• TTI is typically expressed as a ratio or percentage.
$$\mathrm{TTI} = \frac{\text{average travel time}}{\text{free-flow travel time}}$$
• A TTI value of 1.3, for example, indicates a 20-minute free-flow
trip requires 26 minutes.
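The ratio can be checked with a few lines of Python. This is only a minimal sketch: the function name `travel_time_index` and its argument names are illustrative assumptions, and the numbers are the ones from the 1.3 example.

```python
def travel_time_index(average_time, free_flow_time):
    """Return average travel time divided by free-flow travel time."""
    if free_flow_time <= 0:
        raise ValueError("free_flow_time must be positive")
    return average_time / free_flow_time


# Slide example: a 20-minute free-flow trip that takes 26 minutes on average.
tti = travel_time_index(average_time=26, free_flow_time=20)
print(tti)  # 1.3
```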
90th or 95th percentile travel times
• The 90th or 95th percentile travel times serve as a
straightforward method to gauge the reliability of travel
durations. They provide an estimation of the extent of delays
on specific routes during peak traffic periods, particularly on
the heaviest traffic days.

• The buffer index advises travelers to allocate extra time
for trips to ensure timely arrival. For example, with a 40
percent buffer index, a trip with a 20-minute average travel
time would require an additional 8 minutes, totaling 28
minutes for a 95 percent on-time arrival.
$$\mathrm{BI} = \frac{\text{95th percentile travel time} - \text{average travel time}}{\text{average travel time}} \times 100$$

• The planning time index represents the total travel time
needed for on-time arrival, whereas the buffer index
represents only the extra time. A planning time index of 1.60
means that a trip taking 15 minutes in light traffic should be
planned at 24 minutes for 95 percent on-time arrival.
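Both reliability measures can be computed from a list of observed travel times. The sketch below is hedged: the function names and the sample travel times are made up for illustration, and the planning time index is assumed to be the 95th percentile travel time divided by the free-flow travel time, which is the commonly used definition but is not spelled out in these slides.

```python
import numpy as np


def buffer_index(travel_times):
    """Buffer index in percent: (95th percentile - mean) / mean * 100."""
    times = np.asarray(travel_times, dtype=float)
    p95 = np.percentile(times, 95)
    return (p95 - times.mean()) / times.mean() * 100


def planning_time_index(travel_times, free_flow_time):
    """Assumed definition: 95th percentile travel time / free-flow time."""
    p95 = np.percentile(np.asarray(travel_times, dtype=float), 95)
    return p95 / free_flow_time


# Hypothetical travel times (minutes) observed on one route.
observed = [10, 20, 30]
bi = buffer_index(observed)
pti = planning_time_index(observed, free_flow_time=10)
print(round(bi, 1), round(pti, 2))
```

With real data, `observed` would hold many more measurements, so the 95th percentile would be far more stable than in this three-value toy example.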
Topics
Topic                                        Video   Colab Link
Introduction to Pandas                       Video   N.A.
DataFrame                                    Video   N.A.
Create a DataFrame                           Video   link
Import and export CSV file                   Video   link
Import from Google spreadsheet               N.A.    link
Summary statistics of DataFrame              Video   link
Count unique values of a column              Video   link
Missing data                                 Video   link
Delete columns                               Video   link
Create a standard date time format           Video   link
Merge two datasets through a common column   Video   link
Groupby                                      N.A.    link
Lagged variable                              N.A.    link
Different ways to access external data in Google Colab
• In Google Colab, users have several convenient ways to access external data. These methods
include:
– Drag and Drop Files: Users can easily upload files from their local computer to Colab by simply
dragging and dropping them into the Colab environment. This provides a straightforward and
user-friendly way to import data.
– Import Data from Google Drive: Another method is to import data from Google Drive. To do this,
follow these steps:
• Ensure that the file you want to access is shared with "Anyone with the link."
• Copy the file ID from the shared link of the file in Google Drive.
• In Colab, you can use the ‘gdown’ library to download the file using its file ID. For example:
!gdown https://drive.google.com/uc?id=YOUR_FILE_ID
– Fetching Data from the Web: Data can also be obtained directly from the web by downloading or
scraping it. You can use the ‘wget’ command to download data from a URL and save it to your
Colab environment. For instance:
!wget -O test.csv https://www.example.com/data.csv
• Example: https://colab.research.google.com/drive/1LCXDGbWvM3ozUwSpAGc4UgjZk4gO7vRQ?usp=sharing
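The "copy the file ID" step for Google Drive can be automated. The helper below is a sketch under assumptions: `drive_download_url` is a hypothetical name, and it assumes the share link looks like `.../file/d/<ID>/view...` or contains `id=<ID>`; other link shapes would need extra handling.

```python
import re


def drive_download_url(share_link):
    """Turn a Google Drive share link into a uc?id= URL usable with gdown."""
    # Assumed link shapes: .../file/d/<ID>/view...  or  ...?id=<ID>
    match = re.search(r"/d/([\w-]+)|[?&]id=([\w-]+)", share_link)
    if match is None:
        raise ValueError("no file ID found in link")
    file_id = match.group(1) or match.group(2)
    return f"https://drive.google.com/uc?id={file_id}"


url = drive_download_url(
    "https://drive.google.com/file/d/YOUR_FILE_ID/view?usp=sharing"
)
print(url)  # https://drive.google.com/uc?id=YOUR_FILE_ID
```

In a Colab cell, the resulting URL would then be passed to `!gdown` as in the command shown in the list.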
DataFrame

• A DataFrame is a 2-dimensional labeled
data structure with columns of potentially
different types. You can think of it like a
spreadsheet. Most of the data we use in
this course is stored in DataFrames.
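A small DataFrame can be built directly from a Python dictionary. This is a minimal sketch; the column names and values are placeholders and are not the actual schema of the TTI dataset.

```python
import pandas as pd

# Hypothetical values: each dictionary key becomes a column.
df = pd.DataFrame({
    "hour": [7, 8, 9],
    "tti": [1.10, 1.30, 1.25],
    "weather": ["clear", "rain", "clear"],
})

print(df.shape)   # (3, 3): three rows, three columns
print(df.dtypes)  # each column has its own data type
```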
Import and export CSV files from a URL
• There are two methods for handling CSV data from a URL.
– Method 1: Download the CSV file to Google Colab, then read it with
Pandas.
– Method 2: Read the CSV directly from the URL.

• Example link
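Both methods end in a call to `pd.read_csv()`. The sketch below uses a small in-memory CSV with made-up columns as a stand-in for the download so that it runs without network access; the real column names in hourly_tti.csv may differ.

```python
import io

import pandas as pd

TTI_URL = "https://uh.edu/tech/cm-lab/hourly_tti.csv"  # URL given in these slides

# Method 1 in Colab: download first (e.g. with !wget), then pd.read_csv("hourly_tti.csv").
# Method 2 in Colab: df = pd.read_csv(TTI_URL)
# Stand-in data with hypothetical columns, so the example runs offline:
sample_csv = io.StringIO("hour,tti\n7,1.10\n8,1.30\n")
df = pd.read_csv(sample_csv)

# Export: to_csv writes to a path, or returns a string when no path is given.
csv_text = df.to_csv(index=False)
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip.equals(df))
```

Passing `index=False` keeps the row index out of the exported file, which avoids an extra unnamed column when the file is read back.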
Summary Statistics
• Summary statistics are essential for gaining quick insights into your dataset.
• Pandas provides an easy way to calculate these statistics using the describe()
function.

• Example link: https://colab.research.google.com/drive/1b4t6AGeHzQEZR-50EeFlLfBLlMOavl5B?usp=share_link#scrollTo=yKVlaaQmXNZL
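A short illustration of `describe()` on a single numeric column. The values are hypothetical and chosen only to keep the output easy to read.

```python
import pandas as pd

df = pd.DataFrame({"tti": [1.0, 1.2, 1.4, 1.8]})  # hypothetical values

summary = df.describe()  # count, mean, std, min, quartiles, max
print(summary)
print(summary.loc["mean", "tti"])
```

The result is itself a DataFrame, so individual statistics can be picked out with `.loc` as in the last line.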
Displaying Column Names in Pandas
• Knowing the column names in your dataset is crucial for data
manipulation and analysis.
• In Pandas, we can easily display all the column names: use the
‘.columns’ attribute of your DataFrame.
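For instance, with a small made-up DataFrame (the column names are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"hour": [7, 8], "tti": [1.1, 1.3]})

print(df.columns)           # an Index object holding the column labels
print(df.columns.tolist())  # ['hour', 'tti']
```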
Counting and Displaying Unique Values in a Column with Pandas
• Counting and displaying unique values within a
column is essential for understanding the
diversity of your data. In Pandas, we can easily
achieve this.
• To count and display unique values in a Pandas
DataFrame column:
– Use the .unique() function to get the unique
values.
– Use the .nunique() function to count them.
• Example link:
– https://colab.research.google.com/drive/1hZUhcf_tnmRx3SizbG7sjYnmwy6F414D?usp=share_link#scrollTo=t5dO41wqYmM_
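A minimal sketch of both calls on a hypothetical `weather` column (the column name and categories are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"weather": ["clear", "rain", "clear", "fog", "rain"]})

values = df["weather"].unique()   # distinct values, in order of first appearance
count = df["weather"].nunique()   # how many distinct values there are

print(values)
print(count)  # 3
```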
Removing Missing Values using .dropna() in Pandas
• Handling missing values is a crucial
step in data preprocessing.
• Pandas provides the .dropna() function
to remove rows containing missing
values.
• To remove rows with missing values:
– Use df.dropna() without any additional
arguments.
– This will remove any row containing at
least one missing value.
• Example link:
– https://colab.research.google.com/drive/19pJ4Q9UCeF-8DWFLe_VKUSF3sUfP2GON?usp=share_link#scrollTo=3GN4R9U4c4yw
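The following sketch shows the default behaviour on a made-up DataFrame with one missing TTI value. Note that `dropna()` returns a new DataFrame and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the 8 o'clock TTI reading is missing.
df = pd.DataFrame({"hour": [7, 8, 9], "tti": [1.1, np.nan, 1.25]})

cleaned = df.dropna()  # drops every row with at least one missing value

print(len(df), len(cleaned))  # 3 2
```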
Creating New Columns Based on Existing Data in Pandas
• Often, you may need to create new
columns in your dataset based on
values from existing columns.
Pandas provides a straightforward
way to achieve this.
• To create a new column based on
existing data:
– Use the DataFrame assignment
operator (=) to define the new column.
– Utilize operations or functions with
existing columns.
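As a sketch, the two steps might look like this. The column names `free_flow_min`, `travel_min`, and `congested`, as well as the 1.5 threshold, are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({"free_flow_min": [20, 15], "tti": [1.3, 1.6]})

# New column from arithmetic on existing columns (element-wise).
df["travel_min"] = df["free_flow_min"] * df["tti"]

# New column from a comparison; the 1.5 cut-off is arbitrary.
df["congested"] = df["tti"] > 1.5

print(df)
```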
Datetime Format
• Converting datetime values to a common
format is essential for consistent data
analysis. To standardize dates written in
different formats (e.g., 9/13/2017,
September 13, 2017, 2017-9-13), we can
use the to_datetime() function.
• Example link:
– https://colab.research.google.com/drive/1oGDEUkwDsFQJyM8vZOIFLCMpgikZx2aT?usp=share_link#scrollTo=kn2pDmSle71S
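The three example strings can be converted as follows. This is a sketch rather than the notebook's approach: each string is parsed on its own, because recent pandas versions may reject a single call on a list that mixes formats.

```python
import pandas as pd

raw = ["9/13/2017", "September 13, 2017", "2017-9-13"]  # formats from the slide

# Parse each string separately so the differing formats do not conflict.
parsed = pd.Series([pd.to_datetime(value) for value in raw])

print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

When a whole column shares one known format, passing it explicitly, for example `pd.to_datetime(df["date"], format="%m/%d/%Y")` on a hypothetical `date` column, is usually faster and less ambiguous.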
Groupby
• In some cases, we want to split the data into
subsets and apply some functionality on each
subset. Grouping and aggregating data is a
fundamental operation in data analysis,
allowing us to gain insights from structured
data. Pandas' groupby() function is a powerful
tool for achieving this. It enables us to group
data based on a specific column's values,
facilitating subsequent analysis and summary
statistics for each group.
• Example link:
– https://colab.research.google.com/drive/1vKax-2pIZJSel_wQ51ddKZSLJ74gBJgr?usp=sharing#scrollTo=lM1WSNMt8T0M
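A minimal sketch of split-apply-combine, assuming hypothetical `weekday` and `tti` columns: the rows are split by weekday and the mean TTI is computed for each group.

```python
import pandas as pd

df = pd.DataFrame({
    "weekday": [0, 0, 1, 1, 1],          # hypothetical: 0 = Monday, 1 = Tuesday
    "tti": [1.2, 1.4, 1.0, 1.1, 1.2],
})

# Split by weekday, then apply mean() to the tti values of each group.
mean_by_day = df.groupby("weekday")["tti"].mean()
print(mean_by_day)
```

Other aggregations such as `max()`, `count()`, or `describe()` can be used in place of `mean()`.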
Combining Two DataFrames
• Combining data from multiple sources is a common task in
data analysis. Pandas provides the powerful merge() function
for merging DataFrames.
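For example, TTI and weather records can be joined on a shared column. The sketch below assumes both tables have a column named `hour`; the real files may use a different key, such as a datetime column.

```python
import pandas as pd

# Hypothetical tables that share the "hour" column.
tti = pd.DataFrame({"hour": [7, 8, 9], "tti": [1.1, 1.3, 1.25]})
weather = pd.DataFrame({"hour": [8, 9, 10], "rain_mm": [0.0, 2.5, 0.0]})

# An inner merge keeps only the hours present in both tables.
merged = pd.merge(tti, weather, on="hour", how="inner")
print(merged)
```

Changing `how` to `"left"`, `"right"`, or `"outer"` keeps unmatched rows from one or both tables and fills the gaps with missing values.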
Convert categorical variables into dummy/indicator variables
• A lot of real world data are discrete or categorical. For
example, the weekday variable takes discrete values of 0, 1, 2,
..., 6. To use machine learning models in sklearn, we need to
convert these categorical variables into dummy variables using
get_dummies().
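A short sketch of the conversion on a hypothetical `weekday` column. Each distinct weekday value becomes its own indicator column.

```python
import pandas as pd

df = pd.DataFrame({"weekday": [0, 1, 6], "tti": [1.2, 1.1, 1.0]})

# Replace the weekday column with one indicator column per observed value.
dummies = pd.get_dummies(df, columns=["weekday"], prefix="weekday")
print(dummies.columns.tolist())
```

Only values that actually occur in the data get a column, so a dataset without any Sunday rows would not produce a `weekday_6` column.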
Exercises
1. Create a weekday column for the TTI DataFrame.
2. Create a new column showing the previous hour’s TTI value.
3. Create a new column showing last year’s TTI value (at the
same hour of the same day of the same month).
