Pandas
1. Series: The Fundamental 1D Structure
A Series is a one-dimensional labeled array capable of holding any data type.
Key Attributes:
values: Underlying data array
index: Axis labels
dtype: Data type
name: Series name
size: Number of elements
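A minimal sketch of how these attributes are read off a Series (the example values here are purely illustrative):
python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'], name='demo')
print(s.values)  # underlying NumPy array: [10 20 30]
print(s.index)   # Index(['a', 'b', 'c'], dtype='object')
print(s.dtype)   # int64
print(s.name)    # demo
print(s.size)    # 3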
Example 1: Basic Series Creation
python
import pandas as pd
import numpy as np
# Create series from list with default index
s1 = pd.Series([10, 20, 30, 40])
print("s1:\n", s1)
# Create series with custom index
s2 = pd.Series([1.1, 2.2, 3.3], index=['a', 'b', 'c'], name='temperatures')
print("\ns2:\n", s2)
Output:
text
s1:
0 10
1 20
2 30
3 40
dtype: int64
s2:
a 1.1
b 2.2
c 3.3
Name: temperatures, dtype: float64
Example 2: Series from Dictionary (Automatic Index Alignment)
python
population_dict = {'New York': 8.4, 'London': 9.0, 'Tokyo': 13.9}
pop_series = pd.Series(population_dict)
print("Population Series:\n", pop_series)
# Update London and add Paris; values align on the index labels
# (.add(..., fill_value=0) would instead sum overlapping labels: London -> 18.1)
new_data = {'London': 9.1, 'Paris': 2.2}
pop_series = pd.Series(new_data).combine_first(pop_series)
print("\nUpdated Population:\n", pop_series)
Output:
text
Population Series:
New York 8.4
London 9.0
Tokyo 13.9
dtype: float64
Updated Population:
London 9.1
New York 8.4
Paris 2.2
Tokyo 13.9
dtype: float64
Example 3: Vectorized Operations & Filtering
python
# Vectorized operations
squares = pd.Series([1, 4, 9, 16, 25])
sqrt = np.sqrt(squares)
print("Square Roots:\n", sqrt)
# Boolean filtering
filtered = sqrt[sqrt > 2]
print("\nValues > 2:\n", filtered)
# Index-based operations
reindexed = filtered.reindex([0, 2, 4], fill_value=0)
print("\nReindexed:\n", reindexed)
Output:
text
Square Roots:
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
dtype: float64
Values > 2:
2 3.0
3 4.0
4 5.0
dtype: float64
Reindexed:
0 0.0
2 3.0
4 5.0
dtype: float64
2. DataFrame: The Primary 2D Structure
A DataFrame is a two-dimensional labeled data structure with columns of potentially different
types.
Key Attributes:
index: Row labels
columns: Column labels
dtypes: Column data types
shape: (rows, columns) dimensions
values: Underlying data array
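A minimal sketch of the same idea for DataFrames (df_demo is an illustrative name, not from the examples below):
python
df_demo = pd.DataFrame({'x': [1, 2], 'y': ['a', 'b']})
print(df_demo.index)    # RangeIndex(start=0, stop=2, step=1)
print(df_demo.columns)  # Index(['x', 'y'], dtype='object')
print(df_demo.dtypes)   # x: int64, y: object
print(df_demo.shape)    # (2, 2)
print(df_demo.values)   # underlying 2D NumPy array (object dtype here, since columns differ)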
Example 1: DataFrame Creation
python
# From dictionary of lists
data = {
    'City': ['London', 'New York', 'Tokyo'],
    'Population (M)': [9.2, 8.4, 13.9],
    'Country': ['UK', 'USA', 'Japan']
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
print("DataFrame:\n", df)
# Add new column
df['GDP (B)'] = [465, 1940, 1020]
print("\nWith GDP:\n", df)
Output:
text
DataFrame:
City Population (M) Country
A London 9.2 UK
B New York 8.4 USA
C Tokyo 13.9 Japan
With GDP:
City Population (M) Country GDP (B)
A London 9.2 UK 465
B New York 8.4 USA 1940
C Tokyo 13.9 Japan 1020
Example 2: Indexing and Selection
python
# Column selection
print("Cities:\n", df['City'])
# Row selection by label
print("\nRow B:\n", df.loc['B'])
# Row selection by position
print("\nFirst row:\n", df.iloc[0])
# Boolean indexing
print("\nLarge cities:\n", df[df['Population (M)'] > 10])
# Multi-axis selection
print("\nSpecific selection:\n", df.loc['C', ['City', 'GDP (B)']])
Output:
text
Cities:
A London
B New York
C Tokyo
Name: City, dtype: object
Row B:
City New York
Population (M) 8.4
Country USA
GDP (B) 1940
Name: B, dtype: object
First row:
City London
Population (M) 9.2
Country UK
GDP (B) 465
Name: A, dtype: object
Large cities:
City Population (M) Country GDP (B)
C Tokyo 13.9 Japan 1020
Specific selection:
City Tokyo
GDP (B) 1020
Name: C, dtype: object
Example 3: Handling Missing Data
python
# Create DF with missing values
df_missing = pd.DataFrame({
    'A': [1, None, 3],
    'B': [np.nan, 5, 6],
    'C': [7, 8, None]
})
print("Original:\n", df_missing)
# Fill missing with column mean
df_filled = df_missing.fillna(df_missing.mean())
print("\nFilled:\n", df_filled)
# Drop rows with any missing values
df_dropped = df_missing.dropna()
print("\nDropped:\n", df_dropped)
Output:
text
Original:
A B C
0 1.0 NaN 7.0
1 NaN 5.0 8.0
2 3.0 6.0 NaN
Filled:
A B C
0 1.0 5.5 7.0
1 2.0 5.0 8.0
2 3.0 6.0 7.5
Dropped:
Empty DataFrame
Columns: [A, B, C]
Index: []
Example 4: Advanced Operations
python
# Add calculated column
df['GDP per Capita'] = df['GDP (B)'] / df['Population (M)']  # thousands of USD per person
print("With GDP per Capita:\n", df)
# Apply function across rows
df['Size Category'] = df.apply(
    lambda row: 'Large' if row['Population (M)'] > 10 else 'Medium',
    axis=1
)
print("\nWith Size Category:\n", df)
# Transpose data
print("\nTransposed:\n", df.T)
Output:
text
With GDP per Capita:
City Population (M) Country GDP (B) GDP per Capita
A London 9.2 UK 465 50.543478
B New York 8.4 USA 1940 230.952381
C Tokyo 13.9 Japan 1020 73.381295
With Size Category:
City Population (M) Country GDP (B) GDP per Capita Size Category
A London 9.2 UK 465 50.543478 Medium
B New York 8.4 USA 1940 230.952381 Medium
C Tokyo 13.9 Japan 1020 73.381295 Large
Transposed:
A B C
City London New York Tokyo
Population (M) 9.2 8.4 13.9
Country UK USA Japan
GDP (B) 465 1940 1020
GDP per Capita 50.543478 230.952381 73.381295
Size Category Medium Medium Large
3. Index Objects: The Core Labeling System
Pandas Index objects are immutable, array-like label containers that also support set operations.
Key Features:
Immutable
Supports set operations (union, intersection, difference)
Can be hierarchical (MultiIndex)
Example:
python
# Create custom index
index = pd.Index(['X', 'Y', 'Z'], name='coordinates')
# Set operations
index1 = pd.Index(['A', 'B', 'C'])
index2 = pd.Index(['B', 'C', 'D'])
print("Union:", index1.union(index2))
print("Intersection:", index1.intersection(index2))
print("Difference:", index1.difference(index2))
# Hierarchical index (MultiIndex)
multi_index = pd.MultiIndex.from_tuples(
    [('Group1', 'A'), ('Group1', 'B'), ('Group2', 'A')],
    names=['Group', 'Subgroup']
)
print("\nMultiIndex:\n", multi_index)
Output:
text
Union: Index(['A', 'B', 'C', 'D'], dtype='object')
Intersection: Index(['B', 'C'], dtype='object')
Difference: Index(['A'], dtype='object')
MultiIndex:
MultiIndex([('Group1', 'A'),
('Group1', 'B'),
('Group2', 'A')],
names=['Group', 'Subgroup'])
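The immutability claim above is easy to verify: element assignment on an Index raises a TypeError. A minimal sketch:
python
idx = pd.Index(['A', 'B', 'C'])
try:
    idx[0] = 'Z'  # Index objects do not support item assignment
except TypeError as e:
    print("Immutable:", e)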
Key Differences Between Series and DataFrames
| Feature           | Series                   | DataFrame                  |
| ----------------- | ------------------------ | -------------------------- |
| Dimensions        | 1D                       | 2D                         |
| Data              | Homogeneous              | Heterogeneous (per column) |
| Indexing          | Single index             | Row and column indexes     |
| Data Access       | Single [] operator       | Multiple [] or .loc/.iloc  |
| Memory Efficiency | Higher                   | Lower                      |
| Use Cases         | Single-variable analysis | Multi-variable datasets    |
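To make the Data Access row concrete, here is a small sketch contrasting the two (variable names are illustrative):
python
s = pd.Series([10, 20], index=['a', 'b'])
df = pd.DataFrame({'col': [10, 20]}, index=['a', 'b'])
print(s['a'])              # Series: a single [] returns a scalar by label
print(df['col'])           # DataFrame: [] selects an entire column
print(df.loc['a', 'col'])  # row-and-column access goes through .loc (or .iloc)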
Data I/O & Data Inspection
This section covers pd.read_csv(), pd.read_excel(), df.to_csv(), and the key inspection methods, with practical examples.
1. Reading Data
CSV Files: pd.read_csv()
Essential Parameters & Usage:
python
df = pd.read_csv(
    'data.csv',
    sep=',',                   # Delimiter
    header=0,                  # Header row (set None for no header)
    index_col='id',            # Column to use as index
    usecols=['col1', 'col2'],  # Columns to read
    dtype={'price': float},    # Data type specification
    parse_dates=['date'],      # Parse dates
    na_values=['NA', '?'],     # Custom missing value markers
    skiprows=5,                # Skip first 5 rows
    nrows=1000,                # Read only first 1000 rows
    encoding='latin-1'         # Handle special characters
)
Example:
python
# Read sales data
sales_df = pd.read_csv(
    'sales.csv',
    parse_dates=['order_date'],
    na_values=['MISSING'],
    dtype={'order_id': 'string', 'quantity': 'Int64'}  # nullable Int64 allows missing quantities
)
print("Data shape:", sales_df.shape)
Excel Files: pd.read_excel()
Key Parameters:
python
df = pd.read_excel(
    'data.xlsx',
    sheet_name='Sheet1',  # Sheet name or index
    header=1,             # Use the second row (index 1) as the header
    usecols='A:D,F',      # Column range (Excel-style)
    skipfooter=3,         # Skip last 3 rows
    engine='openpyxl'     # Engine for .xlsx files (openpyxl must be installed)
)
Example:
python
# Read multi-sheet Excel
with pd.ExcelFile('financials.xlsx') as xls:
    q1_df = pd.read_excel(xls, 'Q1_Sales')
    expenses_df = pd.read_excel(xls, 'Q1_Expenses')
print("Q1 Sales Columns:", q1_df.columns.tolist())
2. Writing Data
CSV Export: df.to_csv()
Essential Parameters:
python
df.to_csv(
    'output.csv',
    index=False,                # Exclude index
    columns=['col1', 'col3'],   # Selected columns
    header=['First', 'Third'],  # Custom column names
    na_rep='NULL',              # Missing value representation
    float_format='%.2f',        # Format floats
    encoding='utf-8-sig',       # Add BOM for Excel compatibility
    date_format='%Y-%m-%d'      # Date formatting
)
Example:
python
# Process and export data
sales_df['total'] = sales_df['unit_price'] * sales_df['quantity']
sales_df.to_csv(
    'processed_sales.csv',
    index=False,
    columns=['order_id', 'order_date', 'total'],
    na_rep='MISSING',
    float_format='%.0f'  # No decimal places
)
3. Data Inspection Methods
Initial Glance: df.head() & df.tail()
python
print("First 3 rows:")
print(sales_df.head(3)) # First n rows (default 5)
print("\nLast 2 rows:")
print(sales_df.tail(2)) # Last n rows
Sample Output:
text
First 3 rows:
order_id order_date unit_price quantity
0 ORD001 2023-01-15 49.99 2
1 ORD002 2023-01-16 19.95 1
2 ORD003 2023-01-16 199.00 NA
Last 2 rows:
order_id order_date unit_price quantity
98 ORD099 2023-03-30 14.99 3
99 ORD100 2023-03-31 79.99 1
Structural Overview: df.info()
python
sales_df.info(verbose=True, show_counts=True)
Output:
text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 order_id 100 non-null string
1 order_date 100 non-null datetime64[ns]
2 unit_price 97 non-null float64
3 quantity 96 non-null Int64
dtypes: Int64(1), datetime64[ns](1), float64(1), string(1)
memory usage: 3.2 KB
Key insights:
Detects 3 missing values in unit_price
Shows 4 missing values in quantity
Confirms correct data types
Reports memory usage
Statistical Summary: df.describe()
python
print(sales_df.describe(include='all'))  # in pandas >= 2.0, datetime columns are summarized numerically by default
Output:
text
order_id order_date unit_price quantity
count 100 100 97.000000 96.000000
unique 100 NaN NaN NaN
top ORD076 NaN NaN NaN
freq 1 NaN NaN NaN
mean NaN 2023-02-14 12:00 76.492474 2.187500
min NaN 2023-01-15 00:00 5.990000 1.000000
25% NaN 2023-01-28 00:00 24.990000 1.000000
50% NaN 2023-02-15 00:00 49.990000 2.000000
75% NaN 2023-03-15 00:00 99.990000 3.000000
max NaN 2023-03-31 00:00 499.000000 10.000000
std NaN NaN 87.312073 1.782456
Key insights:
Categorical columns: Unique counts, top values
Numeric columns: Mean, percentiles, min/max
Datetime columns: First/last dates, distribution
Missing values: Excluded from calculations
4. Practical Workflow Example
python
# 1. Read data from multiple sources
sales_data = pd.read_csv('sales.csv', parse_dates=['date'])
inventory = pd.read_excel('inventory.xlsx', sheet_name='Stock')
# 2. Initial inspection
print("Sales Data Overview:")
sales_data.info()
print("\nMissing Values Check:")
print(sales_data.isna().sum())
# 3. Data cleaning
sales_data.fillna({'quantity': 1}, inplace=True)
sales_data.dropna(subset=['product_id'], inplace=True)
# 4. Advanced inspection
print("\nStatistical Summary:")
print(sales_data.describe(percentiles=[0.1, 0.9]))
# 5. Export processed data
sales_data.to_csv('cleaned_sales.csv', index=False)
5. Pro Tips for Efficient Inspection
1. Custom Summary Function:
python
def quick_summary(df):
    return pd.DataFrame({
        'dtype': df.dtypes,
        'missing': df.isna().sum(),
        'unique': df.nunique(),
        'min': df.min(numeric_only=True),
        'max': df.max(numeric_only=True)
    })
print(quick_summary(sales_data))
2. Memory Optimization:
python
# Downcast numeric columns
sales_data['quantity'] = pd.to_numeric(sales_data['quantity'], downcast='integer')
# Convert to categorical
sales_data['category'] = sales_data['category'].astype('category')
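To confirm the conversions actually pay off, compare per-column footprints after converting; a minimal sketch (assumes the sales_data frame from above):
python
# deep=True counts actual object payloads (e.g. strings), not just pointers
print(sales_data.memory_usage(deep=True))
Categorical conversion helps most when a column has few distinct values relative to its length.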
3. Large File Sampling:
python
# Read ~1% random sample of a large CSV: keep the header (row 0),
# keep each data row with probability 0.01
import random
sample_df = pd.read_csv('big_data.csv', skiprows=lambda x: x > 0 and random.random() > 0.01)
Key Takeaways:
| Task                | Method           | Key Insight                               |
| ------------------- | ---------------- | ----------------------------------------- |
| Read CSV            | pd.read_csv()    | Handle encoding, dates, missing values    |
| Read Excel          | pd.read_excel()  | Specify sheets, skip headers/footers      |
| Export Data         | df.to_csv()      | Control index, columns, missing value rep |
| Initial Inspection  | df.head()/tail() | Verify data ingestion                     |
| Structural Analysis | df.info()        | Check dtypes, memory, missing values      |
| Statistical Summary | df.describe()    | Understand distributions, outliers        |
| Advanced Inspection | Custom functions | Create tailored data quality checks       |