Informatics Practices
Class-XII
Data Handling
Using Pandas-I
(Introduction of Pandas, Data Structure
of Pandas, Operations on Series Object)
By Sumit Sir...(Shri Jain Public School)
Introduction to Python libraries: Pandas
• Pandas is one of the most popular Python libraries
used for data analysis. It provides highly
optimized performance, with its back-end source
code written in C or Python.
• Pandas derives its name from “panel data”,
an econometrics term for
multidimensional, structured data sets.
• The main author of Pandas is Wes McKinney.
• Pandas is an open-source, BSD-licensed library built for
the Python programming language. Pandas offers
high-performance, easy-to-use data structures
and data analysis tools.
Why Pandas?
• It can read or write in many different data formats (integer, float, double, etc.)
• It can calculate in all ways data is organized i.e., across rows and down columns.
• It can easily select subsets of data from bulky data sets and even combine multiple
datasets together.
• It has functionality to find and fill missing data.
• It allows you to apply operations to independent groups within the data.
• It supports reshaping of data into different forms.
• Time series-specific functionality: date range generation and frequency
conversion, moving window statistics and date shifting.
• Size mutability: columns can be inserted and deleted from DataFrame and higher
dimensional objects.
• Robust IO tools for loading data from flat files (CSV and delimited), Excel files,
databases.
• It makes it easy to convert ragged, differently-indexed data in other Python and
NumPy data structures into DataFrame objects.
• Flexible reshaping and pivoting of data sets.
• It supports visualization by integrating with libraries such as matplotlib and seaborn.
• The two primary data structures of pandas, Series (1-dimensional) and DataFrame
(2-dimensional), handle the vast majority of typical use cases in finance, statistics,
social science, and many areas of engineering.
Pandas Data structures
• A data structure is a particular way of storing
and organizing data in a computer so as to
apply a specific type of functionality on them.
• Pandas offers several data structures to handle
a variety of data, in which the rows and columns
can be identified and accessed with labels
rather than simple integer indices.
• To use the pandas library, first import it:
import pandas as pd
1. Series
• A Series is a Pandas data structure that
represents a one-dimensional array-like
object containing an array of data (of any
NumPy data type) and an associated array of
data labels called its index.
e.g. Some Series type objects:
Index  Data      Index  Data      Index  Data
0      21        Jan    31        'A'    91
1      23        Feb    29        'B'    81
2      18        Mar    31        'C'    71
3      25        Apr    30        'D'    61
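The three objects in the table above could be built roughly as follows. This is a minimal sketch: the variable names s1, s2 and s3 are illustrative assumptions, and only the index labels and data values come from the table.

```python
import pandas as pd

# Default integer index 0..3
s1 = pd.Series([21, 23, 18, 25])

# Month names as index labels
s2 = pd.Series([31, 29, 31, 30], index=["Jan", "Feb", "Mar", "Apr"])

# Single-letter strings as index labels
s3 = pd.Series([91, 81, 71, 61], index=["A", "B", "C", "D"])

print(s1)
print(s2)
print(s3)
```

Each print shows the index labels in the left column and the data in the right column, followed by the dtype.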
Creating Series Objects
1. Create an empty Series object by calling Series() with no arguments, e.g.
obj3=pd.Series()
print(obj3)
2. Creating non-empty Series objects
syntax: <series object> = pd.Series(data,index=idx)
Where data is the data part of the Series object, it can be one of the following:
a) A Python sequence b) An array c) Dictionary d) scalar value
i) obj1=pd.Series(range(5))
print(obj1)
ii) obj2=pd.Series([3.5, 5., 6.5, 8., np.nan]) # must import numpy as np
print(obj2)
iii) nda1=np.arange(3,13,3.5) # must import numpy as np
print(nda1)
ser1=pd.Series(nda1) # using numpy array
print(ser1)
iv) obj5=pd.Series({'Jan': 31, 'Feb': 29, 'Mar': 31}) # using dictionary
print(obj5)
v) medalswon=pd.Series(10,index=range(0,1)) #using scalar value
print(medalswon)
medals2=pd.Series(15,index=range(1,6,2)) #using scalar value
print(medals2)
ser2=pd.Series('Yet to start', index=['Bikaner', 'Delhi', 'Pune']) #using scalar value
print(ser2)
Creating Series Objects-Adding Functionality
i) Specifying/Adding NaN values in a Series object:
ser2=pd.Series([6.5,np.nan,2.34])
print(ser2)
ii) Specify index as well as data with Series():
arr=[31,28,31,30]
mon=["Jan","Feb","Mar","Apr"]
obj3=pd.Series(data=arr, index=mon)
print(obj3)
iii) Using a mathematical function
a=np.arange(9,13)
obj7=pd.Series(index=a, data=a*2)
print(obj7)
Series Object Attributes
Example of Series using attributes
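As a minimal sketch of some commonly used Series attributes (the object ser and its values are illustrative assumptions, not taken from the slides):

```python
import numpy as np
import pandas as pd

ser = pd.Series([6.5, np.nan, 2.34], index=["a", "b", "c"])

print(ser.index)    # the index labels: 'a', 'b', 'c'
print(ser.values)   # the data as a NumPy array
print(ser.dtype)    # data type of the values: float64
print(ser.shape)    # shape of the data as a tuple: (3,)
print(ser.ndim)     # number of dimensions: 1
print(ser.size)     # number of elements, NaN included: 3
print(ser.nbytes)   # bytes used by the data: 3 * 8 = 24
print(ser.hasnans)  # True, because one value is NaN
print(ser.empty)    # False, because the Series has elements
```

Attributes are read without parentheses, unlike methods such as head().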
Accessing a Series Object and its Elements
Here are some Series objects (obj5, obj6, obj7, obj8):
obj5            obj6
Feb  28         0  11
Jan  31         1  14
Mar  31         2  17
                3  20
                4  23

obj7            obj8
9   18          9    81
10  20          10  100
11  22          11  121
12  24          12  144
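One way to build these four objects so that the examples on the following slides can be run is sketched below. The constructor calls are an assumption; only the index labels and values are taken from the tables above.

```python
import numpy as np
import pandas as pd

# obj5: month names as labels
obj5 = pd.Series({"Feb": 28, "Jan": 31, "Mar": 31})

# obj6: default index 0..4 with values 11, 14, 17, 20, 23
obj6 = pd.Series(np.arange(11, 24, 3))

# obj7 and obj8: labels 9..12, values computed from the labels
a = np.arange(9, 13)
obj7 = pd.Series(data=a * 2, index=a)
obj8 = pd.Series(data=a ** 2, index=a)

for obj in (obj5, obj6, obj7, obj8):
    print(obj)
```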
• Accessing Individual Elements
e.g.
print(obj6[3])
print(obj5['Feb'])
• Extracting Slices from Series Object:
Slicing works by position, not by index label,
in a Series object, i.e. the first element is at position
0, the second element is at position 1 and so on.
e.g.
print(obj5[1:])
print(obj6[2:5])
print(obj7[0::2])
print(obj7[::-1])
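The difference between positions and labels is easiest to see on an object whose labels are integers that do not start at 0. The sketch below rebuilds obj7 (labels 9 to 12) under that assumption; the .iloc and .loc accessors are standard pandas but are an addition to what the slides show.

```python
import numpy as np
import pandas as pd

a = np.arange(9, 13)
obj7 = pd.Series(data=a * 2, index=a)   # labels 9..12, values 18..24

by_slice = obj7[0:2]        # slice -> positions 0 and 1 (labels 9 and 10)
by_iloc = obj7.iloc[0:2]    # the same selection, explicitly by position
by_loc = obj7.loc[9:10]     # by label; the end label 10 is included
single = obj7[9]            # a single integer is looked up as a label

print(by_slice)
print(by_iloc)
print(by_loc)
print(single)
```

Writing .iloc for positions and .loc for labels removes any doubt about which rule applies.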
• Modifying Elements of Series Object:
e.g.
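obj5['Feb']=29
obj6[2:4]=-15.75 # obj6 holds integers; recent pandas versions warn about storing a float here, so convert first with obj6=obj6.astype(float)
obj7.index=['a', 'b', 'c', 'd']
*Series objects are value-mutable but size-immutable.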
• The head() and tail() functions:
The head() function returns the first n rows of a pandas
object and the tail() function returns the last n rows.
syntax: <pandas object>.head([n])
<pandas object>.tail([n])
If you do not provide a value for n, head() and tail()
return the first 5 and last 5 rows respectively.
print(obj5.head())
print(obj6.tail(3))
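The modification and head()/tail() calls above can be tried together as in the following sketch. It assumes obj5 and obj6 hold the values shown earlier, and it creates obj6 with a float dtype (an assumption made here so that -15.75 can be stored without a dtype change).

```python
import pandas as pd

obj5 = pd.Series({"Feb": 28, "Jan": 31, "Mar": 31})
obj6 = pd.Series([11, 14, 17, 20, 23], dtype="float64")

obj5["Feb"] = 29              # change one value through its label
obj6.iloc[2:4] = -15.75       # change the values at positions 2 and 3
obj5.index = ["F", "J", "M"]  # replace all labels; the length must match

first_two = obj6.head(2)      # first 2 rows
last_three = obj6.tail(3)     # last 3 rows
default_head = obj5.head()    # n defaults to 5; obj5 has only 3 rows

print(first_two)
print(last_three)
print(default_head)
```

When the object has fewer than n rows, head() and tail() simply return all of them.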
• Vector Operations on Series Object:
e.g.
print(obj5+2)
print(obj6>15)
obj8=obj7**2
print(obj8)
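# A runnable sketch of the vector operations above, assuming obj6 and obj7
# hold the values shown earlier. A vector operation is applied to every
# element at once and returns a new Series with the same index.

```python
import numpy as np
import pandas as pd

obj6 = pd.Series([11, 14, 17, 20, 23])
a = np.arange(9, 13)
obj7 = pd.Series(data=a * 2, index=a)   # values 18, 20, 22, 24

plus_two = obj6 + 2      # 13, 16, 19, 22, 25
above_15 = obj6 > 15     # False, False, True, True, True
squared = obj7 ** 2      # 324, 400, 484, 576

print(plus_two)
print(above_15)
print(squared)
```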
• Arithmetic On Series Objects: The operation is performed only on the
matching indexes.
e.g.
obj1        obj2         obj3           obj4
0  0        0  3.5       a   1.5        0   3.5
1  1        1  5         b  12.75       1   5
2  2        2  6.5       c  24          2   6.5
3  3        3  8         d  35.25       3   8
4  4        4  NaN       e  46.5        4  14.2
                                        5  15.18
                                        6  19.25
                                        7  20.24
• Since objects obj1 and obj2 have matching indexes (both use the labels
0 to 4), the given arithmetic operation is carried out on the corresponding
items of the matching indexes.
print(obj1+obj2)
print(obj1*obj2)
• But if you perform an operation on objects that have some or all non-matching
indexes, the operation is applied to the matching indexes, if any, and for every
index that is present in only one of the objects the result is Not a Number, i.e.
NaN. It represents missing data.
print(obj1+obj4)
print(obj2-obj3)
• You can store the result of object arithmetic in another object, which will also be a
Series object.
obj9=obj1+obj2
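The index-matching rule can be checked with a short sketch. It assumes obj1, obj2 and obj4 hold the values from the table above; the names total and mixed are illustrative.

```python
import numpy as np
import pandas as pd

obj1 = pd.Series(range(5))
obj2 = pd.Series([3.5, 5.0, 6.5, 8.0, np.nan])
obj4 = pd.Series([3.5, 5, 6.5, 8, 14.2, 15.18, 19.25, 20.24])

# Labels 0..4 match in both objects; the NaN already in obj2 stays NaN.
total = obj1 + obj2

# Labels 5, 6 and 7 exist only in obj4, so the result is NaN there.
mixed = obj1 + obj4

print(total)
print(mixed)
```

The result always carries the union of the two indexes, which is why mixed has eight rows even though obj1 has only five.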
• Some additional operations on Series Objects:
• Reindexing method: Create a similar object with the same indexes in a
different order.
n1=obj3.reindex(['e', 'd', 'c', 'b', 'a'])
print(n1)
• Dropping Entries from an Axis:
You can remove an entry from a Series object using drop().
obj3 = obj3.drop('c')
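Both operations return a new Series rather than changing the original in place, as the sketch below illustrates. It assumes obj3 holds the values from the earlier table; n2 and the label 'z' are hypothetical additions that show what happens when reindex() is given a label that does not exist.

```python
import pandas as pd

obj3 = pd.Series([1.5, 12.75, 24, 35.25, 46.5], index=list("abcde"))

n1 = obj3.reindex(["e", "d", "c", "b", "a"])  # same labels, reversed order
n2 = obj3.reindex(["a", "b", "z"])            # 'z' is not in obj3 -> NaN
trimmed = obj3.drop("c")                      # new Series without 'c'

print(n1)
print(n2)
print(trimmed)
print(obj3)   # unchanged: still has all five entries
```

This is why the slide assigns the result back with obj3 = obj3.drop('c').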