Course Id: INT 213
PART II-PANDAS
I INTRODUCTION
Pandas
• Pandas is one of the most powerful libraries for data analysis in Python.
• It provides high-performance tools for data manipulation and analysis.
• Furthermore, it is very effective at converting data formats and querying
data out of databases.
• The two main data structures of Pandas are:
series
data frame
• To work with Pandas, we need to import the module.
import pandas as pd
II SERIES
Series
• A series in Pandas is a one-dimensional array which is labeled.
• You can imagine it to be the data science equivalent of an ordinary Python
dictionary.
• In order to create a series, we use the constructor of the Series class. The first
parameter that we pass is a list full of values (in this case numbers). The second
parameter is the list of the indices or keys.
Series
• import pandas as pd
• s=pd.Series([1,2,3,4,5],['a','b','c','d','e'])
• print(s)
Changing the index of an element
• import pandas as pd
• s=pd.Series([1,2,3,4,5],index=['a','b','c','d','e'])
• print(s)
Converting Dictionaries into Series
• Since series and dictionaries are quite similar, we can easily convert our
Python dictionaries into Pandas series.
• import pandas as pd
myDict = {'A':10, 'B':20, 'C':30}
series=pd.Series(myDict)
print(series)
print(series['A'])
Changing the index of the element
• import pandas as pd
myDict = {'A':10, 'B':20, 'C':30}
series=pd.Series(myDict,index=['C','A','B'])
print(series)
print(series['B'])
• import pandas as pd
myDict = {'A':10, 'B':20, 'C':30}
series=pd.Series(myDict,index=['X','Y','Z'])
print(series)
print(series['X'])
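The two variants above behave differently, which is easy to miss when only reading the code. The following is a minimal sketch (the variable names are chosen here for illustration): index labels that match dictionary keys reorder the values, while labels that match no key yield missing values (NaN).

```python
import pandas as pd

my_dict = {'A': 10, 'B': 20, 'C': 30}

# Index labels that match the dictionary keys only reorder the values.
reordered = pd.Series(my_dict, index=['C', 'A', 'B'])
print(reordered)

# Index labels that match no dictionary key produce missing values (NaN).
unmatched = pd.Series(my_dict, index=['X', 'Y', 'Z'])
print(unmatched)
print(unmatched.isna().all())
```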
Accessing a value
• import pandas as pd
s=pd.Series([1,2,3,4,5])
print(s)
print(s[1])
• import pandas as pd
s=pd.Series([1,2,3,4,5],['a','b','c','d','e'])
print(s)
print(s['c'])
print(s.iloc[2])  # access by position; same element as s['c']
What is the output of the code?
III DATAFRAME
Dataframe
• In contrast to the series, a data frame is not one-dimensional but
two-dimensional and looks like a table.
• You can imagine it to be like an Excel table or a database table.
Dataframe
• import pandas as pd
data = {'Name': ['Anna', 'Bob', 'Charles'],
'Age': [24, 32, 35],
'Height': [176, 187, 175]
}
d=pd.DataFrame(data)
print(d)
Dataframes
Accessing a value
• import pandas as pd
data = {'Name': ['Anna', 'Bob', 'Charles'], 'Age': [24, 32, 35],'Height':
[176, 187, 175]}
d=pd.DataFrame (data)
print(d)
print(d['Name'][1])
Extracting selected columns
• import pandas as pd
• data = {'Name': ['Anna', 'Bob', 'Charles'], 'Age': [24, 32, 35],'Height':
[176, 187, 175]}
• d=pd.DataFrame (data)
• print(d)
• print(d[['Name','Height']])
IV DATAFRAME FUNCTIONS
(A) Basic Functions
How to use basic functions?
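The example for this slide is not part of the text, so the following is only a sketch. It assumes that "basic functions" refers to common inspection helpers such as head, tail, shape, columns and dtypes, and it reuses the sample data from the Dataframe slides.

```python
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Charles'],
        'Age': [24, 32, 35],
        'Height': [176, 187, 175]}
df = pd.DataFrame(data)

print(df.head(2))   # first two rows
print(df.tail(1))   # last row
print(df.shape)     # (number of rows, number of columns)
print(df.columns)   # column labels
print(df.dtypes)    # data type of each column
```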
(B) Statistical Functions
How to use statistical functions?
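The example for this slide is not part of the text either. As a sketch, assuming "statistical functions" means the usual column-wise summaries, the numeric columns of the sample data can be summarised like this:

```python
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Charles'],
        'Age': [24, 32, 35],
        'Height': [176, 187, 175]}
df = pd.DataFrame(data)

# Restrict to the numeric columns before computing statistics.
numeric = df[['Age', 'Height']]

print(numeric.mean())     # average of each column
print(numeric.median())   # middle value of each column
print(numeric.max())      # largest value of each column
print(numeric.min())      # smallest value of each column
print(numeric.std())      # standard deviation of each column
print(df.describe())      # several statistics at once
```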
(C) Numpy Functions
• Instead of using the built-in Pandas functions, we can also use the
NumPy functions we already know.
• For this, we just use the apply function of the data frame and then
pass our desired function.
How to use NumPy functions in Pandas?
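A minimal sketch of passing NumPy functions to apply, using only the numeric columns of the sample data. The choice of np.sqrt and np.sum is an assumption made for illustration; any NumPy function that accepts a column of numbers could be passed the same way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [24, 32, 35], 'Height': [176, 187, 175]})

# Element-wise NumPy function: applied to every value of each column.
roots = df.apply(np.sqrt)
print(roots)

# Reducing NumPy function: collapses each column to a single value.
totals = df.apply(np.sum)
print(totals)
```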
V ITERATING
Iterating
• Iterating over data frames is quite easy with Pandas. We can either do
it in the classic way or use specific functions for it.
How to iterate in pandas?
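The slide does not show which functions it uses, so this sketch assumes the "classic way" is a plain for loop and the "specific functions" are iterrows and items:

```python
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Charles'],
        'Age': [24, 32, 35],
        'Height': [176, 187, 175]}
df = pd.DataFrame(data)

# Classic way: a for loop over the data frame yields the column labels.
for column in df:
    print(column)

# iterrows() yields (index, row) pairs, where each row is a Series.
for index, row in df.iterrows():
    print(index, row['Name'], row['Age'])

# items() yields (column label, column) pairs.
for label, values in df.items():
    print(label, list(values))
```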
VI SORTING
(A) Sorting by Index
Inplace parameter
• When we use functions that manipulate our data frame, we don’t
actually change it; the function returns a manipulated copy. If we want to
apply the changes to the actual data frame, we need to assign the result
back like this:
df = df.sort_index()
• But Pandas offers us another alternative as well. This alternative is
the parameter inplace. When this parameter is set to True, the
changes get applied to our actual data frame.
• df.sort_index(inplace=True)
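The difference between the two forms can be checked with a small sketch. The data frame below is hypothetical and only has an unsorted index so that the effect is visible.

```python
import pandas as pd

# Hypothetical data with an unsorted index.
df = pd.DataFrame({'Age': [35, 24, 32]}, index=[2, 0, 1])

sorted_copy = df.sort_index()          # returns a sorted copy
index_before = list(df.index)          # the original order is unchanged
print(index_before)
print(list(sorted_copy.index))

result = df.sort_index(inplace=True)   # sorts df itself and returns None
print(result)
print(list(df.index))
```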
(B) Sort by Column
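The example for this slide is missing from the text. A sketch, assuming sorting by column is done with sort_values on the sample data:

```python
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Charles'],
        'Age': [24, 32, 35],
        'Height': [176, 187, 175]}
df = pd.DataFrame(data)

# Sort the rows by one column, smallest value first.
by_age = df.sort_values(by='Age')
print(by_age)

# Sort in descending order instead.
by_height_desc = df.sort_values(by='Height', ascending=False)
print(by_height_desc)
```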
VII CSV FILE
Reading data from csv file
Writing data into csv files
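The examples for these two slides are not part of the text. The sketch below assumes to_csv for writing and read_csv for reading; the file name people.csv is hypothetical and the file is placed in a temporary directory so that nothing is left behind.

```python
import os
import tempfile

import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Charles'],
        'Age': [24, 32, 35],
        'Height': [176, 187, 175]}
df = pd.DataFrame(data)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'people.csv')  # hypothetical file name
    df.to_csv(path, index=False)   # write the data frame without its index
    loaded = pd.read_csv(path)     # read the file back into a data frame

print(loaded)
```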
VIII JOINING AND MERGING
Merging
Joining
Inner join
Left join
Right join
Outer join
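The four join types differ in which keys they keep. This sketch uses two hypothetical tables that share a key column named id; the table contents are made up for illustration. It assumes merge for column-based merging and join for index-based joining.

```python
import pandas as pd

# Hypothetical tables that share the key column 'id'.
names = pd.DataFrame({'id': [1, 2, 3], 'Name': ['Anna', 'Bob', 'Charles']})
ages = pd.DataFrame({'id': [2, 3, 4], 'Age': [32, 35, 41]})

inner = pd.merge(names, ages, on='id', how='inner')  # ids present in both
left = pd.merge(names, ages, on='id', how='left')    # all ids of names
right = pd.merge(names, ages, on='id', how='right')  # all ids of ages
outer = pd.merge(names, ages, on='id', how='outer')  # ids of either table

for label, frame in [('inner', inner), ('left', left),
                     ('right', right), ('outer', outer)]:
    print(label)
    print(frame)

# join() combines two data frames on their index instead of a column.
joined = names.set_index('id').join(ages.set_index('id'), how='inner')
print(joined)
```

Rows without a partner in the other table get NaN in the missing columns.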
IX QUERYING DATA
Extracting Selected Data
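The example for this slide is missing from the text. A sketch, assuming selected data is extracted with a boolean condition on a column, shown on the sample data:

```python
import pandas as pd

data = {'Name': ['Anna', 'Bob', 'Charles'],
        'Age': [24, 32, 35],
        'Height': [176, 187, 175]}
df = pd.DataFrame(data)

# Keep only the rows that satisfy a condition.
older = df[df['Age'] > 30]
print(older)

# Combine a row condition with a column selection using loc.
tall_names = df.loc[df['Height'] > 175, 'Name']
print(tall_names)

# The same row filter written as a query string.
older_query = df.query('Age > 30')
print(older_query)
```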
X WORKING ON CSV FILES
CSV file 1: 'worldcup.csv'
CSV file 2: 'worldcup1.csv'
Extracting selected columns
Extracting selected rows
Merging the files