PANDAS DATAFRAMES
A pandas DataFrame is a two-dimensional, size-mutable, heterogeneous tabular data structure.
A tabular data structure has rows and columns, so a DataFrame is similar to an Excel sheet.
A DataFrame can be created from a dictionary or a list.
In [1]: # Create a DataFrame from a list
import pandas as pd
games_list = ['Cricket', 'Volleyball', 'Judo', 'Hockey']
df = pd.DataFrame(games_list, index=['G1', 'G2', 'G3', 'G4'])
df
Out[1]: 0
G1 Cricket
G2 Volleyball
G3 Judo
G4 Hockey
In [1]: 1 #Create a dataframe from a dictionary with default index
2 import pandas as pd
3 dict1={"Name":["Riya","Rishab","Isha","Rahul"],
4 "Age":[19,23,20,18],
5 "Class":[12,11,12,12]}
6 df=pd.DataFrame(dict1)
7 df
Out[1]: Name Age Class
0 Riya 19 12
1 Rishab 23 11
2 Isha 20 12
3 Rahul 18 12
In [7]: 1 #Create a dataframe from a dictionary with custom index
2 import pandas as pd
3 dict1={"Name":["Riya","Rishab","Isha","Rahul"],"Age":[19,23,20,18]}
4 df=pd.DataFrame(dict1,index=["P1","P2","P3","P4"])
5 df
Out[7]: Name Age
P1 Riya 19
P2 Rishab 23
P3 Isha 20
P4 Rahul 18
In [26]: 1 import pandas as pd
2 dic={'Rollno':[1,2,3,4,5,6],
3 'Name':["Prerna Singh","Manish Arora","Tanish Goel", "Falguni Jain","
4 'UT1':[24,18,20,22,15,20],
5 'UT2':[24,17,22,20,20,15],
6 'UT3':[20,19,18,24,18,22],
7 'UT4':[22,22,24,20,22,24]
8 }
9 df=pd.DataFrame(dic,index=["P1","P2","P3","P4","P5","P6"])
10 df
Out[26]: Rollno Name UT1 UT2 UT3 UT4
P1 1 Prerna Singh 24 24 20 22
P2 2 Manish Arora 18 17 19 22
P3 3 Tanish Goel 20 22 18 24
P4 4 Falguni Jain 22 20 24 20
P5 5 Kanika Bhatnagar 15 20 18 22
P6 6 Ramandeep Kaur 20 15 22 24
In [32]: 1 df.index
Out[32]: Index(['P1', 'P2', 'P3', 'P4', 'P5', 'P6'], dtype='object')
In [31]: 1 df.info
Out[31]: <bound method DataFrame.info of Rollno Name UT1 UT2 UT3
UT4
P1 1 Prerna Singh 24 24 20 22
P2 2 Manish Arora 18 17 19 22
P3 3 Tanish Goel 20 22 18 24
P4 4 Falguni Jain 22 20 24 20
P5 5 Kanika Bhatnagar 15 20 18 22
P6 6 Ramandeep Kaur 20 15 22 24>
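In the cell above, `df.info` is written without parentheses, so the notebook shows the repr of the bound method (which embeds the whole frame) rather than a summary. A minimal sketch of the difference follows; the frame `marks` and its values are illustrative stand-ins, not part of the notebook above.

```python
import io
import pandas as pd

# Small illustrative frame; the name and values are assumptions for this sketch.
marks = pd.DataFrame({"Rollno": [1, 2], "UT1": [24, 18]}, index=["P1", "P2"])

# Without parentheses we only get a reference to the method itself.
method_ref = marks.info
print(callable(method_ref))

# Calling info() writes a summary (index range, columns, non-null counts, dtypes).
# It prints to stdout by default; buf= lets us capture the text instead.
buffer = io.StringIO()
marks.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```

For a quick look at column types alone, `marks.dtypes` is an attribute and needs no parentheses.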
In [30]: 1 df.count()
Out[30]: Rollno 6
Name 6
UT1 6
UT2 6
UT3 6
UT4 6
dtype: int64
In [27]: 1 df.ndim
Out[27]: 2
In [28]: 1 df.shape
Out[28]: (6, 6)
In [29]: 1 df.columns
Out[29]: Index(['Rollno', 'Name', 'UT1', 'UT2', 'UT3', 'UT4'], dtype='object')
In [11]: 1 #Create a dataframe from a dictionary with custom index
2 import pandas as pd
3 dict1={"Name":["Riya","Rishab","Isha","Rahul"],"Age":[19,23,20,18]}
4 df=pd.DataFrame(dict1,index=["P1","P2","P3","P4"])
5 df
Out[11]: Name Age
P1 Riya 19
P2 Rishab 23
P3 Isha 20
P4 Rahul 18
In [14]: # Attribute access
df.Name
Out[14]: P1 Riya
P2 Rishab
P3 Isha
P4 Rahul
Name: Name, dtype: object
In [15]: 1 df['Name']
Out[15]: P1 Riya
P2 Rishab
P3 Isha
P4 Rahul
Name: Name, dtype: object
In [16]: 1 df.Age
Out[16]: P1 19
P2 23
P3 20
P4 18
Name: Age, dtype: int64
In [17]: 1 df['Age']
Out[17]: P1 19
P2 23
P3 20
P4 18
Name: Age, dtype: int64
In [18]: 1 df.index
Out[18]: Index(['P1', 'P2', 'P3', 'P4'], dtype='object')
In [19]: 1 df.info
Out[19]: <bound method DataFrame.info of Name Age
P1 Riya 19
P2 Rishab 23
P3 Isha 20
P4 Rahul 18>
In [20]: 1 df.shape
Out[20]: (4, 2)
In [21]: 1 df.columns
Out[21]: Index(['Name', 'Age'], dtype='object')
In [22]: 1 df.ndim
Out[22]: 2
In [23]: 1 df
Out[23]: Name Age
P1 Riya 19
P2 Rishab 23
P3 Isha 20
P4 Rahul 18
In [24]: 1 print(df)
Name Age
P1 Riya 19
P2 Rishab 23
P3 Isha 20
P4 Rahul 18
In [25]: 1 df.count()
Out[25]: Name 4
Age 4
dtype: int64
In [33]: 1 import pandas as pd
2 import numpy as np
3 df = pd.DataFrame({"Person":["Jhonny", "Mira", "Tom", "Jhonny", "Mira"],
4 "Age": [26., np.nan, 24., 35, 36],
5 "Single": [False, True, True, True, False]})
6 df
Out[33]: Person Age Single
0 Jhonny 26.0 False
1 Mira NaN True
2 Tom 24.0 True
3 Jhonny 35.0 True
4 Mira 36.0 False
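`np.nan` marks a missing value, and the `Age` column above is shown as float because NaN is itself a float. The sketch below rebuilds the same frame and shows one way missing entries might be counted or filtered out; the variable names `people`, `missing_per_column` and `rows_with_age` are illustrative.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({"Person": ["Jhonny", "Mira", "Tom", "Jhonny", "Mira"],
                       "Age": [26., np.nan, 24., 35, 36],
                       "Single": [False, True, True, True, False]})

# isna() marks missing cells; summing the booleans counts them per column.
missing_per_column = people.isna().sum()
print(missing_per_column)

# count() reports non-missing values, so Age is one short of the row count.
print(people.count())

# dropna(subset=...) keeps only the rows where Age is present.
rows_with_age = people.dropna(subset=["Age"])
print(rows_with_age)
```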
head() & tail() functions in a DataFrame
In [2]: import pandas as pd
import numpy as np
name_dict = {'Name': ["Anita", "Sajal", "Ayaan", "Abhey", "Rahul", "Isha"],
             'Age': [14, 32, 3, 6, 10, 13]}
df = pd.DataFrame(name_dict)
print("-----First Five Rows-----")
print(df.head())  # Displays the first five rows
-----First Five Rows-----
Name Age
0 Anita 14
1 Sajal 32
2 Ayaan 3
3 Abhey 6
4 Rahul 10
In [3]: 1 print("-----First Two Rows-----")
2 print(df.head(2)) # Displays first 2 Rows
-----First Two Rows-----
Name Age
0 Anita 14
1 Sajal 32
In [4]: 1 print("-----Last Five Rows-----")
2 print(df.tail()) # Displays last Five Rows
3 print("-----Last Two Rows-----")
4 print(df.tail(2)) # Displays last 2 Rows
-----Last Five Rows-----
Name Age
1 Sajal 32
2 Ayaan 3
3 Abhey 6
4 Rahul 10
5 Isha 13
-----Last Two Rows-----
Name Age
4 Rahul 10
5 Isha 13
In [8]: 1 #Display all rows except last 2 rows
2 df.head(-2)
Out[8]: Name Age
0 Anita 14
1 Sajal 32
2 Ayaan 3
3 Abhey 6
In [9]: 1 df.tail(-1)
Out[9]: Name Age
1 Sajal 32
2 Ayaan 3
3 Abhey 6
4 Rahul 10
5 Isha 13
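A negative argument to `head()` or `tail()` means "all rows except that many". The sketch below, which rebuilds the six-row frame used above under the illustrative name `people`, checks that these calls match the equivalent `iloc` slices.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Anita", "Sajal", "Ayaan", "Abhey", "Rahul", "Isha"],
                       "Age": [14, 32, 3, 6, 10, 13]})

# head(-2): everything except the last two rows, same as iloc[:-2].
all_but_last_two = people.head(-2)
print(all_but_last_two.equals(people.iloc[:-2]))

# tail(-1): everything except the first row, same as iloc[1:].
all_but_first = people.tail(-1)
print(all_but_first.equals(people.iloc[1:]))
```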
In [2]: 1 import pandas as pd
2
3 # dictionary of lists
4 dict1 = {'Name':["aparna", "pankaj", "sudhir", "Geeku"],
5 'Degree': ["BCA", "BCA", "M.Tech", "BCA"],
6 'Score':[90, 40, 80, 98]}
7
8 # creating a dataframe
9 df = pd.DataFrame(dict1)
10 df
Out[2]: Name Degree Score
0 aparna BCA 90
1 pankaj BCA 40
2 sudhir M.Tech 80
3 Geeku BCA 98
In [19]: 1 df[df['Score']<=40]
Out[19]: Name Degree Score
1 pankaj BCA 40
In [15]: 1 print(df['Score'])
2 print(df['Degree'])
3 print(df['Name'])
0 90
1 40
2 80
3 98
Name: Score, dtype: int64
0 BCA
1 BCA
2 M.Tech
3 BCA
Name: Degree, dtype: object
0 aparna
1 pankaj
2 sudhir
3 Geeku
Name: Name, dtype: object
In [1]: import pandas as pd
# Named class_data rather than dict so the built-in dict type is not shadowed
class_data = {'Rollno': [1, 2, 3, 4],
              'Name': ["Aman", "Preeti", "Kartik", "Lakshay"],
              'Class': ['IX', 'X', 'IX', 'X'],
              'Section': ['E', 'F', 'D', 'A'],
              'CGPA': [8.7, 8.9, 9.2, 9.4],
              'Stream': ["Science", "Arts", "Science", "Commerce"]
              }
classframe = pd.DataFrame(class_data, index=["ST1", "ST2", "ST3", "ST4"])
print(classframe)
Rollno Name Class Section CGPA Stream
ST1 1 Aman IX E 8.7 Science
ST2 2 Preeti X F 8.9 Arts
ST3 3 Kartik IX D 9.2 Science
ST4 4 Lakshay X A 9.4 Commerce
In [2]: 1 import pandas as pd
2 dict1={"Rollno":[1,2,3,4],
3 "Name":["Aman","Preeti","Kartik","Lakshay"],
4 "Class":["IX","X","IX","X"],
5 "Section":["E","F","D","A"],
6 "CGPA":[8.7,8.9,9.2,9.4],
7 "Stream":["Science","Arts","Science","Commerce"]
8 }
9 df=pd.DataFrame(dict1,index=["ST1","ST2","ST3","ST4"])
10 df
Out[2]: Rollno Name Class Section CGPA Stream
ST1 1 Aman IX E 8.7 Science
ST2 2 Preeti X F 8.9 Arts
ST3 3 Kartik IX D 9.2 Science
ST4 4 Lakshay X A 9.4 Commerce
In [4]: 1 df[df['CGPA']>9]
Out[4]: Rollno Name Class Section CGPA Stream
ST3 3 Kartik IX D 9.2 Science
ST4 4 Lakshay X A 9.4 Commerce
In [3]: 1 import pandas as pd
2 emp={'TNAME':['AMIT','RAJESH','BINNY','CHARU','MEENAKSHI'],
3 'TANO':['T01','TO2','T03','T04','TO5'],
4 'TNADD':['123 PASCHIM VIHAR','6/11 RAMESH NAGAR','5 WEST PUNJABHI BAG
5 'SALARY':[23000,34000,12000,45000,34000]}
6 df=pd.DataFrame(emp)
7 df
Out[3]: TNAME TANO TNADD SALARY
0 AMIT T01 123 PASCHIM VIHAR 23000
1 RAJESH TO2 6/11 RAMESH NAGAR 34000
2 BINNY T03 5 WEST PUNJABHI BAG H 12000
3 CHARU T04 23 MALVIYA NAGAR 45000
4 MEENAKSHI TO5 19 MEERA BAGH 34000
In [5]: 1 df.SALARY
Out[5]: 0 23000
1 34000
2 12000
3 45000
4 34000
Name: SALARY, dtype: int64
In [6]: 1 df['SALARY']
Out[6]: 0 23000
1 34000
2 12000
3 45000
4 34000
Name: SALARY, dtype: int64
In [7]: 1 df['SALARY']>16000
Out[7]: 0 True
1 True
2 False
3 True
4 True
Name: SALARY, dtype: bool
In [8]: df=df.set_index('TNAME')
In [9]: df
Out[9]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
RAJESH TO2 6/11 RAMESH NAGAR 34000
BINNY T03 5 WEST PUNJABHI BAG H 12000
CHARU T04 23 MALVIYA NAGAR 45000
MEENAKSHI TO5 19 MEERA BAGH 34000
In [10]: df[df['SALARY']>16000]
Out[10]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
RAJESH TO2 6/11 RAMESH NAGAR 34000
CHARU T04 23 MALVIYA NAGAR 45000
MEENAKSHI TO5 19 MEERA BAGH 34000
In [27]: 1 import pandas as pd
2 import numpy as np
3 dic={'rollno':[1,2,3,4,5,6],
4 'name':["Prerna Singh","Manish Arora","Tanish Goel", "Falguni Jain","
5 'UT1':[24,18,20,22,15,20],
6 'UT2':[24,17,22,20,np.nan,15],
7 'UT3':[20,19,18,24,18,22],
8 'UT4':[22,22,24,20,22,24]
9 }
10 df=pd.DataFrame(dic)
11 df
Out[27]: rollno name UT1 UT2 UT3 UT4
0 1 Prerna Singh 24 24.0 20 22
1 2 Manish Arora 18 17.0 19 22
2 3 Tanish Goel 20 22.0 18 24
3 4 Falguni Jain 22 20.0 24 20
4 5 Kanika Bhatnagar 15 NaN 18 22
5 6 Ramandeep Kaur 20 15.0 22 24
In [25]: 1 df[df['rollno']==4]
Out[25]: rollno name UT1 UT2 UT3 UT4
3 4 Falguni Jain 22 20.0 24 20
In [28]: 1 print(df.count())
rollno 6
name 6
UT1 6
UT2 5
UT3 6
UT4 6
dtype: int64
In [29]: 1 print(df.columns)
Index(['rollno', 'name', 'UT1', 'UT2', 'UT3', 'UT4'], dtype='object')
Using loc & iloc
In [8]: 1 import pandas as pd
2 emp={'TNAME':['AMIT','RAJESH','BINNY','CHARU','MEENAKSHI'],
3 'TANO':['T01','TO2','T03','T04','TO5'],
4 'TNADD':['123 PASCHIM VIHAR','6/11 RAMESH NAGAR','5 WEST PUNJABHI BAG
5 'SALARY':[23000,34000,12000,45000,34000]}
6 df=pd.DataFrame(emp)
7 df
Out[8]: TNAME TANO TNADD SALARY
0 AMIT T01 123 PASCHIM VIHAR 23000
1 RAJESH TO2 6/11 RAMESH NAGAR 34000
2 BINNY T03 5 WEST PUNJABHI BAG H 12000
3 CHARU T04 23 MALVIYA NAGAR 45000
4 MEENAKSHI TO5 19 MEERA BAGH 34000
loc is label-based and iloc is integer-position-based; both retrieve rows from a DataFrame. A loc slice includes its end label, while an iloc slice excludes its end position.
In [10]: 1 df.iloc[1:4]
Out[10]: TNAME TANO TNADD SALARY
1 RAJESH TO2 6/11 RAMESH NAGAR 34000
2 BINNY T03 5 WEST PUNJABHI BAG H 12000
3 CHARU T04 23 MALVIYA NAGAR 45000
In [11]: 1 df.iloc[2:3]
Out[11]: TNAME TANO TNADD SALARY
2 BINNY T03 5 WEST PUNJABHI BAG H 12000
In [10]: 1 df.loc[2:4]
Out[10]: TNAME TANO TNADD SALARY
2 BINNY T03 5 WEST PUNJABHI BAG H 12000
3 CHARU T04 23 MALVIYA NAGAR 45000
4 MEENAKSHI TO5 19 MEERA BAGH 34000
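The outputs above differ in length because `iloc[2:3]` stops before position 3, whereas `loc[2:4]` includes label 4. The sketch below shows the same contrast on a reduced version of the employee frame (only `TNAME` and `SALARY` are kept, and the name `emp` is illustrative).

```python
import pandas as pd

emp = pd.DataFrame({"TNAME": ["AMIT", "RAJESH", "BINNY", "CHARU", "MEENAKSHI"],
                    "SALARY": [23000, 34000, 12000, 45000, 34000]})

by_position = emp.iloc[2:4]   # positions 2 and 3; the stop position 4 is excluded
by_label = emp.loc[2:4]       # labels 2, 3 and 4; the stop label is included
print(len(by_position), len(by_label))

# With a non-integer index the difference is easier to see.
named = emp.set_index("TNAME")
label_slice = named.loc["RAJESH":"CHARU", "SALARY"]   # both end labels included
position_slice = named.iloc[1:4, 0]                   # same rows, chosen by position
print(label_slice)
print(position_slice)
```

With the default 0, 1, 2, ... index, labels and positions coincide, which is why the two accessors are easy to confuse.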
In [15]: 1 import pandas as pd
2 emp={'TNAME':['AMIT','RAJESH','BINNY','CHARU','MEENAKSHI'],
3 'TANO':['T01','TO2','T03','T04','TO5'],
4 'TNADD':['123 PASCHIM VIHAR','6/11 RAMESH NAGAR','5 WEST PUNJABHI BAG
5 'SALARY':[23000,34000,12000,45000,34000]}
6 df=pd.DataFrame(emp)
7 df
Out[15]: TNAME TANO TNADD SALARY
0 AMIT T01 123 PASCHIM VIHAR 23000
1 RAJESH TO2 6/11 RAMESH NAGAR 34000
2 BINNY T03 5 WEST PUNJABHI BAG H 12000
3 CHARU T04 23 MALVIYA NAGAR 45000
4 MEENAKSHI TO5 19 MEERA BAGH 34000
Adding a column
In [17]: 1 df ['Grade']=['A','B','A','A','B']
2 df
Out[17]: TNAME TANO TNADD SALARY Grade
0 AMIT T01 123 PASCHIM VIHAR 23000 A
1 RAJESH TO2 6/11 RAMESH NAGAR 34000 B
2 BINNY T03 5 WEST PUNJABHI BAG H 12000 A
3 CHARU T04 23 MALVIYA NAGAR 45000 A
4 MEENAKSHI TO5 19 MEERA BAGH 34000 B
In [1]: 1 import pandas as pd
2 emp={'TNAME':['AMIT','RAJESH','BINNY','CHARU','MEENAKSHI'],
3 'TANO':['T01','TO2','T03','T04','TO5'],
4 'TNADD':['123 PASCHIM VIHAR','6/11 RAMESH NAGAR','5 WEST PUNJABHI BAG
5 'SALARY':[23000,34000,12000,45000,34000]}
6 df=pd.DataFrame(emp)
7 df
Out[1]: TNAME TANO TNADD SALARY
0 AMIT T01 123 PASCHIM VIHAR 23000
1 RAJESH TO2 6/11 RAMESH NAGAR 34000
2 BINNY T03 5 WEST PUNJABHI BAG H 12000
3 CHARU T04 23 MALVIYA NAGAR 45000
4 MEENAKSHI TO5 19 MEERA BAGH 34000
In [2]: 1 df=df.set_index('TNAME')
In [3]: 1 df
Out[3]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
RAJESH TO2 6/11 RAMESH NAGAR 34000
BINNY T03 5 WEST PUNJABHI BAG H 12000
CHARU T04 23 MALVIYA NAGAR 45000
MEENAKSHI TO5 19 MEERA BAGH 34000
In [4]: 1 df['Allowance']=[4000,6000,8000,10000,'']
2 df
Out[4]: TANO TNADD SALARY Allowance
TNAME
AMIT T01 123 PASCHIM VIHAR 23000 4000
RAJESH TO2 6/11 RAMESH NAGAR 34000 6000
BINNY T03 5 WEST PUNJABHI BAG H 12000 8000
CHARU T04 23 MALVIYA NAGAR 45000 10000
MEENAKSHI TO5 19 MEERA BAGH 34000
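The empty string in the `Allowance` list above turns the whole column into `object` dtype, so later arithmetic on it may fail or behave unexpectedly. One possible alternative, sketched here on a reduced frame, is to record the missing allowance as `np.nan`; the names `staff` and `Allowance_text` are illustrative.

```python
import numpy as np
import pandas as pd

staff = pd.DataFrame({"SALARY": [23000, 34000, 12000, 45000, 34000]},
                     index=["AMIT", "RAJESH", "BINNY", "CHARU", "MEENAKSHI"])

# Mixing numbers with an empty string forces the generic object dtype.
staff["Allowance_text"] = [4000, 6000, 8000, 10000, ""]
print(staff["Allowance_text"].dtype)

# np.nan keeps the column numeric (float64), so sums and means still work.
staff["Allowance"] = [4000, 6000, 8000, 10000, np.nan]
print(staff["Allowance"].dtype)
print(staff["Allowance"].sum())   # NaN is skipped by default
```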
In [5]: 1 df['Desig']=['Manager','Clerk','Manager','HR','Manager']
2 df
Out[5]: TANO TNADD SALARY Allowance Desig
TNAME
AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager
RAJESH TO2 6/11 RAMESH NAGAR 34000 6000 Clerk
BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager
CHARU T04 23 MALVIYA NAGAR 45000 10000 HR
MEENAKSHI TO5 19 MEERA BAGH 34000 Manager
In [6]: 1 #add a column using assign function
2 df=df.assign(Tax=[500,100,300,200,150])
In [7]: 1 df
Out[7]: TANO TNADD SALARY Allowance Desig Tax
TNAME
AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager 500
RAJESH TO2 6/11 RAMESH NAGAR 34000 6000 Clerk 100
BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager 300
CHARU T04 23 MALVIYA NAGAR 45000 10000 HR 200
MEENAKSHI TO5 19 MEERA BAGH 34000 Manager 150
In [8]: 1 df.loc['AMIT']
Out[8]: TANO T01
TNADD 123 PASCHIM VIHAR
SALARY 23000
Allowance 4000
Desig Manager
Tax 500
Name: AMIT, dtype: object
In [9]: 1 df.loc[['AMIT','BINNY']]
Out[9]: TANO TNADD SALARY Allowance Desig Tax
TNAME
AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager 500
BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager 300
In [10]: 1 df.loc[['AMIT','BINNY'],['TANO','SALARY']]
2
Out[10]: TANO SALARY
TNAME
AMIT T01 23000
BINNY T03 12000
In [11]: 1 df.loc[['AMIT','BINNY'],'SALARY']
Out[11]: TNAME
AMIT 23000
BINNY 12000
Name: SALARY, dtype: int64
In [12]: 1 df.loc['AMIT':'BINNY','SALARY']
Out[12]: TNAME
AMIT 23000
RAJESH 34000
BINNY 12000
Name: SALARY, dtype: int64
In [13]: 1 #adding a column with loc
2 df.loc[:,'HRA']=[3000,4000,5000,3000,6000]
In [14]: 1 df
Out[14]: TANO TNADD SALARY Allowance Desig Tax HRA
TNAME
AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager 500 3000
RAJESH TO2 6/11 RAMESH NAGAR 34000 6000 Clerk 100 4000
BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager 300 5000
CHARU T04 23 MALVIYA NAGAR 45000 10000 HR 200 3000
MEENAKSHI TO5 19 MEERA BAGH 34000 Manager 150 6000
In [15]: 1 df.loc[:,'HRA']
Out[15]: TNAME
AMIT 3000
RAJESH 4000
BINNY 5000
CHARU 3000
MEENAKSHI 6000
Name: HRA, dtype: int64
In [16]: 1 df.HRA
Out[16]: TNAME
AMIT 3000
RAJESH 4000
BINNY 5000
CHARU 3000
MEENAKSHI 6000
Name: HRA, dtype: int64
In [17]: 1 df["HRA"]
Out[17]: TNAME
AMIT 3000
RAJESH 4000
BINNY 5000
CHARU 3000
MEENAKSHI 6000
Name: HRA, dtype: int64
In [18]: 1 df.loc[["AMIT","CHARU"],"HRA"]
Out[18]: TNAME
AMIT 3000
CHARU 3000
Name: HRA, dtype: int64
In [19]: 1 df
Out[19]: TANO TNADD SALARY Allowance Desig Tax HRA
TNAME
AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager 500 3000
RAJESH TO2 6/11 RAMESH NAGAR 34000 6000 Clerk 100 4000
BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager 300 5000
CHARU T04 23 MALVIYA NAGAR 45000 10000 HR 200 3000
MEENAKSHI TO5 19 MEERA BAGH 34000 Manager 150 6000
In [20]: 1 df['Total_Salary']=df['SALARY']+df['HRA']-df['Tax']
In [21]: 1 df
Out[21]: TANO TNADD SALARY Allowance Desig Tax HRA Total_Salary
TNAME
AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager 500 3000 25500
RAJESH TO2 6/11 RAMESH NAGAR 34000 6000 Clerk 100 4000 37900
BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager 300 5000 16700
CHARU T04 23 MALVIYA NAGAR 45000 10000 HR 200 3000 47800
MEENAKSHI TO5 19 MEERA BAGH 34000 Manager 150 6000 39850
In [22]: 1 #Sorting the data frame
2 dfsort=df.sort_values('Total_Salary')
3 dfsort
Out[22]: TANO TNADD SALARY Allowance Desig Tax HRA Total_Salary
TNAME
BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager 300 5000 16700
AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager 500 3000 25500
RAJESH TO2 6/11 RAMESH NAGAR 34000 6000 Clerk 100 4000 37900
MEENAKSHI TO5 19 MEERA BAGH 34000 Manager 150 6000 39850
CHARU T04 23 MALVIYA NAGAR 45000 10000 HR 200 3000 47800
In [23]: 1 dfsort=df.sort_values('Total_Salary',ascending=False)
2 dfsort
Out[23]: TANO TNADD SALARY Allowance Desig Tax HRA Total_Salary
TNAME
CHARU T04 23 MALVIYA NAGAR 45000 10000 HR 200 3000 47800
MEENAKSHI TO5 19 MEERA BAGH 34000 Manager 150 6000 39850
RAJESH TO2 6/11 RAMESH NAGAR 34000 6000 Clerk 100 4000 37900
AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager 500 3000 25500
BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager 300 5000 16700
In [24]: 1 df['SALARY']=df['SALARY']+df['SALARY']*10/100
2 df
Out[24]: TANO TNADD SALARY Allowance Desig Tax HRA Total_Salary
TNAME
AMIT T01 123 PASCHIM VIHAR 25300.0 4000 Manager 500 3000 25500
RAJESH TO2 6/11 RAMESH NAGAR 37400.0 6000 Clerk 100 4000 37900
BINNY T03 5 WEST PUNJABHI BAG H 13200.0 8000 Manager 300 5000 16700
CHARU T04 23 MALVIYA NAGAR 49500.0 10000 HR 200 3000 47800
MEENAKSHI TO5 19 MEERA BAGH 37400.0 Manager 150 6000 39850
In [25]: 1 #Deleting a column using del
2 del df['Total_Salary']
In [26]: 1 df
Out[26]: TANO TNADD SALARY Allowance Desig Tax HRA
TNAME
AMIT T01 123 PASCHIM VIHAR 25300.0 4000 Manager 500 3000
RAJESH TO2 6/11 RAMESH NAGAR 37400.0 6000 Clerk 100 4000
BINNY T03 5 WEST PUNJABHI BAG H 13200.0 8000 Manager 300 5000
CHARU T04 23 MALVIYA NAGAR 49500.0 10000 HR 200 3000
MEENAKSHI TO5 19 MEERA BAGH 37400.0 Manager 150 6000
In [27]: 1 #Deleting a column using pop()
2 df.pop("Desig")
Out[27]: TNAME
AMIT Manager
RAJESH Clerk
BINNY Manager
CHARU HR
MEENAKSHI Manager
Name: Desig, dtype: object
In [28]: 1 df
Out[28]: TANO TNADD SALARY Allowance Tax HRA
TNAME
AMIT T01 123 PASCHIM VIHAR 25300.0 4000 500 3000
RAJESH TO2 6/11 RAMESH NAGAR 37400.0 6000 100 4000
BINNY T03 5 WEST PUNJABHI BAG H 13200.0 8000 300 5000
CHARU T04 23 MALVIYA NAGAR 49500.0 10000 200 3000
MEENAKSHI TO5 19 MEERA BAGH 37400.0 150 6000
In [29]: 1 df.drop(labels='Allowance',axis=1)
Out[29]: TANO TNADD SALARY Tax HRA
TNAME
AMIT T01 123 PASCHIM VIHAR 25300.0 500 3000
RAJESH TO2 6/11 RAMESH NAGAR 37400.0 100 4000
BINNY T03 5 WEST PUNJABHI BAG H 13200.0 300 5000
CHARU T04 23 MALVIYA NAGAR 49500.0 200 3000
MEENAKSHI TO5 19 MEERA BAGH 37400.0 150 6000
In [30]: 1 df.drop(labels='Tax',axis=1,inplace=True)
In [31]: 1 df
Out[31]: TANO TNADD SALARY Allowance HRA
TNAME
AMIT T01 123 PASCHIM VIHAR 25300.0 4000 3000
RAJESH TO2 6/11 RAMESH NAGAR 37400.0 6000 4000
BINNY T03 5 WEST PUNJABHI BAG H 13200.0 8000 5000
CHARU T04 23 MALVIYA NAGAR 49500.0 10000 3000
MEENAKSHI TO5 19 MEERA BAGH 37400.0 6000
In [32]: 1 df
Out[32]: TANO TNADD SALARY Allowance HRA
TNAME
AMIT T01 123 PASCHIM VIHAR 25300.0 4000 3000
RAJESH TO2 6/11 RAMESH NAGAR 37400.0 6000 4000
BINNY T03 5 WEST PUNJABHI BAG H 13200.0 8000 5000
CHARU T04 23 MALVIYA NAGAR 49500.0 10000 3000
MEENAKSHI TO5 19 MEERA BAGH 37400.0 6000
In [33]: 1 df.drop(labels=['Allowance','HRA'],axis=1,inplace=True)
In [34]: 1 df
Out[34]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 25300.0
RAJESH TO2 6/11 RAMESH NAGAR 37400.0
BINNY T03 5 WEST PUNJABHI BAG H 13200.0
CHARU T04 23 MALVIYA NAGAR 49500.0
MEENAKSHI TO5 19 MEERA BAGH 37400.0
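As the cells above show, `drop()` without `inplace=True` returns a new frame and leaves the original untouched. The sketch below illustrates this on a small stand-in frame (the name `pay` and its two rows are illustrative) and uses the `columns=` keyword, an alternative spelling of `labels=..., axis=1`.

```python
import pandas as pd

pay = pd.DataFrame({"SALARY": [25300.0, 37400.0], "Tax": [500, 100], "HRA": [3000, 4000]},
                   index=["AMIT", "RAJESH"])

# drop() returns a new frame; the original keeps all its columns.
without_tax = pay.drop(columns="Tax")
print(list(without_tax.columns))
print(list(pay.columns))

# Reassigning the result is an alternative to inplace=True.
pay = pay.drop(columns=["Tax", "HRA"])
print(list(pay.columns))
```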
INSERTING ROWS & DELETING ROWS
In [1]: 1 import pandas as pd
2 emp={'TNAME':['AMIT','RAJESH','BINNY','CHARU','MEENAKSHI'],
3 'TANO':['T01','TO2','T03','T04','TO5'],
4 'TNADD':['123 PASCHIM VIHAR','6/11 RAMESH NAGAR','5 WEST PUNJABHI BAG
5 'SALARY':[23000,34000,12000,45000,34000]}
6 df=pd.DataFrame(emp)
7 df
8
Out[1]: TNAME TANO TNADD SALARY
0 AMIT T01 123 PASCHIM VIHAR 23000
1 RAJESH TO2 6/11 RAMESH NAGAR 34000
2 BINNY T03 5 WEST PUNJABHI BAG H 12000
3 CHARU T04 23 MALVIYA NAGAR 45000
4 MEENAKSHI TO5 19 MEERA BAGH 34000
In [2]: 1 df=df.set_index('TNAME')
2 df
Out[2]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
RAJESH TO2 6/11 RAMESH NAGAR 34000
BINNY T03 5 WEST PUNJABHI BAG H 12000
CHARU T04 23 MALVIYA NAGAR 45000
MEENAKSHI TO5 19 MEERA BAGH 34000
In [3]: 1 #INSERT NEW ROW WITH VALUES["ISHA","T06","23 MODEL TOWN",35000] using loc
2 df.loc["ISHA"]=["T06","23 MODEL TOWN",35000]
3 df
Out[3]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
RAJESH TO2 6/11 RAMESH NAGAR 34000
BINNY T03 5 WEST PUNJABHI BAG H 12000
CHARU T04 23 MALVIYA NAGAR 45000
MEENAKSHI TO5 19 MEERA BAGH 34000
ISHA T06 23 MODEL TOWN 35000
In [4]: 1 #CHANGING THE CONTENTS
2 df.loc["BINNY"]=["T03","5 WEST PUNJABI BAGH",20000]
3 df
Out[4]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
RAJESH TO2 6/11 RAMESH NAGAR 34000
BINNY T03 5 WEST PUNJABI BAGH 20000
CHARU T04 23 MALVIYA NAGAR 45000
MEENAKSHI TO5 19 MEERA BAGH 34000
ISHA T06 23 MODEL TOWN 35000
In [5]: 1 #Edit the contents using iloc
2 df.iloc[4]=["T05","10 MEERA BAGH",25000]
3 df
Out[5]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
RAJESH TO2 6/11 RAMESH NAGAR 34000
BINNY T03 5 WEST PUNJABI BAGH 20000
CHARU T04 23 MALVIYA NAGAR 45000
MEENAKSHI T05 10 MEERA BAGH 25000
ISHA T06 23 MODEL TOWN 35000
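Assigning to `df.loc[label]` adds one row at a time when the label is new, and edits the row when the label already exists. When several rows need to be added together, one possible approach is `pd.concat`, sketched below on a reduced frame. The second new row ("KABIR", "T07", 28000) is a made-up example, and the names `staff`, `new_rows` and `combined` are illustrative.

```python
import pandas as pd

staff = pd.DataFrame({"TANO": ["T01", "T03"], "SALARY": [23000, 12000]},
                     index=pd.Index(["AMIT", "BINNY"], name="TNAME"))

# New rows collected in their own frame with the same columns.
# The KABIR row is hypothetical data used only for illustration.
new_rows = pd.DataFrame({"TANO": ["T06", "T07"], "SALARY": [35000, 28000]},
                        index=pd.Index(["ISHA", "KABIR"], name="TNAME"))

# concat stacks the frames row-wise and returns a new frame.
combined = pd.concat([staff, new_rows])
print(combined)
```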
DELETING ROW
In [6]: 1 df.drop("RAJESH",axis=0)
Out[6]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
BINNY T03 5 WEST PUNJABI BAGH 20000
CHARU T04 23 MALVIYA NAGAR 45000
MEENAKSHI T05 10 MEERA BAGH 25000
ISHA T06 23 MODEL TOWN 35000
In [7]: 1 df
Out[7]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
RAJESH TO2 6/11 RAMESH NAGAR 34000
BINNY T03 5 WEST PUNJABI BAGH 20000
CHARU T04 23 MALVIYA NAGAR 45000
MEENAKSHI T05 10 MEERA BAGH 25000
ISHA T06 23 MODEL TOWN 35000
In [8]: 1 df.drop("RAJESH",axis=0,inplace=True)
In [9]: 1 df
Out[9]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
BINNY T03 5 WEST PUNJABI BAGH 20000
CHARU T04 23 MALVIYA NAGAR 45000
MEENAKSHI T05 10 MEERA BAGH 25000
ISHA T06 23 MODEL TOWN 35000
In [10]: 1 df
Out[10]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
BINNY T03 5 WEST PUNJABI BAGH 20000
CHARU T04 23 MALVIYA NAGAR 45000
MEENAKSHI T05 10 MEERA BAGH 25000
ISHA T06 23 MODEL TOWN 35000
In [12]: # Drop several rows at once; axis must be passed by keyword in pandas 2.0 and later
df.drop(labels=["ISHA","CHARU"],axis=0,inplace=True)
In [13]: 1 df
Out[13]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
BINNY T03 5 WEST PUNJABI BAGH 20000
MEENAKSHI T05 10 MEERA BAGH 25000
In [14]: 1 df.drop(df.index[1])
Out[14]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
MEENAKSHI T05 10 MEERA BAGH 25000
In [15]: 1 df
Out[15]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
BINNY T03 5 WEST PUNJABI BAGH 20000
MEENAKSHI T05 10 MEERA BAGH 25000
In [16]: 1 df.drop(df.index[1],inplace=True)
In [17]: 1 df
Out[17]: TANO TNADD SALARY
TNAME
AMIT T01 123 PASCHIM VIHAR 23000
MEENAKSHI T05 10 MEERA BAGH 25000
In [18]: 1 df.drop(df.index[[0,1]],inplace=True)
In [19]: 1 df
Out[19]: TANO TNADD SALARY
TNAME
BOOLEAN INDEXING
In [1]: 1 import pandas as pd
2 dict1={'Names':['Sush','Adarsh','Ravi','Manu','Sushma'],
3 'Clas':[11,12,11,12,12],
4 'Sec':['A','A','C','A','B'],
5 'Phy':[34,40,56,67,50],
6 'Chem':[78,90,50,65,90],
7 'Eng':[50,55,67,68,69],
8 'Proj_rem':['Avg','Good','Good','Fair','Avg']
9 }
10 student=pd.DataFrame(dict1,index=[100,101,102,103,104])
11 student
12
Out[1]: Names Clas Sec Phy Chem Eng Proj_rem
100 Sush 11 A 34 78 50 Avg
101 Adarsh 12 A 40 90 55 Good
102 Ravi 11 C 56 50 67 Good
103 Manu 12 A 67 65 68 Fair
104 Sushma 12 B 50 90 69 Avg
In [6]: 1 #Marks >=70 in Chemistry
2 student[student.Chem>=70]
Out[6]: Names Clas Sec Phy Chem Eng Proj_rem
100 Sush 11 A 34 78 50 Avg
101 Adarsh 12 A 40 90 55 Good
104 Sushma 12 B 50 90 69 Avg
In [5]: 1 #Student whose section is A
2 student[student.Sec=="A"]
Out[5]: Names Clas Sec Phy Chem Eng Proj_rem
100 Sush 11 A 34 78 50 Avg
101 Adarsh 12 A 40 90 55 Good
103 Manu 12 A 67 65 68 Fair
WAC to display the details of class 11 students
In [4]: 1 student[student.Clas==11]
Out[4]: Names Clas Sec Phy Chem Eng Proj_rem
100 Sush 11 A 34 78 50 Avg
102 Ravi 11 C 56 50 67 Good
In [8]: 1 student.loc[student.Clas==11]
Out[8]: Names Clas Sec Phy Chem Eng Proj_rem
100 Sush 11 A 34 78 50 Avg
102 Ravi 11 C 56 50 67 Good
WAC to display the project remarks of all students.
In [10]: 1 student["Proj_rem"]
Out[10]: 100 Avg
101 Good
102 Good
103 Fair
104 Avg
Name: Proj_rem, dtype: object
In [11]: 1 student.Proj_rem
Out[11]: 100 Avg
101 Good
102 Good
103 Fair
104 Avg
Name: Proj_rem, dtype: object
WAC to display all subject marks for class 12 students.
In [13]: 1 student
Out[13]: Names Clas Sec Phy Chem Eng Proj_rem
100 Sush 11 A 34 78 50 Avg
101 Adarsh 12 A 40 90 55 Good
102 Ravi 11 C 56 50 67 Good
103 Manu 12 A 67 65 68 Fair
104 Sushma 12 B 50 90 69 Avg
In [15]: 1 student[student.Clas==12][["Phy","Chem","Eng"]]
Out[15]: Phy Chem Eng
101 40 90 55
103 67 65 68
104 50 90 69
In [2]: 1 student.loc[student.Clas==12,["Phy","Chem","Eng"]]
Out[2]: Phy Chem Eng
101 40 90 55
103 67 65 68
104 50 90 69
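The filters above each test a single condition. Several conditions can be combined with `&` (and) and `|` (or), provided each comparison is wrapped in its own parentheses. The sketch below rebuilds part of the `student` frame; the result names are illustrative.

```python
import pandas as pd

student = pd.DataFrame({"Names": ["Sush", "Adarsh", "Ravi", "Manu", "Sushma"],
                        "Clas": [11, 12, 11, 12, 12],
                        "Sec": ["A", "A", "C", "A", "B"],
                        "Chem": [78, 90, 50, 65, 90]},
                       index=[100, 101, 102, 103, 104])

# Class 12 students who are also in section A.
class12_sec_a = student.loc[(student.Clas == 12) & (student.Sec == "A"), "Names"]
print(list(class12_sec_a))

# Students with at least 90 in Chemistry, or in class 11.
high_or_class11 = student[(student.Chem >= 90) | (student.Clas == 11)]
print(list(high_or_class11.index))

# isin() tests membership in a list of values.
sec_a_or_b = student[student.Sec.isin(["A", "B"])]
print(len(sec_a_or_b))
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why the operators `&` and `|` are used here.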
WAC to view the Project remark for those who have got more than 80 in chemistry.
In [3]: 1 student.loc[student.Chem>80,"Proj_rem"]
Out[3]: 101 Good
104 Avg
Name: Proj_rem, dtype: object
Display the details of students who have got Good in their Project remarks.
In [4]: 1 student.loc[student.Proj_rem=="Good"]
Out[4]: Names Clas Sec Phy Chem Eng Proj_rem
101 Adarsh 12 A 40 90 55 Good
102 Ravi 11 C 56 50 67 Good
In [5]: 1 student[student.Proj_rem=="Good"]
Out[5]: Names Clas Sec Phy Chem Eng Proj_rem
101 Adarsh 12 A 40 90 55 Good
102 Ravi 11 C 56 50 67 Good
Change the physics marks of Adarsh to 50.
In [7]: 1 student.loc[student.Names=="Adarsh","Phy"]=50
In [8]: 1 student
Out[8]: Names Clas Sec Phy Chem Eng Proj_rem
100 Sush 11 A 34 78 50 Avg
101 Adarsh 12 A 50 90 55 Good
102 Ravi 11 C 56 50 67 Good
103 Manu 12 A 67 65 68 Fair
104 Sushma 12 B 50 90 69 Avg
In [ ]: # WAC to change the name "Sushma" to "Sushmita".
In [9]: 1 student.loc[student.Names=="Sushma","Names"]="Sushmita"
In [10]: 1 student
Out[10]: Names Clas Sec Phy Chem Eng Proj_rem
100 Sush 11 A 34 78 50 Avg
101 Adarsh 12 A 50 90 55 Good
102 Ravi 11 C 56 50 67 Good
103 Manu 12 A 67 65 68 Fair
104 Sushmita 12 B 50 90 69 Avg
In [13]: # Caution: with no column label given, every column of the matching row is overwritten
student.loc[student.Names=="Sush"]="Sushmita"
In [15]: 1 student
Out[15]: Names Clas Sec Phy Chem Eng Proj_rem
100 Sushmita Sushmita Sushmita Sushmita Sushmita Sushmita Sushmita
101 Adarsh 12 A 50 90 55 Good
102 Ravi 11 C 56 50 67 Good
103 Manu 12 A 67 65 68 Fair
104 Sushmita 12 B 50 90 69 Avg
WAC to change the Project remark to “Excellent” for those who have got more than 80 in
chemistry.
In [3]: 1 student.loc[student.Chem>80,"Proj_rem"]="Excellent"
In [5]: 1 student
Out[5]: Names Clas Sec Phy Chem Eng Proj_rem
100 Sush 11 A 34 78 50 Avg
101 Adarsh 12 A 40 90 55 Excellent
102 Ravi 11 C 56 50 67 Good
103 Manu 12 A 67 65 68 Fair
104 Sushma 12 B 50 90 69 Excellent
In [18]: student.drop(100,axis=0,inplace=True)
In [20]: 1 student.loc[student.Chem>80,"Proj_rem"]="Excellent"
In [21]: 1 student
Out[21]: Names Clas Sec Phy Chem Eng Proj_rem
101 Adarsh 12 A 50 90 55 Excellent
102 Ravi 11 C 56 50 67 Good
103 Manu 12 A 67 65 68 Fair
104 Sushmita 12 B 50 90 69 Excellent
In [1]: 1 D1={ 'Riya':19, 'Isha':20}
2 D2={ 'Isha':20, 'Riya':19}
3 D1==D2
4
Out[1]: True
In [4]: 1 import pandas as pd
2 import numpy as np
3 a1=np.array([2,3,4,5,6])
4 s1=pd.Series(a1,index=list("ABCDE"))
5 print(s1.ndim)
In [2]: 1 import pandas as pd
2 dict1={"Name":["Riya","Rishab","Isha","Rahul"],"Age":[19,23,20,18]}
3 df=pd.DataFrame(dict1, index=["P1","P2","P3","P4"])
4 df
Out[2]: Name Age
P1 Riya 19
P2 Rishab 23
P3 Isha 20
P4 Rahul 18
In [4]: 1 df.shape
Out[4]: (4, 2)
In [5]: 1 df.count()
Out[5]: Name 4
Age 4
dtype: int64
In [1]: 1 import pandas as pd
2 dic={'Rollno':[1,2,3,4,5,6],
3 'Name':["Prerna Singh","Manish Arora","Tanish Goel", "Falguni Jain","
4 'UT1':[24,18,20,22,15,20],
5 'UT2':[24,17,22,20,20,15],
6 'UT3':[20,19,18,24,18,22],
7 'UT4':[22,22,24,20,22,24]
8 }
9 df=pd.DataFrame(dic,index=["P1","P2","P3","P4","P5","P6"])
10 print(df.index)
11 print(df.info)
12 print(df.columns)
13 print(df)
Index(['P1', 'P2', 'P3', 'P4', 'P5', 'P6'], dtype='object')
<bound method DataFrame.info of Rollno Name UT1 UT2 UT3
UT4
P1 1 Prerna Singh 24 24 20 22
P2 2 Manish Arora 18 17 19 22
P3 3 Tanish Goel 20 22 18 24
P4 4 Falguni Jain 22 20 24 20
P5 5 Kanika Bhatnagar 15 20 18 22
P6 6 Ramandeep Kaur 20 15 22 24>
Index(['Rollno', 'Name', 'UT1', 'UT2', 'UT3', 'UT4'], dtype='object')
Rollno Name UT1 UT2 UT3 UT4
P1 1 Prerna Singh 24 24 20 22
P2 2 Manish Arora 18 17 19 22
P3 3 Tanish Goel 20 22 18 24
P4 4 Falguni Jain 22 20 24 20
P5 5 Kanika Bhatnagar 15 20 18 22
P6 6 Ramandeep Kaur 20 15 22 24
In [8]: 1 import pandas as pd
2 df=pd.read_csv("grocery.csv")
3 print(df)
Sno Product Category Price Quantity
0 1 Chips Food 10 15
1 2 Milk Food 60 5
2 3 Maggi Food 20 5
3 4 Juice Food 100 4
4 5 Bread Food 20 2
5 6 Biscuit Food 20 2
6 7 Tea Food 120 1
7 8 Bourn-Vita Food 70 1
8 9 Bottle Household 80 2
9 10 Tiffin Box Household 75 2
10 11 Bucket Household 200 1
11 12 Detergent Household 80 1
12 13 Tissues Hygiene 30 5
13 14 Soap Hygiene 40 4
14 15 Brush Hygiene 30 2
15 16 Perfume Hygiene 150 1
16 17 Hair-Oil Hygiene 100 1
17 18 Pen Stationery 5 10
18 19 Pencil Stationery 2 10
In [4]: 1 df1=df[["Product","Price","Quantity"]]
In [9]: 1 df2=df.loc[df.Price>100,["Product","Price","Quantity"]]
2 df2
Out[9]: Product Price Quantity
6 Tea 120 1
10 Bucket 200 1
15 Perfume 150 1
In [5]: 1 df1
Out[5]: Product Price Quantity
0 Chips 10 15
1 Milk 60 5
2 Maggi 20 5
3 Juice 100 4
4 Bread 20 2
5 Biscuit 20 2
6 Tea 120 1
7 Bourn-Vita 70 1
8 Bottle 80 2
9 Tiffin Box 75 2
10 Bucket 200 1
11 Detergent 80 1
12 Tissues 30 5
13 Soap 40 4
14 Brush 30 2
15 Perfume 150 1
16 Hair-Oil 100 1
17 Pen 5 10
18 Pencil 2 10