Chapter-4: CSV File handling
JHADESWAR INTERNATIONAL SCHOOL
CHAPTER-4
CSV FILE
COMPILED BY: JYOTI SEKHAR PANDA
PGT(COMPUTER SCIENCE)MCA,B.ED CONTACT NO=7381241189
CSV File
CSV (Comma Separated Values) is a simple file format used to store tabular
data, such as a spreadsheet or database.
A CSV file stores tabular data (numbers and text) in plain text.
Each line of the file is a data record.
Each record consists of one or more fields, separated by commas.
Python has a built-in module called csv for working with CSV files.
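The record and field structure described above can be seen with a few lines of plain Python. This is only a sketch: the sample text and the variable names are made up for illustration, and a simple split on commas does not handle quoted fields that themselves contain commas.

```python
# Hypothetical sample data: three lines of plain text, fields separated by commas.
sample_text = "ID,NAME,CITY\n101,SEKHAR,CHENNAI\n102,ANWESH,MUMBAI\n"

# Each line of the text is one record.
records = sample_text.splitlines()

# Each record is split into its fields at the commas.
fields_per_record = [record.split(",") for record in records]

for fields in fields_per_record:
    print(fields)
```

Every field comes back as a string, including the numbers, because the file itself is only text.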
Advantages of CSV Format:
A simple, compact and ubiquitous format for data storage.
A common format for data interchange.
It can be opened in popular spreadsheet packages like MS Excel, OpenOffice
Calc etc.
Nearly all spreadsheets and databases support import/export to CSV format.
Creating CSV File
A CSV file is a text file, so it can be created and edited with any text
editor (Notepad) or spreadsheet application (MS Excel, OpenOffice Calc).
CSV files follow a standard format, i.e., each column is separated by a
delimiter (such as a comma, semicolon, space or tab) and each line indicates a
new row.
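As a small illustration of delimiters, the sketch below parses the same record written with a comma, a semicolon and a tab, using the delimiter argument of csv.reader from Python's built-in csv module. The sample strings and the helper function parse() are hypothetical names chosen for this example.

```python
import csv
import io

# Hypothetical sample data: the same table written with three different delimiters.
comma_text = "ID,NAME,CITY\n101,SEKHAR,CHENNAI\n"
semicolon_text = "ID;NAME;CITY\n101;SEKHAR;CHENNAI\n"
tab_text = "ID\tNAME\tCITY\n101\tSEKHAR\tCHENNAI\n"

def parse(text, delimiter):
    """Split each line of text into fields using the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

rows_comma = parse(comma_text, ",")
rows_semicolon = parse(semicolon_text, ";")
rows_tab = parse(tab_text, "\t")

# All three give the same rows once the right delimiter is named.
print(rows_comma)
print(rows_comma == rows_semicolon == rows_tab)
```

The reader has to be told which delimiter the file uses; with the wrong one, each line would come back as a single field.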
a) Creating CSV file using Notepad:
Start >> Notepad >> Ctrl+N
Write the contents using a delimiter (comma)
Save the file (Ctrl+S) >> Give a file name (ex: emp1.csv)
>> Choose the save as type >> All files >> Save
emp1.csv
ID,NAME,CITY,SALARY,BONUS
101,SEKHAR,CHENNAI,25000,4000
102,ANWESH,MUMBAI,45000,5000
103,ANITA,DELHI,75000,6000
104,RAJESH,PUNE,12000,2900
105,PUNAM,KOLKATA,18000,3000
b) Creating CSV file using Excel:
Start >> Excel >> Blank Workbook
Write the content in rows and columns
Save the file (Ctrl+S) >> Give a file name (eg: emp2) >> Choose save as
type >> CSV (Comma delimited) >> Save
Reading from a CSV File into a DataFrame
The read_csv() function in pandas loads the data into a pandas DataFrame.
Syntax:
pd.read_csv("Filename.csv or path of file")
A CSV file stores every value as text (characters).
pandas infers a suitable data type for each column when loading the data.
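The type inference can be checked with the dtypes attribute. The sketch below uses made-up sample data held in a string (io.StringIO stands in for a real file), with one Age value left blank on purpose. A numeric column containing a missing value is loaded as float, which is why values such as 101.0 and 25.0 appear in the outputs in this chapter.

```python
import io
import pandas as pd

# Hypothetical sample data; in the file every value is just text.
# The Age of the second record is left empty on purpose.
sample_text = "Empid,Name,Age\n101,Russi,25\n102,Rajesh,\n"

df = pd.read_csv(io.StringIO(sample_text))

# pandas picks one type per column: whole numbers, decimals or text.
print(df.dtypes)
print(df)
```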
EX:
[Screenshot: emp2.csv opened in MS Excel, with one row per employee (Empid 101 to 108) and columns including Empid, Age and City.]
EX:
import pandas as pd
df=pd.read_csv("emp2.csv")
print(df)
o/P:
Empid Name Age City Salary
0 101.0 Russi 25.0 Mumbai 25000
1 102.0 Rajesh 45.0 Chennai 35000
2 NaN NaN NaN NaN 26000
3 103.0 Punam 12.0 Delhi 36000
4 104.0 Sumit 24.0 Kolkata 17000
5 105.0 Israt 23.0 Mumbai 38000
6 106.0 Alok 15.0 Delhi 36000
7 107.0 Harsit 16.0 Pune 45000
8 108.0 Lila 17.0 Goa 10000
Finding Number of Rows and Columns From a CSV
We can find the total number of rows and columns by using the shape attribute of the DataFrame, which returns a (rows, columns) tuple.
EX:
import pandas as pd
df=pd.read_csv("emp2.csv")
print('shape is:',df.shape)
row,col=df.shape
print("Number of Rows:",row)
print("Number of Columns:",col)
o/P:
shape is: (9, 5)
Number of Rows: 9
Number of Columns: 5
Writing a CSV file with a default Index:
To create a CSV file from a DataFrame, the to_csv() method is used.
We can do this either by transferring the records directly to the CSV file
or by copying the contents of the original CSV file to another file.
The argument index=False is used to skip writing the index as a column in the
new CSV file.
EX: (To copy one CSV file to another new CSV file)
import pandas as pd
df=pd.read_csv("emp2.csv")
df.to_csv("empnew.csv",index=False)
print(df)
o/P:
Empid Name Age City Salary
0 101.0 Russi 25.0 Mumbai 25000
1 102.0 Rajesh 45.0 Chennai 35000
2 NaN NaN NaN NaN 26000
3 103.0 Punam 12.0 Delhi 36000
4 104.0 Sumit 24.0 Kolkata 17000
5 105.0 Israt 23.0 Mumbai 38000
6 106.0 Alok 15.0 Delhi 36000
7 107.0 Harsit 16.0 Pune 45000
8 108.0 Lila 17.0 Goa 10000
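To see what index=False changes in the written file, the sketch below calls to_csv() without a file name, in which case pandas returns the CSV text as a string instead of writing a file. The small DataFrame is made-up sample data.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Empid": [101, 102], "Name": ["Russi", "Rajesh"]})

with_index = df.to_csv()                 # no file name given: the CSV text is returned
without_index = df.to_csv(index=False)   # same data, index column left out

# Compare the header line of both versions.
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])
```

With the index included, the header line starts with an empty field and every record starts with its index number; with index=False only the data columns are written.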
Saving DataFrame as CSV file
To create a CSV file from a DataFrame, the to_csv() function is used.
EX:
import pandas as pd
student={'RollNo':[1,2,3,4,5,6],
'StudName':['Teena','Rinku','Suman','Goutam','Rajesh','Pinku'],
'Marks':[60,50,40,45,15,65],
'Class':['11A','11B','11C','11A','11D','11E']}
df=pd.DataFrame(student)
df.to_csv('student.csv')
print("Student.csv file is created")
o/P:
Student.csv file is created
If we read the student.csv file into a DataFrame using the read_csv()
function, an extra column named Unnamed: 0 is displayed, because to_csv()
also saved the index as a column. To avoid this column we can use the
argument index_col=0 with the read_csv() function.
EX:
import pandas as pd
df=pd.read_csv("student.csv")    # saved index shows up as 'Unnamed: 0'
print(df)
df=pd.read_csv("student.csv",index_col=0)    # use the first column as the index
print(df)
o/P:
Unnamed: 0 RollNo StudName Marks Class
0 0 1 Teena 60 11A
1 1 2 Rinku 50 11B
2 2 3 Suman 40 11C
3 3 4 Goutam 45 11A
4 4 5 Rajesh 15 11D
5 5 6 Pinku 65 11E
RollNo StudName Marks Class
0 1 Teena 60 11A
1 2 Rinku 50 11B
2 3 Suman 40 11C
3 4 Goutam 45 11A
4 5 Rajesh 15 11D
5 6 Pinku 65 11E
Updating /Modifying contents of CSV File
Steps to update a CSV file
1) Import the module
2) Open the CSV file and read its data into a DataFrame
3) Find the row and column to be updated
4) Update the value in the DataFrame and write it back to the CSV file using the to_csv() function.
EX:
import pandas as pd
df=pd.read_csv("emp2.csv")
print("DataFrame contents before updation:")
print(df)
print()
df.loc[3,'Name']='Pritam'    # change the Name in the row labelled 3
df.to_csv("emp2.csv",index=False)    # write the updated data back to the file
print("DataFrame contents after updation:")
print(df)
o/P:
DataFrame contents before updation:
Empid Name Age City Salary
0 101.0 Russi 25.0 Mumbai 25000
1 102.0 Rajesh 45.0 Chennai 35000
2 NaN NaN NaN NaN 26000
3 103.0 Punam 12.0 Delhi 36000
4 104.0 Sumit 24.0 Kolkata 17000
5 105.0 Israt 23.0 Mumbai 38000
6 106.0 Alok 15.0 Delhi 36000
7 107.0 Harsit 16.0 Pune 45000
8 108.0 Lila 17.0 Goa 10000
DataFrame contents after updation:
Empid Name Age City Salary
0 101.0 Russi 25.0 Mumbai 25000
1 102.0 Rajesh 45.0 Chennai 35000
2 NaN NaN NaN NaN 26000
3 103.0 Pritam 12.0 Delhi 36000
4 104.0 Sumit 24.0 Kolkata 17000
5 105.0 Israt 23.0 Mumbai 38000
6 106.0 Alok 15.0 Delhi 36000
7 107.0 Harsit 16.0 Pune 45000
8 108.0 Lila 17.0 Goa 10000
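The example above picks the row by its index label (3). When the label is not known, the row can be selected by a value instead. The sketch below is one possible way to do this, using made-up sample data in a string and a condition on Empid; the variable names are chosen only for this illustration.

```python
import io
import pandas as pd

# Hypothetical sample data; io.StringIO stands in for a real CSV file.
sample_text = "Empid,Name,Age\n101,Russi,25\n103,Punam,12\n104,Sumit,24\n"
df = pd.read_csv(io.StringIO(sample_text))

# Select the row(s) where Empid is 103, then assign the new name.
df.loc[df["Empid"] == 103, "Name"] = "Pritam"

# Without a file name, to_csv() returns the updated CSV text.
updated_text = df.to_csv(index=False)
print(updated_text)
```

To save the change to a real file, a file name would be passed to to_csv(), as in the example above.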
Copying Fields into a new CSV file
By using the columns argument with the to_csv() function we can specify the
names of the columns to be copied.
EX:
import pandas as pd
df=pd.read_csv("emp2.csv")
df.to_csv("emp3.csv", columns=['Empid','Name','Age'])    # returns None, so no assignment is needed
df2=pd.read_csv("emp3.csv")
print(df2)
o/P:
Unnamed: 0 Empid Name Age
0 0 101.0 Russi 25.0
1 1 102.0 Rajesh 45.0
2 2 NaN NaN NaN
3 3 103.0 Pritam 12.0
4 4 104.0 Sumit 24.0
5 5 105.0 Israt 23.0
6 6 106.0 Alok 15.0
7 7 107.0 Harsit 16.0
8 8 108.0 Lila 17.0
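The Unnamed: 0 column in the output above appears because to_csv() also wrote the index. If that column is not wanted, columns can be combined with index=False. The sketch below shows this on made-up sample data, writing to a string rather than to a file.

```python
import io
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Empid": [101, 102],
    "Name": ["Russi", "Rajesh"],
    "Age": [25, 45],
    "City": ["Mumbai", "Chennai"],
})

# Copy only two columns and leave the index out of the CSV text.
subset_text = df.to_csv(columns=["Empid", "Name"], index=False)

# Reading the text back gives exactly the selected columns.
df2 = pd.read_csv(io.StringIO(subset_text))
print(df2)
```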
Reading CSV File with specific/selected columns
Along with read_csv() we can use the usecols argument to read specific columns
from a CSV file.
EX:
import pandas as pd
df=pd.read_csv("emp2.csv",usecols=['Name','Age','Salary'])
print(df)
o/P:
Name Age Salary
0 Russi 25.0 25000
1 Rajesh 45.0 35000
2 NaN NaN 26000
3 Pritam 12.0 36000
4 Sumit 24.0 17000
5 Israt 23.0 38000
6 Alok 15.0 36000
7 Harsit 16.0 45000
8 Lila 17.0 10000
Reading CSV File with specific/selected rows
Along with read_csv() we can use the nrows argument to read only the first 'n' rows from a CSV
file.
EX:
import pandas as pd
df=pd.read_csv("emp2.csv",nrows=5)
print(df)
o/P:
Empid Name Age City Salary
0 101.0 Russi 25.0 Mumbai 25000
1 102.0 Rajesh 45.0 Chennai 35000
2 NaN NaN NaN NaN 26000
3 103.0 Pritam 12.0 Delhi 36000
4 104.0 Sumit 24.0 Kolkata 17000
Reading CSV File without header
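Along with read_csv() we can use the header=None argument if we don't want
the first row of the file to be used as the header row of the DataFrame.
pandas then numbers the columns 0, 1, 2, ... and treats the first line as an ordinary data row.
EX:
import pandas as pd
df=pd.read_csv("emp2.csv",header=None)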
print(df)
o/P:
0 1 2 3 4
0 Empid Name Age City Salary
1 101.0 Russi 25.0 Mumbai 25000
2 102.0 Rajesh 45.0 Chennai 35000
3 NaN NaN NaN NaN 26000
4 103.0 Pritam 12.0 Delhi 36000
5 104.0 Sumit 24.0 Kolkata 17000
6 105.0 Israt 23.0 Mumbai 38000
7 106.0 Alok 15.0 Delhi 36000
8 107.0 Harsit 16.0 Pune 45000
9 108.0 Lila 17.0 Goa 10000
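Reading CSV File by Skipping Rows
Along with read_csv() we can use the skiprows argument if we don't want to
read the first 'n' lines of the file. The header line is counted too, so with
skiprows=2 the header and the first record are skipped and the next record is taken as the header.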
EX:
import pandas as pd
df=pd.read_csv("emp2.csv" ,skiprows=2)
print(df)
o/P:
102.0 Rajesh 45.0 Chennai 35000
0 NaN NaN NaN NaN 26000
1 103.0 Pritam 12.0 Delhi 36000
2 104.0 Sumit 24.0 Kolkata 17000
3 105.0 Israt 23.0 Mumbai 38000
4 106.0 Alok 15.0 Delhi 36000
5 107.0 Harsit 16.0 Pune 45000
6 108.0 Lila 17.0 Goa 10000
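In the output above the record for 102 has taken the place of the header, because the header line was one of the skipped lines. If the header should be kept, skiprows also accepts a list of line numbers (counted from 0) to skip. The sketch below compares both forms on made-up sample data held in a string.

```python
import io
import pandas as pd

# Hypothetical sample data; line 0 is the header, lines 1 to 4 are records.
sample_text = "Empid,Name\n101,Russi\n102,Rajesh\n103,Pritam\n104,Sumit\n"

# An integer skips that many lines from the top, header included.
df_int = pd.read_csv(io.StringIO(sample_text), skiprows=2)

# A list skips only the listed line numbers, so the header (line 0) is kept.
df_list = pd.read_csv(io.StringIO(sample_text), skiprows=[1, 2])

print(df_int)
print(df_list)
```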
Reading CSV File Without index
Along with read_csv() we can use the index_col=0 argument to use the first column
of the file (here Empid) as the row index, so the default index numbers 0, 1, 2, ... are not displayed.
EX:
import pandas as pd
df=pd.read_csv("emp2.csv",index_col=0)
print(df)
o/P:
Name Age City Salary
Empid
101.0 Russi 25.0 Mumbai 25000
102.0 Rajesh 45.0 Chennai 35000
NaN NaN NaN NaN 26000
103.0 Pritam 12.0 Delhi 36000
104.0 Sumit 24.0 Kolkata 17000
105.0 Israt 23.0 Mumbai 38000
106.0 Alok 15.0 Delhi 36000
107.0 Harsit 16.0 Pune 45000
108.0 Lila 17.0 Goa 10000
Reading CSV file with new Column Names
If the file already has a header row, we first skip it using the skiprows argument and then
supply new column names with the names argument of the read_csv() function.
EX:
import pandas as pd
df=pd.read_csv("emp2.csv",
               skiprows=1,names=['E_id','E_name','E_age','E_city','E_salary'])
print(df)
o/P:
E_id E_name E_age E_city E_salary
0 101.0 Russi 25.0 Mumbai 25000
1 102.0 Rajesh 45.0 Chennai 35000
2 NaN NaN NaN NaN 26000
3 103.0 Pritam 12.0 Delhi 36000
4 104.0 Sumit 24.0 Kolkata 17000
5 105.0 Israt 23.0 Mumbai 38000
6 106.0 Alok 15.0 Delhi 36000
7 107.0 Harsit 16.0 Pune 45000
8 108.0 Lila 17.0 Goa 10000
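As an alternative to skiprows=1, read_csv() can be told that line 0 is the header with header=0, and the names argument then replaces it. The sketch below compares the two approaches on made-up sample data; the column names in new_names are chosen only for this illustration.

```python
import io
import pandas as pd

# Hypothetical sample data with an existing header line.
sample_text = "Empid,Name,Age\n101,Russi,25\n102,Rajesh,45\n"
new_names = ["E_id", "E_name", "E_age"]

# Approach used in the example above: skip the old header, then name the columns.
df_skip = pd.read_csv(io.StringIO(sample_text), skiprows=1, names=new_names)

# Alternative: header=0 marks line 0 as the header so that names replaces it.
df_header = pd.read_csv(io.StringIO(sample_text), header=0, names=new_names)

print(df_header)
print(df_skip.equals(df_header))
```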