File Handling in Python
Introduction
• A file is a collection of data stored on a secondary storage device such as a hard disk.
• So far we have been processing data entered through the keyboard using input().
• Reading from the keyboard becomes tedious when there is a large amount of data to be
processed.
• A better solution is to combine all the input data into a file and then read the data
from the file whenever it is required.
• When a program is being executed, its data is stored in random access memory (RAM).
RAM can be accessed quickly by the CPU, but it is volatile.
• Volatile means that when the program ends or the computer shuts down, all the data is
lost.
• To keep the data, we need to store it on non-volatile (permanent) storage media such as
a hard disk, USB drive, or DVD.
What is a file?
• A file is a named location on a secondary (non-volatile) storage device or medium in which
a collection of data is stored.
Example: how do we use a notebook? We open it, read or write in it, and finally close it.
• The notebook concept applies to files: we first open a file, read from or write to it,
and then finally close it.
• Hence, in Python, a file operation takes place in the following order:
1. Open a file 2. Read or write (perform the operation) 3. Close the file
• Files are used in real-life applications that involve huge amounts of data. In such
situations console-oriented I/O operations pose two major problems:
1. It becomes cumbersome and time-consuming to handle huge amounts of data through the terminal.
2. With terminal I/O, all the data is lost when either the program or the computer is
terminated. It becomes necessary to store the data on permanent storage and read it whenever
necessary without destroying it.
• Data is read from or written to a file with I/O operations much like those used with the terminal.
What is Path in File System?
• A file is a collection of data stored on a secondary storage device such as a hard disk
in such a way that it can be easily retrieved as and when required.
• File systems store files in a tree (hierarchical) structure.
• Tree structure: at the top of the tree are one or more root nodes. Under the root node
there are files and folders (directories), and each folder can in turn contain other files
and folders. This nesting can go on to an almost limitless depth. The type of a file is
indicated by its extension.
• Every file is identified by its path, which begins from the root node or root folder
(for example, C:\).
• In Windows, C:\ is the root folder of the C drive, but files can also be accessed from
other drives such as D:\, E:\, etc.
• The file path is also known as the pathname. The delimiter that separates the folder
names is specific to the file system; on Windows it is the backslash (\).
[Figure: example directory tree. Root: C:\; next level: Academics, Administration; next
level: PUC, Engg; next level: 1, 2; file: filePath.py]
• C:\Academics\PUC\2\filePath.py
Relative and Absolute Path
• A file path can be either relative or absolute.
• An absolute path always contains the root and the complete directory list needed to specify
the exact location of the file.
• A relative path, on the other hand, needs to be combined with another path in order to
access a file. Relative pathnames start from the program's current working directory.
• For example: C:\Academics\PUC\2\filePath.py is an absolute path; 2\filePath.py is a relative path.
• Note: when a relative file path is specified, it is joined with the current working
directory to create an absolute file path.
• For example: if C:\Academics\PUC is the current working directory, then the relative path
2\filePath.py is equivalent to the absolute path C:\Academics\PUC\2\filePath.py.
• If you use a relative file path from the wrong directory, then either the wrong file or no
file will be accessed.
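The joining of a relative path with the current working directory can be sketched with the os.path module. This is a minimal illustration only; the relative path reuses the example names above and does not need to exist on disk.

```python
import os

# Hypothetical relative path, mirroring the example above.
relative_path = os.path.join("2", "filePath.py")

# Relative paths are resolved against the current working directory.
cwd = os.getcwd()

# abspath() joins the relative path with the current working directory.
absolute_path = os.path.abspath(relative_path)

print(os.path.isabs(relative_path))  # False
print(os.path.isabs(absolute_path))  # True
print(absolute_path)                 # depends on where the program is run
```

Because the result depends on the current working directory, running the same program from a different folder produces a different absolute path.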
Types of files(text & binary)
• Python supports two types of files: text files and binary files.
• A text file is a stream of characters that can be processed sequentially by a computer
in the forward direction.
• A text file holds only characters. When data is read or written, newline characters may
be converted to or from carriage-return/linefeed combinations.
• Another important point is that when a text file is used, there are actually two
representations of data: internal and external.
• For example, an integer value is represented internally as a binary number (occupying
2 or 4 bytes of memory in many languages), but externally it is represented as a string of
characters giving its decimal or hexadecimal value.
• In a text file, each line of data ends with a newline character, and the end of the file
is indicated by an end-of-file (EOF) marker.
Binary File
• A binary file is a file that may contain any type of data, encoded in binary form for computer
storage and processing purposes.
• Binary files include word processing documents, PDFs, images, spreadsheets, videos, zip files,
and executable programs.
• Like a text file, a binary file is a collection of bytes. A binary file is also referred to as a
byte stream, with the following two essential differences:
• A binary file does not require any special processing of the data; each byte of data is transferred
to or from the disk unprocessed.
• Python places no structure on the file, and it may be read from or written to in any manner the
programmer wants.
• While a text file is processed sequentially, a binary file can be processed either sequentially
or randomly, depending on the needs of the application.
• Binary files store data in the internal representation format. An integer value that is stored in
2 bytes in binary form takes one byte per digit in text form (3 bytes for the value 123). A binary
file also ends with an EOF marker.
• A text file contains only basic characters and does not store any information about the color,
font, or size of the text, whereas binary files are mainly used to store data beyond plain text,
such as images and executables.
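The difference between the external (text) and internal (binary) representation of an integer can be sketched as follows. The file names and the value 12345 are assumptions chosen for illustration, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

number = 12345  # illustrative value that fits in 2 bytes

with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = os.path.join(tmp_dir, "number.txt")
    binary_path = os.path.join(tmp_dir, "number.bin")

    # Text form: the integer is written as the characters '1', '2', '3', '4', '5'.
    with open(text_path, "w", encoding="utf-8") as text_file:
        text_file.write(str(number))

    # Binary form: the integer is written as 2 raw bytes (big-endian).
    with open(binary_path, "wb") as binary_file:
        binary_file.write(number.to_bytes(2, byteorder="big"))

    text_size = os.path.getsize(text_path)      # one byte per digit
    binary_size = os.path.getsize(binary_path)  # fixed 2 bytes

    # Reading the raw bytes back restores the original integer.
    with open(binary_path, "rb") as binary_file:
        restored = int.from_bytes(binary_file.read(), byteorder="big")

print(text_size, binary_size, restored)  # 5 2 12345
```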
Opening a file
• Before reading or writing a file, we must first open it using Python's built-in open()
function. This function creates a file object, which is used to invoke the methods
associated with it.
• The syntax: fileObj = open(file_name [, access_mode])
• Here file_name is a string value that specifies the name of the file that you want to access.
• access_mode indicates the mode in which the file has to be opened, i.e., read, write,
append, etc.
• The open() function returns a file object. This file object is used to read, write, or
perform any other operation on the file. It works like a file handle.
• access_mode is an optional parameter, and the default file access mode is read ('r').
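As a small sketch of the syntax above, the following opens a file without an access mode and checks that the default mode is 'r'. The file name notes.txt is hypothetical, and the file is created in a temporary directory first so that it can be opened for reading.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    # Create the file first so that it can be opened for reading.
    file_obj = open(file_name, "w")
    file_obj.write("hello")
    file_obj.close()

    # No access mode is given, so the file is opened in the default mode 'r'.
    file_obj = open(file_name)
    default_mode = file_obj.mode
    content = file_obj.read()
    file_obj.close()

print(default_mode)  # r
print(content)       # hello
```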
Access Modes
Mode       Purpose
r / rb     Opens the file for reading only (rb: in binary format). r is the default mode. The file
           pointer is placed at the beginning of the file.
r+ / rb+   Opens the file for both reading and writing (rb+: in binary format). The file pointer is
           placed at the beginning of the file.
w / wb     Opens the file for writing only (wb: in binary format). If the file does not exist, a
           new file is created for writing; if the file already exists, its contents are overwritten.
w+ / wb+   Opens the file for both reading and writing (wb+: in binary format). If the file does
           not exist, a new file is created for reading and writing; if the file already exists,
           its contents are overwritten.
a / ab     Opens the file for appending (ab: in binary format). The file pointer is placed at the
           end of the file if the file exists; if the file does not exist, a new file is created
           for writing.
a+ / ab+   Opens the file for both reading and appending (ab+: in binary format). The file pointer
           is placed at the end of the file if the file exists; if the file does not exist, a new
           file is created for reading and writing.
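The behaviour of a few of these modes can be compared side by side. The sketch below uses a hypothetical file modes.txt in a temporary directory and records what the file contains after each step.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes.txt")  # hypothetical file name

    # 'r' on a missing file raises an error instead of creating the file.
    try:
        open(path, "r")
        missing_file_error = False
    except FileNotFoundError:
        missing_file_error = True

    # 'w' creates the file, or erases its contents if it already exists.
    with open(path, "w") as f:
        f.write("first")
    with open(path, "w") as f:
        f.write("second")
    with open(path) as f:
        after_w = f.read()       # 'second'

    # 'a' keeps the existing contents and writes at the end.
    with open(path, "a") as f:
        f.write(" third")
    with open(path) as f:
        after_a = f.read()       # 'second third'

    # 'r+' starts at the beginning and overwrites in place without erasing the rest.
    with open(path, "r+") as f:
        f.write("SECOND")
    with open(path) as f:
        after_r_plus = f.read()  # 'SECOND third'

print(missing_file_error, after_w, after_a, after_r_plus)
```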
The File Object Attributes
• Once a file is successfully opened, a file object is returned. Using this object, you
can easily access different types of information related to the file.
• This information can be obtained by reading the values of specific attributes of the file object.
Attribute        Information Obtained
fileObj.closed   True if the file is closed and False otherwise
fileObj.mode     The access mode with which the file was opened
fileObj.name     The name of the file
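These attributes can be inspected directly on a file object. A minimal sketch, assuming a hypothetical file info.txt created in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "info.txt")  # hypothetical file name

    file_obj = open(path, "w")
    name_matches = (file_obj.name == path)  # name holds the path passed to open()
    mode = file_obj.mode                    # 'w'
    closed_before = file_obj.closed         # False while the file is open

    file_obj.close()
    closed_after = file_obj.closed          # True once close() has been called

print(name_matches, mode, closed_before, closed_after)  # True w False True
```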
The close() Method
• The close() method, as the name suggests, is used to close the file object. Once a file
object is closed, you cannot read from or write to the file associated with it.
• While closing the file object, close() flushes any unwritten information.
• Python automatically closes a file when the reference to its file object is reassigned to
another file, but as a good programming habit you should always explicitly use the
close() method to close a file. The syntax is fileObj.close()
• Once the file is closed using the close() method, any attempt to use the file object will
result in an error.
• Python has a garbage collector to clean up unreferenced objects, but it is still our
responsibility to close the file and release the resources consumed by it.
• Simply calling close() after the file operations is not entirely safe. If an exception
occurs while we are performing some operation with the file, the code exits without closing
the file.
• A safer way is to use a try...finally block.
• This way, we are guaranteed that the file is properly closed even if an exception is
raised that causes the program flow to stop.
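The try...finally pattern described above might look like the following sketch. The ValueError is raised deliberately to simulate a failure, and the file name safe.txt is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "safe.txt")  # hypothetical file name

    file_obj = open(path, "w")
    try:
        file_obj.write("some data")
        # Simulate a failure while working with the file.
        raise ValueError("something went wrong")
    except ValueError as error:
        message = str(error)
    finally:
        # Runs whether or not an exception occurred.
        file_obj.close()

    closed = file_obj.closed  # True

    # Using a closed file object raises ValueError.
    try:
        file_obj.write("more")
        write_after_close_failed = False
    except ValueError:
        write_after_close_failed = True

print(closed, write_after_close_failed)  # True True
```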
Reading and Writing Files
• The read() and write() methods are used to read data from a file and write data to a file, respectively.
write()
• The write() method is used to write a string to an already opened file. The string may include
numbers, special characters, or other symbols.
• While writing data to a file, remember that the write() method does not add a newline ('\n')
to the end of the string.
The syntax of write(): file.write(string)
• As per the syntax, the string passed as an argument to write() is written into the opened file.
The file must have been opened in a mode that allows writing, such as write 'w', append 'a', or
exclusive creation 'x'.
• We need to be careful with the 'w' mode, as it overwrites the file if it already exists. All
previous data is erased.
• Writing a string (or a sequence of bytes, for binary files) is done using the write() method. In
Python 3 this method returns the number of characters (or bytes) written to the file; in Python 2
it returned None.
• For example, opening 'test.txt' in 'w' mode creates a new file named 'test.txt' if it does not
exist. If it does exist, it is overwritten.
• The writelines() method is used to write a list of strings to the file. Like write(), it does
not add newline characters.
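A short sketch of write() and writelines(), assuming Python 3 and a hypothetical file test.txt placed in a temporary directory. Note that the newline characters have to be supplied explicitly.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.txt")

    with open(path, "w") as f:
        count = f.write("Hello")                # returns the number of characters written
        f.write(" World\n")                     # the newline must be added by hand
        f.writelines(["line 2\n", "line 3\n"])  # writes each string in the list as-is

    with open(path) as f:
        content = f.read()

print(count)          # 5
print(repr(content))  # 'Hello World\nline 2\nline 3\n'
```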
Appending Data to a File
• Once you have stored some data in a file, you can always open that file again to write
more data, that is, to append data to it.
• File objects have no append() method. To append to a file, you must open it in 'a' or
'ab' mode, depending on whether it is a text file or a binary file. If you instead open
the file in 'w' or 'wb' mode and start writing data into it, its existing contents are
overwritten.
• Opening a file in 'a' or 'ab' mode adds more data to the data already stored in the
file. Appending is especially useful when creating a log of events or combining a large
set of data into one file.
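The event-log use case can be sketched with a small helper. The function name append_event and the file name events.log are hypothetical, introduced only for this illustration.

```python
import os
import tempfile


def append_event(log_path, message):
    """Hypothetical helper: add one line to the end of a log file."""
    # 'a' mode creates the file if needed and never erases existing lines.
    with open(log_path, "a") as log_file:
        log_file.write(message + "\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "events.log")

    append_event(log_path, "program started")
    append_event(log_path, "file processed")
    append_event(log_path, "program finished")

    with open(log_path) as log_file:
        events = log_file.read().splitlines()

print(events)  # ['program started', 'file processed', 'program finished']
```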
The read() and readline()Methods
• The read() method is used to read a string from an already opened file. The string can
include letters, numbers, special characters, or other symbols.
• The syntax of the read() method is file.read([count])
• count is an optional parameter which, if passed to read(), specifies the number of
characters (bytes, for a binary file) to be read from the opened file.
• The read() method starts reading from the current position of the file pointer, which is
the beginning of a newly opened file. If count is missing or has a negative value, it reads
the entire remaining contents of the file.
• Note: if you try to open a file for reading that does not exist, you will get an error.
• The string returned by read() contains each newline as '\n'.
• The readline() method is used to read a single line from the file. It returns an empty
string when the end of the file has been reached.
• A blank line is represented by '\n', so readline() returns a string containing only the
newline character when a blank line is encountered in the file.
• After a line is read with readline(), the file pointer automatically moves to the next
line.
• The readlines() method reads all the lines in the file and returns them as a list of
strings; list(fileObj) gives the same result. read() returns the whole content as a single string.
• We can change the current file cursor (position) using the seek() method. Similarly,
the tell() method returns the current position (in number of bytes).
• We can read a file line by line using a for loop. This is both efficient and fast.
• read() and readline() return an empty string when end-of-file (EOF) is reached.
• That is, if you have read the entire file and then call readline() again, an empty
string is returned.
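The reading methods can be tried on a small file that contains a blank line. This is a sketch with a hypothetical file lines.txt; the comments show what each call returns for this particular content.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lines.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("alpha\n\nbeta\n")

    with open(path) as f:
        first_three = f.read(3)      # 'alp'    (count limits the characters read)
        rest_of_line = f.readline()  # 'ha\n'   (continues from the current position)
        blank_line = f.readline()    # '\n'     (a blank line is just a newline)
        last_line = f.readline()     # 'beta\n'
        at_eof = f.readline()        # ''       (empty string signals end of file)

    with open(path) as f:
        all_lines = f.readlines()    # ['alpha\n', '\n', 'beta\n']

    # A for loop reads the file one line at a time.
    with open(path) as f:
        stripped = [line.rstrip("\n") for line in f]

print(stripped)  # ['alpha', '', 'beta']
```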
Using with keyword
• It is a good programming habit to use the with keyword when working with file objects.
• It ensures that the file is properly closed after it is used, even if an error occurs
during a read or write operation or when you forget to close the file explicitly.
• When a file is opened using the with keyword, it is automatically closed when the with
block ends, for example after a for loop inside the block is over.
• When a file is opened without the with keyword, it is not automatically closed; we need
to close it explicitly after using it.
• When you open a file for reading or writing, the file is searched for in the current working
directory. If the file exists somewhere else, then you need to specify the path of the file.
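The two cases, with and without the with keyword, can be compared in a short sketch. The file name colors.txt is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "colors.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("red\ngreen\n")

    # Opened using with: closed automatically when the block ends.
    lines = []
    with open(path) as file_obj:
        for line in file_obj:
            lines.append(line)
    closed_after_with = file_obj.closed  # True

    # Opened without with: stays open until close() is called explicitly.
    file_obj = open(path)
    first_line = file_obj.readline()
    closed_before_close = file_obj.closed  # False
    file_obj.close()
    closed_after_close = file_obj.closed   # True

print(closed_after_with, closed_before_close, closed_after_close)  # True False True
```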
Splitting words
• Python allows you to read line(s) from a file and split each line (treated as a string)
based on a separator character, using the string method split().
• By default the separator is whitespace, but you can specify any other character to split
the words in the string.
• Other methods that are useful when working with files:
• fileno() – returns the file number of the file, which is an integer file descriptor
• flush() – flushes the write buffer of the file stream
• isatty() – returns True if the file stream is interactive and False otherwise
• truncate(n) – resizes the file to n bytes
• rstrip() – a string method that strips whitespace, including newline characters, from the
right side of a string read from the file (lstrip() strips from the left side)
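Splitting lines and the helper methods listed above can be exercised together. The following is a sketch only; the file names and contents are assumptions made for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "words.txt")  # hypothetical file name

    with open(path, "w") as f:
        f.write("red green blue\n1,2,3\n")

    with open(path) as f:
        # split() with no argument splits on whitespace and drops the newline.
        words = f.readline().split()                    # ['red', 'green', 'blue']
        # rstrip() removes the trailing newline before splitting on commas.
        numbers = f.readline().rstrip("\n").split(",")  # ['1', '2', '3']
        descriptor_is_int = isinstance(f.fileno(), int)
        interactive = f.isatty()                        # False for a regular file

    # flush() pushes buffered data out so the file size reflects the write.
    buffered_path = os.path.join(tmp_dir, "buffered.txt")
    with open(buffered_path, "w") as f:
        f.write("abc")
        f.flush()
        size_after_flush = os.path.getsize(buffered_path)  # 3

    # truncate(3) cuts the file down to its first 3 bytes.
    with open(path, "r+") as f:
        f.truncate(3)
    with open(path) as f:
        truncated = f.read()                            # 'red'

print(words, numbers, truncated)
```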
File Positions
• The file management system associates a pointer, often known as the file pointer, with each
file. It facilitates movement across the file for reading and writing data.
• The tell() method tells the current position within the file at which the next read or
write operation will occur, given as the number of bytes from the beginning of the file.
• The seek(offset[, from]) method is used to set the position of the file pointer or, in
simpler terms, to move the file pointer to a new location.
• The offset argument indicates the number of bytes to be moved, and the from argument
specifies the reference position from where the bytes are to be moved.
• Values of the from argument:
0 – From the beginning of the file
1 – From the current position of the file
2 – From the end of the file
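The three reference positions can be tried on a small binary file. The sketch opens the file in 'rb' mode because text-mode files only support seeking relative to the beginning (apart from zero offsets). In Python's own documentation the second argument of seek() is called whence; the file name digits.bin is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "digits.bin")  # hypothetical file name

    with open(path, "wb") as f:
        f.write(b"0123456789")

    with open(path, "rb") as f:
        start = f.tell()         # 0: a newly opened file starts at the beginning

        f.seek(4)                # 4 bytes from the beginning (from defaults to 0)
        from_start = f.read(2)   # b'45'
        position = f.tell()      # 6: the pointer moved past the bytes just read

        f.seek(-3, 1)            # 3 bytes back from the current position
        from_current = f.read(1) # b'3'

        f.seek(-2, 2)            # 2 bytes before the end of the file
        from_end = f.read()      # b'89'

print(start, from_start, position, from_current, from_end)
```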
Renaming and deleting Files
• To perform file-processing operations such as renaming and deleting files, use the methods
defined in the os module.
• The rename() method takes two arguments: the current filename and the new filename.
• The remove() method takes a filename as an argument and deletes that file.
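Renaming and deleting can be sketched with os.rename() and os.remove(). The file names draft.txt and final.txt are hypothetical, and a temporary directory keeps the example from touching real files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    old_name = os.path.join(tmp_dir, "draft.txt")  # hypothetical file names
    new_name = os.path.join(tmp_dir, "final.txt")

    with open(old_name, "w") as f:
        f.write("data")

    # rename(current_name, new_name)
    os.rename(old_name, new_name)
    old_exists_after_rename = os.path.exists(old_name)  # False
    new_exists_after_rename = os.path.exists(new_name)  # True

    # remove(file_name) deletes the file.
    os.remove(new_name)
    new_exists_after_remove = os.path.exists(new_name)  # False

print(old_exists_after_rename, new_exists_after_rename, new_exists_after_remove)
```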
Practice Problems
1. Write a program that copies the first 10 bytes of a binary file into another file.
2. Write a program that copies one Python script into another in such a way that all
comment lines are skipped and not copied into the destination file.
3. Write a program that accepts a filename as input from the user, opens the file, and
counts the number of times a given character appears in the file.
4. Write a program that reads data from a file and calculates the percentage of vowels
and consonants in the file.