Seeking in files

Series: Files
Trey Hunner
5 min. read 4 min. video Python 3.10—3.14
Python Morsels

What does it mean to seek in a file?

Python keeps track of your file position

Here we have a text file called my_file.txt:

This is my file 👋
This is line 2
And this is line 3
This is the last line

When we open this file in Python and read from it, we'll start our reading from the beginning:

>>> f = open("my_file.txt")
>>> f.read()
'This is my file 👋\nThis is line 2\nAnd this is line 3\nThis is the last line\n'

But if we read again, we won't get anything back:

>>> f.read()
''

This is because we're now at the end of our file.

Python keeps track of the position that we're at within a file as we read from files (and as we write to files).

Re-positioning back to the beginning

You can change your position within a file by using the seek method which accepts a byte offset. Here we're seeking back to the first byte in our file (byte 0):

>>> f.seek(0)
0

If we start reading again, we'll start reading from the beginning of our file again:

>>> f.read()
'This is my file 👋\nThis is line 2\nAnd this is line 3\nThis is the last line\n'
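Rewinding with seek(0) is handy whenever we need two passes over the same open file. Here's a minimal sketch of that idea, assuming a throwaway temporary file (the file contents and the variable names are made up for illustration):

```python
import tempfile

# Create a throwaway file so this example is self-contained.
with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as f:
    f.write("first\nsecond\nthird\n")

    f.seek(0)                       # rewind after writing
    line_count = sum(1 for _ in f)  # first pass: count the lines

    f.seek(0)                       # rewind again
    first_line = f.readline()       # second pass: read from the start

print(line_count, repr(first_line))
```

Without the second seek(0), the readline call would return an empty string because the first pass left us at the end of the file.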

Seeking is a byte-wise operation

Seeking back to the beginning of a file with seek(0) is by far the most common use of seek.

When you're working with a file opened in text mode (the default mode) you'll pretty much only ever use seek with a position of 0.

If you seek to an arbitrary position in a text file, you might get an error while reading from your file:

>>> with open("my_file.txt", mode="rt") as f:
...     f.seek(17)
...     contents = f.read()
...
17
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9f in position 0: invalid start byte

We got an error above because seek(17) moved our position to the middle of a multi-byte character in our file (characters can be represented by multiple bytes).

If we seek to an arbitrary position in a binary file, we shouldn't ever get an error when reading:

>>> with open("my_file.txt", mode="rb") as f:
...     f.seek(17)
...     contents = f.read()
...
17
>>> contents
b'\x9f\x91\x8b\nThis is line 2\nAnd this is line 3\nThis is the last line\n'

We don't get an error when seeking in a binary file because seeking is a byte-wise operation, and reading in a binary file is also a byte-wise operation.
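If we do need to land on a character boundary, one option is to measure how many bytes the preceding text takes up once it's encoded. This is only a sketch: it assumes the file is UTF-8 encoded and that we already know the text that comes before the spot we want (the path and variable names are illustrative):

```python
import os
import tempfile

text = "This is my file 👋\nThis is line 2\n"

# Write the text as UTF-8 bytes to a throwaway file.
path = os.path.join(tempfile.mkdtemp(), "my_file.txt")
with open(path, mode="wb") as f:
    f.write(text.encode("utf-8"))

# Byte offsets come from the encoded text, not from the character count.
emoji_start = len("This is my file ".encode("utf-8"))
emoji_length = len("👋".encode("utf-8"))

with open(path, mode="rb") as f:
    f.seek(emoji_start)                 # first byte of the emoji
    emoji_bytes = f.read(emoji_length)  # every byte of the emoji

print(emoji_start, emoji_length, emoji_bytes.decode("utf-8"))
```

The emoji starts at byte 16 and is 4 bytes long, which is why seek(17) landed in the middle of it.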

The seek method's whence argument

The seek method also supports a whence argument. This whence argument represents from whence we seek in a file:

whence  seeks from                          typical offset
0       beginning (default)                 non-negative offset
1       current position                    any offset
2       end                                 zero or negative offset

The default value of whence is 0 which represents the beginning of the file.

Here we have a text file called readings.txt:

20220402A000308
20220402B000043
20220402N000762
20220402A000309
20220402F000025
20220402B000057
20220402A000321

Let's seek 16 bytes into our file (from the beginning) to move to the beginning of the second line:

>>> f = open("readings.txt", mode="rb")
>>> f.seek(16, 0)
16

And then we'll seek 9 bytes forward from our current position (whence of 1):

>>> f.seek(9, 1)
25

That was a forward seek because 9 is a positive number.

Then we read 6 bytes:

>>> f.read(6)
b'000043'

And then we'll seek to position -7 from the end of our file:

>>> f.seek(-7, 2)
105

That's 7 bytes before the end of our file (negative numbers seek backward from the whence location).

Then we read 6 bytes, so we're reading those last bytes in our file:

>>> f.read(6)
b'000321'
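The numbers 0, 1, and 2 also have named equivalents in the os module: os.SEEK_SET, os.SEEK_CUR, and os.SEEK_END. Here's a rough sketch that uses them to jump straight to fixed-width records like the ones in readings.txt. The read_record helper and the RECORD_SIZE constant are hypothetical names invented for this example:

```python
import os
import tempfile

RECORD_SIZE = 16  # 15 characters plus a newline (an assumption about the data)

lines = [
    "20220402A000308", "20220402B000043", "20220402N000762",
    "20220402A000309", "20220402F000025", "20220402B000057",
    "20220402A000321",
]
path = os.path.join(tempfile.mkdtemp(), "readings.txt")
with open(path, mode="wb") as f:
    f.write("".join(line + "\n" for line in lines).encode("ascii"))


def read_record(f, index):
    """Return record number `index` (counting from 0) without its newline."""
    f.seek(index * RECORD_SIZE, os.SEEK_SET)
    return f.read(RECORD_SIZE).rstrip(b"\n")


with open(path, mode="rb") as f:
    second = read_record(f, 1)
    f.seek(-RECORD_SIZE, os.SEEK_END)  # back up one record from the end
    last = f.read(RECORD_SIZE).rstrip(b"\n")

print(second, last)
```

This only works because every record takes up the same number of bytes, so a record's position can be computed instead of searched for.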

You can't use whence with text files

It's really uncommon to see whence specified when file-seeking unless you're working with a file in binary mode.

For text files:

  • A whence of 1 is only valid with a position of 0 (which doesn't move us anywhere)
  • A whence of 2 is only valid with a position of 0 (meaning the very end of our file)
  • And 0 is the default whence value, so it doesn't need to be specified

So you'll almost never see whence specified for files opened in text mode.
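Here's a small sketch of those text-mode restrictions in action, using a throwaway temporary file (the contents and variable names are illustrative). Seeking to the very end works, while a non-zero end-relative seek raises io.UnsupportedOperation:

```python
import io
import tempfile

with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as f:
    f.write("line 1\nline 2\n")

    f.seek(0, io.SEEK_END)  # allowed: jump to the very end of the file
    rest = f.read()         # nothing is left to read from there

    try:
        f.seek(-7, io.SEEK_END)  # not allowed in text mode
    except io.UnsupportedOperation as error:
        message = str(error)

print(repr(rest), message)
```

Opening the same file in binary mode (mode="rb") would make that second seek valid.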

Finding your position with the tell method

Notice that every time we've called seek, we always get a number back. That number represents our byte offset relative to the beginning of our file.

>>> f.seek(-7, 2)
105

To see our current byte offset (without also changing it) we can use the tell method.

Here we're reading one line in our file, and then checking to see where we ended up:

>>> with open("my_file.txt") as f:
...     line = f.readline()
...     position = f.tell()
...
>>> position
21

We ended up at byte 21.

If we opened our file again and we knew the file hadn't changed, we could seek to position 21 and start reading again:

>>> with open("my_file.txt") as f:
...     f.seek(position)
...     contents = f.read()
...
21

When we started reading the second time, we started from the second line in our file:

>>> contents
'This is line 2\nAnd this is line 3\nThis is the last line\n'

We were able to skip over the first line because we knew the position we ended up at after reading that first line.

It's most common to specify a position of 0 to seek, but any position returned by tell is also a valid position for seek. So if you read characters from a text file and then use tell to find your position, you can confidently seek back to that same position later.
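One way to put that tell-then-seek pairing to use is to "peek" at the next line without consuming it. This is just a sketch, and peek_line is a hypothetical helper name rather than anything built into Python:

```python
import tempfile


def peek_line(f):
    """Return the next line without advancing the file position."""
    position = f.tell()  # remember where we are
    line = f.readline()
    f.seek(position)     # go back to where we were
    return line


with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as f:
    f.write("This is my file 👋\nThis is line 2\n")
    f.seek(0)

    f.readline()           # consume the first line
    peeked = peek_line(f)  # look at the second line...
    actual = f.readline()  # ...and then read it for real

print(peeked == actual)
```

The position we pass to seek came straight from tell, so it's safe to use even though the first line contains a multi-byte character.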

One strange thing about the tell method is that it doesn't work if you loop over your file.

Here we're looping over a file to get two lines from it, and then we're using tell to ask what position we ended up at:

>>> with open("my_file.txt") as f:
...     first_two = []
...     for n, line in enumerate(f, start=1):
...         first_two.append(line)
...         if n >= 2:
...             break
...     stopped_at = f.tell()
...
Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
OSError: telling position disabled by next() call

The tell method raised an error.

In order to use tell when looping line-by-line, we would need to loop manually by repeatedly calling the readline method on our file. This design decision is a bit unfortunate, but it's just part of how files work in Python.
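A manual readline loop might look something like this. It's a sketch against a throwaway temporary file, and the variable names simply mirror the failing example above:

```python
import tempfile

with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as f:
    f.write("line 1\nline 2\nline 3\nline 4\n")
    f.seek(0)

    first_two = []
    while len(first_two) < 2:
        line = f.readline()
        if not line:  # an empty string means we hit the end of the file
            break
        first_two.append(line)
    stopped_at = f.tell()  # works because we never iterated over the file

    f.seek(stopped_at)
    rest = f.read()

print(first_two, repr(rest))
```

Because readline doesn't go through the file's iterator, tell stays enabled and the saved position can be passed to seek later.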

Summary

When you read from files or write to files, Python will keep track of the position you're at within your file.

To change your position in a file, you can use the seek method. To see your current position in a file, you can use the tell method.

For files opened in text mode, you'll almost always use the seek method with a position of 0. An offset of 0 represents the beginning of the file.

