Credit goes to www.pythonmorsels.com

Raw strings

Series: Strings
Trey Hunner
2 min. read, Python 3.10 to 3.14

Let's talk about raw strings.

Escape sequences can be confusing

Normally backslashes (\) in strings represent escape sequences.

\n represents a newline character:

>>> message = "Hello\nworld"
>>> print(message)
Hello
world
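
As a rough sketch of what is going on here: each escape sequence takes two characters to type in our source code, but it produces just one character in the resulting string. The variable names below are only for illustration.

```python
# Each escape sequence is typed as two source characters
# but ends up as a single character in the string.
newline = "\n"
tab = "\t"
backslash = "\\"

print(len(newline), len(tab), len(backslash))  # 1 1 1
print(ord(newline), ord(tab), ord(backslash))  # 10 9 92
```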

Escape sequences can sometimes cause problems.

We have a string here that is supposed to represent a Windows file path:

>>> filename = "C:\Users\Nathan"

But when we run this code, we get a SyntaxError because \U and \N both mean something special (\U starts an eight-hex-digit Unicode escape and \N starts a named Unicode character escape) and we're misusing those escape sequences here.

>>> filename = "C:\Users\Nathan"
  File "<stdin>", line 1
    filename = "C:\Users\Nathan"
                                ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

The traditional double backslash fix

We can fix this problem by doubling up our backslashes (using \\ instead of \):

>>> filename = "C:\\Users\\Nathan"
>>> filename
'C:\\Users\\Nathan'
>>> print(filename)
C:\Users\Nathan

This tells Python that we want literal backslash characters, not escape sequences.
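
The doubled backslashes that show up at the REPL can look surprising, so here is a small sketch of the difference between the two views of the same string. The REPL shows the repr of a string (how we would type it as code), while print shows the actual characters. The string itself contains only two backslashes.

```python
filename = "C:\\Users\\Nathan"

# repr shows the string as source code, so each backslash appears doubled
print(repr(filename))  # 'C:\\Users\\Nathan'

# print shows the actual characters in the string
print(filename)  # C:\Users\Nathan

# The string really has 15 characters, 2 of which are backslashes
print(len(filename))  # 15
print(filename.count("\\"))  # 2
```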

Making strings without escape sequences

But there's another way to fix this backslash problem. If we prefix our string literal with an r, Python will take every backslash literally, as if we had doubled up our backslashes ourselves:

>>> filename = r"C:\Users\Nathan"
>>> filename
'C:\\Users\\Nathan'
>>> print(filename)
C:\Users\Nathan

We've just made a raw string.

A raw string tells Python:

  1. This string doesn't have any escape sequences
  2. Every backslash (\) should be taken literally (not as the start of an escape sequence)
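
One way to convince ourselves of those two points is to compare a raw string with its doubled-backslash equivalent. This is just a quick sketch, and the variable names are only for illustration.

```python
raw = r"C:\Users\Nathan"
doubled = "C:\\Users\\Nathan"

# Both spellings produce exactly the same string object contents
print(raw == doubled)  # True

# In a raw string, \n is two characters: a backslash and the letter n
print(len(r"\n"))  # 2

# In a regular string, \n is a single newline character
print(len("\n"))  # 1
```

The r prefix only changes how Python reads the string literal in our code. The result is an ordinary str object.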

Raw strings are a way to avoid leaning toothpick syndrome (when your strings become unreadable because there are so many backslashes in them).

Raw strings are often used for regular expressions

Raw strings are often used with regular expressions in Python.

We have a regular expression here (r"\bpython\b") that looks for every use of the word "python", as an entire word (with word boundaries around it):

>>> import re
>>> statement = "I like Python, but I fear pythons"
>>> re.findall(r"\bpython\b", statement, flags=re.IGNORECASE)
['Python']

This re.findall call gave us back a list with a single entry. So, pythons (with an s on the end) isn't matched, but Python (with a comma after it) is matched just fine.

If we remove the r prefix before our regular expression, \b will be treated as an escape sequence (\b represents a backspace in ASCII land).

>>> re.findall("\bpython\b", statement, flags=re.IGNORECASE)
[]

Because each \b is read as a backspace character rather than a word boundary, we don't end up getting any matches.
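
To see why the non-raw pattern fails, here is a short sketch comparing the three ways we might spell that pattern. The names raw_matches, plain_matches, and escaped_matches are only for illustration.

```python
import re

statement = "I like Python, but I fear pythons"

# Without the r prefix, \b is a single backspace character (ASCII 8)
print("\b" == "\x08")  # True

# With the r prefix, \b is two characters: a backslash and the letter b
print(len(r"\b"))  # 2

# Raw string: the re module receives \b and treats it as a word boundary
raw_matches = re.findall(r"\bpython\b", statement, flags=re.IGNORECASE)
print(raw_matches)  # ['Python']

# Regular string: the re module looks for literal backspace characters
plain_matches = re.findall("\bpython\b", statement, flags=re.IGNORECASE)
print(plain_matches)  # []

# Doubled backslashes work too, but they're harder to read
escaped_matches = re.findall("\\bpython\\b", statement, flags=re.IGNORECASE)
print(escaped_matches)  # ['Python']
```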

Whenever I'm making a regular expression in Python, I always prefix it with an r (just in case).

Summary

Raw strings are a way of making a string in Python that has no escape sequences and instead reads every backslash as a literal backslash.
