Raw strings

Series: Strings

Trey Hunner

2 min. read • Python 3.10–3.14 • Aug. 12, 2021

Let's talk about raw strings.

Escape sequences can be confusing

Normally backslashes (\) in strings represent escape sequences.

\n represents a newline character:

>>> message = "Hello\nworld"
>>> print(message)
Hello
world

But the way escape sequences are written can sometimes be a problem.

We have a string here that is supposed to represent a Windows file path:

>>> filename = "C:\Users\Nathan"

But when we run this code, we get a SyntaxError because \U and \N both start escape sequences (\U expects eight hexadecimal digits and \N expects a Unicode character name in curly braces) and we're misusing them here.

>>> filename = "C:\Users\Nathan"
  File "<stdin>", line 1
    filename = "C:\Users\Nathan"
                                ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
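For comparison, here's a small sketch of what those two escape sequences look like when they're used as intended. The variable names are just for illustration:

```python
# \N{...} looks up a character by its Unicode name
snowman = "\N{SNOWMAN}"

# \U must be followed by exactly eight hexadecimal digits
snake = "\U0001F40D"

# Each escape sequence produces a single character
print(len(snowman), len(snake))
```

In "C:\Users\Nathan", the \U is followed by "sers" instead of eight hex digits, which is why Python reports a truncated \UXXXXXXXX escape.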

The traditional double backslash fix

We can fix this problem by doubling up our backslashes (using \\ instead of \):

>>> filename = "C:\\Users\\Nathan"
>>> filename
'C:\\Users\\Nathan'
>>> print(filename)
C:\Users\Nathan

This tells Python that we want literal backslash characters, not escape sequences.
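One way to convince yourself that each doubled backslash is really just one character is to count the characters in the string. This is a quick sketch, not something from the original screencast:

```python
filename = "C:\\Users\\Nathan"

# Each \\ in the source code is a single backslash in the actual string
print(len(filename))         # 15
print(filename.count("\\"))  # 2
```

The doubled backslashes we see at the REPL are just how Python displays a string's representation; print shows the actual characters.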

Making strings without escape sequences

But there's another way to fix this backslash problem. If we prefix our string literal with an r, Python will treat every backslash as a literal backslash, just as if we had doubled them up ourselves:

>>> filename = r"C:\Users\Nathan"
>>> filename
'C:\\Users\\Nathan'
>>> print(filename)
C:\Users\Nathan

We've just made a raw string.

A raw string tells Python:

This string doesn't have any escape sequences
Every backslash (\) should be taken literally (not as the start of an escape sequence)
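As a rough illustration of those two points, we can compare a raw string with its doubled-backslash equivalent, and compare the length of "\n" with and without the r prefix:

```python
raw = r"C:\Users\Nathan"
escaped = "C:\\Users\\Nathan"

# Both literals produce exactly the same string
print(raw == escaped)  # True

# Without the r prefix, \n is one newline character
print(len("\n"))   # 1
# With the r prefix, it's a backslash followed by the letter n
print(len(r"\n"))  # 2
```

The r prefix only changes how Python reads the string literal in our code. The resulting object is an ordinary string.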

Raw strings are a way to avoid leaning toothpick syndrome (when your strings become unreadable because there are so many backslashes in them).

Raw strings are often used for regular expressions

Raw strings are often used with regular expressions in Python.

We have a regular expression here (r"\bpython\b") that looks for every use of the word "python", as an entire word (with word boundaries around it):

>>> import re
>>> statement = "I like Python, but I fear pythons"
>>> re.findall(r"\bpython\b", statement, flags=re.IGNORECASE)
['Python']

This re.findall call gave us back a list with a single entry. So, pythons (with an s on the end) isn't matched, but Python (with a comma after it) is matched just fine.

If we remove the r prefix before our regular expression, \b will be treated as an escape sequence (\b represents a backspace in ASCII land).

>>> re.findall("\bpython\b", statement, flags=re.IGNORECASE)
[]

Because each \b is now a backspace character rather than a word boundary, we don't end up with any matches.
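Here's a short sketch that checks what "\b" really is, and shows that doubling up the backslashes would also work (it's just harder to read than the raw string). The names doubled and raw are only for illustration:

```python
import re

statement = "I like Python, but I fear pythons"

# Without the r prefix, \b is the backspace character (code point 8)
print("\b" == "\x08")  # True
# With the r prefix, it's two characters: a backslash and the letter b
print(len(r"\b"))      # 2

# Doubling the backslashes gives the same pattern as the raw string
doubled = re.findall("\\bpython\\b", statement, flags=re.IGNORECASE)
raw = re.findall(r"\bpython\b", statement, flags=re.IGNORECASE)
print(doubled == raw)  # True
```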

Whenever I'm making a regular expression in Python, I always prefix it with an r (just in case).

Summary

Raw strings are a way of making a string in Python that has no escape sequences and instead reads every backslash as a literal backslash.
