Let's talk about raw strings.
Normally backslashes (\) in strings represent escape sequences.
\n represents a newline character:
>>> message = "Hello\nworld"
>>> print(message)
Hello
world
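As a small sketch of that idea (the variable names here are just for illustration), an escape sequence takes two characters to type but produces a single character in the resulting string:

```python
# An escape sequence is two characters in source code
# but just one character in the string itself.
newline = "\n"
tab = "\t"

message = "Hello\nworld"
lines = message.splitlines()

print(len(newline))   # 1
print(len(tab))       # 1
print(len(message))   # 11 (five letters, one newline, five letters)
print(lines)          # ['Hello', 'world']
```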
Escape sequences can be a problem sometimes.
Interpreting backslashes as the start of an escape sequence can sometimes cause problems.
We have a string here that is supposed to represent a Windows file path:
>>> filename = "C:\Users\Nathan"
But when we run this code, we get a SyntaxError because \U and \N both mean something special, and we're misusing those escape sequences here.
>>> filename = "C:\Users\Nathan"
File "<stdin>", line 1
filename = "C:\Users\Nathan"
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
We can fix this problem by doubling up our backslashes (using \\ instead of \):
>>> filename = "C:\\Users\\Nathan"
>>> filename
'C:\\Users\\Nathan'
>>> print(filename)
C:\Users\Nathan
This tells Python that we want literal backslash characters, not escape sequences.
But there's another way to fix this backslash problem.
If we prefix our string literal with an r, Python will treat every backslash as a literal backslash, just as if we had doubled them up ourselves:
>>> filename = r"C:\Users\Nathan"
>>> filename
'C:\\Users\\Nathan'
>>> print(filename)
C:\Users\Nathan
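To check that both spellings really make the same string, here's a minimal sketch (the names escaped and raw are illustrative). Note that the doubled backslashes we see in the REPL are just how Python displays the string: the string itself contains single backslashes.

```python
# Two ways of writing the same string literal.
escaped = "C:\\Users\\Nathan"
raw = r"C:\Users\Nathan"

print(escaped == raw)    # True
print(len(raw))          # 15
print(raw.count("\\"))   # 2 (the string holds two backslash characters)
print(repr(raw))         # 'C:\\Users\\Nathan' (the repr doubles them for display)
```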
We've just made a raw string.
A raw string tells Python that each backslash (\) should be taken literally (not as the start of an escape sequence).
Raw strings are a way to avoid leaning toothpick syndrome (when your strings become unreadable because there are so many backslashes in them).
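Here's a rough illustration of how those toothpicks pile up. This sketch assumes we want a regular expression that matches one literal backslash, which needs an escaped backslash inside the pattern itself (the variable names are made up for this example):

```python
import re

path = r"C:\Users\Nathan"

# A regular expression needs \\ to match one literal backslash.
# Without a raw string, each of those backslashes must be doubled again.
escaped_pattern = "\\\\"
raw_pattern = r"\\"

parts = re.split(raw_pattern, path)

print(escaped_pattern == raw_pattern)   # True
print(parts)                            # ['C:', 'Users', 'Nathan']
```

Both patterns are identical two-character strings, but the raw version is much easier to read.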
Raw strings are often used with regular expressions in Python.
We have a regular expression here (r"\bpython\b") that looks for every use of the word "python", as an entire word (with word boundaries around it):
>>> import re
>>> statement = "I like Python, but I fear pythons"
>>> re.findall(r"\bpython\b", statement, flags=re.IGNORECASE)
['Python']
This re.findall call gave us back a list with a single entry.
So, pythons (with an s on the end) isn't matched, but Python (with a comma after it) is matched just fine.
If we remove the r prefix before our regular expression, \b will be treated as an escape sequence (\b represents a backspace in ASCII land).
>>> re.findall("\bpython\b", statement, flags=re.IGNORECASE)
[]
Because those \b represent an escape sequence, they don't end up giving us any matches.
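We can see the difference by inspecting the two pattern strings directly. This is just a sketch with illustrative names (plain and raw) to show what Python actually stores in each case:

```python
import re

statement = "I like Python, but I fear pythons"

plain = "\bpython\b"    # \b becomes a single backspace character
raw = r"\bpython\b"     # \b stays as a backslash followed by the letter b

print(plain[0] == chr(8))   # True (chr(8) is the ASCII backspace)
print(len(plain))           # 8
print(len(raw))             # 10

raw_matches = re.findall(raw, statement, flags=re.IGNORECASE)
plain_matches = re.findall(plain, statement, flags=re.IGNORECASE)

print(raw_matches)     # ['Python']
print(plain_matches)   # []
```

The non-raw pattern is looking for backspace characters around the word, and our statement doesn't contain any.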
Whenever I'm making a regular expression in Python, I always prefix it with an r (just in case).
Raw strings are a way of making a string in Python that has no escape sequences and instead reads every backslash as a literal backslash.