Commit 5a045b9

#10713: Improve documentation for \b and \B and add a few tests. Initial patch and tests by Martin Pool.
1 parent 62417a0 commit 5a045b9

2 files changed: +40 -8 lines changed

Doc/library/re.rst

Lines changed: 14 additions & 8 deletions
@@ -330,16 +330,22 @@ the second character. For example, ``\$`` matches the character ``'$'``.
    Matches the empty string, but only at the beginning or end of a word.
    A word is defined as a sequence of Unicode alphanumeric or underscore
    characters, so the end of a word is indicated by whitespace or a
-   non-alphanumeric, non-underscore Unicode character. Note that
-   formally, ``\b`` is defined as the boundary between a ``\w`` and a
-   ``\W`` character (or vice versa). By default Unicode alphanumerics
-   are the ones used, but this can be changed by using the :const:`ASCII`
-   flag. Inside a character range, ``\b`` represents the backspace
-   character, for compatibility with Python's string literals.
+   non-alphanumeric, non-underscore Unicode character. Note that formally,
+   ``\b`` is defined as the boundary between a ``\w`` and a ``\W`` character
+   (or vice versa), or between ``\w`` and the beginning/end of the string.
+   This means that ``r'\bfoo\b'`` matches ``'foo'``, ``'foo.'``, ``'(foo)'``,
+   ``'bar foo baz'`` but not ``'foobar'`` or ``'foo3'``.
+
+   By default Unicode alphanumerics are the ones used, but this can be changed
+   by using the :const:`ASCII` flag. Inside a character range, ``\b``
+   represents the backspace character, for compatibility with Python's string
+   literals.
 
 ``\B``
-   Matches the empty string, but only when it is *not* at the beginning or end of a
-   word. This is just the opposite of ``\b``, so word characters are
+   Matches the empty string, but only when it is *not* at the beginning or end
+   of a word. This means that ``r'py\B'`` matches ``'python'``, ``'py3'``,
+   ``'py2'``, but not ``'py'``, ``'py.'``, or ``'py!'``.
+   ``\B`` is just the opposite of ``\b``, so word characters are
    Unicode alphanumerics or the underscore, although this can be changed
    by using the :const:`ASCII` flag.
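As an illustrative aside (not part of the commit), the examples quoted in the revised documentation can be checked directly. The sketch below assumes Python 3, where `str` patterns use Unicode matching by default. The helper name `matching` and the sample string `'\u00e9foo'` are made up for illustration and do not appear in the patch.

```python
import re

def matching(pattern, candidates):
    # Hypothetical helper: keep the candidates in which the pattern is found.
    return [s for s in candidates if re.search(pattern, s)]

# Strings quoted in the revised \b paragraph.
foo_hits = matching(r"\bfoo\b",
                    ["foo", "foo.", "(foo)", "bar foo baz", "foobar", "foo3"])

# Strings quoted in the revised \B paragraph.
py_hits = matching(r"py\B", ["python", "py3", "py2", "py", "py.", "py!"])

# U+00E9 is a Unicode word character, so by default there is no boundary
# between it and 'f'; with re.ASCII it counts as a non-word character.
unicode_hit = re.search(r"\bfoo\b", "\u00e9foo")
ascii_hit = re.search(r"\bfoo\b", "\u00e9foo", re.ASCII)

# Inside a character class, \b stands for the backspace character.
backspace_hit = re.fullmatch(r"[\b]", "\x08")

print(foo_hits)
print(py_hits)
print(unicode_hit, ascii_hit, backspace_hit)
```

The first two lists should contain only the strings the documentation says match. The `re.ASCII` comparison shows how the flag changes which characters count as word characters.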

Lib/test/test_re.py

Lines changed: 26 additions & 0 deletions
@@ -355,6 +355,32 @@ def test_special_escapes(self):
         self.assertEqual(re.search(r"\d\D\w\W\s\S",
                                    "1aa! a", re.UNICODE).group(0), "1aa! a")
 
+    def test_string_boundaries(self):
+        # See http://bugs.python.org/issue10713
+        self.assertEqual(re.search(r"\b(abc)\b", "abc").group(1),
+                         "abc")
+        # There's a word boundary at the start of a string.
+        self.assertTrue(re.match(r"\b", "abc"))
+        # A non-empty string includes a non-boundary zero-length match.
+        self.assertTrue(re.search(r"\B", "abc"))
+        # There is no non-boundary match at the start of a string.
+        self.assertFalse(re.match(r"\B", "abc"))
+        # However, an empty string contains no word boundaries, and also no
+        # non-boundaries.
+        self.assertEqual(re.search(r"\B", ""), None)
+        # This one is questionable and different from the perlre behaviour,
+        # but describes current behavior.
+        self.assertEqual(re.search(r"\b", ""), None)
+        # A single word-character string has two boundaries, but no
+        # non-boundary gaps.
+        self.assertEqual(len(re.findall(r"\b", "a")), 2)
+        self.assertEqual(len(re.findall(r"\B", "a")), 0)
+        # If there are no words, there are no boundaries
+        self.assertEqual(len(re.findall(r"\b", " ")), 0)
+        self.assertEqual(len(re.findall(r"\b", " ")), 0)
+        # Can match around the whitespace.
+        self.assertEqual(len(re.findall(r"\B", " ")), 2)
+
     def test_bigcharset(self):
         self.assertEqual(re.match("([\u2222\u2223])",
                                   "\u2222").group(1), "\u2222")

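To see where the boundaries counted in `test_string_boundaries` actually fall, one can list the match offsets instead of only counting them. This is a minimal sketch and not part of the commit. The helper `positions` and the sample string `"ab cd"` are hypothetical. The empty-string cases are left out on purpose, since the test itself calls that behavior questionable.

```python
import re

def positions(pattern, text):
    # Hypothetical helper: offsets of every zero-width match of pattern in text.
    return [m.start() for m in re.finditer(pattern, text)]

for sample in ["abc", "a", " ", "ab cd"]:
    # Show word-boundary and non-boundary offsets side by side.
    print(repr(sample), positions(r"\b", sample), positions(r"\B", sample))
```

For a single word such as `"abc"`, the `\b` offsets are the two ends of the string and the `\B` offsets are the gaps between its letters, which is consistent with the counts asserted in the test.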