Commit f41485d

Add several points about the regex module
* Warn about the default matching of unicode character classes. This is a serious source of bugs.
* Show that patterns can be pre-compiled and reused.
* Mention `help(re)`, because it's a very good resource.
1 parent 2bd6df8 commit f41485d

1 file changed: +14 -4 lines changed

README.md

Lines changed: 14 additions & 4 deletions
@@ -276,7 +276,17 @@ import re
 * **Parameter `'flags=re.IGNORECASE'` can be used with all functions.**
 * **Parameter `'flags=re.DOTALL'` makes dot also accept newline.**
 * **Use `r'\1'` or `'\\\\1'` for backreference.**
-* **Use `'?'` to make operators non-greedy.**
+* **Use `'?'` to make operators non-greedy.**
+* **Call `help(re)` in the Python console to get a comprehensive usage guide (`import re` first).**
+
+### Precompiled Patterns
+**Create a reusable pattern object and call its methods to apply the above functions to it.**
+
+```python
+pattern = re.compile(<regex>, FLAGS)
+return (pattern.sub('NEW TEXT', doc) for doc in documents)
+```
+

 ### Match Object
 ```python
@@ -290,9 +300,9 @@ import re
 ### Special Sequences
 **Use capital letter for negation.**
 ```python
-'\d' == '[0-9]' # Digit
-'\s' == '[ \t\n\r\f\v]' # Whitespace
-'\w' == '[a-zA-Z0-9_]' # Alphanumeric
+'\d' == '[0-9]' # Digit (includes all other Unicode digits by default)
+'\s' == '[ \t\n\r\f\v]' # Whitespace (includes other Unicode whitespace by default)
+'\w' == '[a-zA-Z0-9_]' # Alphanumeric (includes non-ascii Unicode letters by default)
 ```
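
Note on the Special Sequences change: the warning about Unicode matching can be checked with a short sketch. For `str` patterns, `\d`, `\s` and `\w` are Unicode-aware by default, and passing `flags=re.ASCII` restricts them to the bracketed ASCII sets shown in the README. The three sample characters below are assumptions chosen for illustration and do not come from the README.

```python
import re

# Sample characters are illustrative assumptions, not taken from the README.
arabic_indic_three = '\u0663'  # ARABIC-INDIC DIGIT THREE
no_break_space = '\u00a0'      # NO-BREAK SPACE
e_acute = '\u00e9'             # LATIN SMALL LETTER E WITH ACUTE

# Default behaviour for str patterns: the classes are Unicode-aware.
unicode_digit = re.fullmatch(r'\d', arabic_indic_three) is not None
unicode_space = re.fullmatch(r'\s', no_break_space) is not None
unicode_word = re.fullmatch(r'\w', e_acute) is not None

# With re.ASCII the same classes only match their ASCII sets.
ascii_digit = re.fullmatch(r'\d', arabic_indic_three, flags=re.ASCII) is not None
ascii_space = re.fullmatch(r'\s', no_break_space, flags=re.ASCII) is not None
ascii_word = re.fullmatch(r'\w', e_acute, flags=re.ASCII) is not None

print(unicode_digit, unicode_space, unicode_word)  # True True True
print(ascii_digit, ascii_space, ascii_word)        # False False False
```

If input validation depends on ASCII-only digits or identifiers, passing `re.ASCII` (or writing the bracketed class explicitly) is one way to avoid the class of bugs this commit warns about.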

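Note on the Precompiled Patterns snippet: as added, it uses the placeholders `<regex>` and `FLAGS` and a bare `return`, so it only runs inside a function with real values filled in. A minimal runnable sketch of the same idea follows. The regex, the flag, and the names `FOO_WORD` and `replace_in_all` are hypothetical and chosen only for illustration.

```python
import re

# Compile once and reuse; this regex and flag are illustrative assumptions.
FOO_WORD = re.compile(r'\bfoo\b', flags=re.IGNORECASE)


def replace_in_all(documents):
    """Lazily yield each document with every match replaced."""
    # pattern.sub() mirrors re.sub(), minus the pattern and flags arguments.
    return (FOO_WORD.sub('NEW TEXT', doc) for doc in documents)


results = list(replace_in_all(['Foo bar', 'no match', 'foo foo']))
print(results)  # ['NEW TEXT bar', 'no match', 'NEW TEXT NEW TEXT']
```

Flags are given once at compile time; the methods of the compiled pattern do not accept a `flags` argument.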
0 commit comments
