gh-131791: Improve speed of textwrap.dedent by replacing re #131792
Closed
18 commits
b722c23 optimized dedent by 4x for larger files, added only_whitespace option… (Marius-Juston)
540b462 removed necessary regex expression (Marius-Juston)
2159118 added news entry (Marius-Juston)
8cf2153 updated news inline literals backticks (Marius-Juston)
7754af3 corrected dedent to pass test cases (Marius-Juston)
76f231a added missing os dependencie (Marius-Juston)
8dce5ab reset what the current doc was (Marius-Juston)
520f25f much simpler implementation, using fileter and common prefix for fast… (Marius-Juston)
e703e0d small mistake for single line (Marius-Juston)
159e363 forgot to invert the compairson (Marius-Juston)
1c2678e minor optimization, using lstrip and more concise margin_len computation (Marius-Juston)
da9e65c Merge remote-tracking branch 'origin/main' (Marius-Juston)
c7cd5ce updated variable name (Marius-Juston)
3ab30a7 Merge branch 'main' into main (Marius-Juston)
199d237 minor faster filtering using list comprehension (Marius-Juston)
27fcf53 small speedup by removing ends with (Marius-Juston)
1379566 strip seems to give an extremely minor performance boost over lstrip (Marius-Juston)
b71240d fixed unecessary end line operator for empty (Marius-Juston)
@@ -6,6 +6,7 @@
 # Written by Greg Ward <[email protected]>
 
 import re
+import os
 
 __all__ = ['TextWrapper', 'wrap', 'fill', 'dedent', 'indent', 'shorten']
 
@@ -413,9 +414,6 @@ def shorten(text, width, **kwargs):
 
 # -- Loosely related functionality -------------------------------------
 
-_whitespace_only_re = re.compile('^[ \t]+$', re.MULTILINE)
-_leading_whitespace_re = re.compile('(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)
-
 def dedent(text):
     """Remove any common leading whitespace from every line in `text`.
 
@@ -429,42 +427,15 @@ def dedent(text):
 
     Entirely blank lines are normalized to a newline character.
     """
-    # Look for the longest leading string of spaces and tabs common to
-    # all lines.
-    margin = None
-    text = _whitespace_only_re.sub('', text)
-    indents = _leading_whitespace_re.findall(text)
-    for indent in indents:
-        if margin is None:
-            margin = indent
-
-        # Current line more deeply indented than previous winner:
-        # no change (previous winner is still on top).
-        elif indent.startswith(margin):
-            pass
-
-        # Current line consistent with and no deeper than previous winner:
-        # it's the new winner.
-        elif margin.startswith(indent):
-            margin = indent
-
-        # Find the largest common whitespace between current line and previous
-        # winner.
-        else:
-            for i, (x, y) in enumerate(zip(margin, indent)):
-                if x != y:
-                    margin = margin[:i]
-                    break
+    if not text:
+        return text
 
+    lines = text.split("\n")
+
-    # sanity check (testing/debugging only)
-    if 0 and margin:
-        for line in text.split("\n"):
-            assert not line or line.startswith(margin), \
-                   "line = %r, margin = %r" % (line, margin)
+    margin = os.path.commonprefix([line for line in lines if line.strip()])
+    margin_len = len(margin) - len(margin.lstrip())
 
-    if margin:
-        text = re.sub(r'(?m)^' + margin, '', text)
-    return text
+    return "\n".join([line[margin_len:] if line.strip() else "" for line in lines])
 
 
 def indent(text, prefix, predicate=None):
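For readers who want to try the new logic outside the diff, here is a minimal standalone sketch of the common-prefix approach. The function name `dedent_sketch` and the sample input are hypothetical and used only for illustration; the body mirrors the added lines in the diff above and is not a claim about what ships in any CPython release.

```python
import os
import textwrap


def dedent_sketch(text):
    """Illustrative copy of the common-prefix approach (hypothetical name)."""
    if not text:
        return text

    lines = text.split("\n")

    # Longest common prefix of all lines that contain non-whitespace text.
    # os.path.commonprefix compares character by character, so the prefix
    # may run past the indentation into shared leading text.
    non_blank = [line for line in lines if line.strip()]
    margin = os.path.commonprefix(non_blank)

    # Keep only the leading-whitespace part of that prefix.
    margin_len = len(margin) - len(margin.lstrip())

    # Remove the margin from real lines; whitespace-only lines become "".
    return "\n".join(
        [line[margin_len:] if line.strip() else "" for line in lines]
    )


if __name__ == "__main__":
    sample = "    hello\n      world\n\n    bye\n"
    print(repr(dedent_sketch(sample)))
    print(dedent_sketch(sample) == textwrap.dedent(sample))
```

One design point worth noting: `str.strip()` with no arguments treats every whitespace character as blank, whereas the removed regular expressions matched only spaces and tabs, so behavior on lines containing other whitespace characters may differ from the previous implementation.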
1 change: 1 addition & 0 deletions
Misc/NEWS.d/next/Library/2025-03-27-04-35-17.gh-issue-131792.UtGg3O.rst
@@ -0,0 +1 @@
+Optimized :func:`textwrap.dedent`. It is now 2x faster than before for large inputs.