Commit 8a5b3bf

fix: bug#15089 - Removing leading indentation (#15110)
# Fix doctest paste: strip prompts and dedent (#15089)

## Summary

This PR improves how IPython handles pasting doctest and xdoctest code snippets. When pasting code like:

```
>>>     print(1)
```

IPython now strips the `>>>` prompt and also removes the extra indentation, so the result is:

```
print(1)
```

Previously, the result would have been:

```
    print(1)
```

which can cause an `IndentationError` at top-level.

## What changed

- `inputtransformer2.py`
  - Enhanced PromptStripper to:
    - Detect doctest-style pastes when any line matches `^\s*>>>`.
    - In doctest-mode, strip `^\s*>>>\s?` and `^\s*\.\.\.\s?` (continuation).
    - After stripping, apply `textwrap.dedent()` to the joined block, then return dedented lines.
    - Only apply dedent when doctest prompts were actually stripped (to avoid changing non-doctest pastes).
    - Only treat `...` as a continuation prompt when a `>>>` is present in the same block (avoids confusing Python Ellipsis with doctest continuation).
- `test_inputtransformer2_line.py`
  - Added/updated tests covering single-line doctest, indented `>>>`, multi-line `>>>` + `...` doctest, and standalone `...` (unchanged when no `>>>`).

## Description of changes

- Update `IPython/core/inputtransformer2.py` to enhance the `PromptStripper`:
  - Detect doctest-mode for a pasted block when any line matches `^\s*>>>`.
  - In doctest-mode, strip indented doctest prompts:
    - `^\s*>>>\s?` (primary prompt)
    - `^\s*\.\.\.\s?` (continuation prompt)
  - After stripping doctest prompts, apply `textwrap.dedent()` to the joined block and return the dedented lines.
  - Only apply `dedent()` when doctest prompts were actually stripped to avoid changing normal paste semantics.
- Update/add tests in `tests/test_inputtransformer2_line.py` for the new behavior.

## Motivation and context

Users often copy doctest examples from documentation or tools that show doctest prompts and indentation (for example, xdoctest-style indented examples). Previously IPython removed the `>>>` prompt but preserved the remaining indentation, which could turn a pasted, single-line doctest into an indented top-level statement and raise `IndentationError`. This PR makes doctest pastes behave as users expect: prompts are removed and the code block is normalized by dedenting.

## Examples (Before → After)

### Single-line doctest

Input:

```py
>>>     print(1)
```

Before:

```py
    print(1)
```

After:

```py
print(1)
```

### Indented xdoctest-style prompt

Input:

```py
    >>> print(1)
```

After:

```py
print(1)
```

### Multi-line doctest with continuation

Input:

```py
>>> for i in range(2):
...     print(i)
```

After:

```py
for i in range(2):
    print(i)
```

### Standalone continuation prompt (unchanged)

Input:

```py
... print(1)
```

After (unchanged because no `>>>` present):

```py
... print(1)
```

## Implementation notes

- `dedent()` is applied only after doctest prompts were detected and stripped. This avoids affecting normal paste behavior for non-doctest code, and prevents accidental removal of meaningful indentation in user-pasted code.
- `...` is considered a continuation prompt only when the pasted block contains at least one `>>>`, to avoid confusing Python's `Ellipsis` literal with doctest continuations.
- Regexes used are tolerant of leading whitespace to support indented `>>>` prompts (xdoctest-style).

## Related issues

- Fixes #15089

## Testing performed

- Added and updated tests in `tests/test_inputtransformer2_line.py` to cover:
  - Single-line doctest with extra indentation: `">>> print(1)\n" -> "print(1)\n"`
  - Leading whitespace before `>>>`: `" >>> print(1)\n" -> "print(1)\n"`
  - Multiline doctest using `>>>` and `...` continuation lines.
  - Standalone `...` only lines remain unchanged.

## Doc

Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=191372963
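The strip-then-dedent behavior described in this message can be sketched in isolation. This is a minimal illustration and not the code from the commit: `strip_doctest_prompts`, `PS1`, and `PS2` are hypothetical names, the two regexes are the ones quoted in the message, and the triple-quoted-string handling that the actual diff adds is deliberately left out.

```python
import re
from textwrap import dedent

# Regexes quoted in the commit message; the names PS1/PS2 are illustrative only.
PS1 = re.compile(r"^\s*>>>\s?")      # primary doctest prompt, possibly indented
PS2 = re.compile(r"^\s*\.\.\.\s?")   # continuation prompt, possibly indented


def strip_doctest_prompts(lines):
    """Hypothetical helper: strip doctest prompts, then dedent the block.

    `lines` is a list of strings that keep their line endings.
    """
    # Only act when at least one ">>>" line is present, so a bare "..."
    # (possibly the Ellipsis literal) is never touched on its own.
    if not any(PS1.match(line) for line in lines):
        return lines

    stripped = []
    for line in lines:
        if PS1.match(line):
            stripped.append(PS1.sub("", line, count=1))
        elif PS2.match(line):
            stripped.append(PS2.sub("", line, count=1))
        else:
            stripped.append(line)

    # Normalize whatever indentation is left after the prompts are removed.
    return dedent("".join(stripped)).splitlines(keepends=True)


print(strip_doctest_prompts([">>>     print(1)\n"]))   # ['print(1)\n']
print(strip_doctest_prompts(["... print(1)\n"]))       # ['... print(1)\n']
```

Because `\s?` consumes at most one whitespace character after the prompt, any further indentation survives the substitution and is only removed by the `dedent()` step.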
2 parents 7dbbd24 + 1110979 commit 8a5b3bf

2 files changed

Lines changed: 149 additions & 4 deletions

IPython/core/inputtransformer2.py

Lines changed: 92 additions & 2 deletions
@@ -67,24 +67,114 @@ class PromptStripper:
     If any prompt is found on the first two lines,
     prompts will be stripped from the rest of the block.
     """
-    def __init__(self, prompt_re, initial_re=None):
+    def __init__(self, prompt_re, initial_re=None, *, doctest=False):
         self.prompt_re = prompt_re
         self.initial_re = initial_re or prompt_re
+        self.doctest = doctest
+        if doctest:
+            # Doctest/xdoctest prompts may be indented (e.g. " >>>").
+            # We only treat "..." as a continuation prompt when the same pasted
+            # block contains at least one ">>>" line, to avoid ambiguity with the
+            # Python Ellipsis literal.
+            self._doctest_initial_re = re.compile(r'^\s*>>>')
+            self._doctest_ps1_re = re.compile(r'^\s*>>>\s?')
+            self._doctest_ps2_re = re.compile(r'^\s*\.\.\.\s?')
+
+            # Very small state machine to detect triple-quoted strings in the
+            # *same* input block (e.g. user typed """ then pasted doctest).
+            # We preserve literal >>> / ... inside triple-quoted strings.
+            self._triple_quote_re = re.compile(r"(?<!\\)(\"\"\"|''')")
+
+
+    def _triple_quote_mask(self, lines: List[str]) -> List[bool]:
+        """
+        Return a boolean mask: True if the corresponding line is considered
+        inside a triple-quoted string literal.
+
+        This is intentionally heuristic (fast + good enough for paste handling).
+        """
+        mask: List[bool] = []
+        in_triple: str | None = None  # either ''' or """
+        for line in lines:
+            mask.append(in_triple is not None)
+            # Toggle state for each occurrence of """ or ''' in the line.
+            for m in self._triple_quote_re.finditer(line):
+                q = m.group(1)
+                if in_triple is None:
+                    in_triple = q
+                    mask[-1] = True  # current line is inside triple quotes
+                elif in_triple == q:
+                    in_triple = None
+                # else: ignore mismatched triple quote while inside
+        return mask
+

     def _strip(self, lines):
         return [self.prompt_re.sub('', l, count=1) for l in lines]

     def __call__(self, lines):
         if not lines:
             return lines
+
+        if self.doctest:
+            triple_mask = self._triple_quote_mask(lines)
+
+            # Detect doctest prompts only outside triple-quoted strings.
+            has_doctest_outside = any(
+                (not in_triple) and self._doctest_initial_re.match(l)
+                for l, in_triple in zip(lines, triple_mask)
+            )
+            if not has_doctest_outside:
+                return lines
+
+            out_lines: List[str] = []
+            stripped_mask: List[bool] = []
+
+            for l, in_triple in zip(lines, triple_mask):
+                if in_triple:
+                    out_lines.append(l)
+                    stripped_mask.append(False)
+                    continue
+
+                if self._doctest_ps1_re.match(l):
+                    new_l = self._doctest_ps1_re.sub('', l, count=1)
+                elif self._doctest_ps2_re.match(l):
+                    new_l = self._doctest_ps2_re.sub('', l, count=1)
+                else:
+                    new_l = l
+                out_lines.append(new_l)
+                stripped_mask.append(new_l != l)
+
+            # Dedent only the non-triple-quoted segments where stripping occurred.
+            dedented: List[str] = []
+            i = 0
+            while i < len(out_lines):
+                j = i
+                in_triple = triple_mask[i]
+                while j < len(out_lines) and triple_mask[j] == in_triple:
+                    j += 1
+
+                segment = out_lines[i:j]
+                seg_stripped = any(stripped_mask[i:j])
+
+                if (not in_triple) and seg_stripped:
+                    dedented.extend(dedent(''.join(segment)).splitlines(keepends=True))
+                else:
+                    dedented.extend(segment)
+
+                i = j
+
+            return dedented
+
         if self.initial_re.match(lines[0]) or \
                 (len(lines) > 1 and self.prompt_re.match(lines[1])):
             return self._strip(lines)
         return lines

 classic_prompt = PromptStripper(
     prompt_re=re.compile(r'^(>>>|\.\.\.)( |$)'),
-    initial_re=re.compile(r'^>>>( |$)')
+    initial_re=re.compile(r'^>>>( |$)'),
+    doctest=True,
 )

 ipython_prompt = PromptStripper(
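
The triple-quote heuristic in `_triple_quote_mask` can be tried outside of IPython. The sketch below restates the same toggle logic as a free function under assumed names (`TRIPLE_QUOTE` and `triple_quote_mask` are illustrative and are not part of the diff); it is only meant to show which lines end up marked as "inside a string".

```python
import re

# Same pattern as in the diff: an unescaped """ or '''.
TRIPLE_QUOTE = re.compile(r"(?<!\\)(\"\"\"|''')")


def triple_quote_mask(lines):
    """Illustrative stand-alone version of the mask heuristic.

    Returns one bool per line: True when the line starts inside, or itself
    opens, a triple-quoted string.
    """
    mask = []
    open_quote = None  # the delimiter currently open, or None
    for line in lines:
        # A line is inside if a string was already open when it started.
        mask.append(open_quote is not None)
        for match in TRIPLE_QUOTE.finditer(line):
            quote = match.group(1)
            if open_quote is None:
                open_quote = quote
                mask[-1] = True  # the opening line counts as inside
            elif open_quote == quote:
                open_quote = None
            # A different delimiter while a string is open is ignored.
    return mask


sample = ['>>> """\n', '... >>> x = 1\n', '... """\n', '>>> y = 2\n']
print(triple_quote_mask(sample))  # [True, True, True, False]
```

One consequence of the heuristic is that a line which both opens and closes a triple-quoted string is still marked as inside, so prompts on that line are preserved rather than stripped.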

tests/test_inputtransformer2_line.py

Lines changed: 57 additions & 2 deletions
@@ -45,15 +45,70 @@ def test_cell_magic():
 ...     print(a ** 2)
 """,
     """\
+for a in range(5):
+...     print(a)
+...     print(a ** 2)
+""",
+)
+
+CLASSIC_PROMPT_L3 = (
+    """\
+>>> \"\"\"
+... This code is inside a triple-quoted string.
+... >>> for a in range(5):
+... ... print(a)
+... \"\"\"
+>>> for a in range(5):
+...     print(a)
+""",
+    """\
+>>> \"\"\"
+... This code is inside a triple-quoted string.
+... >>> for a in range(5):
+... ... print(a)
+... \"\"\"
 for a in range(5):
     print(a)
-    print(a ** 2)
 """,
 )

+CLASSIC_PROMPT_DEDENT_SINGLE_LINE = (
+    ">>> print(1)\n",
+    "print(1)\n",
+)
+
+CLASSIC_PROMPT_DEDENT_LEADING_WS = (
+    " >>> print(1)\n",
+    "print(1)\n",
+)
+
+CLASSIC_PROMPT_MULTILINE_DOCTEST = (
+    """\
+>>> for i in range(2):
+...     print(i)
+""",
+    """\
+for i in range(2):
+    print(i)
+""",
+)
+
+CLASSIC_PROMPT_STANDALONE_CONTINUATION = (
+    "... print(1)\n",
+    "... print(1)\n",
+)
+

 def test_classic_prompt():
-    for sample, expected in [CLASSIC_PROMPT, CLASSIC_PROMPT_L2]:
+    for sample, expected in [
+        CLASSIC_PROMPT,
+        CLASSIC_PROMPT_L2,
+        CLASSIC_PROMPT_L3,
+        CLASSIC_PROMPT_DEDENT_SINGLE_LINE,
+        CLASSIC_PROMPT_DEDENT_LEADING_WS,
+        CLASSIC_PROMPT_MULTILINE_DOCTEST,
+        CLASSIC_PROMPT_STANDALONE_CONTINUATION,
+    ]:
         assert ipt2.classic_prompt(
             sample.splitlines(keepends=True)
         ) == expected.splitlines(keepends=True)
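
The `CLASSIC_PROMPT_L3` case expects lines inside a triple-quoted string to come back untouched while the doctest after it is stripped and dedented. One way to picture that segment-wise dedent is the sketch below. It is an assumption-laden restatement, not the commit's code: `dedent_segments` and its parameter names are hypothetical, and it uses `itertools.groupby` where the diff uses an explicit `while` loop.

```python
from itertools import groupby
from textwrap import dedent


def dedent_segments(lines, protected_mask, stripped_mask):
    """Hypothetical helper: dedent only unprotected runs that had prompts stripped.

    lines          -- lines after prompt stripping, line endings kept
    protected_mask -- True for lines inside a triple-quoted string
    stripped_mask  -- True for lines from which a prompt was removed
    """
    out = []
    rows = zip(lines, protected_mask, stripped_mask)
    # Consecutive lines with the same "protected" flag form one segment.
    for protected, group in groupby(rows, key=lambda row: row[1]):
        items = list(group)
        segment = [line for line, _, _ in items]
        any_stripped = any(was_stripped for _, _, was_stripped in items)
        if not protected and any_stripped:
            out.extend(dedent("".join(segment)).splitlines(keepends=True))
        else:
            out.extend(segment)  # leave string contents and untouched code alone
    return out


lines = ['"""\n', '  >>> kept\n', '"""\n', '    for a in range(5):\n', '        print(a)\n']
protected = [True, True, True, False, False]
stripped = [False, False, False, True, True]
print(dedent_segments(lines, protected, stripped))
# ['"""\n', '  >>> kept\n', '"""\n', 'for a in range(5):\n', '    print(a)\n']
```

Dedenting each segment separately keeps the indentation inside the string literal from influencing how much is removed from the code around it.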
