Commit c51da2b

#14332: provide a better explanation of junk in difflib docs
Initial patch by Alba Magallanes.
1 parent 2e3743c commit c51da2b

2 files changed

Lines changed: 24 additions & 16 deletions

Doc/library/difflib.rst

Lines changed: 11 additions & 3 deletions
@@ -27,7 +27,9 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 little fancier than, an algorithm published in the late 1980's by Ratcliff and
 Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to
 find the longest contiguous matching subsequence that contains no "junk"
-elements (the Ratcliff and Obershelp algorithm doesn't address junk). The same
+elements; these "junk" elements are ones that are uninteresting in some
+sense, such as blank lines or whitespace. (Handling junk is an
+extension to the Ratcliff and Obershelp algorithm.) The same
 idea is then applied recursively to the pieces of the sequences to the left and
 to the right of the matching subsequence. This does not yield minimal edit
 sequences, but does tend to yield matches that "look right" to people.
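To illustrate the junk handling this hunk documents, here is a minimal sketch that reuses the blank-as-junk example from the `SequenceMatcher.find_longest_match` documentation. The variable names are chosen here for illustration only.

```python
from difflib import SequenceMatcher

a, b = " abcd", "abcd abcd"

# Without a junk predicate, the leading blank takes part in the match.
plain = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))

# With blanks declared junk, the match cannot start on the blank.
no_blank = SequenceMatcher(lambda ch: ch == " ", a, b).find_longest_match(
    0, len(a), 0, len(b)
)

print(plain)     # Match(a=0, b=4, size=5)
print(no_blank)  # Match(a=1, b=0, size=4)
```

In the second case the matcher anchors on `"abcd"` at the start of `b` rather than on `" abcd"` in the middle, which is the kind of match that tends to "look right".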
@@ -210,7 +212,7 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style
 delta (a :term:`generator` generating the delta lines).

-Optional keyword parameters *linejunk* and *charjunk* are for filter functions
+Optional keyword parameters *linejunk* and *charjunk* are filtering functions
 (or ``None``):

 *linejunk*: A function that accepts a single string argument, and returns
@@ -224,7 +226,7 @@ diffs. For comparing directories and files, see also, the :mod:`filecmp` module.
 *charjunk*: A function that accepts a character (a string of length 1), and
 returns if the character is junk, or false if not. The default is module-level
 function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a
-blank or tab; note: bad idea to include newline in this!).
+blank or tab; it's a bad idea to include newline in this!).

 :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function.

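As a rough sketch of the *linejunk* and *charjunk* parameters described above, the following calls `ndiff()` with its documented defaults spelled out and checks what `IS_CHARACTER_JUNK` flags. The sample lines are taken from the `ndiff()` documentation example; the variable names are illustrative.

```python
from difflib import IS_CHARACTER_JUNK, ndiff, restore

# The default charjunk predicate flags blanks and tabs, but not newlines.
flags = {ch: IS_CHARACTER_JUNK(ch) for ch in (" ", "\t", "\n", "x")}
print(flags)

a = "one\ntwo\nthree\n".splitlines(keepends=True)
b = "ore\ntree\nemu\n".splitlines(keepends=True)

# linejunk=None is the recommended default; charjunk is passed explicitly
# here only to make the default visible.
delta = list(ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK))
print("".join(delta), end="")

# Both inputs can be recovered from the delta.
recovered_a = list(restore(delta, 1))
recovered_b = list(restore(delta, 2))
```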
@@ -624,6 +626,12 @@ The :class:`Differ` class has this constructor:
 length 1), and returns true if the character is junk. The default is ``None``,
 meaning that no character is considered junk.

+These junk-filtering functions speed up matching to find
+differences and do not cause any differing lines or characters to
+be ignored. Read the description of the
+:meth:`~SequenceMatcher.find_longest_match` method's *isjunk*
+parameter for an explanation.
+
 :class:`Differ` objects are used (deltas generated) via a single method:


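The added paragraph says junk filtering does not cause differences to be ignored. A small sketch of that claim, assuming `IS_LINE_JUNK` as the *linejunk* filter and made-up input lines: the blank line is junk, yet its insertion still appears in the delta.

```python
from difflib import Differ, IS_LINE_JUNK, restore

a = ["alpha\n", "beta\n"]
b = ["alpha\n", "\n", "beta\n"]

# Blank lines are declared junk, yet the inserted blank line is still reported.
delta = list(Differ(linejunk=IS_LINE_JUNK).compare(a, b))
print("".join(delta), end="")

# Collect the lines the delta marks as present only in b.
inserted = [line for line in delta if line.startswith("+ ")]
recovered_b = list(restore(delta, 2))
```

Junk here only influences which lines the matcher uses as anchors; the delta still accounts for every line of both inputs.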
Lib/difflib.py

Lines changed: 13 additions & 13 deletions
@@ -853,10 +853,9 @@ def __init__(self, linejunk=None, charjunk=None):
 and return true iff the string is junk. The module-level function
 `IS_LINE_JUNK` may be used to filter out lines without visible
 characters, except for at most one splat ('#'). It is recommended
-to leave linejunk None; as of Python 2.3, the underlying
-SequenceMatcher class has grown an adaptive notion of "noise" lines
-that's better than any static definition the author has ever been
-able to craft.
+to leave linejunk None; the underlying SequenceMatcher class has
+an adaptive notion of "noise" lines that's better than any static
+definition the author has ever been able to craft.

 - `charjunk`: A function that should accept a string of length 1. The
 module-level function `IS_CHARACTER_JUNK` may be used to filter out
@@ -1299,17 +1298,18 @@ def ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK):
 Compare `a` and `b` (lists of strings); return a `Differ`-style delta.

 Optional keyword parameters `linejunk` and `charjunk` are for filter
-functions (or None):
+functions, or can be None:

-- linejunk: A function that should accept a single string argument, and
+- linejunk: A function that should accept a single string argument and
 return true iff the string is junk. The default is None, and is
-recommended; as of Python 2.3, an adaptive notion of "noise" lines is
-used that does a good job on its own.
+recommended; the underlying SequenceMatcher class has an adaptive
+notion of "noise" lines.

-- charjunk: A function that should accept a string of length 1. The
-default is module-level function IS_CHARACTER_JUNK, which filters out
-whitespace characters (a blank or tab; note: bad idea to include newline
-in this!).
+- charjunk: A function that accepts a character (string of length
+1), and returns true iff the character is junk. The default is
+the module-level function IS_CHARACTER_JUNK, which filters out
+whitespace characters (a blank or tab; note: it's a bad idea to
+include newline in this!).

 Tools/scripts/ndiff.py is a command-line front-end to this function.

@@ -1680,7 +1680,7 @@ def __init__(self,tabsize=8,wrapcolumn=None,linejunk=None,
 tabsize -- tab stop spacing, defaults to 8.
 wrapcolumn -- column number where lines are broken and wrapped,
 defaults to None where lines are not wrapped.
-linejunk,charjunk -- keyword arguments passed into ndiff() (used to by
+linejunk,charjunk -- keyword arguments passed into ndiff() (used by
 HtmlDiff() to generate the side by side HTML differences). See
 ndiff() documentation for argument default values and descriptions.
 """

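For the `HtmlDiff` docstring touched in the last hunk, a minimal sketch of passing *linejunk* and *charjunk* through the constructor. The input lines and the `fromdesc`/`todesc` labels are invented for illustration; the keyword values shown are the constructor's documented defaults.

```python
from difflib import HtmlDiff, IS_CHARACTER_JUNK

a = ["one\n", "two\n"]
b = ["one\n", "too\n"]

# linejunk and charjunk are forwarded to ndiff() when the table is built.
html_diff = HtmlDiff(tabsize=8, wrapcolumn=None,
                     linejunk=None, charjunk=IS_CHARACTER_JUNK)
table = html_diff.make_table(a, b, fromdesc="old", todesc="new")

print("<table" in table)
```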