Commit 8a9c284

Make difflib.ndiff() and difflib.Differ.compare() generators. This
restores the 2.1 ability of Tools/scripts/ndiff.py to start producing output before the entire comparison is complete.
1 parent 380bad1 commit 8a9c284

4 files changed

Lines changed: 84 additions & 70 deletions

Doc/lib/libdifflib.tex

Lines changed: 14 additions & 12 deletions
@@ -32,7 +32,7 @@ \section{\module{difflib} ---

 \begin{classdesc*}{Differ}
 This is a class for comparing sequences of lines of text, and
-producing human-readable differences or deltas. Differ uses
+producing human-readable differences or deltas. Differ uses
 \class{SequenceMatcher} both to compare sequences of lines, and to
 compare sequences of characters within similar (near-matching)
 lines.
@@ -85,7 +85,7 @@ \section{\module{difflib} ---
 \begin{funcdesc}{ndiff}{a, b\optional{, linejunk\optional{,
 charjunk}}}
 Compare \var{a} and \var{b} (lists of strings); return a
-\class{Differ}-style delta.
+\class{Differ}-style delta (a generator generating the delta lines).

 Optional keyword parameters \var{linejunk} and \var{charjunk} are
 for filter functions (or \code{None}):
@@ -109,12 +109,12 @@ \section{\module{difflib} ---
 ...              'ore\ntree\nemu\n'.splitlines(1)))
 >>> print ''.join(diff),
 - one
-?  ^
+?  ^
 + ore
-?  ^
+?  ^
 - two
 - three
-?  -
+?  -
 + tree
 + emu
 \end{verbatim}
@@ -132,6 +132,7 @@ \section{\module{difflib} ---
 \begin{verbatim}
 >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
 ...              'ore\ntree\nemu\n'.splitlines(1))
+>>> diff = list(diff) # materialize the generated delta into a list
 >>> print ''.join(restore(diff, 1)),
 one
 two
@@ -226,7 +227,7 @@ \subsection{SequenceMatcher Objects \label{sequence-matcher}}
 If \var{isjunk} was omitted or \code{None},
 \method{get_longest_match()} returns \code{(\var{i}, \var{j},
 \var{k})} such that \code{\var{a}[\var{i}:\var{i}+\var{k}]} is equal
-to \code{\var{b}[\var{j}:\var{j}+\var{k}]}, where
+to \code{\var{b}[\var{j}:\var{j}+\var{k}]}, where
 \code{\var{alo} <= \var{i} <= \var{i}+\var{k} <= \var{ahi}} and
 \code{\var{blo} <= \var{j} <= \var{j}+\var{k} <= \var{bhi}}.
 For all \code{(\var{i'}, \var{j'}, \var{k'})} meeting those
@@ -303,7 +304,7 @@ \subsection{SequenceMatcher Objects \label{sequence-matcher}}
 deleted. Note that \code{\var{j1} == \var{j2}} in
 this case.}
 \lineii{'insert'}{\code{\var{b}[\var{j1}:\var{j2}]} should be
-inserted at \code{\var{a}[\var{i1}:\var{i1}]}.
+inserted at \code{\var{a}[\var{i1}:\var{i1}]}.
 Note that \code{\var{i1} == \var{i2}} in this
 case.}
 \lineii{'equal'}{\code{\var{a}[\var{i1}:\var{i2}] ==
@@ -459,13 +460,14 @@ \subsection{Differ Objects \label{differ-objects}}
 method:

 \begin{methoddesc}{compare}{a, b}
-Compare two sequences of lines; return the resulting delta (list).
+Compare two sequences of lines, and generate the delta (a sequence
+of lines).

 Each sequence must contain individual single-line strings ending
 with newlines. Such sequences can be obtained from the
-\method{readlines()} method of file-like objects. The list returned
-is also made up of newline-terminated strings, and ready to be used
-with the \method{writelines()} method of a file-like object.
+\method{readlines()} method of file-like objects. The delta generated
+also consists of newline-terminated strings, ready to be printed as-is
+via the \method{writeline()} method of a file-like object.
 \end{methoddesc}

@@ -506,7 +508,7 @@ \subsection{Differ Example \label{differ-examples}}
 Finally, we compare the two:

 \begin{verbatim}
->>> result = d.compare(text1, text2)
+>>> result = list(d.compare(text1, text2))
 \end{verbatim}

 \code{result} is a list of strings, so let's pretty-print it:
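
The `list(...)` calls added to the documentation examples above reflect that `ndiff()` and `Differ.compare()` now hand back a one-shot generator instead of a list. A minimal sketch of that behaviour, written for Python 3 as an assumption (the diff itself targets Python 2.2, so `splitlines(keepends=True)` stands in for `splitlines(1)`):

```python
import difflib
import types

a = 'one\ntwo\nthree\n'.splitlines(keepends=True)
b = 'ore\ntree\nemu\n'.splitlines(keepends=True)

delta = difflib.ndiff(a, b)
is_generator = isinstance(delta, types.GeneratorType)

# A generator can be consumed only once, so materialize it before reuse,
# as the added "diff = list(diff)" doctest line does.
delta_lines = list(delta)
leftover = list(delta)  # the generator is already exhausted here

# restore() is a generator as well, so wrap it in list() to compare.
original_a = list(difflib.restore(delta_lines, 1))
original_b = list(difflib.restore(delta_lines, 2))
```

Materializing the delta trades away the streaming benefit, so it is only worth doing when the same delta is needed more than once, as in the `restore()` example.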

Lib/difflib.py

Lines changed: 47 additions & 53 deletions
@@ -1,5 +1,7 @@
 #! /usr/bin/env python

+from __future__ import generators
+
 """
 Module difflib -- helpers for computing deltas between objects.

@@ -22,8 +24,6 @@
 __all__ = ['get_close_matches', 'ndiff', 'restore', 'SequenceMatcher',
            'Differ']

-TRACE = 0
-
 class SequenceMatcher:

     """
@@ -406,9 +406,6 @@ def find_longest_match(self, alo, ahi, blo, bhi):
               a[besti+bestsize] == b[bestj+bestsize]:
             bestsize = bestsize + 1

-        if TRACE:
-            print "get_matching_blocks", alo, ahi, blo, bhi
-            print " returns", besti, bestj, bestsize
         return besti, bestj, bestsize

     def get_matching_blocks(self):
@@ -432,8 +429,6 @@ def get_matching_blocks(self):
         la, lb = len(self.a), len(self.b)
         self.__helper(0, la, 0, lb, self.matching_blocks)
         self.matching_blocks.append( (la, lb, 0) )
-        if TRACE:
-            print '*** matching blocks', self.matching_blocks
         return self.matching_blocks

     # builds list of matching blocks covering a[alo:ahi] and
@@ -694,7 +689,7 @@ class Differ:

     Finally, we compare the two:

-    >>> result = d.compare(text1, text2)
+    >>> result = list(d.compare(text1, text2))

     'result' is a list of strings, so let's pretty-print it:

@@ -731,7 +726,7 @@ class Differ:
         Construct a text differencer, with optional filters.

     compare(a, b)
-        Compare two sequences of lines; return the resulting delta (list).
+        Compare two sequences of lines; generate the resulting delta.
     """

     def __init__(self, linejunk=None, charjunk=None):
@@ -753,16 +748,15 @@ def __init__(self, linejunk=None, charjunk=None):

         self.linejunk = linejunk
         self.charjunk = charjunk
-        self.results = []

     def compare(self, a, b):
         r"""
-        Compare two sequences of lines; return the resulting delta (list).
+        Compare two sequences of lines; generate the resulting delta.

         Each sequence must contain individual single-line strings ending with
         newlines. Such sequences can be obtained from the `readlines()` method
-        of file-like objects. The list returned is also made up of
-        newline-terminated strings, ready to be used with the `writelines()`
+        of file-like objects. The delta generated also consists of newline-
+        terminated strings, ready to be printed as-is via the writeline()
         method of a file-like object.

         Example:
@@ -783,34 +777,38 @@ def compare(self, a, b):
         cruncher = SequenceMatcher(self.linejunk, a, b)
         for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
             if tag == 'replace':
-                self._fancy_replace(a, alo, ahi, b, blo, bhi)
+                g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
             elif tag == 'delete':
-                self._dump('-', a, alo, ahi)
+                g = self._dump('-', a, alo, ahi)
             elif tag == 'insert':
-                self._dump('+', b, blo, bhi)
+                g = self._dump('+', b, blo, bhi)
             elif tag == 'equal':
-                self._dump(' ', a, alo, ahi)
+                g = self._dump(' ', a, alo, ahi)
             else:
                 raise ValueError, 'unknown tag ' + `tag`
-        results = self.results
-        self.results = []
-        return results
+
+            for line in g:
+                yield line

     def _dump(self, tag, x, lo, hi):
-        """Store comparison results for a same-tagged range."""
+        """Generate comparison results for a same-tagged range."""
         for i in xrange(lo, hi):
-            self.results.append('%s %s' % (tag, x[i]))
+            yield '%s %s' % (tag, x[i])

     def _plain_replace(self, a, alo, ahi, b, blo, bhi):
         assert alo < ahi and blo < bhi
         # dump the shorter block first -- reduces the burden on short-term
         # memory if the blocks are of very different sizes
         if bhi - blo < ahi - alo:
-            self._dump('+', b, blo, bhi)
-            self._dump('-', a, alo, ahi)
+            first = self._dump('+', b, blo, bhi)
+            second = self._dump('-', a, alo, ahi)
         else:
-            self._dump('-', a, alo, ahi)
-            self._dump('+', b, blo, bhi)
+            first = self._dump('-', a, alo, ahi)
+            second = self._dump('+', b, blo, bhi)
+
+        for g in first, second:
+            for line in g:
+                yield line

     def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
         r"""
@@ -830,12 +828,6 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
         ?    ^  ^  ^
         """

-        if TRACE:
-            self.results.append('*** _fancy_replace %s %s %s %s\n'
-                                % (alo, ahi, blo, bhi))
-            self._dump('>', a, alo, ahi)
-            self._dump('<', b, blo, bhi)
-
         # don't synch up unless the lines have a similarity score of at
         # least cutoff; best_ratio tracks the best score seen so far
         best_ratio, cutoff = 0.74, 0.75
@@ -869,7 +861,8 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
             # no non-identical "pretty close" pair
             if eqi is None:
                 # no identical pair either -- treat it as a straight replace
-                self._plain_replace(a, alo, ahi, b, blo, bhi)
+                for line in self._plain_replace(a, alo, ahi, b, blo, bhi):
+                    yield line
                 return
             # no close pair, but an identical pair -- synch up on that
             best_i, best_j, best_ratio = eqi, eqj, 1.0
@@ -879,14 +872,10 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi):

         # a[best_i] very similar to b[best_j]; eqi is None iff they're not
         # identical
-        if TRACE:
-            self.results.append('*** best_ratio %s %s %s %s\n'
-                                % (best_ratio, best_i, best_j))
-            self._dump('>', a, best_i, best_i+1)
-            self._dump('<', b, best_j, best_j+1)

         # pump out diffs from before the synch point
-        self._fancy_helper(a, alo, best_i, b, blo, best_j)
+        for line in self._fancy_helper(a, alo, best_i, b, blo, best_j):
+            yield line

         # do intraline marking on the synch pair
         aelt, belt = a[best_i], b[best_j]
@@ -908,22 +897,28 @@ def _fancy_replace(self, a, alo, ahi, b, blo, bhi):
                     btags += ' ' * lb
                 else:
                     raise ValueError, 'unknown tag ' + `tag`
-            self._qformat(aelt, belt, atags, btags)
+            for line in self._qformat(aelt, belt, atags, btags):
+                yield line
         else:
             # the synch pair is identical
-            self.results.append('  ' + aelt)
+            yield '  ' + aelt

         # pump out diffs from after the synch point
-        self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi)
+        for line in self._fancy_helper(a, best_i+1, ahi, b, best_j+1, bhi):
+            yield line

     def _fancy_helper(self, a, alo, ahi, b, blo, bhi):
+        g = []
         if alo < ahi:
             if blo < bhi:
-                self._fancy_replace(a, alo, ahi, b, blo, bhi)
+                g = self._fancy_replace(a, alo, ahi, b, blo, bhi)
             else:
-                self._dump('-', a, alo, ahi)
+                g = self._dump('-', a, alo, ahi)
         elif blo < bhi:
-            self._dump('+', b, blo, bhi)
+            g = self._dump('+', b, blo, bhi)
+
+        for line in g:
+            yield line

     def _qformat(self, aline, bline, atags, btags):
         r"""
r"""
@@ -949,13 +944,13 @@ def _qformat(self, aline, bline, atags, btags):
949944
atags = atags[common:].rstrip()
950945
btags = btags[common:].rstrip()
951946

952-
self.results.append("- " + aline)
947+
yield "- " + aline
953948
if atags:
954-
self.results.append("? %s%s\n" % ("\t" * common, atags))
949+
yield "? %s%s\n" % ("\t" * common, atags)
955950

956-
self.results.append("+ " + bline)
951+
yield "+ " + bline
957952
if btags:
958-
self.results.append("? %s%s\n" % ("\t" * common, btags))
953+
yield "? %s%s\n" % ("\t" * common, btags)
959954

960955
# With respect to junk, an earlier version of ndiff simply refused to
961956
# *start* a match with a junk element. The result was cases like this:
@@ -1050,7 +1045,7 @@ def ndiff(a, b, linejunk=IS_LINE_JUNK, charjunk=IS_CHARACTER_JUNK):

 def restore(delta, which):
     r"""
-    Return one of the two sequences that generated a delta.
+    Generate one of the two sequences that generated a delta.

     Given a `delta` produced by `Differ.compare()` or `ndiff()`, extract
     lines originating from file 1 or 2 (parameter `which`), stripping off line
@@ -1060,6 +1055,7 @@ def restore(delta, which):

     >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(1),
     ...              'ore\ntree\nemu\n'.splitlines(1))
+    >>> diff = list(diff)
     >>> print ''.join(restore(diff, 1)),
     one
     two
@@ -1075,11 +1071,9 @@ def restore(delta, which):
         raise ValueError, ('unknown delta choice (must be 1 or 2): %r'
                            % which)
     prefixes = ("  ", tag)
-    results = []
     for line in delta:
         if line[:2] in prefixes:
-            results.append(line[2:])
-    return results
+            yield line[2:]

 def _test():
     import doctest, difflib
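
Every helper call in the hunks above had to change because calling a generator function only creates a generator object; nothing is produced until the caller iterates it and re-yields each line. The sketch below illustrates that pattern in Python 3 (an assumption, since the commit targets Python 2.2). The names `dump`, `plain_replace_loop` and `plain_replace_delegating` are hypothetical stand-ins for `Differ._dump` and `Differ._plain_replace`, and `yield from` is a later language feature shown only for comparison.

```python
def dump(tag, lines, lo, hi):
    """Yield tagged lines for lines[lo:hi], mirroring Differ._dump."""
    for i in range(lo, hi):
        yield '%s %s' % (tag, lines[i])


def plain_replace_loop(a, b):
    """Explicit re-yield loop, the form used in the commit."""
    # Dump the shorter block first, as the original comment describes.
    if len(b) < len(a):
        first = dump('+', b, 0, len(b))
        second = dump('-', a, 0, len(a))
    else:
        first = dump('-', a, 0, len(a))
        second = dump('+', b, 0, len(b))
    for g in first, second:
        for line in g:
            yield line


def plain_replace_delegating(a, b):
    """Same output, written with Python 3's 'yield from' delegation."""
    if len(b) < len(a):
        yield from dump('+', b, 0, len(b))
        yield from dump('-', a, 0, len(a))
    else:
        yield from dump('-', a, 0, len(a))
        yield from dump('+', b, 0, len(b))


old_lines = ['x\n']
new_lines = ['y\n', 'z\n']
looped = list(plain_replace_loop(old_lines, new_lines))
delegated = list(plain_replace_delegating(old_lines, new_lines))
```

Both variants produce the same lines; the nested loop is simply what delegation looked like before `yield from` existed.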

Misc/NEWS

Lines changed: 21 additions & 2 deletions
@@ -30,7 +30,7 @@ Core

 - In 2.2a3, __new__ would only see sequential arguments passed to the
   type in a constructor call; __init__ would see both sequential and
-  positional arguments. This made no sense whatsoever any more, so
+  keyword arguments. This made no sense whatsoever any more, so
   now both __new__ and __init__ see all arguments.

 - In 2.2a3, hash() applied to an instance of a subclass of str or unicode
@@ -54,6 +54,10 @@ Core

 Library

+- difflib.ndiff() and difflib.Differ.compare() are generators now. This
+  restores the ability of Tools/scripts/ndiff.py to start producing output
+  before the entire comparison is complete.
+
 - StringIO.StringIO instances and cStringIO.StringIO instances support
   iteration just like file objects (i.e. their .readline() method is
   called for each iteration until it returns an empty string).
@@ -124,10 +128,25 @@ New platforms

 Tests

+- The "classic" standard tests, which work by comparing stdout to
+  an expected-output file under Lib/test/output/, no longer stop at
+  the first mismatch. Instead the test is run to completion, and a
+  variant of ndiff-style comparison is used to report all differences.
+  This is much easier to understand than the previous style of reporting.
+
+- The unittest-based standard tests now use regrtest's test_main()
+  convention, instead of running as a side-effect of merely being
+  imported. This allows these tests to be run in more natural and
+  flexible ways as unittests, outside the regrtest framework.
+
+- regrtest.py is much better integrated with unittest and doctest now,
+  especially in regard to reporting errors.
+
 Windows

 - Large file support now also works for files > 4GB, on filesystems
-  that support it (NTFS under Windows 2000).
+  that support it (NTFS under Windows 2000). See "What's New in
+  Python 2.2a3" for more detail.


 What's New in Python 2.2a3?
Tools/scripts/ndiff.py

Lines changed: 2 additions & 3 deletions
@@ -73,9 +73,8 @@ def fcompare(f1name, f2name):

     a = f1.readlines(); f1.close()
     b = f2.readlines(); f2.close()
-
-    diff = difflib.ndiff(a, b)
-    sys.stdout.writelines(diff)
+    for line in difflib.ndiff(a, b):
+        print line,

     return 1
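
The loop in this hunk writes each delta line as soon as the generator yields it, which is the streaming behaviour the commit message describes. Below is a rough Python 3 equivalent (an assumption; the script itself is Python 2), with `io.StringIO` standing in for `sys.stdout` so the result can be inspected. Note that the documentation hunks earlier in this commit spell the file method `writeline()`; the method that file objects actually provide is `writelines()`, which accepts any iterable of strings, including a generator.

```python
import difflib
import io

a = ['alpha\n', 'beta\n', 'gamma\n']
b = ['alpha\n', 'delta\n', 'gamma\n']

delta = difflib.ndiff(a, b)

# The first line can be pulled without draining the rest of the generator.
first_line = next(delta)

out = io.StringIO()
out.write(first_line)
for line in delta:  # Python 3 spelling of the hunk's "print line,"
    out.write(line)
streamed = out.getvalue()

# writelines() consumes the generator lazily as well, one line at a time.
out_bulk = io.StringIO()
out_bulk.writelines(difflib.ndiff(a, b))
bulk = out_bulk.getvalue()
```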
