Commit 53c58f8

Forward port sorting howto
1 parent 9707fd2 commit 53c58f8

2 files changed

Lines changed: 282 additions & 0 deletions

File tree

Doc/howto/index.rst

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ Currently, the HOWTOs are:
 functional.rst
 regex.rst
 sockets.rst
+sorting.rst
 unicode.rst
 urllib2.rst
 webservers.rst

Doc/howto/sorting.rst

Lines changed: 281 additions & 0 deletions
@@ -0,0 +1,281 @@
1+
Sorting HOW TO
2+
**************
3+
4+
:Author: Andrew Dalke and Raymond Hettinger
5+
:Release: 0.1
6+
7+
8+
Python lists have a built-in :meth:`list.sort` method that modifies the list
9+
in-place and a :func:`sorted` built-in function that builds a new sorted list
10+
from an iterable.
11+
12+
In this document, we explore the various techniques for sorting data using Python.
13+
14+
15+
Sorting Basics
16+
==============
17+
18+
A simple ascending sort is very easy: just call the :func:`sorted` function. It
19+
returns a new sorted list::
20+
21+
>>> sorted([5, 2, 3, 1, 4])
22+
[1, 2, 3, 4, 5]
23+
24+
You can also use the :meth:`list.sort` method of a list. It modifies the list
25+
in-place (and returns *None* to avoid confusion). Usually it's less convenient
26+
than :func:`sorted` - but if you don't need the original list, it's slightly
27+
more efficient.
28+
29+
>>> a = [5, 2, 3, 1, 4]
30+
>>> a.sort()
31+
>>> a
32+
[1, 2, 3, 4, 5]
33+
34+
Another difference is that the :meth:`list.sort` method is only defined for
35+
lists. In contrast, the :func:`sorted` function accepts any iterable.
36+
37+
>>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'})
38+
[1, 2, 3, 4, 5]
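
The differences described above can be seen side by side in a minimal sketch. The variable names are illustrative and do not come from the HOWTO itself:

```python
# Illustrative values; none of these names appear in the original text.
numbers = [5, 2, 3, 1, 4]

# sorted() leaves its argument untouched and returns a new list.
ascending = sorted(numbers)

# list.sort() reorders the list in place and returns None.
in_place = list(numbers)        # work on a copy so `numbers` is preserved
result = in_place.sort()

# sorted() accepts any iterable, for example a tuple or a string.
from_tuple = sorted((3, 1, 2))
from_string = sorted("python")

print(ascending)    # [1, 2, 3, 4, 5]
print(numbers)      # [5, 2, 3, 1, 4]
print(result)       # None
print(from_string)  # ['h', 'n', 'o', 'p', 't', 'y']
```

Because :meth:`list.sort` returns *None*, writing ``a = a.sort()`` discards the list, which is a common mistake.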
39+
40+
Key Functions
41+
=============
42+
43+
Both :meth:`list.sort` and :func:`sorted` have a *key* parameter to specify a
44+
function to be called on each list element prior to making comparisons.
45+
46+
For example, here's a case-insensitive string comparison:
47+
48+
>>> sorted("This is a test string from Andrew".split(), key=str.lower)
49+
['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
50+
51+
The value of the *key* parameter should be a function that takes a single argument
52+
and returns a key to use for sorting purposes. This technique is fast because
53+
the key function is called exactly once for each input record.
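
One way to check the "called exactly once" claim is to count the calls. This is only a sketch; ``counting_lower`` and ``calls`` are hypothetical names introduced for the illustration:

```python
calls = 0

def counting_lower(word):
    """Return the lower-cased word and record that the key function ran."""
    global calls
    calls += 1
    return word.lower()

words = "This is a test string from Andrew".split()
ordered = sorted(words, key=counting_lower)

print(ordered)              # ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This']
print(calls == len(words))  # True: one key call per input element
```

The number of comparisons grows faster than the number of elements, so computing each key once up front is what keeps the work done by the key function linear in the input size.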
54+
55+
A common pattern is to sort complex objects using some of the object's indices
56+
as keys. For example:
57+
>>> student_tuples = [
...     ('john', 'A', 15),
...     ('jane', 'B', 12),
...     ('dave', 'B', 10),
... ]
>>> sorted(student_tuples, key=lambda student: student[2])   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
65+
66+
The same technique works for objects with named attributes. For example:
67+
>>> class Student:
...     def __init__(self, name, grade, age):
...         self.name = name
...         self.grade = grade
...         self.age = age
...     def __repr__(self):
...         return repr((self.name, self.grade, self.age))

>>> student_objects = [
...     Student('john', 'A', 15),
...     Student('jane', 'B', 12),
...     Student('dave', 'B', 10),
... ]
>>> sorted(student_objects, key=lambda student: student.age)   # sort by age
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
83+
84+
Operator Module Functions
85+
=========================
86+
87+
The key-function patterns shown above are very common, so Python provides
88+
convenience functions to make accessor functions easier and faster. The operator
89+
module has :func:`operator.itemgetter`, :func:`operator.attrgetter`, and
90+
an :func:`operator.methodcaller` function.
91+
92+
Using those functions, the above examples become simpler and faster:
93+
94+
>>> from operator import itemgetter, attrgetter
95+
96+
>>> sorted(student_tuples, key=itemgetter(2))
97+
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
98+
99+
>>> sorted(student_objects, key=attrgetter('age'))
100+
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
101+
102+
The operator module functions allow multiple levels of sorting. For example, to
103+
sort by *grade* then by *age*:
104+
105+
>>> sorted(student_tuples, key=itemgetter(1,2))
106+
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
107+
108+
>>> sorted(student_objects, key=attrgetter('grade', 'age'))
109+
[('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]
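
The section mentions :func:`operator.methodcaller` without showing it. As a hedged sketch, it builds a key function that calls a named method with fixed arguments on each element. The ``messages`` list is made up for the illustration:

```python
from operator import methodcaller

# Hypothetical data: rank messages by how many exclamation marks they contain.
messages = ['critical!!!', 'hurry!', 'standby', 'immediate!!']

# methodcaller('count', '!') behaves like: lambda s: s.count('!')
by_urgency = sorted(messages, key=methodcaller('count', '!'))

print(by_urgency)   # ['standby', 'hurry!', 'immediate!!', 'critical!!!']
```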
110+
111+
Ascending and Descending
112+
========================
113+
114+
Both :meth:`list.sort` and :func:`sorted` accept a *reverse* parameter with a
115+
boolean value. This is used to flag descending sorts. For example, to get the
116+
student data in reverse *age* order:
117+
118+
>>> sorted(student_tuples, key=itemgetter(2), reverse=True)
119+
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
120+
121+
>>> sorted(student_objects, key=attrgetter('age'), reverse=True)
122+
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
123+
124+
Sort Stability and Complex Sorts
125+
================================
126+
127+
Sorts are guaranteed to be `stable
128+
<http://en.wikipedia.org/wiki/Sorting_algorithm#Stability>`_\. That means that
129+
when multiple records have the same key, their original order is preserved.
130+
131+
>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
132+
>>> sorted(data, key=itemgetter(0))
133+
[('blue', 1), ('blue', 2), ('red', 1), ('red', 2)]
134+
135+
Notice how the two records for *blue* retain their original order so that
136+
``('blue', 1)`` is guaranteed to precede ``('blue', 2)``.
137+
138+
This wonderful property lets you build complex sorts in a series of sorting
139+
steps. For example, to sort the student data by descending *grade* and then
140+
ascending *age*, do the *age* sort first and then sort again using *grade*:
141+
142+
>>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key
143+
>>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending
144+
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
145+
146+
The `Timsort <http://en.wikipedia.org/wiki/Timsort>`_ algorithm used in Python
147+
does multiple sorts efficiently because it can take advantage of any ordering
148+
already present in a dataset.
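
The multi-pass approach can be wrapped in a small helper. This is a sketch rather than part of the standard library; ``multisort`` and the layout of its ``specs`` argument are assumptions made for the example:

```python
from operator import itemgetter

def multisort(records, specs):
    """Return a new list sorted by several keys using repeated stable sorts.

    `specs` is a sequence of (index, reverse) pairs, most significant first.
    """
    result = list(records)
    # Sort on the least significant key first; later passes then dominate,
    # and stability keeps the earlier ordering among equal keys.
    for index, reverse in reversed(specs):
        result.sort(key=itemgetter(index), reverse=reverse)
    return result

student_tuples = [
    ('john', 'A', 15),
    ('jane', 'B', 12),
    ('dave', 'B', 10),
]

# Descending grade (index 1), then ascending age (index 2).
ordered = multisort(student_tuples, [(1, True), (2, False)])
print(ordered)   # [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
```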
149+
150+
The Old Way Using Decorate-Sort-Undecorate
151+
==========================================
152+
153+
This idiom is called Decorate-Sort-Undecorate after its three steps:
154+
155+
* First, the initial list is decorated with new values that control the sort order.
156+
157+
* Second, the decorated list is sorted.
158+
159+
* Finally, the decorations are removed, creating a list that contains only the
160+
initial values in the new order.
161+
162+
For example, to sort the student data by *grade* using the DSU approach:
163+
164+
>>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)]
165+
>>> decorated.sort()
166+
>>> [student for grade, i, student in decorated] # undecorate
167+
[('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
168+
169+
This idiom works because tuples are compared lexicographically; the first items
170+
are compared; if they are the same then the second items are compared, and so
171+
on.
172+
173+
It is not strictly necessary in all cases to include the index *i* in the
174+
decorated list, but including it gives two benefits:
175+
176+
* The sort is stable -- if two items have the same key, their order will be
177+
preserved in the sorted list.
178+
179+
* The original items do not have to be comparable because the ordering of the
180+
decorated tuples will be determined by at most the first two items. So for
181+
example the original list could contain complex numbers which cannot be sorted
182+
directly.
183+
184+
Another name for this idiom is
185+
`Schwartzian transform <http://en.wikipedia.org/wiki/Schwartzian_transform>`_\,
186+
after Randal L. Schwartz, who popularized it among Perl programmers.
187+
188+
Now that Python sorting provides key-functions, this technique is not often needed.
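
To make the role of the index concrete, here is a sketch that sorts complex numbers by magnitude, first with DSU and then with a key function. The sample values are invented for the illustration:

```python
# Complex numbers cannot be ordered with "<", so they make a good test case.
values = [3 + 4j, 1 + 0j, 0 + 2j, -1 + 0j]

# Decorate: (magnitude, original position, original value).
decorated = [(abs(z), i, z) for i, z in enumerate(values)]

# Sort: ties on magnitude are settled by the index, so the complex
# values themselves are never compared with "<".
decorated.sort()

# Undecorate: keep only the original values.
dsu_result = [z for _, _, z in decorated]

# The same ordering with a key function; stability preserves the tie order.
key_result = sorted(values, key=abs)

print(dsu_result)                # [(1+0j), (-1+0j), 2j, (3+4j)]
print(dsu_result == key_result)  # True
```

Without the index, the two values of magnitude 1 would be compared directly, and in Python 3 that comparison raises :exc:`TypeError`.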
189+
190+
191+
The Old Way Using the *cmp* Parameter
192+
=====================================
193+
194+
Many constructs given in this HOWTO assume Python 2.4 or later. Before that,
195+
there was no :func:`sorted` builtin and :meth:`list.sort` took no keyword
196+
arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to
197+
handle user specified comparison functions.
198+
199+
In Py3.0, the *cmp* parameter was removed entirely (as part of a larger effort to
200+
simplify and unify the language, eliminating the conflict between rich
201+
comparisons and the :meth:`__cmp__` magic method).
202+
203+
In Py2.x, sort allowed an optional function that is called to perform the
204+
comparisons. That function should take two arguments to be compared and then
205+
return a negative value for less-than, return zero if they are equal, or return
206+
a positive value for greater-than. For example, we can do:
207+
>>> def numeric_compare(x, y):
...     return x - y
>>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare)
[1, 2, 3, 4, 5]
212+
213+
Or you can reverse the order of comparison with:
214+
>>> def reverse_numeric(x, y):
...     return y - x
>>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric)
[5, 4, 3, 2, 1]
219+
220+
When porting code from Python 2.x to 3.x, you may have a user-supplied
221+
comparison function that needs to be converted to a key function. The
222+
following wrapper makes that easy to do::
223+
    def cmp_to_key(mycmp):
        'Convert a cmp= function into a key= function'
        class K(object):
            def __init__(self, obj, *args):
                self.obj = obj
            def __lt__(self, other):
                return mycmp(self.obj, other.obj) < 0
            def __gt__(self, other):
                return mycmp(self.obj, other.obj) > 0
            def __eq__(self, other):
                return mycmp(self.obj, other.obj) == 0
            def __le__(self, other):
                return mycmp(self.obj, other.obj) <= 0
            def __ge__(self, other):
                return mycmp(self.obj, other.obj) >= 0
            def __ne__(self, other):
                return mycmp(self.obj, other.obj) != 0
        return K
242+
243+
To convert to a key function, just wrap the old comparison function:
244+
245+
>>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
246+
[5, 4, 3, 2, 1]
247+
248+
In Python 3.2, the :func:`functools.cmp_to_key` function was added to the
249+
functools module in the standard library.
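
On interpreters that ship :func:`functools.cmp_to_key`, the hand-written wrapper is unnecessary. A brief sketch follows; ``by_length_then_alpha`` and the word list are hypothetical examples, not taken from the HOWTO:

```python
from functools import cmp_to_key

def reverse_numeric(x, y):
    """Old-style comparison: negative, zero, or positive result."""
    return y - x

def by_length_then_alpha(a, b):
    """Compare strings by length first, then alphabetically."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)   # portable replacement for the removed cmp()

numbers = sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric))
words = sorted(['pear', 'fig', 'apple', 'kiwi'],
               key=cmp_to_key(by_length_then_alpha))

print(numbers)  # [5, 4, 3, 2, 1]
print(words)    # ['fig', 'kiwi', 'pear', 'apple']
```

Where a plain key function such as ``key=lambda s: (len(s), s)`` expresses the same ordering, it is usually simpler than wrapping a comparison function.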
250+
251+
Odds and Ends
252+
=============
253+
254+
* For locale aware sorting, use :func:`locale.strxfrm` for a key function or
255+
:func:`locale.strcoll` for a comparison function.
256+
257+
* The *reverse* parameter still maintains sort stability (i.e. records with
258+
equal keys retain the original order). Interestingly, that effect can be
259+
simulated without the parameter by using the builtin :func:`reversed` function
260+
twice:
261+
262+
>>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]
263+
>>> assert sorted(data, reverse=True) == list(reversed(sorted(reversed(data))))
264+
265+
* The sort routines are guaranteed to use :meth:`__lt__` when making comparisons
266+
between two objects. So, it is easy to add a standard sort order to a class by
267+
defining an :meth:`__lt__` method::
268+
269+
>>> Student.__lt__ = lambda self, other: self.age < other.age
270+
>>> sorted(student_objects)
271+
[('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)]
272+
273+
* Key functions need not depend directly on the objects being sorted. A key
274+
function can also access external resources. For instance, if the student grades
275+
are stored in a dictionary, they can be used to sort a separate list of student
276+
names:
277+
278+
>>> students = ['dave', 'john', 'jane']
279+
>>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'}
280+
>>> sorted(students, key=newgrades.__getitem__)
281+
['jane', 'dave', 'john']
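
The point about *reverse* and stability is easier to see when records share a key. The following sketch reuses the colour data from above and adds a key function so that ties actually occur; the variable names are illustrative:

```python
from operator import itemgetter

data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)]

# Descending by colour; records with the same colour keep their input order.
standard_way = sorted(data, key=itemgetter(0), reverse=True)

# The same result without reverse=True: reverse the input, sort ascending,
# then reverse the output.
double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0))))

print(standard_way)                     # [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)]
print(standard_way == double_reversed)  # True
```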
