Commit 83d7c02

Add Roberto de Almeida's Arrayterator.
1 parent c05d81d commit 83d7c02

4 files changed

Lines changed: 206 additions & 15 deletions

THANKS.txt

Lines changed: 16 additions & 15 deletions
@@ -1,38 +1,39 @@
 Travis Oliphant for the majority of code adaptation
-Jim Hugunin, Paul Dubois, Konrad Hinsen, David Ascher, and many others for
+Jim Hugunin, Paul Dubois, Konrad Hinsen, David Ascher, and many others for
 Numeric on which the code is based.
 Perry Greenfield, J Todd Miller, Rick White, Paul Barrett for Numarray
 which gave much inspiration and showed the way forward.
 Paul Dubois for original Masked Arrays
 Pearu Peterson for f2py and numpy.distutils and help with code organization
-Robert Kern for mtrand, bug fixes, help with distutils, code organization,
-and much more.
+Robert Kern for mtrand, bug fixes, help with distutils, code organization,
+and much more.
 Eric Jones for sundry subroutines
 Fernando Perez for code snippets, ideas, bugfixes, and testing.
 Ed Schofield for matrix.py patches, bugfixes, testing, and docstrings.
 Robert Cimrman for array set operations and numpy.distutils help
 John Hunter for code snippets (from matplotlib)
 Chris Hanley for help with records.py, testing, and bug fixes.
-Travis Vaught, Joe Cooper, Jeff Strunk for administration of
-numpy.org web site and SVN
+Travis Vaught, Joe Cooper, Jeff Strunk for administration of
+numpy.org web site and SVN
 Eric Firing for bugfixes.
 Arnd Baecker for 64-bit testing
 David Cooke for many code improvements including the auto-generated C-API,
 and optimizations.
-Alexander Belopolsky (Sasha) for Masked array bug-fixes and tests,
+Alexander Belopolsky (Sasha) for Masked array bug-fixes and tests,
 rank-0 array improvements, scalar math help and other code additions
-Francesc Altet for unicode and nested record tests
-and much help with rooting out nested record array bugs.
-Tim Hochberg for getting the build working on MSVC, optimization
+Francesc Altet for unicode and nested record tests
+and much help with rooting out nested record array bugs.
+Tim Hochberg for getting the build working on MSVC, optimization
 improvements, and code review
-Charles Harris for the sorting code originally written for Numarray and
+Charles Harris for the sorting code originally written for Numarray and
 for improvements to polyfit, many bug fixes, and documentation strings.
-A.M. Archibald for no-copy-reshape code.
-David Huard for histogram improvements including 2-d and d-d code and
-other bug-fixes.
-Albert Strasheim for documentation, bug-fixes, regression tests and
+A.M. Archibald for no-copy-reshape code.
+David Huard for histogram improvements including 2-d and d-d code and
+other bug-fixes.
+Albert Strasheim for documentation, bug-fixes, regression tests and
 Valgrind expertise.
 Stefan van der Walt for documentation, bug-fixes and regression-tests.
-Andrew Straw for help with http://www.scipy.org, documentation, and testing.
+Andrew Straw for help with http://www.scipy.org, documentation, and testing.
 David Cournapeau for scons build support, doc-and-bug fixes, and code contributions including fast_clipping.
 Pierre Gerard-Marchant for his rewrite of the masked array functionality.
+Roberto de Almeida for the buffered array iterator.

numpy/lib/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@
 from io import *
 from financial import *
 import math
+from arrayterator import *
 
 __all__ = ['emath','math']
 __all__ += type_check.__all__

numpy/lib/arrayterator.py

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
+"""
+A buffered iterator for big arrays.
+
+This module solves the problem of iterating over a big file-based array
+without having to read it into memory. The ``Arrayterator`` class wraps
+an array object, and when iterated it will return subarrays with at most
+``buf_size`` elements.
+
+The algorithm works by first finding a "running dimension", along which
+the blocks will be extracted. Given an array of dimensions (d1, d2, ...,
+dn), eg, if ``buf_size`` is smaller than ``d1`` the first dimension will
+be used. If, on the other hand,
+
+    d1 < buf_size < d1*d2
+
+the second dimension will be used, and so on. Blocks are extracted along
+this dimension, and when the last block is returned the process continues
+from the next dimension, until all elements have been read.
+
+"""
+
+from __future__ import division
+
+from operator import mul
+
+__all__ = ['Arrayterator']
+
+class Arrayterator(object):
+    """
+    Buffered iterator for big arrays.
+
+    This class creates a buffered iterator for reading big arrays in small
+    contiguous blocks. The class is useful for objects stored in the
+    filesystem. It allows iteration over the object *without* reading
+    everything in memory; instead, small blocks are read and iterated over.
+
+    The class can be used with any object that supports multidimensional
+    slices, like variables from Scientific.IO.NetCDF, pynetcdf and ndarrays.
+
+    """
+
+    def __init__(self, var, buf_size=None):
+        self.var = var
+        self.buf_size = buf_size
+
+        self.start = [0 for dim in var.shape]
+        self.stop = [dim for dim in var.shape]
+        self.step = [1 for dim in var.shape]
+
+    def __getattr__(self, attr):
+        return getattr(self.var, attr)
+
+    def __getitem__(self, index):
+        """
+        Return a new arrayterator.
+
+        """
+        # Fix index, handling ellipsis and incomplete slices.
+        if not isinstance(index, tuple): index = (index,)
+        fixed = []
+        length, dims = len(index), len(self.shape)
+        for slice_ in index:
+            if slice_ is Ellipsis:
+                fixed.extend([slice(None)] * (dims-length+1))
+                length = len(fixed)
+            elif isinstance(slice_, (int, long)):
+                fixed.append(slice(slice_, slice_+1, 1))
+            else:
+                fixed.append(slice_)
+        index = tuple(fixed)
+        if len(index) < dims:
+            index += (slice(None),) * (dims-len(index))
+
+        # Return a new arrayterator object.
+        out = self.__class__(self.var, self.buf_size)
+        for i, (start, stop, step, slice_) in enumerate(
+                zip(self.start, self.stop, self.step, index)):
+            out.start[i] = start + (slice_.start or 0)
+            out.step[i] = step * (slice_.step or 1)
+            out.stop[i] = start + (slice_.stop or stop-start)
+            out.stop[i] = min(stop, out.stop[i])
+        return out
+
+    def __array__(self):
+        """
+        Return corresponding data.
+
+        """
+        slice_ = tuple(slice(*t) for t in zip(
+                self.start, self.stop, self.step))
+        return self.var[slice_]
+
+    @property
+    def flat(self):
+        for block in self:
+            for value in block.flat:
+                yield value
+
+    @property
+    def shape(self):
+        return tuple(((stop-start-1)//step+1) for start, stop, step in
+                zip(self.start, self.stop, self.step))
+
+    def __iter__(self):
+        # Skip arrays with degenerate dimensions
+        if [dim for dim in self.shape if dim <= 0]: raise StopIteration
+
+        start = self.start[:]
+        stop = self.stop[:]
+        step = self.step[:]
+        ndims = len(self.var.shape)
+
+        while 1:
+            count = self.buf_size or reduce(mul, self.shape)
+
+            # iterate over each dimension, looking for the
+            # running dimension (ie, the dimension along which
+            # the blocks will be built from)
+            rundim = 0
+            for i in range(ndims-1, -1, -1):
+                # if count is zero we ran out of elements to read
+                # along higher dimensions, so we read only a single position
+                if count == 0:
+                    stop[i] = start[i]+1
+                elif count <= self.shape[i]:  # limit along this dimension
+                    stop[i] = start[i] + count*step[i]
+                    rundim = i
+                else:
+                    stop[i] = self.stop[i]  # read everything along this
+                                            # dimension
+                stop[i] = min(self.stop[i], stop[i])
+                count = count//self.shape[i]
+
+            # yield a block
+            slice_ = tuple(slice(*t) for t in zip(start, stop, step))
+            yield self.var[slice_]
+
+            # Update start position, taking care of overflow to
+            # other dimensions
+            start[rundim] = stop[rundim]  # start where we stopped
+            for i in range(ndims-1, 0, -1):
+                if start[i] >= self.stop[i]:
+                    start[i] = self.start[i]
+                    start[i-1] += self.step[i-1]
+            if start[0] >= self.stop[0]:
+                raise StopIteration
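
The block-sizing step of `__iter__` in the file above can be isolated as a small pure-Python sketch. The helper name `first_block_shape` is hypothetical and not part of this commit. It assumes a positive integer `buf_size`, an unsliced array (start 0 and step 1 on every axis), and it only reproduces the first pass through the loop. Note that the loop scans from the last axis back to the first, so the "running dimension" is searched starting from the fastest-varying axis.

```python
def first_block_shape(shape, buf_size):
    """Return (block_shape, rundim) for the first block of an unsliced array.

    Hypothetical helper, not part of the commit: it mirrors the inner loop
    of Arrayterator.__iter__ with start = 0 and step = 1 on every axis.
    """
    count = buf_size
    block = [0] * len(shape)
    rundim = 0
    # Scan from the last axis to the first, as the commit's loop does.
    for i in range(len(shape) - 1, -1, -1):
        if count == 0:
            # Budget exhausted on a later axis: read a single position here.
            block[i] = 1
        elif count <= shape[i]:
            # The budget runs out along this axis: it is the running dimension.
            block[i] = count
            rundim = i
        else:
            # The whole axis fits in the remaining budget.
            block[i] = shape[i]
        count //= shape[i]
    return tuple(block), rundim


print(first_block_shape((3, 4, 5), 3))    # ((1, 1, 3), 2)
print(first_block_shape((3, 4, 5), 10))   # ((1, 2, 5), 1)
print(first_block_shape((3, 4, 5), 100))  # ((3, 4, 5), 0)
```

For a `(3, 4, 5)` array, a budget of 10 elements yields blocks of two full rows of the last axis, and a budget larger than the array returns everything in one block.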
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+from operator import mul
+
+import numpy as np
+from numpy.random import randint
+from numpy.lib import Arrayterator
+
+def test():
+    np.random.seed(np.arange(10))
+
+    # Create a random array
+    ndims = randint(5)+1
+    shape = tuple(randint(10)+1 for dim in range(ndims))
+    els = reduce(mul, shape)
+    a = np.arange(els)
+    a.shape = shape
+
+    buf_size = randint(2*els)
+    b = Arrayterator(a, buf_size)
+
+    # Check that each block has at most ``buf_size`` elements
+    for block in b:
+        assert len(block.flat) <= (buf_size or els)
+
+    # Check that all elements are iterated correctly
+    assert list(b.flat) == list(a.flat)
+
+    # Slice arrayterator
+    start = [randint(dim) for dim in shape]
+    stop = [randint(dim)+1 for dim in shape]
+    step = [randint(dim)+1 for dim in shape]
+    slice_ = tuple(slice(*t) for t in zip(start, stop, step))
+    c = b[slice_]
+    d = a[slice_]
+
+    # Check that each block has at most ``buf_size`` elements
+    for block in c:
+        assert len(block.flat) <= (buf_size or els)
+
+    # Check that the arrayterator is sliced correctly
+    assert np.all(c.__array__() == d)
+
+    # Check that all elements are iterated correctly
+    assert list(c.flat) == list(d.flat)
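
The slicing checks in this test depend on the `shape` property of `Arrayterator`, which computes each axis length as `(stop-start-1)//step+1`. The following sketch compares that expression with `len(range(start, stop, step))` for small bounds. The names `sliced_length` and `check_against_range` are hypothetical and not part of the commit, and the check assumes positive steps and non-negative bounds only.

```python
def sliced_length(start, stop, step):
    """Axis length using the same expression as the commit's shape property."""
    return (stop - start - 1) // step + 1


def check_against_range(limit=8):
    """Compare the expression with len(range(...)) for small bounds."""
    for start in range(limit):
        for stop in range(start, limit):   # stop >= start
            for step in range(1, limit):   # positive steps only
                expected = len(range(start, stop, step))
                assert sliced_length(start, stop, step) == expected
    return True


check_against_range()
```

When `stop` is smaller than `start` the expression can go negative (for example `sliced_length(5, 2, 1)` is `-3`), which is why `__iter__` treats any dimension `<= 0` as degenerate and yields nothing.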
