Commit 02d219b

Merge branch 'docs'
2 parents 84eedc5 + ec51136 commit 02d219b

11 files changed

Lines changed: 374 additions & 73 deletions

README.rst

Lines changed: 0 additions & 51 deletions
This file was deleted.

README.rst

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+doc/source/intro.rst

doc/source/api.rst

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+.. _api-label:
+
+#############
+API Reference
+#############
+
+***********************
+Mapped Memory Managers
+***********************
+
+.. automodule:: smmap.mman
+   :members:
+   :undoc-members:
+
+*******
+Buffers
+*******
+
+.. automodule:: smmap.buf
+   :members:
+   :undoc-members:
+
+**********
+Exceptions
+**********
+
+.. automodule:: smmap.exc
+   :members:
+   :undoc-members:
+
+*********
+Utilities
+*********
+
+.. automodule:: smmap.util
+   :members:
+   :undoc-members:
+
+
+
+
+

doc/source/conf.py

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 # If extensions (or modules to document with autodoc) are in another directory,
 # add these directories to sys.path here. If the directory is relative to the
 # documentation root, use os.path.abspath to make it absolute, like shown here.
-#sys.path.append(os.path.abspath('.'))
+sys.path.append(os.path.abspath('../../'))

 # -- General configuration -----------------------------------------------------

doc/source/index.rst

Lines changed: 3 additions & 0 deletions
@@ -12,6 +12,9 @@ Contents:
 .. toctree::
    :maxdepth: 2

+   intro
+   tutorial
+   api
    changes

 Indices and tables

doc/source/intro.rst

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
+###########
+Motivation
+###########
+When reading from many, possibly large, files in a random-access fashion, memory maps are usually the fastest and most efficient option.
+
+Although memory maps have many advantages, they are a very limited system resource, as every map uses one file descriptor, and the number of file descriptors is limited per process. On 32 bit systems, the amount of memory you can have mapped at a time is naturally limited to a theoretical 4GB, which may not be enough for some applications.
+
+########
+Overview
+########
+
+Smmap wraps an interface around mmap and tracks the mapped files as well as the number of clients that use them. If the system runs out of resources, or if a memory limit is reached, it will automatically unload unused maps to allow continued operation.
+
+To allow processing large files even on 32 bit systems, it allows only portions of the file to be mapped. Once the user reads beyond the mapped region, smmap will automatically map the next required region, unloading unused regions using an LRU algorithm.
+
+The interface also works around the missing offset parameter in Python implementations up to Python 2.5.
+
+Although the library can be used most efficiently with its native interface, a Buffer implementation is provided to hide these details behind a simple string-like interface.
+
+For performance critical 64 bit applications, a simplified version of memory mapping is provided which always maps the whole file, but still provides the benefit of unloading unused mappings on demand.
+
+#############
+Prerequisites
+#############
+* Python 2.4, 2.5 or 2.6
+* OSX, Windows or Linux
+
+The package was tested on all of the previously mentioned configurations.
+
+###########
+Limitations
+###########
+* The memory access is read-only by design.
+* In Python versions below 2.6, memory maps are created in a compatibility mode, which works but creates inefficient memory mappings, as they always start at offset 0.
+* It has not been tested on Python 2.7 or 3.x.
+
+################
+Installing smmap
+################
+It's easiest to install smmap using the *easy_install* or *pip* program, which are part of `setuptools`_ and `pip`_ respectively::
+
+    $ easy_install smmap
+    # or
+    $ pip install smmap
+
+As the command will install smmap in your respective Python distribution, you will most likely need root permissions to authorize the required changes.
+
+If you have downloaded the source archive, the package can be installed by running the ``setup.py`` script::
+
+    $ python setup.py install
+
+It is advised to have a look at the :ref:`Usage Guide <tutorial-label>` for a brief introduction to the basic design and the accompanying classes.
+
+##################
+Homepage and Links
+##################
+The project is hosted on GitHub at `https://github.com/Byron/smmap <https://github.com/Byron/smmap>`_.
+
+The latest source can be cloned from GitHub as well:
+
+* git://github.com/gitpython-developers/smmap.git
+
+
+For support, please use the git-python mailing list:
+
+* http://groups.google.com/group/git-python
+
+
+Issues can be filed on GitHub:
+
+* https://github.com/Byron/smmap/issues
+
+###################
+License Information
+###################
+*smmap* is licensed under the New BSD License.
+
+.. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools
+.. _pip: http://www.pip-installer.org/en/latest/
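
Note on intro.rst: the Overview describes mapping only a portion of a file at a time. The following is a minimal sketch of that idea using only the standard library's ``mmap`` module on Python 3, not smmap's own implementation. The helper name ``read_window`` is hypothetical and does not exist in smmap; the point is that a mapping offset has to be aligned to the allocation granularity, so the requested offset is rounded down and the difference is skipped inside the window.

```python
import mmap
import os


def read_window(path, offset, size):
    """Return up to `size` bytes starting at `offset`, mapping only one window.

    Hypothetical helper for illustration; it is not part of smmap.
    """
    granularity = mmap.ALLOCATIONGRANULARITY
    # Mapping offsets must be a multiple of the allocation granularity,
    # so round the window start down and remember how far we moved.
    aligned = (offset // granularity) * granularity
    delta = offset - aligned
    with open(path, "rb") as fp:
        file_size = os.fstat(fp.fileno()).st_size
        if offset >= file_size:
            return b""
        # Never ask for a window that extends past the end of the file.
        length = min(delta + size, file_size - aligned)
        window = mmap.mmap(fp.fileno(), length,
                           access=mmap.ACCESS_READ, offset=aligned)
        try:
            # Slicing an mmap object copies the bytes out of the mapping.
            return window[delta:delta + size]
        finally:
            window.close()
```

A sliding window manager would additionally keep such windows open and reuse them between reads; this sketch maps and unmaps on every call to stay short.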

doc/source/tutorial.rst

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+.. _tutorial-label:
+
+###########
+Usage Guide
+###########
+This text briefly introduces you to the basic design decisions and accompanying classes.
+
+******
+Design
+******
+Per application, there is one *MemoryManager*, which is held as a static instance and used throughout the application. It can be configured to keep your resources within certain limits.
+
+To access mapped regions, you require a cursor. Cursors point to exactly one file and serve as handles into it. As long as a cursor exists, the respective memory region will remain available.
+
+For convenience, a buffer implementation is provided which handles cursors and resource allocation behind its simple buffer-like interface.
+
+***************
+Memory Managers
+***************
+There are two types of memory managers: one uses *static* windows, the other one uses *sliding* windows. A window is a region of a file mapped into memory. Although the names might be somewhat misleading, as technically windows are always static, the *sliding* version will allocate relatively small windows whereas the *static* version will always map the whole file.
+
+The *static* manager does nothing more than keep a client count on the respective memory maps, which always map the whole file. This allows some assumptions that can lead to simplified data access and increased performance, but reduces compatibility with 32 bit systems or giant files.
+
+The *sliding* memory manager therefore should be the default manager when preparing an application for handling huge amounts of data on 32 bit and 64 bit platforms::
+
+    import smmap
+    # This instance should be globally available in your application
+    # It is configured to be well suited for 32 bit or 64 bit applications.
+    mman = smmap.SlidingWindowMapManager()
+
+    # the manager provides much useful information about its current state
+    # like the amount of open file handles or the amount of mapped memory
+    mman.num_file_handles()
+    mman.mapped_memory_size()
+    # and many more ...
+
+
+Cursors
+*******
+*Cursors* are handles that point onto a window, i.e. a region of a file mapped into memory. From them you may obtain a buffer through which the data of that window can actually be accessed::
+
+    import smmap.test.lib
+    fc = smmap.test.lib.FileCreator(1024*1024*8, "test_file")
+
+    # obtain a cursor to access some file.
+    c = mman.make_cursor(fc.path)
+
+    # the cursor is now associated with the file, but not yet usable
+    assert c.is_associated()
+    assert not c.is_valid()
+
+    # before you can use the cursor, you have to specify a window you want to
+    # access. The following just says you want as much data as possible starting
+    # from offset 0.
+    # To be sure your region could be mapped, query for validity
+    assert c.use_region().is_valid()  # use_region returns self
+
+    # once a region was mapped, you must query its dimension regularly
+    # to ensure you don't try to access its buffer out of its bounds
+    assert c.size()
+    c.buffer()[0]  # first byte
+    c.buffer()[1:10]  # 9 bytes starting at offset 1
+    c.buffer()[c.size()-1]  # last byte
+
+    # it's recommended not to create big slices when feeding the buffer
+    # into consumers (e.g. struct or zlib).
+    # Instead, either give the buffer directly, or use Python's buffer builtin.
+    buffer(c.buffer(), 1, 9)  # 9 bytes starting at offset 1, without copying them
+
+    # you can query absolute offsets, and check whether an offset is included
+    # in the cursor's data.
+    assert c.ofs_begin() < c.ofs_end()
+    assert c.includes_ofs(100)
+
+    # If one of your region requests is out of bounds, the
+    # cursor will become invalid. It cannot be used in that state
+    assert not c.use_region(fc.size, 100).is_valid()
+    # map as much as possible after skipping the first 100 bytes
+    assert c.use_region(100).is_valid()
+
+    # You can explicitly free cursor resources by unusing the cursor's region
+    c.unuse_region()
+    assert not c.is_valid()
+
+
+Now you would have to write your algorithms around this interface to properly slide through huge amounts of data.
+
+Alternatively you can use a convenience interface.
+
+*******
+Buffers
+*******
+To make first use easier, at the expense of performance, there is a Buffer implementation which uses a cursor underneath.
+
+With it, you can access all data in a possibly huge file without having to take care of setting the cursor to different regions yourself::
+
+    # Create a default buffer which can operate on the whole file
+    buf = smmap.SlidingWindowMapBuffer(mman.make_cursor(fc.path))
+
+    # you can use it right away
+    assert buf.cursor().is_valid()
+
+    buf[0]  # access the first byte
+    buf[-1]  # access the last byte of the file
+    buf[-10:]  # access the last ten bytes
+
+    # If you want to keep the instance between different accesses, use the
+    # dedicated methods
+    buf.end_access()
+    assert not buf.cursor().is_valid()  # you cannot use the buffer anymore
+    assert buf.begin_access(offset=10)  # start using the buffer at an offset
+
+    # it will stop using resources automatically once it goes out of scope
+
+Disadvantages
+*************
+Buffers cannot be used in place of strings or memory maps, hence you have to slice them to obtain valid input for consumers such as struct and zlib. Slicing copies data, which adds considerable overhead and makes buffers slower than using cursors directly.
+
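
Note on tutorial.rst: the tutorial recommends handing data to consumers such as struct or zlib without creating large slices, and uses the Python 2 ``buffer`` builtin for that. On Python 3 that builtin no longer exists; ``memoryview`` fills the same role. The sketch below is an assumption-laden illustration of the idea: the ``data`` bytes object merely stands in for whatever a cursor's ``buffer()`` would return, and no smmap call is involved.

```python
import struct
import zlib

# Stand-in for the object a cursor's buffer() method would return.
# Assumption: any object supporting the buffer protocol behaves alike here.
data = struct.pack("<II", 7, 42) + zlib.compress(b"payload" * 100)

# Wrapping and slicing a memoryview does not copy the underlying bytes.
view = memoryview(data)
header = view[:8]

# Both struct and zlib accept bytes-like objects such as memoryview slices.
first, second = struct.unpack_from("<II", header)
body = zlib.decompress(view[8:])
```

Only the decompressed result is a new object; the header and the compressed tail are read in place.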

smmap/buf.py

Lines changed: 10 additions & 0 deletions
@@ -47,6 +47,8 @@ def __len__(self):
     def __getitem__(self, i):
         c = self._c
         assert c.is_valid()
+        if i < 0:
+            i = self._size + i
         if not c.includes_ofs(i):
             c.use_region(i, 1)
         # END handle region usage
@@ -57,6 +59,12 @@ def __getslice__(self, i, j):
         # fast path, slice fully included - saves a concatenate operation and
         # should be the default
         assert c.is_valid()
+        if i < 0:
+            i = self._size + i
+        if j == sys.maxint:
+            j = self._size
+        if j < 0:
+            j = self._size + j
         if (c.ofs_begin() <= i) and (j < c.ofs_end()):
             b = c.ofs_begin()
             return c.buffer()[i-b:j-b]
@@ -68,6 +76,7 @@ def __getslice__(self, i, j):
             md = str()
             while l:
                 c.use_region(ofs, l)
+                assert c.is_valid()
                 d = c.buffer()[:l]
                 ofs += len(d)
                 l -= len(d)
@@ -102,6 +111,7 @@ def begin_access(self, cursor = None, offset = 0, size = sys.maxint, flags = 0):
                 self._size = size
             #END set size
             return res
+        # END use our cursor
         return False

     def end_access(self):
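
Note on smmap/buf.py: the added lines translate negative indices and open-ended slices into absolute offsets before the cursor is consulted. The standalone sketch below restates that rule for Python 3, where ``__getslice__`` and ``sys.maxint`` no longer exist and an open-ended slice arrives as ``None``. The function names are hypothetical and are not part of smmap.

```python
def normalize_index(i, size):
    """Translate a possibly negative index into an absolute offset."""
    if i < 0:
        i = size + i
    return i


def normalize_slice(i, j, size):
    """Translate slice bounds into absolute offsets within `size` bytes.

    `j` may be None, which stands for an open-ended slice such as buf[-10:].
    """
    i = normalize_index(i, size)
    if j is None:
        # Open end: read up to the end of the accessible data.
        j = size
    else:
        j = normalize_index(j, size)
    return i, j
```

For in-range values this agrees with the built-in ``slice(i, j).indices(size)``, which could be used instead; the explicit version is shown only because it mirrors the structure of the change.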
