gh-134004: Dbm vacuuming #134028

Merged
merged 21 commits into from
Jun 1, 2025
21 commits
7505713
Added tests for vacuuming functionality of dbm
Andrea-Oliveri May 14, 2025
1147774
Added vacuuming logic to dbm.sqlite
Andrea-Oliveri May 14, 2025
109a378
Added vacuuming logic to dbm.dumb
Andrea-Oliveri May 14, 2025
cdacb53
Updated documentation of dbm
Andrea-Oliveri May 14, 2025
02a7b8a
Adapted vacuum tests to allow for submodules missing method
Andrea-Oliveri May 14, 2025
dcb43a2
Pushing news and acks entries
Andrea-Oliveri May 15, 2025
89fb2db
Changed News entry to avoid failure during Doc testing due to referen…
Andrea-Oliveri May 15, 2025
476dc55
Changed method names from .vacuum to .reorganize in dbm.sqlite and db…
Andrea-Oliveri May 15, 2025
19c0c8d
Added .reorganize() method in shelve to expose dbm submodule's own .r…
Andrea-Oliveri May 15, 2025
88b4014
Added documentation for shelve.reorganize
Andrea-Oliveri May 15, 2025
5c1d45f
Fixed link in doc
Andrea-Oliveri May 15, 2025
992e7aa
Updated news
Andrea-Oliveri May 15, 2025
b96480b
PR review: removed unnecessary .keys()
Andrea-Oliveri May 15, 2025
4c23b64
Updated documentation to correct notes indentation
Andrea-Oliveri May 17, 2025
8a80977
Left previously removed comment as requested in PR
Andrea-Oliveri May 17, 2025
166a553
Modified documentation of dbm.dumb warning to align with shelve warning
Andrea-Oliveri May 17, 2025
6f34de5
Skipping test instead of succeeding if method not implemented for sub…
Andrea-Oliveri May 28, 2025
059ad82
Converted redundant f-string to regular string
Andrea-Oliveri May 28, 2025
3e7049f
Added versionadded to method documentations
Andrea-Oliveri May 28, 2025
2f5af38
Added whatsnew entries
Andrea-Oliveri May 28, 2025
e2370ac
Merged changes from branch main
Andrea-Oliveri May 28, 2025
40 changes: 39 additions & 1 deletion Doc/library/dbm.rst
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,16 @@
* :mod:`dbm.ndbm`

If none of these modules are installed, the
slow-but-simple implementation in module :mod:`dbm.dumb` will be used. There
is a `third party interface <https://www.jcea.es/programacion/pybsddb.htm>`_ to
the Oracle Berkeley DB.

.. note::
None of the underlying modules will automatically shrink the disk space used by
the database file. However, :mod:`dbm.sqlite3`, :mod:`dbm.gnu` and :mod:`dbm.dumb`
provide a :meth:`!reorganize` method that can be used for this purpose.


.. exception:: error

A tuple containing the exceptions that can be raised by each of the supported
Expand Down Expand Up @@ -186,6 +192,17 @@ or any other SQLite browser, including the SQLite CLI.
The Unix file access mode of the file (default: octal ``0o666``),
used only when the database has to be created.

.. method:: sqlite3.reorganize()

If you have carried out a lot of deletions and would like to shrink the space
used on disk, this method will reorganize the database; otherwise, deleted file
space will be kept and reused as new (key, value) pairs are added.

.. note::
While reorganizing, up to twice the size of the original database may be required
in free disk space. Be aware that this factor differs between :mod:`dbm` submodules.

.. versionadded:: next
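
Review note: the free-space factors quoted in the notes of this diff (two times for :mod:`dbm.sqlite3`, one time for :mod:`dbm.gnu`, none for :mod:`dbm.dumb`) suggest a pre-flight check before calling ``reorganize()``. The following is only a sketch under those assumptions; ``EXTRA_SPACE_FACTOR`` and ``has_room_to_reorganize`` are hypothetical names that are not part of this PR, and the check ignores backends that spread a database over several files.

```python
import os
import shutil

# Worst-case extra space, as a multiple of the database size.
# Values are taken from the documentation notes in this diff.
EXTRA_SPACE_FACTOR = {
    "dbm.sqlite3": 2.0,  # "as much as two times the size"
    "dbm.gnu": 1.0,      # "as much as one time the size"
    "dbm.dumb": 0.0,     # "no additional free disk space is required"
}


def has_room_to_reorganize(db_file, backend):
    """Return True if the volume holding db_file has enough free space.

    Hypothetical helper: it only looks at a single file on disk.
    """
    needed = EXTRA_SPACE_FACTOR[backend] * os.path.getsize(db_file)
    directory = os.path.dirname(os.path.abspath(db_file))
    return shutil.disk_usage(directory).free >= needed
```

A caller could skip or postpone ``reorganize()`` when this returns ``False``, rather than risk running out of disk space halfway through.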

:mod:`dbm.gnu` --- GNU database manager
---------------------------------------
Expand Down Expand Up @@ -284,6 +301,10 @@ functionality like crash tolerance.
reorganization; otherwise, deleted file space will be kept and reused as new
(key, value) pairs are added.

.. note::
While reorganizing, free disk space of up to the size of the original database may be
required. Be aware that this factor differs between :mod:`dbm` submodules.

.. method:: gdbm.sync()

When the database has been opened in fast mode, this method forces any
Expand Down Expand Up @@ -438,6 +459,11 @@ The :mod:`!dbm.dumb` module defines the following:
with a sufficiently large/complex entry due to stack depth limitations in
Python's AST compiler.

.. warning::
:mod:`dbm.dumb` does not support concurrent read/write access. (Multiple
simultaneous read accesses are safe.) When a program has the database open
for writing, no other program should have it open for reading or writing.

.. versionchanged:: 3.5
:func:`~dbm.dumb.open` always creates a new database when *flag* is ``'n'``.

Expand All @@ -460,3 +486,15 @@ The :mod:`!dbm.dumb` module defines the following:
.. method:: dumbdbm.close()

Close the database.

.. method:: dumbdbm.reorganize()

If you have carried out a lot of deletions and would like to shrink the space
used on disk, this method will reorganize the database; otherwise, deleted file
space will not be reused.

.. note::
While reorganizing, no additional free disk space is required. Be aware
that this differs between :mod:`dbm` submodules.

.. versionadded:: next
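
Review note: a minimal usage sketch for the new ``dumbdbm.reorganize()`` method documented above. Because the method only exists on interpreters that include this change, the sketch guards the call with ``hasattr``, in the same way the tests in this PR do. ``delete_and_compact`` is a hypothetical helper name, and the key names and value sizes are arbitrary.

```python
import dbm.dumb
import os
import tempfile


def delete_and_compact(path, keys_to_delete):
    """Delete keys, then reorganize if the backend supports it.

    Returns True if reorganize() was called, False otherwise.
    """
    with dbm.dumb.open(path, "c") as db:
        for key in keys_to_delete:
            del db[key]
        # reorganize() is added by this PR; older interpreters lack it.
        if hasattr(db, "reorganize"):
            db.reorganize()
            return True
        return False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example")
    with dbm.dumb.open(path, "n") as db:
        for i in range(20):
            db[f"key{i}".encode("ascii")] = b"x" * 10_000
    # dbm.dumb stores the values in "<path>.dat".
    size_before = os.path.getsize(path + ".dat")

    doomed = [f"key{i}".encode("ascii") for i in range(10)]
    reorganized = delete_and_compact(path, doomed)
    size_after = os.path.getsize(path + ".dat")

    with dbm.dumb.open(path, "r") as db:
        remaining = sorted(db.keys())
```

Without ``reorganize()`` the data file keeps its size after the deletions; with it, the file should shrink while the remaining keys stay readable.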
16 changes: 14 additions & 2 deletions Doc/library/shelve.rst
Expand Up @@ -75,8 +75,15 @@ Two additional methods are supported:

Write back all entries in the cache if the shelf was opened with *writeback*
set to :const:`True`. Also empty the cache and synchronize the persistent
dictionary on disk, if feasible. This is called automatically when the shelf
is closed with :meth:`close`.
dictionary on disk, if feasible. This is called automatically when
:meth:`reorganize` is called or the shelf is closed with :meth:`close`.

.. method:: Shelf.reorganize()

Calls :meth:`sync` and attempts to shrink space used on disk by removing empty
space resulting from deletions.

.. versionadded:: next

.. method:: Shelf.close()

Expand Down Expand Up @@ -116,6 +123,11 @@ Restrictions
* On macOS :mod:`dbm.ndbm` can silently corrupt the database file on updates,
which can cause hard crashes when trying to read from the database.

* :meth:`Shelf.reorganize` may not be available for all database packages and
may temporarily increase resource usage (especially disk space) when called.
Additionally, it will never run automatically and instead needs to be called
explicitly.


.. class:: Shelf(dict, protocol=None, writeback=False, keyencoding='utf-8')

Expand Down
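
Review note: a short usage sketch for ``Shelf.reorganize()`` as documented above. It assumes nothing about which :mod:`dbm` backend ``shelve`` picks, and it falls back to ``sync()`` on interpreters where ``Shelf`` does not have the new method yet. The key names and values are arbitrary.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shelf")
    with shelve.open(path, flag="n") as shelf:
        for i in range(100):
            shelf[f"item{i}"] = "x" * 100
        for i in range(90):
            del shelf[f"item{i}"]

        # reorganize() never runs automatically, so call it explicitly
        # after bulk deletions. It calls sync() itself.
        if hasattr(shelf, "reorganize"):
            shelf.reorganize()
        else:
            shelf.sync()

        remaining = sorted(shelf)
```

Whether any space is actually reclaimed depends on the underlying backend, as the restriction above points out.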
17 changes: 17 additions & 0 deletions Doc/whatsnew/3.15.rst
Expand Up @@ -89,13 +89,30 @@ New modules
Improved modules
================

dbm
---

* Added new :meth:`!reorganize` methods to :mod:`dbm.dumb` and :mod:`dbm.sqlite3`
which allow recovering unused free space previously occupied by deleted entries.
(Contributed by Andrea Oliveri in :gh:`134004`.)


difflib
-------

* Improved the styling of HTML diff pages generated by the :class:`difflib.HtmlDiff`
class, and migrated the output to the HTML5 standard.
(Contributed by Jiahao Li in :gh:`134580`.)


shelve
------

* Added a new :meth:`!reorganize` method to :mod:`shelve` that can be used to recover
unused free space previously occupied by deleted entries.
(Contributed by Andrea Oliveri in :gh:`134004`.)


ssl
---

Expand Down
32 changes: 29 additions & 3 deletions Lib/dbm/dumb.py
Expand Up @@ -9,16 +9,14 @@
- seems to contain a bug when updating...

- reclaim free space (currently, space once occupied by deleted or expanded
items is never reused)
items is not reused unless .reorganize() is called)

- support concurrent access (currently, if two processes take turns making
updates, they can mess up the index)

- support efficient access to large databases (currently, the whole index
is read when the database is opened, and some updates rewrite the whole index)

- support opening for read-only (flag = 'm')

"""

import ast as _ast
Expand Down Expand Up @@ -289,6 +287,34 @@ def __enter__(self):
    def __exit__(self, *args):
        self.close()

    def reorganize(self):
        if self._readonly:
            raise error('The database is opened for reading only')
        self._verify_open()
        # Ensure all changes are committed before reorganizing.
        self._commit()
        # Open file in r+ to allow changing in-place.
        with _io.open(self._datfile, 'rb+') as f:
            reorganize_pos = 0

            # Iterate over existing keys, sorted by starting byte.
            for key in sorted(self._index, key=lambda k: self._index[k][0]):
                pos, siz = self._index[key]
                f.seek(pos)
                val = f.read(siz)

                f.seek(reorganize_pos)
                f.write(val)
                self._index[key] = (reorganize_pos, siz)

                blocks_occupied = (siz + _BLOCKSIZE - 1) // _BLOCKSIZE
                reorganize_pos += blocks_occupied * _BLOCKSIZE

            f.truncate(reorganize_pos)
        # Commit changes to index, which were not in-place.
        self._commit()



def open(file, flag='c', mode=0o666):
    """Open the database file, filename, and return corresponding object.
Expand Down
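
Review note: the in-place compaction in ``reorganize()`` above can be tried in isolation. The sketch below mirrors the same steps on an in-memory buffer. It assumes that every record starts on a block boundary and that records do not overlap; ``BLOCK`` and ``compact`` are illustrative names and not part of ``dbm.dumb``.

```python
import io

BLOCK = 512  # illustrative block size; dbm.dumb uses its own _BLOCKSIZE


def compact(buf, index):
    """Slide live records toward the start of buf and truncate the tail.

    buf is a seekable binary file object; index maps key -> (pos, size).
    Returns the new end-of-data offset.
    """
    write_pos = 0
    # Visiting records in ascending offset order keeps write_pos <= pos,
    # so a record is never written over one that has not been moved yet.
    for key in sorted(index, key=lambda k: index[k][0]):
        pos, size = index[key]
        buf.seek(pos)
        data = buf.read(size)
        buf.seek(write_pos)
        buf.write(data)
        index[key] = (write_pos, size)
        # Advance to the next block boundary (ceiling division).
        write_pos += -(-size // BLOCK) * BLOCK
    buf.truncate(write_pos)
    return write_pos


# Three 700-byte records, each occupying two blocks.
buf = io.BytesIO()
index = {}
for i, key in enumerate([b"a", b"b", b"c"]):
    pos = i * 2 * BLOCK
    buf.seek(pos)
    buf.write(key * 700)
    index[key] = (pos, 700)

del index[b"a"]  # "delete" the first record, leaving a hole
size_before = len(buf.getvalue())
compact(buf, index)
size_after = len(buf.getvalue())
```

Each record is read fully into memory before it is written back, so a record may safely move onto space it partly occupied itself.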
4 changes: 4 additions & 0 deletions Lib/dbm/sqlite3.py
Expand Up @@ -15,6 +15,7 @@
STORE_KV = "REPLACE INTO Dict (key, value) VALUES (CAST(? AS BLOB), CAST(? AS BLOB))"
DELETE_KEY = "DELETE FROM Dict WHERE key = CAST(? AS BLOB)"
ITER_KEYS = "SELECT key FROM Dict"
REORGANIZE = "VACUUM"


class error(OSError):
Expand Down Expand Up @@ -122,6 +123,9 @@ def __enter__(self):
    def __exit__(self, *args):
        self.close()

    def reorganize(self):
        self._execute(REORGANIZE)


def open(filename, /, flag="r", mode=0o666):
    """Open a dbm.sqlite3 database and return the dbm object.
Expand Down
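
Review note: ``reorganize()`` here is a thin wrapper around SQLite's ``VACUUM`` statement. The sketch below uses the standard ``sqlite3`` module directly to show the effect on file size. The table layout, row count and payload size are illustrative assumptions, and ``vacuum_demo`` is a hypothetical helper that is not part of this PR.

```python
import os
import sqlite3
import tempfile


def vacuum_demo(path, rows=200, payload=10_000):
    """Return (size_full, size_after_delete, size_after_vacuum) in bytes."""
    # isolation_level=None selects autocommit mode, because VACUUM
    # cannot run inside an open transaction.
    con = sqlite3.connect(path, isolation_level=None)
    try:
        con.execute("CREATE TABLE Dict (key BLOB UNIQUE NOT NULL, value BLOB NOT NULL)")
        con.execute("BEGIN")
        con.executemany(
            "INSERT INTO Dict (key, value) VALUES (?, ?)",
            ((str(i).encode("ascii"), b"x" * payload) for i in range(rows)),
        )
        con.execute("COMMIT")
        size_full = os.path.getsize(path)

        # Deleting rows marks pages as free but normally keeps the file size.
        con.execute("DELETE FROM Dict")
        size_after_delete = os.path.getsize(path)

        # VACUUM rebuilds the database file without the free pages.
        con.execute("VACUUM")
        size_after_vacuum = os.path.getsize(path)
    finally:
        con.close()
    return size_full, size_after_delete, size_after_vacuum


with tempfile.TemporaryDirectory() as tmp:
    sizes = vacuum_demo(os.path.join(tmp, "demo.sqlite"))
```

The rebuild writes a fresh copy of the database before replacing the old one, which is the reason for the free disk space note on ``sqlite3.reorganize()`` in the documentation.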
5 changes: 5 additions & 0 deletions Lib/shelve.py
Expand Up @@ -171,6 +171,11 @@ def sync(self):
        if hasattr(self.dict, 'sync'):
            self.dict.sync()

    def reorganize(self):
        self.sync()
        if hasattr(self.dict, 'reorganize'):
            self.dict.reorganize()


class BsdDbShelf(Shelf):
    """Shelf implementation using the "BSD" db interface.
Expand Down
61 changes: 61 additions & 0 deletions Lib/test/test_dbm.py
Expand Up @@ -135,6 +135,67 @@ def test_anydbm_access(self):
        assert(f[key] == b"Python:")
        f.close()

    def test_anydbm_readonly_reorganize(self):
        self.init_db()
        with dbm.open(_fname, 'r') as d:
            # Early stopping.
            if not hasattr(d, 'reorganize'):
                self.skipTest("method reorganize not available in this dbm submodule")

            self.assertRaises(dbm.error, d.reorganize)

    def test_anydbm_reorganize_not_changed_content(self):
        self.init_db()
        with dbm.open(_fname, 'c') as d:
            # Early stopping.
            if not hasattr(d, 'reorganize'):
                self.skipTest("method reorganize not available in this dbm submodule")

            keys_before = sorted(d.keys())
            values_before = [d[k] for k in keys_before]
            d.reorganize()
            keys_after = sorted(d.keys())
            values_after = [d[k] for k in keys_after]
            self.assertEqual(keys_before, keys_after)
            self.assertEqual(values_before, values_after)

    def test_anydbm_reorganize_decreased_size(self):

        def _calculate_db_size(db_path):
            if os.path.isfile(db_path):
                return os.path.getsize(db_path)
            total_size = 0
            for root, _, filenames in os.walk(db_path):
                for filename in filenames:
                    file_path = os.path.join(root, filename)
                    total_size += os.path.getsize(file_path)
            return total_size

        # This test requires relatively large databases to reliably show difference in size before and after reorganizing.
        with dbm.open(_fname, 'n') as f:
            # Early stopping.
            if not hasattr(f, 'reorganize'):
                self.skipTest("method reorganize not available in this dbm submodule")

            for k in self._dict:
                f[k.encode('ascii')] = self._dict[k] * 100000
            db_keys = list(f.keys())

        # Make sure to calculate size of database only after file is closed to ensure file contents are flushed to disk.
        size_before = _calculate_db_size(os.path.dirname(_fname))

        # Delete some elements from the start of the database.
        keys_to_delete = db_keys[:len(db_keys) // 2]
        with dbm.open(_fname, 'c') as f:
            for k in keys_to_delete:
                del f[k]
            f.reorganize()

        # Make sure to calculate size of database only after file is closed to ensure file contents are flushed to disk.
        size_after = _calculate_db_size(os.path.dirname(_fname))

        self.assertLess(size_after, size_before)

    def test_open_with_bytes(self):
        dbm.open(os.fsencode(_fname), "c").close()

Expand Down
1 change: 1 addition & 0 deletions Misc/ACKS
Expand Up @@ -1365,6 +1365,7 @@ Milan Oberkirch
Pascal Oberndoerfer
Géry Ogam
Seonkyo Ok
Andrea Oliveri
Jeffrey Ollie
Adam Olsen
Bryan Olson
Expand Down
@@ -0,0 +1,2 @@
:mod:`shelve` as well as underlying :mod:`!dbm.dumb` and :mod:`!dbm.sqlite3` now have :meth:`!reorganize` methods to
recover unused free space previously occupied by deleted entries.