-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
bpo-27645: Supporting native backup facility of SQLite #377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Suggested by Aviv Palivoda.
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA. This is necessary for legal reasons before we can look at your contribution. Please follow these steps to help rectify the issue:
Thanks again to your contribution and we look forward to looking at it! |
@lelit, thanks for your PR! By analyzing the history of the files in this pull request, we identified @berkerpeksag, @zware, @gvanrossum, @avassalotti and @orsenthil to be potential reviewers. |
I configured my GitHub username on b.p.o. |
Doc/library/sqlite3.rst
Outdated
@@ -521,6 +521,34 @@ Connection Objects | |||
f.write('%s\n' % line) | |||
|
|||
|
|||
.. method:: backup(filename[, pages, progress]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to make pages and progress keyword-only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
Doc/library/sqlite3.rst
Outdated
@@ -521,6 +521,34 @@ Connection Objects | |||
f.write('%s\n' % line) | |||
|
|||
|
|||
.. method:: backup(filename[, pages, progress]) | |||
|
|||
This method exposes the `API`__ that allows to make a backup of a SQLite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to add a link to SQLite docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
Doc/library/sqlite3.rst
Outdated
|
||
con = sqlite3.connect('existing_db.db') | ||
con.backup('copy_of_existing_db.db', 1, progress) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs:
.. versionadded:: 3.7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
Doc/library/sqlite3.rst
Outdated
database into the mandatory argument *filename*, even while it's being accessed | ||
by other clients, or concurrently by the same connection. | ||
|
||
By default, or when *pages* is either ``0`` or a negative integer, the entire |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we really need to expose this. The default value should cover most of the use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that I should remove the pages argument? But if I do that, also the progress one becomes pointless, because with the default value there will be at most one callback, when the engine completes the backup. What am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to keep pages for huge databases.
Misc/NEWS
Outdated
@@ -34,6 +36,9 @@ Core and Builtins | |||
|
|||
- bpo-29546: Improve from-import error message with location | |||
|
|||
- Issue #7063: Remove dead code from the "array" module's slice handling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be deleted.
} | ||
|
||
Py_BEGIN_ALLOW_THREADS | ||
rc = sqlite3_open(filename, &bckconn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#359 could be useful here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you are suggesting: should I take the same approach of conditionally using sqlite3_open_v2() instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to say that this should be merged after #359.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This don't actually have any effect. sqlite3_open(filename, db)
is the same as sqlite3_open_v2(filename, db, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE, NULL)
Misc/NEWS
Outdated
@@ -10,6 +10,8 @@ What's New in Python 3.7.0 alpha 1? | |||
Core and Builtins | |||
----------------- | |||
|
|||
- bpo-27645: sqlite3.Connection now exposes a backup() method. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add "Patch by Your Name." to the next line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
Lib/test/test_sqlite.py
Outdated
@@ -18,7 +18,7 @@ def load_tests(*args): | |||
userfunctions.suite(), | |||
factory.suite(), transactions.suite(), | |||
hooks.suite(), regression.suite(), | |||
dump.suite()]) | |||
dump.suite(), backup.suite()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Add this to its own line so we won't clutter git blame
next time we add something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
Doc/library/sqlite3.rst
Outdated
import sqlite3 | ||
|
||
def progress(remaining, total): | ||
print("Copied %d of %d pages..." % (total-remaining, total)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use f-strings here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
Thank you @berkerpeksag! |
Modules/_sqlite/connection.c
Outdated
/* Sleep for 250ms if there are still further pages to copy */ | ||
if (rc == SQLITE_OK || rc == SQLITE_BUSY || rc == SQLITE_LOCKED) { | ||
Py_BEGIN_ALLOW_THREADS | ||
sqlite3_sleep(250); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we need to sleep when SQLITE_OK
is returned? I know this is what is done in the example but we do allow other threads to take the lock between calls to sqlite3_backup_step
.
In addition we might want to have some kind of timeout here. We can block forever if someone holds the db lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think we should: the step function returns OK when it “successfully copies N pages and there are still more pages to be copied”.
Not sure about the timeout: maybe we could check whether there is no progress in, say, X calls to the step function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reconsidering, I think you're right here: there's no reason to delay next iteration when the result was SQLITE_OK
.
rc = sqlite3_backup_step(bckhandle, pages); | ||
Py_END_ALLOW_THREADS | ||
|
||
if (progress != Py_None) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should call progress
only when sqlite3_backup_step
return SQLITE_OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about that: maybe a better option would be pass the result status to the progress
callback, so that it can eventually signal the error and do some cleanup (think about closing a progress dialog).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now the progress callback receives three arguments.
Modules/_sqlite/connection.c
Outdated
Py_INCREF(Py_None); | ||
retval = Py_None; | ||
} else { | ||
/* TODO: should the (probably incomplete/invalid) backup be removed here? */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should delete the backup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will do that.
Thanks a lot for the review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need some help here: I expected to find an abstract/portable PyOS_UnlinkFile()
, but couldn't find it.
What is the recommended idiom to remove a file from C? Is the only way to "import" the os
module and then PyObject_CallFunction()
on its unlink
member?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the simplest way for now, using unlink(2) from unistd.h
. Any hint for a better way is welcome.
Modules/_sqlite/connection.c
Outdated
sqlite3_backup_pagecount(bckhandle))) { | ||
/* User's callback raised an error: interrupt the loop and | ||
propagate it. */ | ||
rc = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add a test for the case that progress
raise an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Modules/_sqlite/connection.c
Outdated
} | ||
|
||
if (rc != -1) { | ||
rc = _pysqlite_seterror(bckconn, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this but from the documentation sqlite3_backup_step
will not set an error code on error but will return one. I can see that in the example they do the same as here but I am not sure if this is the right behavior. Would we receive SQLITE_NOMEM
error from sqlite3_backup_step
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will try to understand the point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to address this concern in my latest patches.
Modules/_sqlite/connection.c
Outdated
} | ||
|
||
Py_BEGIN_ALLOW_THREADS | ||
bckhandle = sqlite3_backup_init(bckconn, "main", self->db, "main"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't database name be a parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not really the database name: accordingly to the documentation that name denotes the target, that can either be "main" for the main storage, "temp" for the temporary storage or the name of the attached database.
Of course we can easily expose that as a (default) argument, but I did not see as necessary when I coded the feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a source_name
and a dest_name
parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm, further investigation seems to reveal that the dest_name
is pointless: attached databases are not a persistent property of a database, so that we cannot allow the following:
bck = sqlite.connect('/tmp/thebackup')
bck.execute(f"ATTACH DATABASE '/tmp/theattachment' AS attached_db")
bck.commit()
db = sqlite.connect('/tmp/database')
db.backup('/tmp/thebackup', dest_name='attached_db')
Changed back to a single name
argument to specify the source database...
Suggested by Aviv Palivoda.
When sqlite3_backup_step() returns OK it means that it was able to do it's work but there are further pages to be copied: in such case it's better to immediately start the next iteration, without opening a window that allows other threads to go on and possibly locking the database. Suggested by Aviv Palivoda.
Modules/_sqlite/connection.c
Outdated
@@ -1478,6 +1478,94 @@ pysqlite_connection_iterdump(pysqlite_Connection* self, PyObject* args) | |||
} | |||
|
|||
static PyObject * | |||
pysqlite_connection_backup(pysqlite_Connection* self, PyObject* args, PyObject* kwds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I read SQLite release notes correctly, sqlite3_backup_init
has been added in 3.6.11 so we need to expose this only if SQLite version is 3.6.11 or newer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, is it better to omit the whole function or rather raise an NotImplementedError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's omit the whole function. This is what we do for other version specific features in the stdlib. Please also update the documentation to note that this feature is available from SQLite 3.6.11.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, done. Thank you!
Lib/sqlite3/test/backup.py
Outdated
from tempfile import NamedTemporaryFile | ||
import unittest | ||
|
||
class BackupTests(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a @unittest.skipIf(...)
decorator to skip the whole test case if SQLite version is not 3.6.11 or newer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add tests to make sure that keyword-only parameters are really keyword-only:
self.cx.backup(bckfn.name, 1, lambda x, y: None) # should raise TypeError
Doc/library/sqlite3.rst
Outdated
database is copied in a single step; otherwise the method performs a loop | ||
copying up to the specified *pages* at a time. | ||
|
||
If *progress* is specified, it must either ``None`` or a callable object that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either be ``None``
…llback Add new SQLITE_DONE integer constant to the module, that could be used by the callback to determine whether the backup is terminated.
This should address Aviv's concerns about how the sqlite3_backup_step() errors are handled.
Is there anything else I could/should do to see this accepted? |
|
||
This method makes a backup of a SQLite database into the mandatory argument | ||
*filename*, even while it's being accessed by other clients, or concurrently by | ||
the same connection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention that backups work for on-disk and in-memory databases like the sqlite3 C api docs mention?
FYI, I rebased these commits on top of current master: https://github.com/lelit/cpython/tree/sqlite-backup-api-v3 |
@Mariatta I'm confused by that "added |
@lelit We can only review PRs in CPython repo. Currently this PR has conflict. If the conflict is resolved in a different branch, maybe you can create the PR for the other branch? |
Does that mean you'd prefer me to rebase the branch I opened the PR with? |
@lelit There's a conflict because you edited Misc/NEWS and it will no longer merge cleanly as-is. See https://devguide.python.org/committing/?highlight=news.d#what-s-new-and-news-entries on how to fix this. |
@brettcannon, I rebased the branch into https://github.com/lelit/cpython/compare/sqlite-backup-api-v3 As said in this comment I'm sorry about the fuss, but it's not clear to me whether I should have closed this PR and reopened a new one on the v3 branch, or if instead I should have rewritten the same v2 branch. As always, I'm willing to do whatever is easier/more correct for you! |
@lelit I'm not sure what 'v2' and 'v3' are referring to (some personal branches in your fork you have?), but if you look at https://github.com/python/cpython/pull/377/files you will see that the branch your PR is based on has not been updated in regards to the NEWS file. If it's easier, you can just close this PR, open a new one if you have some other branch that's in better shape, and then refer to this old PR in the new PR's opening comment. |
Ok, doing that immediately. |
As suggested by Brett, I'm closing this PR, replaced by #4238. |
This is a set of patches (rebased on current master) that adds a new
backup()
method onsqlite3.Connection
.https://bugs.python.org/issue27645