Thanks to visit codestin.com
Credit goes to github.com

Skip to content

bpo-27645: Supporting native backup facility of SQLite #377

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 21 commits into from

Conversation

lelit
Copy link
Contributor

@lelit lelit commented Mar 1, 2017

This is a set of patches (rebased on current master) that adds a new backup() method on sqlite3.Connection.

https://bugs.python.org/issue27645

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA. This is necessary for legal reasons before we can look at your contribution. Please follow these steps to help rectify the issue:

  1. If you don't have an account on b.p.o, please create one
  2. Make sure your GitHub username is listed in "Your Details" at b.p.o
  3. If you have not already done so, please sign the PSF contributor agreement. The "bugs.python.org username " requested by the form is the "Login name" field under "Your Details".
  4. If you just signed the CLA, please wait at least one US business day and then check "Your Details" on bugs.python.org to see if your account has been marked as having signed the CLA (the delay is due to a person having to manually check your signed CLA)
  5. Reply here saying you have completed the above steps

Thanks again to your contribution and we look forward to looking at it!

@mention-bot
Copy link

@lelit, thanks for your PR! By analyzing the history of the files in this pull request, we identified @berkerpeksag, @zware, @gvanrossum, @avassalotti and @orsenthil to be potential reviewers.

@lelit
Copy link
Contributor Author

lelit commented Mar 1, 2017

I configured my GitHub username on b.p.o.

@@ -521,6 +521,34 @@ Connection Objects
f.write('%s\n' % line)


.. method:: backup(filename[, pages, progress])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to make pages and progress keyword-only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@@ -521,6 +521,34 @@ Connection Objects
f.write('%s\n' % line)


.. method:: backup(filename[, pages, progress])

This method exposes the `API`__ that allows to make a backup of a SQLite
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to add a link to SQLite docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.


con = sqlite3.connect('existing_db.db')
con.backup('copy_of_existing_db.db', 1, progress)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs:

.. versionadded:: 3.7

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

database into the mandatory argument *filename*, even while it's being accessed
by other clients, or concurrently by the same connection.

By default, or when *pages* is either ``0`` or a negative integer, the entire
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we really need to expose this. The default value should cover most of the use cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that I should remove the pages argument? But if I do that, also the progress one becomes pointless, because with the default value there will be at most one callback, when the engine completes the backup. What am I missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to keep pages for huge databases.

Misc/NEWS Outdated
@@ -34,6 +36,9 @@ Core and Builtins

- bpo-29546: Improve from-import error message with location

- Issue #7063: Remove dead code from the "array" module's slice handling.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be deleted.

}

Py_BEGIN_ALLOW_THREADS
rc = sqlite3_open(filename, &bckconn);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#359 could be useful here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you are suggesting: should I take the same approach of conditionally using sqlite3_open_v2() instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to say that this should be merged after #359.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This don't actually have any effect. sqlite3_open(filename, db) is the same as sqlite3_open_v2(filename, db, SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE, NULL)

Misc/NEWS Outdated
@@ -10,6 +10,8 @@ What's New in Python 3.7.0 alpha 1?
Core and Builtins
-----------------

- bpo-27645: sqlite3.Connection now exposes a backup() method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add "Patch by Your Name." to the next line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@@ -18,7 +18,7 @@ def load_tests(*args):
userfunctions.suite(),
factory.suite(), transactions.suite(),
hooks.suite(), regression.suite(),
dump.suite()])
dump.suite(), backup.suite()])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Add this to its own line so we won't clutter git blame next time we add something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

import sqlite3

def progress(remaining, total):
print("Copied %d of %d pages..." % (total-remaining, total))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use f-strings here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@lelit
Copy link
Contributor Author

lelit commented Mar 1, 2017

Thank you @berkerpeksag!

/* Sleep for 250ms if there are still further pages to copy */
if (rc == SQLITE_OK || rc == SQLITE_BUSY || rc == SQLITE_LOCKED) {
Py_BEGIN_ALLOW_THREADS
sqlite3_sleep(250);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we need to sleep when SQLITE_OK is returned? I know this is what is done in the example but we do allow other threads to take the lock between calls to sqlite3_backup_step.
In addition we might want to have some kind of timeout here. We can block forever if someone holds the db lock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we should: the step function returns OK when it “successfully copies N pages and there are still more pages to be copied”.
Not sure about the timeout: maybe we could check whether there is no progress in, say, X calls to the step function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconsidering, I think you're right here: there's no reason to delay next iteration when the result was SQLITE_OK.

rc = sqlite3_backup_step(bckhandle, pages);
Py_END_ALLOW_THREADS

if (progress != Py_None) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call progress only when sqlite3_backup_step return SQLITE_OK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about that: maybe a better option would be pass the result status to the progress callback, so that it can eventually signal the error and do some cleanup (think about closing a progress dialog).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the progress callback receives three arguments.

Py_INCREF(Py_None);
retval = Py_None;
} else {
/* TODO: should the (probably incomplete/invalid) backup be removed here? */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should delete the backup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will do that.

Thanks a lot for the review!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need some help here: I expected to find an abstract/portable PyOS_UnlinkFile(), but couldn't find it.
What is the recommended idiom to remove a file from C? Is the only way to "import" the os module and then PyObject_CallFunction() on its unlink member?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the simplest way for now, using unlink(2) from unistd.h. Any hint for a better way is welcome.

sqlite3_backup_pagecount(bckhandle))) {
/* User's callback raised an error: interrupt the loop and
propagate it. */
rc = -1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a test for the case that progress raise an error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will do!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

if (rc != -1) {
rc = _pysqlite_seterror(bckconn, NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this but from the documentation sqlite3_backup_step will not set an error code on error but will return one. I can see that in the example they do the same as here but I am not sure if this is the right behavior. Would we receive SQLITE_NOMEM error from sqlite3_backup_step?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will try to understand the point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to address this concern in my latest patches.

}

Py_BEGIN_ALLOW_THREADS
bckhandle = sqlite3_backup_init(bckconn, "main", self->db, "main");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't database name be a parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not really the database name: accordingly to the documentation that name denotes the target, that can either be "main" for the main storage, "temp" for the temporary storage or the name of the attached database.
Of course we can easily expose that as a (default) argument, but I did not see as necessary when I coded the feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a source_name and a dest_name parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, further investigation seems to reveal that the dest_name is pointless: attached databases are not a persistent property of a database, so that we cannot allow the following:

bck = sqlite.connect('/tmp/thebackup')
bck.execute(f"ATTACH DATABASE '/tmp/theattachment' AS attached_db")
bck.commit()

db = sqlite.connect('/tmp/database')
db.backup('/tmp/thebackup', dest_name='attached_db')

Changed back to a single name argument to specify the source database...

lelit added 4 commits March 3, 2017 10:09
When sqlite3_backup_step() returns OK it means that it was able to do it's work but there are
further pages to be copied: in such case it's better to immediately start the next iteration,
without opening a window that allows other threads to go on and possibly locking the database.

Suggested by Aviv Palivoda.
@@ -1478,6 +1478,94 @@ pysqlite_connection_iterdump(pysqlite_Connection* self, PyObject* args)
}

static PyObject *
pysqlite_connection_backup(pysqlite_Connection* self, PyObject* args, PyObject* kwds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read SQLite release notes correctly, sqlite3_backup_init has been added in 3.6.11 so we need to expose this only if SQLite version is 3.6.11 or newer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, is it better to omit the whole function or rather raise an NotImplementedError?

Copy link
Member

@berkerpeksag berkerpeksag Mar 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's omit the whole function. This is what we do for other version specific features in the stdlib. Please also update the documentation to note that this feature is available from SQLite 3.6.11.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, done. Thank you!

from tempfile import NamedTemporaryFile
import unittest

class BackupTests(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a @unittest.skipIf(...) decorator to skip the whole test case if SQLite version is not 3.6.11 or newer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests to make sure that keyword-only parameters are really keyword-only:

self.cx.backup(bckfn.name, 1, lambda x, y: None)  # should raise TypeError

database is copied in a single step; otherwise the method performs a loop
copying up to the specified *pages* at a time.

If *progress* is specified, it must either ``None`` or a callable object that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either be ``None``

…llback

Add new SQLITE_DONE integer constant to the module, that could be used by the callback to
determine whether the backup is terminated.
lelit added 2 commits March 4, 2017 10:45
This should address Aviv's concerns about how the sqlite3_backup_step() errors are handled.
@lelit
Copy link
Contributor Author

lelit commented Mar 24, 2017

Is there anything else I could/should do to see this accepted?


This method makes a backup of a SQLite database into the mandatory argument
*filename*, even while it's being accessed by other clients, or concurrently by
the same connection.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention that backups work for on-disk and in-memory databases like the sqlite3 C api docs mention?

@lelit
Copy link
Contributor Author

lelit commented Oct 5, 2017

FYI, I rebased these commits on top of current master: https://github.com/lelit/cpython/tree/sqlite-backup-api-v3

@lelit
Copy link
Contributor Author

lelit commented Oct 24, 2017

@Mariatta I'm confused by that "added needs rebase and removed needs rebase, what should I do?
Also, given that current implementation (rebased on master of three weeks ago) lives in a different branch (https://github.com/lelit/cpython/compare/sqlite-backup-api-v3) should I do something wrt this PR?

@Mariatta
Copy link
Member

@lelit We can only review PRs in CPython repo. Currently this PR has conflict. If the conflict is resolved in a different branch, maybe you can create the PR for the other branch?
Although I would prefer that the conflict is resolved in the same branch so we get to keep the past discussions in the same PR.

@lelit
Copy link
Contributor Author

lelit commented Oct 30, 2017

Does that mean you'd prefer me to rebase the branch I opened the PR with?

@brettcannon brettcannon closed this Nov 2, 2017
@brettcannon brettcannon reopened this Nov 2, 2017
@brettcannon
Copy link
Member

@lelit There's a conflict because you edited Misc/NEWS and it will no longer merge cleanly as-is. See https://devguide.python.org/committing/?highlight=news.d#what-s-new-and-news-entries on how to fix this.

@brettcannon brettcannon changed the title bpo-27645: Supporting native backup facility of SQLite bpo-27645: Supporting native backup facility of SQLite. Nov 2, 2017
@brettcannon brettcannon changed the title bpo-27645: Supporting native backup facility of SQLite. bpo-27645: Supporting native backup facility of SQLite Nov 2, 2017
@lelit
Copy link
Contributor Author

lelit commented Nov 2, 2017

@brettcannon, I rebased the branch into https://github.com/lelit/cpython/compare/sqlite-backup-api-v3
rewriting the NEWS change using the new tool.

As said in this comment I'm sorry about the fuss, but it's not clear to me whether I should have closed this PR and reopened a new one on the v3 branch, or if instead I should have rewritten the same v2 branch.

As always, I'm willing to do whatever is easier/more correct for you!

@brettcannon
Copy link
Member

@lelit I'm not sure what 'v2' and 'v3' are referring to (some personal branches in your fork you have?), but if you look at https://github.com/python/cpython/pull/377/files you will see that the branch your PR is based on has not been updated in regards to the NEWS file.

If it's easier, you can just close this PR, open a new one if you have some other branch that's in better shape, and then refer to this old PR in the new PR's opening comment.

@lelit
Copy link
Contributor Author

lelit commented Nov 2, 2017

Ok, doing that immediately.

@lelit
Copy link
Contributor Author

lelit commented Nov 2, 2017

As suggested by Brett, I'm closing this PR, replaced by #4238.

@lelit lelit closed this Nov 2, 2017
@python python deleted a comment from richmoore1962 Nov 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants