Contributor

@tgross tgross commented Apr 21, 2016

Having the setup for the database in the preStart handler has led to a number of hard-to-debug races, because we need to start up a temporary DB and then shut it down to run the "real" DB. This PR moves most of the initialization logic into the health check and drops a file lock on disk to make sure we only do it once per instance.

While I was working on this I added some tooling to help future development:

  • Added some initial unit tests that you can run inside the container via ./setup.sh test. I'd like to flesh these out further in the future but this is a good starting framework. (Props to @dissipate for encouraging me to finally start on this.)
  • Added a decorator for trace logging so that it's easier to trace execution given all the side-effects this code has. I've switched the default log level to INFO as well.
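For reviewers who want a picture of the trace-logging idea, here is a minimal sketch of such a decorator. It is not the code in this PR: the names `debug`, `log`, and `health` are hypothetical, and it only assumes the standard `logging` and `functools` modules.

```python
import functools
import logging

# Default level is INFO, so the trace lines below stay silent unless
# the level is lowered to DEBUG.
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("manage")  # hypothetical logger name


def debug(fn):
    """Log entry to and exit from the wrapped function at DEBUG level."""
    @functools.wraps(fn)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        log.debug("%s: enter", fn.__name__)
        try:
            return fn(*args, **kwargs)
        finally:
            # Runs on both normal return and exception.
            log.debug("%s: exit", fn.__name__)
    return wrapper


@debug
def health():
    """Stand-in for a handler with side effects."""
    return "ok"
```

Setting the logger to DEBUG (`log.setLevel(logging.DEBUG)`) turns the trace on without touching the decorated functions.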

Also:

cc @misterbisson and @moretea @tianon as an FYI

tgross added 7 commits April 21, 2016 09:25
Because we set the "binlog isn't stale" flag at the end of a backup, a newly
started cluster will spawn many `create_snapshot` processes even though we're
locking the BACKUP_TTL in Consul. This adds a local file lock that prevents
the node from running more than one at a time.

Also removed an unnecessary reliance on the repl user having access to the
end-user's data DB just to run health checks.

If we don't make sure we create the file here, we'll throw an IOError and catch
it in the except block. This is ok in `is_backup_running` because if the file
doesn't even exist we're obviously not running the backup either. But we need
to make sure we create the file in `create_snapshot` or we end up bailing out
too early.
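
The locking behavior described in the two commit messages above can be sketched roughly as follows. This is an illustration under assumptions, not the code from these commits: the lock path, the `do_backup` callback, and the function signatures are hypothetical, and it assumes `fcntl.flock` on a Unix host.

```python
import fcntl
import os
import tempfile

# Hypothetical lock location, chosen only for this sketch.
BACKUP_LOCK = os.path.join(tempfile.gettempdir(), "backup-running.lock")


def is_backup_running(path=BACKUP_LOCK):
    """Return True if another open file currently holds the lock."""
    try:
        with open(path, "r+") as f:
            # Non-blocking: raises an OSError subclass if the lock is held.
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.flock(f, fcntl.LOCK_UN)
            return False
    except FileNotFoundError:
        # No lock file at all means no backup can be running.
        return False
    except OSError:
        return True


def create_snapshot(path=BACKUP_LOCK, do_backup=lambda: None):
    """Run do_backup under the lock; return False if one is in progress."""
    # "a+" creates the file when it is missing, so the first run on a
    # node does not bail out early with a missing-file error.
    with open(path, "a+") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        try:
            do_backup()
            return True
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

The asymmetry is the point: the reader can treat a missing file as "not running", while the writer has to create the file before it can take the lock.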
MySQL must be running in order to execute most of our setup behavior
so we're just going to make sure the directory structures are in
place and then let the first health check handler take it from there.

The happy path on the health check is to check a lock file against the
node state and immediately return if we discover the lock exists.
Otherwise, we bootstrap the instance.
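
A rough sketch of that happy path, under stated assumptions: `run_health_check`, `bootstrap`, and `ping` are hypothetical names, and the comparison against node state is reduced here to a plain existence check on a marker file.

```python
import os


def run_health_check(lock_path, bootstrap, ping):
    """Bootstrap at most once per instance, then run the normal check."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # caller can claim the one-time setup.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        # Happy path: the lock exists, so skip setup entirely.
        return ping()
    os.close(fd)
    try:
        bootstrap()
    except Exception:
        # Drop the lock so the next health check retries the setup.
        os.remove(lock_path)
        raise
    return ping()
```

Removing the lock on failure is a design choice made for this sketch; the PR does not say how a failed bootstrap is handled.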
- Corrected hostname vs service name confusion in get_state()
- Added some extra debug logging
- Fixed two dumb bugs in the locking for backups
tgross added 2 commits April 21, 2016 20:31
Restoring from snapshot has to happen before we start mysqld, so if we have
a clean start then we need to check in with Consul during startup to see if
there's a snapshot available.

Added debug trace logging and environment variable cleanup functions to
assist with development and otherwise make life sane.
@tgross tgross changed the title from "[WIP] Move init into health" to "Move most of initialization into health check" Apr 22, 2016
@misterbisson
Contributor

🏡 🚶

@tgross tgross merged commit 3095651 into autopilotpattern:master Apr 22, 2016