Move most of initialization into health check #21

tgross · 2016-04-21T22:57:41Z

Having the database setup in the preStart handler has led to a number of hard-to-debug races, because we need to start up a temporary DB and then shut it down to run the "real" DB. This PR moves most of the initialization logic into the health check and drops a lock file on disk to make sure we only do it once per instance.

While I was working on this I added some tooling to help future development:

- Added some initial unit tests that you can run inside the container via `./setup.sh test`. I'd like to flesh these out further in the future, but this is a good starting framework. (Props to @dissipate for encouraging me to finally start on this.)
- Added a decorator for trace logging so that it's easier to trace execution, given all the side effects this code has. I've switched the default log level to INFO as well.
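A trace-logging decorator of the kind described above can be sketched like this. It's a minimal illustration, not the PR's code: the logger name `manage`, the decorator name `trace`, and the toy function `double` are all hypothetical. Because entry and exit are logged at DEBUG, the traces stay hidden at the new INFO default and appear only when the level is lowered.

```python
import functools
import logging

log = logging.getLogger("manage")  # hypothetical logger name

def trace(fn):
    """Log entry and exit of fn at DEBUG level; hidden when the level is INFO."""
    @functools.wraps(fn)  # keep fn's __name__ and docstring on the wrapper
    def wrapper(*args, **kwargs):
        log.debug("%s: start", fn.__name__)
        result = fn(*args, **kwargs)
        log.debug("%s: end -> %r", fn.__name__, result)
        return result
    return wrapper

@trace
def double(x):  # hypothetical toy function standing in for a side-effecting step
    return x * 2
```

To see the traces while developing, call `logging.basicConfig(level=logging.DEBUG)` before running the decorated code.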

Also:

- Fixed the bug in the mutex I added for #17 (No mutual exclusion enforced for MySQL backups).
- Fixed an annoyance where the comments `./setup.sh` adds to the `_env` file would get parsed as values if you didn't remove them.
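The actual fix lives in `./setup.sh`, a shell script whose change isn't shown here. As a Python illustration of the rule it enforces (comment text must never end up as part of a value), a minimal `_env` reader might look like this. `parse_env_file` is a hypothetical helper, and it assumes values never legitimately contain `#`.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines, ignoring blank lines and full-line or trailing comments.

    Assumes '#' never appears inside a real value.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop everything after the first '#'
        if not line or "=" not in line:
            continue  # blank line, pure comment, or malformed entry
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env
```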

cc @misterbisson and @moretea @tianon as an FYI

Because we set the "binlog isn't stale" flag at the end of a backup, a newly started cluster will spawn many `create_snapshot` processes even though we're locking the BACKUP_TTL in Consul. This adds a local file lock that prevents the node from running multiple snapshots at once. Also removed an unnecessary reliance on the repl user having access to the end user's data DB just to run health checks.
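A node-local lock like this is commonly built on `fcntl.flock` with `LOCK_NB`, so a second attempt fails immediately instead of waiting. The sketch below assumes that approach; the PR doesn't say which locking primitive or path it uses, so `LOCK_PATH`, `try_lock`, and `run_snapshot_once` (standing in for `create_snapshot`) are hypothetical.

```python
import fcntl
import os
import tempfile

# Hypothetical lock location; the PR doesn't say where its lock file lives.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "mysql-backup.lock")

def try_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive flock, or None if it's held elsewhere."""
    f = open(path, "a")  # "a" creates the file if missing and never truncates it
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:  # BlockingIOError when another open file holds the lock
        f.close()
        return None
    return f

def run_snapshot_once(do_backup, path=LOCK_PATH):
    """Run do_backup() only if no other process on this node is already backing up."""
    lock = try_lock(path)
    if lock is None:
        return False  # a snapshot is already in progress on this node
    try:
        do_backup()
        return True
    finally:
        fcntl.flock(lock.fileno(), fcntl.LOCK_UN)
        lock.close()
```

One reason to prefer `flock` over a plain "file exists" marker: the kernel drops the lock when the holding process exits, so a crashed backup can't leave the node locked forever.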

If we don't make sure we create the file here, we'll throw an IOError and catch it in the except block. This is OK in `is_backup_running`, because if the file doesn't even exist we're obviously not running a backup either. But we need to make sure we create the file in `create_snapshot`, or we end up bailing out too early.
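The asymmetry described above can be sketched as follows, assuming an `flock`-based lock file. The body of `is_backup_running` here is a hypothetical illustration, not the project's implementation, and `LOCK_PATH` is an assumed location. Opening read-only raises when the file is missing, which correctly means "not running"; the side that takes the lock must open with a mode that creates the file (such as `"a"`), otherwise it hits the same error and bails out.

```python
import fcntl
import os
import tempfile

LOCK_PATH = os.path.join(tempfile.gettempdir(), "mysql-backup.lock")  # hypothetical

def is_backup_running(path=LOCK_PATH):
    """True if some process holds the backup lock; a missing file means no backup."""
    try:
        f = open(path, "r")  # read-only open raises if the file doesn't exist
    except OSError:  # IOError is an alias of OSError in Python 3
        return False
    try:
        # Probe by briefly taking the lock ourselves without blocking.
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        return True  # another open file already holds the lock
    else:
        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
        return False
    finally:
        f.close()
```

Note the probe itself holds the lock for an instant, so a backup that tries to start at exactly that moment with a non-blocking attempt could be turned away; whether that matters depends on how often the check runs.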

MySQL must be running in order to execute most of our setup behavior, so we're just going to make sure the directory structures are in place and then let the first health check handler take it from there. The happy path on the health check is to check a lock file against the node state and immediately return if we discover the lock exists. Otherwise, we bootstrap the instance.
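A minimal sketch of that health-check flow, assuming the lock file simply records the state the node was initialized as (the PR doesn't describe its contents). `health`, `bootstrap`, `check_mysql`, and `INIT_LOCK` are hypothetical names; `bootstrap` and `check_mysql` are passed in so the control flow can be shown without a running MySQL.

```python
import os
import tempfile

INIT_LOCK = os.path.join(tempfile.gettempdir(), "mysql-init.lock")  # hypothetical path

def health(node_state, bootstrap, check_mysql, lock_path=INIT_LOCK):
    """Bootstrap once per instance, then only run the cheap health check."""
    try:
        with open(lock_path) as f:
            initialized_as = f.read().strip()
    except OSError:
        initialized_as = None  # no lock file yet: this instance was never bootstrapped
    if initialized_as != node_state:
        bootstrap(node_state)  # slow path: users, replication setup, etc.
        # Write the lock only after bootstrap succeeds, so a failure is retried next check.
        with open(lock_path, "w") as f:
            f.write(node_state)
    return check_mysql()  # happy path goes straight here
```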

- Corrected hostname vs. service name confusion in `get_state()`
- Added some extra debug logging
- Fixed two dumb bugs in the locking for backups

Restoring from a snapshot has to happen before we start mysqld, so on a clean start we need to check in with Consul during startup to see if there's a snapshot available. Added debug trace logging and environment-variable cleanup functions to assist with development and otherwise make life sane.
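The pre-start decision can be sketched as below. It assumes "clean start" means an empty data directory, and that the Consul lookup and the restore step are supplied as callables. The PR doesn't show which Consul key or storage backend it uses, so `pre_start`, `find_snapshot`, and `restore` are hypothetical names.

```python
import os

def pre_start(data_dir, find_snapshot, restore):
    """Runs before mysqld starts; on a clean start, restore a snapshot if one exists."""
    os.makedirs(data_dir, exist_ok=True)  # make sure the directory structure is in place
    if os.listdir(data_dir):
        return False  # existing data: not a clean start, leave it alone
    snapshot = find_snapshot()  # e.g. a lookup against Consul; None if nothing is recorded
    if snapshot is None:
        return False  # no snapshot yet: mysqld will initialize an empty data dir
    restore(snapshot, data_dir)  # must finish before mysqld starts
    return True
```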

misterbisson · 2016-04-22T14:16:22Z

🏡 🚶

tgross added 7 commits April 21, 2016 09:25

Started setting up test code for our management code

adf0ddb

Fix merge conflict

1615203

Rename triton-mysql.py to platform-agnostic name

649ab8d

Use missing password to assert uninitialized tables

5a88bc5

- Corrected hostname vs. service name confusion in `get_state()`
- Added some extra debug logging
- Fixed two dumb bugs in the locking for backups

tgross mentioned this pull request Apr 22, 2016

No mutual exclusion enforced for MySQL backups #17

Closed

tgross added 2 commits April 21, 2016 20:31

Fix reversed sense of is_backup_running check

7d6f800

tgross changed the title ~~[WIP] Move init into health~~ Move most of initialization into health check Apr 22, 2016

tgross merged commit 3095651 into autopilotpattern:master Apr 22, 2016

tgross mentioned this pull request May 5, 2016

Replication needs to be shut down on failover #26

Closed
