
This repository was archived by the owner on Feb 24, 2020. It is now read-only.

Conversation

@yifan-gu (Contributor):

Fix #1890

@yifan-gu added this to the v0.14.0 milestone on Dec 18, 2015
@yifan-gu changed the title from "store: make db reentrant." to "store: Make db reentrant." on Dec 18, 2015
Review comment on store/db.go (outdated) by a Member:

What does "XXX" mean in the comment?

@yifan-gu (Contributor, Author):

https://en.wikipedia.org/wiki/Comment_%28computer_programming%29
"XXX - warn other programmers of problematic or misguiding code"

@alban modified the milestones: v0.15.0, v0.14.0 on Dec 18, 2015
Review comment on store/db.go (outdated) by a Contributor:

No, it's not just for making the race detector happy; it's to prevent legitimate races!

@yifan-gu (Contributor, Author):

@jonboulle Why? The first thing Open() does is acquire a flock, and the last thing Close() does is release the flock. All the code between those two points is inside the critical section.
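
For orientation, here is a minimal sketch of the pattern yifan-gu describes, written against raw syscall.Flock rather than rkt's pkg/lock; the DB type, its fields, and the lock-file handling are illustrative assumptions, not the actual store/db.go code:

```go
package store

import (
	"os"
	"syscall"
)

// DB stands in for the store's db type; the fields are illustrative.
type DB struct {
	lockFile *os.File
}

// Open acquires an exclusive flock as its very first step.
func (db *DB) Open(path string) error {
	f, err := os.OpenFile(path, os.O_RDONLY|os.O_CREATE, 0644)
	if err != nil {
		return err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		f.Close()
		return err
	}
	db.lockFile = f
	return nil
}

// Close releases the flock as its very last step; everything between
// Open and Close therefore runs inside the critical section.
func (db *DB) Close() error {
	err := syscall.Flock(int(db.lockFile.Fd()), syscall.LOCK_UN)
	db.lockFile.Close() // closing the fd drops the lock in any case
	return err
}
```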

@jonboulle (Contributor):

I'm concerned this is insufficient to make the store.DB thread-safe (which I assume is what you really want). For example, walking through two concurrent Do calls:

A: db.Open()  // initializes db.lock and db.sqldb
B: db.Open()  // also initializes db.lock and db.sqldb. Sort of a no-op from the locking perspective, but now we leak two file handles, and...
A: db.DoTx(f) // ... A is now technically using B's sqldb/lock handles
A: db.Close() // deferred from earlier. Closes B's sqldb, then sets db.sqldb to nil
B: db.DoTx()  // panics trying to do `db.sqldb.Close`

Correct me if I'm totally misreading this, but otherwise there are several different race problems in the above.

@yifan-gu (Contributor, Author):

@jonboulle #1892 (comment)
I don't understand how B's Open() can get past the lock.ExclusiveLock() line before A releases the lock.

See this example:
https://gist.github.com/yifan-gu/50bfb8bffd839c219e7b
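
A hedged reconstruction of the point that gist demonstrates (not the gist itself): because each lock is taken on a freshly opened fd, the two flocks conflict even inside a single process, i.e. they are not reentrant:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	// Two independent opens of the same path give two separate
	// open file descriptions, so their flocks conflict even
	// within one process.
	path := "/tmp/flock-demo"
	f1, _ := os.OpenFile(path, os.O_RDONLY|os.O_CREATE, 0644)
	f2, _ := os.OpenFile(path, os.O_RDONLY|os.O_CREATE, 0644)

	syscall.Flock(int(f1.Fd()), syscall.LOCK_EX)

	// LOCK_NB makes the conflict visible instead of blocking.
	err := syscall.Flock(int(f2.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
	fmt.Println(err) // EWOULDBLOCK: the flock is NOT reentrant across fds

	syscall.Flock(int(f1.Fd()), syscall.LOCK_UN)
}
```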

@jonboulle (Contributor):

Happily, you're correct: I totally missed that we're creating new file descriptors for each lock, so the flocks AREN'T reentrant.

Hm, I still wonder if there is some cleaner way we can beat the race detector, though.

@jonboulle (Contributor):

Can we wrap the NewExclusiveLock with a store-specific mutex?

@yifan-gu (Contributor, Author):

> Can we wrap the NewExclusiveLock with a store-specific mutex?

Will do.

I figured maybe it's the title that misled you, sorry :)

@yifan-gu force-pushed the reentrant_db branch 2 times, most recently from 48db311 to a7bb9a7, on December 18, 2015 21:36
@yifan-gu changed the title from "store: Make db reentrant." to "store: Make db thread-safe." on Dec 18, 2015
@yifan-gu (Contributor, Author):

Actually I just put the mutex locking before the flock.
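
A sketch of that ordering, continuing the hypothetical DB from the earlier sketch: the in-process mutex is taken before the flock, and (as the review comment below points out for the real code) it has to be released on every error path:

```go
package store

import (
	"os"
	"sync"
	"syscall"
)

// DB now embeds a mutex (hypothetical layout, extending the earlier sketch).
type DB struct {
	sync.Mutex
	lockFile *os.File
}

func (db *DB) Open(path string) error {
	db.Lock() // in-process mutex first, for the race detector's sake...
	f, err := os.OpenFile(path, os.O_RDONLY|os.O_CREATE, 0644)
	if err != nil {
		db.Unlock() // ...and it must be released on every error path
		return err
	}
	// ...then the cross-process flock.
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		f.Close()
		db.Unlock()
		return err
	}
	db.lockFile = f
	return nil
}
```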

Review comment on store/db.go (outdated) by a Contributor:

Need to db.Unlock() here. I feel a bit uneasy about all this :/

@yifan-gu force-pushed the reentrant_db branch 6 times, most recently from 6bcaf6b to 762fd92, on December 19, 2015 00:59
@yifan-gu (Contributor, Author):

Updated

@jonboulle (Contributor):

LGTM, but another pair of eyes would be good too.

@yifan-gu (Contributor, Author):

@iaguis @alban @steveej? We need this so that the api-service doesn't panic in rktnetes.

@jonboulle (Contributor):

or @vcaputo if he's around

@krnowak (Collaborator):

How about we move the unlocking of the mutex above the unlocking of the file lock? That way we have LIFO unlocking, and we always unlock the locked mutex, even if the file unlock fails.

@yifan-gu (Contributor, Author):

@krnowak Why would we want to unlock the mutex when we've failed to unlock the flock?
#1892 (comment)

Per man flock(2), unlocking a flock can fail with EBADF, EINTR, or EINVAL. If we use the dbLock correctly here, I can't imagine any failures short of memory bit flips, so when a failure does happen we should be able to retry. But if we move the mutex unlock above the flock unlock, we can't retry.
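
A sketch of the Close ordering yifan-gu is defending here, continuing the hypothetical DB above (same imports): the flock is released first, and the mutex only once that has succeeded, so a failed flock release leaves the mutex held and Close can simply be retried:

```go
// Close, continuing the earlier sketch: flock released first, mutex second.
func (db *DB) Close() error {
	if err := syscall.Flock(int(db.lockFile.Fd()), syscall.LOCK_UN); err != nil {
		// Deliberately keep the mutex held: the caller can retry Close(),
		// and the critical section stays protected in the meantime.
		return err
	}
	err := db.lockFile.Close()
	db.Unlock() // mutex released last, once the flock is definitely gone
	return err
}
```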

@krnowak (Collaborator):

@yifan-gu: I was treating the flock like another kind of mutex, so you usually do locking/unlocking in a stack-like manner (l1.lock, l2.lock, l2.unlock, l1.unlock). OTOH, we don't really care about the sync.Mutex (it's only here to shut up the go race detector), so your reasoning is fair enough for me.

@yifan-gu force-pushed the reentrant_db branch 2 times, most recently from 39eefb2 to 208be98, on January 4, 2016 19:53
@yifan-gu (Contributor, Author), Jan 4, 2016:

@krnowak Thanks for the review :) Updated, except for the ordering of unlocking.

@vcaputo (Contributor), Jan 4, 2016:

lgtm, but are we sure this change is all that's needed to make safe what that panic and the existing lock implementation look to be pretty intentionally preventing? @yifan-gu

@yifan-gu (Contributor, Author), Jan 4, 2016:

@vcaputo The previous panic assumed there would be no concurrent attempts to acquire the flock within one process (a single rkt CLI invocation). Now that we provide a long-running process that can handle requests concurrently, that assumption no longer holds, so we should remove the panic. I achieved this by keeping the flock's fd open for the whole life of the rkt process (api-service, run, fetch, etc.), so it's now safe for multiple concurrent threads to retry the flock.

The mutex doesn't have much effect here, since the flock already provides thread-safety; it's merely for the go race detector.
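
As a usage illustration, this is the concurrent pattern the change is meant to make safe; doWork and the lock path are made up for the sketch, not rkt API:

```go
// Hypothetical caller, e.g. one request-handler goroutine in the
// api-service: each runs its own Open, work, Close cycle. The flock
// serializes them across fds; the mutex keeps the race detector quiet.
func doWork(db *DB) error {
	if err := db.Open("/var/lib/rkt/db.lock"); err != nil { // path is made up
		return err
	}
	defer db.Close()
	// ... read or mutate the store here, inside the critical section ...
	return nil
}
```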

@vcaputo (Contributor), Jan 4, 2016:

Makes sense

Review comment on store/db_test.go (outdated) by a Collaborator:

`err` at this point is always nil, so `return value, nil`, please.

@krnowak (Collaborator), Jan 5, 2016:

Small nitpicks, otherwise LFAD.

@yifan-gu (Contributor, Author), Jan 5, 2016:

Nit fixed. Will merge once this is green.

@yifan-gu (Contributor, Author), Jan 5, 2016:

Merging this to fix #1890

@yifan-gu added a commit that referenced this pull request on Jan 5, 2016
@yifan-gu merged commit e9b3cbc into rkt:master on Jan 5, 2016
@yifan-gu deleted the reentrant_db branch on January 5, 2016 22:45