
This repository was archived by the owner on Feb 24, 2020. It is now read-only.

Conversation

@yifan-gu (Contributor):

Fix #1890

@yifan-gu added this to the v0.14.0 milestone on Dec 18, 2015
@yifan-gu changed the title from "store: make db reentrant." to "store: Make db reentrant." on Dec 18, 2015
Review comment on store/db.go (outdated) by a Member:

What does "XXX" mean in the comment?

@yifan-gu (Contributor, Author):

https://en.wikipedia.org/wiki/Comment_%28computer_programming%29
"XXX - warn other programmers of problematic or misguiding code"

@alban modified the milestones: v0.15.0, v0.14.0 on Dec 18, 2015
Review comment on store/db.go (outdated) by a Contributor:

No, it's not just for making the race detector happy; it's to prevent legitimate races!

@yifan-gu (Contributor, Author):

@jonboulle Why? The first thing Open() does is acquire a flock, and the last thing Close() does is release the flock. All the code between those two points is inside the critical section.
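
For orientation, here is a minimal sketch of the pattern yifan-gu describes, written against raw syscall.Flock rather than rkt's pkg/lock; the DB type, its fields, and the lock-file handling are illustrative assumptions, not the actual store/db.go code:

```go
package store

import (
	"os"
	"syscall"
)

// DB stands in for the store's db type; the fields are illustrative.
type DB struct {
	lockFile *os.File
}

// Open acquires an exclusive flock as its very first step.
func (db *DB) Open(path string) error {
	f, err := os.OpenFile(path, os.O_RDONLY|os.O_CREATE, 0644)
	if err != nil {
		return err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		f.Close()
		return err
	}
	db.lockFile = f
	return nil
}

// Close releases the flock as its very last step; everything between
// Open and Close therefore runs inside the critical section.
func (db *DB) Close() error {
	err := syscall.Flock(int(db.lockFile.Fd()), syscall.LOCK_UN)
	db.lockFile.Close() // closing the fd drops the lock in any case
	return err
}
```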

@jonboulle (Contributor):

I'm concerned this is insufficient to make the store.DB thread-safe (which I assume is what you really want). For example, walking through two concurrent Do calls:

A: db.Open()  // initializes db.lock and db.sqldb
B: db.Open()  // also initializes db.lock and db.sqldb. Sort of a no-op from the locking perspective, but now we leak two file handles, and...
A: db.DoTx(f) // ... A is now technically using B's sqldb/lock handles
A: db.Close() // deferred from earlier. Closes B's sqldb, then sets db.sqldb to nil
B: db.DoTx()  // panics trying to do `db.sqldb.Close`

Correct me if I'm totally misreading this, but otherwise there are several different race problems in the above.

@yifan-gu (Contributor, Author):

@jonboulle #1892 (comment)
I don't understand how B's Open() can get past the lock.ExclusiveLock() line before A releases the lock.

See this example:
https://gist.github.com/yifan-gu/50bfb8bffd839c219e7b
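
A hedged reconstruction of the point that gist demonstrates (not the gist itself): because each lock is taken on a freshly opened fd, the two flocks conflict even inside a single process, i.e. they are not reentrant:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	// Two independent opens of the same path give two separate
	// open file descriptions, so their flocks conflict even
	// within one process.
	path := "/tmp/flock-demo"
	f1, _ := os.OpenFile(path, os.O_RDONLY|os.O_CREATE, 0644)
	f2, _ := os.OpenFile(path, os.O_RDONLY|os.O_CREATE, 0644)

	syscall.Flock(int(f1.Fd()), syscall.LOCK_EX)

	// LOCK_NB makes the conflict visible instead of blocking.
	err := syscall.Flock(int(f2.Fd()), syscall.LOCK_EX|syscall.LOCK_NB)
	fmt.Println(err) // EWOULDBLOCK: the flock is NOT reentrant across fds

	syscall.Flock(int(f1.Fd()), syscall.LOCK_UN)
}
```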

@jonboulle (Contributor):

Happily, you're correct: I totally missed that we're creating new file descriptors for each lock, so the flocks AREN'T reentrant.

Hm, I still wonder if there is some cleaner way we can beat the race detector, though.

@jonboulle (Contributor):

Can we wrap the NewExclusiveLock with a store-specific mutex?

@yifan-gu (Contributor, Author):

> Can we wrap the NewExclusiveLock with a store-specific mutex?

Will do.

I figured maybe it's the title that misled you, sorry :)

@yifan-gu force-pushed the reentrant_db branch 2 times, most recently from 48db311 to a7bb9a7, on December 18, 2015 21:36
@yifan-gu changed the title from "store: Make db reentrant." to "store: Make db thread-safe." on Dec 18, 2015
@yifan-gu (Contributor, Author):

Actually I just put the mutex locking before the flock.
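
A sketch of that ordering, continuing the hypothetical DB from the earlier sketch: the in-process mutex is taken before the flock, and (as the review comment below points out for the real code) it has to be released on every error path:

```go
package store

import (
	"os"
	"sync"
	"syscall"
)

// DB now embeds a mutex (hypothetical layout, extending the earlier sketch).
type DB struct {
	sync.Mutex
	lockFile *os.File
}

func (db *DB) Open(path string) error {
	db.Lock() // in-process mutex first, for the race detector's sake...
	f, err := os.OpenFile(path, os.O_RDONLY|os.O_CREATE, 0644)
	if err != nil {
		db.Unlock() // ...and it must be released on every error path
		return err
	}
	// ...then the cross-process flock.
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		f.Close()
		db.Unlock()
		return err
	}
	db.lockFile = f
	return nil
}
```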

Review comment on store/db.go (outdated) by a Contributor:

Need to db.Unlock() here. I feel a bit uneasy about all this :/

@yifan-gu force-pushed the reentrant_db branch 6 times, most recently from 6bcaf6b to 762fd92, on December 19, 2015 00:59
@yifan-gu (Contributor, Author):

Updated

@jonboulle (Contributor):

LGTM, but another pair of eyes would be good too.

@yifan-gu (Contributor, Author):

@iaguis @alban @steveej? We need this so that the api-service doesn't panic in rktnetes.

@jonboulle (Contributor):

or @vcaputo if he's around

@krnowak (Collaborator):

How about we move the unlocking of the mutex above the unlocking of the file lock? That way we have LIFO unlocking, and we always unlock the locked mutex, even if the file unlock fails.

@yifan-gu (Contributor, Author):

@krnowak Why would we want to unlock the mutex when we've failed to unlock the flock?
#1892 (comment)

Per man flock(2), unlocking a flock can fail with EBADF, EINTR, or EINVAL. If we use the dbLock correctly here, I can't imagine any failures short of memory bit flips, so when a failure does happen we should be able to retry. But if we move the mutex unlock above the flock unlock, we can't retry.
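
A sketch of the Close ordering yifan-gu is defending here, continuing the hypothetical DB above (same imports): the flock is released first, and the mutex only once that has succeeded, so a failed flock release leaves the mutex held and Close can simply be retried:

```go
// Close, continuing the earlier sketch: flock released first, mutex second.
func (db *DB) Close() error {
	if err := syscall.Flock(int(db.lockFile.Fd()), syscall.LOCK_UN); err != nil {
		// Deliberately keep the mutex held: the caller can retry Close(),
		// and the critical section stays protected in the meantime.
		return err
	}
	err := db.lockFile.Close()
	db.Unlock() // mutex released last, once the flock is definitely gone
	return err
}
```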

@krnowak (Collaborator):

@yifan-gu: I was treating the flock like another kind of mutex, so you usually do locking/unlocking in a stack-like manner (l1.lock, l2.lock, l2.unlock, l1.unlock). OTOH, we don't really care about the sync.Mutex (it's only here to shut up the go race detector), so your reasoning is fair enough for me.

@yifan-gu force-pushed the reentrant_db branch 2 times, most recently from 39eefb2 to 208be98, on January 4, 2016 19:53
@yifan-gu (Contributor, Author), Jan 4, 2016:

@krnowak Thanks for the review :) Updated, except for the ordering of unlocking.

@vcaputo (Contributor), Jan 4, 2016:

lgtm, but are we sure this change is all that's needed to make safe what that panic and the existing lock implementation look to be pretty intentionally preventing? @yifan-gu

@yifan-gu (Contributor, Author), Jan 4, 2016:

@vcaputo The previous panic assumed there would be no concurrent attempts to acquire the flock within one process (a single rkt CLI invocation). Now that we provide a long-running process that can handle requests concurrently, that assumption no longer holds, so we should remove the panic. I achieved this by keeping the flock's fd open for the whole life of the rkt process (api-service, run, fetch, etc.), so it's now safe for multiple concurrent threads to retry the flock.

The mutex doesn't have much effect here, since the flock already provides thread-safety; it's merely for the go race detector.
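
As a usage illustration, this is the concurrent pattern the change is meant to make safe; doWork and the lock path are made up for the sketch, not rkt API:

```go
// Hypothetical caller, e.g. one request-handler goroutine in the
// api-service: each runs its own Open, work, Close cycle. The flock
// serializes them across fds; the mutex keeps the race detector quiet.
func doWork(db *DB) error {
	if err := db.Open("/var/lib/rkt/db.lock"); err != nil { // path is made up
		return err
	}
	defer db.Close()
	// ... read or mutate the store here, inside the critical section ...
	return nil
}
```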

@vcaputo (Contributor), Jan 4, 2016:

Makes sense

Review comment on store/db_test.go (outdated) by a Collaborator:

`err` at this point is always nil, so `return value, nil`, please.

@krnowak (Collaborator), Jan 5, 2016:

Small nitpicks, otherwise LFAD.

@yifan-gu (Contributor, Author), Jan 5, 2016:

Nit fixed. Will merge once this is green.

@yifan-gu (Contributor, Author), Jan 5, 2016:

Merging this to fix #1890

@yifan-gu added a commit that referenced this pull request on Jan 5, 2016
@yifan-gu merged commit e9b3cbc into rkt:master on Jan 5, 2016
@yifan-gu deleted the reentrant_db branch on January 5, 2016 22:45