Add sync.Mutex and os.Rename to prevent corrupted file when downloading the Postgres archive #105
A mutex here won't work because the race is between different processes, not between different tests in the same process.
So in our particular case this appears to be happening when we spin up multiple instances from the same parent Go process. I feel that different parent processes will be much less likely to experience this issue, as the timing is likely to be different enough. In addition, I would think that the host machine should manage file access better. I've been playing with this test case and was able to replicate the issue with it:
func Test_ConcurrentStart(t *testing.T) {
	var wg sync.WaitGroup
	database := NewDatabase()
	cacheLocation, _ := database.cacheLocator()
	// Start from an empty cache so every instance races to download.
	if err := os.RemoveAll(cacheLocation); err != nil {
		panic(err)
	}
	port := 5432
	for i := 1; i <= 5; i++ {
		port = port + 1
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			tempDir, err := os.MkdirTemp("", "embedded_postgres_test")
			assert.NoError(t, err)
			database := NewDatabase(
				DefaultConfig().
					Port(uint32(p)).
					DataPath(tempDir).
					RuntimePath(tempDir),
			)
			if err := database.Start(); err != nil {
				shutdownDBAndFail(t, err, database)
			}
			if err := database.Stop(); err != nil {
				shutdownDBAndFail(t, err, database)
			}
		}(port)
	}
	wg.Wait()
}
We run into it, so it's still pretty likely.
BTW the fix proposed in the issue (#96 (comment)) is the correct fix and works for both cases.
Ah, sorry. I had addressed the issue we were facing, but hadn't understood yours properly. I had assumed it was the same thing and should have read your issue more carefully. I'll close this PR. Is it worth me opening one using https://github.com/natefinch/atomic or will you do this?
I wasn't planning on opening a PR (we've worked around it by serializing the tests for now). Not sure what the project's rules are around bringing in new dependencies...
Hey all, this is almost exactly what we're looking for to resolve the issue. However, @vanzin and @alecsammon, as you guessed, we are hesitant to introduce any new dependencies. Ideally we should be able to download to a temporary location and move the file to the intended location as a workaround, ignoring failures when the file already exists. If you're up for it, you're welcome to have a go at introducing this technique. I'll have some spare time in the coming weeks and should be able to have a go myself if not.
Sorry for all the noise here. I was really struggling to understand why this was failing on Windows and not Linux. Turns out the issue is probably this, when using temporary files on Windows:
Manually closing the temporary file looks to solve this problem and allows the rename to happen. Now that I'm close to having something that works, I can spend some time cleaning up the PR and hopefully have it ready for review soon.
decompression.go
@@ -21,9 +21,14 @@ func defaultTarReader(xzReader *xz.Reader) (func() (*tar.Header, error), func()
}

func decompressTarXz(tarReader func(*xz.Reader) (func() (*tar.Header, error), func() io.Reader), path, extractPath string) error {
	tempExtractPath, err := os.MkdirTemp("", "embedded_postgres")
I see a couple of problems here.
- you're not cleaning up this temp dir (minor problem but it's not nice to leave this stuff behind)
- the temp dir is being created in the default temp directory (with an empty first argument, os.MkdirTemp uses os.TempDir()), not next to extractPath
The latter is a problem because rename doesn't work across filesystems, and this temp dir may be on a different filesystem than extractPath.
What you want is to create a temp file in the same directory as the final file. Or create the temp dir under that directory.
Yeah, that makes sense, thanks!
I've updated the PR to do both of those:
- Have the temp locations in the same location as the extract location.
- Ensure the temp locations are cleaned up when necessary.
@fergusstrange anything else that's needed to move this forward?
remote_fetch.go
	}
}()

if err := os.WriteFile(tmp.Name(), archiveBytes, file.FileHeader.Mode()); err != nil {
This is a bit weird, because tmp created above is already an open file. So you have two open descriptors to the same "name", and below when you close tmp I'm not sure what the end state is. You should probably call tmp.Close() before this line.
Overall I'm a bit confused about all the code you're adding... to me only the changes in this file are needed, so I haven't even looked at the rest.
Hey @alecsammon, thank you for all the effort here and apologies for the delay in getting this all merged in. I'll likely do a little tidy up and try to up coverage again but will look to get a pre-release cut for this shortly so that you and others can begin testing. @vanzin thanks for looking over the code as well 🙏
…oading the Postgres archive (fergusstrange#105)
* Add sync.Mutex to prevent collisions
* add atomic download, and use defer to ensure mutex unlock
* move mutex to global
* fix tests
* update platform-test
* update examples
* remove atomic dependency
* remove code duplication
* reduce test parallel run count
* fix race condition in tests
* run tests
* attempt to fix windows
* attempt a different solution for windows
* revert changes
* try additional fix for windows
* add another test
* catch syscall.EEXIST
* fix test
* fix test
* add extra debugging
* attempt to fix windows
* add additional error message
* fix race in decompression
* more fixes
* use atomic
* add extra debug
* try catching the error
* try different permissions
* add more debugging
* more debug
* more debug
* test dest
* attempt to close temp file
* simplify
* remove atomic
* clean up code
* add more tests
* clean up temporary files
* prevent file being opened twice
Related to: #96
- os.Rename to move into the correct position. As this is an atomic operation, this should prevent corruption on Unix machines. (On Windows devices this will fail if the destination already exists, but we can ignore the error, as we know the source and destination should be the same.)
- sync.Mutex to lock the resource before attempting to check for the cache, to prevent duplicate downloads within the same process.