Filesystem remove() function can fail when run concurrently #27578

Closed
danepowell opened this issue Jun 11, 2018 · 11 comments · Fixed by #40144

Comments

@danepowell
Contributor

danepowell commented Jun 11, 2018

Symfony version(s) affected: 3.4.11

Description
If you invoke the Filesystem remove() method on a directory multiple times concurrently, it can fail on certain filesystems with the error:
Failed to remove directory "/home/foo/.drush/cache/bar": rmdir(/home/foo/.drush/cache/bar): Directory not empty

How to reproduce
I'm currently regularly encountering this using Drush and a Gluster filesystem. I have a script that calls drush cc drush, which invokes remove() on a directory that's hosted on Gluster.

Here's where Drush calls remove(): https://github.com/drush-ops/drush/blob/0e2abf43ad0d2f398a7afb23772c556a906d840d/src/Cache/FileCache.php#L132

When my script runs multiple times in parallel, it frequently fails with the above error.

I'm not totally sure if this is due to a race condition with Gluster, or if it's a race condition within Symfony that only becomes apparent when disk I/O is heavily throttled (as on a shared filesystem).

I've also seen this happen (albeit somewhat less frequently) on a mounted EC2 filesystem, so it's not just a Gluster problem.

Other folks have reported similar issues running Symfony's internal cache clear on shared filesystems, although I don't know if they were running cache clears concurrently: #2600

@nicolas-grekas
Member

I'm not sure we can do much here. Do you have an idea for solving this at the Filesystem component level? Alternatively, would you be able to put a lock around the race-condition-sensitive parts?
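As an illustration of the "lock around the sensitive parts" suggestion, here is a minimal sketch in Python (not Symfony or Drush code). It assumes a POSIX system where `fcntl.flock` is available; the function name `remove_with_lock` and the lock-file path are hypothetical. Advisory locks only help if every process that writes to or removes the directory takes the same lock, and their behavior on shared filesystems should be verified rather than assumed.

```python
import fcntl
import os
import shutil
import tempfile


def remove_with_lock(directory: str, lock_path: str) -> bool:
    """Remove `directory` while holding an exclusive advisory lock.

    Returns True if this call removed the directory, False if the
    directory was already gone once the lock was acquired.
    """
    # Append mode creates the lock file if needed without truncating it.
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            if not os.path.isdir(directory):
                return False  # another lock holder removed it first
            shutil.rmtree(directory)
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

The lock file lives outside the directory being removed, so removing the directory does not remove the lock itself.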

@danepowell
Contributor Author

danepowell commented Jun 28, 2018

Here's an example of how I'm going to attempt to solve this for Drush:
https://github.com/drush-ops/drush/pull/3594/files

However, instead of every upstream library trying to implement that in its own way, I think it might make sense to implement it in the Filesystem, especially since Symfony itself has suffered tremendously from this same bug, which many people consider to still be unresolved: #2600

I think roughly the same approach could be applied as for Drush (basically, just drop a semaphore file in the directory that's about to be removed, and don't remove directories with that file)

If you're worried about adding complexity to the existing remove() function, it could be a separate safeRemove() function or something.
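One possible reading of the semaphore-file idea above, sketched in Python rather than PHP: the first caller atomically creates a marker file inside the directory and then removes the directory, while any caller that finds the marker already present backs off. The names `safe_remove` and `.removal-in-progress` are hypothetical, and this is not taken from the linked Drush pull request. It also assumes that exclusive file creation is atomic on the filesystem in use, which is worth checking on network filesystems.

```python
import os
import shutil
import tempfile

MARKER = ".removal-in-progress"  # hypothetical marker file name


def safe_remove(directory: str) -> bool:
    """Remove `directory` unless another caller has already claimed it.

    Returns True if this call removed the directory, False otherwise.
    """
    marker_path = os.path.join(directory, MARKER)
    try:
        # O_CREAT | O_EXCL fails if the marker exists, so only one
        # caller can claim the directory for removal.
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # someone else is already removing it
    except FileNotFoundError:
        return False  # the directory is already gone
    os.close(fd)
    shutil.rmtree(directory)
    return True
```

Note that this only serializes concurrent removers. A process that keeps creating files in the directory would also have to check for the marker before writing.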

@danepowell
Contributor Author

Through some testing I think I've discovered a little more about this problem.

I don't think the problem is multiple concurrent calls to remove(). I think the problem is when a process is creating files in a directory at the same time that Symfony is trying to remove it. This makes total sense, because Symfony enumerates all subdirectories and files before recursively deleting them, and this gives other processes time to create more files that would prevent deletion of the directory.

This could be a problem regardless of the underlying filesystem, but it's exacerbated on shared file systems simply because the deletion process takes longer and gives other processes more time to interfere.

I'd really love to see a way for Symfony to work around this. I'm assuming this kind of problem wouldn't happen if you simply used rm -rf to delete a directory. Is there a technical reason Symfony couldn't do the same?
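The enumerate-then-delete race described in this comment can be reproduced deterministically. The following Python sketch is an illustration only, not Symfony's implementation: `remove_enumerated` handles a flat directory of files, and the `on_enumerated` hook is a hypothetical stand-in for a second process that writes a file after the listing has been taken.

```python
import os
import tempfile


def remove_enumerated(directory: str, on_enumerated=None) -> None:
    """Naive removal: snapshot the entries, delete them, then rmdir."""
    entries = os.listdir(directory)  # snapshot of the current contents
    if on_enumerated is not None:
        on_enumerated()  # simulated concurrent writer runs here
    for name in entries:
        os.remove(os.path.join(directory, name))
    os.rmdir(directory)  # raises OSError if anything new appeared


def demonstrate_race() -> bool:
    """Return True if a file created mid-removal makes rmdir fail."""
    cache = tempfile.mkdtemp()
    open(os.path.join(cache, "old.bin"), "w").close()

    def writer() -> None:
        # Created after the snapshot, so it is never deleted.
        open(os.path.join(cache, "new.bin"), "w").close()

    try:
        remove_enumerated(cache, on_enumerated=writer)
    except OSError:
        return True  # "Directory not empty", as in the report
    return False
```

Without the hook the same function succeeds, which is consistent with the observation that the failure depends on timing and shows up more often when deletion is slow.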

@xabbuh
Member

xabbuh commented Jun 30, 2018

IIRC there is no way to do that without recursively removing the directory tree (except by using a native command). I could imagine, though, that we could mitigate the issue by renaming the directory to be removed first, couldn't we?

@danepowell
Contributor Author

danepowell commented Jul 5, 2018

@xabbuh good idea. I tested a PoC of that approach and it seems to more or less work. You'd need to rename the directory in place (as opposed to, e.g., moving it to a tmp directory), since it turns out PHP has problems moving directories across devices, and even the Unix mv utility is vulnerable to this race condition, especially when moving directories across devices. This also means that Symfony would need to gracefully swallow exceptions during the rename.
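A minimal sketch of the rename-in-place approach, written in Python for illustration and not taken from the PoC mentioned above. The function name `remove_via_rename` and the `.removing-` naming scheme are assumptions. The directory is renamed to a unique sibling in the same parent, so the rename stays on one device, and a failed rename is swallowed by falling back to removing the original path.

```python
import os
import shutil
import tempfile
import uuid


def remove_via_rename(directory: str) -> None:
    """Rename `directory` to a unique sibling name, then delete it."""
    directory = os.path.abspath(directory)
    parent, name = os.path.split(directory)
    # Same parent directory, so the rename never crosses devices.
    doomed = os.path.join(parent, f".{name}.removing-{uuid.uuid4().hex}")
    try:
        os.rename(directory, doomed)
    except OSError:
        # Swallow the failure and fall back to the original path.
        doomed = directory
    if not os.path.lexists(doomed):
        return  # nothing left to remove
    shutil.rmtree(doomed)
```

After the rename, processes that look the directory up by its original path no longer find it, which makes interference less likely. A process that already holds an open handle inside the directory could still write to it, so this reduces the race window rather than closing it.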

@carsonbot

Hey, thanks for your report!
There has not been a lot of activity here for a while. Is this bug still relevant? Have you managed to find a workaround?

@carsonbot

Hello? This issue is about to be closed if nobody replies.

@xabbuh
Member

xabbuh commented Jan 3, 2021

This is an actual issue. Our AppVeyor builds fail randomly because of this.

@carsonbot carsonbot removed the Stalled label Jan 3, 2021
@danepowell
Contributor Author

Yeah definitely still an issue. I think the two most recently proposed solutions are still valid: rename the directory in-situ before deleting it, so other processes are less likely to write to it, or use OS-native commands (rm -rf) to remove directories.

@nicolas-grekas
Member

> rename the directory in-situ before deleting it

@danepowell would you like to give it a try in a PR?

@danepowell
Contributor Author

I'm no longer in a position where I deal with this on a regular basis, so I'm afraid I can't commit much time to testing and validation. But here's at least a proof of concept solution: #39984
