Commit a3ababd

Al Viro authored and gregkh committed
fix mntput/mntput race
commit 9ea0a46 upstream.

mntput_no_expire() does the calculation of total refcount under mount_lock; unfortunately, the decrement (as well as all increments) is done outside of it, leading to false positives in the "are we dropping the last reference" test. Consider the following situation:

* mnt is a lazy-umounted mount, kept alive by two opened files. One of those files gets closed. Total refcount of mnt is 2. On CPU 42, mntput(mnt) (called from __fput()) drops one reference, decrementing component #42, then grabs mount_lock and starts summing the per-CPU components.
* After it has looked at component #0, the process on CPU 0 does mntget(), incrementing component #0, gets preempted and gets to run again, on CPU 69. There it does mntput(), which drops the reference (component #69) and proceeds to spin on mount_lock.
* On CPU 42, our first mntput() finishes counting. It observes the decrement of component #69, but not the increment of component #0. As a result, the total it gets is not 1, as it should've been; it's 0. At that point we decide that the vfsmount needs to be killed and proceed to free it and shut the filesystem down. However, there's still another opened file on that filesystem, with a reference to the (now freed) vfsmount, etc., and we are screwed.

It's not a wide race, but it can be reproduced with an artificial slowdown of the mnt_get_count() loop, and it should be easier to hit on SMP KVM setups.

The fix consists of moving the refcount decrement under mount_lock; the tricky part is that we want to (and can) keep the fast case (i.e. a mount that still has a non-NULL ->mnt_ns) entirely out of mount_lock. All places that zero mnt->mnt_ns are dropping some reference to mnt, and they call synchronize_rcu() before that mntput(). IOW, if mntput() observes (under rcu_read_lock()) a non-NULL ->mnt_ns, it is guaranteed that there is another reference yet to be dropped.
Reported-by: Jann Horn <[email protected]>
Tested-by: Jann Horn <[email protected]>
Fixes: 48a066e ("RCU'd vsfmounts")
Cc: [email protected]
Signed-off-by: Al Viro <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
1 parent ba74414 commit a3ababd

1 file changed, 12 lines added, 2 lines removed

fs/namespace.c

@@ -1124,12 +1124,22 @@ static DECLARE_DELAYED_WORK(delayed_mntput_work, delayed_mntput);
 static void mntput_no_expire(struct mount *mnt)
 {
 	rcu_read_lock();
-	mnt_add_count(mnt, -1);
-	if (likely(mnt->mnt_ns)) { /* shouldn't be the last one */
+	if (likely(READ_ONCE(mnt->mnt_ns))) {
+		/*
+		 * Since we don't do lock_mount_hash() here,
+		 * ->mnt_ns can change under us. However, if it's
+		 * non-NULL, then there's a reference that won't
+		 * be dropped until after an RCU delay done after
+		 * turning ->mnt_ns NULL. So if we observe it
+		 * non-NULL under rcu_read_lock(), the reference
+		 * we are dropping is not the final one.
+		 */
+		mnt_add_count(mnt, -1);
 		rcu_read_unlock();
 		return;
 	}
 	lock_mount_hash();
+	mnt_add_count(mnt, -1);
 	if (mnt_get_count(mnt)) {
 		rcu_read_unlock();
 		unlock_mount_hash();
