Add stop_fuse_from_another_thread in fuse_api #14
Thanks for working on this! This is the right approach, but you overengineered it a bit. See detailed comments.
Please also update the documentation and ChangeLog, and add "Fixes: #nnn" to the commit message so that merging it fixes the bug that you opened about this.
src/fuse_api.pxi
Outdated
@@ -467,6 +467,25 @@ cdef session_loop_mt(workers):
stdlib.free(wd)

def stop_fuse_from_another_thread():
That's a rather awkward name. It doesn't match the convention for the other names, and is misleading since you can call the method from any thread. Just call it "stop". Same for the docstring (which doesn't specify what an "other thread" is).
I agree the name is pretty poor. My concern was that there is already a `close` method, which can be confused with a `stop` one (from my understanding, I would say `close` is more of a `clean`, given it is called after `main` returns; of course renaming `close` is a no-go because it would break the API).
So maybe we should name it `stop_main` to disambiguate?
No, I think `stop` is perfect :-P
so be it ^^
src/fuse_api.pxi
Outdated
def stop_fuse_from_another_thread():
    '''Stop the FUSE main loop from another thread.
    '''
    res = pthread_mutex_lock(&exc_info_mutex)
You're not doing anything with exc_info (which is what this mutex protects), so no need to acquire it.
done.
test/test_threaded.py
Outdated
@@ -0,0 +1,115 @@
#!/usr/bin/env python3
There is a lot of duplicated code here. I think it should be sufficient to add a single new method to `test_fs.py`.
I agree.
However, most of the duplicated code is due to the fact that the vanilla version assumes the fuse fs will be started in a separate process, whereas here we use a thread (`wait_for_mount` takes a process as argument, `test_fs.Fs` takes a `cross_process`).
Do you think there is a way to mock those attributes or to make them optional?
There is a good reason why the test is using a separate process: the risk of deadlocks is way too high. That applies to this test as well, just call stop() in the other process.
I've reworked the test to use `test_fs.py`'s `testfs` fixture (I had to modify it a bit to make the process start a thread monitoring an event to call `stop`; given that `multiprocessing.Event` cannot be serialized by `Manager.Namespace`, I had to create a `need_stop` attribute along with `cross_process`).
Also, please take a look at the unit tests. They're currently failing, but it shouldn't be hard to see why :-)
test/test_threaded.py
Outdated
thread.start()
wait_for_mount(mountpoint)

llfuse.schedule_close()
This method doesn't exist.
fixed
src/fuse_api.pxi
Outdated
def stop():
    '''Stop the FUSE main loop.

    This function is thread safe. Note once the `main` function has returned
I think you forgot a "that" ("Note that ..."). Or just omit the preamble and say "Once the main function.."
done
 mount_process = mp.Process(target=run_fs,
-                           args=(mnt_dir, cross_process))
+                           args=(mnt_dir, cross_process, need_stop))
need_stop is already an attribute of cross_process, why do you pass it in separately as well?
`cross_process.need_stop = need_stop` doesn't work because the `Event` type cannot be serialized by a `Namespace`. I forgot to remove this buggy line ;-)
Yeah, but why does it even need to be a mgr.Event()? Just make it a boolean :-).
A boolean wouldn't be convenient, given that `_stop_watcher` has to wait on it to know when it should stop the fuse server.
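The difference between waiting on an event and checking a boolean can be sketched without llfuse. This is a minimal sketch, assuming a plain `threading.Event` in place of the manager event and a hypothetical `stop_callback` standing in for `llfuse.stop`; `start_stop_watcher` is an illustrative name, not code from this PR.

```python
import threading


def start_stop_watcher(need_stop, stop_callback):
    """Start a thread that sleeps on need_stop, then calls stop_callback.

    need_stop is any object with a blocking wait() method (an Event).
    stop_callback is a stand-in for llfuse.stop (assumption for this sketch).
    """
    def _stop_watcher():
        need_stop.wait()   # blocks without polling until set() is called
        stop_callback()

    watcher = threading.Thread(target=_stop_watcher, daemon=True)
    watcher.start()
    return watcher


# Stand-ins so the sketch runs on its own.
need_stop = threading.Event()
stopped = threading.Event()          # records that the "stop" call happened

watcher = start_stop_watcher(need_stop, stopped.set)
need_stop.set()                      # what the test does to request a stop
watcher.join(timeout=5)
```

With a plain boolean, the watcher would have to poll the value in a sleep loop, because there is nothing to block on.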
os.stat(path)
assert not fs_state.lookup_called

need_stop.set()
It seems to me that you really only need the first and last line of this function. What is the purpose of the others?
I agree, the fs accesses are not strictly needed but make sure the fs is up and running (i.e. not in a "process-just-spawned" state). I can remove this if you think it's useless.
Makes sense, but let's just use a single `os.stat` call and add an "Ensure FS is ready" comment.
done
src/fuse_api.pxi
Outdated
def stop():
    '''Stop the FUSE main loop.

    This function is thread safe. Note once the `main` function has returned
Either "Note that once the `main` ..." or just "Once the main function.."
(re)done ^^
    need_stop.wait()
    llfuse.stop()

threading.Thread(target=_stop_watcher).start()
I think you can replace the separate thread by checking the state of `cross_process.need_stop` in the `getattr` handler. The threads are fundamentally the same.
I would say the point of `test_call_stop_from_another_thread` is precisely to trigger the end of the fuse server from another thread; if we check `cross_process.need_stop` from a thread handled by fuse, the use case is different.
What is the difference between a "thread handled by fuse" and "another thread"? It seems to me that the only difference is where in the code the thread is created, and what the entry function is. But why does that make any difference when calling stop()?
What I mean by "thread handled by fuse" is a thread that spends most of its time idle within libfuse, waiting for a request coming from the fuse kernel driver.
The point of the `stop` function is to tell this thread to stop waiting and hand control back to Python.
From my tests, calling `fuse_session_exit` from a thread not handled by fuse is not enough to close the fuse server: a system call to the file system is needed to trigger the fuse thread, which will then realize it should stop itself.
That's why I feel that calling `stop` from the fuse operation callback (so from the "fuse thread") and calling it from a Python thread not related to the fuse server are two different use cases.
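The behaviour described above (an exit flag that is only noticed once a request wakes the blocked thread) can be illustrated with a toy model. This is a sketch in pure Python, not libfuse: `ToyServer`, its queue of requests, and the `exit_requested` flag are all hypothetical stand-ins, assuming the worker only checks the flag between requests.

```python
import queue
import threading


class ToyServer:
    """Toy model: one worker blocks on a request queue and only checks
    the exit flag between requests (an assumption of this sketch)."""

    def __init__(self):
        self.requests = queue.Queue()
        self.exit_requested = False
        self.waiting = threading.Event()   # set once the worker is about to block
        self.worker = threading.Thread(target=self._loop, daemon=True)
        self.worker.start()

    def _loop(self):
        while not self.exit_requested:
            self.waiting.set()
            self.requests.get()            # blocks until a request arrives

    def stop(self):
        # Only sets the flag; nothing wakes the blocked worker.
        self.exit_requested = True

    def send_request(self):
        # Stand-in for a system call hitting the mountpoint.
        self.requests.put("getattr")


server = ToyServer()
server.waiting.wait()                      # make sure the worker is blocked first
server.stop()                              # called from a non-worker thread
server.worker.join(timeout=0.5)
still_blocked = server.worker.is_alive()   # flag alone did not end the loop

server.send_request()                      # wake the worker with one request
server.worker.join(timeout=5)
exited_after_request = not server.worker.is_alive()
```

In this model the stop request is only observed once the worker wakes up, which mirrors the observation that a file system access was needed after the stop call.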
Sorry for the delay! I think I understand now. What you want is that there are threads blocked on reading from the kernel - no matter who has started them. That is a valid concern. I think in practice it is already satisfied: unless you pass `workers=1` to `llfuse.main`, python-llfuse starts 30 workers by default, so even if one thread is busy handling the getattr() request, there are 29 others that are probably waiting. In theory some other process could be issuing requests too, so that all 29 threads are busy - but in this situation having the `getattr` thread return before calling `stop` would not help either.
But maybe I am missing something, so let me make sure I understand you correctly. Are you saying that if you (1) remove the `listdir` call in `stop` and otherwise leave the changes as-is, the test fails (correctly indicating that the `listdir` is necessary), but if you (2) remove the `listdir` call and call `stop` in the thread handling the `getattr` request, the test succeeds (even though it should not)?
Now I'm the one sorry for the delay ;-)
As you suggested, I've tried to remove the `listdir` in the `stop` function:
$ py.test test/ -s
============================================================================== test session starts ==============================================================================
platform linux -- Python 3.5.2, pytest-3.6.3, py-1.5.4, pluggy-0.6.0 -- /home/emmanuel/projects/python-llfuse/venv/bin/python3
cachedir: test/.pytest_cache
rootdir: /home/emmanuel/projects/python-llfuse/test, inifile: pytest.ini
collected 15 items
test/test_api.py::test_inquire_bits PASSED
test/test_api.py::test_listdir PASSED
test/test_api.py::test_sup_groups PASSED
test/test_api.py::test_entry_res PASSED
test/test_api.py::test_xattr PASSED
test/test_api.py::test_copy PASSED
test/test_examples.py::test_lltest PASSED
test/test_examples.py::test_tmpfs PASSED
test/test_examples.py::test_passthroughfs PASSED
test/test_fs.py::test_invalidate_entry PASSED
test/test_fs.py::test_invalidate_inode PASSED
test/test_fs.py::test_notify_store PASSED
test/test_fs.py::test_entry_timeout PASSED
test/test_fs.py::test_attr_timeout PASSED
test/test_fs.py::test_call_stop_from_another_thread PASSED
test/test_fs.py::test_call_stop_from_another_thread ERROR
==================================================================================== ERRORS =====================================================================================
____________________________________________________________ ERROR at teardown of test_call_stop_from_another_thread ____________________________________________________________
Traceback (most recent call last):
File "/home/emmanuel/projects/python-llfuse/test/test_fs.py", line 64, in testfs
wait_for_mount_process_termination(mount_process)
File "/home/emmanuel/projects/python-llfuse/test/util.py", line 134, in wait_for_mount_process_termination
pytest.fail('mount process did not terminate')
File "/home/emmanuel/projects/python-llfuse/venv/lib/python3.5/site-packages/_pytest/outcomes.py", line 97, in fail
raise Failed(msg=msg, pytrace=pytrace)
Failed: mount process did not terminate
====================================================================== 15 passed, 1 error in 35.18 seconds ======================================================================
^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt
On the other hand, trying to call `llfuse.stop()` from the thread handling the getattr, I got a pretty weird error:
(venv) ~/projects/python-llfuse close-from-another-thread ±2*1615 ♦ py.test test/test_fs.py::test_call_stop_from_same_thread -s
============================================================================== test session starts ==============================================================================
platform linux -- Python 3.5.2, pytest-3.6.3, py-1.5.4, pluggy-0.6.0 -- /home/emmanuel/projects/python-llfuse/venv/bin/python3
cachedir: test/.pytest_cache
rootdir: /home/emmanuel/projects/python-llfuse/test, inifile: pytest.ini
collected 1 item
test/test_fs.py::test_call_stop_from_same_thread in ==> Namespace(attr_timeout=2, entry_timeout=2, getattr_called=True, lookup_called=False, need_stop=False, read_called=False)
PASSEDfusermount: entry for /tmp/pytest-of-emmanuel/pytest-174/test_call_stop_from_same_threa0 not found in /etc/mtab
test/test_fs.py::test_call_stop_from_same_thread ERROR
==================================================================================== ERRORS =====================================================================================
_____________________________________________________________ ERROR at teardown of test_call_stop_from_same_thread ______________________________________________________________
Traceback (most recent call last):
File "/home/emmanuel/projects/python-llfuse/test/test_fs.py", line 66, in testfs
umount(mount_process, mnt_dir)
File "/home/emmanuel/projects/python-llfuse/test/util.py", line 117, in umount
subprocess.check_call(['fusermount', '-z', '-u', mnt_dir])
File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['fusermount', '-z', '-u', '/tmp/pytest-of-emmanuel/pytest-174/test_call_stop_from_same_threa0']' returned non-zero exit status 1
======================================================================= 1 passed, 1 error in 0.21 seconds =======================================================================
Note that `test_call_stop_from_same_thread` is called only once, even though pytest considers it has one SUCCESS and one ERROR...
That's not a weird error. It means the test passes, but then the cleanup (which calls fusermount) fails. This is probably because the umount() races with the stop(). This is because without listdir() the fs threads remain blocked until the cleanup method accesses the mountpoint again. You should be able to confirm this by adding a long sleep in the cleanup method - the test should then block as it did in the first case.
In other words, as far as I can see this confirms that you can safely omit the separate thread and call stop() in getattr(). This will test all aspects of the functionality.
I've tried to remove the thread part, but doing that makes the test hang forever (and it is pretty resistant to ^C and even `kill -9`; I've dead processes and zombies all over due to the busy mountpoint target...)
At first I thought the trouble was, as you mentioned before, the fact that llfuse is started with a single worker, but specifying multiple workers doesn't change anything.
(not sure why, but GitHub doesn't want me to add the patch as an attached file, so I'm dumping it here)
diff --git a/test/test_fs.py b/test/test_fs.py
index 78e5812..321838c 100755
--- a/test/test_fs.py
+++ b/test/test_fs.py
@@ -18,8 +18,6 @@ if __name__ == '__main__':
import llfuse
from llfuse import FUSEError
-from threading import Thread
-from unittest.mock import Mock
import multiprocessing
import os
import errno
@@ -48,25 +46,24 @@ def testfs(tmpdir):
mnt_dir = str(tmpdir)
with mp.Manager() as mgr:
cross_process = mgr.Namespace()
- need_stop = mgr.Event()
mount_process = mp.Process(target=run_fs,
- args=(mnt_dir, cross_process, need_stop))
+ args=(mnt_dir, cross_process))
mount_process.start()
try:
wait_for_mount(mount_process, mnt_dir)
- yield (mnt_dir, cross_process, need_stop)
+ yield (mnt_dir, cross_process)
except:
cleanup(mnt_dir)
raise
else:
- if need_stop.is_set():
+ if cross_process.need_stop:
wait_for_mount_process_termination(mount_process)
else:
umount(mount_process, mnt_dir)
def test_invalidate_entry(testfs):
- (mnt_dir, fs_state, _) = testfs
+ (mnt_dir, fs_state) = testfs
path = os.path.join(mnt_dir, 'message')
os.stat(path)
assert fs_state.lookup_called
@@ -87,7 +84,7 @@ def test_invalidate_entry(testfs):
assert wait_for(check)
def test_invalidate_inode(testfs):
- (mnt_dir, fs_state, _) = testfs
+ (mnt_dir, fs_state) = testfs
with open(os.path.join(mnt_dir, 'message'), 'r') as fh:
assert fh.read() == 'hello world\n'
assert fs_state.read_called
@@ -110,7 +107,7 @@ def test_invalidate_inode(testfs):
assert wait_for(check)
def test_notify_store(testfs):
- (mnt_dir, fs_state, _) = testfs
+ (mnt_dir, fs_state) = testfs
with open(os.path.join(mnt_dir, 'message'), 'r') as fh:
llfuse.setxattr(mnt_dir, 'command', b'store')
fs_state.read_called = False
@@ -118,7 +115,7 @@ def test_notify_store(testfs):
assert not fs_state.read_called
def test_entry_timeout(testfs):
- (mnt_dir, fs_state, _) = testfs
+ (mnt_dir, fs_state) = testfs
fs_state.entry_timeout = 1
path = os.path.join(mnt_dir, 'message')
@@ -134,7 +131,7 @@ def test_entry_timeout(testfs):
assert fs_state.lookup_called
def test_attr_timeout(testfs):
- (mnt_dir, fs_state, _) = testfs
+ (mnt_dir, fs_state) = testfs
fs_state.attr_timeout = 1
with open(os.path.join(mnt_dir, 'message'), 'r') as fh:
os.fstat(fh.fileno())
@@ -148,14 +145,15 @@ def test_attr_timeout(testfs):
os.fstat(fh.fileno())
assert fs_state.getattr_called
-def test_call_stop_from_another_thread(testfs):
- (mnt_dir, fs_state, need_stop) = testfs
+def test_call_stop(testfs):
+ (mnt_dir, fs_state) = testfs
path = os.path.join(mnt_dir, 'message')
- # Ensure FS is ready
+ fs_state.need_stop = True
os.stat(path)
- need_stop.set()
+ with pytest.raises(OSError):
+ os.stat(path)
class Fs(llfuse.Operations):
def __init__(self, cross_process):
@@ -166,6 +164,7 @@ class Fs(llfuse.Operations):
self.status = cross_process
self.lookup_cnt = 0
self.status.getattr_called = False
+ self.status.need_stop = False
self.status.lookup_called = False
self.status.read_called = False
self.status.entry_timeout = 2
@@ -193,6 +192,10 @@ class Fs(llfuse.Operations):
entry.attr_timeout = self.status.attr_timeout
self.status.getattr_called = True
+
+ if self.status.need_stop:
+ llfuse.stop()
+
return entry
def forget(self, inode_list):
@@ -246,8 +249,7 @@ class Fs(llfuse.Operations):
else:
raise FUSEError(errno.EINVAL)
-def run_fs(mountpoint, cross_process, need_stop):
-
+def run_fs(mountpoint, cross_process):
# Logging (note that we run in a new process, so we can't
# rely on direct log capture and instead print to stdout)
root_logger = logging.getLogger()
@@ -260,17 +262,11 @@ def run_fs(mountpoint, cross_process, need_stop):
root_logger.addHandler(handler)
root_logger.setLevel(logging.DEBUG)
- def _stop_watcher():
- need_stop.wait()
- llfuse.stop()
-
- threading.Thread(target=_stop_watcher).start()
-
testfs = Fs(cross_process)
fuse_options = set(llfuse.default_options)
fuse_options.add('fsname=llfuse_testfs')
llfuse.init(testfs, mountpoint, fuse_options)
try:
- llfuse.main(workers=1)
+ llfuse.main(workers=2)
finally:
llfuse.close()
Worse than that: I've retried the "call stop from getattr thread" test from my previous post (applying the patch I've submitted), and I ended up with the same kind of trouble (fuse totally blocked, preventing processes from being stopped even with SIGKILL)
(venv) ~/projects/python-llfuse close-from-another-thread ±2*1617 ♦ py.test test/test_fs.py::test_call_stop_from_same_thread -s
============================================================================== test session starts ==============================================================================
platform linux -- Python 3.5.2, pytest-3.6.3, py-1.5.4, pluggy-0.6.0 -- /home/emmanuel/projects/python-llfuse/venv/bin/python3
cachedir: test/.pytest_cache
rootdir: /home/emmanuel/projects/python-llfuse/test, inifile: pytest.ini
collected 1 item
test/test_fs.py::test_call_stop_from_same_thread should have stopped
^CException in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/home/emmanuel/projects/python-llfuse/test/test_fs.py", line 277, in _stop_watcher
need_stop.wait()
File "/usr/lib/python3.5/multiprocessing/managers.py", line 988, in wait
return self._callmethod('wait', (timeout,))
File "/usr/lib/python3.5/multiprocessing/managers.py", line 717, in _callmethod
kind, result = conn.recv()
File "/usr/lib/python3.5/multiprocessing/connection.py", line 250, in recv
buf = self._recv_bytes()
File "/usr/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib/python3.5/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
^C^C^C^C^Z^Z^Z
At this point I'm very puzzled :(
My intuition is still that doing a syscall on a fuse mountpoint from one of its own threads is not ideal (a recursive call is more of a special case; the real use case is to call `stop` from an arbitrary 3rd party thread).
On the other hand, it should be possible (or, if not, we should say so in the documentation), which is currently not the case.
@Nikratio just a friendly reminder I've updated the PR (but no rush on this really 😃)
Sorry for the delay! I'm somewhat confused about the current state. Is everything working now? Your last comment mentioned a lot of sudden problems.
Friendly ping..
ping - this PR is getting rather old. can it be finished?
ping
Seems like this one is stuck. Close?
No progress since >4y, guess we must close this.
Implement #13. The thing is pretty rough right now (besides, it's the first time I'm working on a Cython-based project), so review appreciated ;-)