Add stop_fuse_from_another_thread in fuse_api #14

Closed

Conversation

touilleMan
Contributor

Implements #13. It is pretty rough right now (also, this is the first time I'm working on a Cython-based project), so a review would be appreciated ;-)

Contributor

@Nikratio Nikratio left a comment

Thanks for working on this! This is the right approach, but you overengineered it a bit. See detailed comments.

Please also update the documentation and ChangeLog, and add "Fixes: #nnn" to the commit message so that merging it fixes the bug that you opened about this.

src/fuse_api.pxi Outdated
@@ -467,6 +467,25 @@ cdef session_loop_mt(workers):
stdlib.free(wd)


def stop_fuse_from_another_thread():
Contributor

That's a rather awkward name. It doesn't match the convention for the other names, and is misleading since you can call the method from any thread. Just call it "stop". Same for the docstring (which doesn't specify what an "other thread" is).

Contributor Author

I agree the name is pretty poor. My concern was that there is already a close method, which could be confused with a stop one (from my understanding, close is more of a cleanup given it is called after main returns; of course renaming close is a no-go because it would break the API).

So maybe we should name it stop_main to disambiguate?

Contributor

No, I think stop is perfect :-P

Contributor Author

so be it ^^

src/fuse_api.pxi Outdated
def stop_fuse_from_another_thread():
    '''Stop the FUSE main loop from another thread.
    '''
    res = pthread_mutex_lock(&exc_info_mutex)
Contributor

You're not doing anything with exc_info (which is what this mutex protects), so no need to acquire it.

Contributor Author

done.

@@ -0,0 +1,115 @@
#!/usr/bin/env python3
Contributor

There is a lot of duplicated code here. I think it should be sufficient to add a single new method to test_fs.py.

Contributor Author

I agree.
However, most of the duplicated code is due to the fact that the vanilla version assumes the fuse fs is started in a separate process, whereas here we use a thread (wait_for_mount takes a process as argument, test_fs.Fs takes a cross_process).
Do you think there is a way to mock those attributes or to make them optional?

Contributor

There is a good reason why the test is using a separate process: the risk of deadlocks is way too high. That applies to this test as well; just call stop() in the other process.
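One practical consequence of keeping the file system in its own process is that the test can put a bound on how long it waits for shutdown, so a stuck file system presumably shows up as a failure rather than a frozen test run. A minimal sketch of that idea follows. The helper name `wait_for_termination` is hypothetical and is not the project's `util.py` helper; threads stand in for the mount process here only to keep the example self-contained, since `multiprocessing.Process` offers the same `join(timeout)` and `is_alive()` methods.

```python
import threading
import time


def wait_for_termination(proc, timeout=10.0):
    """Return True if proc finishes within timeout seconds, else False.

    proc is any object with join(timeout) and is_alive(), which both
    multiprocessing.Process and threading.Thread provide.
    """
    proc.join(timeout)
    return not proc.is_alive()


# A worker that exits promptly is reported as terminated.
quick = threading.Thread(target=time.sleep, args=(0.05,), daemon=True)
quick.start()
quick_done = wait_for_termination(quick, timeout=5.0)

# A worker that outlives the timeout is reported as still running.
stuck = threading.Thread(target=time.sleep, args=(2.0,), daemon=True)
stuck.start()
stuck_done = wait_for_termination(stuck, timeout=0.1)
```

A test fixture could turn a `False` result into a test failure instead of blocking indefinitely.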

Contributor Author

I've reworked the test to use test_fs.py's testfs fixture. I had to modify it a bit so that the process starts a thread that monitors an event and calls stop. Given that multiprocessing.Event cannot be serialized by Manager.Namespace, I had to create a need_stop attribute alongside cross_process.

Contributor

@Nikratio Nikratio left a comment

Also, please take a look at the unit tests. They're currently failing, but it shouldn't be hard to see why :-)

thread.start()
wait_for_mount(mountpoint)

llfuse.schedule_close()
Contributor

This method doesn't exist.

Contributor Author

fixed

src/fuse_api.pxi Outdated
def stop():
    '''Stop the FUSE main loop.

    This function is thread safe. Note once the `main` function has returned
Contributor

I think you forgot a "that" ("Note that ..."). Or just omit the preamble and say "Once the main function.."

Contributor Author

done

         mount_process = mp.Process(target=run_fs,
-                                   args=(mnt_dir, cross_process))
+                                   args=(mnt_dir, cross_process, need_stop))
Contributor

need_stop is already an attribute of cross_process, why do you pass it in separately as well?

Contributor Author

cross_process.need_stop = need_stop doesn't work because the Event type cannot be serialized by a Namespace. I forgot to remove this buggy line ;-)

Contributor

Yeah, but why does it even need to be a mgr.Event()? Just make it a boolean :-).

Contributor Author

A boolean wouldn't be convenient given _stop_watcher has to wait for it to know when it should stop the fuse server.
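To illustrate the difference, here is a minimal sketch of a watcher thread that blocks on an event and then triggers the stop. It assumes a plain `threading.Event` in place of the `mgr.Event()` used in the PR (the manager's event proxy offers the same `wait()` and `set()` methods), and `fake_stop` is a hypothetical stand-in for `llfuse.stop()` so the example runs without a mounted file system.

```python
import threading


def fake_stop(calls):
    """Hypothetical stand-in for llfuse.stop(); records that it was called."""
    calls.append("stop")


def start_stop_watcher(need_stop, stop_func):
    """Start a thread that blocks until need_stop is set, then calls stop_func."""
    def _stop_watcher():
        need_stop.wait()   # sleeps without polling until set() is called
        stop_func()

    thread = threading.Thread(target=_stop_watcher, daemon=True)
    thread.start()
    return thread


calls = []
need_stop = threading.Event()
watcher = start_stop_watcher(need_stop, lambda: fake_stop(calls))

assert calls == []         # nothing happens until the event is set
need_stop.set()            # what the test does to request shutdown
watcher.join(timeout=5)
assert calls == ["stop"]
```

With a plain boolean, the watcher would instead have to poll the flag in a loop with some sleep interval, which is the inconvenience mentioned above.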

os.stat(path)
assert not fs_state.lookup_called

need_stop.set()
Contributor

It seems to me that you really only need the first and last line of this function. What is the purpose of the others?

Contributor Author

I agree, the fs accesses are not strictly needed but make sure the fs is up and running (i.e. not in a "process-just-spawned" state). I can remove this if you think it's useless.

Contributor

Makes sense, but let's just use a single os.stat call and add a "Ensure FS is ready" comment.

Contributor Author

done

src/fuse_api.pxi Outdated
def stop():
    '''Stop the FUSE main loop.

    This function is thread safe. Note once the `main` function has returned
Contributor

Either "Note that once the main..." or just "Once the main function.."

Contributor Author

(re)done ^^

        need_stop.wait()
        llfuse.stop()

    threading.Thread(target=_stop_watcher).start()
Contributor

I think you can replace the separate thread by checking the state of cross_process.need_stop in the getattr handler. The threads are fundamentally the same.

Contributor Author

I would say the point of test_call_stop_from_another_thread is precisely to trigger the end of the fuse server from another thread; if we check cross_process.need_stop from a thread handled by fuse, the use case is different.

Contributor

@Nikratio Nikratio Jul 20, 2018

What is the difference between a "thread handled by fuse" and "another thread"? It seems to me that the only difference is where in the code the thread is created, and what the entry function is. But why does that make any difference when calling stop()?

Contributor Author

What I mean by "thread handled by fuse" is a thread that spends most of its time idle within libfuse, waiting for a request coming from the fuse kernel driver.
The point of the stop function is to tell this thread to stop waiting and hand control back to Python.
From my tests, calling fuse_session_exit from a thread not handled by fuse is not enough to close the fuse server: a system call to the file system is needed to wake the fuse thread, which will then realize it should stop itself.

That's why I feel that calling stop from the fuse operation callback (so from the "fuse thread") and calling it from a Python thread not related to the fuse server are two different use cases.
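The behaviour described above can be mimicked with a toy model. This is only an analogy under stated assumptions, not libfuse's actual loop: the `Session` class is hypothetical, a `queue.Queue` stands in for the kernel channel, and the worker checks the exit flag only after it has handled a request.

```python
import queue
import threading


class Session:
    """Toy stand-in for a FUSE session with one worker blocked on requests."""

    def __init__(self):
        self.requests = queue.Queue()
        self.exited = False             # analogous to the session exit flag
        self.handled = []

    def loop(self):
        while True:
            req = self.requests.get()   # blocks, like a read from the kernel
            self.handled.append(req)
            if self.exited:             # flag is only seen after waking up
                break

    def stop(self):
        self.exited = True              # sets the flag but wakes nobody


session = Session()
worker = threading.Thread(target=session.loop, daemon=True)
worker.start()

session.stop()
worker.join(timeout=0.5)
still_blocked = worker.is_alive()       # flag is set, worker still waiting

session.requests.put("getattr")         # any request wakes the worker...
worker.join(timeout=5)
stopped_after_request = not worker.is_alive()   # ...which then sees the flag
```

In this model, setting the flag from an unrelated thread changes nothing until some request arrives, which matches the observation that an access to the mountpoint was needed before the server actually stopped.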

Contributor

Sorry for the delay! I think I understand now. What you want is that there are threads blocked on reading from the kernel - no matter who has started them. That is a valid concern. I think in practice it is already satisfied: unless you pass workers=1 to llfuse.main, python-llfuse starts 30 workers by default, so even if one thread is busy handling the getattr() request, there are 29 others that are probably waiting. In theory some other process could be issuing requests too, so that all 29 threads are busy - but in this situation having the getattr thread return before calling stop would not help either.

But maybe I am missing something, so let me make sure I understand you correctly. Are you saying that if you (1) remove the listdir call in stop and otherwise leave the changes as-is, the test fails (correctly indicating that the listdir is necessary), but if you (2) remove the listdir call and call stop in the thread handling the getattr request, the test succeeds (even though it should not)?

Contributor Author

Now I'm the one sorry for the delay ;-)

As you suggested, I've tried to remove the listdir in the stop function:

$ py.test test/ -s
============================================================================== test session starts ==============================================================================
platform linux -- Python 3.5.2, pytest-3.6.3, py-1.5.4, pluggy-0.6.0 -- /home/emmanuel/projects/python-llfuse/venv/bin/python3
cachedir: test/.pytest_cache
rootdir: /home/emmanuel/projects/python-llfuse/test, inifile: pytest.ini
collected 15 items                                                                                                                                                              

test/test_api.py::test_inquire_bits PASSED
test/test_api.py::test_listdir PASSED
test/test_api.py::test_sup_groups PASSED
test/test_api.py::test_entry_res PASSED
test/test_api.py::test_xattr PASSED
test/test_api.py::test_copy PASSED
test/test_examples.py::test_lltest PASSED
test/test_examples.py::test_tmpfs PASSED
test/test_examples.py::test_passthroughfs PASSED
test/test_fs.py::test_invalidate_entry PASSED
test/test_fs.py::test_invalidate_inode PASSED
test/test_fs.py::test_notify_store PASSED
test/test_fs.py::test_entry_timeout PASSED
test/test_fs.py::test_attr_timeout PASSED
test/test_fs.py::test_call_stop_from_another_thread PASSED
test/test_fs.py::test_call_stop_from_another_thread ERROR

==================================================================================== ERRORS =====================================================================================
____________________________________________________________ ERROR at teardown of test_call_stop_from_another_thread ____________________________________________________________
Traceback (most recent call last):
  File "/home/emmanuel/projects/python-llfuse/test/test_fs.py", line 64, in testfs
    wait_for_mount_process_termination(mount_process)
  File "/home/emmanuel/projects/python-llfuse/test/util.py", line 134, in wait_for_mount_process_termination
    pytest.fail('mount process did not terminate')
  File "/home/emmanuel/projects/python-llfuse/venv/lib/python3.5/site-packages/_pytest/outcomes.py", line 97, in fail
    raise Failed(msg=msg, pytrace=pytrace)
Failed: mount process did not terminate
====================================================================== 15 passed, 1 error in 35.18 seconds ======================================================================

^CError in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 29, in poll
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt

On the other hand, trying to call llfuse.stop() from the thread handling the getattr, I got a pretty weird error:

(venv) ~/projects/python-llfuse close-from-another-thread ±2*1615 ♦ py.test test/test_fs.py::test_call_stop_from_same_thread -s
============================================================================== test session starts ==============================================================================
platform linux -- Python 3.5.2, pytest-3.6.3, py-1.5.4, pluggy-0.6.0 -- /home/emmanuel/projects/python-llfuse/venv/bin/python3
cachedir: test/.pytest_cache
rootdir: /home/emmanuel/projects/python-llfuse/test, inifile: pytest.ini
collected 1 item                                                                                                                                                                

test/test_fs.py::test_call_stop_from_same_thread in ==>  Namespace(attr_timeout=2, entry_timeout=2, getattr_called=True, lookup_called=False, need_stop=False, read_called=False)
PASSEDfusermount: entry for /tmp/pytest-of-emmanuel/pytest-174/test_call_stop_from_same_threa0 not found in /etc/mtab

test/test_fs.py::test_call_stop_from_same_thread ERROR

==================================================================================== ERRORS =====================================================================================
_____________________________________________________________ ERROR at teardown of test_call_stop_from_same_thread ______________________________________________________________
Traceback (most recent call last):
  File "/home/emmanuel/projects/python-llfuse/test/test_fs.py", line 66, in testfs
    umount(mount_process, mnt_dir)
  File "/home/emmanuel/projects/python-llfuse/test/util.py", line 117, in umount
    subprocess.check_call(['fusermount', '-z', '-u', mnt_dir])
  File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['fusermount', '-z', '-u', '/tmp/pytest-of-emmanuel/pytest-174/test_call_stop_from_same_threa0']' returned non-zero exit status 1
======================================================================= 1 passed, 1 error in 0.21 seconds =======================================================================

Note that test_call_stop_from_same_thread is called only once, even though pytest considers it has one PASSED and one ERROR...

patch.txt

Contributor

That's not a weird error. It means the test passes, but then the cleanup (which calls fusermount) fails. This is probably because the umount() races with the stop(). This is because without listdir() the fs threads remain blocked until the cleanup method accesses the mountpoint again. You should be able to confirm this by adding a long sleep in the cleanup method - the test should then block as it did in the first case.

In other words, as far as I can see this confirms that you can safely omit the separate thread and call stop() in getattr(). This will test all aspects of the functionality.

Contributor Author

I've tried to remove the thread part, but doing that makes the test hang forever (and it is pretty resistant to ^C and even kill -9; I have dead processes and zombies all over due to the busy mountpoint target...)

At first I thought the trouble was, as you mentioned before, the fact that llfuse is started with a single worker, but specifying multiple workers doesn't change anything.

(not sure why, but GitHub doesn't want me to add the patch as an attached file, so I'm dumping it here)

diff --git a/test/test_fs.py b/test/test_fs.py
index 78e5812..321838c 100755
--- a/test/test_fs.py
+++ b/test/test_fs.py
@@ -18,8 +18,6 @@ if __name__ == '__main__':
 
 import llfuse
 from llfuse import FUSEError
-from threading import Thread
-from unittest.mock import Mock
 import multiprocessing
 import os
 import errno
@@ -48,25 +46,24 @@ def testfs(tmpdir):
     mnt_dir = str(tmpdir)
     with mp.Manager() as mgr:
         cross_process = mgr.Namespace()
-        need_stop = mgr.Event()
         mount_process = mp.Process(target=run_fs,
-                                   args=(mnt_dir, cross_process, need_stop))
+                                   args=(mnt_dir, cross_process))
 
         mount_process.start()
         try:
             wait_for_mount(mount_process, mnt_dir)
-            yield (mnt_dir, cross_process, need_stop)
+            yield (mnt_dir, cross_process)
         except:
             cleanup(mnt_dir)
             raise
         else:
-            if need_stop.is_set():
+            if cross_process.need_stop:
                 wait_for_mount_process_termination(mount_process)
             else:
                 umount(mount_process, mnt_dir)
 
 def test_invalidate_entry(testfs):
-    (mnt_dir, fs_state, _) = testfs
+    (mnt_dir, fs_state) = testfs
     path = os.path.join(mnt_dir, 'message')
     os.stat(path)
     assert fs_state.lookup_called
@@ -87,7 +84,7 @@ def test_invalidate_entry(testfs):
     assert wait_for(check)
 
 def test_invalidate_inode(testfs):
-    (mnt_dir, fs_state, _) = testfs
+    (mnt_dir, fs_state) = testfs
     with open(os.path.join(mnt_dir, 'message'), 'r') as fh:
         assert fh.read() == 'hello world\n'
         assert fs_state.read_called
@@ -110,7 +107,7 @@ def test_invalidate_inode(testfs):
         assert wait_for(check)
 
 def test_notify_store(testfs):
-    (mnt_dir, fs_state, _) = testfs
+    (mnt_dir, fs_state) = testfs
     with open(os.path.join(mnt_dir, 'message'), 'r') as fh:
         llfuse.setxattr(mnt_dir, 'command', b'store')
         fs_state.read_called = False
@@ -118,7 +115,7 @@ def test_notify_store(testfs):
         assert not fs_state.read_called
 
 def test_entry_timeout(testfs):
-    (mnt_dir, fs_state, _) = testfs
+    (mnt_dir, fs_state) = testfs
     fs_state.entry_timeout = 1
     path = os.path.join(mnt_dir, 'message')
 
@@ -134,7 +131,7 @@ def test_entry_timeout(testfs):
     assert fs_state.lookup_called
 
 def test_attr_timeout(testfs):
-    (mnt_dir, fs_state, _) = testfs
+    (mnt_dir, fs_state) = testfs
     fs_state.attr_timeout = 1
     with open(os.path.join(mnt_dir, 'message'), 'r') as fh:
         os.fstat(fh.fileno())
@@ -148,14 +145,15 @@ def test_attr_timeout(testfs):
         os.fstat(fh.fileno())
         assert fs_state.getattr_called
 
-def test_call_stop_from_another_thread(testfs):
-    (mnt_dir, fs_state, need_stop) = testfs
+def test_call_stop(testfs):
+    (mnt_dir, fs_state) = testfs
     path = os.path.join(mnt_dir, 'message')
 
-    # Ensure FS is ready
+    fs_state.need_stop = True
     os.stat(path)
 
-    need_stop.set()
+    with pytest.raises(OSError):
+        os.stat(path)
 
 class Fs(llfuse.Operations):
     def __init__(self, cross_process):
@@ -166,6 +164,7 @@ class Fs(llfuse.Operations):
         self.status = cross_process
         self.lookup_cnt = 0
         self.status.getattr_called = False
+        self.status.need_stop = False
         self.status.lookup_called = False
         self.status.read_called = False
         self.status.entry_timeout = 2
@@ -193,6 +192,10 @@ class Fs(llfuse.Operations):
         entry.attr_timeout = self.status.attr_timeout
 
         self.status.getattr_called = True
+
+        if self.status.need_stop:
+            llfuse.stop()
+
         return entry
 
     def forget(self, inode_list):
@@ -246,8 +249,7 @@ class Fs(llfuse.Operations):
         else:
             raise FUSEError(errno.EINVAL)
 
-def run_fs(mountpoint, cross_process, need_stop):
-
+def run_fs(mountpoint, cross_process):
     # Logging (note that we run in a new process, so we can't
     # rely on direct log capture and instead print to stdout)
     root_logger = logging.getLogger()
@@ -260,17 +262,11 @@ def run_fs(mountpoint, cross_process, need_stop):
     root_logger.addHandler(handler)
     root_logger.setLevel(logging.DEBUG)
 
-    def _stop_watcher():
-        need_stop.wait()
-        llfuse.stop()
-
-    threading.Thread(target=_stop_watcher).start()
-
     testfs = Fs(cross_process)
     fuse_options = set(llfuse.default_options)
     fuse_options.add('fsname=llfuse_testfs')
     llfuse.init(testfs, mountpoint, fuse_options)
     try:
-        llfuse.main(workers=1)
+        llfuse.main(workers=2)
     finally:
         llfuse.close()

Worse than that: I've retried the "call stop from getattr thread" test from my previous post (applying the patch I've submitted), and I ended up with the same kind of trouble (fuse totally blocked, preventing processes from being stopped even with SIGKILL)

(venv) ~/projects/python-llfuse close-from-another-thread ±2*1617 ♦ py.test test/test_fs.py::test_call_stop_from_same_thread -s
============================================================================== test session starts ==============================================================================
platform linux -- Python 3.5.2, pytest-3.6.3, py-1.5.4, pluggy-0.6.0 -- /home/emmanuel/projects/python-llfuse/venv/bin/python3
cachedir: test/.pytest_cache
rootdir: /home/emmanuel/projects/python-llfuse/test, inifile: pytest.ini
collected 1 item                                                                                                                                                                

test/test_fs.py::test_call_stop_from_same_thread should have stopped


^CException in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/emmanuel/projects/python-llfuse/test/test_fs.py", line 277, in _stop_watcher
    need_stop.wait()
  File "/usr/lib/python3.5/multiprocessing/managers.py", line 988, in wait
    return self._callmethod('wait', (timeout,))
  File "/usr/lib/python3.5/multiprocessing/managers.py", line 717, in _callmethod
    kind, result = conn.recv()
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

^C^C^C^C^Z^Z^Z

At this point I'm very puzzled :(
My intuition is still that doing a syscall on a fuse mountpoint from one of its own threads is not ideal (the recursive call is more of a special case; the real use case is to call stop from an arbitrary third-party thread).
On the other hand, it should be possible (or, if not, we should say so in the documentation), which is currently not the case.

@touilleMan
Contributor Author

@Nikratio just a friendly reminder I've updated the PR (but no rush on this really 😃 )

@Nikratio
Contributor

Nikratio commented Nov 3, 2018

Sorry for the delay! I'm somewhat confused about the current state. Is everything working now? Your last comment mentioned a lot of sudden problems.

@Nikratio
Contributor

Friendly ping..

@ThomasWaldmann
Collaborator

ping - this PR is getting rather old. can it be finished?

@ThomasWaldmann
Collaborator

ping

@ThomasWaldmann
Collaborator

ThomasWaldmann commented May 25, 2022

Seems like this one is stuck. Close?

@ThomasWaldmann
Collaborator

No progress since >4y, guess we must close this.
