Description
Summary
I think we have reached the stage where we should support threading in RustPython.
Detailed Explanation
There has already been some discussion on threading, but I wanted to start a dedicated issue because it is a big decision. I will try to break the issue into parts to give it some structure:
Definition of done
We need a user to be able to create a new thread in Python and execute code on the same Python object from multiple threads:
import threading

def worker(num):
    """Thread worker function."""
    print('Worker: %s' % num)

threads = []
for i in range(5):
    # Start each worker on its own thread and keep a handle to it.
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()
GIL
I think one of the biggest points of discussion is the GIL. Should we add a GIL to RustPython? Should we allow only one thread to use each object at a time?
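To make the first option concrete, here is a minimal sketch of what a GIL amounts to: a single process-wide mutex that a thread must hold while touching interpreter state. The names GIL and count_with_gil are hypothetical and only serve the illustration; nothing here is RustPython code. The counter is deliberately updated with a separate load and store, which is only correct because the lock is held.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a global interpreter lock: one process-wide mutex.
static GIL: Mutex<()> = Mutex::new(());

/// Runs `threads` workers that each perform `iters` read-modify-write
/// updates on a shared counter while holding the GIL.
fn count_with_gil(threads: usize, iters: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    // Only the thread holding the GIL may touch shared state.
                    let _guard = GIL.lock().unwrap();
                    // Separate load and store: no update is lost only
                    // because the GIL serializes this whole section.
                    let value = counter.load(Ordering::Relaxed);
                    counter.store(value + 1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

Calling count_with_gil(4, 1000) returns 4000 because every update runs under the lock. The cost is that only one thread makes progress at a time, which is the trade-off this section asks about.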
Suggested approach
I suggest the following changes so that we can spawn a new thread:
- Create a new VirtualMachine for each thread, thus making VirtualMachine !Sync and !Send.
- Use Arc instead of Rc in PyObjectRef:
  pub type PyObjectRef = Arc<PyObject<dyn PyObjectPayload>>;
- The PyValue and PyObjectPayload traits will require Sync and Send, thus forcing us to make all Py* structs Sync as well.
- We will need to convert most uses of Rc, Cell and RefCell in the code to thread-safe alternatives, for example Arc, Mutex and Atomic*.
- When accessing the data of an object wrapped in a Mutex, we will lock the mutex for the duration of our use. This will require careful handling of internal objects to avoid deadlocks when they are shared between threads.
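The conversion described in the list above can be sketched on a toy type. ToyList, ToyListRef, and append_from_threads are hypothetical names standing in for a Py* struct and PyObjectRef; the sketch assumes only the standard library and is not taken from the RustPython code base.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical payload standing in for a Py* struct. A single-threaded
/// version would use `RefCell<Vec<i64>>` and `Cell<usize>` instead.
struct ToyList {
    elements: Mutex<Vec<i64>>,
    appends: AtomicUsize,
}

/// Hypothetical analogue of `PyObjectRef`: `Arc` instead of `Rc`.
type ToyListRef = Arc<ToyList>;

impl ToyList {
    fn new() -> ToyListRef {
        Arc::new(ToyList {
            elements: Mutex::new(Vec::new()),
            appends: AtomicUsize::new(0),
        })
    }

    fn append(&self, value: i64) {
        // The lock is held only for the duration of the push.
        self.elements.lock().unwrap().push(value);
        self.appends.fetch_add(1, Ordering::Relaxed);
    }

    fn len(&self) -> usize {
        self.elements.lock().unwrap().len()
    }
}

/// Compile-time check that a type can be sent to and shared between threads.
fn assert_send_sync<T: Send + Sync>() {}

/// Appends to one shared list from several threads and returns it.
fn append_from_threads(threads: i64, per_thread: i64) -> ToyListRef {
    assert_send_sync::<ToyList>();
    let list = ToyList::new();
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let list = Arc::clone(&list);
            thread::spawn(move || {
                for i in 0..per_thread {
                    list.append(t * per_thread + i);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    list
}
```

With the Rc/RefCell/Cell version, the assert_send_sync call would fail to compile, which is how the compiler would guide the migration. Note that the element order in the list depends on thread scheduling; only the total count is deterministic.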
A simple example of the start_new_thread method in _thread will look something like:
use std::thread;

fn start_new_thread(func: PyFunctionRef, args: PyFuncArgs, vm: &VirtualMachine) -> u64 {
    let handle = thread::spawn(move || {
        // Each thread builds its own VM. It should get some params from the original VM.
        let thread_vm = VirtualMachine::new(PySettings::default());
        thread_vm.invoke(func.as_object(), args).unwrap();
    });
    // get_id is not shown here; it turns the thread handle into a numeric id.
    get_id(handle.thread())
}
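For readers who want something that compiles on its own, the same shape can be sketched with a toy VM. ToyVm is a hypothetical stand-in for VirtualMachine, the callable is a plain Rust closure rather than a Python function, and the thread id is returned as a std ThreadId rather than the u64 used above. The PhantomData<Rc<()>> marker is one way to make a type !Send and !Sync on stable Rust.

```rust
use std::marker::PhantomData;
use std::rc::Rc;
use std::sync::mpsc;
use std::thread::{self, ThreadId};

/// Hypothetical stand-in for `VirtualMachine`. The `PhantomData<Rc<()>>`
/// marker makes it `!Send` and `!Sync`, so it cannot leave its thread.
struct ToyVm {
    _not_send: PhantomData<Rc<()>>,
}

impl ToyVm {
    fn new() -> Self {
        ToyVm { _not_send: PhantomData }
    }

    /// Calls `func` with `arg`; a real VM would run Python code here.
    fn invoke<F: FnOnce(i64) -> i64>(&self, func: F, arg: i64) -> i64 {
        func(arg)
    }
}

/// Spawns a thread, builds a fresh VM inside it, and runs `func` there.
/// Returns the new thread's id and a receiver for the result.
fn start_new_thread<F>(func: F, arg: i64) -> (ThreadId, mpsc::Receiver<i64>)
where
    F: FnOnce(i64) -> i64 + Send + 'static,
{
    let (sender, receiver) = mpsc::channel();
    let handle = thread::spawn(move || {
        // The VM is created on the new thread, so it never crosses threads.
        let thread_vm = ToyVm::new();
        let result = thread_vm.invoke(func, arg);
        // Ignore the error if the caller already dropped the receiver.
        let _ = sender.send(result);
    });
    (handle.thread().id(), receiver)
}
```

The Send bound on func is the important part: it is the compiler-enforced version of the requirement that everything handed to the new thread (here the closure, in RustPython the function object and its arguments) must be safe to move across threads.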
Drawbacks, Rationale, and Alternatives
- I suggest that we ignore the GIL and allow multiple threads to execute Python code simultaneously. This could be a big advantage of RustPython over CPython. Users will be expected to make their own code thread safe. This might prove problematic for code that relies on the GIL.
- What about third-party libraries? We do not yet have an API, but would we force them to implement Sync and Send as well?
- There was a lot of talk about alternatives and I cannot find all of it, so please feel free to add more in the comments. One alternative that was suggested is Crossbeam.
Unresolved Questions
- How can we make this change in small steps?
- How do we test for deadlocks?
- Is this the right time to do this?
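On the deadlock question, one possible starting point is a watchdog-style test helper: run a scenario on a helper thread and treat a timeout as a suspected deadlock. This is only a sketch using the standard library. run_with_watchdog, Outcome, and lock_pair are hypothetical names, a timeout suggests but does not prove a deadlock, and a stuck thread is leaked rather than cleaned up.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::sync::{Arc, Barrier, Mutex};
use std::thread;
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Outcome {
    Finished,
    TimedOut,
    Panicked,
}

/// Runs `scenario` on a helper thread and waits at most `limit` for it.
fn run_with_watchdog<F>(scenario: F, limit: Duration) -> Outcome
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        scenario();
        let _ = done_tx.send(());
    });
    match done_rx.recv_timeout(limit) {
        Ok(()) => Outcome::Finished,
        Err(RecvTimeoutError::Timeout) => Outcome::TimedOut,
        // The sender was dropped without sending: the scenario panicked.
        Err(RecvTimeoutError::Disconnected) => Outcome::Panicked,
    }
}

/// Two threads each take two locks. With `opposite_order` set, the second
/// thread takes them in reverse order and a barrier forces the deadlock.
fn lock_pair(opposite_order: bool) {
    let a = Arc::new(Mutex::new(()));
    let b = Arc::new(Mutex::new(()));
    let barrier = Arc::new(Barrier::new(2));

    let worker = |first: Arc<Mutex<()>>, second: Arc<Mutex<()>>, barrier: Arc<Barrier>| {
        thread::spawn(move || {
            let _first = first.lock().unwrap();
            if opposite_order {
                // Wait until both threads hold their first lock.
                barrier.wait();
            }
            let _second = second.lock().unwrap();
        })
    };

    let t1 = worker(a.clone(), b.clone(), barrier.clone());
    let t2 = if opposite_order {
        worker(b, a, barrier)
    } else {
        worker(a, b, barrier)
    };
    t1.join().unwrap();
    t2.join().unwrap();
}
```

Running lock_pair(false) under the watchdog finishes, while lock_pair(true) times out. The example also shows the usual mitigation: if every thread acquires locks in the same order, this particular deadlock cannot occur.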