Unify fd_readdir impl between *nixes #613
This commit unifies the implementation of `fd_readdir` between Linux and BSD hosts. In particular, it re-uses the `Dirent`, `Entry`, and `Dir` (among others) building blocks introduced recently when `fd_readdir` was being implemented on Windows.

Notable changes:
* on BSD, wraps `readdir` syscall in an `Iterator` of the mutex-locked `Dir` struct
* on BSD, removes `DirStream` struct from `OsFile`; `OsFile` now holds a mutex to `Dir`
* makes `Dir` iterators implementation specific (Linux has its own, and so does BSD)
3e1481e to fc23cde
| // descriptor, or to modify the state of the associated description other | ||
| // than by means of closedir(), readdir(), readdir_r(), or rewinddir(), | ||
| // the behaviour is undefined. | ||
| let fd = (*os_file).try_clone()?; |
If os_file.dir is already Some, then you're making a useless dup syscall with a throwaway result.
Could you elaborate? This leg is only executed if os_file.dir is not set, i.e., None, so I'm not sure what you mean here.
Oh, I focused on the get_or_insert too much :) It's fine then!
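For readers following this thread, here is a minimal sketch of the lazy initialization being discussed: the descriptor is duplicated only when no directory stream has been cached yet. The `OsFile` and `Dir` types below are simplified, hypothetical stand-ins and not the PR's actual definitions. The concern about `get_or_insert` is that its argument is evaluated eagerly, so the clone has to sit behind the `None` check.

```rust
use std::fs::File;
use std::io;
use std::sync::Mutex;

/// Hypothetical stand-in for the PR's `Dir`; it only owns the duplicated descriptor.
struct Dir {
    _fd: File,
}

/// Hypothetical stand-in for `OsFile`.
struct OsFile {
    file: File,
    dir: Option<Mutex<Dir>>,
}

/// Returns the cached directory stream, duplicating the descriptor on first use only.
fn dir_stream(os_file: &mut OsFile) -> io::Result<&Mutex<Dir>> {
    if os_file.dir.is_none() {
        // Only this branch pays for the dup-style clone of the descriptor.
        let fd = os_file.file.try_clone()?;
        os_file.dir = Some(Mutex::new(Dir { _fd: fd }));
    }
    Ok(os_file.dir.as_ref().expect("initialized above"))
}
```

Calling `dir_stream` a second time returns the same `Mutex<Dir>` without touching the descriptor again.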
| // control of the system, and if any attempt is made to close the file | ||
| // descriptor, or to modify the state of the associated description other | ||
| // than by means of closedir(), readdir(), readdir_r(), or rewinddir(), | ||
| // the behaviour is undefined. |
We seem to be mostly using the American English variant behavior, so I think it's best to choose one variant and stay consistent :)
As a British citizen I have to respectfully decline :-P
| // new items may not be returned to the caller. | ||
| if cookie == wasi::__WASI_DIRCOOKIE_START { | ||
| log::trace!(" | fd_readdir: doing rewinddir"); | ||
| dir.lock().unwrap().rewind(); |
Please lock the mutex only once.
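A sketch of what locking once could look like: take the guard at the top and reuse it for the rewind, the seek, and the reads. The `Dir` type here is a hypothetical in-memory cursor, and `DIRCOOKIE_START` is assumed to play the role of `wasi::__WASI_DIRCOOKIE_START`; neither is taken from the PR.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the PR's `Dir`: a cursor over pre-loaded names.
struct Dir {
    names: Vec<String>,
    pos: usize,
}

impl Dir {
    fn rewind(&mut self) {
        self.pos = 0;
    }

    fn seek(&mut self, cookie: usize) {
        self.pos = cookie.min(self.names.len());
    }

    fn next_name(&mut self) -> Option<String> {
        let name = self.names.get(self.pos).cloned();
        if name.is_some() {
            self.pos += 1;
        }
        name
    }
}

/// Assumed to play the role of `__WASI_DIRCOOKIE_START`.
const DIRCOOKIE_START: usize = 0;

fn read_from(dir: &Mutex<Dir>, cookie: usize) -> Vec<String> {
    // Take the lock once and keep the guard for the whole operation,
    // instead of calling `dir.lock()` separately for each step.
    let mut guard = dir.lock().unwrap();
    if cookie == DIRCOOKIE_START {
        guard.rewind();
    } else {
        guard.seek(cookie);
    }
    let mut out = Vec::new();
    while let Some(name) = guard.next_name() {
        out.push(name);
    }
    out
}
```

Holding a single guard also means no other thread can move the cursor between the rewind and the first read.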
| })) | ||
| } | ||
| struct DirIter<'a>(MutexGuard<'a, Dir>); |
Could we possibly have a common DirIter implementation for all *nixes?
On Linux, nothing will prevent the caller from executing concurrent fd_readdir calls on the same directory stream, so I don't see why we should prevent it on BSD. In particular, readdir(3) on Linux says: (emphasis added)
In cases where multiple threads must read from the same directory stream, using readdir() with external synchronization is still preferable to the use of the deprecated readdir_r(3) function
This means that code which doesn't do any synchronization itself when calling readdir on a single directory stream from multiple threads will be invalid anyway, so we can probably get away with not having mutexes here. Unfortunately, this would probably require adding some unsafe code, but we might want to do it for performance reasons.
In particular, when threads are introduced to WASI, the thread-safety of various functions will have to be evaluated.
I think I'd prefer to leave it behind a Mutex as-is for now. This change was not introduced in this PR, but only refactored out from the actual implementation of fd_readdir on BSD that we've had until now. I'm OK with some code duplication between Linux and BSD, especially since on BSD we still need to recover the current location using telldir, as it's not available to be read directly from the dirent entry.
Then we could possibly have a system-dependent implementation of Entry which would reuse the d_off value on Linux and call telldir on other *nixes, or just call telldir everywhere (it's just a glibc call, so while it's not free, it's cheap).
But I'm fine with addressing this in a subsequent PR.
Agreed to address this in a subsequent PR. :-)
| } | ||
| Ok(dir.into_iter().map(|entry| { | ||
| Ok(DirIter(dir).map(|entry| { |
into_iter seems more idiomatic to me, is there any particular reason why you switched to literal DirIter?
Since it's an internal structure, I wanted to keep DirIter private to the module. By implementing IntoIterator for Dir we would have to leak the struct outside of the module, which seems like a lot of noise for little to be gained.
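To illustrate the trade-off in this thread, here is a minimal sketch of a module-private iterator newtype that owns the `MutexGuard`, exposed only through `impl Iterator`. The `Dir` type and the `entries` function are hypothetical simplifications, not the PR's code. An `IntoIterator` impl would instead have to name `DirIter` in its associated `IntoIter` type.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical stand-in for the PR's `Dir`.
struct Dir {
    entries: Vec<String>,
    pos: usize,
}

impl Dir {
    fn read_next(&mut self) -> Option<String> {
        let entry = self.entries.get(self.pos).cloned();
        if entry.is_some() {
            self.pos += 1;
        }
        entry
    }
}

/// Private newtype: it owns the guard, so the lock is held for as long
/// as the iterator is alive and released when the iterator is dropped.
struct DirIter<'a>(MutexGuard<'a, Dir>);

impl<'a> Iterator for DirIter<'a> {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        self.0.read_next()
    }
}

/// Callers only see `impl Iterator`, so `DirIter` never appears in a signature.
fn entries<'a>(dir: &'a Mutex<Dir>) -> impl Iterator<Item = String> + 'a {
    DirIter(dir.lock().unwrap())
}
```

In this sketch, dropping the iterator is what releases the lock, so a caller that stores the iterator keeps the directory stream locked for that long.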
| if errno != Errno::last() { | ||
| // According to POSIX man (for Linux though!), there was an error | ||
| // if the errno value has changed at some point during the sequence | ||
| // of readdir calls |
Linux man says something different:
To distinguish end of stream and from an error, set errno to zero before calling readdir() and then check the value of errno if NULL is returned.
POSIX says the same:
Applications wishing to check for error situations should set errno to 0 before calling readdir(). If errno is set to non-zero on return, an error occurred.
Interesting, since this reference says something different:
If the end of the directory stream is reached, NULL is returned and errno is not changed. If an error occurs, NULL is returned and errno is set appropriately.
Also, please note that this bit of code was introduced in 4982878 as a fix to a subtle bug that only emerged when testing the crate in release.
The version I quoted is from
This page is part of release 5.03 of the Linux man-pages project.
and
This is POSIX.1-2008 with the 2013 Technical Corrigendum 1 applied.
respectively.
Hmm, interesting. Perhaps this is a BSD-only thing (we won't know for sure until we rewrite fd_readdir on Linux using readdir rather than readdir_r). Anyhow, as a compromise, I can remove the mention of Linux in the comment.
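The convention quoted from the man page and POSIX in this thread (zero `errno` before the call, then inspect it only if NULL comes back) can be sketched as a small decision function. This is a pure-logic illustration under that assumption: `ReadOutcome`, `classify`, and the `EBADF` constant are names introduced here, and no real `readdir` call is made.

```rust
/// Outcome of one `readdir`-style call, as seen by the caller.
#[derive(Debug, PartialEq)]
enum ReadOutcome {
    Entry,
    EndOfStream,
    Error(i32),
}

/// Illustrative errno value; 9 is EBADF on Linux and the BSDs.
const EBADF: i32 = 9;

/// Classifies a call made under the "set errno to 0 first" convention.
///
/// `returned_null` says whether the call returned NULL, and `errno_after`
/// is errno read right after the call, having been zeroed right before it.
fn classify(returned_null: bool, errno_after: i32) -> ReadOutcome {
    if !returned_null {
        // A non-NULL result is an entry; errno is not meaningful here.
        ReadOutcome::Entry
    } else if errno_after == 0 {
        // NULL with errno still zero means the stream is exhausted.
        ReadOutcome::EndOfStream
    } else {
        ReadOutcome::Error(errno_after)
    }
}
```

Compared with checking whether `errno` changed relative to an earlier saved value, this form does not depend on what `errno` happened to hold before the call.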
| pub(crate) struct OsFile { | ||
| pub(crate) file: fs::File, | ||
| pub(crate) dir_stream: Option<Mutex<DirStream>>, | ||
| pub(crate) dir: Option<Mutex<Dir>>, |
Please add a comment explaining why we're storing it in OsFile (or mention where the reason is described in more detail)
Can you add a reference that BSD nixes require the client to do the caching?
Sure thing. It's actually taken directly from the man pages. Let me add that in.
I'm gonna go ahead and merge this so that I don't unnecessarily block other avenues including #520. Feel free to reopen or submit an issue if you feel something is amiss though!
DO NOT MERGE until wasm-tools release with bytecodealliance/wasm-tools#2453

Points wasm-tools to PR branch `wasmparser-implements`

Add support for the component model `[implements=<I>]L` (spec PR [bytecodealliance#613](WebAssembly/component-model#613)), which allows components to import/export the same interface multiple times under different plain names.

A component can import the same interface twice under different labels, each bound to a distinct host implementation:

```wit
import primary: wasi:keyvalue/store;
import secondary: wasi:keyvalue/store;
```

Guest code sees two separate namespaces with identical shapes:

```rust
let val = primary::get("my-key");   // calls the primary store
let val = secondary::get("my-key"); // calls the secondary store
```

From the host, wit-bindgen generates a separate Host trait per label:

```rust
impl primary::Host for MyState {
    fn get(&mut self, key: String) -> String {
        self.primary_db.get(&key).cloned().unwrap_or_default()
    }
}

impl secondary::Host for MyState {
    fn get(&mut self, key: String) -> String {
        self.secondary_db.get(&key).cloned().unwrap_or_default()
    }
}

primary::add_to_linker(&mut linker, |state| state)?;
secondary::add_to_linker(&mut linker, |state| state)?;
```

The linker also supports registering by plain label without knowing the annotation:

```rust
// Component imports [implements=<wasi:keyvalue/store>]primary
// but the host just registers "primary" — label fallback handles it
linker.root().instance("primary")?.func_wrap("get", /* ... */)?;
```

Users can also register to the linker with the full encoded `implements` name:

```rust
let mut linker = Linker::<()>::new(engine);
linker
    .root()
    .instance("[implements=<wasi:keyvalue/store>]primary")?
    .func_wrap("get", |_, (key,): (String,)| Ok((String::new(),)))?;
```

Semver matching works inside the implements annotation, just like regular interface imports:

```rust
// Host provides v1.0.1
linker
    .root()
    .instance("[implements=<wasi:keyvalue/store@1.0.1>]primary")?
    .func_wrap("get", |_, (key,): (String,)| Ok((String::new(),)))?;

// Component requests v1.0.0, matches via semver
let component = Component::new(&engine, r#"(component
    (type $store (instance
        (export "get" (func (param "key" string) (result string)))
    ))
    (import "[implements=<wasi:keyvalue/store@1.0.0>]primary" (instance (type $store)))
)"#)?;
linker.instantiate(&mut store, &component)?; // works, 1.0.1 is semver-compatible with 1.0.0
```

## Changes

### Runtime name resolution

- Add three-tier lookup in NameMap::get: exact → semver → label fallback
- Add implements_label_key() helper for extracting plain labels from `[implements=<I>]L`
- Add unit tests for all lookup tiers

### Code generation for multi-import/export

- Track first-seen implements imports/exports per `InterfaceId`
- Duplicate imports: re-export types via `pub use super::{first}::*`, generate fresh Host trait + add_to_linker
- Duplicate exports: same pattern with fresh Guest/GuestIndices, plus regenerate resource wrapper structs to reference the local Guest type
- Use `name_world_key_with_item` for export instance name lookups
- Guard `populate_world_and_interface_options` with `entry()` to avoid overwriting link options for duplicate interfaces
DO NOT MERGE until wasm-tools release with bytecodealliance/wasm-tools#2453

Points wasm-tools to PR branch `wasmparser-implements`

Add support for the component model `[implements=<I>]L` (spec PR [bytecodealliance#613](WebAssembly/component-model#613)), which allows components to import/export the same interface multiple times under different plain names.

A component can import the same interface twice under different labels, each bound to a distinct host implementation:

```wit
import primary: wasi:keyvalue/store;
import secondary: wasi:keyvalue/store;
```

Guest code sees two separate namespaces with identical shapes:

```rust
let val = primary::get("my-key");   // calls the primary store
let val = secondary::get("my-key"); // calls the secondary store
```

Host Import-side codegen: shared trait + label-parameterized add_to_linker

For imports, wit-bindgen generates one Host trait per interface (not per label). The add_to_linker function takes a name: &str parameter so the same trait implementation can be registered under different instance labels. Duplicate implements imports don't generate separate modules; only the first import produces bindings.

```rust
struct PrimaryBackend;
impl primary::Host for PrimaryBackend {
    fn get(&mut self, key: String) -> String {
        self.primary_db.get(&key).cloned().unwrap_or_default()
    }
}

struct SecondaryBackend;
impl primary::Host for SecondaryBackend {
    fn get(&mut self, key: String) -> String {
        self.secondary_db.get(&key).cloned().unwrap_or_default()
    }
}

// Same add_to_linker, different labels and host_getter closures
primary::add_to_linker(&mut linker, "primary", |s| &mut s.primary)?;
primary::add_to_linker(&mut linker, "secondary", |s| &mut s.secondary)?;
```

Export-side codegen: per-label modules with shared types

For exports, each label gets its own module with fresh Guest/GuestIndices types but re-exports shared interface types from the first module via `pub use super::{first}::*`.

Runtime name resolution

The linker supports registering by plain label without knowing the annotation:

```rust
// Component imports [implements=<wasi:keyvalue/store>]primary
// but the host just registers "primary" — label fallback handles it
linker.root().instance("primary")?.func_wrap("get", /* ... */)?;
```

Users can also register to the linker with the full encoded implements name:

```rust
linker
    .root()
    .instance("[implements=<wasi:keyvalue/store>]primary")?
    .func_wrap("get", |_, (key,): (String,)| Ok((String::new(),)))?;
```

Semver matching works inside the implements annotation, just like regular interface imports:

```rust
// Host provides v1.0.1
linker
    .root()
    .instance("[implements=<wasi:keyvalue/store@1.0.1>]primary")?
    .func_wrap("get", |_, (key,): (String,)| Ok((String::new(),)))?;

// Component requests v1.0.0, matches via semver
let component = Component::new(&engine, r#"(component
    (type $store (instance
        (export "get" (func (param "key" string) (result string)))
    ))
    (import "[implements=<wasi:keyvalue/store@1.0.0>]primary" (instance (type $store)))
)"#)?;
linker.instantiate(&mut store, &component)?; // works, 1.0.1 is semver-compatible with 1.0.0
```

- Add three-tier lookup in NameMap::get: exact → semver → label fallback
- Add implements_label_key() helper for extracting plain labels from `[implements=<I>]L`
- Add unit tests for all lookup tiers
- Track first-seen implements imports per `InterfaceId`
- One `Host` trait per interface; `generate_add_to_linker` takes `named: bool`; when true, emits `name: &str` parameter instead of hardcoding the instance name
- Duplicate `implements` imports: just record the label in `implements_labels`, no module generation
- `world_add_to_linker`: iterate over `implements_labels` to emit one `add_to_linker` call per label, passing label as name argument
- Guard `populate_world_and_interface_options` with `entry()` to avoid overwriting link options for duplicate interfaces
- Duplicate exports: re-export types via `pub use super::{first}::*`, generate fresh `Guest`/`GuestIndices`, plus regenerate resource wrapper structs to reference the local `Guest` type
- Use `name_world_key_with_item` for export instance name lookups
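As a rough illustration of the label fallback mentioned in the commit message above, the sketch below extracts the plain label `L` from `[implements=<I>]L` and tries an exact match before falling back to the label. Both `implements_label` and `lookup` are hypothetical names written for this note, the map is a plain `HashMap`, and the semver tier is deliberately left out; none of this is the actual `NameMap` code.

```rust
use std::collections::HashMap;

/// Hypothetical helper: extracts the plain label `L` from `[implements=<I>]L`.
/// Returns `None` when the name does not carry the annotation.
fn implements_label(name: &str) -> Option<&str> {
    let rest = name.strip_prefix("[implements=<")?;
    let (_interface, label) = rest.split_once(">]")?;
    if label.is_empty() {
        None
    } else {
        Some(label)
    }
}

/// Simplified two-tier lookup: exact match first, then plain-label fallback.
/// The semver tier described in the commit message is omitted in this sketch.
fn lookup<'a, V>(map: &'a HashMap<String, V>, import: &str) -> Option<&'a V> {
    if let Some(value) = map.get(import) {
        return Some(value);
    }
    let label = implements_label(import)?;
    map.get(label)
}
```

With this ordering, a host that registered the full encoded name always wins over one that registered only the plain label.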
cc @marmistrz
Note: this PR depends on #612 as a by-product of me accidentally disabling the WASI tests in #600 (sorry!). This has now been sorted out.