Unify fd_readdir impl between *nixes #613

Merged
kubkon merged 3 commits into bytecodealliance:master from kubkon:fd_readdir_bsd
Nov 24, 2019

@kubkon (Member) commented Nov 21, 2019

This commit unifies the implementation of fd_readdir between Linux
and BSD hosts. In particular, it re-uses the Dirent, Entry, and
Dir (among others) building blocks introduced recently when
fd_readdir was being implemented on Windows.

Notable changes:

  • on BSD, wraps readdir syscall in an Iterator of the mutex-locked
    Dir struct
  • on BSD, removes DirStream struct from OsFile; OsFile now holds a
    mutex to Dir
  • makes Dir iterators implementation specific (Linux has its own,
    and so does BSD)
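
As a rough illustration of the first two bullets, the following is a minimal sketch of an iterator that owns a `MutexGuard`, so the lock stays held while entries are read. The `Dir` here is a hypothetical in-memory stand-in (the real one wraps a directory stream), and `read_next` and `read_dir_names` are names invented for this example, not part of the PR.

```rust
use std::collections::VecDeque;
use std::sync::{Mutex, MutexGuard};

// Hypothetical stand-in for the real `Dir`, which wraps a directory stream.
struct Dir {
    entries: VecDeque<String>,
}

impl Dir {
    // Models a single `readdir` call: the next entry, or `None` at end of stream.
    fn read_next(&mut self) -> Option<String> {
        self.entries.pop_front()
    }
}

// The iterator owns the guard, so the lock is held for as long as the
// iterator is alive and is released when the iterator is dropped.
struct DirIter<'a>(MutexGuard<'a, Dir>);

impl<'a> Iterator for DirIter<'a> {
    type Item = String;

    fn next(&mut self) -> Option<Self::Item> {
        self.0.read_next()
    }
}

// Hypothetical helper: lock the directory once and drain it through `DirIter`.
fn read_dir_names(dir: &Mutex<Dir>) -> Vec<String> {
    let guard = dir.lock().unwrap();
    DirIter(guard).collect()
}
```

Because the guard lives inside the iterator, no other caller can interleave reads on the same stream until the iteration is finished.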

cc @marmistrz

Note: this PR depends on #612 as a by-product of me accidentally disabling the WASI tests in #600 (sorry!). This has now been sorted out.

@kubkon added the wasi:impl label (Issues pertaining to WASI implementation in Wasmtime) Nov 21, 2019

// descriptor, or to modify the state of the associated description other
// than by means of closedir(), readdir(), readdir_r(), or rewinddir(),
// the behaviour is undefined.
let fd = (*os_file).try_clone()?;
Contributor

If osfile.dir already is Some, then you're making a useless syscall for dup with a throwaway result.

Member Author

Could you elaborate? This leg is only executed if os_file.dir is not set, i.e., None, so I'm not sure what you mean here.

Contributor

Oh, I focused on the get_or_insert too much :) It's fine then!

Member Author

:-)

// control of the system, and if any attempt is made to close the file
// descriptor, or to modify the state of the associated description other
// than by means of closedir(), readdir(), readdir_r(), or rewinddir(),
// the behaviour is undefined.
Contributor

We seem to be mostly using the American English variant behavior, so I think it's best to choose one variant and stay consistent :)

Member Author

As a British citizen I have to respectfully decline :-P

// new items may not be returned to the caller.
if cookie == wasi::__WASI_DIRCOOKIE_START {
log::trace!(" | fd_readdir: doing rewinddir");
dir.lock().unwrap().rewind();
Contributor

Please lock the mutex only once.

Member Author

Done.
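
The request in this thread, taking the lock once and reusing the guard, might look roughly like the sketch below. `Dir`, `seek`, and `position_stream` are hypothetical stand-ins for illustration; only `rewind` and the start cookie appear in the snippet under review.

```rust
use std::sync::Mutex;

// Hypothetical stand-in for `Dir`; `pos` models the stream position.
struct Dir {
    pos: u64,
}

impl Dir {
    fn rewind(&mut self) {
        self.pos = 0;
    }

    // Assumed counterpart to `rewind` for non-start cookies.
    fn seek(&mut self, cookie: u64) {
        self.pos = cookie;
    }
}

// Stand-in for wasi::__WASI_DIRCOOKIE_START.
const DIRCOOKIE_START: u64 = 0;

fn position_stream(dir: &Mutex<Dir>, cookie: u64) -> u64 {
    // Acquire the guard once and reuse it for every operation below,
    // instead of calling `dir.lock()` separately for each step.
    let mut dir = dir.lock().unwrap();
    if cookie == DIRCOOKIE_START {
        dir.rewind();
    } else {
        dir.seek(cookie);
    }
    dir.pos
}
```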

}))
}

struct DirIter<'a>(MutexGuard<'a, Dir>);
Contributor

Could we possibly have a common DirIter implementation for all *nixes?

On Linux, nothing will prevent the caller from executing concurrent fd_readdir calls on the same directory stream, so I don't see why we should prevent it on BSD. In particular, readdir(3) on Linux says (emphasis added):

In cases where multiple threads must read from the same directory stream, using readdir() with external synchronization is still preferable to the use of the deprecated readdir_r(3) function

This means that code which doesn't do any synchronization itself when calling readdir on a single directory stream from multiple threads will be invalid anyway, so we can probably get away with not having mutexes here. Unfortunately, this would probably require adding some unsafe code, but we might want to do it for performance reasons.

In particular, when threads are introduced to WASI, thread-safety of various functions will have to be evaluated.

Member Author

I think I'd prefer to leave it behind a Mutex as-is for now. This change was not introduced in this PR, but only refactored out from the actual implementation of fd_readdir on BSD that we've had until now. I'm OK with some code duplication between Linux and BSD, especially since on BSD we still need to recover the current location using telldir, as it's not available to be read directly from the dirent entry.

Contributor

Then we could possibly have a system-dependent implementation of Entry which would reuse the d_off value on Linux and call telldir on other nixes, or just call telldir everywhere (it's just a glibc call, so while it's not free, it's cheap).

But I'm fine with addressing this in a subsequent PR.

Member Author

Agreed to address this in a subsequent PR. :-)

}

- Ok(dir.into_iter().map(|entry| {
+ Ok(DirIter(dir).map(|entry| {
Contributor

into_iter seems more idiomatic to me, is there any particular reason why you switched to literal DirIter?

Member Author

Since it's an internal structure, I wanted to keep DirIter private to the module. Implementing IntoIterator for Dir would force us to leak the struct outside of the module, which seems like a lot of noise for little gain.

Contributor

Fair enough!

if errno != Errno::last() {
// According to POSIX man (for Linux though!), there was an error
// if the errno value has changed at some point during the sequence
// of readdir calls
Contributor

Linux man says something different:

To distinguish end of stream and from an error, set errno to zero before calling readdir() and then check the value of errno if NULL is returned.

POSIX says the same:

Applications wishing to check for error situations should set errno to 0 before calling readdir(). If errno is set to non-zero on return, an error occurred.
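
The protocol described in both quotes (clear errno, call readdir, and inspect errno only when NULL comes back) can be modelled as in the sketch below. Everything here is a mock: `ERRNO` is a thread-local stand-in for the C `errno`, `MockStream` is a hypothetical fake directory stream, and the error code 5 is an arbitrary non-zero value. Real code would go through a libc binding instead.

```rust
use std::cell::Cell;

thread_local! {
    // Stand-in for the C `errno`; not the real thing.
    static ERRNO: Cell<i32> = Cell::new(0);
}

// Hypothetical mock of a directory stream. `readdir` yields entries and then
// `None`; on a simulated failure it returns `None` and sets ERRNO.
struct MockStream {
    entries: Vec<&'static str>,
    fail_at: Option<usize>,
    idx: usize,
}

impl MockStream {
    fn readdir(&mut self) -> Option<&'static str> {
        if self.fail_at == Some(self.idx) {
            ERRNO.with(|e| e.set(5)); // arbitrary non-zero error code
            return None;
        }
        let entry = self.entries.get(self.idx).copied();
        self.idx += 1;
        entry
    }
}

// Ok(Some(_)) is an entry, Ok(None) is end of stream, Err(code) is a failure.
fn next_entry(stream: &mut MockStream) -> Result<Option<&'static str>, i32> {
    // Step 1: reset errno before the call.
    ERRNO.with(|e| e.set(0));
    match stream.readdir() {
        Some(entry) => Ok(Some(entry)),
        // Step 2: on a null result, errno tells the two cases apart.
        None => match ERRNO.with(|e| e.get()) {
            0 => Ok(None),
            code => Err(code),
        },
    }
}
```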

Member Author

Interesting, since this reference says something different:

If the end of the directory stream is reached, NULL is returned and errno is not changed. If an error occurs, NULL is returned and errno is set appropriately.

Also, please note that this bit of code was introduced in 4982878 as a fix to a subtle bug that only emerged when testing the crate in release.

Contributor

The version I quoted is from

This page is part of release 5.03 of the Linux man-pages project.

and

This is POSIX.1-2008 with the 2013 Technical Corrigendum 1 applied.

respectively.

Member Author

Hmm, interesting. Perhaps this is a BSD-only thing (we won't know for sure until we rewrite fd_readdir on Linux using readdir rather than readdir_r). Anyhow, as a compromise, I can remove the mention of Linux in the comment.

pub(crate) struct OsFile {
pub(crate) file: fs::File,
- pub(crate) dir_stream: Option<Mutex<DirStream>>,
+ pub(crate) dir: Option<Mutex<Dir>>,
Contributor

Please add a comment explaining why we're storing it in OsFile (or mention where the reason is described in more detail).

Member Author

Done.

Contributor

Can you add a reference that BSD nixes require the client to do the caching?

Member Author

Sure thing. It's actually taken directly from the man pages. Let me add that in.

@marmistrz (Contributor) left a comment

LGTM!

@kubkon (Member, Author) commented Nov 24, 2019

I'm gonna go ahead and merge this so that I don't necessarily block other avenues including #520. Feel free to reopen or submit an issue if you feel something is amiss though!

@kubkon merged commit c45f709 into bytecodealliance:master Nov 24, 2019
@kubkon deleted the fd_readdir_bsd branch November 24, 2019 09:29
ricochet added a commit to ricochet/wasmtime that referenced this pull request Feb 28, 2026
DO NOT MERGE until wasm-tools release with
bytecodealliance/wasm-tools#2453
Points wasm-tools to PR branch  `wasmparser-implements`

Add support for the component model `[implements=<I>]L`
(spec PR [bytecodealliance#613](WebAssembly/component-model#613)),
which allows components to import/export the same
interface multiple times under different plain names.

A component can import the same interface twice under different labels,
each bound to a distinct host implementation:

```wit
import primary: wasi:keyvalue/store;
import secondary: wasi:keyvalue/store;
```

Guest code sees two separate namespaces with identical shapes:

```rust
let val = primary::get("my-key");       // calls the primary store
let val = secondary::get("my-key");     // calls the secondary store
```

From the host, wit-bindgen generates a separate Host trait per label:

```rust
impl primary::Host for MyState {
    fn get(&mut self, key: String) -> String {
        self.primary_db.get(&key).cloned().unwrap_or_default()
    }
}

impl secondary::Host for MyState {
    fn get(&mut self, key: String) -> String {
        self.secondary_db.get(&key).cloned().unwrap_or_default()
    }
}

primary::add_to_linker(&mut linker, |state| state)?;
secondary::add_to_linker(&mut linker, |state| state)?;
```

The linker also supports registering by plain label without knowing the annotation:

```rust
// Component imports [implements=<wasi:keyvalue/store>]primary
// but the host just registers "primary" — label fallback handles it
linker.root().instance("primary")?.func_wrap("get", /* ... */)?;
```

Users can also register to the linker with the full encoded `implements` name

```rust
let mut linker = Linker::<()>::new(engine);

linker
    .root()
    .instance("[implements=<wasi:keyvalue/store>]primary")?
    .func_wrap("get", |_, (key,): (String,)| Ok((String::new(),)))?;
```

Semver matching works inside the implements annotation, just like regular interface imports:

```rust
// Host provides v1.0.1
linker
    .root()
    .instance("[implements=<wasi:keyvalue/store@1.0.1>]primary")?
    .func_wrap("get", |_, (key,): (String,)| Ok((String::new(),)))?;

// Component requests v1.0.0, matches via semver
let component = Component::new(&engine, r#"(component
    (type $store (instance
        (export "get" (func (param "key" string) (result string)))
    ))
    (import "[implements=<wasi:keyvalue/store@1.0.0>]primary" (instance (type $store)))
)"#)?;
linker.instantiate(&mut store, &component)?; // works, 1.0.1 is semver-compatible with 1.0.0
```

## Changes

### Runtime name resolution

- Add three-tier lookup in NameMap::get: exact → semver → label fallback
- Add implements_label_key() helper for extracting plain labels from `[implements=<I>]L`
- Add unit tests for all lookup tiers
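
To make the lookup order concrete, here is a simplified sketch of an exact → semver → label-fallback map. It is not the Wasmtime implementation: this `NameMap`, `implements_label`, and `semver_key` are hypothetical, and the semver rule is reduced to "same major version, major at least 1" for illustration.

```rust
use std::collections::HashMap;

// Hypothetical helper: return `L` from `[implements=<I>]L`, if the name has that shape.
fn implements_label(name: &str) -> Option<&str> {
    let rest = name.strip_prefix("[implements=<")?;
    let (_, label) = rest.split_once(">]")?;
    Some(label)
}

// Simplified semver key: `pkg@1.0.1` becomes `pkg@1`, keeping any suffix
// after the closing `>`. Only handles major versions >= 1.
fn semver_key(name: &str) -> Option<String> {
    let at = name.find('@')?;
    let version_end = name[at..].find('>').map_or(name.len(), |i| at + i);
    let major = name[at + 1..version_end].split('.').next()?;
    if major.is_empty() || major == "0" {
        return None;
    }
    Some(format!("{}@{}{}", &name[..at], major, &name[version_end..]))
}

struct NameMap<V> {
    exact: HashMap<String, V>,
    // Maps a semver key to the full name it was registered under.
    semver: HashMap<String, String>,
}

impl<V> NameMap<V> {
    fn new() -> Self {
        NameMap { exact: HashMap::new(), semver: HashMap::new() }
    }

    fn insert(&mut self, name: &str, value: V) {
        if let Some(key) = semver_key(name) {
            self.semver.insert(key, name.to_string());
        }
        self.exact.insert(name.to_string(), value);
    }

    fn get(&self, name: &str) -> Option<&V> {
        // Tier 1: exact match on the full name.
        if let Some(value) = self.exact.get(name) {
            return Some(value);
        }
        // Tier 2: a semver-compatible registration of the same name.
        let compatible = semver_key(name)
            .and_then(|key| self.semver.get(&key))
            .and_then(|registered| self.exact.get(registered));
        if compatible.is_some() {
            return compatible;
        }
        // Tier 3: fall back to the plain label after the annotation.
        implements_label(name).and_then(|label| self.exact.get(label))
    }
}
```

With this shape, a host that registers only `secondary` still satisfies an import named `[implements=<wasi:keyvalue/store>]secondary`, because the third tier strips the annotation before looking up the label.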

### Code generation for multi-import/export

- Track first-seen implements imports/exports per `InterfaceId`
- Duplicate imports: re-export types via `pub use super::{first}::*`,
  generate fresh Host trait + add_to_linker
- Duplicate exports: same pattern with fresh Guest/GuestIndices,
  plus regenerate resource wrapper structs to reference the local Guest type
- Use `name_world_key_with_item` for export instance name lookups
- Guard `populate_world_and_interface_options` with `entry()` to avoid
  overwriting link options for duplicate interfaces
ricochet added a commit to ricochet/wasmtime that referenced this pull request Mar 1, 2026
DO NOT MERGE until wasm-tools release with
bytecodealliance/wasm-tools#2453
Points wasm-tools to PR branch  `wasmparser-implements`

Add support for the component model `[implements=<I>]L`
(spec PR [bytecodealliance#613](WebAssembly/component-model#613)),
which allows components to import/export the same
interface multiple times under different plain names.

A component can import the same interface twice under different labels,
each bound to a distinct host implementation:

```wit
import primary: wasi:keyvalue/store;
import secondary: wasi:keyvalue/store;
```

Guest code sees two separate namespaces with identical shapes:

```rust
let val = primary::get("my-key");       // calls the primary store
let val = secondary::get("my-key");     // calls the secondary store
```

Host Import-side codegen: shared trait + label-parameterized add_to_linker

For imports, wit-bindgen generates one Host trait per interface (not per
label). The add_to_linker function takes a name: &str parameter so the
same trait implementation can be registered under different instance labels.
Duplicate implements imports don't generate separate modules — only the
first import produces bindings.

```rust
struct PrimaryBackend;
impl primary::Host for PrimaryBackend {
    fn get(&mut self, key: String) -> String {
        self.primary_db.get(&key).cloned().unwrap_or_default()
    }
}

struct SecondaryBackend;
impl primary::Host for SecondaryBackend {
    fn get(&mut self, key: String) -> String {
        self.secondary_db.get(&key).cloned().unwrap_or_default()
    }
}

// Same add_to_linker, different labels and host_getter closures
primary::add_to_linker(&mut linker, "primary", |s| &mut s.primary)?;
primary::add_to_linker(&mut linker, "secondary", |s| &mut s.secondary)?;
```

Export-side codegen: per-label modules with shared types

For exports, each label gets its own module with fresh Guest/GuestIndices
types but re-exports shared interface types from the first module via
`pub use super::{first}::*`.

Runtime name resolution

The linker supports registering by plain label without knowing the annotation:

```rust
// Component imports [implements=<wasi:keyvalue/store>]primary
// but the host just registers "primary"; label fallback handles it
linker.root().instance("primary")?.func_wrap("get", /* ... */)?;
```

Users can also register to the linker with the full encoded implements name:

```rust
linker
    .root()
    .instance("[implements=<wasi:keyvalue/store>]primary")?
    .func_wrap("get", |_, (key,): (String,)| Ok((String::new(),)))?;
```

Semver matching works inside the implements annotation, just like
regular interface imports:

```rust
// Host provides v1.0.1
linker
    .root()
    .instance("[implements=<wasi:keyvalue/store@1.0.1>]primary")?
    .func_wrap("get", |_, (key,): (String,)| Ok((String::new(),)))?;

// Component requests v1.0.0, matches via semver
let component = Component::new(&engine, r#"(component
    (type $store (instance
        (export "get" (func (param "key" string) (result string)))
    ))
    (import "[implements=<wasi:keyvalue/store@1.0.0>]primary" (instance (type $store)))
)"#)?;
linker.instantiate(&mut store, &component)?; // works, 1.0.1 is semver-compatible with 1.0.0
```

- Add three-tier lookup in NameMap::get: exact → semver → label fallback
- Add implements_label_key() helper for extracting plain labels from
  `[implements=<I>]L`
- Add unit tests for all lookup tiers

- Track first-seen implements imports per `InterfaceId`
- One `Host` trait per interface; `generate_add_to_linker` takes
  `named: bool` — when true, emits `name: &str` parameter instead of
  hardcoding the instance name
- Duplicate `implements` imports: just record the label in
  `implements_labels`, no module generation
- `world_add_to_linker`: iterate over `implements_labels` to emit one
  `add_to_linker` call per label, passing label as name argument
- Guard `populate_world_and_interface_options` with `entry()` to avoid
  overwriting link options for duplicate interfaces

- Duplicate exports: re-export types via `pub use super::{first}::*`,
  generate fresh `Guest`/`GuestIndices`, plus regenerate resource wrapper
  structs to reference the local `Guest` type
- Use `name_world_key_with_item` for export instance name lookups