Merged
Changes from 1 commit
Commits
25 commits
c7b72a0
Lay groundwork for SIMD.
huonw Jul 7, 2015
272ce9b
First round of changes: remove extraneous traits etc.
huonw Jul 8, 2015
5893b16
Second round of changes: minor tweaks.
huonw Jul 8, 2015
f9e48d1
Clarify/fix typos.
huonw Jul 9, 2015
e2fc223
Note that fixed-length arrays could be repr(simd)'d.
huonw Jul 9, 2015
efeafdb
Remove the simd_primitive_trait attribute.
huonw Jul 10, 2015
a7c409b
Mention alignment changes due to repr(simd).
huonw Jul 10, 2015
f4e2ecf
Note pre-RFC discussion.
huonw Jul 10, 2015
1132ede
Clarify how the intrinsics' structural typing works.
huonw Jul 13, 2015
8317ea4
Add arithmetic intrinsics alternative.
huonw Jul 13, 2015
c6ed18a
Write down an answer to "why not `asm!`?".
huonw Jul 13, 2015
67f78ec
point to cfg-if.
huonw Jul 14, 2015
f71c4b3
Use intrinsics for arithmetic instead of built-in operators.
huonw Aug 3, 2015
8b2ec8c
Accidentally:
huonw Aug 3, 2015
47f6ae9
Use the platform-intrinsic ABI instead of rust-intrinsic.
huonw Aug 6, 2015
4a4e6ae
feature(simd_basics) -> feature(repr_simd)
huonw Aug 6, 2015
c4bf5e1
Remove struct flattening.
huonw Aug 12, 2015
6532670
Change shuffles to use arrays of indices.
huonw Aug 12, 2015
8e3a0de
shuffles don't rely on generic types for return values.
huonw Aug 12, 2015
54b0927
Intrinsics-for-operations is now the RFC, not an alternative.
huonw Aug 12, 2015
9e31ad3
Out of bounds indices are errors (backwards compat to relax).
huonw Aug 12, 2015
91a2b36
Only invalid to *call* intrinsics on bad platforms.
huonw Aug 12, 2015
60931df
There can be more shuffles.
huonw Aug 12, 2015
135ba7d
Internal references are legal.
huonw Aug 14, 2015
67fea6e
Type-level integer/values alternatives for shuffles.
huonw Aug 14, 2015
First round of changes: remove extraneous traits etc.
huonw committed Jul 8, 2015
commit 272ce9b458b32c1e6a2f4e349e5a2b4458673107
176 changes: 50 additions & 126 deletions text/0000-simd-infrastructure.md
@@ -19,6 +19,24 @@ This RFC lays the ground-work for building nice SIMD functionality,
but doesn't fill everything out. The goal here is to provide the raw
types and access to the raw instructions on each platform.

## Where does this code go? Aka. why not in `std`?

This RFC is focused on building stable, powerful SIMD functionality in
external crates, not `std`.

This makes it much easier to support functionality only "occasionally"
available with Rust's preexisting `cfg` system. If it were in `std`,
there would need to be some highly delayed `cfg` system so that
functions that only work with (say) AVX-2 support:

- don't break compilation on systems that don't support it, but
- are still usable on systems that do support it.

With an external crate, we can leverage `cargo`'s existing build
infrastructure: compiling with some target features will rebuild with
those features enabled.
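
As a rough sketch of what this looks like inside an external crate, the snippet below selects one of two definitions at compile time using the `target_feature` `cfg` described later in this RFC. The function name `sum_lanes` is hypothetical, and both bodies are plain scalar code standing in for real vector instructions.

```rust
// Hypothetical crate-level helper: exactly one definition is compiled,
// depending on the target features the crate was built with.
#[cfg(target_feature = "avx2")]
fn sum_lanes(xs: &[i32; 8]) -> i32 {
    // Stand-in for an AVX2-specific implementation.
    xs.iter().sum()
}

#[cfg(not(target_feature = "avx2"))]
fn sum_lanes(xs: &[i32; 8]) -> i32 {
    // Portable fallback for targets built without AVX2.
    let mut total = 0;
    for x in xs {
        total += *x;
    }
    total
}
```

Callers just write `sum_lanes(&values)`; rebuilding the crate with different target features picks the other definition without any change at the call site.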


# Detailed design

The design comes in three parts:
@@ -39,113 +57,42 @@ There is definitely a common core of SIMD functionality shared across
many platforms, but this RFC doesn't try to extract that, it is just
building tools that can be wrapped into a more uniform API later.

## Background: Where does this code go?

This RFC is focused on building stable, powerful SIMD functionality in
external crates, not `std`. This makes it much easier to support
functionality only "occasionally" available with Rust's preexisting
`cfg` system. If it were in `std`, there would need to be some highly
delayed `cfg` system so that functions that only work with AVX-2
support:

- don't break compilation on systems that don't support it, but
- are still usable on systems that do support it.

## Types & traits

A type designed to be used as a SIMD vector is indicated by the
`repr(simd)` attribute. A type marked as such will be compiled to
behave like a SIMD register (as well as the target platform can
support it).

The types/traits will be defined as follows:
There are two new attributes: `repr(simd)` and `simd_primitive_trait`

```rust
#[repr(simd)]
struct Simd2<T: SimdPrim>(T, T);
#[repr(simd)]
struct Simd4<T: SimdPrim>(T, T, T, T);
#[repr(simd)]
struct Simd8<T: SimdPrim>(T, T, T, T, T, T, T, T);
#[repr(simd)]
struct Simd16<T: SimdPrim>(T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T);
#[repr(simd)]
struct Simd32<T: SimdPrim>(T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T,
T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T);
#[repr(simd)]
struct Simd64<T: SimdPrim>(T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T,
T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T,
T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T,
T, T, T, T, T, T, T, T, T, T, T, T, T, T, T, T);

trait SimdVector {
type Elem: SimdPrim;
type Bool: SimdVector<Elem = <Self::Elem as SimdPrim>::Bool>;
}

impl<T: SimdPrim> SimdVector for Simd2<T> {
type Elem = T;
type Bool = Simd2<T::Bool>;
}
impl<T: SimdPrim> SimdVector for Simd4<T> {
type Elem = T;
type Bool = Simd4<T::Bool>;
}
// ...
impl<T: SimdPrim> SimdVector for Simd64<T> {
type Elem = T;
type Bool = Simd64<T::Bool>;
}
struct f32x4(f32, f32, f32, f32);

#[simd_prim_trait]
trait SimdPrim {
type Bool: SimdPrim;
}
#[repr(simd)]
struct Simd2<T>(T, T);

// boolean types, see below
struct bool8i(...);
struct bool16i(...);
struct bool32i(...);
struct bool64i(...);
struct bool32f(...);
struct bool64f(...);

// specifying what types are SIMD-able.
impl SimdPrim for u8 { type Bool = bool8i; }
impl SimdPrim for i8 { type Bool = bool8i; }
impl SimdPrim for u16 { type Bool = bool16i; }
// ...
impl SimdPrim for i64 { type Bool = bool64i; }

impl SimdPrim for f32 { type Bool = bool32f; }
impl SimdPrim for f64 { type Bool = bool64f; }

impl SimdPrim for bool8i { type Bool = bool8i; }
// ...
impl SimdPrim for bool64i { type Bool = bool64i; }

impl SimdPrim for bool32f { type Bool = bool32f; }
impl SimdPrim for bool64f { type Bool = bool64f; }
#[simd_primitive_trait]
trait SimdPrim {}
```

It is illegal to take an internal reference to the fields of a
`repr(simd)` type.

### `repr(simd)`

The `simd` `repr` can be attached to a struct and will cause such a
struct to be compiled to a SIMD vector. It is required that the
monomorphised vector consist of only a single "primitive" type,
repeated some number of times. The restrictions on the element type
are exactly the same restrictions as `#[simd_primitive_trait]` traits
impose on their implementing types.
struct to be compiled to a SIMD vector. It can be generic, but it is
required that any fully monomorphised instance of the type consist of
only a single "primitive" type, repeated some number of times. The
restrictions on the element type are exactly the same restrictions as
`#[simd_primitive_trait]` traits impose on their implementing types.

The `repr(simd)` may not enforce that the trait bound exists/does the
right thing at the type checking level for generic `repr(simd)`
types. As such, it will be possible to get the code-generator to error
out (ala the old `transmute` size errors), however, this shouldn't
cause problems in practice: libraries wrapping this functionality
would layer type-safety on top (i.e. the `SimdPrim` trait).
would layer type-safety on top (i.e. generic `repr(simd)` types would
use the `SimdPrim` trait as a bound).
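
A minimal sketch of that layering, assuming a library-defined marker trait: the `#[repr(simd)]` attribute is left off so the snippet compiles on stable Rust, and the `splat` helper is a hypothetical name used only for illustration.

```rust
// Hypothetical marker trait naming the element types the library allows.
trait SimdPrim: Copy {}
impl SimdPrim for u32 {}
impl SimdPrim for f32 {}

// Stand-in for a generic `#[repr(simd)]` struct. The bound rejects
// unsuitable element types during type checking, before code generation.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Simd2<T: SimdPrim>(T, T);

// Hypothetical helper: build a vector with every lane set to `x`.
fn splat<T: SimdPrim>(x: T) -> Simd2<T> {
    Simd2(x, x)
}
```

With the bound in place, something like `Simd2(String::new(), String::new())` is rejected by the type checker rather than by the code generator.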

It is illegal to take an internal reference to the fields of a
`repr(simd)` type, because the representation of booleans may need
to change so that booleans are bit-packed.

### `simd_primitive_trait`

@@ -160,35 +107,6 @@ restriction and possibly tweaks type's internal representation (as
such, it's legal for a single type to implement multiple traits with
the attribute, if a bit pointless).

### Booleans

SIMD booleans are non-trivial. Many conventional APIs e.g. SSE, and
NEON, use "wide booleans": a large number of bits set to all-zeros
(false) or all-ones, e.g. equality between `Simd4(0_u32, 1, 2, 3)` and
`Simd4(0_u32, 0, 2, 3)` gives (on the CPU) `Simd4(!0_u32, 0, !0,
!0)`. Hence, the boolean types need to have width. It's tempting to
just use the integer types of the appropriate width, but this falls
down for two reasons:

1. booleans aren't always this format
2. the source of the boolean matters

The second is easiest: CPUs are complicated beasts, and the hardware
that handles floating point vector operations may be very different to
the hardware that handles integer ones: instructions use different
execution units. It can take several cycles to transfer data between
them. Encoding the provenance/execution unit of the value in the type
makes costs explicit.

The first is much harder to solve. Some architectures/instruction sets
model booleans as single bits. For example, equality between
`Simd4(0_u32, 1, 2, 3)` and `Simd4(0_u32, 0, 2, 3)` gives `1 + 4 + 8 ==
0b1101`. One example is AVX-512 which essentially replaces all of the
older SSE through AVX2 boolean-returning instructions with versions
that return those. Using separate types for booleans (and restricting
their API) allows for some serious magic: `Simd4<bool32>` becomes
`u4`. (This is where the reference-restriction above comes in.)
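
The two boolean encodings can be compared with a scalar emulation. This is only an illustration using plain arrays in place of vector registers; `eq_wide` and `eq_mask` are hypothetical names, not part of the proposal.

```rust
/// "Wide boolean" comparison: each lane becomes all-ones or all-zeros.
fn eq_wide(a: [u32; 4], b: [u32; 4]) -> [u32; 4] {
    let mut out = [0u32; 4];
    for i in 0..4 {
        out[i] = if a[i] == b[i] { !0 } else { 0 };
    }
    out
}

/// "Bitmask" comparison: lane `i` sets bit `i` of the result.
fn eq_mask(a: [u32; 4], b: [u32; 4]) -> u8 {
    let mut mask = 0u8;
    for i in 0..4 {
        if a[i] == b[i] {
            mask |= 1 << i;
        }
    }
    mask
}
```

For the inputs used in the text, `eq_wide` yields `[!0, 0, !0, !0]` and `eq_mask` yields `0b1101`, matching the two results described above.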

## Operations

CPU vendors usually offer "standard" C headers for their CPU specific
@@ -295,19 +213,23 @@ Comparisons are implemented via intrinsics, because the current
comparison operator infrastructure doesn't easily lend itself to
return vectors, as required.

A library could give signatures like:
The raw signatures would look like:

```rust
extern "rust-intrinsic" {
fn simd_eq<T: SimdVector>(v: T, w: T) -> T::Bool;
fn simd_ne<T: SimdVector>(v: T, w: T) -> T::Bool;
fn simd_lt<T: SimdVector>(v: T, w: T) -> T::Bool;
fn simd_le<T: SimdVector>(v: T, w: T) -> T::Bool;
fn simd_gt<T: SimdVector>(v: T, w: T) -> T::Bool;
fn simd_ge<T: SimdVector>(v: T, w: T) -> T::Bool;
fn simd_eq<T, U>(v: T, w: T) -> U;
fn simd_ne<T, U>(v: T, w: T) -> U;
fn simd_lt<T, U>(v: T, w: T) -> U;
fn simd_le<T, U>(v: T, w: T) -> U;
fn simd_gt<T, U>(v: T, w: T) -> U;
fn simd_ge<T, U>(v: T, w: T) -> U;
}
```

However, these will be type checked, to ensure that `T` and `U` are
the same length, and that `U` is appropriately shaped for a boolean. A
library actually importing them might use some trait bounds to get
actual type-safety.
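
One way a library might recover that type-safety is sketched below. It assumes a library-defined trait that ties each vector type to its boolean vector type; ordinary structs stand in for `repr(simd)` types, and the names `SimdVector`, `U32x4`, `Bool32x4` and `wide` are hypothetical.

```rust
// Hypothetical library-side trait: the associated type fixes which
// boolean vector a comparison returns, so `T` and `U` cannot mismatch.
trait SimdVector: Copy {
    type Bool;
    fn simd_eq(self, other: Self) -> Self::Bool;
}

// Plain structs standing in for `repr(simd)` types.
#[derive(Clone, Copy, Debug, PartialEq)]
struct U32x4(u32, u32, u32, u32);

#[derive(Clone, Copy, Debug, PartialEq)]
struct Bool32x4(u32, u32, u32, u32);

// Encode a scalar `bool` as an all-ones or all-zeros lane.
fn wide(b: bool) -> u32 {
    if b { !0 } else { 0 }
}

impl SimdVector for U32x4 {
    type Bool = Bool32x4;

    fn simd_eq(self, other: Self) -> Bool32x4 {
        // A real wrapper would call the raw `simd_eq` intrinsic here;
        // this scalar emulation only illustrates the typing.
        Bool32x4(
            wide(self.0 == other.0),
            wide(self.1 == other.1),
            wide(self.2 == other.2),
            wide(self.3 == other.3),
        )
    }
}
```

The raw intrinsic stays loosely typed, while users of the wrapper only ever see `U32x4 -> Bool32x4`.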

### Built-in functionality

@@ -340,8 +262,10 @@ exact target e.g.
- compiling with `-C target-cpu=native` on a modern CPU might set
`target_feature = "avx2"`, `target_feature = "avx"`, ...

(There are other non-SIMD features that might have `target_feature`s
set too, such as `popcnt` and `rdrnd` on x86/x86-64.)
The possible values of `target_feature` will be a selected whitelist,
not necessarily just everything LLVM understands. There are other
non-SIMD features that might have `target_feature`s set too, such as
`popcnt` and `rdrnd` on x86/x86-64.

With a `cfg_if_else!` macro that expands to the first `cfg` that is
satisfied (ala [@alexcrichton's cascade][cascade]), code might look