AVX2RVV

AVX512 Intrinsics Implementation for RISC-V Vector Extension

Overview

avx2rvv is a header-only translation layer that maps Intel x86 SIMD intrinsics (SSE, AVX, AVX2, AVX512) to RISC‑V Vector (RVV) intrinsics. It enables existing x86 SIMD code to run on RISC‑V platforms with minimal source changes, allowing rapid workload bring-up, performance profiling, and hot path identification on RISC‑V targets.

Built on and extending sse2rvv, avx2rvv provides broader coverage for the AVX/AVX2/AVX512 families, implementing many functions from Intel’s intrinsic headers (such as <immintrin.h>) using RVV equivalents to match x86 semantics.

Mapping and Coverage

| Header file | Extension |
| --- | --- |
| `<mmintrin.h>` | MMX |
| `<xmmintrin.h>` | SSE |
| `<emmintrin.h>` | SSE2 |
| `<pmmintrin.h>` | SSE3 |
| `<tmmintrin.h>` | SSSE3 |
| `<smmintrin.h>` | SSE4.1 |
| `<nmmintrin.h>` | SSE4.2 |
| `<wmmintrin.h>` | AES |
| `<immintrin.h>` | AVX |

Supported instruction sets:

SSE, SSE2, SSE3, SSSE3
SSE4.1, SSE4.2, AES
AVX, AVX2, AVX512

Design principle:

Prefer one-to-one mappings to RVV intrinsics whenever possible
Otherwise, emulate semantics using concise RVV instruction sequences

Examples:

Direct mapping: _mm_add_epi16 → __riscv_vadd_vv_i16m1
Composed mapping: _mm_maddubs_epi16 implemented via multiple RVV ops
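
To make the two mapping styles concrete, here is a minimal sketch; it is not the library's actual implementation. `add_epi16_lanes`, `sat16` and `maddubs_ref` are hypothetical names. The first function uses `__riscv_vadd_vv_i16m1` one-to-one (assuming VLEN >= 128, with a scalar fallback when RVV is not enabled). The last is a scalar reference model of the `_mm_maddubs_epi16` semantics that a composed RVV sequence has to reproduce: unsigned-by-signed byte products, summed in adjacent pairs with signed 16-bit saturation.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* Hypothetical helper: lane-wise wrapping add of eight int16 values,
 * the operation _mm_add_epi16 performs on one 128-bit register. */
static void add_epi16_lanes(const int16_t a[8], const int16_t b[8], int16_t dst[8]) {
#if defined(__riscv_vector)
    /* Direct mapping; assumes VLEN >= 128 so that vl comes back as 8. */
    size_t vl = __riscv_vsetvl_e16m1(8);
    vint16m1_t va = __riscv_vle16_v_i16m1(a, vl);
    vint16m1_t vb = __riscv_vle16_v_i16m1(b, vl);
    __riscv_vse16_v_i16m1(dst, __riscv_vadd_vv_i16m1(va, vb, vl), vl);
#else
    /* Scalar fallback so the sketch also builds without RVV. */
    for (int i = 0; i < 8; i++)
        dst[i] = (int16_t)(uint16_t)((uint16_t)a[i] + (uint16_t)b[i]);
#endif
}

/* Saturate a 32-bit value to the int16_t range. */
static int16_t sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar reference model of _mm_maddubs_epi16:
 * a holds unsigned bytes, b holds signed bytes. */
static void maddubs_ref(const uint8_t a[16], const int8_t b[16], int16_t dst[8]) {
    for (int i = 0; i < 8; i++) {
        int32_t lo = (int32_t)a[2 * i] * (int32_t)b[2 * i];
        int32_t hi = (int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1];
        dst[i] = sat16(lo + hi);  /* pairwise sum with signed saturation */
    }
}
```

A reference model like `maddubs_ref` is also handy as an oracle when testing a composed mapping: feed both the same inputs and compare lane by lane.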

Integration

Add Headers:
Place avx2rvv.h or sse2rvv.h in your project's include path.
Replace x86 SIMD Headers:
Locate and replace x86 SIMD header inclusions:
```
#include <xmmintrin.h>
#include <emmintrin.h>
#include <immintrin.h>
```
with:
```
#include "sse2rvv.h"
#include "avx2rvv.h"
```
- Replace {x,e,p,t,s,n,w}mmintrin.h with "sse2rvv.h"
- Replace {avx,avx2,avx512f,avx512vl}intrin.h or immintrin.h with "avx2rvv.h"
Compiler Options:
For RISC‑V (example for riscv64, adjust for your toolchain/CPU features):
```
-march=rv64gcv_zba -mabi=lp64d
```
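
If the same source tree must keep building on x86, the header swap can be confined to one conditional block instead of being edited in every file. The sketch below is one possible arrangement, not something the project prescribes. `HAVE_SSE2_API` and `add4_i32` are hypothetical names, and the `__riscv_vector` test assumes a GCC or Clang toolchain that defines it when the V extension is enabled.

```c
#include <stdint.h>

/* Pick the SIMD header once, in a single project-local place. */
#if defined(__riscv) && defined(__riscv_vector)
#  include "sse2rvv.h"      /* RISC-V build with the V extension enabled */
#  define HAVE_SSE2_API 1
#elif defined(__SSE2__)
#  include <emmintrin.h>    /* native x86 build */
#  define HAVE_SSE2_API 1
#else
#  define HAVE_SSE2_API 0   /* neither available: fall back to plain C */
#endif

/* Hypothetical helper: adds four 32-bit lanes. */
static void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4]) {
#if HAVE_SSE2_API
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    for (int i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

The intrinsic calls are identical on both architectures; only the include line differs.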

Run Built-in Test Suite

avx2rvv ships with a built-in test suite under the tests/ directory. You can run all tests or a single test. Test inputs are provided at runtime and results are printed to stdout.

Prerequisites

A RISC‑V toolchain (native or cross); for cross builds, QEMU (qemu-riscv64) to run the resulting binaries
GNU Make

Run all tests (native toolchain)

make test

Run a single test (example: mm_crc32_u8)

# Build test binaries
make

# Run test case help:
AVX2RVV Test Suite
Usage: ./tests/main [OPTIONS] [TEST_NAME]

Options:
  -h, --help              Show this help message
  -l, --list              List all available test cases
  -v, --verbose           Enable verbose output
  -q, --quiet             Suppress output except for errors
  -i, --index INDEX       Run test by index number
  -s, --suites CASETYPE   Select test suite (default: all → run SSE first, then AVX)
  TEST_NAME               Run specific test by name (supports partial matching)

Examples:
  ./tests/main                        # Run all tests
  ./tests/main mm_add_ps              # Run mm_add_ps test
  ./tests/main --index 5              # Run test at index 5
  ./tests/main --list                 # List all available tests
  ./tests/main --suite avx            # Run only AVX tests
  ./tests/main --suite sse --index 5  # Run SSE test at index 5
  ./tests/main --verbose add          # Run tests matching 'add' with verbose output

# Run one case by name
  ./tests/main mm_crc32_u8

Expected output (sample):

  Test mm_crc32_u8                    passed
  SSE2RVV_TEST Complete!
  Passed:  1
  Failed:  0
  Ignored: 0
  Coverage rate: 100.00%

Cross-compile for RISC‑V and run with QEMU

# Build with a cross toolchain
make CROSS_COMPILE=riscv64-unknown-elf-

# Run with qemu-riscv64 (if your tests are built as Linux user binaries)
# Example (adjust path/binary as needed):
qemu-riscv64 ./tests/main

Notes:

Use tests/main to run the entire test matrix.
For single tests, pass the test name as in tests/main $CASE; partial names also match, so use the full name to select exactly one case.
If you target bare‑metal outputs, integrate with your runner or board bring‑up scripts accordingly.

Real-World Migration Examples

Case Study 1: Basic SSE&AVX Operations Migration

Objective: Demonstrate seamless migration from x86 SSE&AVX to RISC-V RVV

Source Code (testsse.cpp):

#include <stdio.h>
#include <stdint.h>
#include "sse2rvv.h"  // Replace <pmmintrin.h> for RISC-V
#include "avx2rvv.h"  // Replace <immintrin.h> for RISC-V

void sse_example() {
    int32_t a[4] = {-5, 13, 4, -20};
    int32_t b[4] = {12, 3, 0, 7};
    int32_t c[4] = {0};

    // Load 128-bit data (4 x 32-bit integers)
    __m128i t1 = _mm_loadu_si128((const __m128i*)a);
    __m128i t2 = _mm_loadu_si128((const __m128i*)b);

    // Execute 128-bit parallel addition
    __m128i dst = _mm_add_epi32(t1, t2);

    // Store results
    _mm_storeu_si128((__m128i*)c, dst);

    printf("Result: %d %d %d %d\n", c[0], c[1], c[2], c[3]);
}

int main(void) {
    sse_example();
    return 0;
}

Migration Steps:

Replace #include <pmmintrin.h> with #include "sse2rvv.h"
No source code changes required
Update compiler flags: -march=rv64gcv_zba

Compilation & Execution:

# Compile for RISC-V
riscv64-unknown-linux-gnu-g++ testsse.cpp -o testsse -march=rv64gcv_zba

# Run with QEMU
QEMU_LD_PREFIX=/opt/riscv/sysroot/ qemu-riscv64 ./testsse
# Output: Result: 7 16 4 -13

Case Study 2: Image Processing Application

Project: Prefetcher - Image processing with SIMD optimization

Migration Process:

Header Replacement:

// Before (x86)
#include <xmmintrin.h>

// After (RISC-V)
#include "sse2rvv.h"

Build Configuration:

CC=/path/to/riscv64-unknown-linux-gnu-gcc
CFLAGS = -O3 -march=rv64gcv_zba

Results:

Matrix transpose operation:
0  1  2  3
4  5  6  7
8  9 10 11
12 13 14 15

Transposed:
 0  4  8 12
1  5  9 13
2  6 10 14
3  7 11 15

SSE processing time: 1553431 us
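
When checking a migrated build like this one, it can help to compare the SIMD result against a plain C reference rather than inspecting the printed matrix by eye. The following is a hedged sketch. `transpose4x4_ref` and `matrices_equal` are hypothetical helpers and are not part of the Prefetcher project. Matrices are stored row-major in flat 16-element arrays.

```c
#include <stddef.h>

/* Hypothetical scalar reference: out = transpose of the row-major 4x4 matrix in. */
static void transpose4x4_ref(const int in[16], int out[16]) {
    for (size_t r = 0; r < 4; r++)
        for (size_t c = 0; c < 4; c++)
            out[c * 4 + r] = in[r * 4 + c];
}

/* Returns 1 when both 4x4 matrices hold the same values, 0 otherwise. */
static int matrices_equal(const int a[16], const int b[16]) {
    for (size_t i = 0; i < 16; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Calling `matrices_equal(simd_result, reference)` after the SIMD transpose gives a pass/fail signal that is easy to wire into a test script.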

Case Study 3: Base64 Encoding/Decoding

Project: Base64 - High-performance encoding library

Migration Steps:

Header Updates:

// lib/arch/avx512/codec.c
// #include <immintrin.h>
#include "avx2rvv.h"

// lib/arch/ssse3/codec.c  
// #include <immintrin.h>
#include "avx2rvv.h"

Build Configuration:

CC=/path/to/riscv64-unknown-linux-gnu-gcc
CFLAGS += -O3 -march=rv64gcv_zba -Wall -Wextra -pedantic \
          -DBASE64_STATIC_DEFINE -DBASE64_SSSE3_USE_ASM=0 -I.
LD=/path/to/riscv64-unknown-linux-gnu-ld
OBJCOPY=/path/to/riscv64-unknown-linux-gnu-objcopy

Compilation:

export SSSE3_CFLAGS=
export AVX512_CFLAGS=
make clean && OPENMP=1 make && OPENMP=1 make -C test

Result (10MB buffer):

Plain   encode:  801.69 MB/sec
Plain   decode: 1195.70 MB/sec
SSSE3   encode:  503.53 MB/sec
SSSE3   decode:  556.33 MB/sec
SSE41   encode:  705.87 MB/sec
SSE41   decode:  598.00 MB/sec
AVX512  encode:  269.52 MB/sec

Migration Best Practices

1. Systematic Migration Approach:

# Step 1: Identify SIMD header usage
grep -rnE '#include *<[a-z0-9]*intrin\.h>' src/

# Step 2: Replace headers (the first pattern must not match immintrin.h)
find . -name "*.c" -o -name "*.cpp" | xargs sed -i 's/#include <[xeptsnw]mmintrin\.h>/#include "sse2rvv.h"/g'
find . -name "*.c" -o -name "*.cpp" | xargs sed -i 's/#include <immintrin\.h>/#include "avx2rvv.h"/g'

# Step 3: Update build system
sed -i 's/-march=native/-march=rv64gcv_zba/g' Makefile

2. Compatibility Verification:

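// Add a compile-time check for RVV support
// (the compiler sets these macros from -march; ELEN is the element width, so test VLEN instead)
#if defined(__riscv) && (!defined(__riscv_vector) || __riscv_v_min_vlen < 128)
    // RVV is disabled or the guaranteed VLEN is below 128 bits
    fprintf(stderr, "Warning: RVV not supported, falling back to scalar\n");
    use_scalar_implementation();
    return;
#endif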
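
Compiler macros describe the build target, not the machine the binary ends up on. For a check that runs at program start, one option on Linux is to read the hardware capability bits. The sketch below assumes a Linux RISC-V kernel that reports the V extension in `AT_HWCAP`; `rvv_available_at_runtime` and `select_kernel_path` are hypothetical names.

```c
#if defined(__riscv) && defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#endif

/* Hypothetical helper: 1 if the running kernel reports the V extension, else 0. */
static int rvv_available_at_runtime(void) {
#if defined(__riscv) && defined(__linux__)
    /* Linux on RISC-V exposes single-letter ISA extensions as bits in
     * AT_HWCAP, with bit index = letter - 'A'. */
    unsigned long hwcap = getauxval(AT_HWCAP);
    return (hwcap & (1UL << ('V' - 'A'))) != 0;
#else
    return 0;  /* not a RISC-V Linux build */
#endif
}

/* Hypothetical dispatcher: names the code path a program would take. */
static const char *select_kernel_path(void) {
    return rvv_available_at_runtime() ? "rvv" : "scalar";
}
```

Runtime dispatch only helps if the vector code is kept in separately compiled files. A translation unit built with a vector-enabled `-march` may contain vector instructions anywhere.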

3. Performance Validation:

// Benchmark both implementations
void benchmark_migration() {
    clock_t start, end;
    
    // Test original x86 implementation
    start = clock();
    x86_simd_function();
    end = clock();
    double x86_time = ((double)(end - start)) / CLOCKS_PER_SEC;
    
    // Test RISC-V implementation  
    start = clock();
    riscv_simd_function();
    end = clock();
    double riscv_time = ((double)(end - start)) / CLOCKS_PER_SEC;
    
    printf("Performance ratio (RISC-V/x86): %.2f\n", riscv_time / x86_time);
}
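
In practice the x86 and RISC-V builds run on different machines, so each build usually times its own kernel and the ratio is computed afterwards. Note that `clock()` reports CPU time, not elapsed time. A minimal wall-clock timing helper might look like the sketch below; `now_seconds`, `example_workload` and `mean_seconds_per_call` are hypothetical names, and C11 `timespec_get` is assumed to be available.

```c
#include <time.h>

/* Wall-clock time in seconds, read with C11 timespec_get. */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical stand-in for the SIMD kernel under test. */
static void example_workload(void) {
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 100000u; i++)
        sink += i;
}

/* Runs fn `iters` times and returns the mean wall-clock seconds per call. */
static double mean_seconds_per_call(void (*fn)(void), int iters) {
    double start = now_seconds();
    for (int i = 0; i < iters; i++)
        fn();
    return (now_seconds() - start) / iters;
}
```

Averaging over several iterations smooths out start-up noise, which matters under QEMU, where timings reflect emulation speed and not hardware.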

Common Migration Challenges and Solutions

1. Unsupported Intrinsics:

// Problem: _mm_prefetch not implemented
// Solution: Use the compiler's prefetch builtin
#ifdef __riscv
    // GCC/Clang builtin; locality 2 roughly corresponds to _MM_HINT_T1
    __builtin_prefetch(ptr, 0, 2);  // Read, moderate temporal locality
#else
    _mm_prefetch(ptr, _MM_HINT_T1);
#endif
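
As a usage illustration, a prefetch hint is typically issued a fixed distance ahead of the element being processed. The sketch below assumes GCC or Clang for `__builtin_prefetch`. `sum_with_prefetch` and the distance of 16 elements are hypothetical choices, and whether an actual prefetch instruction is emitted depends on the target (on RISC-V this is assumed to require a prefetch extension such as Zicbop in `-march`).

```c
#include <stddef.h>

/* Hypothetical example: sums an array while hinting upcoming elements. */
static long sum_with_prefetch(const int *data, size_t n) {
    const size_t distance = 16;  /* elements to look ahead; tune per workload */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + distance < n)
            __builtin_prefetch(&data[i + distance], 0, 2);  /* read, moderate locality */
        total += data[i];
    }
    return total;
}
```

The hint never changes the result, so the same loop stays correct on targets where the builtin compiles to nothing.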

2. Assembly Code Compatibility:

// Problem: Inline assembly not portable
// Solution: Conditional compilation
__m512i result;
#ifdef __riscv
    // Use the intrinsic, which avx2rvv implements with RVV
    result = _mm512_add_epi32(a, b);
#elif defined(__x86_64__)
    // Original x86 assembly (AT&T syntax: destination operand comes last)
    asm volatile ("vpaddd %2, %1, %0" : "=x"(result) : "x"(a), "x"(b));
#endif

3. Memory Alignment Issues:

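// Problem: Different allocation APIs and alignment requirements
// Solution: Use a portable wrapper
void* aligned_alloc_portable(size_t alignment, size_t size) {
#ifdef __riscv
    // C11 aligned_alloc expects size to be a multiple of alignment
    size_t padded = (size + alignment - 1) / alignment * alignment;
    return aligned_alloc(alignment, padded);  // release with free()
#else
    return _mm_malloc(size, alignment);       // release with _mm_free()
#endif
}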

References

Development Roadmap

This project is under active development with a clear roadmap for expanding SIMD intrinsic support across different Intel x86 architectures.

Function Statistics & Priority Analysis

Based on comprehensive analysis of x86 SIMD intrinsic usage patterns, we have identified the following function distribution:

| Vector Type | Header Files | Total Functions | High-Frequency Functions | Priority Level |
| --- | --- | --- | --- | --- |
| AVX | `immintrin.h` | 191 | 191 (100%) | 🔴 Critical |
| AVX2 | `immintrin.h` | 233 | 191 (82%) | 🔴 Critical |
| AVX512 | `immintrin.h` `avx512fintrin.h` `avx512vlintrin.h` | 2,665 | 2,624 (98%) | 🟡 High |
| AVX512 Extensions | Various | 2,459 | 1,855 (75%) | 🟢 Medium |
| AVX512 Compute | Various | 132 | 64 (48%) | 🔵 Low |
| Total | - | 6,327 | 5,554 (88%) | - |

Phase 1: Foundation & High-Priority Implementation

Target Completion: December 2025

Priority Focus: Complete AVX and AVX2 intrinsic libraries (384 functions)
Scope: Organize and submit existing developed functions and test cases
Deliverables:
- Complete AVX/AVX2 function library (100% coverage)
- Performance benchmarking suite
Rationale: AVX/AVX2 show 100% and 82% high-frequency usage respectively

Phase 2: AVX512 Core Implementation

Target Completion: July 2026

Implementation Strategy: Focus on high-frequency AVX512 functions (2,624 functions)
Batch Development: 200-300 functions per month
Coverage Goals:
- Complete core AVX512 intrinsic function library (98% high-frequency coverage)
- Comprehensive test suite for all implemented functions
- Performance optimization for RISC-V Vector Extension
Quality Assurance: Each batch includes comprehensive testing, documentation, and benchmarking

Phase 3: AVX512 Extensions & Specialized Functions

Target Completion: October 2026

Implementation Strategy: Batch development approach (200-300 functions per month)
Coverage Goals:
- Complete AVX intrinsic function library
- Complete AVX2 intrinsic function library
- Complete AVX512 intrinsic function library
- Full test suite coverage for all implemented functions
Quality Assurance: Each batch includes comprehensive testing and documentation

Development Philosophy

Data-Driven: Prioritize functions based on comprehensive usage analysis (88% high-frequency coverage)
Quality-First: Every function includes comprehensive test cases and documentation
Performance-Oriented: Optimize for RISC-V Vector Extension efficiency
Community-Focused: Open development process with regular milestone releases
Backward Compatible: Maintain API compatibility throughout development phases

Contributing

We welcome contributions! Please see our Contributing Guidelines for details on how to:

Report bugs and request features
Submit pull requests
Contribute to test cases
Help with documentation

License

avx2rvv is freely redistributable under the Apache License 2.0.
