(Research-only Tool) The Most Efficient Chunked or Target Quality AV1/AV2 Encoding Framework
- Dependencies
- Description
- Features
- Design Decisions
- Why Is It Fast and Minimal Especially Compared to Av1an
- Usage
- Building
- Video Showcase
- Credits
- SVT-AV1 (mainline or a fork)
- mkvmerge (to concatenate chunks)
- FFMS2 (a hard dependency)
- VSHIP (optional - needed for target quality encoding with CVVDP)
- ZIMG (optional - provides color conversion features needed by VSHIP)
xav aims to be the fastest, most minimal AV1 (and potentially AV2) encoding framework. Keeping the feature scope deliberately narrow lets the tool get the most out of the best encoder and the best video quality metric, without being held back by the cost of supporting extensive features.
The author has been involved with the Av1an project since its inception, first as a user and now as a continuing developer, so creating a direct competitor without purpose was not the objective. xav is a faster, more minimal alternative to Av1an's most popular features, and the author acknowledges that Av1an is the most powerful and feature-rich video encoding framework. This tool was developed with a strong interest in, and focus on, the "Av1an" concept.
- Parses the new fancy progress output of SVT-AV1 encoders (there is an example in the video below).
- Parses color and video metadata (container and frame based) and passes it to encoders automatically, including HDR metadata (Dolby Vision RPU automation for chunking is considered), FPS and resolution.
- Offers fun process monitoring with almost no overhead for the indexing, SCD (scene change detection), encoding and TQ (target quality) processes.
- Fastest chunked encoding with svt-av1.
- Fastest target quality encoding with CVVDP.
- Uses only absolute bleeding-edge tools with an opinionated setup.
- No flexibility or extensive feature support (such as VapourSynth filtering, zoning, different encoders, metrics or statistical pooling for TQ).
- yuv420p10le only. No 8-bit or 12-bit support, and no yuv422 or yuv444 support.
- Uses a direct memory pipeline (zero external process overhead). Everything runs within one Rust process with direct memory access.
- Direct C FFI bindings to FFMS2. FFMS2 is currently the most efficient library to open/index/decode videos. This also removes the Python/VapourSynth/FFmpeg dependencies.
- Frames flow directly from decoder -> memory buffers -> encoder stdin via pipes.
- Uses zero-copy frame handling.
- If the input is 10bit, custom 4-pixel-to-5-byte packing reduces memory by 37.5%. The bit packing overhead is literally 0.
- If the input is 8bit, we can store the chunk in memory as 8bit, reducing memory by almost 50%.
- On demand 10bit conversion is only done efficiently when needed.
- Uses contiguous YUV420 layout optimized for cache locality.
- The producer-consumer pipeline is lockless.
- Single thread extracts frames using FFMS2 -> Multiple encoder threads process chunks in parallel -> Lockless MPSC crossbeam channel communication with backpressure
- There is no thread contention: Single decoder eliminates seeking conflicts.
- Bounded channels prevent memory explosion.
- Workers operate on independent memory regions.
- All components share the same address space.
- The OS can schedule the threads of a single process more easily than many separate processes.
- Minimal data movement between processing stages.
- Sequential memory access
- Only a single index needed for SCD/encoding.
- No interpreter overhead.
- TQ: Frames already decoded for encoding are reused directly for metric comparison by calling the vship API, instead of using VapourSynth based CVVDP with its inefficient seeking/decoding/computing.
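The 4-pixel-to-5-byte packing mentioned in the list above can be illustrated with a minimal sketch. Four 10-bit samples need 40 bits, which fit exactly into 5 bytes instead of the 8 bytes taken by four 16-bit words. The function names (`pack4`, `unpack4`, `pack_plane`, `unpack_plane`) and the little-endian bit layout are assumptions made for illustration only; they are not taken from xav's source, and the real layout may differ.

```rust
/// Pack four 10-bit samples (held in u16) into five bytes.
/// ASSUMPTION: sample 0 occupies the lowest 10 bits (little-endian bit order).
fn pack4(px: [u16; 4]) -> [u8; 5] {
    // Merge the four 10-bit values into a single 40-bit integer.
    let v: u64 = ((px[0] as u64) & 0x3FF)
        | (((px[1] as u64) & 0x3FF) << 10)
        | (((px[2] as u64) & 0x3FF) << 20)
        | (((px[3] as u64) & 0x3FF) << 30);
    let b = v.to_le_bytes();
    // Only the low five bytes carry data.
    [b[0], b[1], b[2], b[3], b[4]]
}

/// Inverse of `pack4`: recover four 10-bit samples from five bytes.
fn unpack4(b: [u8; 5]) -> [u16; 4] {
    let v = u64::from_le_bytes([b[0], b[1], b[2], b[3], b[4], 0, 0, 0]);
    [
        (v & 0x3FF) as u16,
        ((v >> 10) & 0x3FF) as u16,
        ((v >> 20) & 0x3FF) as u16,
        ((v >> 30) & 0x3FF) as u16,
    ]
}

/// Pack a whole plane. For simplicity the sketch requires a length
/// that is a multiple of 4 samples.
fn pack_plane(samples: &[u16]) -> Vec<u8> {
    assert!(samples.len() % 4 == 0, "length must be a multiple of 4");
    let mut out = Vec::with_capacity(samples.len() / 4 * 5);
    for c in samples.chunks_exact(4) {
        out.extend_from_slice(&pack4([c[0], c[1], c[2], c[3]]));
    }
    out
}

/// Unpack a plane produced by `pack_plane`.
fn unpack_plane(packed: &[u8]) -> Vec<u16> {
    assert!(packed.len() % 5 == 0, "length must be a multiple of 5");
    let mut out = Vec::with_capacity(packed.len() / 5 * 4);
    for c in packed.chunks_exact(5) {
        out.extend_from_slice(&unpack4([c[0], c[1], c[2], c[3], c[4]]));
    }
    out
}
```

With this layout, 16 samples take 32 bytes as 16-bit words but 20 bytes packed, which is the 37.5% reduction quoted above.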
Av1an, on the other hand, relies on a Python -> VapourSynth -> FFmpeg -> Encoder chain, which means multiple pipe/subprocess calls with serialization overhead. It must also parse and execute .vpy scripts.
The whole overhead can be summed up as:
- Python interpreter startup
- VapourSynth initialization
- FFmpeg subprocess spawning
- Multiple encoder process creation
- Python objects <-> VapourSynth frames
- FFmpeg -> VapourSynth -> Encoder pipes and inter-process communication between them. Say you use 32 workers: that means 32 independent FFmpeg instances, 32 VapourSynth instances and 32 encoder instances (96 processes communicating with each other and driving up memory usage).
- If you add TQ into the equation, separate decoding/seeking and VapourSynth based metrics create significant extra overhead.
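For contrast with the per-worker process fan-out above, the single-process layout described earlier (one decoder thread feeding several encoder threads through bounded channels) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the standard library's `std::sync::mpsc::sync_channel` rather than crossbeam, gives each worker its own bounded queue with round-robin dispatch, and replaces decoding and encoding with placeholders. `Chunk` and `run_pipeline` are hypothetical names, not xav's API.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::thread;

/// Hypothetical stand-in for a chunk of decoded frames.
struct Chunk {
    index: usize,
    frames: Vec<u8>,
}

/// One producer ("decoder") feeds `workers` consumer threads ("encoders").
/// `capacity` bounds each worker queue, so the producer blocks instead of
/// buffering unbounded data (backpressure).
/// Returns the "encoded" size of every chunk, in chunk order.
fn run_pipeline(chunks: usize, workers: usize, capacity: usize) -> Vec<usize> {
    assert!(workers > 0, "at least one worker is required");
    // Results flow back over an unbounded multi-producer, single-consumer channel.
    let (result_tx, result_rx) = channel::<(usize, usize)>();
    let mut senders = Vec::new();
    let mut handles = Vec::new();

    for _ in 0..workers {
        let (tx, rx) = sync_channel::<Chunk>(capacity);
        senders.push(tx);
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || {
            // Each worker owns its chunks; no memory is shared between workers.
            for chunk in rx {
                let encoded_len = chunk.frames.len(); // placeholder for encoding
                result_tx.send((chunk.index, encoded_len)).unwrap();
            }
        }));
    }
    drop(result_tx); // only the workers hold result senders now

    // Single producer: sequential "decode", round-robin dispatch.
    for index in 0..chunks {
        let chunk = Chunk { index, frames: vec![0u8; 16 * (index + 1)] };
        senders[index % workers].send(chunk).unwrap(); // blocks when the queue is full
    }
    drop(senders); // closing the queues lets the workers exit their loops

    for handle in handles {
        handle.join().unwrap();
    }
    let mut results: Vec<(usize, usize)> = result_rx.iter().collect();
    results.sort_by_key(|r| r.0);
    results.into_iter().map(|r| r.1).collect()
}
```

Calling `run_pipeline(5, 2, 1)` runs five placeholder chunks through two worker threads, each with a queue depth of one. Everything stays inside one process, so the only data that crosses thread boundaries is the owned `Chunk` value itself.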
Run the build_all_static.sh script to build the dependencies statically and build the main tool against them. This is the intended way to get maximum performance, though it is not trivial.
For dynamic builds, you need ffmpegsource (ffms2) installed on your system and need to run build_dynamic.sh.
For TQ support, you need zimg, ffms2 and vship.
NOTE: Building this tool statically requires static libraries on your system for the C library (glibc), the C++ library (libstdc++), llvm-libunwind and compiler-rt. They are usually found with -static, -dev or -git suffixes in package managers. Some package managers do not provide them; in that case, they need to be compiled manually.
Rust Nightly is also needed for -Z based optimizations.
NOTE: The tool is still in pre-beta. Even though it works, static building in particular has complexities that are hard to handle universally. I will provide architecture-specific optimized builds soon, with and without TQ support.
i.mp4
- SVT-AV1 / SVT-AV1-HDR / SVT-AV1-PSYEX
- FFMS2
- ZIMG (for RGB conversion needed by VSHIP CVVDP computation)
- VSHIP
- CVVDP (re-implemented by VSHIP)
Huge thanks to Soda for the tremendous help & motivation & support to build this tool, and more importantly, for his friendship along the way. He is the partner in crime.
Also thanks to Lumen for her great contributions on GPU based, accessible, state-of-the-art metric implementations and for general help around the tooling.