

(Research-only Tool) eXtreme AOMedia Video: The Most Efficient Chunked or Target Quality AV1/AV2 Encoding Framework


emrakyz/xav

xav - eXtreme AOMedia Video

(Research-only Tool) The Most Efficient Chunked or Target Quality AV1/AV2 Encoding Framework

Table of Contents

  1. Dependencies
  2. Description
  3. Features
  4. Design Decisions
  5. Why Is It Fast and Minimal Especially Compared to Av1an
  6. Usage
  7. Building
  8. Video Showcase
  9. Credits

Dependencies

  • SVT-AV1 (mainline or a fork)
  • mkvmerge (to concatenate chunks)
  • FFMS2 (a hard dependency)
  • VSHIP (optional - needed for target quality encoding with CVVDP)
  • ZIMG (optional - provides color conversion features needed by VSHIP)

Description

xav aims to be the fastest, most minimal AV1 (and potentially AV2) encoding framework. Keeping the feature scope deliberately narrow lets it get the most out of the best available encoder and the best video quality metric, without the compromises that extensive feature support would impose.

The author has been involved with the Av1an project as a user since its inception and continues to develop it, so creating a direct competitor without purpose was not the objective. xav is a faster, more minimal alternative to Av1an's most popular features, and the author acknowledges that Av1an is the most powerful and feature-rich video encoding framework. This tool was developed with a strong interest in, and focus on, the "Av1an" concept.

Features

  • Parses the new progress output of SVT-AV1 encoders (see the example in the video below).
  • Parses color and video metadata (container and frame based) and passes it to the encoder automatically, including HDR metadata, FPS and resolution. Dolby Vision RPU automation for chunking is being considered.
  • Offers fun process monitoring with almost no overhead for the indexing, SCD, encoding and TQ processes.
  • Fastest chunked encoding with SVT-AV1.
  • Fastest target quality encoding with CVVDP.

Design Decisions

  • Uses only absolute bleeding-edge tools with an opinionated setup.
  • No flexibility or extensive feature support (such as VapourSynth filtering, zoning, different encoders, metrics or statistical pooling for TQ).
  • yuv420p10le only. No 8-bit or 12-bit support, and no yuv422 or yuv444 support.

Why Is It Fast and Minimal Especially Compared to Av1an

  • Uses a direct memory pipeline (zero external process overhead). Everything runs within one Rust process with direct memory access.
  • Direct C FFI bindings to FFMS2. FFMS2 is currently the most efficient library to open, index and decode videos. This also removes the Python, VapourSynth and FFmpeg dependencies.
  • Frames flow directly from decoder -> memory buffers -> encoder stdin via pipes.
  • Uses zero-copy frame handling.
  • If the input is 10-bit, custom 4-pixel-to-5-byte packing reduces memory use by 37.5% (5 bytes instead of the 8 bytes needed to hold four samples at 16 bits each). The bit-packing overhead is effectively zero.
  • If the input is 8-bit, the chunk can be stored in memory as 8-bit, reducing memory use by almost 50%.
  • Conversion to 10-bit is done on demand, only when needed.
  • Uses contiguous YUV420 layout optimized for cache locality.
  • The producer-consumer pipeline is lockless.
  • A single thread extracts frames using FFMS2 -> multiple encoder threads process chunks in parallel -> lockless MPSC crossbeam channel communication with backpressure.
  • There is no thread contention: a single decoder eliminates seeking conflicts.
  • Bounded channels prevent memory explosion.
  • Workers operate on independent memory regions.
  • All components share the same address space.
  • The OS can schedule the threads of a single process more easily.
  • Minimal data movement between processing stages.
  • Sequential memory access.
  • Only a single index is needed for both SCD and encoding.
  • No interpreter overhead.
  • TQ: Frames that are already in memory for encoding are reused directly for metric comparison by calling the vship API, instead of going through VapourSynth-based CVVDP with its inefficient seeking, decoding and computing.
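
The 4-pixel-to-5-byte packing mentioned in the list above can be illustrated with a minimal sketch. The bit layout here (first sample in the lowest 10 bits, little-endian bytes) and all function names are assumptions made for illustration; the README does not state which layout xav actually uses. The point is only that four 10-bit samples fit in 40 bits, so 5 bytes replace the 8 bytes that 16-bit storage would need.

```rust
/// Packs four 10-bit samples into five bytes.
/// Assumed layout: p[0] in the lowest 10 bits, bytes in little-endian order.
pub fn pack4(p: [u16; 4]) -> [u8; 5] {
    // Mask each sample to 10 bits and place it in its slot of a 40-bit value.
    let v = (u64::from(p[0]) & 0x3FF)
        | ((u64::from(p[1]) & 0x3FF) << 10)
        | ((u64::from(p[2]) & 0x3FF) << 20)
        | ((u64::from(p[3]) & 0x3FF) << 30);
    let b = v.to_le_bytes();
    [b[0], b[1], b[2], b[3], b[4]]
}

/// Inverse of `pack4`: recovers the four 10-bit samples.
pub fn unpack4(b: [u8; 5]) -> [u16; 4] {
    let v = u64::from_le_bytes([b[0], b[1], b[2], b[3], b[4], 0, 0, 0]);
    [
        (v & 0x3FF) as u16,
        ((v >> 10) & 0x3FF) as u16,
        ((v >> 20) & 0x3FF) as u16,
        ((v >> 30) & 0x3FF) as u16,
    ]
}

/// Packs a whole plane of 10-bit samples; the length must be a multiple of 4.
pub fn pack_plane(samples: &[u16]) -> Vec<u8> {
    assert!(samples.len() % 4 == 0, "sample count must be a multiple of 4");
    let mut out = Vec::with_capacity(samples.len() / 4 * 5);
    for quad in samples.chunks_exact(4) {
        out.extend_from_slice(&pack4([quad[0], quad[1], quad[2], quad[3]]));
    }
    out
}

/// Number of samples in one YUV 4:2:0 frame:
/// a full-size luma plane plus two quarter-size chroma planes.
pub fn yuv420_samples(width: usize, height: usize) -> usize {
    width * height * 3 / 2
}
```

For a 1920x1080 frame, `yuv420_samples` gives 3,110,400 samples: 6,220,800 bytes at 16 bits per sample versus 3,888,000 bytes packed, which is the 37.5% reduction quoted above.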

Av1an, on the other hand, relies on a Python -> VapourSynth -> FFmpeg -> Encoder chain, which means multiple pipe and subprocess calls with serialization overhead. It must also parse and execute .vpy scripts. The overall overhead can be summed up as:

  • Python interpreter startup
  • VapourSynth initialization
  • FFmpeg subprocess spawning
  • Multiple encoder process creation
  • Python objects <-> VapourSynth frames
  • FFmpeg -> VapourSynth -> Encoder pipes and the inter-process communication between them. With 32 workers, for example, this means 32 independent FFmpeg instances, 32 VapourSynth instances and 32 encoder instances: 96 processes communicating with each other and inflating memory use.
  • With TQ added, separate decoding and seeking plus VapourSynth-based metrics create significant extra overhead.
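
In contrast to the process-per-worker model above, the single-process design described earlier (one decoder thread, several encoder threads, bounded channels for backpressure) can be sketched roughly as follows. This is not xav's code: it substitutes the standard library's `sync_channel` for crossbeam, hands chunks to workers round-robin, and uses a hypothetical `Chunk` type and a stand-in `encode_chunk` function that only counts bytes.

```rust
use std::sync::mpsc::{channel, sync_channel};
use std::thread;

/// A chunk of decoded frames (hypothetical type for this sketch).
pub struct Chunk {
    pub index: usize,
    pub frames: Vec<Vec<u8>>,
}

/// Stand-in for a real encoder: returns the number of bytes it was given.
fn encode_chunk(chunk: &Chunk) -> usize {
    chunk.frames.iter().map(|f| f.len()).sum()
}

/// Runs one decoder thread feeding `workers` encoder threads.
/// Returns (chunk index, "encoded" size) pairs sorted by chunk index.
pub fn run_pipeline(
    chunks: usize,
    frames_per_chunk: usize,
    frame_bytes: usize,
    workers: usize,
) -> Vec<(usize, usize)> {
    assert!(workers > 0, "at least one worker is required");
    // Unbounded results channel: many workers send, one collector receives.
    let (result_tx, result_rx) = channel::<(usize, usize)>();
    let mut chunk_txs = Vec::with_capacity(workers);
    let mut handles = Vec::with_capacity(workers);

    for _ in 0..workers {
        // Capacity 1: the decoder blocks while a worker already has a chunk
        // queued, which bounds how many chunks are held in memory.
        let (tx, rx) = sync_channel::<Chunk>(1);
        chunk_txs.push(tx);
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || {
            // The loop ends when the decoder drops its sender.
            for chunk in rx {
                let size = encode_chunk(&chunk);
                result_tx.send((chunk.index, size)).expect("collector hung up");
            }
        }));
    }
    // Drop the original sender so the results channel closes once all workers finish.
    drop(result_tx);

    // Single decoder thread: produces chunks in order, so no seeking conflicts.
    let decoder = thread::spawn(move || {
        for index in 0..chunks {
            let frames = vec![vec![0u8; frame_bytes]; frames_per_chunk];
            chunk_txs[index % workers]
                .send(Chunk { index, frames })
                .expect("worker hung up");
        }
        // `chunk_txs` is dropped here, closing every worker's channel.
    });

    let mut results: Vec<(usize, usize)> = result_rx.iter().collect();
    decoder.join().expect("decoder panicked");
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    results.sort_by_key(|r| r.0);
    results
}
```

Calling `run_pipeline(5, 3, 10, 2)` decodes five chunks of three 10-byte frames and spreads them over two workers. Because each channel holds at most one pending chunk, the decoder cannot run far ahead of the encoders, and every thread shares one address space instead of communicating across process boundaries.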

Usage

[Image: usage screenshot]

Building

Run the build_all_static.sh script to build the dependencies statically and build the main tool against them. This is the intended way to get maximum performance, although it is not trivial.

For dynamic builds, you need ffmpegsource (ffms2) installed on your system and need to run build_dynamic.sh.

For TQ support, you need zimg, ffms2 and vship.

NOTE: Building this tool statically requires static libraries on your system for the C library (glibc), the C++ library (libstdc++), llvm-libunwind and compiler-rt. They are usually found with -static, -dev or -git suffixes in package managers. Some package managers do not provide them; in that case, they need to be compiled manually.

Rust Nightly is also needed for -Z based optimizations.

NOTE: The tool is still in pre-beta. It works, but static building in particular has complexities that are hard to handle universally. I will provide arch-specific optimized builds soon, with and without TQ support.

Video Showcase

[Video: i.mp4]

Software Used by This Project

Credits

Huge thanks to Soda for the tremendous help, motivation and support in building this tool and, more importantly, for his friendship along the way. He is the partner in crime.

Thanks also to Lumen for her great contributions to accessible, GPU-based, state-of-the-art metric implementations and for general help around the tooling.
