<!-- go/cmark -->
<!--* freshness: {owner: 'brandtr' reviewed: '2021-04-15'} *-->

# Video coding in WebRTC

## Introduction to layered video coding

[Video coding][video-coding-wiki] is the process of encoding a stream of
uncompressed video frames into a compressed bitstream, whose bitrate is lower
than that of the original stream.

### Block-based hybrid video coding

All video codecs in WebRTC are based on the block-based hybrid video coding
paradigm. The encoder predicts the original video frame using either
[information from previously encoded frames][motion-compensation-wiki] or
information from previously encoded portions of the current frame, subtracts
the prediction from the original video, and applies a
[transform][transform-coding-wiki] and [quantization][quantization-wiki] to the
resulting difference. The output of the quantization process, the quantized
transform coefficients, is losslessly [entropy coded][entropy-coding-wiki] along
with other encoder parameters (e.g., those related to the prediction process).
A reconstruction is then built by inverse quantizing and inverse transforming
the quantized transform coefficients and adding the result to the prediction.
Finally, in-loop filtering is applied and the resulting reconstruction is stored
as a reference frame to be used to develop predictions for future frames.

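
The predict, subtract, quantize and reconstruct steps can be illustrated with a
toy sketch. It is not how any codec in WebRTC is implemented: the transform,
entropy coding and in-loop filtering are left out, the block is a flat list of
four samples, and the names `encode_block`, `reconstruct_block` and `step` are
made up for this example.

```python
def encode_block(original, prediction, step):
    """Return quantized residual levels for one block (transform omitted)."""
    residual = [o - p for o, p in zip(original, prediction)]
    # Uniform scalar quantization: the only lossy step in this sketch.
    return [round(r / step) for r in residual]


def reconstruct_block(levels, prediction, step):
    """Inverse-quantize the levels and add them back to the prediction."""
    return [p + level * step for p, level in zip(prediction, levels)]


original = [52, 55, 61, 66]
prediction = [50, 54, 60, 70]  # e.g. taken from a previously reconstructed frame
step = 4                       # assumed quantization step size

levels = encode_block(original, prediction, step)
reconstruction = reconstruct_block(levels, prediction, step)

# The reconstruction error per sample never exceeds half the step size.
max_error = max(abs(o - r) for o, r in zip(original, reconstruction))
```

Because the encoder runs the same reconstruction as the decoder, both sides
predict future frames from identical reference data even though quantization
is lossy.
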
### Frame types

When an encoded frame depends on previously encoded frames (i.e., it has one or
more inter-frame dependencies), the prior frames must be available at the
receiver before the current frame can be decoded. In order for a receiver to
start decoding an encoded bitstream, a frame which has no prior dependencies is
required. Such a frame is called a "key frame". For real-time-communications
encoding, key frames typically compress less efficiently than "delta frames"
(i.e., frames whose predictions are derived from previously encoded frames).

### Single-layer coding

In 1:1 calls, the encoded bitstream has a single recipient. Using end-to-end
bandwidth estimation, the target bitrate can thus be well tailored for the
intended recipient. The number of key frames can be kept to a minimum and the
compressibility of the stream can be maximized. One way of achieving this is by
using "single-layer coding", where each delta frame only depends on the frame
that was most recently encoded.

### Scalable video coding

In multiway conferences, on the other hand, the encoded bitstream has multiple
recipients, each of whom may have a different downlink bandwidth. In order to
tailor the encoded bitstreams to a heterogeneous network of receivers,
[scalable video coding][svc-wiki] can be used. The idea is to introduce
structure into the dependency graph of the encoded bitstream, such that _layers_
of the full stream can be decoded using only available lower layers. This
structure allows for a [selective forwarding unit][sfu-webrtc-glossary] to
discard upper layers of the bitstream in order to achieve the intended downlink
bandwidth.

There are multiple types of scalability:

* _Temporal scalability_ provides layers whose framerate (and bitrate) is lower
  than that of the upper layer(s)
* _Spatial scalability_ provides layers whose resolution (and bitrate) is lower
  than that of the upper layer(s)
* _Quality scalability_ provides layers whose bitrate is lower than that of the
  upper layer(s)

WebRTC supports temporal scalability for `VP8`, `VP9` and `AV1`, and spatial
scalability for `VP9` and `AV1`.

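
As a rough illustration of temporal scalability, the sketch below assigns
frames to three temporal layers and shows what a selective forwarding unit
could keep when it caps the stream at a given layer. The 4-frame pattern and
the reference offsets are assumptions chosen for this example, and
`temporal_id`, `reference` and `forward` are hypothetical helper names, not
WebRTC APIs.

```python
# Assumed 4-frame pattern for three temporal layers: T0, T2, T1, T2.
TEMPORAL_IDS = [0, 2, 1, 2]
# Assumed distance (in frames) back to the single reference of each position.
REFERENCE_OFFSETS = [4, 1, 2, 1]


def temporal_id(frame_index):
    """Temporal layer of a frame under the assumed pattern."""
    return TEMPORAL_IDS[frame_index % 4]


def reference(frame_index):
    """Index of the frame this one predicts from, or None for the key frame."""
    if frame_index == 0:
        return None
    return frame_index - REFERENCE_OFFSETS[frame_index % 4]


def forward(frame_indices, max_temporal_id):
    """Frames kept when the upper temporal layers are discarded."""
    return [i for i in frame_indices if temporal_id(i) <= max_temporal_id]


half_rate = forward(range(8), 1)      # drops T2: every second frame remains
quarter_rate = forward(range(8), 0)   # drops T1 and T2: every fourth frame remains
```

In this pattern every frame references a frame in the same or a lower layer,
so whatever subset `forward` keeps is still decodable on its own.
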
### Simulcast

Simulcast is another approach for multiway conferencing, where multiple
_independent_ bitstreams are produced by the encoder.

In cases where multiple encodings of the same source are required (e.g., uplink
transmission in a multiway call), spatial scalability with inter-layer
prediction generally offers superior coding efficiency compared with simulcast.
When a single encoding is required (e.g., downlink transmission in any call),
simulcast generally provides better coding efficiency for the upper spatial
layers. The `K-SVC` concept can be seen as a compromise between full spatial
scalability and simulcast: spatial inter-layer dependencies are only used to
encode key frames, for which inter-layer prediction is typically significantly
more effective than it is for delta frames.

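
The difference between simulcast, full spatial scalability and `K-SVC` comes
down to which frames a given frame may predict from. The following sketch makes
that explicit under simplifying assumptions: no temporal layers, frame 0 is the
only key frame, and each layer predicts temporally from its own previous frame.
The function name `references` and the mode strings are invented for this
example.

```python
def references(mode, spatial_id, frame_index):
    """Return the (spatial_id, frame_index) pairs a frame predicts from."""
    refs = []
    is_key = frame_index == 0
    if not is_key:
        # Temporal prediction from the previous frame of the same layer.
        refs.append((spatial_id, frame_index - 1))
    if spatial_id > 0:
        # Inter-layer prediction from the next lower spatial layer.
        inter_layer = mode == "full_svc" or (mode == "k_svc" and is_key)
        if inter_layer:
            refs.append((spatial_id - 1, frame_index))
    return refs


simulcast_delta = references("simulcast", 1, 3)  # same layer only
full_svc_delta = references("full_svc", 1, 3)    # same layer and lower layer
k_svc_key = references("k_svc", 1, 0)            # lower layer, key frame only
k_svc_delta = references("k_svc", 1, 3)          # same layer only
```

With `K-SVC`, delta frames of an upper layer look like simulcast frames, so
after the key frame a receiver of that layer no longer needs the lower layers.
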
## Overview of implementation in `modules/video_coding`

Given the general introduction to video coding above, we now describe some
specifics of the [`modules/video_coding`][modules-video-coding] folder in WebRTC.

### Built-in software codecs in [`modules/video_coding/codecs`][modules-video-coding-codecs]

This folder contains WebRTC-specific classes that wrap software codec
implementations for different video coding standards:

* [libaom][libaom-src] for [AV1][av1-spec]
* [libvpx][libvpx-src] for [VP8][vp8-spec] and [VP9][vp9-spec]
* [OpenH264][openh264-src] for [H.264 constrained baseline profile][h264-spec]

Users of the library can also inject their own codecs, using the
[VideoEncoderFactory][video-encoder-factory-interface] and
[VideoDecoderFactory][video-decoder-factory-interface] interfaces. This is how
platform-supported codecs, such as hardware-backed codecs, are implemented.

### Video codec test framework in [`modules/video_coding/codecs/test`][modules-video-coding-codecs-test]

This folder contains a test framework that can be used to evaluate video quality
performance of different video codec implementations.

### SVC helper classes in [`modules/video_coding/svc`][modules-video-coding-svc]

* [`ScalabilityStructure*`][scalabilitystructure] - different
  [standardized scalability structures][scalability-structure-spec]
* [`ScalableVideoController`][scalablevideocontroller] - instructs the video
  encoder on how to create a scalable stream
* [`SvcRateAllocator`][svcrateallocator] - bitrate allocation to different
  spatial and temporal layers

### Utility classes in [`modules/video_coding/utility`][modules-video-coding-utility]

* [`FrameDropper`][framedropper] - drops incoming frames when the encoder
  systematically overshoots its target bitrate
* [`FramerateController`][frameratecontroller] - drops incoming frames to
  achieve a target framerate
* [`QpParser`][qpparser] - parses the quantization parameter from a bitstream
* [`QualityScaler`][qualityscaler] - signals when an encoder generates encoded
  frames whose quantization parameter is outside the window of acceptable values
* [`SimulcastRateAllocator`][simulcastrateallocator] - bitrate allocation to
  simulcast layers

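
To give a feel for what dropping incoming frames to achieve a target framerate
can look like, here is a minimal timestamp-based sketch. It is an assumption
made for illustration and does not reproduce the algorithm in
`FramerateController`; the class `SimpleFramerateLimiter` and its method
`should_drop` are hypothetical.

```python
class SimpleFramerateLimiter:
    """Hypothetical helper: drops frames that arrive faster than target_fps."""

    def __init__(self, target_fps):
        self.min_interval_ms = 1000.0 / target_fps
        self.next_allowed_ms = None  # no frame has been kept yet

    def should_drop(self, timestamp_ms):
        """Return True if the frame arrives before the next allowed time."""
        if self.next_allowed_ms is not None and timestamp_ms < self.next_allowed_ms:
            return True
        # Keep this frame and schedule the earliest time for the next one.
        self.next_allowed_ms = timestamp_ms + self.min_interval_ms
        return False


limiter = SimpleFramerateLimiter(target_fps=10)
timestamps_ms = [0, 50, 100, 150, 200, 250]  # a 20 fps input
kept = [t for t in timestamps_ms if not limiter.should_drop(t)]
```

Here every second frame of the 20 fps input is dropped, leaving a 10 fps
stream for the encoder.
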
### General helper classes in [`modules/video_coding`][modules-video-coding]

* [`FecControllerDefault`][feccontrollerdefault] - provides a default
  implementation for rate allocation to [forward error correction][fec-wiki]
* [`VideoCodecInitializer`][videocodecinitializer] - converts between different
  encoder configuration structs

### Receiver buffer classes in [`modules/video_coding`][modules-video-coding]

* [`PacketBuffer`][packetbuffer] - (re-)combines RTP packets into frames
* [`RtpFrameReferenceFinder`][rtpframereferencefinder] - determines dependencies
  between frames based on information in the RTP header, payload header and RTP
  extensions
* [`FrameBuffer`][framebuffer] - orders frames based on their dependencies to be
  fed to the decoder

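
The idea of ordering frames by their dependencies before decoding can be
sketched as below. This is a simplified illustration under the assumption that
each frame arrives complete and with a known list of reference ids. It is not
the `FrameBuffer` implementation, and `SimpleFrameBuffer` and `insert` are
hypothetical names. In the usage lines at the end, frame 2 arrives first and is
held back until frames 0 and 1 have been released.

```python
class SimpleFrameBuffer:
    """Hypothetical buffer that releases frames once their references are decoded."""

    def __init__(self):
        self.pending = {}     # frame id -> set of frame ids it depends on
        self.decoded = set()  # ids already handed to the decoder

    def insert(self, frame_id, references):
        """Insert a complete frame; return the ids that became decodable, in order."""
        self.pending[frame_id] = set(references)
        released = []
        progress = True
        while progress:
            progress = False
            for fid in sorted(self.pending):
                # A frame is decodable once all of its references were released.
                if self.pending[fid] <= self.decoded:
                    del self.pending[fid]
                    self.decoded.add(fid)
                    released.append(fid)
                    progress = True
                    break  # restart the scan, since the pending set changed
        return released


buffer = SimpleFrameBuffer()
first = buffer.insert(2, [1])   # delta frame arrives early: nothing decodable
second = buffer.insert(0, [])   # key frame: decodable immediately
third = buffer.insert(1, [0])   # unblocks frame 1 and then frame 2
```
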
[video-coding-wiki]: https://en.wikipedia.org/wiki/Video_coding_format
[motion-compensation-wiki]: https://en.wikipedia.org/wiki/Motion_compensation
[transform-coding-wiki]: https://en.wikipedia.org/wiki/Transform_coding
[motion-vector-wiki]: https://en.wikipedia.org/wiki/Motion_vector
[mpeg-wiki]: https://en.wikipedia.org/wiki/Moving_Picture_Experts_Group
[svc-wiki]: https://en.wikipedia.org/wiki/Scalable_Video_Coding
[sfu-webrtc-glossary]: https://webrtcglossary.com/sfu/
[libvpx-src]: https://chromium.googlesource.com/webm/libvpx/
[libaom-src]: https://aomedia.googlesource.com/aom/
[openh264-src]: https://github.com/cisco/openh264
[vp8-spec]: https://tools.ietf.org/html/rfc6386
[vp9-spec]: https://storage.googleapis.com/downloads.webmproject.org/docs/vp9/vp9-bitstream-specification-v0.6-20160331-draft.pdf
[av1-spec]: https://aomediacodec.github.io/av1-spec/
[h264-spec]: https://www.itu.int/rec/T-REC-H.264-201906-I/en
[video-encoder-factory-interface]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/api/video_codecs/video_encoder_factory.h;l=27;drc=afadfb24a5e608da6ae102b20b0add53a083dcf3
[video-decoder-factory-interface]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/api/video_codecs/video_decoder_factory.h;l=27;drc=49c293f03d8f593aa3aca282577fcb14daa63207
[scalability-structure-spec]: https://w3c.github.io/webrtc-svc/#scalabilitymodes*
[fec-wiki]: https://en.wikipedia.org/wiki/Error_correction_code#Forward_error_correction
[entropy-coding-wiki]: https://en.wikipedia.org/wiki/Entropy_encoding
[modules-video-coding]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/
[modules-video-coding-codecs]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/codecs/
[modules-video-coding-codecs-test]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/codecs/test/
[modules-video-coding-svc]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/svc/
[modules-video-coding-utility]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/
[scalabilitystructure]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/svc/create_scalability_structure.h?q=CreateScalabilityStructure
[scalablevideocontroller]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/svc/scalable_video_controller.h?q=ScalableVideoController
[svcrateallocator]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/svc/svc_rate_allocator.h?q=SvcRateAllocator
[framedropper]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/frame_dropper.h?q=FrameDropper
[frameratecontroller]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/framerate_controller.h?q=FramerateController
[qpparser]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/qp_parser.h?q=QpParser
[qualityscaler]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/quality_scaler.h?q=QualityScaler
[simulcastrateallocator]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/utility/simulcast_rate_allocator.h?q=SimulcastRateAllocator
[feccontrollerdefault]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/fec_controller_default.h?q=FecControllerDefault
[videocodecinitializer]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/include/video_codec_initializer.h?q=VideoCodecInitializer
[packetbuffer]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/packet_buffer.h?q=PacketBuffer
[rtpframereferencefinder]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/modules/video_coding/rtp_frame_reference_finder.h?q=RtpFrameReferenceFinder
[framebuffer]: https://source.chromium.org/chromium/chromium/src/+/main:third_party/webrtc/api/video/frame_buffer.h
[quantization-wiki]: https://en.wikipedia.org/wiki/Quantization_(signal_processing)