MPEG-H Inside – 2D Video – Riding the Media Bits

The video coding layer of HEVC is based on the typical “hybrid” approach (inter- and intra-picture prediction and 2D transform coding) with some key differences that enhance compression. Figure 1 gives a high-level reference diagram.

Figure 1 – HEVC encoder block diagram

Here is a list of the main technical elements and innovations.

Technology	Features
Coding Tree Units and Coding Tree Block structure	The macroblock of earlier standards is replaced by the coding tree unit (CTU), which consists of coding tree blocks (CTBs) for luma and chroma. A luma CTB can have a size of 16×16, 32×32, or 64×64 samples. CTBs are partitioned into coding blocks (CBs), signaled via a quadtree structure. A coding unit (CU) includes one luma CB, the two corresponding chroma CBs, and the associated syntax elements. Below the CU level are prediction units (PUs) and a tree of transform units (TUs). The inter-/intra-picture coding decision is made at the CU level.
Transform Units and Transform Blocks	The prediction residual is coded using block transforms. A transform unit (TU) tree structure has its root at the CU level, where the CBs may be further split into smaller transform blocks (TBs). Integer basis functions approximating the discrete cosine transform (DCT) are defined for dyadic TB sizes from 4×4 to 32×32. For the 4×4 transform of intra-picture prediction residuals, an integer transform derived from the discrete sine transform (DST) is additionally specified.
Motion compensation	Quarter-sample precision, 7-tap or 8-tap filters for interpolation of any fractional-sample positions.
Intra-picture prediction	Decoded boundary samples from adjacent blocks are used as prediction reference data for spatial prediction in prediction block (PB) regions when inter-picture prediction is not performed.
Entropy coding	Five generic binarisation schemes for symbol encoding; specification of which of these is applied to each type of syntax element. Context-adaptive binary arithmetic coding (CABAC) used for entropy coding.
In-loop filtering	One or two filtering stages are optionally applied (within the inter-picture prediction loop) before the reconstructed picture is written into the decoded picture buffer: a deblocking filter (DBF), followed by sample adaptive offset (SAO) filtering.
Slices, tiles and wavefronts	A slice is a series of CTUs that can be decoded independently from other slices of the same picture (except for in-loop filtering of the edges of the slice, and except for the case of “dependent slices” as described below). To enable parallel processing and localized access to picture regions, the encoder can partition a picture into rectangular regions called tiles.
High-level syntax	Above the coding layer, many of the high-level syntax features of AVC have been retained or extended. Important elements of high-level syntax are specifications of access structures, management of coded and decoded picture buffers, signaling of video usability information (VUI) and supplemental enhancement information (SEI).
Extended format and quality ranges	4:2:0, 4:2:2 and 4:4:4 color sampling; components represented with bit depths of up to 16 bits.
Multi-view and scalable coding	The high-level syntax has sophisticated layering mechanisms that allow hierarchical bitstream structures to be established. This supports stereo/multi-view coding, where a decoder of a dependent view can refer to previously decoded pictures from another view.
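
The quadtree signaling of coding blocks described in the table can be illustrated with a short sketch. It is not the normative partitioning process: the function name `partition_ctb`, the `should_split` callback (standing in for the encoder's split decision) and the 8×8 minimum CB size used in the example are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# A block is (x, y, size) in luma samples; the names are illustrative only.
Block = Tuple[int, int, int]


def partition_ctb(
    x: int,
    y: int,
    size: int,
    min_cb_size: int,
    should_split: Callable[[int, int, int], bool],
) -> List[Block]:
    """Recursively split a square luma CTB into coding blocks (quadtree).

    `should_split` is a hypothetical stand-in for the encoder decision
    (in a real encoder this would typically be a rate-distortion choice).
    Leaves are returned in z-scan order: top-left, top-right,
    bottom-left, bottom-right.
    """
    if size > min_cb_size and should_split(x, y, size):
        half = size // 2
        blocks: List[Block] = []
        for dy in (0, half):
            for dx in (0, half):
                blocks.extend(
                    partition_ctb(x + dx, y + dy, half, min_cb_size, should_split)
                )
        return blocks
    return [(x, y, size)]


if __name__ == "__main__":
    # Toy rule: keep splitting only the block that touches the CTB origin.
    cbs = partition_ctb(0, 0, 64, 8, lambda x, y, size: x == 0 and y == 0)
    for cb in cbs:
        print(cb)
```

With this toy rule a 64×64 CTB ends up as four 8×8, three 16×16 and three 32×32 coding blocks, which together still cover the whole CTB. Each split decision corresponds to one flag in the quadtree, which is why the partitioning can be signaled compactly.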

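To make the phrase “integer basis functions approximating the DCT” more concrete, the following sketch compares a 4×4 integer matrix with a scaled floating-point DCT-II basis. The matrix values are quoted from memory as the HEVC 4×4 core transform and should be checked against the specification; the function names and the scale factor of 128 are assumptions of this sketch, and the scaling and rounding stages of a real codec are left out.

```python
import math
from typing import List

# Assumed 4x4 HEVC core transform matrix (verify against the specification).
HEVC_DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def dct_basis(n: int) -> List[List[float]]:
    """Orthonormal DCT-II basis, one basis function per row."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def max_deviation(scale: float = 128.0) -> float:
    """Largest absolute gap between the integer matrix and the scaled DCT."""
    ref = dct_basis(4)
    return max(
        abs(HEVC_DCT4[k][i] - scale * ref[k][i])
        for k in range(4)
        for i in range(4)
    )


def forward_1d(residual: List[int]) -> List[int]:
    """One-dimensional forward transform of four residual samples (no scaling)."""
    return [sum(c * r for c, r in zip(row, residual)) for row in HEVC_DCT4]


if __name__ == "__main__":
    print(max_deviation())           # small compared with the 64..83 magnitudes
    print(forward_1d([1, 1, 1, 1]))  # a flat residual has only a DC coefficient
```

Because the matrix contains only integers, encoder and decoder can compute bit-identical results, which a floating-point DCT would not guarantee across implementations.
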
The performance of HEVC was verified on a range of video sequences not used during the development of the standard. For all of them, bit rate savings of more than 50 % were observed, with an average of 60 % in the high quality range.