Riding the Media Bits

Last update: 2014/08/12



Inside MPEG-4 - Part B


An overview of the technical content of MPEG-4 Visual and Audio components.

MPEG-4 Visual provides a coding algorithm for natural video that can operate from 5 kbit/s at a spatial resolution of QCIF (144x176 pixels) up to bitrates of some Mbit/s for ITU-R 601 resolution pictures (288x720 @ 50 Hz and 240x720 @ 59.94 Hz). Additionally, the Studio Profile addresses an operating range in excess of 1 Gbit/s. MPEG-4 Visual is ITU-T H.263 compatible in the sense that a basic H.263 bitstream is correctly decoded by an MPEG-4 Video decoder.

As mentioned before, MPEG-4 Video supports conventional rectangular images and video (upper portion of Fig. 1 below) as well as images and video of arbitrary shape (lower portion of figure).

Fig. 1 - The MPEG-4 Video Core and the Generic MPEG-4 Coder

The coding of conventional images and video is similar to conventional MPEG-1/2 coding. It involves motion prediction/compensation followed by texture coding. For content-based functionalities, where the input image sequence may be of arbitrary shape and location, shape and transparency information is coded as well. Shape may be represented either by an 8-bit transparency component - which allows the description of transparency if one Video Object (VO) is composed with other objects - or by a binary mask.
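
The two shape representations can be illustrated with a minimal sketch. It assumes a conventional alpha blend on single-component pictures stored as lists of rows; the function names `composite` and `binary_to_alpha` are hypothetical, and the blend shown is a common convention, not the composition rule of the standard.

```python
def composite(background, obj, alpha):
    """Blend an object over a background using an 8-bit transparency plane.

    All three arguments are equally sized 2-D lists of integers in 0..255.
    An alpha of 255 means the object is fully opaque, 0 fully transparent.
    This is an illustrative blend, not taken from the MPEG-4 specification.
    """
    out = []
    for bg_row, obj_row, a_row in zip(background, obj, alpha):
        out.append([
            (a * o + (255 - a) * b + 127) // 255  # rounded integer blend
            for b, o, a in zip(bg_row, obj_row, a_row)
        ])
    return out


def binary_to_alpha(mask):
    """Turn a binary shape mask (0 or 1) into an equivalent 8-bit plane."""
    return [[255 if m else 0 for m in row] for row in mask]


background = [[10, 10, 10]]
video_object = [[200, 200, 200]]

# 8-bit transparency: opaque, half transparent, fully transparent
print(composite(background, video_object, [[255, 128, 0]]))

# Binary mask: a sample either belongs to the object or it does not
print(composite(background, video_object, binary_to_alpha([[1, 0, 1]])))
```

A binary mask is the special case in which every sample is either fully inside or fully outside the object, which is why it maps onto the values 255 and 0 here.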

The basic coding structure is represented in the figure below. This involves shape coding (for arbitrarily shaped VOs) and motion compensation as well as DCT-based texture coding (using standard 8x8 DCT or shape adaptive DCT).

Fig. 2 - The MPEG-4 Video coding scheme

If a-priori knowledge of the scene is exploited, MPEG-4 Video can offer unexpectedly high compression ratios. In Fig. 3, coding the top left picture would require a considerable amount of information but, if it is possible to separate the background from the sprite (top right), the picture below can be coded with relatively few bit/s.

Fig. 3 - Background and sprites in MPEG-4 Video

MPEG-4 Audio provides complete coverage of the bitrate range of 2 to 64 kbit/s. Good-quality coded speech is already obtained at 2 kbit/s, and transparent quality for monophonic music sampled at 48 kHz and 16 bits/sample is obtained at 64 kbit/s. Three classes of algorithms are used in the standard. The first covers the low bitrate range and has been designed to encode speech. The second can be used in the midrange to encode both speech and music. The third covers the high bitrate range and can be used for any audio signal.

MPEG-4 Audio contains a large set of coding tools from which several audio and speech coding algorithms can be constructed:

  • MPEG-4 AAC, an extension of MPEG-2 AAC
  • Twin Vector Quantisation (VQ)
  • Speech coding based on Code Excited Linear Predictive (CELP) coding and on Parametric representation
  • Various uses of the Spectral Band Replication (SBR) technology to provide high quality music at ever lower bitrates, such as High Efficiency AAC (HE AAC)
  • Various forms of audio lossless coding.

MPEG-4 AAC is MPEG-2 AAC with the addition of one tool: Perceptual Noise Substitution (PNS). This tool identifies segments of spectral coefficients that appear to be noise-like and codes them as random noise. Instead of the coefficients themselves, the bitstream indicates that PNS is used and carries the average power of the noise. A decoder uses a pseudo-random noise generator weighted by the signaled power value to reconstruct those coefficients.
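
The PNS principle can be sketched as follows. This is a simplified illustration, not the normative procedure: the function names are hypothetical, the detection of noise-like segments is left out, and the power is passed as a plain number where a real bitstream would carry a quantised value.

```python
import math
import random


def pns_encode_band(coeffs):
    """Encoder side: keep only the average power of a noise-like band."""
    return sum(c * c for c in coeffs) / len(coeffs)


def pns_decode_band(power, width, rng):
    """Decoder side: pseudo-random values scaled to the signaled power."""
    noise = [rng.uniform(-1.0, 1.0) for _ in range(width)]
    noise_power = sum(n * n for n in noise) / width
    # Scale the generated noise so that its average power matches the target
    gain = math.sqrt(power / noise_power) if noise_power > 0 else 0.0
    return [gain * n for n in noise]


band = [3.0, -4.0, 0.0, 5.0]           # coefficients judged noise-like
power = pns_encode_band(band)          # the only value that is transmitted
rebuilt = pns_decode_band(power, len(band), random.Random(0))

# The rebuilt band has different coefficients but the same average power
print(power, sum(x * x for x in rebuilt) / len(rebuilt))
```

The saving comes from sending one power value per band in place of every coefficient in it, at the cost of reproducing only the energy of the segment and not its waveform.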

Fig. 4 - The MPEG-4 AAC encoder

Spectral Band Replication (SBR) has been added to MPEG-4 Audio to provide a significant improvement in its performance. SBR consists of replicating the highband, i.e. the high frequency part of the spectrum, from the lowband. A small amount of data representing a parametric description of the highband is encoded and used in the decoding process. The data rate of this description is far below the data rate required when the highband is coded with conventional AAC.
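
A rough sketch of the idea, under strong simplifying assumptions: the spectrum is a plain list of coefficients, the lowband and highband have the same length, and the parametric description is reduced to one average energy per band. The function names are hypothetical and the actual SBR tool is considerably more elaborate than this.

```python
import math


def sbr_encode_envelope(highband, num_bands):
    """Encoder side: describe the highband by one average energy per band."""
    size = len(highband) // num_bands
    return [
        sum(x * x for x in highband[i * size:(i + 1) * size]) / size
        for i in range(num_bands)
    ]


def sbr_decode_highband(lowband, envelope):
    """Decoder side: copy the lowband upwards and rescale it band by band."""
    size = len(lowband) // len(envelope)
    out = []
    for i, target in enumerate(envelope):
        patch = lowband[i * size:(i + 1) * size]
        energy = sum(x * x for x in patch) / size
        # Gain that brings the copied patch to the transmitted band energy
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        out.extend(gain * x for x in patch)
    return out


lowband = [4.0, -2.0, 1.0, 3.0]        # coded conventionally
highband = [0.5, -0.5, 0.1, 0.2]       # not coded, only described

envelope = sbr_encode_envelope(highband, num_bands=2)
rebuilt = sbr_decode_highband(lowband, envelope)
print(envelope, rebuilt)
```

In this toy case two envelope values stand in for four highband coefficients; the rebuilt highband follows the fine structure of the lowband but matches the energy of the original highband in each band.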

Fig. 5 - The SBR operating principle