Riding the Media Bits

Last update: 2011/08/21

No more audio-video-systems triads?

 

In its first 3 steps MPEG had a tight bundling of Audio and Video with a Systems layer. But the three can have independent lives.


It has always been a point of pride for MPEG that it managed to create the MPEG-1 Audio-Video-Systems package, overcoming the informal but nonetheless effective barriers that used to separate video coding people, audio coding people and the more "engineering-minded" people that MPEG calls "systems people". This "package approach" continued with MPEG-2, MPEG-4 and MPEG-7.

The arrival of the AVC standard, which has no companion audio component and no specific systems layer but can be combined with any audio codec and carried over MPEG-2 TS and IP, together with the appearance of new systems layer technologies and audio compression ideas, led to the decision to establish 3 new "containers" for systems, video and audio standards. The 3 containers were nicknamed MPEG-B (Systems), MPEG-C (Video) and MPEG-D (Audio).

MPEG-B

Part 1 “Binary MPEG format for XML” (BiM) was originally developed as the technology to compress MPEG-7 Descriptors (Ds) and Description Schemes (DSs), then made generic and moved to MPEG-B part 1. It provides a standard set of generic technologies to transmit and compress XML documents, addressing a broad spectrum of applications and requirements. It relies on schema knowledge shared between encoder and decoder in order to reach high compression efficiency, and provides fragmentation mechanisms for ensuring transmission and processing flexibility.
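To see why shared schema knowledge helps compression, here is a minimal sketch in Python. It is not the BiM format: the schema list, the one-byte codes and the END marker are all hypothetical choices made for illustration. Because both sides already know the element names, the encoder can send a small index instead of spelling out each tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema known in advance to both encoder and decoder.
SCHEMA = ["Description", "Title", "Creator"]
END = 0xFF  # hypothetical marker that closes the current element


def encode(elem: ET.Element) -> bytes:
    """Encode an element as: tag index, text length, text, children, END.

    Toy limits: fewer than 255 schema entries, text under 256 bytes,
    no attributes, no text after child elements.
    """
    text = (elem.text or "").strip().encode("utf-8")
    out = bytes([SCHEMA.index(elem.tag), len(text)]) + text
    for child in elem:
        out += encode(child)
    return out + bytes([END])


def decode(data: bytes) -> ET.Element:
    """Rebuild the element tree using the same shared schema."""
    pos = 0

    def read() -> ET.Element:
        nonlocal pos
        elem = ET.Element(SCHEMA[data[pos]])
        length = data[pos + 1]
        text = data[pos + 2:pos + 2 + length].decode("utf-8")
        elem.text = text or None
        pos += 2 + length
        while data[pos] != END:
            elem.append(read())
        pos += 1  # skip the END marker
        return elem

    return read()


xml_text = ("<Description><Title>A short film</Title>"
            "<Creator>Someone</Creator></Description>")
blob = encode(ET.fromstring(xml_text))
restored = ET.tostring(decode(blob), encoding="unicode")
```

In this toy case `blob` is far shorter than `xml_text` because no tag name is ever transmitted, and `restored` matches the original document. The real standard goes much further, for example by exploiting the datatypes and content models declared in the schema.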

Part 2 “Fragment Request Unit” specifies a technology enabling a terminal to request XML fragments of immediate interest. This significantly reduces processing and storage requirements at the terminal and can enable applications on constrained devices that would not otherwise be possible.

Part 3 “XML Representation of IPMP-X Messages” provides an XML representation of the IPMP-X messages defined in MPEG-4 part 13 with extensions.

Part 4 “Codec Configuration Representation” provides a compressed digital representation of a video decoder and of the corresponding bitstream, assuming that the receiving terminal shares a library of video coding tools with the transmitter.

Part 5 “Bitstream Syntax Description Language” provides a normative grammar to describe, in XML, the high-level syntax of a bitstream. The resulting XML document is called a Bitstream Syntax Description (BSD). The BSD does not replace the original binary format and, in most cases, it does not describe the bitstream on a bit-by-bit basis, but rather its high-level structure, e.g., how the bitstream is organized in layers or packets of data. The BSD is itself scalable, i.e. it may describe the bitstream at different syntactic layers (e.g., finer or coarser levels of detail), depending on the application.
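The idea of a high-level XML description of a bitstream can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the toy packet format (a 1-byte layer id and a 2-byte length followed by the payload) and the element and attribute names are hypothetical and are not taken from the BSDL schema. The description records where packets and payloads sit, without copying payload bits, and the `fine` flag switches between a coarser and a finer level of detail.

```python
import struct
import xml.etree.ElementTree as ET

# Toy header: 1-byte layer id, 2-byte big-endian payload length.
HEADER = struct.Struct(">BH")


def make_packet(layer: int, payload: bytes) -> bytes:
    """Build one packet of the hypothetical toy bitstream."""
    return HEADER.pack(layer, len(payload)) + payload


def describe(bitstream: bytes, fine: bool = True) -> ET.Element:
    """Return an XML description of the packet structure.

    Payloads are referenced by offset and length, never copied.
    With fine=False only the packet boundaries are listed.
    """
    root = ET.Element("bitstream", size=str(len(bitstream)))
    pos = 0
    while pos < len(bitstream):
        layer, length = HEADER.unpack_from(bitstream, pos)
        packet = ET.SubElement(root, "packet",
                               layer=str(layer), offset=str(pos))
        if fine:
            ET.SubElement(packet, "payload",
                          offset=str(pos + HEADER.size), length=str(length))
        pos += HEADER.size + length
    return root


stream = (make_packet(0, b"base") + make_packet(1, b"enh1")
          + make_packet(0, b"xy"))
print(ET.tostring(describe(stream), encoding="unicode"))
```

A tool that only needs to know which packets belong to which layer can work on the coarse description, while one that needs to locate payload bytes can ask for the fine one.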

MPEG-C

Part 1 “Accuracy specification for implementation of integer-output IDCT” specifies IDCT accuracy requirements that are equivalent to, or extend, those of the IEEE 1180 standard, which has been withdrawn.

Part 2 “Fixed-point 8x8 inverse discrete cosine transform and discrete cosine transform” specifies a particular fixed-point approximation to the ideal 8x8 IDCT and DCT function, fulfilling the 8x8 IDCT conformance requirements for the MPEG-1, MPEG-2 and MPEG-4 part 2 video coding standards.
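What an accuracy requirement on an integer-output IDCT means in practice can be sketched in Python as follows. This is neither the normative test procedure of Part 1 nor the algorithm of Part 2: the 13-bit scaling, the random input range and the function names are assumptions chosen for illustration. The sketch compares an ideal floating-point 8x8 IDCT, rounded to integers, with a version that uses only integer arithmetic, and reports the largest difference between the two outputs.

```python
import math
import random

N = 8
SHIFT = 13  # fractional bits of the integer basis (arbitrary for this sketch)

# BASIS[x][u] = 0.5 * C(u) * cos((2x + 1) * u * pi / 16), with C(0) = 1/sqrt(2)
BASIS = [[0.5 * (math.sqrt(0.5) if u == 0 else 1.0)
          * math.cos((2 * x + 1) * u * math.pi / 16)
          for u in range(N)] for x in range(N)]
IBASIS = [[round(b * (1 << SHIFT)) for b in row] for row in BASIS]


def idct_reference(coeffs):
    """Floating-point separable 8x8 IDCT, rounded to integers at the end."""
    rows = [[sum(BASIS[x][u] * coeffs[v][u] for u in range(N))
             for x in range(N)] for v in range(N)]
    return [[math.floor(sum(BASIS[y][v] * rows[v][x] for v in range(N)) + 0.5)
             for x in range(N)] for y in range(N)]


def idct_fixed(coeffs):
    """Same transform with integer multiplies, adds and one final shift."""
    rows = [[sum(IBASIS[x][u] * coeffs[v][u] for u in range(N))
             for x in range(N)] for v in range(N)]
    half = 1 << (2 * SHIFT - 1)  # rounding offset before the shift
    return [[(sum(IBASIS[y][v] * rows[v][x] for v in range(N)) + half)
             >> (2 * SHIFT)
             for x in range(N)] for y in range(N)]


def peak_error(num_blocks=200, seed=1):
    """Largest absolute output difference over random coefficient blocks."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(num_blocks):
        block = [[rng.randint(-256, 255) for _ in range(N)] for _ in range(N)]
        ref, fix = idct_reference(block), idct_fixed(block)
        worst = max(worst, max(abs(r - f)
                               for ref_row, fix_row in zip(ref, fix)
                               for r, f in zip(ref_row, fix_row)))
    return worst
```

An accuracy specification bounds statistics of this kind (peak error, mean error and so on) so that different decoders reconstruct nearly identical pictures and prediction drift between encoder and decoder stays small.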

Part 3 “Auxiliary Video Data Representation” specifies how auxiliary data, such as pixel-related depth or parallax values, are to be represented when encoded by MPEG video standards in the same way as ordinary picture data.

Part 4 “Video Tool Library” contains a collection of descriptions of video coding tools, called Functional Units, as referenced in MPEG-B Part 4.

MPEG-D

Part 1 “MPEG Surround” provides an efficient bridge between stereo and multichannel presentations in low-bitrate applications. The MPEG Surround technology supports very efficient parametric coding of multi-channel audio signals, so as to permit transmission of such signals over channels that typically support only transmission of stereo (or even mono) signals. Moreover, MPEG Surround provides complete backward compatibility with non-multichannel audio systems.

Part 2 “Spatial Audio Object Coding” represents several audio objects by first combining the object signals into a mono or stereo signal, whilst extracting parameters from the individual object signals based on knowledge of human perception of the sound stage. These parameters are coded as a low-bitrate side channel that the decoder uses to render an audio scene from the stereo or mono down-mix, so that aspects of the output composition can be decided at the time of decoding.
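The down-mix plus side-information principle can be sketched in a few lines of Python. This is a deliberately crude illustration and not the SAOC algorithm: it works on short time frames only (no frequency bands, no perceptual model), and the frame size, function names and energy-share parameter are assumptions made for the example. The encoder sums the objects into a mono down-mix and keeps, per frame, each object's share of the total energy. The decoder applies listener-chosen gains using only the down-mix and those shares.

```python
from typing import List, Sequence, Tuple

FRAME = 4  # samples per frame (hypothetical, tiny for readability)


def encode(objects: Sequence[Sequence[float]]
           ) -> Tuple[List[float], List[List[float]]]:
    """Return (mono down-mix, per-frame energy share of each object)."""
    n = len(objects[0])
    downmix = [sum(obj[t] for obj in objects) for t in range(n)]
    params = []
    for start in range(0, n, FRAME):
        energies = [sum(s * s for s in obj[start:start + FRAME])
                    for obj in objects]
        total = sum(energies)
        params.append([e / total if total > 0 else 0.0 for e in energies])
    return downmix, params


def render(downmix: Sequence[float], params: Sequence[Sequence[float]],
           gains: Sequence[float]) -> List[float]:
    """Re-mix at the decoder: weight each frame by the gain-scaled shares."""
    out: List[float] = []
    for frame, shares in enumerate(params):
        g = sum(gain * share for gain, share in zip(gains, shares))
        out.extend(g * s for s in downmix[frame * FRAME:(frame + 1) * FRAME])
    return out


voice = [1.0, -1.0, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
music = [0.0, 0.0, 0.0, 0.0, 0.2, 0.4, -0.2, -0.4]
downmix, params = encode([voice, music])
voice_only = render(downmix, params, gains=[1.0, 0.0])
```

Here the two objects are active in different frames, so muting the music recovers the voice exactly. When objects overlap in time, this crude scheme can only approximate the requested mix, which is one reason a real system needs a much finer time-frequency parameterisation.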

Part 3 “Unified speech and audio coding” is a standard defining a single technology that codes speech, music, and speech mixed with music, and that is consistently as good as the best of the state-of-the-art speech coders, such as Adaptive Multi Rate – WideBand plus (AMR-WB+), and the state-of-the-art music coders (HE-AAC V2) in the 24 kbit/s stereo to 12 kbit/s mono operating range.