Riding the Media Bits

Last update: 2011/08/21

No more audio-video-systems triads?

 

In its first 4 steps MPEG had a tight bundling of Audio and Video with a Systems layer. But the three can have independent lives.


It has always been a point of pride for MPEG to have created the MPEG-1 Audio-Video-Systems package, overcoming the informal but nonetheless effective barriers that used to separate video coding people, audio coding people and the more "engineering-minded" people that MPEG calls "systems people". This "package approach" continued with MPEG-2, MPEG-4 and MPEG-7.

The AVC standard arrived without an audio component and without a specific systems layer, although it can be combined with any audio codec and carried over MPEG-2 TS and IP. This, together with the appearance of new systems layer and audio compression technologies, led to the decision to establish 3 new "container standards" for systems, video and audio. The 3 containers were nicknamed MPEG-B (Systems), MPEG-C (Video) and MPEG-D (Audio).

MPEG-B

Part 1 “Binary MPEG format for XML” (BiM) was originally developed as the technology to compress MPEG-7 Descriptors (Ds) and Description Schemes (DSs). It was then converted to a generic technology and moved to MPEG-B part 1. BiM provides a standard set of generic technologies to transmit and compress XML documents, addressing a broad spectrum of applications and requirements. It relies on knowledge of the schema shared between encoder and decoder to reach high compression efficiency, and provides fragmentation mechanisms to ensure transmission and processing flexibility.
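The role of the shared schema can be illustrated with a deliberately simplified sketch. It is not the BiM coding, which is considerably more elaborate and is not reproduced here; the schema, the byte layout and the function names are all assumptions made for illustration. Because both ends know the list of element names, each name can be sent as a small index instead of as text, and each encoded element is self-delimiting, which loosely mirrors the idea of independently processable fragments.

```python
# Hypothetical toy schema known to both encoder and decoder.
# It is NOT an MPEG-7 schema; the names are illustrative only.
SCHEMA = ["Description", "Title", "Creator", "Keyword"]


def encode(events):
    """Encode a flat list of (element_name, text) pairs as bytes.

    Assumed toy layout per element: 1 byte schema index,
    1 byte text length, then the UTF-8 text.
    """
    out = bytearray()
    for name, text in events:
        data = text.encode("utf-8")
        if len(data) > 255:
            raise ValueError("toy format limits text to 255 bytes")
        out.append(SCHEMA.index(name))  # index replaces the tag name
        out.append(len(data))
        out += data
    return bytes(out)


def decode(blob):
    """Rebuild the (element_name, text) pairs using the same schema."""
    events = []
    pos = 0
    while pos < len(blob):
        name = SCHEMA[blob[pos]]
        length = blob[pos + 1]
        start = pos + 2
        events.append((name, blob[start:start + length].decode("utf-8")))
        pos = start + length
    return events
```

With the two elements `("Title", "Riding")` and `("Creator", "MPEG")`, this toy encoding takes 14 bytes, against 44 characters for the equivalent textual tags. The figure only illustrates the principle and says nothing about the efficiency of BiM itself.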

Part 2 “Fragment Request Unit” specifies a technology to enable a terminal to request XML fragments of immediate interest. This significantly reduces processing and storage requirements at the terminal and can enable applications on constrained devices that would not otherwise be possible.

Part 3 “XML Representation of IPMP-X Messages” provides an XML representation of the IPMP-X messages defined in MPEG-4 part 13 with extensions. This work was a contribution from the Digital Media Project.

Part 4 “Codec Configuration Representation” provides a compressed digital representation of a video decoder and of the corresponding bitstream, assuming that the receiving terminal shares a library of video coding tools with the transmitter.

Part 5 “Bitstream Syntax Description Language” provides a normative grammar to describe, in XML, the high-level syntax of a bitstream. The resulting XML document is called a Bitstream Syntax Description (BSD). A BSD does not replace the original binary format and, in most cases, it does not describe the bitstream on a bit-per-bit basis, but rather its high-level structure, e.g., how the bitstream is organized in layers or packets of data. A BSD is itself scalable, i.e. it may describe the bitstream at different syntactic layers (e.g., finer or coarser levels of detail), depending on the application.

Part 6 "Common Encryption Format for ISO Base Media File Format" defines a way to encrypt audio, video, etc. in files of the ISO base media file format family. To reduce or even eliminate the implementation complexity caused by having duplicate files and formats for the same content, a common encryption format is used so that a single media asset can be used by several services and devices using different digital rights management systems.

Part 7 is void.

Part 8 "Coding independent media description code points" defines various code-points and fields of a video or audio stream that are bit-rate and compression independent, to avoid the need to repeat the same data, possibly with some small changes, in different standards. The code-points describe the characteristics of the signal before compression or after decompression.

Part 9 "Common Encryption Format for MPEG-2 Transport Stream" plays a role similar to that of Part 6 when the MPEG-2 TS transport mechanism is used instead of the ISO base media file format (ISOBMFF).

MPEG-C

Part 1 “Accuracy specification for implementation of integer-output IDCT” specifies an IDCT accuracy that is equivalent to, or extends, that of the IEEE 1180 standard, which has been withdrawn by the IEEE.

Part 2 “Fixed-point 8x8 inverse discrete cosine transform and discrete cosine transform” specifies a particular fixed-point approximation to the ideal 8x8 IDCT and DCT functions, fulfilling the 8x8 IDCT conformance requirements for the MPEG-1, MPEG-2 and MPEG-4 part 2 video coding standards (MPEG-4 part 10 uses an integer transform).
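What it means to measure the accuracy of a fixed-point IDCT against the ideal one can be sketched as follows. This is a minimal illustration, not the approximation specified in Part 2 and not the test procedure of Part 1: the 13 fractional bits, the random coefficient range and all function names are assumptions, and the actual thresholds and test conditions are defined in the standards and are not reproduced here.

```python
import math
import random

N = 8
SHIFT = 13  # assumed number of fractional bits for the integer table


def _scale(u):
    return math.sqrt(0.5) if u == 0 else 1.0


# BASIS[x][u] = c(u)/2 * cos((2x + 1) * u * pi / 16)
BASIS = [[0.5 * _scale(u) * math.cos((2 * x + 1) * u * math.pi / 16)
          for u in range(N)] for x in range(N)]
# The same table rounded to integers with SHIFT fractional bits.
FIXED = [[round(b * (1 << SHIFT)) for b in row] for row in BASIS]


def idct_reference(coef):
    """Double-precision separable 8x8 IDCT with integer output."""
    rows = [[sum(BASIS[x][u] * coef[u][v] for u in range(N))
             for v in range(N)] for x in range(N)]
    return [[math.floor(sum(BASIS[y][v] * rows[x][v] for v in range(N)) + 0.5)
             for y in range(N)] for x in range(N)]


def idct_fixed(coef):
    """Integer-only 8x8 IDCT using the rounded table (toy approximation)."""
    half = 1 << (2 * SHIFT - 1)  # rounding offset before the final shift
    rows = [[sum(FIXED[x][u] * coef[u][v] for u in range(N))
             for v in range(N)] for x in range(N)]
    return [[(sum(FIXED[y][v] * rows[x][v] for v in range(N)) + half)
             >> (2 * SHIFT)
             for y in range(N)] for x in range(N)]


def measure_error(trials=200, seed=1):
    """Return (peak error, mean square error) over random coefficient blocks."""
    rng = random.Random(seed)
    peak, sq_sum, count = 0, 0, 0
    for _ in range(trials):
        coef = [[rng.randint(-256, 255) for _ in range(N)] for _ in range(N)]
        ref, fix = idct_reference(coef), idct_fixed(coef)
        for x in range(N):
            for y in range(N):
                err = abs(ref[x][y] - fix[x][y])
                peak = max(peak, err)
                sq_sum += err * err
                count += 1
    return peak, sq_sum / count
```

Calling `measure_error()` returns the largest per-pixel difference and the mean square difference between the two implementations, which is the kind of statistic an accuracy specification constrains. A real fixed-point design would also limit the width of intermediate values, which this sketch does not attempt.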

Part 3 “Auxiliary Video Data Representation” specifies how auxiliary data, such as pixel-related depth or parallax values, are to be represented when encoded by MPEG video standards in the same way as ordinary picture data.

Part 4 “Media Tool Library” contains a collection of descriptions of video and 3DG coding tools, called Functional Units, as referenced in MPEG-B Part 4.

MPEG-D

Part 1 “MPEG Surround” provides an efficient bridge between stereo and multichannel presentations in low-bitrate applications. The MPEG Surround technology supports very efficient parametric coding of multi-channel audio signals, so as to permit transmission of such signals over channels that typically support only transmission of stereo (or even mono) signals. Moreover, MPEG Surround provides complete backward compatibility with non-multichannel audio systems.

Fig. 1 shows that MPEG Surround essentially adds suitable spatial information to the output of a legacy encoder.

Fig. 1 - A model of MPEG Surround encoder-decoder

Part 2 “Spatial Audio Object Coding” represents several audio objects by first combining the object signals into a mono or stereo signal, whilst extracting parameters from the individual object signals based on knowledge of human perception of the sound stage. These parameters are coded as a low-bitrate side channel that the decoder uses to render an audio scene from the stereo or mono down-mix, so that aspects of the output composition can be decided at the time of decoding.
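The down-mix plus side-information principle can be sketched with a toy example. It is not the Spatial Audio Object Coding algorithm: it handles a single frame and a single frequency band, uses relative object power as its only parameter, and all names are hypothetical. A real system of this kind works on many time-frequency tiles, which is what lets individual objects be emphasised or attenuated.

```python
def toy_encode(objects):
    """Combine object signals into a mono down-mix plus side information.

    objects: list of equal-length lists of samples (one list per object).
    Returns (downmix, params), where params holds each object's share
    of the total power (an assumed, simplified parameter).
    """
    n = len(objects[0])
    downmix = [sum(obj[t] for obj in objects) for t in range(n)]
    powers = [sum(s * s for s in obj) for obj in objects]
    total = sum(powers) or 1.0  # avoid division by zero for silence
    params = [p / total for p in powers]
    return downmix, params


def toy_render(downmix, params, gains):
    """Render from the down-mix using per-object gains chosen at decode time.

    Each object is approximated as its power share of the down-mix, so the
    output is the down-mix scaled by the gain-weighted sum of the shares.
    """
    weight = sum(g * p for g, p in zip(gains, params))
    return [weight * s for s in downmix]
```

With all gains set to 1 the rendered output equals the down-mix, and changing the gains at the decoder changes the composition without retransmitting the object signals. In this single-band toy the change is only an overall scaling, which is the limitation that per-tile processing removes.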

Part 3 “Unified speech and audio coding” is a standard defining a single technology that codes speech, music, and speech mixed with music, and that is consistently as good as the best of the state-of-the-art speech coders, such as Adaptive Multi Rate – WideBand plus (AMR-WB+), and of the state-of-the-art music coders, such as HE-AAC V2, in the 24 kbit/s stereo to 12 kbit/s mono operating range.