Riding the Media Bits

Last update: 2011/08/21


Inside MPEG-4 - Part C

 

An overview of the other parts of the MPEG-4 standard.


Part 4 is Conformance. Part 5, called Reference Software, is an International Standard, unlike its MPEG-1 and MPEG-2 counterparts, which are Technical Reports (TR) with informational value only (see here for more about this). Part 6 “Delivery Multimedia Integration Framework” (DMIF) provides a standard interface to access various transport mechanisms, as described before. Part 7 “Optimised software for MPEG-4 tools” provides examples of reference software that implements the standard not just correctly but also in optimised form. Part 8 “4 on IP framework” complements the generic MPEG-4 RTP payload format defined by the IETF as RFC 3640 [8]. Part 9 “Reference Hardware Description” provides the reference software in Very-High-Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL), from which VLSI chips can be synthesised.

MPEG-1 and MPEG-2 assume that decoded information leaves the decoder as sequences of PCM samples, but the standards are silent on what is done with them. As described before, MPEG-4 Scene Description (Part 11), originally bundled with Part 1, provides technologies for the new functionality of “composing” different information elements in a “scene”.

The original technology is called Binary Format for MPEG-4 Scenes (BIFS), of which there exists a Java-powered version called MPEG-J. A newer technology with similar functionality, but restricted to 2D scenes, is provided by Part 20 “Lightweight Application Scene Representation” (LASeR).

MPEG-4 provides standard solutions for coding synthetic visual information for 3D graphics. These tools are specified in Part 2 (Face and Body Animation and 3D Mesh Compression), Part 11 (Interpolator Compression) and Part 16, a complete framework called Animation Framework eXtension (AFX) for efficiently coding the shape, texture and animation of interactive synthetic 3D objects. AFX is an attempt at unifying MPEG-4’s tools related to 3D graphics.

An important component of AFX is 3D Mesh Coding, which provides efficient encoding of 3D polygonal meshes with

  • Incremental representation: to enable a decoder to reconstruct a number of faces in a mesh proportional to the number of bits in the bit stream that have been processed.
  • Error resilience: to enable a decoder to partially recover a mesh when subsets of the bit stream are missing and/or corrupted.
  • Level of Detail (LOD) scalability: to enable a decoder to reconstruct, from a subset of the bit stream, a simplified version of the original mesh containing a reduced number of vertices. This reduces the rendering time of objects that are distant from the viewer (LOD management) and enables less powerful rendering engines to render the object at a reduced quality.
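
The LOD idea in the list above can be illustrated with a minimal Python sketch. It does not follow the actual MPEG-4 bit stream syntax: the `MeshLayer` record, its fields and the two helper functions are hypothetical, and the mesh is simply assumed to arrive as a base layer followed by refinement layers, each adding vertices. The sketch picks how many leading layers a renderer with a given vertex budget would decode, and how many bytes of the stream it needs for that.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MeshLayer:
    """Hypothetical description of one layer of a scalable mesh stream."""
    size_bytes: int       # bytes this layer occupies in the bit stream
    vertices_added: int   # vertices this layer adds to the mesh


def layers_for_budget(layers: List[MeshLayer], max_vertices: int) -> int:
    """Return how many leading layers fit within max_vertices.

    The base layer is always kept (if there is one), so the decoder
    can show at least the coarsest version of the object.
    """
    total = 0
    count = 0
    for layer in layers:
        if count > 0 and total + layer.vertices_added > max_vertices:
            break
        total += layer.vertices_added
        count += 1
    return count


def bytes_needed(layers: List[MeshLayer], count: int) -> int:
    """Size of the bit stream prefix holding the first `count` layers."""
    return sum(layer.size_bytes for layer in layers[:count])


# Illustrative numbers only: a base layer and two refinements.
stream = [MeshLayer(1000, 200), MeshLayer(1500, 400), MeshLayer(4000, 1400)]
far_object = layers_for_budget(stream, 700)    # distant or low-power case
near_object = layers_for_budget(stream, 5000)  # full quality
```

With these made-up numbers a distant object is decoded from the first two layers only, so the renderer reads a prefix of the stream and never touches the largest layer.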

AFX also introduces an advanced animation model for articulated models, a hierarchical representation of urban environments and several modern coding tools for 3D data.

Part 25 “3D Graphics Compression Model” specifies an architectural model able to accommodate third-party eXtensible Markup Language (XML) based description of scene graphs and graphics primitives with (potential) binarisation tools and with MPEG-4 3D Graphics Compression tools.

Synthetic Audio, called “Structured Audio”, is included in part 3. It provides the means to code sound using structured descriptions that are interpreted by a Structured Audio decoder to perform music and sound-effect synthesis. The Structured Audio Tools are: Structured Audio Orchestra Language (SAOL) providing synthesis methods, Structured Audio Score Language (SASL/MIDI) providing control parameters and Structured Audio Sample Bank Format (SASBF) providing the actual sample data.

The ISO Base Media File Format (Part 12 of MPEG-4) is designed to contain timed media information for a presentation in a flexible, extensible format that facilitates interchange, management, editing, and presentation of the media. The presentation may be ‘local’ to the system containing it, or may be delivered via a network or other stream delivery mechanism. Part 14 “MP4 File Format” extends the File Format to cover the needs of MPEG-4 scenes, while Part 15 “AVC File Format” supports the storage of AVC and MVC bitstreams.
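
The extensibility of the file format comes from its building block, the box: a 32-bit big-endian size, a four-character type and a payload, where a size of 1 means that a 64-bit size follows the type and a size of 0 means that the box runs to the end of the file. The Python sketch below walks the top-level boxes of a byte string. The function names `iter_boxes` and `make_box` are chosen here for illustration, and the sample bytes are built by hand rather than taken from a real file.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, box_end) for each box in data[offset:end]."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the four-character type.
            if offset + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing data.
            size = end - offset
        if size < header or offset + size > end:
            break  # malformed or truncated box: stop rather than guess
        yield box_type.decode("ascii", "replace"), offset + header, offset + size
        offset += size


def make_box(box_type: bytes, payload: bytes) -> bytes:
    """Build a box with a plain 32-bit size field (illustration only)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# A hand-made sample: file type box, empty free-space box, tiny media data box.
sample = (
    make_box(b"ftyp", b"isom" + struct.pack(">I", 0) + b"isom")
    + make_box(b"free", b"")
    + make_box(b"mdat", b"\x00" * 4)
)
boxes = [(box_type, stop - start) for box_type, start, stop in iter_boxes(sample)]
```

Because a reader can skip any box by its size alone, it can ignore box types it does not understand, which is what lets Parts 14 and 15 add their own boxes on top of Part 12.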

The Streaming Text Format (Part 17 of MPEG-4) defines text streams that are capable of carrying Third Generation Partnership Project (3GPP) Timed Text (specified in 3GPP TS 26.245). To transport the text streams, a flexible framing structure is specified that can be adapted to the various transport layers, such as RTP/UDP/IP and MPEG-2 Transport and Program Streams, for use in media such as broadcast and optical discs.

Among the remaining MPEG-4 technologies, the Open Font Format (Part 22) is worth mentioning. This is the result of the action taken by MPEG when it received a request from rights holders to convert the widely adopted OpenType specification to an ISO standard. As is the rule with MPEG standards, the OpenType specification was converted to a Working Draft and then balloted through the ISO-specified process of Committee Draft (CD), Final Committee Draft (FCD) and Final Draft International Standard (FDIS) stages.

The figure below provides a conceptual diagram of the structure of an MPEG-4 decoder with the role played by the main MPEG-4 technologies.

Figure 3 – MPEG-4 reference diagram

With reference to the figure, the parts of the MPEG-4 standard specify the blocks as follows:

  • Part 1 specifies “MPEG-4 stream decoder”
  • Part 2 specifies “Video decoder”
  • Part 3 specifies “Audio decoder”
  • Part 6 specifies “Interaction”
  • Parts 8, 12, 14 and 15 specify “Transport”
  • Parts 11 and 20 specify “Composition decoder” and “Composition”
  • Part 16 specifies “3DG decoder”
  • Part 17 specifies “Stream text decoder”
  • Parts 18 and 22 specify “Font decoder”
  • Part 19 specifies “Synthesised texture decoder”
  • Part 21 specifies “Rendering”
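
For readers who want to go from a block in the figure back to the parts that specify it, the list above can be written as a small lookup table. This Python sketch only restates the list. The names `BLOCKS_BY_PART` and `parts_for_block` are chosen here for illustration and are not part of any MPEG-4 software.

```python
from typing import Dict, List

# Part number -> blocks of the reference diagram it specifies (from the list above).
BLOCKS_BY_PART: Dict[int, List[str]] = {
    1: ["MPEG-4 stream decoder"],
    2: ["Video decoder"],
    3: ["Audio decoder"],
    6: ["Interaction"],
    8: ["Transport"],
    11: ["Composition decoder", "Composition"],
    12: ["Transport"],
    14: ["Transport"],
    15: ["Transport"],
    16: ["3DG decoder"],
    17: ["Stream text decoder"],
    18: ["Font decoder"],
    19: ["Synthesised texture decoder"],
    20: ["Composition decoder", "Composition"],
    21: ["Rendering"],
    22: ["Font decoder"],
}


def parts_for_block(block: str) -> List[int]:
    """Return the part numbers that specify the given diagram block."""
    return sorted(part for part, blocks in BLOCKS_BY_PART.items() if block in blocks)


transport_parts = parts_for_block("Transport")
font_parts = parts_for_block("Font decoder")
```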