Last update: 2011/08/21
At the Tokyo meeting in July 1995 Hiroshi Yasuda showed up and proposed to address the coding of information that is partly natural (e.g. video, music) and partly synthetic (e.g. 2D and 3D graphics). Work started in earnest, first as part of the AOE group and then, after Cliff Reader left MPEG, in the Synthetic-Natural Hybrid Coding (SNHC) subgroup, led by Peter Doenges of Evans and Sutherland, a company that had played a major role in the early years of development of the 3D Graphics industry. The results of the first years of work were 2D and 3D Graphics (3D mesh) compression, Face and Body Animation (FBA), Text-To-Speech (TTS), Structured Audio Orchestra Language (SAOL), a language used to define an “orchestra” made up of “instruments” downloaded in the bitstream, and Structured Audio Score Language (SASL), a rich language with significantly more functionality than MIDI.
The ‘facial animation object’ can be used to render an animated face. The face object contains a generic face with a neutral expression, which can be rendered as is. The shape, texture and expressions of the face are controlled by Facial Definition Parameters (FDP) and/or Facial Animation Parameters (FAP).
Fig. 1 - Facial Definition Parameters
Upon receiving the animation parameters from the bitstream, the face can be animated with expressions, speech, etc. FDPs can also be sent to change the appearance of the face from something generic to a particular face with its own shape and texture. If so desired, a complete face model can be downloaded via the FDP set. Face models themselves are not mandated by the standard. It is also possible to use specific configurations of the lips and the mood of the speaker.
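The split between definition and animation parameters can be illustrated with a minimal sketch. It is not the data model of the standard: `FaceModel`, `unit`, `animate` and the feature-point names are hypothetical, and the assumption is simply that animation values are expressed relative to a face-specific distance so that one animation stream can drive faces of different shapes.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Point = Tuple[float, float]


@dataclass
class FaceModel:
    """Hypothetical face model: neutral feature points plus a scaling unit."""
    feature_points: Dict[str, Point]  # positions for the neutral expression
    unit: float                       # face-specific distance used for scaling


def animate(face: FaceModel, params: Dict[str, Point]) -> Dict[str, Point]:
    """Displace each feature point by its parameter, scaled to this face."""
    result = {}
    for name, (x, y) in face.feature_points.items():
        dx, dy = params.get(name, (0.0, 0.0))  # untouched points stay neutral
        result[name] = (x + dx * face.unit, y + dy * face.unit)
    return result


# Two faces with different shapes (the role played by definition parameters).
generic = FaceModel(
    {"lip_corner_left": (-2.0, 0.0), "lip_corner_right": (2.0, 0.0)}, unit=1.0)
wide = FaceModel(
    {"lip_corner_left": (-3.0, 0.0), "lip_corner_right": (3.0, 0.0)}, unit=1.5)

# One animation frame (the role played by animation parameters).
smile = {"lip_corner_left": (-0.2, 0.4), "lip_corner_right": (0.2, 0.4)}

generic_smile = animate(generic, smile)
wide_smile = animate(wide, smile)
```

The same `smile` frame moves the lip corners of both faces in proportion to their own size, which is the behaviour the paragraph above describes for a generic face and a customised one.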
The Body is an object capable of producing virtual body models and animations in the form of a set of 3D polygonal meshes ready for rendering. Two sets of parameters are defined for the body: the Body Definition Parameter (BDP) set and the Body Animation Parameter (BAP) set. The BDP set defines the parameters that transform the default body into a customised body with its own body surface, body dimensions and (optionally) texture. The BAPs will produce reasonably similar high-level results in terms of body posture and animation on different body models.
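Why the same animation parameters give similar postures on different bodies can be seen in a small planar example. This is only a sketch under stated assumptions: body dimensions are reduced to two hypothetical segment lengths, animation parameters to two joint angles, and `joint_positions` is an illustrative helper, not something defined by the standard.

```python
import math
from typing import List, Tuple


def joint_positions(segment_lengths: List[float],
                    joint_angles_deg: List[float]) -> List[Tuple[float, float]]:
    """Planar forward kinematics: each joint rotates relative to the previous segment."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for length, angle in zip(segment_lengths, joint_angles_deg):
        heading += math.radians(angle)
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points


# Body dimensions (hypothetical upper-arm and forearm lengths, in metres).
default_arm = [0.30, 0.25]
custom_arm = [0.36, 0.30]   # a customised body, 20% larger

# One animation frame: shoulder raised 90 degrees, elbow bent back 90 degrees.
raise_arm = [90.0, -90.0]

default_pose = joint_positions(default_arm, raise_arm)
custom_pose = joint_positions(custom_arm, raise_arm)
```

Both arms end up in the same posture (upper arm vertical, forearm horizontal); only the absolute joint positions differ, in proportion to the body dimensions.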
In the area of synthetic audio two important technologies are available. The first is a Text To Speech (TTS) Interface (TTSI), i.e. a standard way to represent prosodic parameters, such as pitch contour, phoneme duration, and so on. Typically these can be used in a proprietary TTS system to improve the synthesised speech quality and to create, with the synthetic face, a complete audio-visual talking face. The TTS can also be synchronised with the facial expressions of an animated talking head as in the figure below.
Fig. 2 - TTS-driven Face Animation
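The synchronisation between synthetic speech and a talking head can be sketched as follows, assuming that the prosodic parameters are available as a list of phonemes with durations and pitch contours. The `Phoneme` class, the phoneme symbols and the numeric values are illustrative assumptions, not the TTSI bitstream syntax.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Phoneme:
    """Hypothetical prosody record for one phoneme."""
    symbol: str
    duration_ms: int
    pitch_hz: List[float]  # sampled pitch contour; empty if unvoiced


def schedule(phonemes: List[Phoneme]) -> List[Tuple[int, int, str]]:
    """Lay phonemes end to end and return (start_ms, end_ms, symbol) entries."""
    timeline = []
    start = 0
    for p in phonemes:
        end = start + p.duration_ms
        timeline.append((start, end, p.symbol))
        start = end
    return timeline


def phoneme_at(timeline: List[Tuple[int, int, str]], t_ms: int) -> Optional[str]:
    """Return the phoneme being spoken at time t_ms, or None outside the utterance."""
    for start, end, symbol in timeline:
        if start <= t_ms < end:
            return symbol
    return None


utterance = [
    Phoneme("h", 60, []),
    Phoneme("e", 120, [110.0, 118.0]),
    Phoneme("l", 80, [118.0, 112.0]),
    Phoneme("o", 200, [112.0, 95.0]),
]
timeline = schedule(utterance)
```

A face renderer could call `phoneme_at(timeline, t)` for each video frame time `t` to pick the matching lip shape, which keeps the face in step with the durations used by the speech synthesiser.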
The second technology provides a rich toolset for creating synthetic sounds and music, called Structured Audio (SA). Using newly developed formats to specify synthesis algorithms and their control, any current or future sound-synthesis technique can be used to create and process sound in MPEG-4. The sound quality is guaranteed to be exactly the same on every MPEG-4 decoder.
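The idea of sending the synthesis algorithm together with its control data can be sketched in a few lines. The example below is written in Python rather than in SAOL or SASL, and every name in it (`ORCHESTRA`, `SCORE`, `render`, the two instruments, the sample rate) is a hypothetical stand-in: the "orchestra" is a table of instrument functions and the "score" is a list of timed events.

```python
import math
from typing import Callable, Dict, List, Tuple

SAMPLE_RATE = 8000  # samples per second, chosen only to keep the example small


def sine(freq_hz: float, duration_s: float, amplitude: float) -> List[float]:
    """Instrument 1: a plain sine tone."""
    n = int(duration_s * SAMPLE_RATE)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


def decaying_sine(freq_hz: float, duration_s: float, amplitude: float) -> List[float]:
    """Instrument 2: a sine tone with an exponential decay envelope."""
    n = int(duration_s * SAMPLE_RATE)
    return [amplitude * math.exp(-3.0 * i / n)
            * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


# The "orchestra": named instruments, i.e. the synthesis algorithms.
ORCHESTRA: Dict[str, Callable[[float, float, float], List[float]]] = {
    "tone": sine,
    "pluck": decaying_sine,
}

# The "score": (start_s, instrument, freq_hz, duration_s, amplitude) events.
SCORE: List[Tuple[float, str, float, float, float]] = [
    (0.0, "tone", 440.0, 0.5, 0.3),
    (0.25, "pluck", 660.0, 0.5, 0.3),
]


def render(orchestra, score) -> List[float]:
    """Mix every score event into one output buffer."""
    total_s = max(start + dur for start, _, _, dur, _ in score)
    out = [0.0] * int(total_s * SAMPLE_RATE)
    for start, name, freq, dur, amp in score:
        offset = int(start * SAMPLE_RATE)
        for i, sample in enumerate(orchestra[name](freq, dur, amp)):
            out[offset + i] += sample
    return out


audio = render(ORCHESTRA, SCORE)
```

Because the output depends only on the orchestra and the score, rendering the same content twice gives the same samples, which is the property the paragraph above attributes to Structured Audio decoders.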
At the Melbourne meeting in October 1999 Euee S. Jang, then with Samsung, took over from Peter to complete FBA and to address the important area of 3D mesh compression, in particular the efficient encoding of generic 3D model animation. This work led to a framework later called Animation Framework eXtension (AFX), which became Part 16 of MPEG-4. At the Fairfax meeting in March 2002 Mikaël Bourges-Sévenier took over from Euee to continue AFX and develop Part 21, MPEG-J Graphics Framework eXtensions (GFX).
The MPEG‑4 Animation Framework eXtension (AFX), ISO/IEC 14496‑16, contains a set of 3D tools for interactive 3D content operating at the geometry, modeling and biomechanical level and encompassing existing tools previously defined in MPEG-4. The tools available in AFX are summarized in Fig. 3.
| Tool name | Objective |
|---|---|
| Parametric curve and surface representations | Delivering smooth shapes with a high level deformation control |
| Subdivision Surfaces | Simplification and progressive transmission of large scale models |
| MeshGrid Surface | Representing generic models preserving volume information, and offering versatile manipulation features |
| Footprint Based Representation | Simplification and progressive transmission of object based on footprints (buildings, cartoons, etc) |
| Depth Image-Based Representation | 3D photorealistic display of objects from a set of images |
| Depth Image-Based Representation Version 2 | High-quality rendering of image- and point-based objects |
| Multi-Texture | Provide multiple textures for natural appearance together with view-adaptive real-time weighting |
| Morphing space | Combining bilinear interpolation of several target shapes with a base shape in order to obtain precise deformations and smooth animation |
| Solid Modeling | Combining simple 3D primitives for a compact and exact analytical representation of manufactured and architectural models |
| Deformers | Enabling controlled non rigid displacements |
| Bone-Based Animation | Modeling and animation of generic articulated 3D objects |
Fig. 3 - AFX tools
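As an illustration of the "Morphing space" entry, the sketch below blends a base shape with several target shapes. It uses a plain weighted linear blend, which is a simplification and not the interpolation specified by AFX; the function name `morph`, the tiny three-vertex shapes and the weights are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]


def morph(base: Sequence[Vertex],
          targets: Sequence[Sequence[Vertex]],
          weights: Sequence[float]) -> List[Vertex]:
    """Add to each base vertex the weighted offsets towards each target shape."""
    result = []
    for i, vertex in enumerate(base):
        blended = list(vertex)
        for target, w in zip(targets, weights):
            for axis in range(3):
                blended[axis] += w * (target[i][axis] - vertex[axis])
        result.append(tuple(blended))
    return result


# A base shape and two target shapes with the same vertex count and order.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
raised = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
widened = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# Halfway towards both targets at once.
half = morph(base, [raised, widened], [0.5, 0.5])
```

Animating the weights over time, rather than sending new vertex positions for every frame, is what makes this kind of representation compact.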
At the Hong Kong meeting in January 2005 Mahnjin Han took over from Mikaël, mainly to continue the AFX activity.
At the Marrakesh meeting in January 2007 Marius Preda took over from Mahnjin. Besides continuing the AFX activity, Marius proposed a new area of work called 3D Graphics Compression Model, with the goal of specifying an architectural model able to accommodate third-party XML-based descriptions of scene graphs and graphics primitives, possibly with binarisation tools, together with the MPEG-4 3D Graphics Compression tools specified in MPEG-4 Parts 2, 11 and 16.

Fig. 4 - 3DG Compression Model
The layers of this architecture are (numbering from the lowest layer):
pl