Riding the Media Bits

Last update: 2011/08/21


Inside MPEG-21

The technologies inside the MPEG-21 framework.


Part 1 of the MPEG-21 standard is titled "Vision, Technologies and Strategy". It is a Technical Report rather than a standard, because it only describes part of the content of MPEG-21.

Digital Item Declaration (DID) is part 2. This normative part defines the technology supporting Digital Items (DIs). The DID standard describes a set of abstract terms and concepts that form a useful model for defining DIs. The Digital Item Declaration Language (DIDL) is an XML language that provides the standard representation of a DI.
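
As a rough illustration of what such a declaration looks like, the sketch below uses Python's standard xml.etree.ElementTree module to build a tiny DIDL document for a one-track album. The element names (DIDL, Item, Descriptor, Statement, Component, Resource) follow the DID model; the namespace URI, the attribute names and the file name track01.mp3 are assumptions made for the example and should be checked against the schema before being relied upon.

```python
import xml.etree.ElementTree as ET

# Assumed DIDL namespace URI; verify against the schema edition in use.
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
ET.register_namespace("didl", DIDL_NS)


def q(tag):
    """Return the namespace-qualified form of a DIDL element name."""
    return f"{{{DIDL_NS}}}{tag}"


def build_album(title, track_url):
    """Build a minimal DI: one Item with a title and one audio resource."""
    didl = ET.Element(q("DIDL"))
    item = ET.SubElement(didl, q("Item"))

    # A Descriptor carries information about its parent Item.
    descriptor = ET.SubElement(item, q("Descriptor"))
    statement = ET.SubElement(descriptor, q("Statement"), mimeType="text/plain")
    statement.text = title

    # A Component binds a Resource to the Item.
    component = ET.SubElement(item, q("Component"))
    ET.SubElement(component, q("Resource"), mimeType="audio/mpeg", ref=track_url)
    return didl


album = build_album("My Album", "track01.mp3")
print(ET.tostring(album, encoding="unicode"))
```

The declaration is static: it says what the DI contains, not what can be done with it.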

For each transaction we need a means to identify the object of the transaction. Part 3 of MPEG-21, called "Digital Item Identification" (DII), plays that role by providing the means to uniquely identify DIs. The role of DII is similar to the one played by International Standard Book Numbering (ISBN) for books and International Standard Serial Number (ISSN) for periodicals. Its scope includes: 

  • How to uniquely identify DIs and parts thereof (including resources); 
  • How to uniquely identify IP related to the DIs (and parts thereof); 
  • How to uniquely identify Description Schemes; 
  • How to use identifiers to link DIs with related information such as descriptive metadata; 
  • How to identify different types of DIs. 

Part 4 of MPEG-21, Intellectual Property Management and Protection (IPMP) Components, provides the means to control the flow and usage of Digital Items throughout their lifecycle by specifying how to include IPMP information and protected parts of Digital Items in a DIDL document. The IPMP DIDL encapsulates and protects a part of the hierarchy of a Digital Item, and associates appropriate identification and protection information with it. A related piece of work, called MPEG-4 IPMP-eXtensions (IPMP-X), was started in 1999 as part of MPEG-4 and was completed in 2002. The same technology has been applied to MPEG-2 and has become part 11 of MPEG-2. IPMP-X defines standard ways of retrieving IPMP tools from remote locations, authenticating IPMP tools and exchanging messages between the tools used to protect a piece of content and a terminal that needs to process (e.g. decrypt, decode, present) the content. 

Already in the physical world we seldom have absolute rights to an object. In the virtual world, where the disembodiment of content from carriage augments the flexibility with which business can be carried out, this trend is likely to continue. That is why part 5 of MPEG-21, Rights Expression Language (REL), has been developed: so that rights about a Resource can be expressed in a way that can be interpreted by a computer. 

The MPEG REL data model for a rights expression consists of four basic entities and their relationship. This basic relationship is defined by the MPEG REL assertion “grant”. Structurally, an MPEG REL grant consists of the following:

  • The principal to whom the grant is issued
  • The right that the grant specifies
  • The resource to which the right in the grant applies
  • The condition that must be met before the right can be exercised

This is depicted in the following figure.

Fig. 1 - The MPEG-21 REL model
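
The four entities of a grant can be mirrored in a few lines of Python. This is only a sketch of the data model, not of the REL itself, which is an XML language: the class name Grant, the function is_permitted, the identifiers and the "fewer than three plays" condition are all invented for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Grant:
    """The four entities of a grant (hypothetical Python rendering)."""
    principal: str                      # to whom the grant is issued
    right: str                          # the verb, e.g. "play"
    resource: str                       # what the right applies to
    condition: Callable[[dict], bool]   # must hold before the right is exercised


def is_permitted(grants, principal, right, resource, context):
    """Return True if some grant matches the request and its condition holds."""
    return any(
        g.principal == principal
        and g.right == right
        and g.resource == resource
        and g.condition(context)
        for g in grants
    )


# Example: alice may play the track as long as she has played it fewer than 3 times.
grants = [
    Grant("alice", "play", "urn:example:track01",
          lambda ctx: ctx.get("plays_so_far", 0) < 3),
]

print(is_permitted(grants, "alice", "play", "urn:example:track01", {"plays_so_far": 1}))
print(is_permitted(grants, "alice", "play", "urn:example:track01", {"plays_so_far": 3}))
```

The first request is permitted and the second is refused, because the condition no longer holds.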

A right exists to perform actions on something. Today we use such verbs as "display", "print", "copy" or "store" and, in a given context, we humans know what we mean. Computers do not, and must be "taught" the meaning. That is why we have a Rights Data Dictionary (RDD) as part 6 of MPEG-21: it gives the precise semantics of all the verbs used in the REL, as well as of many others. 

In the digital world people should be allowed to do more than just find new ways of doing old businesses. Content and service providers used to know their customers very well. They used to know - even control - the means through which their content was delivered. Consumers used to know the meaning of well-classified services such as television, movies and music. Today fewer and fewer of these certainties remain: end users are less and less predictable, and the same piece of content can reach them through a variety of delivery systems and be enjoyed on a plethora of widely differing devices. How can we cope with this unpredictability of end user features, delivery systems and consumption devices? This is where Digital Item Adaptation (DIA), part 7 of MPEG-21, comes to help, providing the means to describe how a (resource in a) DI should be adapted (i.e. transformed) so that it best matches the specific features of the User, the Network and the Device. 


Fig. 2 - The MPEG-21 DIA model

As shown in the figure, a Digital Item Adaptation Description specified by part 7 of MPEG-21 can be used by (non-normative) "resource adaptation" and "descriptor adaptation" engines to produce adapted Digital Items.
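
Since the adaptation engines are non-normative, any number of strategies are possible. The sketch below shows one very simple strategy, assuming the description boils down to a maximum display width and an available bandwidth: pick the richest variant of a resource that still fits both. The classes Variant and UsageEnvironment and the function adapt are hypothetical names, and real DIA descriptions are XML documents with far more detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One available version of a resource (hypothetical)."""
    name: str
    width: int
    bitrate_kbps: int


@dataclass(frozen=True)
class UsageEnvironment:
    """A drastically reduced stand-in for a DIA description."""
    max_width: int        # Device capability
    bandwidth_kbps: int   # Network capability


def adapt(variants, env):
    """Return the highest-bitrate variant that fits the environment, or None."""
    usable = [
        v for v in variants
        if v.width <= env.max_width and v.bitrate_kbps <= env.bandwidth_kbps
    ]
    if not usable:
        return None
    return max(usable, key=lambda v: v.bitrate_kbps)


variants = [
    Variant("hd", 1920, 4000),
    Variant("sd", 720, 1500),
    Variant("mobile", 320, 300),
]

phone = UsageEnvironment(max_width=720, bandwidth_kbps=1000)
tv = UsageEnvironment(max_width=1920, bandwidth_kbps=5000)
print(adapt(variants, phone).name)
print(adapt(variants, tv).name)
```

Here the phone gets the "mobile" variant, because the "sd" one exceeds its bandwidth, while the television gets "hd".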

Part 8 contains the usual Reference Software of the entire MPEG-21 standard. Part 9 is the MPEG-21 File Format, the first "transport" format of a DI. The MPEG-21 file format inherits several concepts of MP4, as a DI may be a complex collection of information that contains still and dynamic media, information related to the DI such as metadata, layout information, etc. 

A DID is a static declaration defined using the DIDL. Digital Item Methods (DIM) are defined in part 10 "Digital Item Processing" (DIP) and are meant to allow Users (authors, publishers, distributors, etc.) of the DI to add functionality to a DID, such as specifying a selection of preferred procedures by which the DI should be handled at the level of the DI itself. On receipt of a DID, a list of DIMs that can be applied to the DI is presented to the User. The User chooses one DIM that is then executed by the DIP Engine. As an example, for a music album DI an "AddTrack" DIM might be provided such that a user can add a new track in the preferred format. 
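
The "present the list, let the User choose, execute" cycle can be sketched as follows. The class DIPEngine, its methods and the dictionary used to stand in for the album DI are hypothetical; in the standard, DIMs are written in a scripting language and operate on a real DID, not on a Python dictionary.

```python
class DIPEngine:
    """Toy engine that holds named methods and runs the one the User picks."""

    def __init__(self):
        self._methods = {}

    def register(self, name, func):
        """Make a method available for the Digital Item."""
        self._methods[name] = func

    def available(self):
        """Return the names of the methods that can be offered to the User."""
        return sorted(self._methods)

    def run(self, name, item, *args):
        """Execute the chosen method on the item and return its result."""
        return self._methods[name](item, *args)


def add_track(album, track):
    """Hypothetical 'AddTrack' method: append a track to the album."""
    album["tracks"].append(track)
    return album


engine = DIPEngine()
engine.register("AddTrack", add_track)

album = {"title": "My Album", "tracks": ["track01.mp3"]}
print(engine.available())                  # list shown to the User
engine.run("AddTrack", album, "track02.mp3")
print(album["tracks"])
```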

Back to part 3, getting an identifier for a DI is important, but how are we going to put a "virtual sticker" on it to carry the identification? This is where Persistent Association Technologies may be of help. SDMI struggled with the selection of very advanced "Phase I" and "Phase II" screening technologies and its task was made harder by the fact that no established methods existed to assess the performance of these technologies. That is why MPEG-21 contains part 11 called "Evaluation Methods for Persistent Association Technologies". This is not meant to be a normative standard but a Technical Report, i.e. something similar to a "best practice" for those who need to assess the performance of watermarking and related technologies. 

Part 12 is called "Test Bed for MPEG-21 Resource Delivery". It is a comprehensive environment that can be used to test the effect of different conditions for delivery of media resources.

During the long study period that eventually led to the acquisition of the technologies required to develop a scalable video coding standard, it was thought that this form of video coding would require novel technologies that would not fit in the MPEG-4 standard in the way that, e.g., Advanced Video Coding (AVC) did. Part 13, Scalable Video Coding (SVC), was originally intended to host such a standard. However, when it became clear that SVC would be an extension of AVC rather than a new standard, the work was moved to MPEG-4 part 10 as an amendment to AVC, and MPEG-21 Part 13 became void.

Conformance of an implementation is of course needed for MPEG-21 technologies as well. Therefore the purpose of Part 14 Conformance is to provide the necessary test methodologies and suites to be used to assess the conformity of an MPEG-21 entity (typically an XML document) and a decoder (typically a parser) to the relevant MPEG-21 standard.

Certain application domains require a technology that can generate an event every time a Digital Item is processed and an action identified by a verb is performed. The technology achieving this is specified in Part 15 Event Reporting (ER).

Fig. 3 - The MPEG-21 ER model

A User places an Event Report Request (ERR) in a DI. When the DI is received by a peer, the ERR is passed to an ERR Receiver and parsed. An Event Receiver senses all internal and external events and passes them to an ER Builder that creates a message and dispatches it to the address indicated in the ERR.
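
The flow just described can be sketched in a few lines. Everything here is hypothetical (the class names, the dictionary used as a report, the mailto address), and the ERR Receiver, Event Receiver and ER Builder are collapsed into a single Peer class for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventReportRequest:
    """An ERR: report uses of `verb` on `item_id` to `report_to`."""
    item_id: str
    verb: str
    report_to: str


class Peer:
    """Toy peer that stores ERRs and builds a report when a matching event occurs."""

    def __init__(self, send):
        self._requests = []
        self._send = send   # callable(address, report) that delivers the report

    def receive_item(self, requests):
        """ERR Receiver: keep the requests carried by an incoming DI."""
        self._requests.extend(requests)

    def on_event(self, item_id, verb, user):
        """Event Receiver + ER Builder: report the event if an ERR asks for it."""
        sent = 0
        for err in self._requests:
            if err.item_id == item_id and err.verb == verb:
                report = {"item": item_id, "verb": verb, "user": user}
                self._send(err.report_to, report)
                sent += 1
        return sent


outbox = []
peer = Peer(lambda address, report: outbox.append((address, report)))
peer.receive_item([
    EventReportRequest("urn:example:album", "play", "mailto:rights@example.org"),
])

peer.on_event("urn:example:album", "play", "alice")    # matches the ERR
peer.on_event("urn:example:album", "print", "alice")   # no ERR for "print"
print(outbox)
```

Only the "play" event produces a report, since no request was placed for "print".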

Starting with MPEG-7, MPEG has standardised a technology that allows the lossless conversion of a typically very bulky XML document to a binary format, while preserving the ability to parse the binarised XML efficiently. The technology was originally part of MPEG-7 Systems but was later moved to MPEG-B Part 1 Binary XML format (BiM). BiM is now essentially just a reference in MPEG-7 Part 1 Systems and MPEG-21 Part 16 Binary format.

There are cases where it is necessary to identify a specific fragment of a resource as opposed to the entire set of data. Part 17 Fragment Identification (FID) specifies a normative syntax for URI Fragment Identifiers to be used for addressing parts of a resource from a number of Internet Media Types.
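
A fragment identifier is the part of a URI that follows the "#" sign. The sketch below only shows how the two halves are separated, using urldefrag from Python's standard library; the URI and the fragment string are placeholders and do not follow the syntax defined in Part 17, which is not reproduced here.

```python
from urllib.parse import urldefrag


def split_fragment(uri):
    """Split a URI into the resource address and its fragment identifier."""
    result = urldefrag(uri)
    return result.url, result.fragment


# Placeholder URI; the fragment text is NOT valid Part 17 syntax.
resource, fragment = split_fragment("http://example.org/album.mp3#hypothetical-fragment")
print(resource)
print(fragment)
```

The first half says which resource is meant; the second, interpreted according to the media type of that resource, says which part of it.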

While Part 9 provides a solution to transport a Digital Item in a file, Part 18 Digital Item Streaming (DIS) provides the technology to transport a DI over a streaming mechanism (e.g. in broadcasting, when transport is done using MPEG-2 Transport Stream, or over IP networks, when RTP/UDP/IP is used).

DIS enables the incremental delivery of a Digital Item (DID, metadata, resources) in a piece-wise fashion and with temporal constraints so that the receiver may incrementally consume the DI. This is achieved by using the Bitstream Binding Language (BBL). BBL defines syntax and semantics of instructions applied to fragment a DI and map it into a plurality of delivery channels each employing a specific transport protocol.

Fig. 4 - The Bitstream Binding Language and Digital Item Streaming
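
The idea of fragmenting a DI and mapping the pieces onto delivery channels with timing can be sketched as below. This is not BBL, which is an XML language: the Packet class, the bind function, the channel names and the fixed sending interval are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """One piece of a DI scheduled on a delivery channel (hypothetical)."""
    channel: str
    send_at: float    # seconds from the start of the session
    payload: bytes


def fragment(data, size):
    """Cut a byte string into chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def bind(parts, chunk_size, interval):
    """Fragment each (channel, data) part and give every chunk a send time.

    Chunks of the same part are spaced `interval` seconds apart; the result
    is ordered by send time so a receiver can consume the DI incrementally.
    """
    packets = []
    for channel, data in parts:
        for n, chunk in enumerate(fragment(data, chunk_size)):
            packets.append(Packet(channel, n * interval, chunk))
    packets.sort(key=lambda p: p.send_at)   # stable: ties keep input order
    return packets


parts = [("did", b"<DIDL/>"), ("audio", b"0123456789")]
packets = bind(parts, chunk_size=4, interval=0.5)
for p in packets:
    print(p.send_at, p.channel, p.payload)
```

Concatenating the payloads received on one channel, in order, gives back the original part.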

Part 19 Media Value Chain Ontology (MVCO) provides a normative core model of a knowledge domain that spans the full media value chain and that can be extended to represent other specialisations. Thus, the MVCO provides a common backbone for interoperable standard services and products (metadata, licenses, attribution etc.), offering new business model opportunities to a broad set of interconnected value chains and niches.

Part 20 Contract Expression Language (CEL) provides a standard digital representation of agreements covering both transactions of content packaged as Digital Items and services provided around that content.
The standard covers contracts about transactions of content as MPEG-21 Digital Items and about the provision of MPEG-21-based services, such as delivery, identification, encryption, search and others.

Part 21 Media Contract Ontology (MCO) specifies an ontology for media contracts.