Archives: 2015-August-20

Inside Digital Media Devices

In its quest to provide the necessary standards for the Digital Media industry, MPEG started MPEG Multimedia Middleware (M3W), a complete set of standards defining the software environment of a multimedia device. When the responses to the M3W Call for Proposals (CfP) were received, MPEG selected the proposal from the Universal Home API (UHAPI) consortium, which had already developed a fairly complete solution.

M3W is ISO/IEC 23004, nicknamed MPEG-E. It is organised in eight parts.

Part 1 “Architecture” describes the M3W architecture and APIs. In M3W there are 3 distinct layers:

  • Applications
  • Middleware consisting of M3W and other middleware. M3W can be separated into two parts
    • Functional, providing applications and other middleware with a multimedia platform (‘functional’ refers to multimedia functionality)
    • Extra-functional, providing the means to manage the lifetime of, and interaction with, realisation entities; it also enables management of extra-functional (‘support’) properties, e.g. resource management, fault management and integrity management;
  • Computing platform: the API is outside of the scope of M3W.

M3W

Fig. 1 – The software structure of an M3W system

The scope of M3W is limited to the specification of the M3W API and realisation technology. In Fig. 1, this means the L-shaped yellow part and its interfaces depicted by the green line.

Part 2 “Multimedia API” specifies access to the functionalities provided by conforming multimedia platforms, such as Media Processing Services (including coding, decoding and transcoding), Media Delivery Services (through files, streams, messages), Digital Rights Management (DRM) Services, access to data (e.g. media content), and access to, editing and searching of metadata.

Part 3 “Component Model” specifies a technology enabling cost effective software development and an increase in productivity through software reuse and easy software integration.

Part 4 “Resource and Quality Management” specifies a framework for resource management aiming to optimise and guarantee the Quality of Service that is delivered to the end-user in a situation where resources are constrained.

Part 5 “Component Download” specifies a download framework enabling controlled download of software components to a device.

Part 6 “Fault Management” specifies a framework for fault management with the goal to have a dependable/reliable system in the context of faults. These can be introduced due to upgrades and extensions out of the control of the device vendor, or because it is impossible to test all traces and configurations in today’s complex software systems.

Part 7 “System Integrity Management” specifies a framework for integrity management with the goal to have controlled upgrading and extension, in the sense that there is a reduced chance of breaking the system during an upgrade/extension or to provide the ability to restore a consistent configuration.

Part 8 “Reference Software and Conformance” is the usual complement as with the other MPEG standards.

Part 2 of M3W does not define a multimedia API but how one can be called from an M3W environment. The job of the ISO/IEC 23006 Multimedia Service Platform Technologies (MPEG-M) standard is to provide that multimedia API and more.

Let’s start from Fig. 2, used as a reference in Part 1 “Architecture“.

MPEG-M_architecture

Fig. 2 – The MPEG-M architecture

When an Application calls the API defined in part 2 of MPEG-M “MPEG Extensible Middleware” (MXM) to access the middleware functionalities, different possibilities exist:

  1. The App calls just one local Technology Engine (TE). A TE is a module providing defined functionalities, such as a Media Framework to play a video. MXM defines some high-level APIs and provides placeholders to define new ones. Part 3 of MPEG-M “Reference Software and Conformance” provides software implementations of a range of TEs released under a Berkeley Software Distribution (BSD) licence;
  2. The App calls a chain of local TEs. This TE serialisation is called “Technology Orchestration”;
  3. The App calls just one Protocol Engine (PE). A PE is an implementation of an Elementary Service (ES) such as Create Licence which in turn calls just one or a sequence of local or remote TEs. Part 4 of MPEG-M “Elementary Services” defines a set of ESs and the corresponding PEs;
  4. The App calls a chain of PEs. Part 5 of MPEG-M “Service Aggregation” defines a machine-readable representation of the PE workflow that represents the “Service Aggregation” implied by the sequence of PEs.

Figure 3 shows an Application calling the PEa→PEb→PEc Aggregated Service, in which PEa calls just one TE, PEb calls 3 Orchestrated TEs and PEc calls 2 Orchestrated TEs. Typically a special TE called Orchestrator drives the sequence of TEs to accomplish the goal; a minimal sketch of this mechanism follows Figure 3.

MPEG-M_aggregation&orchestration

Fig. 3 – MPEG-M Aggregation and Orchestration
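The mechanics of Aggregation and Orchestration can be pictured with a minimal sketch. The class and method names below (ProtocolEngine, TechnologyEngine, Orchestrator, run) are illustrative assumptions, not the actual MXM API, which is defined in parts 2 and 3 of MPEG-M.

```python
# Minimal sketch of MPEG-M Aggregation and Orchestration (Fig. 3).
# All names are hypothetical; the real MXM APIs are defined in MPEG-M parts 2-3.

class TechnologyEngine:
    """A module providing a defined functionality (e.g. a Media Framework)."""
    def __init__(self, name):
        self.name = name

    def run(self, data):
        # A real TE would e.g. decode video or parse a licence.
        return f"{data}->{self.name}"

class Orchestrator(TechnologyEngine):
    """A special TE that drives a sequence of TEs ('Technology Orchestration')."""
    def __init__(self, tes):
        super().__init__("Orchestrator")
        self.tes = tes

    def run(self, data):
        for te in self.tes:          # serialise the TEs
            data = te.run(data)
        return data

class ProtocolEngine:
    """Implements one Elementary Service by calling one or more (local/remote) TEs."""
    def __init__(self, name, engine):
        self.name, self.engine = name, engine

    def run(self, data):
        return self.engine.run(data)

# The Application calls the Aggregated Service PEa -> PEb -> PEc
pe_a = ProtocolEngine("PEa", TechnologyEngine("TE1"))
pe_b = ProtocolEngine("PEb", Orchestrator([TechnologyEngine(f"TE{i}") for i in (2, 3, 4)]))
pe_c = ProtocolEngine("PEc", Orchestrator([TechnologyEngine("TE5"), TechnologyEngine("TE6")]))

result = "request"
for pe in (pe_a, pe_b, pe_c):        # Service Aggregation: a chain of PEs
    result = pe.run(result)
print(result)                        # traces the request through the PE/TE chain
```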

The example of Fig. 4 explains how Elementary Services can be aggregated to provide full-fledged services. Assuming that there is a Service Provider (SP) for each Elementary Service, a User may ask the Post Content SP to get a sequence of songs satisfying certain Content and User Descriptions.

 MPEG-M_pt4

Fig. 4 – A possible services chain centred around Post Content SP

The End User would contact the Post Content SP, which would get appropriate information from the Describe Content SP and the Describe User SP to prepare the sequence of songs using its internal logic. It would then get the necessary licences from the Create Licence SP. The sequence (“titles”) of songs would then be handed over to the Package Content SP, which would get the songs (“Resources”) from the Store Content SP and hand over the Packaged Content to the Deliver Content SP, which would stream the Packaged Content to the End User.

Part 4 Elementary Services specifies a set of standard Elementary Services and related protocols to enable distributed applications to exchange information about entities playing a role in digital media services (end users and service providers), e.g. Content, Contract, Device, Event and User, and the processing that a party may wish to execute on those entities, such as Authenticate, Create, Deliver, Describe, Identify, Negotiate, Process, Request, Search and Transact.

It is clear that in the real world Service Providers would not be able to exploit the potential of the standard if they were confined to only offering “Elementary Services”. Therefore MPEG-M part 5 provides the additional technology that allows Elementary Services to be combined, in a standard way, into Aggregated Services. In the example of Fig. 4, the Post Content SP may wish to bundle, for example, the roles of Describe User and Package Content. MPEG-M Service Aggregation provides the means to set up a chain of Services, from one or more SPs, to respond to Users’ requests.


Getting Things Done My Way

Six months after leaving my employer of 32 years, I became CEO of CEDEO, an Italian company in which I already had a stake, and made it the vehicle of my forays into the new business of technology use. Today the mission of CEDEO is

To conceive, design, implement, deploy and operate advanced digital media solutions based on smart combinations of new technologies and standards for the next phase of pervasive media-enabled communication

I claim there is substance behind the nice words. CEDEO started from the consideration that many video platforms exist today on the Web, but none, except in limited contexts, offers platform users – i.e. any player in the video content life cycle: content creators, providers, distributors etc. – the means to do business with other platform users. In other words, support for business is regularly “client-server”, not “peer-to-peer”.

Letting peers do business in a peer-to-peer fashion is exactly what WimTV does, as shown in Figure 1.

WimTV_basic

Figure 1 – The basic WimTV services and applications

A user can

  1. Import his content to a private digital locker (WimBox)
  2. Post his content to WimTrade for other users to acquire rights to it or acquire rights to another user’s content
  3. In case he has not found what he was looking for, post to WimTrade a “request for content” that satisfies specific user requirements
  4. Respond to a request for content using WimTrade or, if on the move, WimLance (an app for Android and iOS)
  5. Post content of which he has rights (e.g. acquired from WimTrade) to his WebTV for users to watch (WimVod)
  6. Create an event and make it available for other people to watch live at the scheduled time and on demand after the event is over
  7. Create a schedule that streams a sequence of videos and/or live events as if it were a TV
  8. Watch WimTV content from his PC or SmartTV/mobile device using WimView
  9. Export content of which he has rights (e.g. acquired from WimTrade) from his WimBox

A user can populate his WimBox with assets in 3 different ways:

  1. By importing content of which he has rights to WimBox
  2. By acquiring rights to other users’ content from WimTrade
  3. By creating and posting a request for content (WimTrade) satisfying specific requirements.

A user can monetise his assets in WimBox in 3 different ways:

  1. Make public and private (i.e. restricted) offers in WimTrade
  2. Export content of which he has rights for a variety of delivery channels: Web, IPTV, 3G/4G, broadcast
  3. Post to WimVod and stream content of which he has rights to end users.

It should be no surprise that, say, YouTube and WimTV are worlds away. This can be seen from two excerpts of the YouTube and WimTV Terms of Service. The bottom line is that, in the case of YouTube you upload content and then YouTube has the right to do business on it, while in the case of WimTV you upload content and you keep the right to do business with it. You can do this more successfully on WimTV than on your own because the WimTV platform offers you a variety of services.

YouTube:
“…These Terms of Service apply to all users of the Service, including users who are also contributors of Content on the Service. YouTube hereby grants you permission to access and use the Service as set forth in these Terms of Service, provided that You agree not to distribute in any medium any part of the Service or the Content without YouTube’s prior written authorization…”

WimTV:
“…When you Import and Post Content to the Platform, you remain the sole owner of the rights to that Content and Company does not claim ownership of the materials. It is your responsibility to find one or more than one User that can help you distribute that content for eventual consumption by granting them appropriate Licences or to assume yourself the Role(s) that enable you to achieve that goal…”

Fig. 2 depicts the essence of WimTV in a concise way: a platform where user A and user B can do business on the basis of the most fundamental of principles: they share a business model whereby user A gives rights to his content to user B, and user B pays a certain amount, or promises to pay a certain amount or a share of his revenues, every time a transaction based on the content takes place.

WimTV_user-to-user

Figure 2 – Relationships between WimTV users

The WimTV platform provides a number of services that users can call when they do business on the platform. The platform is built on a lower layer of very robust Open Source Software (OSS) components and a middleware layer that makes reference to MXM.

WimTV_architecture

Figure 3 – The WimTV architecture

The platform exposes a standard HTTP interface for browser-based access and a Representational State Transfer (REST) API that applications on a variety of devices can access.

These are some of the APIs (a sketch of how an application might call them follows the table):

User mgmt: Register, Profile, List, Connections, …
Import/export: Upload, Progress Bar, Download, Delete, …
On demand: List Videos, Detail Video, Post, Remove, …
Live: Create Event, List Events, Stream Event, EPG, …
Trading: List, Post, Remove, Acquire, …
Payment: VoD, Live, Subscription, Download, …
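The sketch below shows how an application might call such a REST API over HTTP. The base URL, endpoint paths and authentication scheme are hypothetical placeholders for illustration, not the documented WimTV API.

```python
# Hypothetical sketch of calling a WimTV-style REST API; the host, endpoint
# names and auth scheme are assumptions, not the real WimTV interface.
import json
import urllib.request

BASE_URL = "https://example-wimtv-host/api"   # placeholder host

def api_get(path, token):
    """Issue an authenticated GET request and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# e.g. list the user's on-demand videos, then the scheduled live events
# videos = api_get("/vod/list", token="...")     # maps to 'List Videos'
# events = api_get("/live/events", token="...")  # maps to 'List Events'
```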

An example of the type of services provided by the WimTV platform is illustrated in Fig. 4. Here a user (called Event Organiser) sets up an event and makes an agreement with another user (called Event Reseller) to distribute the event, including the level of revenue sharing between them. The two users register on WimTV specifying their roles and informing WimTV of their revenue sharing arrangement. For every end user buying a “ticket”, WimTV splits the revenues between itself and the two users, and between the two users according to their revenue sharing agreement.

WimTV_live

Fig. 4 – WimTV live services

WimLive offers a number of features that are not found on other systems

  1. Event Organiser and Event Reseller may coincide
  2. An arbitrary number of parties may claim rights to an event
  3. Revenues are accredited as soon as a payment is effected
  4. WimTV acts as a trusted third party
  5. The service has low administrative costs (the burden is taken over by WimTV).

The platform offers WimTVPro, a WordPress, Joomla and Drupal plugin that replicates most WimTV functionalities on the user’s Content Management System (CMS). The user can then turn a website into a full-fledged video content management, distribution and commerce site. Specifically, the plugin allows the user

  • To manage and publish videos on web pages or streaming dedicated widgets
  • To publish on demand videos and video playlists, and stream live events with a single plugin
  • To monetise videos published with professional pay-per-view licenses

WimTV is based on a simple but effective business model:

  1. Free use of WimTV services, apps, API to develop new services and apps
  2. Monthly subscription for the use of WimTV storage and streaming resources required by WimTV services/apps (existing/customer-developed)
  3. Revenue sharing with WimTV when services/apps generate revenues for the user
  4. Platform licensing to users who intend to develop apps and offer services with licensee’s brand.

Technologies For The Internet Of The Future

My ability to present – and get approved – European projects dwindled after I left Telecom Italia. In the early days of the DMP I convinced a group of European DMP members to propose a project on “DRM Conformance” (DRM-C). In the abstract we wrote that the project was meant to be

as a first EU leading initiative towards the establishment of DRM conformance (DRM-C) practices. The project will provide a means for the development of EU based conformance testing IP in support of future development of interoperable DRM tools, components and solutions thus furthering the state of the art in digital content creation and distribution.

My already low appreciation of the new Brussels environment was reinforced when I saw the text of the evaluation of the “relevance of the proposal”

DRM and interoperability are definitely within scope but conformance testing is less of a priority of the Strategic Objective.

This answer defies my intellect. How can one say that interoperability is important and conformance testing is not? Maybe the reviewers didn’t know what a standard is (no problem), but if you are reviewing a project on interoperability of something you cannot say that the means to assess whether there is interoperability are not important. It would be like saying that laws are important but tribunals are not. Unless, I mean, interoperability is one of those nice words to which everybody pays lip service – and then keeps on erecting walls.

I had a better chance with two project proposals that I submitted in later years. The first was CONVERGENCE, a project submitted by a consortium of 12 European partners, and the second GreenICN, submitted by two separate 6-partner consortia – one from Europe and one from Japan – with a common work plan. Even though the two projects are independent, and sequential in time, they are tightly connected: as far as CEDEO is concerned, the latter can be considered the continuation of the former.

The internet has been quite successful in responding to the expanding needs of its users. However, the internet has shortcomings, e.g. the unpredictable bitrates available over long distances. Peer-to-Peer (P2P) Networks and Content Distribution Networks (CDN) are overlay networks that use content replication strategies to lower the probability of hitting network bottlenecks. The interesting point is that they move away from a host-based to a content-based communication model.

Information Centric Networking (ICN) is an emerging communication paradigm in which each information unit is stored as an individually identified entity at an appropriate granularity level, so that it is possible to retrieve it by simply using its identifier. This is alternative to retrieving an information unit from a logical location whose physical address is provided by the Domain Name System (DNS). Therefore ICN can be seen as an incorporation of P2P and CDN functionalities into the network layer.

CONVERGENCE is an ICT system consisting of a set of interconnected peers and nodes collectively called “CONVERGENCE devices”.

convergence_devices

Figure 1 – CONVERGENCE devices

Fig. 1 depicts three different kinds of ICN device, namely CoNet, a network node; CoSec, a security node; and Peer, a user device. CoNet is a node device running the ICN stack of functionalities, CoSec is a node device responsible for handling the majority of cryptographic protocols and security-related tasks, and Peer is a layered device based on a 3-layer architecture. From the bottom:

  1. Computing Platform includes computing resources and operating system interfaces (API) able to run concurrently CoNet and CoSec implementations;
  2. CoMid is a middleware exposing interfaces (API) to applications;
  3. Apps include applications.

GreenICN uses and expands the same architecture with different names.

Peers, the means through which users access network functionalities, are essential components for an ICN. The GreenICN peer architecture is based on MPEG-M and uses a number of other MPEG standard technologies. A complete architecture of a Peer is depicted in Fig. 2.

Each of the 3 layers has its own structure and communicates with other layers via standard APIs.

GreenICN_peer

Figure 2 – GreenICN peer architecture

The GreenICN Peer exposes the High Level API defined in the Table below.

API name – Function
Storage and retrieval – Called by the Store and Publish app
Pub/Sub Service – Called by the Store and Publish, and Subscribe and Play apps
Network Service – Called by the Peer Management app
Energy management – Called by the Peer Management app
Security – Called by the Peer Management app

Currently the GreenICN Peer supports the Protocol and Technology Engines defined in the Table below.

Engine (Acronym) – Definition
Authorise User (AUPE) – A Protocol Engine that authorises a user
Create Licence (CLPE) – A Protocol Engine that enables a user to create a REL licence
Identify Content (ICPE) – A Protocol Engine that handles requests for content identification
Identify User (IUPE) – A Protocol Engine that handles requests for user identification
Digital Item (DITE) – Processes Digital Items
Event Report (ERTE) – Processes Event Report Requests and generates Event Reports
GreenNet (GNTE) – Manages communication with the GreenICN networking layer
GreenTech (GTTE) – Controls energy consumption to achieve a given energy consumption plan
Match (MATE) – Matches metadata with a query
Media Framework (MFTE) – Processes media data (encoding, transport and decoding)
Media Query (MQTE) – Handles queries related to media content
Metadata (MDTE) – Generates and parses metadata
Orchestrator (ORTE) – Manages operations of a plurality of Technology Engines
Overlay (OLTE) – Manages Peer and Semantic Overlay Network communication
Rights Expression (RETE) – Generates and parses REL licences
Security (SETE) – Performs security operations (symmetric/asymmetric encryption etc.)

The GreenICN Peer exposes the Low Level API defined in the Table below.

Name – Function
Local resources – Provides access to Computing Platform specific APIs
GreenNet – Provides access to GreenICN networking
Security – Provides access to trusted environment (e.g. smartcard) functionality
Energy Storage – Provides access to energy management (battery etc.)

With reference to the Publish/Subscribe (PubSub) communication paradigm, and in particular the Publish/Subscribe Application Format (PSAF), GreenICN peers have the embedded ability (sketched after the list below) to

  1. Perform matches between PIs and SIs
  2. Communicate any match to the specified users/peers
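A minimal sketch of this match-and-notify behaviour is given below. Representing PIs and SIs as sets of keywords, and all function names, are simplifying assumptions for illustration rather than the PSAF data model.

```python
# Simplified sketch of the Publish/Subscribe matching a GreenICN peer performs.
# Modelling PIs and SIs as keyword sets is an assumption; PSAF defines a richer,
# MPEG-21-based representation.

def match(publication_keywords, subscription_keywords):
    """A match occurs when every keyword of the subscription appears in the publication."""
    return subscription_keywords <= publication_keywords

def notify_matches(publications, subscriptions, send):
    """Compare every PI with every SI and notify the subscribing peer of each match."""
    for pi_id, pi_kw in publications.items():
        for peer, si_kw in subscriptions.items():
            if match(pi_kw, si_kw):
                send(peer, pi_id)   # communicate the match to the specified user/peer

publications = {"apt-42": {"tokyo", "2-rooms", "vacation"}}
subscriptions = {"keiko-peer": {"tokyo", "vacation"}}
notify_matches(publications, subscriptions, send=lambda peer, pi: print(peer, "<-", pi))
```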

Currently a demo has been developed that supports a use case called “Managed advertising for real estate”. In the use case Toshiki, a real estate businessman, applies a windowing policy for his offers of vacation apartments. For two weeks his clients are pre-emptively offered available apartments but, after that, everybody, including non-clients, can access the apartments on offer. The actors of the Managed advertising for real estate are:

  1. Nozomu, an apartment owner
  2. Toshiki, a real estate businessman
  3. Mary, Toshiki’s administrative assistant
  4. John and Keiko, Toshiki’s customers.

The GreenICN peer offers a number of features that map very well onto Toshiki’s promotion strategy, as shown by Figure 4, where the main steps of the walkthrough can be identified (a simplified code sketch of these steps follows the list):

  1. Toshiki requests Segmenter Peer to prepare video (DASH segmentation, encryption, data identification) and store it to GreenICN
  2. Upon notification that Segmenter Peer has completed its job, Toshiki requests Licence Peer to create licences for his customers (currently John and Keiko)
  3. When Keiko requests to play the video, her Peer requests the Licence Peer to provide the means to decrypt the video
  4. Upon reception of the decryption key, Keiko can decrypt and watch Toshiki’s video.
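The walkthrough can be condensed into the sketch below. The peer roles are those of Figure 4, but every function name and data structure is a hypothetical simplification of the MPEG-M Protocol Engines actually involved (Create Licence, Identify Content, etc.).

```python
# Hypothetical condensation of the 'Managed advertising for real estate' walkthrough.
# Function names and data structures are illustrative, not the MPEG-M protocol messages.

def segmenter_prepare(video):
    """Step 1: DASH-segment, encrypt and identify the video, then store it in GreenICN."""
    return {"content_id": "urn:example:apt-42", "key_id": "k1"}   # placeholder identifiers

def licence_peer_create_licences(content_id, customers):
    """Step 2: create one licence per customer for the identified content."""
    return {c: {"content_id": content_id, "granted": True} for c in customers}

def licence_peer_get_key(licences, customer, content_id):
    """Step 3: return the decryption key only to customers holding a valid licence."""
    lic = licences.get(customer)
    return "decryption-key" if lic and lic["content_id"] == content_id else None

# Step 4: with the key, Keiko's peer can decrypt and play Toshiki's video.
info = segmenter_prepare("apartment_tour.mp4")
licences = licence_peer_create_licences(info["content_id"], ["John", "Keiko"])
key = licence_peer_get_key(licences, "Keiko", info["content_id"])
print("Keiko can play the video:", key is not None)
```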

Figure 4 depicts how Publish/Subscribe works in CONVERGENCE.

 

GreenICN_REA_demo

Figure 4 – The GreenICN Real Estate Advertising demo

The demo makes use of a number of MPEG technologies, besides MXM, namely

  1. Digital Item
  2. Digital Item Identifier
  3. Simple Metadata Profile
  4. MPEG Query Format
  5. Rights Expression Language
  6. Event Reporting

Multimedia Standards For An Evolving Market

Two years after the completion of the 1st edition of the AVC standard MPEG organised a Workshop on Future Directions in Video Compression (Busan, April 2005). The purpose of the workshop was to inquire about the prospects of a new generation of video compression standards. As no definite answer could be obtained, another Workshop was held 6 months later (Nice, October 2005) with similar results. Three years later two workshops in a row on New Challenges in Video Coding Standardization (Hannover, July 2008 and Busan, October 2008) brought announcements of new, but still undocumented, algorithms providing more than 30% compression improvement. That was enough to convince MPEG that it was worth issuing a Call for Evidence for new video coding technology (Maui, HI, April 2009). Submissions were reviewed at the following meeting (London, July 2009) and promising results detected. A draft CfP for High Efficiency Video Coding (HVC) was produced at the Xi’an meeting (October 2009).

The JVT was closed at the Kyoto meeting (January 2009) because AVC-related activities were in a downward spiral. Three weeks later I went to Geneva to meet the ITU-T Director to discuss the opening of a new collaborative team termed (rather redundantly) Joint Collaborative Team on Video Coding (JCT-VC). The object of the collaboration was a new video coding standard that was eventually called High Efficiency Video Coding (HEVC). Unlike the JVT, which could meet independently of MPEG and VCEG, the agreement included a clause that the JCT-VC should either meet as part of ITU-T SG 16 (on average every 9 months) or as part of MPEG (on average 2 out of 3 meetings). In the former case MPEG should meet at least in the same city (typically Geneva) in order not to disrupt the network of relationships with the other MPEG subgroups.

The HEVC Call for Proposals was developed while the discussions on Requirements were progressing. The rationale for a new video coding standard was set by the need to provide more “quality” – in terms of increased temporal and spatial resolution, color fidelity, and amplitude resolution – in an affordable fashion. The reference numbers are the so-called 4k, i.e. a spatial resolution of about 2000×4000 pixels sampled at 1024 quantisation levels (10 bits), progressive scan and a frame frequency of 100 Hz. Next to these “broadcast-oriented” applications, there were also requirements for high-quality video for LTE or 4G, because cell phone screens were increasing in size and resolution (even though the iPad, the first tablet, had yet to reach CE shop shelves). The new video coding standard would be required to outperform AVC by at least 50%.

The requirements were reviewed jointly with VCEG and the two organisations eventually issued a Joint Call for Proposals on Video Compression Technology (Kyoto, January 2010). The call defined a set of test sequences progressively scanned at resolutions including 416×240 pixels, 1920×1080 pixels, and 4096×2048 pixels and a set of bitrates ranging between 256 kbit/s and 14 Mbit/s.

Definitely the Forces of Nature, which had already shown their concern with MPEG-1 Video in 1989 and 1990, had remained quiet for too long. The very day the VC group held its first meeting session (Dresden, April 2010) the eruption of the Eyjafjallajökull volcano in Iceland disrupted air travel across Western and Northern Europe for days. Therefore the travel plans of all the people who intended to come to Dresden for MPEG activities other than the VC group were compromised. Most VC experts were already in place, but many other MPEG experts were prevented from physically attending.

Twenty-seven proposals, all based on block-based hybrid coding with motion compensation, were evaluated by subjective tests. At least one codec provided a rate reduction of 50% compared to AVC High Profile for all test sequences. The technologies selected came roughly from the 5 best performing proposals and were assessed in a “Test Model Under Consideration” (TMUC) until October 2010 when the relevant technologies were consolidated into TM-H1, the common software used in the core experiments whose results were gradually included in the HEVC standard.

The development of the HEVC standard proceeded apace, reaching FDIS level (San José, CA, January 2013) after just 33 months of work and providing subjective improvements of over 60% over AVC in some cases, thus exceeding the 50% improvement target.

Toward the end of the 1st decade of the century MPEG began to realise that a new generation of Systems standards was needed because the market was fast evolving toward personalised viewing of multimedia content – over broadcast channels, over the internet and jointly. The two most important standards developed by MPEG were the MPEG-2 Transport Stream (TS), supporting real-time streaming delivery, and the ISO Base Media File Format (BMFF), supporting file exchange and progressive download applications. Neither technology was suitable for, e.g., personalised advertising or selection of a preferred language. To do this with MPEG-2 TS, stream demultiplexing/remultiplexing was required; with ISO BMFF, interleaving metadata with the media data for synchronised playback would support progressive download of a file, but efficient access to a subset of a file was difficult to achieve.

Another important component of streaming over the internet was the ability to cope with the fact that streaming over the internet typically happens at non-guaranteed bitrates. The market had already developed independent ad hoc solutions, but these were not interoperable.

At the Dresden meeting the decision was made to go ahead with two parallel activities. The first activity, eventually called Dynamic Adaptive Streaming over HTTP (DASH), would take care of streaming over the internet, and a DASH CfP was issued in Dresden. Responses to the CfP were received at the following Geneva meeting (July 2010), providing a wealth of technologies that created a stream of activities leading to a DASH standard approved in December 2011. A very high level of activity continues today, extending the scope of the specification.

The second activity, called MPEG Media Transport (MMT), was designed to develop standard Systems technologies for an IP network where in-network intelligent caches close to the receiving entities are in charge of actively caching, and adaptively packetising and pushing, content to receiving entities in order to

  1. Enable easy access to multimedia components in multimedia content
  2. Remove the inefficiency caused by different delivery and storage formats
  3. Combine various multimedia content components located over various caches and storage devices.

The MMT CfP was issued at the Geneva meeting (July 2010).

Little by little the need for a traditional “triadic” MPEG standard emerged, where the Systems part would be covered by MMT and the Video part by HEVC. The new standard was called MPEG-H with the rather long, but certainly expressive, title “High Efficiency Coding and Media Delivery in Heterogeneous Environments”, where coding efficiency and delivery in heterogeneous environments were the focus.

The Audio part of MPEG-H was originally less clear, but everybody realised that if HEVC could provide much higher resolution video, MPEG-H Audio could not be a simple revisitation of the AAC family. Eventually the Audio component of MPEG-H was defined as coding of multichannel audio where the number and position of microphones at the sending end and of loudspeakers at the receiving end are independent. Part 3 of MPEG-H was called 3D Audio and the 3D Audio CfP was published at the Geneva meeting in January 2013, the very meeting which approved the HEVC standard in its original form.

 

 


Coping With An Unpredictable Internet

Dynamic Adaptive Streaming over HTTP (DASH) is a media-streaming standard in which the client has control of the server-client information flow. Clients may request data using the HTTP protocol from standard web servers even if the servers may not have DASH-specific capabilities.

The DASH standard primarily defines two formats:

  • The Media Presentation Description (MPD): a format to announce resource identifiers (HTTP-URLs) for Segments (media chunks) and to provide the context for these identified resources within a Media Presentation;
  • The Segment formats specify the formats of the entity body of the HTTP response to an HTTP GET request or a partial HTTP GET with the indicated byte range using HTTP/1.1 to a resource identified in the MPD.

The MPD provides sufficient information for a “DASH Client” to provide a streaming service to the user by accessing the Segments through HTTP/1.1.

Figure 1 shows a possible DASH deployment architecture. In the figure

  1. Solid-line boxes are referenced in the DASH standard as they host/process DASH formats;
  2. Dashed boxes are conceptual or transparent or outside of the scope of the standard.

 DASH_model

Figure 1 – DASH model

The collection of encoded and deliverable versions of media content and the appropriate description of these form a Media Presentation. Media content is composed of a single or multiple contiguous media content periods in time. Each media content period is composed of one or multiple media content components, for example audio components in various languages and a video component. Each media content component has an assigned media content component type, for example audio or video.

Each media content component may have several encoded versions, referred to as media streams. Each media stream inherits the properties of the media content, the media content period and the media content component from which it was encoded, and in addition it gets assigned the properties of the encoding process such as sub-sampling, codec parameters, encoding bitrate, etc. This descriptive metadata is relevant for static and dynamic selection of media content components and media streams.

DASH_High-Level_Data_Model

Figure 2 — DASH High-Level Data Model

 A DASH Media Presentation [1] is described by a Media Presentation Description [2] (MPD). This describes the sequence of Periods [3] in time that make up the Media Presentation. A Period typically represents a media content period.

Within a Period, material is arranged into Adaptation Sets [4]. If there is other material available, for example captions or audio descriptions, then these may each have a separate Adaptation Set. Material may also be provided in multiplexed form, in which case interchangeable versions of the multiplex may be described as a single Adaptation Set, for example an Adaptation Set containing both the main audio and main video for a Period. Each of the multiplexed components may be described individually by a Media Content Component Description.

An Adaptation Set contains a set of Representations [5]. A Representation describes a deliverable encoded version of one or several media content components. A Representation includes one or more media streams (one for each media content component in the multiplex). Any single Representation within an Adaptation Set is sufficient to render the contained media content components. Typically, clients may switch from Representation to Representation within an Adaptation Set in order to adapt to network conditions or other factors.

Within a Representation, the content may be divided in time into Segments [6]. A URL is provided for each Segment meaning that a Segment is the largest unit of data that can be retrieved with a single HTTP request.
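This data model translates into a simple client behaviour: pick a Representation within an Adaptation Set, fetch its Segments with HTTP GETs, and switch Representation when the measured throughput changes. The sketch below assumes the MPD has already been parsed into plain objects; it illustrates the logic, not the MPD syntax.

```python
# Minimal sketch of DASH client logic; assumes the MPD has been parsed into
# these simple structures (an illustration, not the MPD XML schema).
import urllib.request

class Representation:
    def __init__(self, bandwidth, segment_urls):
        self.bandwidth = bandwidth          # required bitrate in bit/s
        self.segment_urls = segment_urls    # one HTTP-URL per Segment

def pick_representation(adaptation_set, measured_throughput):
    """Choose the highest-bandwidth Representation that fits the measured throughput,
    falling back to the lowest one if none fits."""
    usable = [r for r in adaptation_set if r.bandwidth <= measured_throughput]
    if usable:
        return max(usable, key=lambda r: r.bandwidth)
    return min(adaptation_set, key=lambda r: r.bandwidth)

def stream(adaptation_set, segment_count, throughput_estimator):
    for i in range(segment_count):
        rep = pick_representation(adaptation_set, throughput_estimator())
        with urllib.request.urlopen(rep.segment_urls[i]) as resp:   # plain HTTP GET
            data = resp.read()
        # ...feed 'data' to the media decoder and update the throughput estimate...
```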

DASH defines different timelines. One key feature in DASH is that encoded versions of different Media Content Components [7] share a common timeline. The Presentation Time [8] of an Access Unit [9] within the media content is mapped to the global common presentation timeline for synchronization of different media components and to enable seamless switching of different coded versions of the same media components. This timeline is referred to as the Media Presentation Timeline [10]. The Media Segments themselves contain accurate Media Presentation timing information enabling synchronization of components and seamless switching.

A second timeline is used to signal to clients the availability time of segments at the specified HTTP-URLs. These times are referred to as Segment Availability Times [11] and are provided in wall-clock time. Clients typically compare the wall-clock time to Segment availability times before accessing the Segments at the specified HTTP-URLs. For On-Demand services with a static MPD, the availability times of all Segments are identical. For live services when the MPD is updated, the availability times of segments depend on the position of the Segment in the Media Presentation timeline.

Segments are assigned a duration, which is the duration of the media contained in the Segment when presented at normal speed. Typically all Segments in a Representation have the same or a roughly similar duration. However, Segment duration may differ from Representation to Representation. A DASH presentation can be constructed with relatively short Segments (for example a few seconds), or with longer Segments, including a single Segment for the whole Representation.

Short Segments are usually required in the case of live content, where there are restrictions on end-to-end latency. The duration of a Segment is typically a lower bound on the end-to-end latency. DASH does not support the possibility for Segments to be extended over time: a Segment is a complete and discrete unit that must be made available in its entirety.

Segments may be further subdivided into Subsegments [12] each of which contains a whole number of complete access units. There may also be media-format-specific restrictions on Subsegment boundaries, for example in the ISO Base Media File Format a Subsegment must contain a whole number of complete movie fragments. If a Segment is divided into Subsegments these are described by a compact Segment index [13], which provides the presentation time range in the Representation and corresponding byte range in the Segment occupied by each Subsegment. Clients may download this index in advance and then issue requests for individual Subsegments.

Clients may switch from Representation to Representation within an Adaptation Set at any point in the media. However, switching at arbitrary positions may be complicated because of coding dependencies within Representations and other factors. It is also desirable to avoid download of ‘overlapping’ data i.e. media for the same time period from multiple Representations. Usually, switching is simplest at a random access point in the new stream. In order to formalize requirements related to switching DASH defines a codec-independent concept of Stream Access Point [14].

Segmentation and Subsegmentation may be performed in ways that make switching simpler. For example, in the very simplest cases each Segment or Subsegment begins with a random access point and the boundaries of Segments or Subsegments are aligned across the Representations of an Adaptation Set. In this case, switching Representation involves playing to the end of a (Sub)Segment of one Representation and then playing from the beginning of the next (Sub)Segment of the new Representation. The Media Presentation Description and Segment Index provide various indications, which describe properties of the Representations that may make switching simpler. Profiles of this specification may then require these indicators to be set in certain ways, making implementation of clients for those profiles simpler at the cost of requiring the media data to obey the indicated constraints.

For On-Demand services, the MPD is a static document describing the various aspects of the Media Presentation. All Segments of the Media Presentation are available on the server once any Segment is available. For live services, however, Segments become available with time as the content is produced. The MPD may be updated regularly to reflect changes in the presentation over time, for example Segment URLs for new segments may be added to the MPD and those for old, no longer available Segments may be removed. However, if Segment URLs are described using a template, this updating may not be necessary except for some redundancy/failover cases.
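For live services with template-based addressing, a client can derive both a Segment's URL and its availability time locally, without waiting for MPD updates. The sketch below is a simplified reading of that mechanism: the names mirror common MPD attributes ($Number$, availabilityStartTime, startNumber), but the computation ignores several refinements of the actual specification (timescale, availabilityTimeOffset, time shift buffer).

```python
# Simplified sketch of template-based live Segment addressing and availability.
# Attribute names echo the MPD, but the computation omits details of the standard.
from datetime import datetime, timedelta, timezone

def segment_url(template, representation_id, number):
    """Expand a $RepresentationID$/$Number$ URL template for one Segment."""
    return template.replace("$RepresentationID$", representation_id).replace("$Number$", str(number))

def segment_availability_time(availability_start, segment_duration_s, start_number, number):
    """A Segment becomes available once it has been completely produced."""
    produced = (number - start_number + 1) * segment_duration_s
    return availability_start + timedelta(seconds=produced)

ast = datetime(2015, 8, 20, 12, 0, tzinfo=timezone.utc)      # MPD availabilityStartTime
url = segment_url("http://example.com/$RepresentationID$/seg-$Number$.m4s", "video-2M", 7)
when = segment_availability_time(ast, segment_duration_s=4, start_number=1, number=7)
print(url, "available from", when.isoformat())
```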


 

[1] Media Presentation: collection of data that establishes a bounded or unbounded presentation of media content

[2] Media Presentation Description: formalized description for a Media Presentation for the purpose of providing a streaming service

[3] Period: interval of the Media Presentation, during which a consistent set of encoded versions of the media content is available i.e. the set of available bitrates, languages, captions, subtitles etc. does not change during a Period

[4] Adaptation Set: a set of interchangeable encoded versions of one or several Media Content Components. For example there may be one Adaptation Set for the main video component and a separate one for the main audio component

[5] Representation: collection and encapsulation of one or more media streams in a delivery format and associated with descriptive metadata

[6] Segment: unit of data associated with an HTTP-URL and optionally a byte range, meaning that the Segment is contained in the provided byte range of some larger resource

[7] Media Content Component: one continuous component of the media content with an assigned media component type that can be encoded individually into a media stream

[8] Presentation Time: a time associated to an access unit that maps it to the Media Presentation timeline

[9] Access Unit: unit of a media stream with an assigned Media Presentation time

[10] Media Presentation Timeline: concatenation of the timeline of all Periods which itself is common to all Representations in the Period

[11] Segment Availability Time: The time at which a Segment becomes available at the specified HTTP-URLs

[12] Subsegment: a portion of a Segment which contains a whole number of complete Access Units

[13] Segment index: a compact index of the time range to byte range mapping within a Media Segment separately from the MPD

[14] Stream Access Point: position in a Representation enabling playback of a media stream to be started using only the information contained in Representation data starting from that position onwards (preceded by initializing data in the Initialization Segment, if any)


Inside MPEG-H – Systems

MPEG Media Transport (MMT) has been designed to support the second half of the MPEG-H title “Media Delivery in Heterogeneous Environments” and is based on the following assumptions

  • IP-based delivery: packet oriented, with relatively large jitter, and use of internet protocols and functionalities (e.g. NTP)
  • Two-planes: data plane for media delivery and control plane for signaling, and presentation and delivery management
  • Layered architecture with support to communication across layers

Figure 1 depicts the MMT protocol stack.

MMT_protocol_stack

Figure 1 – MMT protocol stack

MMT specifies technologies for three functional areas:

  1. Encapsulation. Media components are processed into an MMT-specified format called Media Processing Unit (MPU) that defines the content's logical structure and the physical encapsulation format based on ISO BMFF. Media components are decomposed into Assets, and individual Assets are encapsulated into MPU files defined by MMT on the basis of ISO BMFF. Spatial and temporal relationships among multimedia components are represented by HTML5 and Composition Information (CI), as shown in Figure 2. The MPU file produced by the encapsulation function can be used to store coded media data on a storage device in preparation for delivery.
  2. Delivery defining:
    1.  The protocol supporting streaming delivery of packetised content through heterogeneous network environments
    2. The payload format for packetised encapsulated media
  3. Signalling defining the message formats to manage MMT package consumption and delivery:
    1. Consumption management messages signal the MMT package’s structure
    2. Delivery management messages signal the payload format’s structure and the protocol configuration.

This is represented in Figure 2 below

MMT_soecification_areas

Figure 2 – MMT specification areas (green blocks)

MMT content has an onion-shell structure composed of (from the center)

  1. Media Fragment Unit (MFU): composed of Access Units (AUs) which can be decoded independently;
  2. Media Processing Unit (MPU): the minimum storage and consumption unit of MMT content. It contains one or more MFUs and can be independently decoded. Its structure is based on ISO BMFF;
  3. MMT Asset: a logical unit for multimedia component elementary streams (e.g. audio, video and data). It contains one or more MPU files and has its own ID, so that all MPU files of an Asset can be recognised through this ID;
  4. MMT Package: a logical unit of content, e.g. a broadcasting programme, realised by one or more MMT Assets. Since an MMT Package is represented by spatial and temporal relationships among Assets, and can be delivered by various networks, the MMT Package also contains these relationships and delivery network information (see the sketch after Figure 3).

The MMT content hierarchy is represented in Figure 3 below.

MMT_content_hierarchy

Figure 3 – MMT content hierarchy
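The onion-shell structure maps naturally onto nested data structures. The sketch below mirrors the four levels described above; the field names are illustrative and are not the ISO BMFF boxes that actually carry this information.

```python
# Sketch of the MMT content hierarchy (Package > Asset > MPU > MFU > AU).
# Field names are illustrative; the normative encapsulation is ISO BMFF based.
from dataclasses import dataclass
from typing import List

@dataclass
class MFU:                      # Media Fragment Unit
    access_units: List[bytes]   # AUs that can be decoded independently

@dataclass
class MPU:                      # Media Processing Unit: minimum storage/consumption unit
    mfus: List[MFU]

@dataclass
class Asset:                    # logical unit for one elementary stream (audio, video, data)
    asset_id: str               # all MPUs of the Asset are recognised through this ID
    mpus: List[MPU]

@dataclass
class Package:                  # logical unit of content, e.g. a broadcast programme
    assets: List[Asset]
    composition_info: str = ""  # spatial/temporal relationships (HTML5 + MMT-CI)
    delivery_info: str = ""     # delivery network information
```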

MMT Composition Information provides information on how to present an MMT Package in terms of both spatial and temporal relationships among the MMT Assets in an MMT Package, and thus specifies how to describe and consume an MMT Package in terms of Asset relationships. The information about an MMT Package and the composition of MMT Assets in the MMT Package are described in MMT-CI. As an initial presentation of an MMT Package, the spatial relationship among Assets in the Package is described by HTML5. Assets in the MMT Package may also be delivered earlier or later in accordance with the timing of events, and presented in certain regions. This temporal relationship is described by MMT-CI, which is based on XML.

Figure 4 depicts how an MMT file (top left) is chopped into MMT packets and vice versa.

MMT_packets

Figure 4 – MMT File packetisation and back


MPEG-H Inside – 2D Video

The video coding layer of HEVC is based on the typical “hybrid” approach (inter- and intra-picture prediction and 2D transform coding) with some key differences that enhance compression. Figure 1 gives a high-level reference diagram

HEVC_encoder

Figure 1 – HEVC encoder block diagram

Here is a list of the main technical elements and innovations.

Technology – Features

Coding Tree Units and Coding Tree Block structure – Instead of the macroblock:

  1. A coding tree unit (CTU) consists of coding tree blocks (CTBs) for luma and chroma.
  2. A luma CTB can have a size of 16×16, 32×32 or 64×64 samples.
  3. CTBs are partitioned into coding blocks (CBs), signaled via a quadtree structure (see the sketch after this table).
  4. A coding unit (CU) includes one luma CB, the two corresponding chroma CBs and the associated syntax elements.
  5. Below the CU level are prediction units (PUs) and a tree of transform units (TUs).
  6. The inter/intra-picture coding decision is made at the CU level.

Transform Units and Transform Blocks – The prediction residual is coded using block transforms. A transform unit (TU) tree structure has its root at the CU level, where the CBs may be further split into smaller transform blocks (TBs). Integer basis functions approximating the discrete cosine transform (DCT) are defined for dyadic TB sizes from 4×4 to 32×32. For the 4×4 transform of intra-picture prediction residuals, an integer transform derived from the discrete sine transform (DST) is additionally specified.

Motion compensation – Quarter-sample precision, with 7-tap or 8-tap filters for interpolation of fractional-sample positions.

Intra-picture prediction – Decoded boundary samples from adjacent blocks are used as prediction reference data for spatial prediction in PB regions when inter-picture prediction is not performed.

Entropy coding – Five generic binarisation schemes for symbol encoding, with a specification of which of these is applied to each type of syntax element. Context-adaptive binary arithmetic coding (CABAC) is used for entropy coding.

In-loop filtering – One or two filtering stages optionally applied (within the inter-picture prediction loop) before writing the reconstructed picture into the decoded picture buffer. A deblocking filter (DBF) is also used.

Slices, tiles and wavefronts – A slice is a series of CTUs that can be decoded independently from other slices of the same picture (except for in-loop filtering of the edges of the slice, and except for the case of “dependent slices”). To enable parallel processing and localized access to picture regions, the encoder can partition a picture into rectangular regions called tiles.

High-level syntax – Above the coding layer, many of the high-level syntax features of AVC have been retained or extended. Important elements of high-level syntax are the specifications of access structures, the management of coded and decoded picture buffers, and the signaling of video usability information (VUI) and supplemental enhancement information (SEI).

Extended format and quality ranges – 4:2:0, 4:2:2 and 4:4:4 color sampling; components represented with bit depths of up to 16 bits.

Multi-view and scalable coding – The high-level syntax has sophisticated layering mechanisms that allow hierarchical bitstream structures to be established. This supports stereo/multi-view coding, where a decoder of a dependent view can refer to previously decoded pictures from another view.
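The quadtree partitioning of a CTB into CBs can be illustrated with a short recursion. The split decision function below is a stand-in for the encoder's rate-distortion choice, and the 8×8 minimum CB size is an assumption used only to bound the recursion.

```python
# Sketch of quadtree partitioning of a luma CTB into coding blocks (CBs).
# 'should_split' stands in for the encoder's rate-distortion decision; the
# 8x8 minimum CB size is an assumption used to bound the recursion.

def partition_ctb(x, y, size, should_split, min_size=8):
    """Return the list of CBs, as (x, y, size) tuples, covering one CTB."""
    if size > min_size and should_split(x, y, size):
        half = size // 2
        blocks = []
        for dx in (0, half):
            for dy in (0, half):
                blocks += partition_ctb(x + dx, y + dy, half, should_split, min_size)
        return blocks
    return [(x, y, size)]

# Example: split every block whose top-left corner lies in the upper-left quadrant.
cbs = partition_ctb(0, 0, 64, should_split=lambda x, y, s: x < 32 and y < 32)
print(len(cbs), "coding blocks, e.g.", cbs[:3])
```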

The performance of HEVC was verified for a range of video sequences not used during the development of the standard. For all of them more than 50% bit rate savings were observed, with an average of 60% for the high-quality range.


Explorations

The MPEG machine churns out new standards or amendments at a regular pace, but MPEG is also continuously investigating new standardisation needs. Here is a list of some of the current areas of investigation.

Compact descriptors for video analysis (CDVA).

After the successful completion of the CDVS standard, MPEG is now addressing the following problem: find in a video database a video resembling the one provided as input. The figure below illustrates the problem.

CDVA_problem

Figure 1 – CDVA for retrieval

A CDVA standard would allow a user to access different video databases with the guarantee that the descriptors extracted from the input video will be interoperably processed in the different video DBs.

Videos of close to 1000 objects/scenes taken from multiple viewpoints have been collected and will be used to test the CDVA algorithms submitted in response to a CfP issued at the MPEG 112 meeting.

MPEG has recently issued a CfP on “Compact Descriptors for Visual Search”. The  requirements are listed below:

  1. Self-contained (no other data necessary for matching)
  2. Independent of the image format
  3. High matching accuracy at least for special types of image (textured rigid objects, landmarks, and printed documents), and robustness to changes (vantage point, camera parameters, lighting conditions and partial occlusions)
  4. Minimal length/size
  5. Adaptation of descriptor lengths for the target performance level and database size
  6. Ability to support web-scale visual search applications and databases
  7. Extraction/matching with low memory and computation complexity
  8. Visual search algorithms that identify and localise matching regions of the query image and the database image, and provide an estimate of a geometric transformation between matching regions of the query image and the database image.

Submissions are due by MPEG 114 (February 2016).

In the future the scope of CDVA for retrieval could be further extended to cover classification as depicted in Figure 2.

cdva-classification

Figure 2 – CDVA for classification

This second type of CDVA technology could find application in video surveillance, where the detection of a person can trigger some process, or in automotive, to alert the driver when an object of a specified nature is in front of the car. CDVA for classification could reverse the current approach, in which video is first compressed for transmission and then described for use by a classifier, into one in which a video is first described and then compressed for transmission.

ctd-dtc

Figure 3 – From compress then describe to describe then compress

Free-viewpoint Television (FTV)

FTV intends to ascertain the existence and performance of two types of technology

  1. How to compress efficiently a large number (e.g. 100) of signals from cameras aligned on a line or a circle and pointing to the same 3D scene (supermultiview)
  2. How to reconstruct a 3D scene seen from a virtual viewpoint, knowing the signals from a limited number (e.g. 10) of cameras

The two figures below depict the two target configurations

supermultiview

Figure 4 – Super Multiview

freenavigation

Figure 5 – Free Navigation

MPEG wearables

Wearable devices are characterised by two features

  1. Can be worn by or embedded in a person or his clothes
  2. Can communicate either directly through embedded wireless connectivity or through another device (e.g. a smartphone).

Wearable devices allow users to

  1. Track time, distance, pace and calories via a set of sensors in a T-shirt or on smart shoes
  2. Wear smart glasses which combine innovative displays with novel gestural movements for interaction
  3. Wear a pacemaker or a heart rate monitor intelligent band aid.

MPEG has developed the following conceptual model

Conceptual_Model_for_MPEG_Wearable

Figure 6 – Conceptual model for MPEG wearable

A wearable senses/acts on the environment, communicates with a processing unit and interacts with/acts on users. The processing unit can also interact with users.

The scope of MPEG Wearable is to standardise

  1. The interaction commands from User to Wearable
  2. The format of the aggregated and synchronized data sent from the Wearable to the Processing unit (represented by red arrows in the figure above);
  3. A focused list of sensors that the Wearable may integrate.

Media orchestration

Technology supports more and more different capture and display devices, and applications and services are moving towards more immersive experiences. It is now becoming possible to create integral experiences by managing multiple, heterogeneous devices over multiple, heterogeneous networks. We call this process Media Orchestration: orchestrating devices, media streams and resources to create such an experience, as depicted in Figure 7.

media_orchestration

Figure 7 – Media orchestration

Four dimensions have been identified

  1. Device dimension
  2. Stream dimension
  3. Spatio-temporal dimension
  4. Ingest/rendering dimension

The technical framework distinguishes three independent layers:

  1.  The Functional Architecture,
  2. Data Model and Protocols
  3. Data representation and encapsulation

Genome compression

A genome is the code that drives the life of living beings. Genomes are specific to a type of living being and, within it, to a given individual. They are highly structured:

  1. Genomes are composed of chromosomes (46 in the case of humans)
  2. Chromosomes are composed of genes
  3. Genes are composed of DNA molecules specific to the gene
  4. DNAs contain a few tens to a few hundred nucleotides
  5. Nucleotides are organic molecules (bases)
  6. There are 4 types of bases depicted in Figure 8.

dna_bases

Figure 8 – The 4 molecules composing a DNA

Therefore the genome code is represented by a 4-symbol alphabet. The initials of the 4 bases are used to indicate the symbol. The human genome carries ~3.2 billion symbols.
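Since the alphabet has only four symbols, each base fits in two bits. The sketch below shows this baseline packing; real genome compressors go far beyond it by exploiting reference genomes, redundancy across reads and the quality metadata discussed below.

```python
# Baseline 2-bit packing of a nucleotide sequence (A, C, G, T).
# Real genome compressors do much better by exploiting reference genomes,
# redundancy across reads and the statistics of the quality metadata.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack(sequence):
    """Pack 4 bases per byte; 3.2 billion bases fit in ~0.8 GB instead of 3.2 GB as text."""
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for base in sequence[i:i + 4]:
            byte = (byte << 2) | CODE[base]
        out.append(byte)
    return bytes(out)

print(pack("GATTACA").hex())
```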

The ability to read genomes (i.e. converting a genome in an organic sample into a machine-readable sequence) is highly desirable but was very costly until a few years ago. Today high-throughput genome sequencing technology has made sequencing of the genome affordable.

genome_sequencing_chain

Figure 9 – The genome sequencing process

High Throughput Sequencing (HTS) is opening new perspectives for the diagnosis of cancer and other genetic illnesses. However, genome sequencing generates huge amounts of data, because each base has associated “quality metadata” that describe the quality (i.e. reliability) of the reading of that base. A sequenced human genome can generate several TBytes of data each time.

A genome sequencing machine produces “reads”. A computer program then aligns them and produces an assembled “machine-readable” genome.

 genome_sequencing_and_assembling

Figure 10 – Genome sequencing by assembling reads

The challenge is to find appropriate algorithms that can reduce the amount of data that needs to be stored and make the data easily accessible for a variety of processing tasks.

genome_lifecycle

Figure 11 – The genome life cycle

We need to be able to store, access, search and process genome sequencing data efficiently and economically before these technologies can become generally available in healthcare and medicine. That this is the real roadblock is confirmed by the trend in sequencing data generation cost vs storage and bandwidth costs: the latter will soon be higher than the former.

Media-centric Internet of Things

MPEG has defined the “Media-centric Internet of Things (MIoT)” as the collection of interfaces, protocols and associated media-related information representations that enable advanced services and applications based on human-to-device and device-to-device interaction in physical and virtual environments. An eventual MIoT standard should allow a system designer to assemble different systems by piecing together MIoTs of appropriate functionality.

The figure below represents the MIoT model

MIoT_model

Figure 12 – Media Internet of Things (MIoT) model

where the following definitions apply

  • Entity is any physical or virtual object that is sensed by and/or acted on by Things.
  •  Thing is anything that can communicate with other Things, in addition it may sense and/or act on Entities.
  •  Media Thing (MThing) is a Thing with at least one of audio/visual sensing and actuating capabilities.

Future Video Coding

MPEG approved HEVC in January 2013 and it may appear premature to consider a new standard that may replace an older standard that is barely being deployed. The reasons for doing so are manifold:

  1. Deep-pocketed companies are producing newer generations of video codecs at an accelerated rate
  2. Mobile is a great way of consuming video and paradigms of the TV age may no longer apply
  3. New media types are appearing or just becoming more prominent and the coding environment need not stay the same
  4. Around 2020 a standard for new generation mobile networks (5G) is expected and a new video coding standard for mass consumption of video on mobile is a good fit

At this point it is time to make an assessment of the path trodden by MPEG in its quarter-of-a-century effort at compressing video. This is represented by the table below.

                 Base       Scalable   Stereo   Depth   Selectable viewpoint
  MPEG-1 Video   ~VHS
  MPEG-2 Video   2 Mbit/s   -10%       -15%
  MPEG-4 Visual  -25%       -10%       -15%
  MPEG-4 AVC     -30%       -25%       -25%     -20%    5/10%
  HEVC           -60%       -25%       -25%     -20%    5/10%
  ?              ?          ?          ?        ?       ?

In the “Base” column, percentage numbers refer to the compression improvement compared to the previous generation of compression technology. The percentage numbers in the “Scalable”, “Stereo” and “Depth” columns refer to the compression improvement compared to the technology in the column immediately to the left. “Selectable viewpoint” refers to the ability to select and view an image from a viewpoint that was not transmitted. The last row only contains question marks because it refers to what, potentially, can be done in a future that we do not know, but that we can well create for ourselves.
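As an illustration of how the “Base” column compounds, the sketch below starts from the 2 Mbit/s reference given for MPEG-2 and applies each later reduction to the bitrate of the generation before it; the resulting figures are my own arithmetic, not values taken from the table.

```python
# Compounding the "Base" column: each percentage is a bitrate reduction with
# respect to the previous generation, starting from MPEG-2's 2 Mbit/s figure.
reductions = {"MPEG-4 Visual": 0.25, "MPEG-4 AVC": 0.30, "HEVC": 0.60}

bitrate_mbps = 2.0      # MPEG-2 Video reference point from the table
print(f"MPEG-2 Video: {bitrate_mbps:.2f} Mbit/s")
for codec, reduction in reductions.items():
    bitrate_mbps *= (1.0 - reduction)
    print(f"{codec}: {bitrate_mbps:.2f} Mbit/s for comparable quality")
# MPEG-4 Visual: 1.50, MPEG-4 AVC: 1.05, HEVC: 0.42
```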

So far MPEG has produced 5 generations of video coding standards. It is the right time to ask if it is possible to establish a comparison between the bitrates handled by MPEG video and audio codecs and the bitrates used by the eye/ear-to-brain channel.

The input bandwidths to the human sensors are

  • Eyes: 2 channels of ~430–790 THz
  • Ears: 2 channels of ~20 Hz – 20 kHz

The human body uses a single technology (the nerve fiber) to connect sensors to the brain. Transmission of information happens by means of trains of pulses at a rate of ~160 spikes/s (one every ~6 ms). Assuming that 16 spikes make up a bit, 1 nerve fiber carries ~10 bit/s and the bitrates to the brain are (the arithmetic is reproduced in the sketch after the list):

  • From 1 eye: 1.2 M fibers transmit ~12 Mbit/s
  • From 1 ear: 30 k fibers transmit ~0.3 Mbit/s
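The arithmetic behind these figures is simple enough to write down explicitly; the sketch below just reproduces the assumptions stated above (160 spikes/s, 16 spikes per bit, 1.2 million fibers per eye, 30 thousand per ear).

```python
# Reproducing the back-of-the-envelope figures in the text.
spikes_per_second = 160          # one spike roughly every 6 ms
spikes_per_bit = 16              # assumption stated in the text
bits_per_fiber = spikes_per_second / spikes_per_bit      # 10 bit/s per nerve fiber

eye_fibers, ear_fibers = 1.2e6, 30e3
print(f"1 eye: ~{eye_fibers * bits_per_fiber / 1e6:.0f} Mbit/s")   # ~12 Mbit/s
print(f"1 ear: ~{ear_fibers * bits_per_fiber / 1e6:.1f} Mbit/s")   # ~0.3 Mbit/s
```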

End of the MPEG ride?

In an endeavour like MPEG any time is a good time to ask the question: does MPEG still have a role to play? I can see three reasons why it may be a good idea to (plan to) end this over-a-quarter-of-a-century long ride on the media bits. The first is that, after a certain period of time, organisations tend to become sclerotic, bureaucratic, procedure-driven. In other words they tend to forget their raison d’être by confusing the means to reach the goal with the goal itself, they become incapable of reinventing themselves and do not look at the world with the same curiosity and drive they had when they were first established. The second could be that much has changed in the last 27 years and the recipes that were good 25, 15, 10 and maybe even 5 years ago do not necessarily keep their validity today. Lastly there may not be enough work left to produce standards to justify the existence of the group.

To address this question properly it is good to start from the very motivational roots of MPEG. MPEG’s mission is to produce global digital media standards with two main purposes:

  1. For end users to seamlessly exchange media information
  2. For industry to provide interoperable products, services and applications.

Maybe because of the social and industrial importance of the matters handled, MPEG is a “special” standards working group because:

  1. It has a running attendance of 300, sometimes up to 500, participants from some 25 countries
  2. Its experts represent all the digital media industries, including industries that are not typical IT industries
  3. It has liaisons with some 50 organisations
  4. It is structured to deal with “all” aspects of media
  5. It produces timely standards (hundreds of deliverables so far) in response to anticipated industry needs
  6. It has been running for 26 years, adapting its mindset as digital media has evolved from a fledgling entity to an industry with a turnover of hundreds of billions of USD.

People seeking a tranquil life do not consider “being special” as necessarily a recommendation. Indeed one decision preferred by some managers is to reshuffle an environment after some time by disbanding groups and creating new ones. However, I do not think that a new body with a mandate similar to MPEG’s would be an advantage. MPEG has a recognised brand, considerable experience and an established work method. Dissolving MPEG and creating a new group would mean that many industry users would lose a reference, while the new group would have to fight for recognition in a sea of competing groups, all marketing their results as the solution for the needs of the Digital Media industry at large, when they are in fact, in most cases, at best solutions targeted to a specific industry. Dissolving MPEG to create a host of groups is a simplistic idea that would beget dire consequences: MPEG is not an umbrella under which different groups operate, but a tightly knit network of subgroups producing standards that demonstrably fit with each other. At every MPEG meeting tens of “joint meetings” between different subgroups are held to achieve a common understanding of issues, iron out differences and specify interfaces.

I do not think MPEG is a sclerotic organisation. Let’s first consider the membership. At every new major MPEG project, waves of new participants have come one after another to replace portions of previous generations of participants. Today less than 5% of regular meeting participants have been in MPEG for more than 10 years, and the oldest participant is, ahem, the Convenor, followed by Ajay Luthra, whose first meeting was in Livingston, NJ, in February 1989. Maybe a quarter of current participants have been in MPEG for less than a year. Finally, all chairs have changed more than once.

Let’s now consider how the group managed to reinvent its role. In MPEG-1 and MPEG-2 (end of the ’80s and first half of the ’90s) MPEG adopted a traditional approach to solve the needs of rather traditional users, albeit using revolutionary – digital – technologies. Still it laid down some innovations that continue to this day, such as the involvement of all industries, overcoming past barriers between them, the collection of industry-wide requirements and, as a consequence, the profile approach designed to satisfy different needs with the same tool set. In the early MPEG-4 times (second half of the 1990s) MPEG was able, certainly not to “forecast” the future, but at least to create future-proof technologies that were the first to target the use of audio-visual content for the unstructured world of fixed and mobile Internet. With MPEG-7 MPEG has created a container of standard technologies for the next phase of media experience, and with MPEG-21 it has provided the technologies that can conjugate civil and economic rights. With MPEG-A MPEG has created “system standards” to enhance the usability of its technologies. With MPEG-B, MPEG-C and MPEG-D MPEG has created containers for standard Systems, Video and Audio technologies to be filled “on demand”. With MPEG-V MPEG has anticipated by many years the needs that are surfacing today for Internet of Things (IoT) standards. With DASH MPEG has asserted the preeminence of “content” requirements over internet delivery. With MPEG-H MPEG has recreated the Systems, Video and Audio triad, so successful with MPEG-1 and MPEG-2, in the age of hybrid – broadcast and IP – delivery. From the organisational viewpoint, the teaming up with ITU-T in the creation of single entities for video coding matters (the JVT and the JCTs) is another achievement. Finally, the identification of a space for “Type 1” standards (i.e. standards that are expected to be exercised without payment of royalties), and the production of standards that are expected to have such features, is another achievement. In conclusion, MPEG has been able to serve more and more new constituencies, without ceasing to cater for the needs of its older constituencies.

Sclerotic? I do not think MPEG is sclerotic.

Let’s now consider the worst that can happen to human organisations, i.e. to stop being driven by a sense of mission informed by a vision and to start being driven by procedures instead. This sentence should not be taken to mean disdain for the rules. A body like MPEG, producing standards with such an impact on manifold industries totaling a turnover of hundreds of billions of USD, and on end users, cannot afford to be loose in the way it performs its function. But it is one thing to concentrate on procedure and forget about the content of the work, and another to be driven by the work while making sure that procedures are upheld to high standards to the satisfaction of all parties involved. MPEG can claim it has achieved this difficult balance.

Hoping that the first test has been passed, let’s consider the second, i.e. whether MPEG is still a match for the current challenges. It is true that the last 27 years have brought an incredible amount of change, but MPEG has managed to adapt to each new context, as demonstrated by the changing nature of the standards produced. Audio and video used to be the realm of a restricted number of industries, while now almost any industry claims to have a stake in Digital Media. Standards used to be bound to rigid hardware implementations, while now standards have to deal with the fact that their implementations are by and large pieces of computer code running on programmable devices. Indeed, to get to the point where it is now, MPEG has had to change its skin several times. From MPEG-1 and MPEG-2, where software was a development tool, to MPEG-4, where software was elevated to the rank of tool to express the standard itself, to MPEG-7, which is definitely an IT standard although developed to serve the needs of the AV industry, to MPEG-21, which is more a framework than a traditional standard, to MPEG-V and MPEG-H – components for the converged industries – MPEG has been able to adapt itself to continuously changing environments. Side issue as it may be considered, MPEG changed its working methods well before any other group to augment its productivity; it was probably the first group of its size to make massive use of advanced ICT in standards development, and it has continued improving its productivity tools.

This should not be taken to mean that MPEG has already adapted itself so much that it can now take some rest. Probably the opposite: it must change even faster than it has so far, just because the environment keeps on changing faster. Here are some issues:

  1. Starting from “video coding” MPEG has evolved to address a wide range of technologies. Each of these requires a high degree of in-depth expertise, but the different technologies are not independent. How can MPEG retain its ability to develop standards that are individually of high quality and still fit perfectly into a complex picture of intertwined standards and users? The response to this question is still an intense network of joint meetings between different groups, to draw on the necessary expertise and to hear all the concerns of the groups involved.
  2. MPEG has traditionally produced standards rich in functionalities but with a rather high entry level. This was the right thing to do in times when the media market was “rich” (in money). Today, however, the Digital Media landscape does not often offer a lot of fat, and in many domains users tend to be content with quick and dirty solutions. How can MPEG recover these customers without abdicating its traditional role? One answer is the creation of entry-level configurations for each standard.
  3. The practical exploitation of MPEG standards is becoming difficult because of the conflict between the manifold ways users of the standards conceptualise their market exploitation models and the rather conservative approach that rights holders, some of them from “old” industries, take vis-à-vis the exploitation of their rights. As a matter of fact this is not something that MPEG is entitled to deal with, but it is clear that, unless some new ideas materialise, it is going to be more and more difficult for users to exploit MPEG standards (and open standards in general) and for rights holders to be remunerated for their IP.

The issue of how much need there is for a body like MPEG to cater to its digital media constituencies is my constant concern. If numbers are any guide, going from an average attendance of 300 participants to nearly 500, as MPEG has experienced specifically for the HEVC standard, is an indication that its role is still highly regarded. It also tends to justify my expectation that, if 100 years from now MPEG no longer exists, it will be because those running it at that time were not up to the task of exploiting the potential of its area of work.


The end of MPEG may be coming, soon?

In End of the MPEG ride?, written a couple of years ago, I tried to guess the future of MPEG and concluded that the MPEG formula was strong enough to guarantee a long future to the organisation. Now I am coming to the conclusion that my analysis was probably too reassuring. The reality is likely to turn out much different.

Let’s redo the analysis that led to my original relaxing conclusion.

When MPEG was established, close to 30 years ago, it was clear to me that there was no hope of developing decently performing audio and video compression standards while avoiding patented technology, after so many companies and universities had invested for decades in a field they – rightly – considered strategic for their future. So, instead of engaging in the common standards-committee exercise of dodging patents, MPEG developed its standards with the goal of producing the best performing standards, irrespective of the IPR involved. To be able to do so, MPEG offered the parties involved the following implicit deal (that I call by the grand name of social contract): by granting use of their patents (at onerous terms, but that was not for MPEG to handle) there would be

  1. a global market of digital media products, services and applications
  2. seamless end-to-end interoperability to billions of people
  3. hefty royalties to patent holders.

For decades all parties involved gained from the deal: companies had access to a global market; end users communicated with billions of other users and millions of services; and patent holders cashed royalties. MPEG, too, “gained”: it fulfilled its raison d’être, and the virtuous circle ensured that it continued to play a role because patent holders could reinvest in new technologies for possible use in future MPEG standards.

In Patents and Standards I already mentioned how Dick Green, then CableLabs president, acted as the white knight who contributed to solving the MPEG-2 patent pool conundrum in the early 1990s. The MPEG LA patent pool (no relation to MPEG) was instrumental to the success of MPEG because it provided, among others, licences for

  1. The MPEG-2 Systems and Video standards, well received because it fitted the business model of the digital television players of the time (which were by and large also those who had defined the licence).
  2. The MPEG-4 Visual standard, well received for the part that concerned video equipment (e.g. surveillance or mobile handsets) because it fitted the business model of those industries, but rejected for the part that regarded streaming of pay content (which had been by and large absent from the definition of the licence).
  3. The MPEG-4 Advanced Video Coding standard, sometimes grudgingly received, but overall successfully used for 15 years now.

In Option 1 Video Coding, I described my attempts at creating a “hierarchy of grades” in MPEG standards (I recall that ISO envisages 3 types of access to patents: Option 1, patents accessible at no cost; Option 2, patents accessible at a cost; and Option 3, patents not accessible):

  1. Top grade: the latest standard in a given area (typically Option 2)
  2. Medium grade: a standard (intended to be Option 1), less performing than the top grade but more performing than the low grade
  3. Low grade: the last but one standard in the area (typically Option 2).

The Internet Video Coding (IVC) standard, approved by MPEG in 2017, proved that it is possible to build such a hierarchy with an effective medium grade standard. Indeed, some 12 years after the approval of AVC, IVC showed that it could provide better performance than AVC. It is worth recalling that some IPR holders pointed out that IVC infringed on some of their patents, providing precise information about them. MPEG promptly responded by removing the infringing technologies.

Other patent holders, however, started making similar statements without indicating which of their technologies were infringed, exploiting the ISO rules that allow a patent holder to make a blanket patent declaration such as “the company may have patents that it is ready to license as Option 2”. In these conditions the only thing that MPEG can promise users of the standard is that, when it is given precise information, it will remove the infringing technologies from the standard. This is clearly not a very attractive business proposition for a user of the standard.

Is therefore no Option 1 standard practically possible in ISO? I would say that is the case, unless the directives are changed, something I think should be done without delay. Indeed, how can an organisation, admittedly a private one, call itself the International Organisation for Standardisation, whose standards can be referenced in legislation, if the standards it can practically produce can only be practiced at Fair, Reasonable and Non-Discriminatory (FRAND) terms? This is indeed odd if we look at the marketplace, where there are (and have been) several royalty-free standards with reasonable performance. Why should there be the possibility to produce FRAND standards blessed by ISO, and no freely usable standards with an ISO label?

I had anticipated the need for MPEG Option 1 standards in the mid 1990s and worked hard to achieve that goal. What I did not anticipate was that the “social contract” I have described above would be broken: 5 years after MPEG approved the MPEG-H HEVC standard (January 2013), offering a reduction of the bitrate of 60% with respect to AVC, there are 2 patent pools that have published their licences, one that has not published its licence, and a number of independent patent holders who have not joined any patent pool and have not declared their licensing schemes. In my blog I have asked the rhetorical question: “Whatever the rights granted to patent holders by the laws, isn’t depriving billions of people, thousands of companies and hundreds of patent holders of the benefits of a standard like HEVC and, presumably, other future MPEG standards, a crime against humankind?”.

This is well illustrated by the figure below developed by Jonathan Samuelsson of Divideon. A caveat is necessary: the picture may not be up to date even now because the situation changes so rapidly as new initiatives pop up to solve the problem (or to make it more difficult).

There should be no surprise if the HEVC void is being filled by an industry forum, targeting – guess what? – a royalty-free standard (implemented, I am told, as a cross-licensing agreement between forum member companies). The forum is called Alliance for Open Media (AOM) and the specification AOM Video Codec 1 (AV1). AOM has announced that AV1 will be published in spring 2018 with a performance better than HEVC’s, exactly the “medium grade” codec I had envisaged for MPEG.

At the 116th MPEG meeting in Chengdu in October 2016 MPEG agreed on the timing of the next video coding standard (Call for Proposals in July 2017 and FDIS in October 2020). That decision was long in the making, but now that the date was set it became urgent to avoid another HEVC fiasco. So I took an initiative similar to the one described in Patents and Standards for MPEG-2: I announced that I would hold a friends’ meeting at night (remember that all MPEG people are my friends).

The gathering of friends took place and discussed in good faith several possibilities, the most interesting of which to me were:

  1. Possibility 1
    1. Acknowledge that the new codec would have at least one Option 1 profile
    2. Develop the profile with the Option 1 process used for WebVC, IVC and VCB (i.e. companies submitting proposals declare which (if any) patents of theirs are allowed to be placed in an Option 1 profile)
  2. Possibility 2
    1. Acknowledge that the new codec would have at least one profile defined by a “licence WITHOUT NUMBERS (i.e. $)” called LWN developed outside MPEG
    2. Companies submitting proposals declare which (if any) patents of theirs are allowed to be placed in a LWN profile
    3. The final “licence WITH NUMBERS (i.e. $)” will be defined by patent pools (outside MPEG).

The gathering of friends recognised that there would be a need for advice from the ISO Central Secretariat (CS) and legal reviews before proceeding.

So the Sunday before the following 117th meeting (Geneva, January 2017) I met people of the ISO Central Secretariat and got confirmation that in principle there were no procedural obstacles to the results of the Chengdu discussions. So I convened another gathering of friends for Wednesday night (Geneva meetings have no social event), where I invited industry to take action, the first step being a legal review of the process (much more complicated than outlined above). A few declared they would but, unlike the similar gatherings 25 years before, eventually nothing happened.

In order to set any action in motion within ISO for the HEVC and IVC problems, however, it was necessary for MPEG’s parent body, SC 29, to make a communication to its own parent body, JTC 1. A subgroup of JTC 1, the Advisory Group (JAG), due to meet in Berlin in March, would accept input contributions until the next Monday. So I drafted an input document and asked Andy Tescher, the Convenor of SC 29’s Advisory Group on Management (AGM), to convene a meeting of that group. The final text of the document, discussed, edited and approved at the meeting, was titled “Concerns about the ISO/IEC process of standard development” and conveyed the following messages:

  1. Users are reluctant to adopt HEVC because of the uncertain licensing situation, and the next video coding standard, too, is likely to have a similar fate if the HEVC situation is not resolved
  2. A blanket Option 3 patent declaration (no granting of patents) against a standard intended to be Option 2, or a blanket Option 2 patent declaration (granting of patents at onerous terms) against a standard intended to be Option 1 (no cost for using a patent), prevents experts from taking corrective action.

I presented the AGM document to the JAG meeting requesting that it be forwarded to appropriate ISO entities. Out of 12 National Bodies who took position, 7 supported the document, 4 were against it and one objected on procedural grounds. The JAG Convenor decided to send the document back to SC 29.

The meeting made me realise that, to wage procedural battles spanning the entire ISO (and IEC, and even ITU) hierarchies with any hope of achieving results, MPEG needed to raise its status above that of a working group (even if MPEG is bigger than many Subcommittees and even Technical Committees). So I manifested my interest in being appointed Chairman of SC 29, a position due for election in July 2017.

It should be clear that my interest was not driven by an ill-placed desire for “promotion”. Chairing MPEG is a tough but technically rewarding job; chairing SC 29 is institutionalised boredom.

The Japanese National Body, which is responsible for the SC 29 Secretariat and has the right to nominate SC 29 Chairs, disregarded my expression of interest and went on with its own candidate, whom SC 29 duly elected.

It is interesting to see how the SC 29 meeting that appointed the new Chair reacted to the document produced by the AGM in January. It asked JTC 1 to draw ISO’s attention to the problem of Option 3 patent declarations made against standards intended to be Option 2. Something good finally being done?

Not really: in MPEG there are no cases of Option 3 patent declarations against standards intended to be Option 2. Why? Because if you want to stop a standard you need not play the bad guy with an Option 3 declaration. You just let the standard be published; there will be plenty of ways to stop it later (see HEVC). On the other hand, there are plenty of Option 2 patent declarations made against standards intended to be Option 1, but JTC 1’s attention was not drawn to that problem.

More remarkable has been the reaction of the world outside MPEG to the HEVC problem. In October 2017 the Academy of Television Arts and Sciences decided to award an Emmy to HEVC. Now, I cannot agree with (but I understand why) a Nobel Peace Prize being awarded to a newly elected USA President, but assigning an Emmy to a standard that has been sitting idle for 5 years is not just ridiculous, it is a slap in the face.

So, when the same Academy on the same occasion decided to assign the Charles F. Jenkins Lifetime Achievement Award to me, I made a post on my blog that ended with “I am happy to receive this Charles F. Jenkins Lifetime Achievement Award – for what it means for the past – but with a sour taste for the future, the only thing that matters”.

Yes, it is the future that matters. I doubt HEVC will ever see any major deployment unless a licence at extremely attractive terms is made available now. More and more people will adopt the “free” AV1. AOM will continue improving its codec, so that when, in October 2020, MPEG approves a certainly excellent new video codec standard, there will be few, if any, willing to go through the pains of multiple incompatible licences, from different patent pools and a host of other patent holders just staying on the sidelines, for the sake of using a slightly better codec.

Therefore my answer today to the question “Will the next MPEG video codec have a future?” is: no, not in the current conditions, which I do not expect to change.

Video, clearly the sexiest thing that MPEG provides, is not the only area under attack. Five years after MPEG approved the DASH standard, MPEG LA published its DASH licence, which many consider unrealistic. Because of this, there is very little news of DASH deployments. So the MPEG systems layer standards, too, are under attack.

The next question – will MPEG have a future? – has a more articulated answer. For sure MPEG will continue catering to the maintenance and evolution of its impressive portfolio of about 180 widely used standards. It may also continue developing some new standards in areas where the conflicting policies of patent holders can be harmonised. In these circumstances, however, it is unlikely that its sexiest (and so far most successful) video coding standards will find users. Little by little, companies will stop sending their experts to MPEG to develop those standards. An avalanche that can only grow with time.

Will the Earth stop turning on its axis because of this? Certainly not. As head of the Neanderthal tribe I have tried to give the tribe new tools for the new challenges, but some evil people have undermined my efforts. So the Homo Sapiens tribe is taking over the land because they have the tools that the Neanderthal tribe is missing.

Can the Neanderthals do something to recover the land? Everything is possible if there is a will. The head of the tribe would have that will and could deliver.