Multimedia Standards For An Evolving Market

Two years after the completion of the 1st edition of the AVC standard, MPEG organised a Workshop on Future Directions in Video Compression (Busan, April 2005). The purpose of the workshop was to inquire about the prospects of a new generation of video compression standards. As no definite answer could be obtained, another workshop was held 6 months later (Nice, October 2005) with similar results. Three years later, two workshops in a row on New Challenges in Video Coding Standardization (Hannover, July 2008 and Busan, October 2008) brought announcements of new, but still undocumented, algorithms providing more than 30% compression improvement. That was enough to convince MPEG that it was worth issuing a Call for Evidence for new video coding technology (Maui, HI, April 2009). Submissions were reviewed at the following meeting (London, July 2009) and promising results were detected. A draft CfP for High Efficiency Video Coding (HVC) was produced at the Xi’an meeting (October 2009).

The JVT was closed at the Kyoto meeting (January 2009) because AVC-related activities were in a downward spiral. Three weeks later I went to Geneva to meet the ITU-T Director to discuss the opening of a new collaborative team termed (rather redundantly) Joint Collaborative Team on Video Coding (JCT-VC). The object of the collaboration was a new video coding standard that was eventually called High Efficiency Video Coding (HEVC). Unlike the JVT, which could meet independently of MPEG and VCEG, the agreement included a clause that the JCT-VC should meet either as part of ITU-T SG 16 (on average every 9 months) or as part of MPEG (on average 2 out of 3 meetings). In the former case MPEG should meet at least in the same city (typically Geneva) in order not to disrupt the network of relationships with the other MPEG subgroups.

The HEVC Call for Proposals was developed while the discussions on Requirements were progressing. The rationale for a new video coding standard was set by the need to provide more “quality” – in terms of increased temporal and spatial resolution, color fidelity, and amplitude resolution – in an affordable fashion. The reference numbers are the so-called 4k, i.e. a spatial resolution of about 4000×2000 pixels sampled at 1024 quantisation levels (10 bits), progressive scan and a frame frequency of 100 Hz. Next to these “broadcast-oriented” applications, there were also requirements for high-quality video over LTE or 4G, because cell phone screens were increasing in size and resolution (even though the iPad, the first tablet, had yet to appear on CE shop shelves). The new video coding standard would be required to outperform AVC by at least 50%.

The requirements were reviewed jointly with VCEG and the two organisations eventually issued a Joint Call for Proposals on Video Compression Technology (Kyoto, January 2010). The call defined a set of test sequences progressively scanned at resolutions including 416×240 pixels, 1920×1080 pixels, and 4096×2048 pixels and a set of bitrates ranging between 256 kbit/s and 14 Mbit/s.

Definitely the Forces of Nature, which had already shown their concern with MPEG-1 Video in 1989 and 1990, had remained quiet for too long. The very day the JCT-VC held its first meeting session (Dresden, April 2010), the eruptions of the Eyjafjallajökull volcano in Iceland disrupted air travel across Western and Northern Europe for days. The travel plans of all those coming to Dresden for MPEG meetings other than the JCT-VC’s were therefore compromised: most JCT-VC experts were already in place, but many other MPEG experts were prevented from physically attending.

Twenty-seven proposals, all based on block-based hybrid coding with motion compensation, were evaluated by subjective tests. At least one codec provided a rate reduction of 50% compared to AVC High Profile for all test sequences. The technologies selected came roughly from the 5 best-performing proposals and were assessed in a “Test Model under Consideration” (TMuC) until October 2010, when the relevant technologies were consolidated into TM-H1, the common software used in the core experiments whose results were gradually included in the HEVC standard.

The development of the HEVC standard proceeded apace, reaching FDIS level (San José, CA, January 2013) after just 33 months of work and providing subjective improvements of over 60% over AVC in some cases, thus exceeding the 50% improvement target.

Toward the end of the first decade of the century MPEG began to realise that a new generation of Systems standards was needed, because the market was fast evolving toward personalised viewing of multimedia content – over broadcast channels, over the internet, and over both jointly. The two most important Systems standards developed by MPEG were the MPEG-2 Transport Stream (TS), supporting real-time streaming delivery, and the ISO Base Media File Format (BMFF), supporting file exchange and progressive download applications. Neither technology was suitable for, e.g., personalised advertising or selection of a preferred language: to do this with MPEG-2 TS, stream demultiplexing and remultiplexing was required; with ISO BMFF, interleaving metadata with the media data for synchronised playback supported progressive download of a file, but efficient access to a subset of the file was difficult to achieve.

Another important requirement was the ability to cope with the fact that streaming over the internet typically happens at non-guaranteed bitrates. The market had already developed independent ad hoc solutions, but these were not interoperable.

At the Dresden meeting the decision was made to go ahead with two parallel activities. The first, eventually called Dynamic Adaptive Streaming over HTTP (DASH), would take care of streaming over the internet, and a DASH CfP was issued in Dresden. Responses to the CfP were received at the following Geneva meeting (July 2010), providing a wealth of technologies that created a stream of activities leading to a DASH standard approved in December 2011. A very high level of activity continues today, extending the scope of the specification.
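The core idea behind DASH-style adaptation – a client switching between pre-encoded representations of the same content based on its measured throughput – can be sketched as follows. This is a minimal illustration only, not part of the standard; the bitrate ladder and the safety factor are hypothetical values chosen for the example.

```python
# Minimal sketch of throughput-driven representation selection, the
# mechanism by which adaptive streaming over HTTP copes with
# non-guaranteed bitrates. The bitrate ladder is illustrative only.

BITRATE_LADDER = [256_000, 750_000, 2_500_000, 8_000_000, 14_000_000]  # bit/s

def select_representation(measured_throughput_bps, safety_factor=0.8):
    """Pick the highest-bitrate representation that fits within a
    fraction of the measured throughput; fall back to the lowest."""
    budget = measured_throughput_bps * safety_factor
    candidates = [b for b in BITRATE_LADDER if b <= budget]
    return max(candidates) if candidates else BITRATE_LADDER[0]

# A client would re-run this for every media segment as its
# throughput estimate changes.
print(select_representation(4_000_000))   # → 2500000 (budget 3.2 Mbit/s)
print(select_representation(100_000))     # → 256000 (congested link)
```

Because each representation is segmented at aligned points, the client can switch bitrate at any segment boundary without interrupting playback – this is what the ad hoc pre-DASH solutions did in mutually incompatible ways.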

The second activity, called MPEG Media Transport (MMT), was designed to develop standard Systems technologies for an IP network where in-network intelligent caches close to the receiving entities are in charge of actively caching, and adaptively packetising and pushing content to those entities, in order to:

  1. Enable easy access to multimedia components within multimedia content
  2. Reduce the inefficiency caused by different delivery and storage formats
  3. Combine various multimedia content components located in various caches and storage facilities.

The MMT CfP was issued at the Geneva meeting (July 2010).

Little by little the need for a traditional “triadic” MPEG standard emerged, where the Systems part would be covered by MMT and the Video part by HEVC. The new standard was called MPEG-H, with the rather long, but certainly expressive, title “High Efficiency Coding and Media Delivery in Heterogeneous Environments”, making coding efficiency and delivery in heterogeneous environments the focus.

The Audio part of MPEG-H was originally less clear, but everybody realised that, if HEVC could provide much higher resolution video, MPEG-H Audio could not be a simple revisitation of the AAC family. Eventually the Audio component of MPEG-H was defined as coding of multichannel audio where the number and position of microphones at the sending end and of loudspeakers at the receiving end are independent. Part 3 of MPEG-H was called 3D Audio, and the 3D Audio CfP was published at the Geneva meeting in January 2013, the very meeting that approved the HEVC standard in its original form.
