The target of the first MPEG work item was of interest to many: the CE industry, because it could create a new product riding the success of CD Audio and extending it to video; the IT industry, because interactivity with local pictures, enabled by the growing computing power of PCs, was a great addition to its ever-broadening application scope; and the telco industry, because of the possibility to promote the development of the much-needed integrated circuits for H.261 real-time audio-visual communication that they, as explained before, were unable to develop by themselves.
This is possibly too sweetened a representation of industry feelings at that time, because each industry had radically different ways of operation. In the Consumer Electronics world, when a new product was devised, each company, possibly in combination with some trusted ally, developed the necessary technology (and filed the enabling patents) and put its own – incompatible – version of the product on the market. As other competitors put their versions of the product on the market at about the same time, the different versions competed until the market crowned one as the winner. At that time the company or the consortium with the winning product would register some key enabling technology of the product with a standards body and would start licensing the technology to all companies, competitors included. This had happened for the Compact Cassette (CC) when the winner was Philips against Bosch; for the CD, when the winners were Philips and Sony against RCA; and for the VHS Video Cassette Recorder (VCR) when the winner was JVC against Sony.
The project proposed by MPEG implied a way of operation that was clearly going to upset the established modus operandi of the CE world. Participants knew that, by accepting the rules of international standardisation, they would be deprived of the rightful time-honoured “war booty”, i.e. the exclusive control of the patents needed to build the product that also largely controlled its evolution, in case they were the winners. The advantage for them was that the costly format wars could be avoided.
Another industry had mixed feelings: broadcasting. Even though digital television was a strategically important goal for them, in the second half of the 1980s the bitrate of 1.5 Mbit/s was considered far too low to provide pictures that broadcasters would even remotely consider acceptable. On the other hand, they clearly understood that the technology used by MPEG could be used for entry-level network-based services and could later be extended to higher bitrates, which were expected to provide a quality of interest to them. A glimpse of their attitude can be seen in the letter that Mr. Richard Kirby, the Director of CCIR at that time, sent to the relevant CCIR SG Chairmen upon receiving news of the establishment of MPEG. The letter requested the Chairmen to study the impact that this unknown group could have on future CCIR activities in the area.
At my instigation, between January 1988 and the first MPEG meeting in May, a group of European companies had gathered with the intention of proposing a project to the ESPRIT program. A consortium was eventually established and a proposal put together. Called COding of Moving Images for Storage (COMIS), it had dual purposes: to contribute to the successful development of the new standard by pooling and coordinating European partners’ resources, and to give European industry a time lead in exploiting that standard. At the instigation of Hiroshi Yasuda, a project with similar goals was being built in Japan with the name of Digital Audio and Picture Architecture (DAPA). Some time later, a European project funded by the newly established Eurescom Institute (an organisation established by European telcos) and called Interactive Multimedia Services at 1 Mbit/s (IMS-1) was also launched.
Therefore, by the time the first meeting of the MPEG group took place in Ottawa, ON in May 1988, the momentum was already building and indeed 29 experts attended that meeting, although some of them were just curious visitors from the JPEG meeting next door. In Ottawa the mandate of the group was established. Drafting this was an exercise in diplomacy. There were already other groups dealing with video coding in ITU-T, ITU-R and CMTT, so the mandate was explicitly confined to Storage and Retrieval on Digital Storage Media (DSM). With this came the definition of the 3 initial planned phases of work:
Phase 1 | Coding of moving pictures for DSM’s having a throughput of 1-1.5 Mbit/s |
Phase 2 | Coding of moving pictures for DSM’s having a throughput of 1.5-10 Mbit/s |
Phase 3 | Coding of moving pictures for DSM’s having a throughput of 10-60 Mbit/s (to be defined) |
People in the business had no doubt about our plans. We intended to start working on low-definition pictures, for which technology was ready to implement and a market was expected to exist because a great carrier – the CD – existed and because of the plans of the CE industry and, partly, the telco industry. The next step would then be to move to standard-definition pictures, for which a market did exist because industry was ready to accept digital television, as plans for it had been ongoing for years. Eventually we would move to HDTV. These plans were in sharp contrast with those prevailing, especially in European broadcasting circles, where the idea was to start from HDTV and define a top-down hierarchy of compatible coding schemes – technically a good plan, but one that would take years to implement, if ever.
One meeting in Turin and one in London in September followed the Ottawa meeting. So, with the video coding work in MPEG on good foundations, I could pursue another favourite theme of mine. A body dealing with moving pictures with a wide participation of industry was good, but fell short of achieving what I considered a goal of practical value, because audio-only applications are plentiful, but appealing mass-market video-only applications are harder to find. The importance of this theme was magnified by my experience of the ISDN videophone project of the ITU. In spite of this project being for an AV application par excellence, the video coding standard (H.261), an outstanding piece of work, and the multiplexing standard (H.221), a technically less than excellent piece of work – but never mind – had been developed, but the audio coding part had been left unsettled. This happened because CCITT SG XV had tasked the Video Coding Experts to develop the videophone project, but the Audio Coding experts operated in SG XVIII, and the videotelephone team did not dare to make any decision in a field over which they had no authority.
This organisational structure of the ITU-T, and a similar one in ITU-R and IEC, was a reflection of the organisation of the R&D establishments, and hence of the business, of that time: research groups in audio and video were located in different places of the organisation because of their different background, target products and funding channels. This was also a reflection of the services that had started with audio and then with audio-video, but where video had the lion’s share. My personal experience of television – but I may be biased in my judgment – is that the video signal is always there, but the audio signal is there only if everything goes smoothly. This is not because the audio experts have done a lousy job but because the integration of audio and video has never been given the right priority – in research, standardisation, product development and operation.
For a manufacturer of videophone equipment, the easiest thing to do was to use one B channel for compressed video and one B channel for PCM audio, never mind the not-so-subtle irony that one channel carried a bitstream that was the result of a compression of more than 3 orders of magnitude – from 216 Mbit/s down to 64 kbit/s – while the other carried a bitstream in the form prescribed by a 30-year-old technology without any compression at all!
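As a quick check of the figures quoted above, the ratio can be worked out in a few lines of Python. This is only an illustrative sketch: the variable names are made up for the occasion, and the two bitrates are simply those given in the text.

```python
import math

# Figures quoted in the text (variable names are illustrative only).
SOURCE_BITRATE = 216_000_000  # bit/s, uncompressed digital video
CHANNEL_BITRATE = 64_000      # bit/s, one ISDN B channel

# Compression ratio needed to fit the video into one B channel.
ratio = SOURCE_BITRATE / CHANNEL_BITRATE

# Number of orders of magnitude (powers of ten) that ratio represents.
orders = math.log10(ratio)

print(f"compression ratio: {ratio:.0f}:1")
print(f"orders of magnitude: {orders:.2f}")
```

The ratio comes out at 3375:1, i.e. about three and a half orders of magnitude, whereas the PCM audio on the other B channel is carried with a ratio of 1:1.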
So, besides video, the audio component was also needed and an action was required lest MPEG end up like videoconferencing, with an excellent video compression standard but no audio (music), or with an audio quality not comparable with the video quality, or with unjustifiably different compression rates for the two. The other concern was that integrating the audio component in a system that had not been designed for that could lead to some technical oversights that could only be solved later with some abominable hacks. Hence the idea of a “Systems” activity, conceptually similar to the function executed by H.221 for ISDN videophone, but with a better performance because it was more technically forward looking. The goal of the “Systems” activity was to develop the specification of the complete infrastructure, including multiplexing and synchronisation of audio and video, so that building the complete AV solution became possible.
After the promotional efforts made in the first months of 1988 to make the industry aware of the video coding work, I undertook a similar effort to inform the industries that MPEG was going to provide a complete audio-visual solution. In this effort I contacted Prof. Hans-Georg Mussmann, director of the Information Processing Institute at the Technical University of Hannover. Hans was well known to me because he had been part of the Steering Committee of the “Workshop on 64 kbit/s coding of moving video”, an initiative that I had started in 1988 to promote the progress of low bitrate video coding research, and he had actually hosted the first two workshops. Because of his institute’s and his personal standing, Hans was playing a major role in the Eureka project 147 Digital Audio Broadcasting (DAB).
The last meeting of 1988 was held at Hannover. The first two days (29 and 30 November) were dedicated to video matters and held at the old Telefunken labs, those that had developed the PAL system. Part of the meeting was devoted to viewing and selecting video test sequences to be used for simulation work and quality tests. The CCIR library of video sequences had been kindly made available through the good offices of Ken Davies, then with the Canadian Broadcasting Corporation (CBC), an acquaintance from the HDTV workshop. Two of the video sequences – “Table Tennis” and “Flower Garden” – selected on that occasion would be used and watched by thousands of people engaged in video coding research both inside and outside of MPEG. Another output of that meeting was the realisation that the MPEG standard, to be fully exploitable for interactive applications on CD-Read Only Memory (CD-ROM), should also be capable of integrating “multimedia” components. Therefore I undertook to see how this request could be fulfilled.
The last two days (1 and 2 December) saw the kickoff of the audio work with the participation of some 30 experts at Hans’s Institute. Gathering so many audio coding experts had been quite an achievement because, unlike video and speech coding, for which there were well established communities developing technologies with a long tradition in standardisation – myself being one element of it – audio coding was a field where the number of researchers was more limited and scattered in a reduced number of places like the research establishments of ATT, CCETT, IRT, Matsushita, Philips, Sony, Thomson and a few others. The Hannover meeting gave attending researchers the opportunity to listen, in some cases for the first time, to the audio coding results of their peers. So the first MPEG subgroup – Audio – was born and Prof. Mussmann was appointed as its chairman. The meeting also produced a document, intended for wide external distribution, which invited interested parties to pre-register their intention to submit proposals for video and audio coding algorithms once MPEG issued a Call for Proposals (CfP).
Bellcore, a research organisation spun off from the Bell Labs after the break-up of ATT, hosted the February 1989 meeting at their facilities in Livingston, NJ. The main task of the meeting was to develop the first version of the so-called Proposal Package Description (PPD), i.e. a document describing all the elements that proposers of algorithms had to submit in order to have their proposals considered. The document also contained the first ideas concerning the testing of proposals, both subjective and objective.
That meeting was also memorable for the attendance of Mr. Roland Zavada of Kodak. Rollie, the chairman of a high-level ISO group coordinating image-related matters, had come to inspect this unheard-of group of experts dealing with Moving Pictures – which he had clearly taken to mean Motion Pictures – with a membership growing at every meeting like mushrooms.
Livingston was followed by Rennes in May and Stockholm in July 1989. The latter meeting produced a new version of the PPD where the video part was final and incorporated in the CfP. This contained operational data for carrying out subjective tests but also data to assess VLSI implementability and to weigh the importance of different features. Similar data were also beginning to populate the part concerning the audio tests. For systems aspects the document was still at the rather preliminary level of requirements.
At the Stockholm meeting the second MPEG subgroup – Video – was established and Didier Le Gall, then with Bellcore, was appointed as its chairman. This particular subgroup was established as a formalisation of the most prominent of the ad hoc groups that had already been working, meeting and reporting in the areas of Video, Tests, Systems, VLSI implementation complexity and Digital Storage Media (DSM).