MPEG-2 Development – Audio

As with MPEG-1, the Audio work in MPEG-2 took a different turn from its original direction. MPEG-1 Audio already provided an excellent way to compress stereo audio, exactly what many broadcasters were thinking of providing as a first step in their soon-to-come digital services. But some expected that the future would lie with a further enhancement of the user experience, to be provided by multichannel audio services. For a Service Provider (SP) it made a lot of sense to start with MPEG-1 stereo sound and upgrade it later to a multichannel audio service that could still be received by the existing population of MPEG-1 Audio receivers, even though the latter would continue to get only stereophonic, not multichannel sound. This was the same argument that was made by people who wanted to have a scalable MPEG-2 Video. Why was the audio argument accepted and not the apparently similar video argument? 

The answer to this question has many facets. On the one hand, there was a matter of personalities involved in the discussions in the two groups. On the other, there was the obvious consideration that the video part of a program would require, in general, one order of magnitude more bits than the audio part and, therefore, a slight inefficiency in the use of the total program bitrate for the audio could be tolerated, while for video inefficiency would come at too high a price to pay. 

At the Haifa meeting the decision was made to adopt the requirement that MPEG-2 Audio be backward compatible with MPEG-1 Audio. This requirement seemed to restrict considerably the range of technologies that could be submitted in response to the MPEG-2 Audio CfP. Still, 10 submissions were received in response to the call.

After a while, the Audio group began to feel uneasy. Some felt that, by working exclusively on a backward-compatible solution (justified, as shown before, for digital television services), MPEG was excluding pure audio applications, where the excellence of the standard would be judged solely on the grounds of the highest audio quality at the lowest bitrate. This issue was raised by a US NB contribution submitted to the July 1993 meeting in New York, hosted by Columbia University. So the decision was made that, when carrying out the MPEG-2 Audio Verification Tests on the Backward Compatible (BC) solution, MPEG would also test yet-to-be-identified Non-Backward Compatible (NBC) codecs in order to assess the improved performance that could be obtained with an unconstrained algorithm. If the tests showed that the backward-compatibility constraint did introduce too heavy a compression penalty, MPEG would initiate the development of a new, NBC multichannel audio coding standard.

I personally liked the idea of creating an internal competition between what were bound to be two groups of people working on different technologies, because competition could only improve the performance of both the BC and NBC multichannel audio coding solutions.

At the same meeting I was involved in an unusual case. One evening, past midnight, I was working with a group of MPEG members in a room at Columbia University, which was hosting the MPEG meeting. Tristan Savatier, then with Thomson Consumer Electronics, Los Angeles and a very active member of the Video group, felt the need for a cup of coffee and went out to get one, but found the kitchen door locked. He worked on the lock, got into the kitchen and had his cup of coffee, but was caught red-handed by the night security. Of course his intentions became known to me only after the fact. I had to take responsibility for Tristan’s future actions (for that evening, I mean, not forever) or I would have lost his work that night.

Unexpectedly, at the Paris meeting in March 1994, the US NB requested that MPEG endorse a specific proprietary multichannel audio coding solution as one element of the MPEG-2 Audio standard family. My reaction at the mid-week plenary, that this was not in line with the MPEG policy that major standards be developed within the group, was greeted with whistles of disapproval from some members. The Friday plenary saw a rather lengthy monologue of mine, interrupted by a few exchanges of words with some MPEG members. This was a baptism of fire for Peter Schirling of IBM, who had just been appointed head of the US delegation: Cliff Reader had left that position one year before at the Sydney meeting and had been replaced by Greg Wallace, then with 3DO, who had in turn left it the meeting before. The meeting ended with a confirmation of the MPEG policy, which continues to this day.

One MPEG member recorded this monologue on an audio cassette (current ISO rules would not allow this). Subsequently, Tristan Savatier got a copy of the tape, converted it to MPEG-1 Audio Layer II and posted it on a web site. The posting was structured to look like a soloist performance, with titles created from the more interesting (for him) passages of my monologue, much as in an Italian opera. My reaction to this initiative was that, since I had not released the copyright of my “performance”, the posting was illegal and should be removed (call it my version of “cease and desist”). This request of mine, however, was met by a shrug of the shoulders (virtual, as this happened by email). Therefore I can probably claim to have been the target of the first example of an unauthorised posting of a “performance” (not musical, I agree, but a performance it still was) on the web. True, the coding technology used was still MPEG-1 Audio Layer II and not the eventually more famous Layer III, but that was just proof of how hard people were working to bring MPEG technologies to the masses.

The work on what would eventually be called Advanced Audio Coding (AAC) followed the usual steps of requirement definition, CfP and collaborative development. Marina Bosi, then with Dolby Labs, was appointed as its editor. While returning to the hotel one evening during the AES Convention in New York, she was badly hit by a taxi and had to undergo several surgeries before recovering. The group deeply appreciated the determination with which she carried out her duties in terrible personal circumstances that would have crushed the resistance of many. When Marina was reporting the completion of the AAC work at the Bristol meeting in April 1997, before the final approval, I asked her what she was still using her walking stick for (and she still badly needed it at that time). She then defiantly set it aside and, standing, completed her report.

The Verification Tests (VT) showed that subjective transparency was achieved at 128 kbit/s, a 50% gain over MPEG-1 Audio Layer II! As is the case with MP3, the best AAC encoders today can provide even better performance. The VT also confirmed that the original target of “indistinguishable” audio quality at 384 kbit/s for five full-bandwidth channels was achieved and exceeded: tests carried out by BBC and NHK showed that 320 kbit/s was sufficient to achieve the target.
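
For readers who want to check the arithmetic behind these figures, here is a minimal Python sketch. The bitrates are the ones quoted above; the function names are purely illustrative. It also assumes that the 128 kbit/s figure refers to a stereo pair and that the “50% gain” means a halving of the bitrate, neither of which is spelled out in the text.

```python
def per_channel_kbps(total_kbps: float, channels: int) -> float:
    """Average bitrate available to each full-bandwidth channel."""
    return total_kbps / channels


def saving_percent(reference_kbps: float, achieved_kbps: float) -> float:
    """Bitrate reduction of achieved_kbps relative to reference_kbps, in percent."""
    return 100.0 * (reference_kbps - achieved_kbps) / reference_kbps


# Figures quoted in the text (kbit/s).
AAC_TRANSPARENT_KBPS = 128   # assumed here to be for 2 channels (stereo)
FIVE_CH_TARGET_KBPS = 384    # original target for five channels
FIVE_CH_MEASURED_KBPS = 320  # rate found sufficient in the BBC and NHK tests

stereo_per_channel = per_channel_kbps(AAC_TRANSPARENT_KBPS, 2)
target_per_channel = per_channel_kbps(FIVE_CH_TARGET_KBPS, 5)
measured_per_channel = per_channel_kbps(FIVE_CH_MEASURED_KBPS, 5)
five_ch_saving = saving_percent(FIVE_CH_TARGET_KBPS, FIVE_CH_MEASURED_KBPS)

# Assumption: "a 50% gain" read as "half the bitrate", which would imply
# this reference rate for the Layer II comparison point.
implied_layer2_kbps = AAC_TRANSPARENT_KBPS / (1 - 0.50)

print(f"stereo: {stereo_per_channel:.1f} kbit/s per channel")
print(f"five-channel target: {target_per_channel:.1f} kbit/s per channel")
print(f"five-channel measured: {measured_per_channel:.1f} kbit/s per channel")
print(f"saving against the five-channel target: {five_ch_saving:.1f}%")
print(f"implied Layer II reference: {implied_layer2_kbps:.0f} kbit/s")
```

Under these assumptions the stereo and five-channel results both come out at 64 kbit/s per channel, and 320 kbit/s is roughly one sixth below the original 384 kbit/s target.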