MPEG-1 Development – Audio – Riding the Media Bits

Work in the Audio group was also progressing. Many participants were interested in audio-only applications, and some of them were working in the Eureka 147 DAB project. For the majority of them it was important to develop a standard that would provide compressed audio with CD quality at a bitrate of 256 kbit/s, because that bitrate was suitable for digital audio broadcasting. This target also affected the video work: video simulation results had to be shown at 1.15 Mbit/s, because this was the bitrate remaining from the CD's total payload of about 1.4 Mbit/s.
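The bitrate split mentioned above can be checked with a little arithmetic. The sketch below assumes the usual CD audio parameters (44.1 kHz sampling, 16 bits per sample, two channels), which are not stated in the text, and it ignores any multiplexing overhead; the function names are made up for illustration.

```python
# Assumed CD audio parameters (standard values, not given in the text above).
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

# Audio target quoted in the text.
AUDIO_TARGET_BPS = 256_000


def cd_payload_bps() -> int:
    """Raw CD audio payload in bit/s under the assumed parameters."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS


def video_budget_bps(audio_bps: int = AUDIO_TARGET_BPS) -> int:
    """Bitrate left for video once the audio target is subtracted."""
    return cd_payload_bps() - audio_bps


if __name__ == "__main__":
    print(f"CD payload:   {cd_payload_bps() / 1e6:.4f} Mbit/s")
    print(f"Video budget: {video_budget_bps() / 1e6:.4f} Mbit/s")
```

Under these assumptions the payload comes to 1.4112 Mbit/s and the remainder to roughly 1.155 Mbit/s, consistent with the rounded figures of about 1.4 Mbit/s and 1.15 Mbit/s.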

The approach of the Audio group to the development of the standard was somewhat different from the one followed by the Video group. Instead of producing a general CfP, the Audio group first worked to cluster the proposals that the different companies were considering. Of course this did not mean that the CfP was not to be open to anybody else.

These were the four clusters:

Transform Coding with overlapping blocks
Transform Coding with non-overlapping blocks
Subband Coding with 8 or fewer subbands
Subband Coding with more than 8 subbands

The clusters were encouraged to provide a single proposal and this indeed happened. Swedish Radio (SR) was kind enough to perform the subjective test of the four clustered proposals using “golden ears”, i.e. specialists capable of detecting the slightest imperfection in a sound. The results of the subjective tests were shown in Stockholm in June 1990 (this was formally a part of the Porto meeting in July, where the rest of MPEG was meeting). The reason for having this meeting in Stockholm was to be able to listen to the submissions in the same setup the golden ears had used for the tests.

The first clustered proposal performed best in terms of subjective quality. However, its implementation complexity was, not unexpectedly, higher than that of the fourth clustered proposal, which scored lower but was simpler to implement. This was an undoubted challenge, which the audio chairman resolved with time and patience. It was also the last achievement of Hans Musmann, who left MPEG at the Paris meeting in May 1991. His place was taken by Prof. Peter Noll of the Technical University of Berlin.

The result of the work was an audio coding standard that, unlike the corresponding video standard, was not monolithic, because it came in three different “flavours”. The first, called Layer I, was based on subband coding and had low complexity but the lowest performance. The second, called Layer II, was again based on subband coding, with average complexity and good performance. The third, called Layer III, was based on transform coding and provided the best performance, but at a considerably higher implementation cost, so much so that, at that time, many considered Layer III impractical for a mass-market product. Therefore there could be three different conforming implementations of the MPEG-1 Audio standard, one for each layer. The condition was imposed, however, that a standard MPEG-1 Audio decoder of a “higher” layer had to be able to decode all the “lower” layers.
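The decoding condition can be expressed as a simple ordering on the layers. The following is only an illustrative sketch: the `Layer` enum and the `can_decode` function are hypothetical names, not part of the standard or of any real decoder API.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The three MPEG-1 Audio layers, ordered from 'lowest' to 'highest'."""
    I = 1
    II = 2
    III = 3


def can_decode(decoder_layer: Layer, stream_layer: Layer) -> bool:
    """A conforming decoder of a given layer must handle that layer and all lower ones."""
    return stream_layer <= decoder_layer


if __name__ == "__main__":
    for dec in Layer:
        supported = [s.name for s in Layer if can_decode(dec, s)]
        print(f"Layer {dec.name} decoder handles: {', '.join(supported)}")
```

In this model a Layer III decoder accepts streams of all three layers, while a Layer I decoder accepts only Layer I streams.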

The verification tests carried out before releasing the MPEG-1 Audio standard showed that subjective transparency, defined as a rating of the encoded stereo signal greater than 4.6 on the 5-point CCIR quality scale as assessed by “golden ears”, was achieved at 384 kbit/s for Layer I, 256 kbit/s for Layer II and 192 kbit/s for Layer III. The promise to achieve “CD quality” at 256 kbit/s with compressed audio had been met and surpassed. Today, with continuous improvements in encoding (which is not part of the standard), even better results can be achieved.
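To put the transparency bitrates in perspective, they can be compared with the uncompressed CD stereo bitrate. The sketch below assumes a CD rate of 1411.2 kbit/s (44.1 kHz, 16 bits, two channels), a figure not given in the text; the dictionary and function names are illustrative only.

```python
# Assumed uncompressed CD stereo bitrate in kbit/s (44.1 kHz x 16 bit x 2 channels).
CD_STEREO_KBPS = 1411.2

# Transparency bitrates reported by the verification tests, in kbit/s.
TRANSPARENT_KBPS = {"Layer I": 384, "Layer II": 256, "Layer III": 192}


def compression_ratio(layer: str) -> float:
    """Ratio of the assumed CD bitrate to the transparent bitrate of a layer."""
    return CD_STEREO_KBPS / TRANSPARENT_KBPS[layer]


if __name__ == "__main__":
    for layer in TRANSPARENT_KBPS:
        print(f"{layer}: about {compression_ratio(layer):.1f}:1")
```

Under this assumption the ratios come out at roughly 3.7:1, 5.5:1 and 7.4:1 for Layers I, II and III respectively.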