Already in 1991, when MPEG-1 was maturing and the definition of MPEG-2 was progressing rapidly, I had begun to wonder whether there was scope for work beyond what had been started in 1988, i.e. coding of audio and video for “high” bitrate applications, meaning above 1 Mbit/s. I triggered some discussions at the Paris MPEG meeting in May 1991, and the rather predictable conclusion was that the lower end of the bitrate spectrum was a likely candidate for such work.
That was far from a “new” area for audio and video coding. The ITU-T had been producing a number of speech coding standards aimed at reducing the canonical PCM rate of 64 kbit/s, obtained from 8 kHz sampling and 8 bits per sample. Other bodies, like ETSI with GSM, were defining new speech codecs for mobile applications, while work had also been done on a so-called “wideband speech codec”, i.e. a codec for speech sampled at 16 kHz with more than 8 bits per sample.
Low bitrate video coding had also attracted my attention before. In 1987, when I was not as sceptical as I am today of everything related to person-to-person visual communication, I felt the need to promote ISDN visual telephony because that field was moving at a snail’s pace. The Picture Coding Symposium (PCS), the recognised forum for video coding studies, had papers on the topic, but I thought that by starting the International Workshop on 64 kbit/s coding of moving video and promoting more focused R&D, I could accelerate the maturity and eventual deployment of visual telephony on ISDN. The H.261 project had originally targeted transmission rates of nx384 kbit/s (384 being the common denominator between the European and American rates of 2048 and 1536, chosen to accommodate the old transmission multiplexing split). When it became clear that 384 kbit/s was too high a transmission speed to be of practical telco interest, it was soon changed to a project for px64 kbit/s coding, where p was allowed to assume any value from 1 to 30, because ISDN lines with their 128 kbit/s were sitting idle waiting for applications. The ITU-T had even started a new project, called H.263, to develop a video codec that would improve on the performance of H.261 at the lower bitrates. That was partly because of the new results brought by the irrepressible activity of Gisle Bjøntegaard, then of Norwegian Telecom, and partly because of two announcements of consumer-grade videophones for analogue telephone lines based on proprietary solutions.
All these, however, were initiatives dealing specifically with real-time person-to-person telecommunication applications, the bread and butter of ITU-T. At that time MPEG had already fully embraced the “generic” approach to media coding standards, aiming to define the basic coding technology that application domains would then customise for their own specific needs. Those domains would certainly include person-to-person communication, but also new opportunities coming from the advancing digital networks, at that time more ATM than the internet, capable of offering on-demand entertainment services. The issue of terminals capable of receiving those bitstreams was of lesser concern: the involvement of foot-dragging telcos and CE manufacturers was no longer the stumbling block, since the PC was an ideal platform thanks to its openness, programmability and widespread deployment. Additionally, the advancing digitisation of mobile networks promised by the 3rd Generation (3G) mobile standards provided another opportunity to offer new applications and services.
For a few meetings MPEG kept on discussing the topic, and 18 months later it had gone a long way towards identifying what it could mean for MPEG to develop a standard in this area. At the SC 29 meeting in November 1992 in Ottawa I presented the proposal for a new project with the title “Very low bitrate audio-visual coding”, which was unanimously approved. It took longer than usual for JTC1 to approve the project, but at the July 1993 New York meeting news came that the project had finally been approved.