At the beginning of December 1970, I returned home after spending 30 months in Japan as a Ph.D. student at the University of Tokyo. That had been both a great and a hard period for me. Great, because during the day I was fully immersed in Japanese life as part of the Miyakawa Laboratory; hard, because I had to live on a scholarship of 33,000 JPY (scant for a foreigner), of which 2,700 went to the rent of my room at the Foreign Student House in Komaba, near Shibuya. So I had decided to suspend my Ph.D. for a while, get a job, and finish the Ph.D. once I was on firmer financial ground.
A prospective employer was CSELT, the research centre of SIP, as the national telephone company, now Telecom Italia, was called at that time. At the job interview with Prof. Bonavoglia, then the director of CSELT, I mentioned a short video coding experience I had had in Japan at the Taki Laboratory next door, and I was assigned to that field.
It was indeed a favourable moment for such an assignment. As I have already mentioned more than once, telcos had always dreamed of offering their subscribers richer forms of communication than the basic vocal one. Already in the late 1920s, Deutsche Post and AT&T had shown working prototypes of visual telephony systems. But Black Tuesday and, soon after, the Depression, World War II and Reconstruction set different priorities. Eventually, in the late 1960s, after decades of talk, AT&T made a serious attempt to offer video telephony to its subscribers. The service was called Picturephone: an analogue video signal with 267 lines and a bandwidth of 1 MHz (yet another video format!) was transmitted over the telephone subscriber line. Everybody expected that, with the commercial success of Picturephone, the next generation of videophone service would be digital, and some telcos were even harbouring the hope of leapfrogging AT&T by going straight to a digital service.
It was not to be so. AT&T devised all sorts of tricks to convince people to subscribe to the new service, like creating communities of happy picturephoners, but all was in vain, and in the mid-1970s the service was discontinued. Fortunately, research on video compression continued, because digital video had more applications, not to mention that digital video on an integrated digital network could be more attractive, at least from the operational, if not the technological, viewpoint. In Europe the COST 211 project, started in 1974 at about the time the Picturephone service was being discontinued, led four manufacturers to produce, in the early 1980s, four models of 2 Mbit/s videoconference terminals based on the four prototypes that had been independently developed by four telco research establishments. The four models offered guaranteed interoperability because they had been tested within COST 211, and a few dozen terminals were sold before production was stopped. In the USA a company called Compression Labs, Inc. (CLI) was established, and its videoconference codecs had moderate success. In Japan NTT tried to offer a videoconference service between Tokyo and Osaka, but with little success.
In the mid-1980s, ITU started the nx384 kbit/s videoconference project, but the scarce availability of 384 kbit/s accesses and a greater interest in an ISDN-based video service triggered the extension of the project to px64 kbit/s. Videophones and videoconference terminals based on H.261 and H.221 became commercially available in the early 1990s, but success was meagre. ITU-T started a successor project leading eventually to H.263 and H.223, but the new terminals, too, were far from a roaring success, so much so that CLI and PictureTel, the two major companies making videophone products, no longer exist: the former was closed down and the latter was acquired by Polycom.
Interest in video communication was also shown by new players. In the early 1990s Intel launched its Indeo communication system, which has left no trace of itself. Microsoft developed NetMeeting, a Windows application for audio and video communication based on ITU-T standards that was used only occasionally. AT&T and Marconi developed and marketed two "analogue" videophones, whose main feature was that they could be plugged into an analogue phone socket, yet employed sophisticated video compression and had a built-in modem. After the usual hype and a fair amount of panic among telco executives, the two devices fell into oblivion.
Toward the mid-1990s the mobile telecommunication industry started developing specifications for 3G networks. The main feature driving this development was a bitrate higher than that of 2G networks, e.g. the 9.6 kbit/s of GSM. What for? Obviously, videotelephony on the move. This service was considered so important that Wideband-CDMA (W-CDMA) was designed as a circuit-based communication system, like GSM, to support mobile videotelephony. In Italy a new 3G operator was established that advertised its ability to provide video telephony on the move as its main credential.
More recently Skype has developed a new business model for free internet-based telephony. Video was an early addition to the service, and some do indeed use it. Unfortunately, lack of bandwidth forces many to downgrade the audio-video call to voice only.
Cisco has invested a significant amount of money to develop a videoconference system that relies on very high bandwidth and provides very high picture quality at high resolution. There has been much hype, but it is hard to see how this experience for high-level corporate users can translate into a mass phenomenon.
Why am I saying all this? Do I want to disparage my first job assignment? Do I have an agenda? Do I want visual communication to be a failure? Am I in search of a self-fulfilling prophecy? Well, not really. It is true that one of the reasons why I started MPEG was that, after more than 15 years of effort on my part, I saw the inconclusiveness of the visual communication business I had been selected to be part of. On the other hand, I am certain that one day humans will be offered a form of communication that gives fuller satisfaction to their desire to interact with other humans who are physically separated from them. But I am also sure that what has been offered so far has little, if anything, to do with this fuller form of communication.
Still, the first questions telco managers used to ask when they were shown a videophone were: how good is the picture quality, and how does it compare with product xyz? Wrong questions! Not that picture quality is irrelevant, but it is the last question to ask and the last element to take into consideration when designing this alternative form of communication. Better questions would be: how does the system adapt to lighting changes, how effective is echo suppression, how can parallax be compensated and, if it is a multipoint system, how is presence managed? But even these questions do not go to the core. The real questions are: what would motivate a consumer (business can be a different story) to want such a system, what classes of information elements do people want to convey, what does visual information bring that audio does not, how does this communication system fit into the spatial environment of a house, and so on.
This happens because telcos have changed their skin, but inside they are the same. Once I asked a telco executive: why are you investing in ADSL video telephony? What kind of market studies have you carried out? (Not that I believe too much in what is sometimes smuggled in as a "market study", but I had to start from something that could be considered "common ground".) I saw a moment of panic in his eyes, and then he said: because we must stop talking about broadband services and start doing something in earnest. Belatedly realising that the logic of his answer was far from compelling, he then added: because our competitor does it. Having laid these two elements of unassailable logic on the table, our conversation languished.
Telco people used to be guided (fortunately less so today) by the idea that the wires underground are what drives the business (the few who dared to object to this postulate were eliminated), and these are the people who plan new services. My take is that control of the wires (or of their Hertzian equivalent) is what enables the business, but the driver is elsewhere. The kind of fuller communication to which visual communication belongs is the remotest thing possible from telephone wires. It has more to do with the untold reasons why a man buys a certain tie and a woman a certain skirt than with wires.
Much is probably to be learned from Cisco's TelePresence service, which allows "everyone, everywhere to be 'present' to make better and faster decisions through one of the most natural and lifelike communications experiences available".