Archives: 2015-August-20

Electronic Computers

Since olden times, humans have invented practical mnemonic rules to speed up calculations on numbers – and to ensure that the results are correct. These rules varied in complexity with the system used to represent numbers – very complex with Roman numerals and rather straightforward with Arabic numerals – but they all relied on the application of essentially blind rules. It was therefore only natural that thinkers of different times should entertain the idea of codifying those rules into the operation of a machine capable of performing calculations automatically.

In modern times some of those individuals were Napier (16th century), Pascal and Leibniz (17th century) and Babbage (19th century). Concrete and lasting developments, however, only happened in the 20th century, when several mechanical and electromechanical machines were built, among them the so-called Harvard Mark I. ENIAC, built entirely with vacuum tubes immediately after World War II, can probably be taken as marking the beginning of the age of electronic computers.

Computers were born as machines capable of processing binary data (though I used one that employed ternary data during an internship at KDD Labs in the mid 1960s) based on an internal processing architecture. The basic common elements of the architecture are the Random Access Memory (RAM) and the Central Processing Unit (CPU). The CPU has permanently wired instructions, the basic commands that the machine executes, such as comparing two memory locations and performing certain operations depending on the > or = or < outcome of the comparison, or calculating the square of a number and returning the result in a register. The programs, i.e. the sequences of instructions written by the programmer to achieve some predetermined goal, are stored in the RAM, and the program instructions themselves can be altered during the execution of the programs.

Beyond these common general architectural elements, usually referred to as “von Neumann architecture” from the name of the mathematician who first gave a systematic treatment of the topic, computer designers made their own decisions concerning the specific elements: the set of instructions, the organisation of bits in bytes and words, the number of bits in a byte, the number of bytes to be used in the representation of integers and real numbers, the number of bytes used to transfer data from one part of the computer to another, etc. 

Mass storage devices, such as disks and tapes – even paper tapes – were used as input/output (I/O) devices with the feature of having a large capacity and of retaining the data in a permanent way. Therefore they could also be used to “load” programs or portions of programs at “run time”. Computers also had a variety of other input devices, such as keyboards that operators could utilise to input numbers or characters, and output devices, such as printers, that could be used to communicate the results of the calculations. 

For the first 20 years electronic computers were very large machines, each costing millions of US dollars. They were housed in large air-conditioned rooms, usually equipped with a large number of peripherals and attended by a host of specialists. These “vestals” reborn for the computer age were the only ones allowed into the “penetralia” of the computer room, while computer users, looking more like day labourers in a queue than scientists, were forced to wait in line with their card decks and hand them over to the specialists who, in due course, would give back a printout with the result of the execution of the program, after “jobs” had been assigned at their own inscrutable discretion.

The evolution of the computer from “big iron”, when one needed a surface comparable to a warehouse to run the machine, to today’s almost infinite computer manifestations – large and small – has been truly exciting. The use of transistors first and integrated circuits later progressively reduced the size while increasing the number crunching and random-access storage capabilities. This success story can be explained by the fact that computers were the tool that enabled the solution to problems that would otherwise not have been solved, or solved at much higher cost, including scientific calculations, payrolls, client orders, and inventory management. 

As the breadth of new application domains became clear, one could see an ever-increasing number of types of computers made by an increasing number of companies that saw computing technology as a “we must be there” business opportunity. The large number of computer companies in the first 20 years of electronic computing and the very reduced number today would seem to support the idea that what was at work was an almost perfect implementation of the Darwinian process of selection of the fittest.

Every company that undertook to enter the electronic computer business started from scratch: developing its own implementation of a von Neumann machine, adding peripherals, writing system and application software, and selling computing machines with new features and models at an accelerated pace. Some of those machines are still remembered by those who used them and admired for the excellence of the technical solutions adopted in those early times.

But was this the triumph of a Darwinian process applied to competing companies? I dare say no. Towards the mid 1980s computer manufacturing saw IBM as the largest computer company, a position that it had easily created for itself since the early days of the electronic computer age. The next entry on the list was Digital Equipment Corp. (DEC), a company one order of magnitude smaller than IBM. The conclusion is easy to draw: these fascinating results were achieved not because there was competition, but because of the size of IBM and the dominant position in the market of “Tabulating Machines” that it already occupied before the advent of the electronic computer age.

This dominant position allowed IBM to mobilise its undoubtedly excellent research, design and manufacturing capabilities – an easier thing to do thanks to the rich revenues that the company enjoyed – to respond to the needs of its existing clientele and conquer an increasingly larger share of the nascent computer market. If IBM did not eventually become the only game in town it was because of the investigation of the USA Department of Justice – hardly a component of a Darwinian process – that paralysed the company for years and changed its attitude forever. The lack of competition caused by the dominant position of IBM forced those who could not compete with computers that were “better” for the existing categories of customers to make computers that were “different”, i.e. “smaller” (compared to the mainframes of that time), to serve new categories of customers.

In the late 1960s (and for quite some time afterwards) those buying computers were the big corporate data processing departments that derived considerable power from the fact that the data processing needs of any department had to be fulfilled by their very expensive corporate data processing machines. IBM and the other mainframe computer vendors provided computing solutions matching this organisational structure. 

The Digital Equipment and Data General start-ups attacked other, less structured and more individualistic entities such as university departments and research laboratories, and started installing small (by the standards of those times) computers – hence the name “minicomputer” – that did not even require air conditioning and cost a fraction of the price of a mainframe. IBM had its hands tied against this competition: had it engaged in providing competitive minicomputer solutions, it would have lost the support of its rich corporate clients and undermined the market for its own primary products.

So in the mid 1960s the “minicomputer” started making inroads into company departments and began a process that gradually deprived the managers of the big centralised computers of their power. “Now you can own the PDP-5 computer for what a core memory alone used to cost: $27,000”, ran one 1964 advertisement by Digital Equipment (later acquired by Compaq Computer, itself later acquired by Hewlett Packard, which will be shedding its PC business soon). The Programmed Data Processor (PDP) series of minicomputers (PDP-8 and PDP-11) was very successful and brought cheap computing within the reach of small research groups, including mine at CSELT.

In the early 1970s the time was ripe for yet another round of computer downsizing. The progress of circuit integration produced the microprocessor, a single silicon chip containing a complete CPU, around which a complete microcomputer could be built with RAM and some standard interfaces. The Altair computer by Micro Instrumentation and Telemetry Systems (MITS) was the first commercially successful PC. It used the Intel 8080 8-bit microprocessor and barely (with today’s eyes) 256 bytes of RAM. It was designed as a kit for hobbyists and professionals to build and use in their home or work environments.

This created the conditions for the mass introduction of the first PCs, home computers and game machines: Commodore, Atari, Apple and a host of other PC makers were established. Some of these shone for a while before falling into oblivion: Apple Computer’s Apple II and Commodore’s Amiga were very successful. The former was instrumental in creating the PC image that we know today, while the latter remains the emblem of the home computer, making some wonder what the world would be today if the Amiga, and not the PC, had taken over in the home.

IBM identified the PC as an ideal opportunity for a new business that would eventually leapfrog the minicomputer industry. IBM designed and developed its PC giving the group developing it the authority to use whatever components they thought suitable, if necessary defying the iron internal procurement rules of the company. So in 1981 IBM unveiled its PC, which used the then powerful Intel 8088 microprocessor (a 16-bit CPU with an 8-bit external bus), later followed by the other members of the extended family that became known as Intel x86. The other innovative idea that the team adopted was to use openly available components for the hardware design so that anybody could build PC “clones”.

In 1984 Apple released the Macintosh, which used the powerful Motorola 68000 16/32-bit microprocessor. So, throughout the late 1980s and the early 1990s, an increasing number of adepts got an IBM PC or a Macintosh and left the serfdom of terminals connected to the corporate or departmental mainframe. I was probably the first in my company to use an IBM PC XT in the office, but I used it in “dual mode” because it was also connected to the company mainframe in terminal mode. The side effect was that the PC started making inroads into the small office and even the home, as the Amiga had disappeared after Commodore went out of business.

Today the number of CPU types in sizeable use in the PC environment is rather small. Towering among them is the Intel CPU, of which several generations have already seen the light and which has displaced the PowerPC from the Mac. This is the time to check the validity of yet another sacred cow of technology, i.e. that competition makes technology progress. This is simply not true, not because competition is harmful, but because it is “expensive” and in most cases irrelevant to reaching the goal. The CPU in largest use today in the PC world – the one that has created the largest silicon company in the world – is the offspring of a CPU that was designed 35 years ago. Other, possibly “better” CPUs did appear and for some time seemed to have the upper hand, but none could resist the x86 CPU because of the size of the almost captive market for the IBM PC CPU. Now, with the move by Apple to adopt the Intel CPU for the Mac, the dominance in the PC space is almost complete.

A new story began to unfold a few years ago with the development of the mobile handset, portable player, multimedia display and set top box markets. Here ARM plays a dominant role similar to Intel’s: ARM does not manufacture CPU chips but licenses the design of its CPU core to manufacturers who offer different CPU chips all built around the same core. As in the mainframe vs. minicomputer case, evolution happens not because of competition but because new markets need new products.

Of course this page only tells one half of the story – the hardware story – because it does not mention the software component of the computer. For this we have to wait until some more actors come to the fore.


Carrying Bits

Because the computer industry was “born” digital, it was the first to be confronted with the problem of “mapping” digital data onto analogue carriers, i.e. storing bits on intrinsically analogue devices. One of the first solutions – storage of bits on paper tape – was limited to small quantities of data, typically programs, but magnetic technologies were more promising because one could use tapes, drums or disks to store data with vastly improved capacity. 

Magnetic tapes for sound recording had already achieved a considerable degree of maturity, as they had been in existence for some time. The difference was that sound was analogue and already band-limited, so that a suitable transducer could convert a current or a voltage directly into a magnetic field and vice versa, while binary data from computers have a theoretically infinite bandwidth. The obvious solution was to “modulate”, i.e. pre-filter, the binary data entering the storage device so as to minimise the interference between successive bits caused by their “infinite” bandwidth. Further, to enable the identification and correct writing/reading of the data, the information had to be “formatted”, i.e. given a precise structure, in very much the same way as a text is formatted in lines, paragraphs and pages.

Not long after the computer industry had been confronted with the problem of storing digital data on analogue media, the telecommunication industry was confronted with the similar need of “sending” digital data through the analogue transmission medium called telephone cable. One example is provided by the elements of the digital transmission hierarchy where the equivalent of the magnetic disk or tape formatting is the “frame”. The primary A-law based multiplexer has a frame of 256 bits (32 time slots each of 8 bits), where TS 0 has a fixed pattern so that it can act as a “start code” for a receiving device to know where to look in order to get the start of the frame (of course there is no guarantee that this code word cannot be emulated by unconstrained telephone data because it is possible that in a frame one speech sample has exactly that value). Higher-order multiplexers are organised in a similar manner, in the case of the European hierarchy, by multiplexing 4 lower-order streams. 
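
To make the idea concrete, here is a minimal sketch in Python – not the actual ITU-T G.704 procedure, and with an illustrative alignment value rather than the real one – of how a receiver could hunt for the fixed TS 0 pattern and confirm it over several consecutive frames, precisely because a single speech sample may emulate it by chance:

    FRAME_LEN = 32          # 32 time slots of 8 bits = 256 bits per frame
    FAS = 0b00011011        # illustrative frame alignment pattern, not the exact G.704 value

    def find_frame_start(stream, confirmations=3):
        """Return the offset of TS 0, or None if no alignment is found."""
        for offset in range(FRAME_LEN):
            candidates = stream[offset::FRAME_LEN][:confirmations]
            if len(candidates) == confirmations and all(b == FAS for b in candidates):
                return offset
        return None

    # Example: a stream where the pattern also appears once before the true alignment.
    frames = bytes([FAS] + [0x5A] * 31) * 4
    noisy = bytes([0x10, FAS, 0x22]) + frames
    print(find_frame_start(noisy))              # -> 3, the true position of TS 0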

The COST 211 project mentioned above did not just develop the video coding part but provided the specification of the complete transmission system for videoconference applications. In the COST 211 system TS 1 carries speech, TS 17 optionally carries a digital facsimile signal and TS 18 optionally carries other data. Several additional types of information have to be conveyed from one terminal to the other, such as information on whether TS 17 and TS 18 contain video data or facsimile and other data. 

Transmitting digital data over telephone lines is conceptually the same as storing data on a local storage device, but in general the “modulation” schemes have to be much more sophisticated. The telephone line is a bandwidth-limited transmission system with a nominal bandwidth of 4 kHz (actually the transmitted speech signal has significant energy only in the 300 to 3,400 Hz band) and unpredictable characteristics, caused by extremely variable operating conditions and by the long span of time telephone cables have been deployed, while magnetic tapes and disks have better-defined characteristics thanks to well-monitored manufacturing processes and more predictable operating conditions.

The initial modulation schemes supported a low transmission rate of 300 baud (the unit is named after Émile Baudot, the inventor of a telegraph code; a baud is one symbol per second, which at one bit per symbol equals one bit/s). Later, “adaptive” schemes were developed that automatically adapted their performance to the characteristics of the line, and higher bitrates became progressively possible. More and more sophisticated schemes were developed and the bitrate climbed to several kbit/s, always as a multiple of 300. A set of widely used ITU-T recommendations made it possible for a new generation of nomadic users to connect from anywhere to anywhere else in the world, over distances of possibly thousands of kilometres, at rates as high as 56 kbit/s depending on the end-to-end link “quality”.

An ambitious goal that the telco industry set itself in the late 1960s was the development of the Integrated Services Digital Network (ISDN). The plan was to provide telephone subscribers with two 64 kbit/s channels (so-called B-channels) and one 16 kbit/s signalling channel (the so-called D-channel) for a total of 144 kbit/s (so-called 2B+D). With the usual schizophrenia of the telco business, ISDN was not fully defined in all its parts: in particular, the modulation scheme to be used in the local access was left to each telco. Assuming that users are static (a reasonable assumption at that time) this was not an unreasonable choice, but it is one that prevented the later availability of ISDN connectors in laptops, the main reason why ISDN eventually did not fly, not even in countries where significant levels of deployment had been achieved.

At the end of the 1980s, while the ISDN standardisation project was drawing to a close, some telco R&D laboratories showed the first results of what should have been great news for companies whose assets were buried underground in the form of millions of kilometres of telephone cable. The technology was called Asymmetric Digital Subscriber Line (ADSL), which would allow downstream (central office to subscriber) transmission of “high” bitrate data, e.g. 1.5 or 2 Mbit/s, with a lower-rate upstream transmission, e.g. 64 or 128 kbit/s from the subscriber terminal to the central office. 

One instance of this technique uses a large number of carriers that are placed in appropriate parts of the spectrum after an initial phase in which the transmitter checks the state of the line by interacting with the receiver. This type of research work was generally ostracised within the telcos because it provided an alternative and competing solution – in terms of cost, certainly not of performance – to the old telco dream of rewiring all their subscribers with optical fibres for some yet-unknown-but-soon-to-come pervasive “broadband” applications. Today ADSL provides “asymmetric” access (typically 5-10 times more bit/s downstream than upstream) to hundreds of millions of subscribers around the world at increasing bitrates and is playing a major role in allowing fixed telephony service providers to survive.

If one compares the constant progress that magnetic disks make in storage capacity year after year with the snail-like progress of ADSL over the last 20 years, one could be led to think that the telco industry is simply not trying hard enough. While there is some truth in this statement ;-), one should not forget that while the manufacturing of hard disks happens in clean rooms, the local telephone access has to deal with a decades-old infrastructure deployed with wires of varying quality, different installation skills and unpredictable operating conditions. It is definitely not a fair comparison.

If a comparison has to be made, it is with the optical fibres used for long-distance transmission. In this case it is easy to see how the rate of increase in bitrates is even higher than the rate of increase in hard disk capacity. Again this is possible because optical fibre is a new technology and fibres are probably manufactured with as much care, and with equipment as sophisticated and expensive, as those used in high-capacity magnetic disk manufacturing. The problem is that, of the long-distance fibres that were deployed in the years of collective madness at the end of the 1990s, only a few percent are actually lit, a fact that is clearly not disconnected from the slow introduction of broadband in the local loop. This underutilisation also shows the difference between the concrete advantage felt by a consumer buying a hard disk today with twice the capacity at the same or lower price compared to last year, and the financial decisions made by a telco executive based on expectations of long-distance traffic that depend on the coming of some gainful “broadband application” Messiah.

The last physical delivery system considered in this list is the coaxial cable used for Community Antenna Television (CATV), a delivery infrastructure originally intended for the distribution of analogue television signals. For this the widely chosen modulation system is Quadrature Amplitude Modulation (QAM). The CATV industry has made great efforts to “digitise” the cable in order to be able to provide digital interactive services. The Data Over Cable Service Interface Specification (DOCSIS), now an ITU-T standard, provides high bitrates to CATV subscribers.

Another transmission medium rivalling the complexity of the local access telephone line is the VHF/UHF band used for television broadcasting on the terrestrial network. Already in the 1980s several laboratories, especially in Europe, were carrying out studies to develop modulation methods that would enable transmission of digitised audio and television signals in the VHF/UHF bands. They came to the conclusion that, in typical conditions, such frequencies could carry between 1 and 4 bit/s per Hz depending on operating conditions: the lower the bitrate, the higher the terminal mobility that could be supported, with the highest bitrates reserved for fixed terminals.

The modulation scheme selected in Europe and other parts of the world to digitise the VHF/UHF frequency bands is called Coded Orthogonal Frequency Division Multiplexing (COFDM). This uses a large number of carriers – up to several thousand – each carrying a small bitrate. It is a technology similar to ADSL, with the difference that in broadcasting no return channel is available to adapt the modulation scheme to the channel. In the USA a different system called 8 Vestigial Side Band (8VSB) – a single-carrier modulation system – was selected for digital terrestrial broadcasting.

For satellite broadcasting the typical modulation scheme is Quadrature Phase Shift Keying (QPSK), a modulation system where the carrier’s phase is shifted in 90° increments. 
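
As an illustration of those 90° increments, here is a toy Python mapping of bit pairs to QPSK phases (the particular bit-to-phase assignment below is just one of several conventions, chosen for illustration):

    import cmath, math

    PHASES = {  # dibit -> carrier phase in degrees (an illustrative Gray-coded assignment)
        (0, 0): 45, (0, 1): 135, (1, 1): 225, (1, 0): 315,
    }

    def qpsk_symbols(bits):
        """Turn a flat bit list into unit-amplitude complex baseband symbols."""
        pairs = zip(bits[0::2], bits[1::2])
        return [cmath.exp(1j * math.radians(PHASES[p])) for p in pairs]

    print(qpsk_symbols([0, 0, 1, 1]))   # two symbols, at 45 and 225 degrees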

Digital cellular phone systems have been widely deployed in several countries. The channel access method used by the Global System for Mobile communications (GSM) is Time Division Multiple Access (TDMA), a multiple access technique where access to the channel is based on time slots – like those used in digital telephony multiplexers – corresponding to digital channels, usually of a fixed bitrate.

The 3rd generation (3G) mobile communication system is based on Code Division Multiple Access (CDMA), of which several incompatible flavours already exist (CDMA is also used in some 2nd generation digital mobile telecommunication systems). This is a specialisation of a more general form of wireless communication called Spread Spectrum, used in multiple-access communications where independent users share a common channel without external synchronisation. In this form of communication the bitstream is spread throughout the available bandwidth using a periodic binary sequence called a Pseudo-random Noise (PN) sequence. Because of this scrambling, the bitstream appears as wideband noise. The receiver uses the same PN sequence as the transmitter to recover the transmitted signal, and any narrowband noise is spread into a wideband signal.
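
A minimal sketch, in Python, of the direct-sequence idea described above: each data bit is spread over several “chips” and XORed with a PN sequence, and a receiver holding the same sequence recovers the data (the random-number generator below merely stands in for a real PN generator such as an LFSR):

    import random

    def pn_sequence(length, seed=2015):
        rng = random.Random(seed)       # stand-in for a real PN generator (e.g. an LFSR)
        return [rng.randint(0, 1) for _ in range(length)]

    def spread(data_bits, chips_per_bit, pn):
        """Repeat each data bit over several chips and XOR with the PN sequence."""
        chips = [b for b in data_bits for _ in range(chips_per_bit)]
        return [c ^ p for c, p in zip(chips, pn)]

    def despread(rx_chips, chips_per_bit, pn):
        """XOR with the same PN sequence, then majority-vote each group of chips."""
        chips = [c ^ p for c, p in zip(rx_chips, pn)]
        groups = [chips[i:i + chips_per_bit] for i in range(0, len(chips), chips_per_bit)]
        return [1 if sum(g) > len(g) // 2 else 0 for g in groups]

    data = [1, 0, 1, 1]
    pn = pn_sequence(len(data) * 8)
    tx = spread(data, 8, pn)
    assert despread(tx, 8, pn) == data  # a receiver with the right PN recovers the data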


Telecom Bits And Computer Bits

Since the early times of computing, it became apparent that CPUs should be designed to handle chunks of bits called “bytes” instead of, or in addition to, individual bits, obviously without altering the status of bits as the atomic components of information. After some odd initial choices (like the 6 bits of the UNIVAC byte), the number of bits in a byte soon converged to 8 (hence bytes are sometimes called “octets”). With the progress of technology, CPUs became capable of dealing with more bytes at the same time. In the late 1960s and 1970s minicomputers were based on a two-byte (16-bit) architecture that enabled the addressing of 64 Kbytes of memory. Today the CPU of some advanced game machines can handle many bytes at a time.

When the telcos decided to digitise speech, they, too, defined their own “byte”, the speech sample. After some initial dithering between 7 and 8 bits – all in the closed environment of CCITT meeting rooms in Geneva, with Americans favouring 7 bits and Europeans 8 – the eventual choice was 8 bits. Unlike the computer world, however, in which most processing involves bytes, telecom bytes are generated at the time of analogue-to-digital (A/D) conversion, but then they are immediately serialised and kept in that form until they are converted back to bytes just before digital-to-analogue (D/A) conversion. Because of the way D/A conversion works, the “natural” order of bits in a telecom byte is Most Significant Bit (MSB) to Least Significant Bit (LSB).

The order of bits in a byte really depends on the architecture of the particular computer that processes the bytes. The same ambiguity is found in multi-byte data, where the way bytes are stored in the computer’s memory is described as big-endian or little-endian. In a big-endian system, the most significant byte in the sequence is stored at the lowest storage address (i.e., first). In a little-endian system, the least significant byte is stored first.
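
A small Python illustration of the difference, using the standard struct module to store the same 32-bit integer in both byte orders:

    import struct

    value = 0x0A0B0C0D
    print(struct.pack('>I', value).hex())   # big-endian:    0a0b0c0d (most significant byte first)
    print(struct.pack('<I', value).hex())   # little-endian: 0d0c0b0a (least significant byte first)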

Transmission also responds to very different needs than storage or processing. In the 1960s the telcos started using a serialised and comparatively high transmission rate of 1,544 or 2,048 kbit/s, but network equipment performed rather simple operations on such streams, one of the most important being the identification of the “frame start”. Transmission channels are far from being error free and, as we have already said, the codeword identifying TS 0 can be emulated by speech samples. This means that a receiver must be programmed to deal with the moment it is first switched on and with the moment frame alignment has been lost. The data that have flowed in the meantime are, well, lost, but there is no reason to worry: after all it is just a few milliseconds of speech.

For quite some time the bitrate used for the transmission of computer data over the network was limited to a few hundred kbit/s, but the network had to perform rather sophisticated operations on the data. Data transmission must be error free, which means that codeword emulation must be avoided or compensated for, and retransmission is requested for all data that, for whatever reason, do not satisfy strict error checking criteria.

Because the network does not have to perform complex operations on the speech samples (which does not mean that the logic behind the routing of those samples is simple), the transmission mode is “synchronous”. This means that the transmission channel can never be “idle” and requires that speech samples be organised in fixed-length “frames”, where a frame is immediately followed by another frame. Most networks derive the clock from the information flowing through them, but what happens if there is no speech and all bits are set to zero? To avoid the case in which it is impossible to derive the clock, every other bit of the speech samples is inverted. Computer networks, on the other hand, transmit data in frames of variable length called “packets”.

This is an area where I had another techno-ideological clash with my telco colleagues in Europe. While the work on H.261 was progressing, COST 211 bis was discussing ways to multiplex the same data that the original COST 211 project had found necessary: audio, facsimile, text messages and, because things were changing, even some computer data arriving through one of those funny multiples of 300 bit/s rates used in telephony modems. With all the respect I had for the work done in COST 211 (to which, by the way, I had been a major contributor myself), where data multiplexing was done in the best telco tradition of “frames” and “multiframes”, I thought that there should be more modern and efficient – i.e. packet-based – ways of multiplexing data. 

In COST 211 I had already proposed the use of a packet transmission system for exchanging messages between terminals and the Multi-Conference Unit (MCU), a device that performed the management of a “multiconference”, i.e. a videoconference with more than 2 users. The message proposal had been accepted by COST 211, but this was not surprising because in telcos the “signalling” function was dealt with by people with friendly ears for the IT language. My new proposal to define a packet-based multiplexer for media, however, was made in a completely different environment and fell on deaf (or closed) ears. This is why H.221, the multimedia multiplexer used with H.261, is a latter-day survivor of another age: it organises non-video data in chunks of 8 kbit/s subchannels and each of these subchannels has its own framing structure that signals which bit in the frame is used for which purpose. It is unfortunate that there is no horror museum of telecom solutions, because this one would probably sit in the centre.

There were two reasons for this. The first, and more obvious, is that there are people who, having done certain things in a certain way throughout their lifetime, simply do not conceive that the same things can possibly be done in a different way, particularly so if the new ideas come from younger folks driven by some alien, ill-understood new discipline. In this case my colleagues were so accustomed to the sequential processing of bits with a finite state machine that they could not conceive that a microprocessor might process the data stream in bytes rather than bits, instead of a special device designed on purpose to follow certain logic steps. The second reason is more convoluted. In some Post, Telephone and Telegraph (PTT) administrations, where the state had retained the telegraph and postal services but had licensed the telephone service to a private firm – even though the latter was still under some form of control by the state – there was ground for the argument that “packet transmission” was akin to telegraphy and that telcos should therefore not automatically be given a licence to manage packet data transmission services. Those telcos were then afraid of losing what at that time was – rightly – considered the next telco frontier.

This is what it means to be a regulated private company providing public services. Not many years ago, at a time when the telco business was said to be unregulated – while the state happily put its nose into the telephone service price list – one could see different companies digging up the same portion of the street more than once to lay the same cables to provide the same services, when doing it once would have sufficed for all. Or one could see different companies building the same wireless radio antennae twice or thrice, when one antenna would have sufficed for all (and also reduced power consumption and electromagnetic pollution). All this madness was driven by what I call the “electric pole-driven competition” philosophy, under the watchful eye of the European Commission, which made sure that no one even thought of “sharing the infrastructure”.

Yesterday, cables were laid once and antennae hoisted only once, but then the business had to be based on back-door dealings where bureaucrats issued rulings based on some arcane principles, after proxy battles intelligible – if ever – only to the cognoscenti.

Frankly, I do not know which of the two I like better. If I could express a desire, I would like a regulated world without brainless bureaucrats (I agree, it is not easy…), or a competitive world where the achievements of competition are measured not by the number of times city streets are dug up to lay the same cables belonging to different operators offering the same services, but by the number of smart new services that are provided by different operators, obviously in competition, on the same plain old physical infrastructure. Actually there is room for sharing some non-physical infrastructure too, but that is another story.

Until recently mine was a heretical view, but the current hard economic times have brought some resipiscence to some Public Authorities. The Commission of the European Communities (CEC) has started having second thoughts about imposing the building of separate mobile infrastructures by each operator and is now inclined to allow the sharing of infrastructure. There is no better means to bring sanity back into people’s minds than the realisation that the bottom of the purse has been reached.

A further important difference between transmission in the telecom and computer worlds is that, when computers talk to computers via a network, they do so using a very different paradigm from the telephone network’s. The latter is called connection-oriented because it assumes that, when subscriber A wants to talk to subscriber B, a unique path is ideally set up between the two telephone addresses by means of signalling between nodes (switches), and that path is maintained (and charged!) for the entire duration of the conversation. The computer network model, instead, assumes that a computer is permanently “connected” to the network, i.e. that it is “always on”, so that when computer A wants to talk to computer B, it chops the data into packets of appropriate length and sends the first packet towards computer B, attaching to it the destination address and the source address. The network, being “always on”, knows how to deliver the packet through its different nodes to computer B. When computer A sends the second packet, it is by no means guaranteed that the network will use the same route as for the first packet. It may even happen that the second packet arrives before the first one, because the latter has possibly been kept queuing somewhere in other network nodes. This lack of guaranteed packet sequence is the reason why packet networks usually have means to restore the original order and provide “flow control”, so as to free applications from this concern. This communication model is called connection-less.
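
A toy Python sketch – not any real protocol – of the connection-less model just described: the sender chops the data into packets, each carrying destination, source and a sequence number, and the receiver puts back in order whatever arrives:

    import random

    def packetize(data, src, dst, size=4):
        """Chop data into packets carrying source, destination and sequence number."""
        return [{"src": src, "dst": dst, "seq": i, "payload": data[i*size:(i+1)*size]}
                for i in range((len(data) + size - 1) // size)]

    def reassemble(packets):
        # Packets may arrive in any order; the sequence numbers restore the original one.
        return b"".join(p["payload"] for p in sorted(packets, key=lambda p: p["seq"]))

    packets = packetize(b"computer A talks to computer B", src="A", dst="B")
    random.shuffle(packets)                       # the network may reorder packets
    assert reassemble(packets) == b"computer A talks to computer B"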

Several protocols were developed to enable transmitters and receivers to exchange computer data in the proper order. Among these is the ITU-T X.25 protocol, developed and widely deployed since the 1970s. X.25 packets use the High-level Data Link Control (HDLC) frame format. The equivalent of the 2 Mbit/s sync word is a FLAG character of 01111110 Bin or 7E Hex. To avoid emulation of the FLAG by the data, the transmitter inserts a 0 after 5 consecutive 1s, and the receiver deletes a 0 if it follows 5 consecutive 1s (this is called “bit-stuffing”). Having been developed by the telecommunication industry, it should be no surprise that X.25 attempted to merge the connection-oriented and connection-less models, in the sense that, once a path is established, packets follow one another in good order through the same route.
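
A minimal Python sketch of the bit-stuffing rule just described, operating on bits represented as a list of 0s and 1s (FLAG detection and the rest of the HDLC framing are left out):

    def stuff(bits):
        out, run = [], 0
        for b in bits:
            out.append(b)
            run = run + 1 if b == 1 else 0
            if run == 5:            # five 1s in a row: insert a 0
                out.append(0)
                run = 0
        return out

    def unstuff(bits):
        out, run, i = [], 0, 0
        while i < len(bits):
            b = bits[i]
            out.append(b)
            run = run + 1 if b == 1 else 0
            if run == 5:            # the next bit is a stuffed 0: drop it
                i += 1
                run = 0
            i += 1
        return out

    payload = [0, 1, 1, 1, 1, 1, 1, 0]      # would otherwise look like the FLAG 7E Hex
    stuffed = stuff(payload)
    print(stuffed)                          # [0, 1, 1, 1, 1, 1, 0, 1, 0]
    assert unstuff(stuffed) == payload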

The way data move through the nodes of a network is also paradigmatic of the different approaches of the telecommunication and computer worlds. Each of the 30 speech channels contained in a primary multiplex is instantaneously switched to its destination, but an X.25 packet is first stored in the switch and then the entire packet is routed to its destination. Because of the considerable delay that a packet can undergo in a complex X.25 network, a variation of the protocol – dubbed Fast Packet Switching (FPS) – was introduced in the late 1980s. The computer in the node first interprets the destination address without storing the full packet and, as soon as it understands it, the packet is immediately routed to its destination. 

It is nice to think of the intersection of two movements: “data from computers become analogue and are carried by the telephone network” and “speech signals become digital data and are processed by computers”, but this would be an ideological reading. ISDN was a project created by the telcos to extend the reach of digital technologies from the core to the access network, the rather primitive – I would say naïve – service idea driving it being the provision of 2 telephone channels per subscriber. The hope was to be able to optimise the design and management of the network, not to enable a better way to carry computer data at a higher bitrate. Speech digitisation did make the speech signal processable by computers, but in practice the devices that handled digitised speech could hardly be described as computers, as they were devices that managed bits in a very efficient but non-programmable way. 

In the early 1980s there were more and more requests on the part of users to connect geographically dispersed computers through any type of network. This demand prompted the launch of an ambitious project called Open System Interconnection (OSI). The goal was to develop a set of standards that would enable a computer of any make to communicate with another computer of any make across any type of network. The project was started in Technical Committee 97 (TC 97) “Data Processing” of the International Organisation for Standardisation (ISO) and, for obvious reasons, was jointly executed with ITU-T, probably the first example of a large-scale project executed jointly by two distinct Standard Developing Organisations (SDO). 

For modelling purposes, the project broke down the communication functions of a communication device talking to another communication device into a hierarchical set of layers. This led to the definition of a Reference Model consisting of seven “layers”, a major conceptual achievement of the project. Each layer performs the functions required to communicate with the corresponding layer of the other system (peer-to-peer communication), as if the other layers were not involved. Each layer relies on the layer hierarchically below to have the relevant lower-layer functions performed and it provides “services” to the next higher layer. The architecture is so defined that changes in one layer should not require changes in the other layers. 

The seven OSI layers with the corresponding functions are: 

  • Physical – Transmission of unstructured bit streams over the physical link
  • Data link – Reliable transfer of data across the physical link
  • Network – Data transfer independent from the data transmission and switching technologies used to connect systems
  • Transport – Reliable and transparent transfer of data between end points
  • Session – Control structure for communication between applications
  • Presentation – Data transformations appropriate to provide a standardised application interface
  • Application – Services to the users of the OSI environment

In the mid 1980s the telco industry felt it was ready for the big plunge into the broadband network reaching individual subscribers that everybody had dreamed of for decades. The CCITT started a new project, quite independently of the OSI project, as the main sponsors of this project were the “transmission and switching” parts of the telcos. The first idea was to scale up the old telecom network in bitrate. This, however, was soon abandoned, for two main reasons: the first was the expected traffic increase of packet-based computer data (the main reason for the American telcos to buy into the project) and the second was the idea that such a network could only be justified by the provision of digital video services such as videoconference and television (the main motivation for the European telcos).

Both applications envisaged were thus inherently at variable bitrate, the video one because the amount of information generated by a video source depends heavily on its “activity”. Asynchronous Transfer Mode (ATM) was the name given to the technology, which bore a clear derivation from FPS. The basic bitrate was 155 Mbit/s, an attempt at overcoming the differences in transmission bitrates between the two hierarchies spawned by the infamous 25-year old split. The basic cell payload was 48 bytes, an attempt at reconciling the two main reasons for having the project: 64 bytes for computer data and 32 bytes for real-time transmission.

A horse designed by a committee?

 


A Personal Faultline

During my previous incarnation as a researcher in the video coding field, I made more than one attempt at unification. But do not expect lofty thoughts of global convergence of businesses: at that time my intention was just to achieve common coding architectures that could suit the needs of different industries, without considering the ultimate fate of the individual converged industries. What mattered to me was to enable a “sharing” of the development costs of the integrated circuits that were required to transform digital technologies from the promise of a bright but distant future into an even brighter reality – but tomorrow. It is fair to say, though, that in these attempts I was biased by my past experience of dealing with devices capable of performing very complex operations on very high-bitrate signals, by the reluctance of telcos to make investments in the terminal device area and by the readiness of the CE industry to develop products without consideration of standards – provided a market existed.

I gradually came to the conclusion that preaching the idea at conferences was not enough and that the only way to achieve my goals was by actually teaming up with other industries. The opportunity to put my ideas in practice was offered by the European R&D program bearing the name of Research & development on Advanced Communication for Europe (RACE) that the CEC had launched in 1984, after the successful take-off of the European Strategic Program of Research and development in Information Technology (ESPRIT) program one year before. 

The Integrated Video Codec (IVICO) project, led by CSELT, was joined by telecommunication operators, broadcasting companies, and manufacturers of integrated circuit and terminal equipment. The project proposal declared the goal of defining, as a first step, a minimum number of common integrated circuits (at that time it was too early to think of a single chip doing everything) such as motion estimation and compensation, DCT, memory control, etc., with the intention of using them for a wide spectrum of applications. The project had a planned duration of one year after which it was expected to be funded for a full five-year period. 

For several reasons, however, the project was discontinued after the first year “pilot” phase. One reason was the hostility from certain European quarters that were concerned by the prospect of integrated circuits for digital television becoming usable within a few years – one of the not so rare cases where it does not pay to deliver. This possibility clashed with the official policy of the CEC, prompted by some European governments and some of the major European manufacturers of CE equipment, which promoted standard and high definition television in analogue form under an improved analogue TV version called MUltiplexed Analogue Components (MAC). The application of digital technologies to television would only happen – so ran the policy – “in the first decade of the third millennium”.

The demonstrated impossibility of executing a project to develop microelectronic technology for AV coding, for use by the European industry at large, forced me to rethink the strategy. If the European industrial context was not open to sharing a vital technology, then operating at the worldwide level would shield me from influences and pressures of a non-technical nature coming from my own backyard. For somebody who wanted to see things happen for real, this was a significant scaling down of the original ambitions, because an international development of a microelectronic technology was not conceivable. On the other hand, this diminution was compensated by the prospect of achieving a truly global solution, albeit one of specification only and not of technology.

At that time (mid 1980s) it was not obvious which body should take care of the definition of the common core because “media-related” standardisation was scattered across the three main international bodies and their subdivisions:

  • CCITT (now ITU-T) handled Speech in SG XV WP 1 and Video in SG XV WP 2;
  • CCIR (now ITU-R) handled Audio in SG 10 and video in SG 11;
  • IEC handled Audio Recording in SC 60 A and Video Recording in SC 60 B; Audio-visual equipment in TC 84 and Receivers in SC 12A and SC 12 G;
  • ISO handled Photography in TC 42, Cinematography in TC 36 and Character sets in TC94/SC2.

Chance (or Providence) offered the opportunity to test my idea. During the Globecom conference in Houston, TX in December 1986, where I had a paper on IVICO, I met Hiroshi Yasuda, an alumnus of the University of Tokyo, where he had been a Ph.D. student like myself during the same years 1968-70. At that time he was well known for his excellent reading of karuta (a word derived from carta, meaning card, that the Portuguese had brought when they first reached Japan in 1543, and that the Japanese still use to read Hyakunin isshuu poems at year-end parties). After his Ph.D., Hiroshi had become a manager at NTT Communication Laboratories where he was in charge of video terminals. He invited me to come and see the Joint ISO-CCITT Photographic Coding Experts Group (JPEG) activity carried out by a group inside a Working Group (WG) of which he was the Convenor.

Hiroshi’s WG was formally ISO TC 97/SC 2/WG 8 “Coding of Audio and Picture Information”. SC 2 was a Subcommittee (SC) of TC 97 “Data processing”, the same TC where another Subcommittee (SC 16) was developing the Open System Interconnection (OSI) standard. TC 97 would become, one year later, the joint ISO/IEC Technical Committee JTC 1 “Information Technology”, by incorporating the microprocessor standardisation and other IT activities of the IEC. SC 2’s charter was the development of standards for “character sets”, i.e. the code assignment to characters for use by computers. WG 8 was a new working group established to satisfy the standardisation needs created by the plans of several PTT administrations and companies to introduce pictorial information in various teletext and videotex systems already operational at that time (e.g. the famous Minitel deployed in France). These systems already utilised ISO standards for characters, and audio and pictures were considered as their natural evolution. JPEG was a subgroup of WG 8 tasked with the development of a standard for coded representation of photographic images jointly with CCITT Study Group (SG) VIII “Telematic Services”.

My first attendance at JPEG was at the March 1987 meeting in Darmstadt and I was favourably impressed by the heterogeneous nature of the group. Unlike the various groups of the Conférence Européenne des Postes et Télécommunications (CEPT) and of the European Telecommunication Standards Institute (ETSI) in which I had operated since the late 1970s, JPEG was populated by representatives of a wide range of companies such as telecommunication operators (British Telecom, Deutsche Telekom, KDD, NTT), broadcasting companies (CCETT, IBA), computer manufacturers (IBM, Digital Equipment), terminal equipment manufacturers (NEC, Mitsubishi), integrated circuit manufacturers (Zoran), etc. By the time of the Copenhagen meeting in January 1988, I had convinced Hiroshi to establish a group parallel to JPEG, called Moving Picture Coding Experts Group (MPEG), with the mandate to develop standards for coded representation of moving pictures. The first project concerned video coding at a bitrate of about 1.5 Mbit/s for storage and retrieval applications “on digital storage media”.

At the same meeting Greg Wallace, then with Digital Equipment Corporation, was appointed as JPEG chairman and another group, called Joint ISO-CCITT Binary Image Coding Experts Group (JBIG), for coding of bilevel pictures such as facsimile, was also established. Yasuhiro Yamazaki of KDD, another alumnus of the University of Tokyo in the same years 1968-70, was appointed as its chairman. 

The reader may think that the fact that three alumni of Tokyo University (Todai, as it is called in Japan) were occupying these positions in an international organisation is a proof that the Todai Mafia was at work. I can assure the reader that this was not the case. It was just one example of how a sometimes-benign and sometimes-malign fate drives the lives of humans.


The 1st MPEG Project

The target of the first MPEG work item was of interest to many: to the CE industry because it could create a new product riding the success of CD Audio by extending it to video; to the IT industry because interactivity with local pictures, enabled by the growing computing power of PCs, was a great addition to its ever-broadening application scope; and to the telco industry because of the possibility to promote the development of the much-needed integrated circuits for H.261 real-time audio-visual communication that, as explained before, they were unable to develop by themselves.

This is possibly too sweetened a representation of industry feelings at that time, because each industry had radically different ways of operating. In the Consumer Electronics world, when a new product was devised, each company, possibly in combination with some trusted ally, developed the necessary technology (and filed the enabling patents) and put its – incompatible – version of the product on the market. As other competitors put their versions of the product on the market at about the same time, the different versions competed until the market crowned one as the winner. At that point the company or the consortium with the winning product would register some key enabling technology of the product with a standards body and would start licensing the technology to all companies, competitors included. This had happened for the Compact Cassette (CC), when the winner was Philips against Bosch; for the CD, when the winners were Philips and Sony against RCA; and for the VHS Video Cassette Recorder (VCR), when the winner was JVC against Sony.

The project proposed by MPEG implied a way of operation that was clearly going to upset the established modus operandi of the CE world. Participants knew that, by accepting the rules of international standardisation, they would be deprived – in case they were the winners – of the rightful, time-honoured “war booty”, i.e. the exclusive control of the patents needed to build the product, which also largely controlled its evolution. The advantage for them was that costly format wars could be avoided.

Another industry had mixed feelings: broadcasting. Even though digital television was a strategically important goal for broadcasters, in the second half of the 1980s the bitrate of 1.5 Mbit/s was considered way too low to provide pictures that they would even remotely consider acceptable. On the other hand, they clearly understood that the technology used by MPEG could be used for entry-level network-based services and could later be extended to higher bitrates, which were expected to provide a quality of interest to them. A glimpse of their attitude can be seen in the letter that Mr. Richard Kirby, the Director of the CCIR at that time, sent to the relevant CCIR SG Chairmen upon receiving news of the establishment of MPEG. The letter requested the Chairmen to study the impact that this unknown group could have on future CCIR activities in the area.

At my instigation, between January 1988 and the first MPEG meeting in May, a group of European companies had gathered with the intention of proposing a project to the ESPRIT program. A consortium was eventually established and a proposal put together. Called COding of Moving Images for Storage (COMIS), it had dual purposes: to contribute to the successful development of the new standard by pooling and coordinating European partners’ resources, and to give European industry a time lead in exploiting that standard. At the instigation of Hiroshi Yasuda, a project with similar goals was being built in Japan with the name of Digital Audio and Picture Architecture (DAPA). Some time later, a European project funded by the newly established Eurescom Institute (an organisation established by European telcos) and called Interactive Multimedia Services at 1 Mbit/s (IMS-1) was also launched. 

Therefore, by the time the first meeting of the MPEG group took place in Ottawa, ON in May 1988, the momentum was already building and indeed 29 experts attended that meeting, although some of them were just curious visitors from the JPEG meeting next door. In Ottawa the mandate of the group was established. Drafting this was an exercise in diplomacy. There were already other groups dealing with video coding in ITU-T, ITU-R and CMTT, so the mandate was explicitly confined to Storage and Retrieval on Digital Storage Media (DSM). With this came the definition of the 3 initial planned phases of work:

Phase 1 – Coding of moving pictures for DSMs having a throughput of 1-1.5 Mbit/s
Phase 2 – Coding of moving pictures for DSMs having a throughput of 1.5-10 Mbit/s
Phase 3 – Coding of moving pictures for DSMs having a throughput of 10-60 Mbit/s (to be defined)

People in the business had no doubt about our plans. We intended to start working on low-definition pictures, for which the technology was ready to be implemented and a market was expected to exist, because a great carrier – the CD – existed and because of the plans of the CE industry and, in part, of the telco industry. The next step would then be to move to standard-definition pictures, for which a market did exist because industry was ready to accept digital television, plans for it having been ongoing for years. Eventually we would move to HDTV. These plans were in sharp contrast with those prevailing, especially in European broadcasting circles, where the idea was to start from HDTV and define a top-down hierarchy of compatible coding schemes – technically a good plan, but one that would take years to implement, if ever.

One meeting in Turin and one in London in September followed the Ottawa meeting. So, with the video coding work in MPEG on good foundations, I could pursue another favourite theme of mine. A body dealing with moving pictures with a wide participation of industry was good, but it fell short of achieving what I considered a goal of practical value, because audio-only applications are plentiful while appealing mass-market video-only applications are harder to find. The importance of this theme was magnified by my experience of the ISDN videophone project of the ITU. In spite of this project being for an AV application par excellence, the video coding standard (H.261), an outstanding piece of work, and the multiplexing standard (H.221), a technically less than excellent piece of work – but never mind – had been developed, while the audio coding part had been left unsettled. This happened because CCITT SG XV had tasked the Video Coding Experts with developing the videophone project, but the Audio Coding experts operated in SG XVIII, and the videotelephone team did not dare to make any decision in a field over which they had no authority.

This organisational structure of the ITU-T, and similar ones in ITU-R and the IEC, was a reflection of the organisation of the R&D establishments, and hence of the business, of that time: research groups in audio and video were located in different parts of the organisation because of their different backgrounds, target products and funding channels. It was also a reflection of services that had started with audio and later added video, with video taking the lion’s share. My personal experience of television – but I may be biased in my judgment – is that the video signal is always there, but the audio signal is there only if everything goes smoothly. This is not because the audio experts have done a lousy job but because the integration of audio and video has never been given the right priority – in research, standardisation, product development and operation.

For a manufacturer of videophone equipment, the easiest thing to do was to use one B channel for compressed video and one B channel with PCM audio, never mind the not-so-subtle irony that one channel carried a bitstream that was the result of a compression of more than 3 orders of magnitude – from 216 Mbit/s down to 64 kbit/s – while the other carried a bitstream in the form prescribed by a 30-year old technology without any compression at all!

So, besides video, the audio component was also needed and action was required lest MPEG end up like videoconferencing, with an excellent video compression standard but no audio (music) standard, or one with a quality not comparable to the video quality, or with unjustifiably different compression rates for the two. The other concern was that integrating the audio component into a system that had not been designed for it could lead to technical oversights that could only be solved later with some abominable hacks. Hence the idea of a “Systems” activity, conceptually similar to the function performed by H.221 for the ISDN videophone, but with better performance because it was technically more forward-looking. The goal of the “Systems” activity was to develop the specification of the complete infrastructure, including multiplexing and synchronisation of audio and video, so that building the complete AV solution became possible. 

After the promotional efforts made in the first months of 1988 to make the industry aware of the video coding work, I undertook a similar effort to inform the industries that MPEG was going to provide a complete audio-visual solution. In this effort I contacted Prof. Hans-Georg Mussmann, director of the Information Processing Institute at the Technical University of Hannover. Hans was well known to me because he had been part of the Steering Committee of the “Workshop on 64 kbit/s coding of moving video“, an initiative that I had started in 1988 to promote the progress of low bitrate video coding research, and he had actually hosted the first two workshops. Because of his institute’s and his own personal standing, Hans was playing a major role in the Eureka project 147 Digital Audio Broadcasting (DAB).

The last meeting of 1988 was held at Hannover. The first two days (29 and 30 November) were dedicated to video matters and held at the old Telefunken labs, those that had developed the PAL system. Part of the meeting was devoted to viewing and selecting video test sequences to be used for simulation work and quality tests. The CCIR library of video sequences had been kindly made available through the good offices of Ken Davies, then with the Canadian Broadcasting Corporation (CBC), an acquaintance from the HDTV workshop. Two of the video sequences – “Table Tennis” and “Flower Garden” – selected on that occasion would be used and watched by thousands of people engaged in video coding research both inside and outside of MPEG. Another output of that meeting was the realisation that the MPEG standard, to be fully exploitable for interactive applications on CD-Read Only Memory (CD-ROM), should also be capable of integrating “multimedia” components. Therefore I undertook to see how this request could be fulfilled. 

The last two days (1 and 2 December) saw the kickoff of the audio work with the participation of some 30 experts at Hans’s Institute. Gathering so many audio coding experts had been quite an achievement because, unlike video and speech coding, for which there were well-established communities developing technologies with a long tradition in standardisation – myself being one element of it – audio coding was a field where the researchers were fewer and scattered across a smaller number of places, such as the research establishments of ATT, CCETT, IRT, Matsushita, Philips, Sony, Thomson and a few others. The Hannover meeting gave the attending researchers the opportunity to listen, in some cases for the first time, to the audio coding results of their peers. So the first MPEG subgroup – Audio – was born and Prof. Mussmann was appointed as its chairman. The meeting also produced a document, intended for wide external distribution, which invited interested parties to pre-register their intention to submit proposals for video and audio coding algorithms when MPEG would issue a Call for Proposals (CfP). 

Bellcore, a research organisation spun off from the Bell Labs after the break-up of ATT, hosted the February 1989 meeting at their facilities in Livingston, NJ. The main task of the meeting was to develop the first version of the so-called Proposal Package Description (PPD), i.e. a document describing all the elements that proposers of algorithms had to submit in order to have their proposals considered. The document also contained the first ideas concerning the testing of proposals, both subjective and objective. 

That meeting was also memorable for the attendance of Mr. Roland Zavada of Kodak. Rollie, the chairman of a high-level ISO group coordinating image-related matters, had come to inspect this unheard-of group of experts dealing with Moving Pictures – which he had clearly taken to mean Motion Pictures – whose membership was growing like mushrooms at every meeting.

Livingston was followed by Rennes in May and Stockholm in July 1989. The latter meeting produced a new version of the PPD where the video part was final and incorporated in the CfP. This contained operational data for carrying out subjective tests but also data to assess VLSI implementability and to weigh the importance of different features. Similar data were also beginning to populate the part concerning the audio tests. For systems aspects the document was still at the rather preliminary level of requirements. 

At the Stockholm meeting the second MPEG subgroup – Video – was established and Didier Le Gall, then with Bellcore, was appointed as its chairman. This subgroup was established as a formalisation of the most prominent of the ad hoc groups that had already been working, meeting and reporting in the areas of Video, Tests, Systems, VLSI implementation complexity and Digital Storage Media (DSM).


MPEG-1 Development – Video

The Kurihama meeting in October 1989 was a watershed in many senses. Fifteen video coding proposals were received, including one from the COMIS project. They contained D1 tapes with sequences encoded at 900 kbit/s, the description of the algorithm used, an assessment of complexity and other data. The site had been selected because JVC had been so kind as to offer their outstanding facilities to perform the subjective tests, with MPEG experts acting as testing subjects. At the end of the meeting a pretty rough idea of the features of the algorithm could be obtained and plans were made to continue work “by correspondence”, as this kind of work was called in those pre-internet days. 

About 100 delegates attended the Kurihama meeting. With the group reaching such a size, it became necessary to put in place a formal structure. New subgroups were established and chairmen appointed: Tsuneyoshi Hidaka (JVC), who had organised the subjective tests, led the Test group, Allen Simon (Intel) led the Systems group, Colin Smith (Inmos, later acquired by ST Microelectronics) led the VLSI group and Takuyo Kogure (Matsushita) led the Digital Storage Media (DSM) group. These were in addition to the already established Audio and Video groups chaired by Hans-Georg Mussmann (University of Hannover) and Didier Le Gall (Bellcore), respectively. 

In this way the main pillars of the MPEG organisation were established: the Video group and the Audio group in charge of developing the specific compression algorithms starting from the most promising elements contained in the submissions, the Systems group in charge of developing the infrastructure that held the compressed audio and video information together and made it usable by applications, the Test group assessing video quality (the Audio group took care directly of organising their own tests), the VLSI group assessing the implementation complexity of compression algorithms and the DSM group studying the – at that time only – application environment of MPEG standards, namely storage. In its 25 years, the internal organisation of MPEG has undergone gradual changes, and there has been quite a turnover of chairs. 

The different subgroups had gradually become the place where heated technical discussions were the norm, while the MPEG plenary had become the place where the entire work done by the groups was reviewed for the benefit of those who had not had the opportunity to attend other groups’ meetings but still wanted to be informed, and possibly even retain the ability to have a say in other groups’ conclusions, where unsettled matters were resolved and where a formal seal of approval was given to all decisions. There were, however, other matters of general interest that also required discussion, but it was no longer practical to have such discussions in the plenary. As a way out, I started convening representatives of the different national delegations in separate meetings at night. This was the beginning of the Heads of Delegation (HoD) group. This name lasted for a quarter of a century until one day someone in ISO discovered that there are no delegations in working groups. From that moment the HoDs were called Convenor Advisors and everything went on as before.

It was during an HoD meeting that the structure of the MPEG standard was discussed. One possible approach was to have a single standard containing everything; the other was to split the standard into parts. The former was attractive but would have led to a standard of monumental proportions. Eventually the approach adopted was, as John Morris of Philips Research, the UK HoD at that time, put it, to make MPEG “one and trine”, i.e. a standard in three parts: Systems, Video and Audio. The Systems part would deal with the infrastructure holding together the compressed audio and video (thereby making sure that later implementors would not find holes in the standards), the Video part would deal with video compression, and the Audio part would deal with audio compression. 

From a non-technical viewpoint, but quite important for a fraction of the participants, the meeting was also remarkable because during the lunch break on the first day, news appeared on television that a major earthquake had hit San Francisco, and most of the participants from the Bay Area had to leave in haste. It later became known that fortunately no one connected to a Kurihama meeting participant had been seriously affected, but the work on VLSI implementation complexity clearly suffered, as many of the participants in that activity had come from the Bay Area. 

At Kurihama there was also a meeting of SC 2/WG 8. A noteworthy event was the establishment of the Multimedia and Hypermedia Experts Group (MHEG), a group parallel to JBIG, JPEG and MPEG. This was the outcome of the undertaking I had made at the Hannover meeting one year before to look into the problem of a general multimedia standard. After that meeting I had contacted Francis Kretz of CCETT, who had been spearheading an activity in France on the subject, and invited him to come to Kurihama. At that meeting Francis was appointed as chair of MHEG. 

The Kurihama tests had given a clear indication that the best performing and still most promising video coding algorithm was one that encoded pictures predictively, starting from a motion-compensated previous picture and using the Discrete Cosine Transform (DCT), à la CCITT H.261. This meant that the new standard could easily support one of the requirements of “compatibility with H.261”, a request made by the MPEG telco members. On the other hand, the design of the standard could enjoy more flexibility in coding tools because the target application was “storage and retrieval on DSM” and not real-time communication, where transmission of information at minimum delay is at a premium. This is why MPEG-1 Video (like all subsequent video standards produced by MPEG so far) has interpolation between coded pictures as a tool that an encoder can use.
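
To make the principle concrete, here is a minimal sketch in Python/NumPy of motion-compensated prediction followed by a DCT of the residual. It is purely illustrative and not taken from any MPEG document or reference software: the block size, search range and flat quantiser are arbitrary choices, and entropy coding is left out entirely.

    import numpy as np

    N = 8  # block size used throughout this sketch

    def dct_matrix(n=N):
        # Orthonormal DCT-II basis matrix: row k, column x.
        C = np.zeros((n, n))
        for k in range(n):
            a = np.sqrt(1.0 / n) if k == 0 else np.sqrt(2.0 / n)
            C[k] = a * np.cos(np.pi * (2 * np.arange(n) + 1) * k / (2 * n))
        return C

    C = dct_matrix()

    def dct2(block):
        # 2-D DCT of an 8x8 block (separable: rows then columns).
        return C @ block @ C.T

    def motion_search(cur_blk, ref, y, x, rng=8):
        # Exhaustive search in the reference picture for the best 8x8 predictor.
        best_err, best_mv = None, (0, 0)
        h, w = ref.shape
        for dy in range(-rng, rng + 1):
            for dx in range(-rng, rng + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy <= h - N and 0 <= xx <= w - N:
                    cand = ref[yy:yy + N, xx:xx + N].astype(int)
                    err = np.sum(np.abs(cur_blk.astype(int) - cand))
                    if best_err is None or err < best_err:
                        best_err, best_mv = err, (dy, dx)
        return best_mv

    def encode_block(cur_blk, ref, y, x, q=16):
        # Predict the block from the previously reconstructed picture, transform
        # the residual and quantise it; (mv, coeffs) is what gets entropy-coded.
        dy, dx = motion_search(cur_blk, ref, y, x)
        pred = ref[y + dy:y + dy + N, x + dx:x + dx + N].astype(float)
        coeffs = np.round(dct2(cur_blk.astype(float) - pred) / q)
        return (dy, dx), coeffs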

Philips hosted the following meeting in Eindhoven at the end of January 1990. The most important result was the drafting of the Reference Model (RM) version zero (RM0), i.e. a general description of the algorithm to be used as a test bed to carry out experiments. The Okubo group had also used a similar document with the same name for the development of H.261, but MPEG formalised the process of Core Experiments (CE) as a practical means to improve the RM in a collaborative fashion. A CE was defined as a particular instance of the RM at a given stage that allowed optimisation tests to be performed on one feature while keeping all other options in the RM fixed. At least two companies had to perform the CE and provide comparably positive results for the CE to qualify for promotion into the standard. This method of developing standards based on CEs has been a constant in MPEG ever since. 

RM0 was largely based on H.261. This, and a similar decision made in 1996 to base MPEG-4 Video on ITU-T H.263, is cited by some MPEG critics as evidence that MPEG does not innovate. Those making this remark are actually providing the answer to their own remark, because innovation is not an abstract good in itself. The timely provision of good solutions that enable interoperable – as opposed to proprietary – products and services is the value added that MPEG offers to its constituency – companies large and small. People would be right to see symptoms of misplaced self-assertion if MPEG were to choose something different from what is known to do a good job just for the sake of it, but that is not the MPEG way. On the other hand, I do claim that MPEG does a lot of innovation, but at the level of transforming research results into practically implementable audio-visual communication standards, as the thousands of past and present researchers working in MPEG in the last quarter-of-a-century can testify. 

The Eindhoven meeting did not adopt the Common Intermediate Format (CIF) and introduced the Source Input Format (SIF) in its stead. Unlike CIF, SIF is not yet another video format. It is two formats in one, and not “new”, because both are obtained by subsampling two existing formats: 625 lines @ 25 frames/s and 525 lines @ 29.97 frames/s. The former has 288×352 pixels @ 25 Hz and the latter 240×352 pixels @ 29.97 Hz. CE results could be shown and considered irrespective of which of the two formats was used. 
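
Since the figures follow directly from 2:1 subsampling, a few lines of Python make the derivation explicit (a sketch only: the 704-pixel active width assumed below comes from cropping the 720-pixel CCIR 601 line, a detail the actual specification handles with more care):

    # Illustrative derivation of the two SIF rasters from the CCIR 601 rasters
    # by keeping one field (half the lines) and halving the horizontal resolution.
    # The 704-pixel active width is an assumption (720 minus 2x8 cropped pixels).
    CCIR601 = {"625/50": (576, 704, 25.0), "525/60": (480, 704, 29.97)}

    for name, (lines, pixels, fps) in CCIR601.items():
        sif_lines, sif_pixels = lines // 2, pixels // 2
        print(f"{name}: SIF = {sif_lines}x{sif_pixels} pixels @ {fps} Hz")
    # 625/50: SIF = 288x352 pixels @ 25.0 Hz
    # 525/60: SIF = 240x352 pixels @ 29.97 Hz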

At the end of the second and last day of the meeting, a hurricane of unprecedented proportions swept the whole of the Netherlands. Trees were uprooted, roads were blocked and trains stopped midway between stations: proof that the Forces of Nature were once again showing their concern for the work of MPEG.

The development of MPEG-1 Video took two full years in total starting from the Kurihama tests and involved the participation of hundreds of experts, some attending the meetings and many more working in laboratories and providing results to be considered at the next meeting. As a result, an incredible wealth of technical inputs was provided that allowed the development of an optimised algorithm. 

A major role in this effort was played by the VLSI group, chaired by Geoffrey Morrison of BT Labs, who had replaced Colin Smith, the founder of the group. The VLSI group provided the neutral place where the impact on complexity of the different proposals – both video and audio – was assessed. MPEG-1 is a standard optimised for the VLSI technology of those years, because real-time video and audio decoding – never mind encoding – was only possible with integrated circuits. Even though attention at that time concentrated on VLSI implementation complexity, the subgroup already considered software implementation complexity as part of its mandate. 

At the following meeting in Tampa, FL, hosted by IBM, the name Reference Model was abandoned in favour of Simulation Model (SM). This tradition of sequentially changing the name of the Model at each new standard has continued: the names Test Model (TM), Verification Model (VM), eXperimentation Model (XM) and sYstem Model (YM) have been used for each of the MPEG-2, MPEG-4, MPEG-7 and MPEG-21 standards, respectively. 

Still in the area of software, an innovative proposal was made at the Santa Clara, CA meeting in September 1990 by Milton Anderson, then with Bellcore. The proposal amounted to using a slightly modified version of the C programming language to describe the more algorithmic parts of the standard. The proposal was accepted, and this marked the first time that a (pseudo-) computer programming language had been used in a standard to complement the text. The practice has now spread to virtually all environments doing work in this and similar areas and has actually been extended to describe the entire standard in software, as we will see soon.

 


MPEG-1 Development – Audio

Work in the Audio group was also progressing. Many participants were people interested in audio-only applications, some of them working in the Eureka 147 DAB project. For the majority of them it was important to develop a standard that would provide compressed audio with CD quality at a bitrate of 256 kbit/s, because that bitrate was suitable for digital audio broadcasting. This target affected the video work: video simulation results had to be shown at 1.15 Mbit/s because that was the bitrate remaining from the total payload of about 1.4 Mbit/s of the CD. 
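
The arithmetic is worth spelling out. In the sketch below the CD-DA payload rate is exact, while treating the small remainder beyond 1.15 Mbit/s as Systems overhead is my own simplification:

    cd_payload = 44100 * 16 * 2    # CD-DA: 44.1 kHz x 16 bit x 2 channels = 1,411,200 bit/s
    audio      = 256_000           # target audio bitrate in bit/s
    remainder  = cd_payload - audio
    print(remainder)               # 1,155,200 bit/s: about 1.15 Mbit/s left for video,
                                   # with the last few kbit/s absorbed by Systems overhead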

The approach of the Audio group to the development of the standard was somewhat different from the one followed by the Video group. Instead of producing a general CfP straight away, the Audio group first worked to cluster the proposals that the different companies were considering. Of course this did not mean that the CfP was closed to anybody else. 

These were the four clusters: 

  1. Transform Coding with overlapping blocks 
  2. Transform Coding with non-overlapping blocks 
  3. Subband Coding with less than or equal to 8 subbands 
  4. Subband Coding with more than 8 subbands. 

Each cluster was encouraged to provide a single proposal, and this indeed happened. Swedish Radio (SR) was kind enough to perform the subjective tests of the four clustered proposals using “golden ears”, i.e. specialists capable of detecting the slightest imperfection in a sound. The results of the subjective tests were shown in Stockholm in June 1990 (formally this was a part of the Porto meeting in July, where the rest of MPEG was meeting). The reason for having this meeting in Stockholm was to be able to listen to the submissions with the same setup the golden ears had used for the tests. 

The first clustered proposal performed best in terms of subjective quality. However, its implementation penalty was, not unexpectedly, higher than that of the fourth clustered proposal, which scored lower but had a lower implementation complexity. This was an undoubted challenge, which the audio chairman resolved with time and patience. This was also the last achievement of Hans Mussmann, who left MPEG at the Paris meeting in May 1991. His place was taken over by Prof. Peter Noll of the University of Berlin. 

The result of the work was an audio coding standard that, unlike the corresponding video standard, was not monolithic, because there were three different “flavours”: the first – called Layer I – was based on subband coding and had low complexity but the lowest performance; the second – called Layer II – was again based on subband coding, with average complexity and good performance; and the third – called Layer III – was based on transform coding and provided the best performance, but at a considerably higher implementation cost – so much so that, at that time, many considered Layer III impractical for a mass-market product. Therefore there could be three different conforming implementations of the MPEG-1 Audio standard, one for each layer. The condition was imposed, however, that a conforming MPEG-1 Audio decoder of a “higher” layer had to be able to decode all the “lower” layers. 
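
Stated as code, the condition is a one-liner (an illustrative sketch, with the layers simply numbered 1 to 3):

    def must_decode(decoder_layer: int, stream_layer: int) -> bool:
        # A decoder conforming to a given layer must also decode every lower
        # layer: e.g. a Layer III decoder handles Layer I, II and III streams.
        return stream_layer <= decoder_layer

    assert must_decode(3, 1) and must_decode(2, 2) and not must_decode(1, 3)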

The verification tests carried out before releasing the MPEG-1 Audio standard showed that subjective transparency, defined as a rating of the encoded stereo signal greater than 4.6 on the 5-point CCIR quality scale as assessed by “golden ears”, was achieved at 384 kbit/s for Layer I, 256 kbit/s for Layer II and 192 kbit/s for Layer III. The promise to achieve “CD quality” at 256 kbit/s with compressed audio had been met and surpassed. Today, with continuous improvements in encoding (which is not part of the standard), even better results can be achieved.


MPEG-1 Development – Systems

The development of the Systems part of the standard was done using yet another methodology. The Systems group, a most diversified collection of engineers from multiple industries, after determining the requirements the Systems layer had to satisfy, decided that they did not need a CfP, because the requirements were so specific that they felt they could simply design the standard by themselves in a collaborative fashion. The initial impetus was provided by Juan Piñeda, then with Apple Computer, at the Porto meeting in July 1990, when he proposed the first packet-based multiplexer. Eventually Sandy MacInnis of IBM became the chairman of that group after Allen Simon’s resignation. 

One of the issues the group had to deal with was “byte alignment”, a typical requirement from the computer world that the telco world, because of its “serial” approach to bitstreams, did not value. This can be seen, e.g., in the H.261 bitstream, which is not byte aligned. Byte alignment was eventually supported in MPEG-1 Systems because the systems decoding of a byte-aligned 150 kbyte/s stream was already feasible with the CPUs of that time. In the process, the MPEG-1 Video syntax, too, was made byte aligned. 
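
A toy bit-writer shows why the computer people cared (an illustrative Python sketch; the header field and its length are made up, and the 32-bit pattern merely stands in for a start code): if the stream is padded to a byte boundary before every start code, software can locate start codes by scanning whole bytes instead of testing all eight bit offsets.

    class BitWriter:
        """Minimal MSB-first bit writer, used only to illustrate byte alignment."""
        def __init__(self):
            self.bits = []

        def write(self, value: int, nbits: int):
            self.bits += [(value >> i) & 1 for i in range(nbits - 1, -1, -1)]

        def byte_align(self):
            # Pad with zero bits up to the next byte boundary.
            while len(self.bits) % 8:
                self.bits.append(0)

        def tobytes(self) -> bytes:
            self.byte_align()
            return bytes(
                int("".join(map(str, self.bits[i:i + 8])), 2)
                for i in range(0, len(self.bits), 8)
            )

    w = BitWriter()
    w.write(0b101, 3)          # some header field of odd bit length
    w.byte_align()             # pad so the start code begins on a byte boundary
    w.write(0x000001B3, 32)    # an MPEG-style 32-bit start-code pattern (illustrative)
    buf = w.tobytes()
    assert buf.find(b"\x00\x00\x01") == 1   # found by plain byte scanning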

Another issue was the choice between constant and variable packet size. One could have thought that, because the main target of MPEG-1 was Digital Storage Media, where disc formats have a fixed block size, a fixed packet length should have been selected. Eventually, however, this did not happen, a consequence of the fact that the physical format of the disc is, in OSI terminology, a layer 2 issue, while packet multiplexing is a higher-layer issue that did not necessarily have to relate to the former. In conclusion, MPEG-1 Systems turned out to be a very robust and flexible specification, capable of supporting the transfer of tightly synchronised video and audio streams across an arbitrary error-free delivery system. 

The second MPEG London meeting in November 1992 put the seal on MPEG-1, with the approval of the first three parts of the standard: Systems, Video and Audio. Since then the standard has been very stable: in spite of its complexity, very few corrigenda were published after that date: two for Systems, three for Video and one for Audio. The MPEG-1 work did not stop at that meeting, though, because work on conformance and reference software continued well into 1994. 


Reference Software

One morning in July 1990, Arian Koster of KPN Research called me to make a suggestion: “What if MPEG developed a software implementation of the MPEG-1 standard?”. My immediate reaction was to ask what MPEG would gain from this. He said that various companies had already developed their own software implementations of the Video Simulation Model, because that was a necessary step for anybody wanting to take part in video Core Experiments, and that more software would also be developed for Audio and Systems. If everybody contributed just a small portion of their code, MPEG could have a complete software implementation of the standard, everybody in MPEG would be able to use it, and MPEG would get the benefit of many independent usages of the software. Frankly, at that time I did not see why anybody should give away part of their code, but it has always been my policy not to disallow something other people believed in just because I did not understand it. I never had to regret this policy, and certainly not in this case, which sowed the seeds of one of the major innovations in MPEG, as we will see in a moment. 

Slowly, the idea made an impact and, already at the first Santa Clara, CA meeting in September 1990, the Audio group – made up of some of the most contentious people in MPEG, but also some of the most open to novelties and the most structured in their implementations – proposed an ad hoc group on “Software Simulation – Audio”. With the contribution of many, in 1994 MPEG could release part 5 of MPEG-1: “Software simulation” (part 4 had already been assigned to “Conformance Testing”, of which we will say more later). The first four parts of MPEG-1 are normative, in the sense that if you want to make conforming products or generate bitstreams that are decodable (i.e. understandable and actionable by an independently implemented decoder), they must satisfy the requirements of the relevant parts of the standard. Part 5, instead, is a Technical Report (TR), i.e. a document produced for the general benefit of users of the standard, but with no normative value. It is, in ISO language, “informative”.

This is as much as MPEG could progress in those early times on “software implementation of a standard”. But this was just the beginning of a much bigger thing as we will see in later pages. 

 


Conformance

MPEG-1 is a great standard, but there is a potential problem in its practical adoption. Imagine I am a manufacturer and I choose to be in the business of making MPEG-1 encoders and decoders. I believe I have faithfully implemented all normative clauses in ISO/IEC 11172-1, -2 and -3 and I have checked that my decoder correctly decodes content generated with my encoder. Now, a customer of mine buys my MPEG-1 decoder and starts using it to decode content produced by an encoder manufactured by a competitor. Unexpectedly he encounters problems. My customer talks to my competitor and he is shown that content generated by his encoder is successfully decoded by his decoder. Who is right? Who is to blame? My competitor or myself or both? 

This problem is not new to ISO and IEC. The “Procedures for the Technical Work” prescribe that a standard must contain clauses that enable users of the standard to assess whether an implementation conforms to it. One could even say that a standard is useless if there are no procedures to assess conformity to it. It would be like issuing a law without courts to which one can have recourse to assess “conformity” of a specific action with the law. 

The conformance problem used to be less well known to ITU because, when telcos were a regulated business providing a “public service”, they performed the “conformity” tests themselves to make sure that terminals from different manufacturers would interoperate correctly on their networks, without exposing subscribers to the kind of incompatibilities I have just described. They used to put a “seal of approval” on conforming terminals. Telcos used to do this because it was part of their public service licence, but also because keeping subscribers happy and not letting them suffer from incompatibilities was “good for business”, an old wisdom that too many proponents of new business models seem to disregard or forget.

When the telecommunication business became deregulated, independent Accredited Testing Laboratories (ATL) were set up. For a fee ATLs issued certifications to products that had successfully passed the conformance test. But even ATLs are a byproduct of the traditional “public service” attitude of the telcos. 

In the IT and CE domains, which MPEG-1 ends up also sort of belonging to, the attitude has always been more “relaxed”. If you buy a mouse and you discover that it does not work on your computer, what do you do? If you are lucky the shop you bought it from will refund your money, if not you are stuck with a lemon. The same if you buy a component for your stereo. Sure, the consumer is protected, because if he is dissatisfied he can always take legal proceedings… The attitude of the IT and CE industry has always been one of either not claiming anything or, at most, of making “self-certification” of conformity. 

That, however, is something that may work well when there is a market with large companies producing mass-market products, possibly not terribly sophisticated ones, and where the product itself depends on a key technology licensed by a company that makes conformity of the implementation – to be verified by the licensing company – a condition for licensing. Licensors have an interest in making sure that licensees behave correctly because they are interested in the good name of the technology and, again, because that leads to satisfied customers and hence more revenues. This has regularly been the case with major CE products. 

Virtually none of these conditions apply to MPEG-1, and certainly not the last. There are multiple patent holders for the MPEG-1 standard, but none has the authority or interest to become the “godfather” and oversee the correct implementation of the standard. Therefore the approach adopted by MPEG has been to develop a comprehensive set of tools to help people make independent checks of an implementation for conformance. 

Part 4 of MPEG-1 “Conformance” provides

  1. guidelines on how to construct tests to verify bitstream and encoder conformance
  2. test suites of bitstreams that can be used to verify decoder conformance. 

Encoder conformance can be verified by checking that a sufficient number of bitstreams generated by the encoder under test are successfully decoded by the reference software decoder. Decoder conformance can be verified by bitstream test suites or by decoding bitstreams generated with the reference software.
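
In practice such checks boil down to something like the following sketch, in which everything – the command names, the file layout, the use of a zero exit code as the pass criterion – is hypothetical and merely stands in for whatever reference decoder and test suite are actually used:

    import subprocess
    from pathlib import Path

    # Hypothetical command names standing in for the Part 5 reference decoder
    # and for the encoder whose conformance is being checked.
    REFERENCE_DECODER  = ["ref_mpeg1_decode"]
    ENCODER_UNDER_TEST = ["my_mpeg1_encode"]

    def encoder_conforms(source_clips, workdir: Path) -> bool:
        # Encoder check: every bitstream produced by the encoder under test must
        # be decoded successfully by the reference software decoder.
        for clip in source_clips:
            bitstream = workdir / (clip.stem + ".mpg")
            subprocess.run(ENCODER_UNDER_TEST + [str(clip), str(bitstream)], check=True)
            if subprocess.run(REFERENCE_DECODER + [str(bitstream)]).returncode != 0:
                return False
        return True

    def decoder_conforms(decoder_cmd, test_suite_dir: Path) -> bool:
        # Decoder check: the decoder under test must decode the conformance test
        # suite (or bitstreams generated with the reference software) without error.
        for bitstream in sorted(test_suite_dir.glob("*.mpg")):
            if subprocess.run(decoder_cmd + [str(bitstream)]).returncode != 0:
                return False
        return True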