
List of MPEG standards

Std Pt Title
1 1 Systems
2 Video
3 Audio
4 Compliance testing
5 Software simulation
2 1 Systems
2 Video
3 Audio
4 Conformance testing
5 Software simulation
6 Extensions for DSM-CC
7 Advanced Audio Coding (AAC)
8 VOID
9 Extension for real time interface for systems decoders
10 Conformance extension – DSM-CC
11 IPMP on MPEG-2 Systems
4 1 Systems
2 Visual
3 Audio
4 Conformance testing
5 Reference software
6 Delivery Multimedia Integration Framework (DMIF)
7 Optimized reference software for coding of audio-visual objects
8 Carriage of ISO/IEC 14496 contents over IP networks
9 Reference hardware description
10 Advanced Video Coding
11 Scene description and application engine
12 ISO base media file format
13 Intellectual Property Management and Protection (IPMP) extensions
14 MP4 file format
15 Carriage of NAL unit structured video in the ISOBMFF
16 Animation Framework eXtension (AFX)
17 Streaming text format
18 Font compression and streaming
19 Synthesised texture stream
20 Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
21 MPEG-J Graphics Framework eXtensions (GFX)
22 Open Font Format
23 Symbolic Music Representation
24 Audio and systems interaction
25 3D Graphics Compression Model
26 Audio conformance
27 3D Graphics conformance
28 Composite font representation
29 Web video coding
30 Timed text and other visual overlays in ISO base media file format
31 Video coding for browsers
32 Reference software and conformance for file formats
33 Internet Video Coding
7 1 Systems
2 Description definition language
3 Visual
4 Audio
5 Multimedia description schemes
6 Reference software
7 Conformance testing
8 Extraction and use of MPEG-7 descriptions
9 Profiles and levels
10 Schema definition
11 MPEG-7 profile schemas
12 Query format
13 Compact descriptors for visual search
14 Reference software, conformance and usage guidelines for CDVS
15 Compact descriptors for video analysis
21 1 Vision, Technologies and Strategy
2 Digital Item Declaration
3 Digital Item Identification
4 Intellectual Property Management and Protection Components
5 Rights Expression Language
6 Rights Data Dictionary
7 Digital Item Adaptation
8 Reference Software
9 File Format
10 Digital Item Processing
11 Evaluation Methods for Persistent Association Technologies
12 Test Bed for MPEG-21 Resource Delivery
13 VOID
14 Conformance Testing
15 Event Reporting
16 Binary Format
17 Fragment Identification of MPEG Resources
18 Digital Item Streaming
19 Media Value Chain Ontology
20 Contract Expression Language
21 Media Contract Ontology
22 User Description
A 1 Purpose for multimedia application formats
2 MPEG music player application format
3 MPEG photo player application format
4 Musical slide show application format
5 Media streaming application format
6 Professional archival application format
7 Open access application format
8 Portable video application format
9 Digital Multimedia Broadcasting application format
10 Surveillance application format
11 Stereoscopic video application format
12 Interactive music application format
13 Augmented reality application format
14 VOID
15 Multimedia Preservation Application Format
16 Publish/Subscribe Application Format
17 Multisensorial Media Application Format
18 Media Linking Application Format
19 Common Media Application Format
20 Visual Identity Application Format
21 Visual Identity Management Application Format
22 Multi-Image Application Format
B 1 Binary MPEG format for XML
2 Fragment Request Units
3 XML IPMP messages
4 Codec configuration representation
5 Bitstream Syntax Description Language (BSDL)
6 VOID
7 Common encryption format for ISO base media file format files
8 VOID
9 Common Encryption for MPEG-2 Transport Streams
10 Carriage of Timed Metadata Metrics of Media in ISO Base Media File Format
11 Green metadata
12 Sample Variants
13 Media Orchestration
14 Partial File Format
C 1 Accuracy requirements for implementation of integer-output 8×8 inverse discrete cosine transform
2 Fixed-point 8×8 inverse discrete cosine transform and discrete cosine transform
3 Representation of auxiliary video streams and supplemental information
4 Media tool library
5 Reconfigurable media coding conformance and reference software
6 Tools for reconfigurable media coding implementations
D 1 MPEG Surround
2 Spatial Audio Object Coding (SAOC)
3 Unified speech and audio coding
4 Dynamic Range Control
E 1 Architecture
2 Multimedia application programming interface (API)
3 Component model
4 Resource and quality management
5 Component download
6 Fault management
7 System integrity management
8 Reference software
V 1 Architecture
2 Control information
3 Sensory information
4 Virtual world object characteristics
5 Data formats for interaction devices
6 Common types and tools
7 Conformance and reference software
M 1 Architecture
2 MPEG extensible middleware (MXM) API
3 Conformance and reference software
4 Elementary services
5 Service aggregation
U 1 Widgets
2 Additional gestures and multimodal interaction
3 Conformance and reference software
H 1 MPEG Media Transport (MMT)
2 High Efficiency Video Coding
3 3D Audio
4 MMT Reference Software
5 HEVC Reference Software
6 3D Audio Reference Software
7 MMT Conformance Testing
8 HEVC Conformance Testing
9 3D Audio Conformance Testing
10 MPEG Media Transport Forward Error Correction (FEC) codes
11 MPEG Composition Information
12 Image file format
13 MMT Implementation guidelines
14 Conversion and coding practices for HDR/WCG video
15 Signalling, backward compatibility and display adaptation for HDR/WCG video
DASH 1 Media presentation description and segment formats
2 Conformance and reference software
3 Implementation guidelines
4 Segment encryption and authentication
5 Server and Network Assisted DASH
6 DASH with Server Push and WebSockets
7 Delivery of CMAF content with DASH
I 1 Immersive Media Architectures
2 Omnidirectional MediA Format
3 Immersive Video Coding
4 Immersive Audio Coding
5 Point Cloud Compression
6 Immersive Media Metrics
7 Immersive Media Metadata
8 Network Based Media Processing
CICP 1 Systems
2 Video
3 Audio
G 1 Transport and Storage of Genomic Information
2 Genomic Information Representation
3 API for Genomic Information Representation
4 Reference Software
5 Conformance
IoMT 1 IoMT Architecture
2 IoMT Discovery and Communication API
3 IoMT Media Data Formats and API
Expl 1 Advance signalling of MPEG containers content
2 Digital representation of neural networks
3 Future Video Coding
4 Hybrid Natural/Synthetic Scene Container
5 Network Distributed Video Coding

An Introduction

All living beings communicate in some form and the living beings that are currently on top of the ladder – we, the humans – have the most advanced native form of communication: the word. Not content with it, they have invented and used a range of technologies that have made communication ever more effective:

  • Directly by them: drawing, painting, sculpture, playing music, writing
  • Through machines: printing, photography and cinematography
  • Through immaterial means: wired and wireless communication of text, audio and video
  • By recording: audio and video.

It has taken millennia to get the first and a few centuries to get the last three. Starting just a quarter of a century ago, however, the ability of humans to communicate has been greatly impacted by the combination of 3 digital technologies that have brought about the Digital Media Revolution:

  • Media – handling all information sources via the common “bit” unit;
  • Network – delivering information bits everywhere;
  • Device – processing information bits inexpensively.

The Digital Media Revolution shows no sign of abating and it is likely that we will continue riding the media bits for quite some time. Therefore I have decided to write these pages bound by the title “Riding The Media Bits” because most Digital Media Technologies have been – and more continue to be – spawned by the Moving Picture Experts Group (MPEG).


My goal is to provide the knowledge necessary to understand the nature of media, how digital media came about and how they are evolving, so that people can take part in the decisions on where the Digital Media Revolution is taking us or – as a minimum – understand the reasons for the decisions made by others.

Technologies are seen from the perspective of the author’s experience – Media – but I have also added Device and Network aspects where I found it appropriate to complete the picture.

The target reader of these pages is non-technical. The matters handled, however, typically involve sophisticated technologies and some knowledge of them will be required, if understanding is not to come out of thin air. I dare say, though, that technical readers can also benefit from being exposed to the breadth of issues treated in these pages.

In order not to scare away the readers of this first page, I guarantee that I have made all efforts to reduce the technical requirements to the minimum necessary. Non-technical readers are therefore advised to exercise a minimum of perseverance (often not very much), when they see themselves confronted with technical descriptions, if they want to reap the results promised. As a last resort, they may skip the chapter that is challenging them beyond their desire to understand.

There is one last thing I would like to state before taking the reader with me for a 30+ year ride on the media bits. You will find that personal pronouns are rigorously kept in masculine form. I know this is politically incorrect, but I do think that if a language forces people to use personal pronouns in a sentence, like English does, there should be one of two choices: either the language is changed to make the use of pronouns optional, as in Italian or Japanese, or the people who expect to see a constant use of “he or she”, “him or her”, “his or hers” etc. become less prudish. As neither of these options is within my reach, I will do as I said. After all I would rather look like a male chauvinist and use masculine pronouns, than be a male chauvinist but use politically correct expressions.

The only promise I can make is that I will use all personal pronouns in feminine form on the next occasion (if there will ever be one :-).

This page would not be complete if I did not acknowledge my English mentor – Philip Merrill. On his own initiative he has reviewed many of the original pages, providing countless invaluable suggestions. If the pages are more understandable – and readable – the credit goes to him. If they are not, the discredit goes only to me.

The roadmap of Riding The Media Bits.


A Guided Tour

You can ride the digital media bits in many ways, even create your own roadmap, but the table below suggests one that combines the sequence of events with a meaningful story. The suggested reading is organised in 25 chapters, each subdivided into a variable number of sections. The structure is mostly sequential in time but also tries to accommodate the evolution of technology.

1 Why, How And For What An introduction, a guided tour and a table of contents
2 The Early Communication A brief review of communication in the history of mankind, the role of Public Authorities, why communication by digital means was preferable and why we need compression of digital media
3 The Early Digital Communication How digital communication technologies were first developed and deployed, a brief history of computing, how bits were stored and transmitted and why telecom bits are somewhat different from computer bits. Finally how a fault line in my professional life led to the creation of MPEG
4 Media Get Digital A brief tale of the events that led to MPEG-1, the development of its 3 main technologies – Video, Audio and Systems, the role of reference software and conformance, a look inside MPEG-1 and what MPEG-1 has achieved.
5 Digital Media Get Better A brief history of television and why it was so important to go digital, the development of 3 main MPEG-2 technologies – Video, Audio and Systems, a look inside MPEG-2, what MPEG-2 has achieved and how a bold global initiative tried to accelerate the deployment of digital television.
6 Standards, ISO And MPEG Why standards are important, the role of patents, the MPEG way of developing standards, how an MPEG meeting unfolds and a sample of life in an international organisation like ISO
7 Works, Rights And Exploitation Why it is difficult to achieve recognition of the value of some intellectual works, how technology helps the distribution of those works, how rights are defined and how they can be protected
8 Computers And Internet The role of software, and particularly of the operating system, and how it is possible to remove applications’ dependency on it; how the current Graphical User Interface was developed, how computers achieved creation of pictures and sound, and how the internet came to pervade our lives
9 Digital Media Do More Before getting into the MPEG-4 story we have to recall how media came to meet computers, the development and the inside of the many MPEG-4 components, and what MPEG-4 has achieved
10 Software And Communication How different bytes are made out of the same bits, a short story of Open Source Software and the MPEG relationships with it, how patents and standards create new forms of communication, trying to make digital media standards without patents and the myth of real-time person-to-person audio-visual communication
11 Digital Media For Machines About adding descriptions to other data, the fascinating story of other internet technologies, the development and the inside of the MPEG-7 standard, how machines have begun to understand the world, what MPEG-7 has achieved and how we can make machines talk to other machines
12 More About Rights and Technologies The many ways technology changes rights and their enforcement, how content protection can be opened, facing a world that MP3 has changed forever, and enquiring why, if technology changes society, laws should not change
13 Frameworks For Digital Media The development of MPEG-21, looking inside it, the story of the Digital Media Project, looking inside its specification, and opening the way for a deployment
14 Putting Digital Media Together A brief description of the first batch of MPEG-A, the digital media integration standard, followed by 3 more Application Formats: Multimedia Preservation, Publish/Subscribe and Media Linking
15 More MPEG Triadic Technologies Why there was a need for more Systems, Video and Audio standards: a short overview of what they do, and a standard to describe decoders and to build repositories of media coding tools
16 Technologies For Virtual Spaces How MPEG-4 deals with 3D Graphics, how we can establish bridges between real and virtual worlds, how we can interact with digital media in a standard way and how an application format can help kickstart Augmented Reality
17 Systems And Services How MPEG has standardised parts of the inside of devices, how I developed a business using standard technologies and how MPEG standards can be used to build a better internet of the future
18 Coping with an unreliable internet Even though very little of what is called the internet guarantees anything, our society is based on it. MPEG has developed standards that decrease the impact of the internet’s unreliability on its media.
19 More System-wide Digital Media Standards Why we need MPEG-H, another integrated Systems-Video-Audio standard, how we can communicate over an unreliable internet, and looking inside MPEG-H for Systems, 2D and 3D Video, and Audio
20 Compression, the technology for the digital age Compression has been MPEG’s bread and butter and the propulsive force behind digital media. But why should compression only apply to media? There are other digital sources that benefit from compression.
21 The future of media – immersion So far the digital media experience has been largely based on extensions of early media technologies. Technology promises to provide virtual experiences that are indistinguishable from real ones.
22 Internet of Media Things Internet of Things is a catchword that describes the ability of machines (things) to communicate and process information. MPEG is developing standards for the case when things are media things.
23 Glimpses Of The Future About the – sometimes wild – ideas for future MPEG standards, the future of MPEG and the future of research
24 Acknowledgements Thanking the many people without whom we would be riding different media bits
25 Support Material A detailed list of acronyms, the hall of fame of those who served or are still serving as MPEG Chairs, and the complete list of all MPEG standards (so far)

 


A Roadmap

The table below provides the full list of chapters and sections. The content of each section is briefly introduced.

Table 1 – How to navigate the Riding The Media Bits pages

1 Why, How And For What
Introduction What has motivated the writing of these pages and what they hope to achieve
A guided tour A summary of each of the main areas described in these pages
A Roadmap A summary of each of the individual pages
2 The Early Communication
Communication Before Digital How the forms of communication and of the business of providing the means to communicate have evolved in analogue times
Communication And Public Authorities The role of public authorities and international organisations in communication
Digital Communication Is Good The steps that brought digital technologies within the reach of exploitation by the media industries
Compressed Digital Is Better The developments that led to the practical exploitation of digital technologies for the media
3 The Early Digital Communication
The First Digital Wailings The first sample applications of digital technologies to the media
Digital Technologies Come Of Age The first practical cases of exploitation of digital technologies for the media
Electronic Computers A succinct history of the hardware side of data processing
Carrying Bits Solving the problem of storing and transmitting bits on analogue carriers
Telecom Bits And Computer Bits Bits are bits are bits, but telecom bits are different from computer bits
A personal faultline A fault line in my professional life that led to the creation of MPEG
4 Media Get Digital
The 1st MPEG Project The events that led to the definition of the first MPEG project: MPEG-1
MPEG-1 Development-Video The development of MPEG-1: Video
MPEG-1 Development-Audio The development of MPEG-1: Audio
MPEG-1 Development-Systems The first time IT puts media together in a synchronised way
Reference Software Software and standards used to be in different worlds. How they first became two sides of the same coin
Conformance Why MPEG standards need conformance and how it can be assessed
Inside MPEG-1 An overview of the technical content of MPEG-1
The Achievements Of MPEG-1 How MPEG-1 has influenced and benefited the media industry
5 Digital Media Get Better
The Highs And Lows Of Television The importance of television, how it was deployed and how it (should have) developed
The digital television maze Why digital television is such a good idea and why using it is so difficult
MPEG-2 Development-Video The steps that led to the development of MPEG-2 Video
MPEG-2 Development-Audio The steps that led to the development of MPEG-2 Audio and AAC.
MPEG-2 Development-Systems The steps that led to the development of MPEG-2 Systems, DSM-CC and RTI.
Inside MPEG-2 An overview of the technical content of MPEG-2
The Impact Of MPEG-2 How MPEG-2 has influenced and benefited the media industry
Beyond MPEG-2 Digital Audio And Video Why there was a need for DAVIC, what it did and why it was wound up
6 Standards, ISO And MPEG 
The Need For Standards Standards are important but their role must be properly understood
Patents And Standards If standards require patented technology their use must obey some rules
The MPEG Way Of Standards Making The unique MPEG way to develop standards. 
An MPEG Meeting A virtual experience of how an MPEG meeting unfolds
Life In ISO A sample of life in an international organisation and how it affected the first phases of MPEG
7 Works, Rights And Exploitation 
Craft, Intellect And Art People agree to pay for the work of a blacksmith or an attorney, but consider it an option to pay for the performance of a singer or an actor
Fixating Works How technology used to help the distribution of literary and artistic works
Rights The rights to a hammer are obvious, those to a book less so, those to a bunch of bits are still waiting for a solution
Protecting Content The need to protect digital content and how it can be done
8 Computers And Internet
Computer Programming The role of software, and particularly operating systems, in IT
Operating System Abstraction Is there a way to remove applications’ dependency on the operating system?
Humans Interact With Machines Brief history of how we came to the current Graphical User Interface to enable interaction with computers
Computers Create Pictures And Sound Brief history of a complex business case of IT use in the media space: humans perceive pictures and sound created not by the real world but by computers as well
Internet The fascinating story of a technology and how it changed the media landscape
9 Digital Media Do More
Media Meet Computers And Digital Networks The story of a project integrating most of the different technologies we have talked about so far – and more
MPEG-4 Development How MPEG-4 developed to become _the_ multimedia standard
Inside MPEG-4 – Systems An overview of the MPEG-4 Systems layer
Inside MPEG-4 – Visual The story of how we tried to build videos from objects
Inside MPEG-4 – Audio MP3 suggested that everything for audio was done, but AAC shows that was not the case
Inside MPEG-4 – File Format The first encounter of MPEG with files (as opposed to streams)
Inside MPEG-4 – Font An overview of a multimedia content type of critical importance
Inside MPEG-4 – Advanced Video Coding MPEG continues pushing the limits of video compression ever farther
The Impact Of MPEG-4 How MPEG-4 has changed the media landscape
10 Software And Communication
Bits And Bytes Bytes are made of 8 bits, but chopping a bitstream in chunks of 8 bits does not necessarily make bytes
Open Source Software Writing software may be an art and some artists have pretty special ideas about the use of the “art” they create
MPEG and Open Source Software MPEG is a group operating in an industrial environment, but the software it develops uses principles similar to those of the Open Source Software community
The Communication Workflow The role of patents and standards in the creation of new forms of communication
Type 1 Video Coding Sometimes it helps to question the foundations of the way we operate
A Fuller Form Of Communication The myth of real-time communication with pictures in addition to audio
11 Digital Media For Machines
Tagging Information A key technology to add descriptions to other data
The World Wide Web The fascinating story of other internet technologies and how they changed our lives
MPEG-7 Development The MPEG standard to describe what a piece of content is or contains
Inside MPEG-7 An overview of the technical content of MPEG-7
Machines Begin To Understand The World Searching for information out of an image
The Impact Of MPEG-7 How MPEG-7 is beginning to change the way people access content
A World Of Peers If humans can talk to humans, why should machines not talk to machines (intelligently)?
12 More About Rights and Technologies
Technology Challenging Rights Learning from MP3: the many ways technology changes rights and their enforcement
Opening Content Protection Two relevant stories teaching that it does not help to preserve the value of content by protecting it if people cannot access it
The World After MP3 MP3 has changed the media world forever. People must stop playing the game their traditions have accustomed them to play.
Technology, Society and Law If Digital Media Technologies have wrought a revolution in society, why should the laws governing it not change? Can patches to the old laws be an adequate response?
13 Frameworks For Digital Media
MPEG-21 Development MPEG-21 contains the components of a global solution
Inside MPEG-21 The technologies that let users build reasonable digital media systems
The Digital Media Project A project to right any wrongs that users of technologies may have made
Inside The Interoperable DRM Platform A walkthrough of value chains enabled by the end-to-end Interoperable DRM Platform
Inside The Other DMP Phases The DMP mission is not over
Doing Something For My Country If MPEG and DMP provide the tools for rightful use of digital media, why should my country – or all countries – not benefit from them?
14 Putting Digital Media Together
The First Application Formats Standards for multimedia formats
Multimedia Preservation Application Format How can we cater to the long-term future of media
Publish/Subscribe Application Format The media business is about the meeting of supply and demand. A new standard capable of disrupting the status quo
Media Linking Application Format Linking the inside of a document to the inside of another document is done billions of times a day. Let’s do the same for media.
15 More MPEG Triadic Technologies
Generic MPEG Technologies Technology matures and the audio-visual system components have achieved independent lives
Generic MPEG Systems Standards Some words about a bunch of Systems standards
Generic MPEG Video Standards Some words about a bunch of Video standards
Generic MPEG Audio Standards Some words about a bunch of Audio standards
Reconfigurable Media Coding A standard to describe decoders and to build repositories of media coding tools
16 Technologies For Virtual Spaces
Inside MPEG-4 – Graphics Adding 3D Graphics to the media tool set
Interaction Between Real And Virtual Worlds Building bridges between real and virtual worlds
Technologies To Interact With Digital Media Interacting with media – but without knobs and switches
Augmented Reality Application Format It is possible to make standards for Augmented Reality, not just buzzwords
17 Systems And Services
Inside Digital Media Devices MPEG is about media but not necessarily only about media formats
Getting Things Done My Way Using, not just developing, standard technologies for a business
Technologies For The Internet Of The Future MPEG-21 and MPEG-M show a practical path to information-centric networks
18 Coping with an unreliable internet
19 More System-wide Digital Media Standards
Multimedia Standards For An Evolving Market It is time again to provide an integrated Systems-Video-Audio standard
Coping With An Unpredictable Internet DASH – to get the most out of the internet resource
Inside MPEG-H – Systems The need for new transport technologies to cope with a variety of application contexts, especially hybrid
Inside MPEG-H – 2D Video HEVC
Inside MPEG-H – 3D Video After three quarters of a century of flat television, is it time to add a 3rd dimension?
Inside MPEG-H – 3D Audio The need for new audio coding technologies to cope with a variety of application contexts
20 Compression, the technology for the digital age
21 The future of media – immersion
22 Internet of Media Things
23 Glimpses Of The Future
MPEG Explorations About the – sometimes wild – ideas for future MPEG standards
End Of the MPEG Ride? MPEG has played a major role in creating the new world of Digital Media Technologies. Does it still have a role to play?
The end of MPEG may be coming, soon? If not the end, then a very substantial resizing of MPEG
The Future Of Research Research is the basis of human progress and the life blood of MPEG. Are we sure research is in the hands of people who know what research is?
24 Acknowledgements
25 Support Material
Acronyms In a field where there are just too many acronyms, this page provides the meaning of those used in these pages.
MPEG Subgroups And Chairs The hall of fame of those who served or are still serving as MPEG Chairs
MPEG standards The complete list of all MPEG standards (so far)

 


Communication Before Digital

If one sets aside some minor downsides, such as famine, floods, droughts, attacks by other tribes or death by some incurable disease, life in the Neolithic age was not necessarily so bad. If you got a smart idea – say, how to capture a deer that had been seen around – you could call on your neighbour, convince him with the force of your arguments, and then possibly the two of you would set out to convince more people and go hunting. If you were on your deathbed you would call your family and leave your last words to them, so that you could die in the hope that those around you would forward your last will to your grandsons and great-grandsons.

With an increasingly sophisticated and geographically expanded society, communication had to keep up with new needs. Writing evolved from simplified forms of drawing and painting and enabled the recording of spoken words, but could also be used to send messages to individuals in remote places or times. Kings and emperors – but also republics – could even afford to set up standing networks of couriers to propagate their will to the remotest corners of their empires or territories within the constraints of the time it took to cover the distance with a series of horses. A manuscript could reach more people, but only if they happened to be all at the same place. If not, they could only read it at different times in a sequence. However, if a group of sufficiently learned people was hired, multiple copies of an original could be made and distributed to multiple places and multiple people at the same time. 

In those early times creation of copies and distribution of manuscripts was indeed time consuming and costly. Gutenberg’s invention of movable-type printing made reproduction and distribution of written works cheaper and faster, thereby achieving an incomparably – for those times – higher productivity. Use of the invention required skilled people – with quite different skills than those needed before for copying manuscripts. Spreading an idea, until that time a process that involved a large number of people spending their time traveling and talking to other people or copying and distributing manuscripts, became a much simpler undertaking, as demonstrated by the rapid spread of Protestantism across 16th-century Europe.

It took several centuries before the invention of the typewriter made it possible to compose a written text that did not carry with it the handwriting of the person who had typed the text. 

Daguerre’s photography was the first example of a technology enabling the automatic reproduction of the visual component of a natural and static scene without necessarily requiring a particular artistic ability on the part of the person taking a picture. On the other hand, at least in the early years, considerable technical ability was required to handle a camera, take a picture, and develop and print photographs. Photography also had the added advantage that multiple copies could be made from the same negative.

Similarly, Edison’s phonograph enabled the recording of the sound components of a portion of a physical space on a physical carrier, with the possibility to make multiple copies from the same master. An important difference between photographic and sound recording technologies of those times was that producing negatives and printing photographs required relatively inexpensive devices and materials, and was kind of within the reach of the man in the street, while recording and printing discs required costly professional equipment that could only be afforded by large organisations. 

Cinematography of the Lumière brothers permitted the capture not just of a snapshot of the visual component of the real world but of a series of snapshots close enough in time that the viewer’s brain could be convinced that they reproduced something that looked like real movement – if the series of snapshots was displayed in a sufficiently rapid succession using an appropriate device. The original motion pictures were later supplemented by sound to give a more complete reproduction of the real world, satisfying both the aural and visual senses.

The physical principles used by these technologies were mechanical for printing and sound recording, chemical and optical for photography and mechanical, chemical and optical for cinematography. 

In the wake of expensive line-of-sight communication systems such as those deployed by Napoleon for state use, Samuel Morse’s telegraph enabled any user to send messages in real-time to a physically separated point by exploiting the propagation of electromagnetic waves along wires, whose physical principles were barely understood at that time. The telegraph modulated an electric current with long and short signals separated by silence to enable the instantaneous transmission to a far end of a message expressed in Latin characters and, later, characters of other languages as well.

The facsimile device enabled the transmission to a far end of the information present on a piece of paper placed on a scanning device.

The teletypewriter enabled transmission of characters to a far end where an electromechanical printer would print the characters of the message.

Telephony extended the basic idea of telegraphy by sending an analogue electric signal coming from a microphone, a device that contained carbon and produced a current when sound waves impinged on it. Telephony was designed to enable real time two way communication between people.

The discovery that electromagnetic waves propagate in the air over long distances led to Marconi’s invention of wireless telegraphy first and sound broadcasting, called “radio” par excellence, later. What was done for sound, however, was later done also for light, where the equivalent of the microphone was a pick-up tube made sensitive to light by a special photosensitive layer that produces a current when hit by light. The equivalent of the loudspeaker was the Cathode Ray Tube (CRT), a tube with a nearly planar surface producing light at a given point of the screen at a particular time with an intensity proportional to the magnitude of the input electric signal. But unlike the time-dependent one-dimensional electric signal at the output of a microphone, the light intensity on the surface of the light-sensitive tube is a time-dependent two-dimensional signal. The mapping of such a signal into a time-dependent one-dimensional signal was achieved by reading the current generated by an electron beam scanning the tube in an agreed order (left-to-right and top-to-bottom) at a sufficiently high frequency.

The purpose of scanning is similar to the one achieved with cinematography, viz. to convince the brain that the sequence of rapidly changing illuminated spots created by the electron flying spot is a continuous motion. The electric signal is then transmitted to a CRT where an electron beam, moving synchronously with the original electron beam, generates time-wise the same intensity that had hit the pick-up tube. The television scanning process produces much higher frequencies than audio because of the need to represent a two-dimensional time-dependent signal. The highest frequency is proportional to the product of the number of times per second an image is scanned (frame frequency), the number of scanning lines (vertical frequency) and the number of transitions that the eye can discern on a scan line (horizontal frequency). 
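
For readers who like numbers, here is a back-of-the-envelope check of that product, written as a few lines of Python. The number of discernible transitions per line is an assumed, purely illustrative figure, and real systems also reserve part of each line and each frame for blanking, so only the order of magnitude matters.

    # Rough order-of-magnitude estimate of the highest frequency in an analogue
    # TV signal, using a 625-line, 25 frames/second system as the example.
    frame_frequency = 25          # images scanned per second
    lines_per_frame = 625         # scanning lines per image
    transitions_per_line = 830    # assumed number of light/dark transitions the eye can discern

    # Two transitions (one dark/light pair) correspond to one cycle of the
    # electric signal, so the highest frequency is half the product.
    highest_frequency_hz = frame_frequency * lines_per_frame * transitions_per_line / 2
    print(round(highest_frequency_hz / 1e6, 1), "MHz")   # about 6.5 MHz, against ~4 kHz for telephony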

At the time the first television systems were introduced, the state of technology suggested the trick of subjectively “doubling” the frame frequency and the scan lines, to reduce what would otherwise be a fastidious flicker effect. A frame is composed of two “fields”, where one field has the horizontal scan lines offset by half the vertical line spacing, with a process called interlacing. Originally CRTs were only capable of producing black-and-white light and therefore television could only produce monochrome pictures. Later, it became possible to manufacture pick-up tubes/CRTs with 3 types of sensor/phosphor, each capable of sensing/producing red, green and blue light, so as to pick up and generate colour pictures that looked more natural to the human eye. It was indeed experimentally proved – and physiological bases for this found – that any human-perceived colour can be expressed as a combination of 3 independent colours, such as Red, Green and Blue (RGB) used in television and Cyan, Magenta and Yellow, with the addition of blacK (CMYK), used in colour printing.
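
The following minimal sketch shows what interlacing amounts to once a frame is thought of as a list of scan lines; the number of lines used here is an arbitrary example value.

    # A frame as a list of scan lines (576 lines is an example value)
    frame = [f"line {n}" for n in range(576)]

    # Interlacing sends the frame as two fields, each containing every other line,
    # so the field rate, and hence the perceived flicker rate, is twice the frame rate.
    top_field = frame[0::2]       # lines 0, 2, 4, ...
    bottom_field = frame[1::2]    # lines 1, 3, 5, ... (offset by one line)

    assert len(top_field) == len(bottom_field) == len(frame) // 2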

Figure 1 – The human colour space

The transformation of the aural and visual information into electric signals made possible by the microphone and the television pick-up tube, along with the ability to magnetise tapes covered with magnetic material moving at an appropriate speed and then to read the corresponding information, facilitated the invention of systems to record audio and video information in real-time. 

In more recent times radio has been used to offer telephone service to people on the move. A number of antennae, deployed in an area with an appropriate spacing so as to create a set of “cells”, capture the signal emitted by mobile telephone sets and hand it over to the nearest antenna when a mobile handset changes cell.

Figure 1 – Mobile communication

The cellular system handles the transition to the next cell, so that users are not even aware that they are communicating through a different antenna.


Communication and Public Authorities

The most potent driver to the establishment of civilisation has been the sharing by the members of a community of an understanding that certain utterances are associated with certain objects and concepts, all the way up to some shared intellectual values. Civilisation is preserved and enhanced from generation to generation because the members of a community agree that there is a mapping between certain utterances and certain graphical signs, even though the precise meaning of the mapping may slowly change with time.

It helps to understand the process leading to the creation of writing by looking at Chinese characters: some are known to derive from a simplified drawing of an object while others, often representing concepts, contain a character indicating the category and a second character whose sound indicates the pronunciation.

Figure 1 – Some Chinese characters

Writing enables a human not only to communicate with his fellow humans in different parts of the globe but also to leave messages beyond his life span. A future generation will be able to revisit the experience of people who have departed possibly centuries or even millennia before, often with difficulty because of the mentioned drift of the mapping.

Since the earliest times Public Authorities have always had a keen interest in matters related to communication. In most civilisations the priesthood has, if not developed, certainly taken over the art of writing. In historical times it is possible to trace the political considerations behind the adoption of the Latin and Cyrillic alphabets in Central and Eastern Europe, the adoption of Chinese characters in Japan, Korea and Vietnam, the introduction of the hangul alphabet in Korea as a replacement of Chinese characters, the replacement of the Arabic alphabet with the Latin one in post World War I Turkey, the use of the Cyrillic alphabet in the former Soviet republics in Central Asia and its recent replacement with the Latin alphabet in Turkic-speaking former Soviet republics. Beyond writing, one can recall the policy of the daimyos in medieval Japan of fostering diversity of speech in their territories so as to spot intruders more easily, or the rule, still enforced in some multi-ethnic countries dominated by one particular ethnic group, that makes it a crime, sometimes even punished with the death penalty, to broadcast or even speak in a public place a language different from the “official” one.

While late 16th century Italy, with its lack of political unity (a “geographic expression” said Metternich, an Austrian foreign minister in the early 19th century), witnessed the spontaneous formation of the “Accademia della Crusca” with its self-assigned goal of preserving the Florentine language of Dante, in France the Académie française, established a few decades later by Cardinal Richelieu, is to this day the official body in charge of the definition of the French Language, whose use has been reaffirmed a few years ago by the Loi Toubon. Similarly, the German Bundestag approved a law that amended the way the German language should be written. From that time on, law-abiding Germans fond of Italian cuisine shall stop eating spaghetti and start eating Spagetti instead. In Japan the Ministry of Education publishes a list of Chinese characters (Touyou Kanji) that are taught in elementary schools and used in newspapers. These are all attempts at strengthening the ties of a community through the formal codification of verbal or written expressions. Unfortunately – or fortunately, as the case may be – sometimes the success of its implementation falls short of intentions. 

From early on, technology extended peoples’ communication capabilities in ways that Public Authorities did not necessarily welcome. Regulating the use of goose-quill pens was not easy to implement because of the large supply of “raw material”, so the goal of inhibiting the dissemination of “dangerous” ideas was effectively achieved by keeping people in ignorance. It was printing, with the greater ease it gave interested people to disseminate their views, which provided Public Authorities with their first technology-induced challenge. The Catholic Church wanted to retain control of orthodoxy and decided to introduce the “imprimatur”, a seal of “absence of disapproval” meaning that there was no opposition (“nihil obstat”) on the part of the Church to the printing of a particular book. It did not take long for civil authorities to emulate the Church, so much so that “freedom of the press” became one of the first demands made by the different revolutions that affected Europe in the late 18th and all of the 19th centuries, while the Americans got it earlier – but not at the very beginning of their independence – as the First Amendment to their Constitution.

The mail service is an example of Public Authorities proactively fostering communication, but mail is a communication system largely intended to be person-to-person. The mail service started in the UK in 1840 with the introduction of prepaid letters and developed quickly in all countries soon afterwards. All countries charged a uniform rate for all letters of a certain weight within their countries, regardless of the distance involved. The conflicting web of postal services and regulations linking the different countries was overcome by the General Postal Union (GPU), established in 1874 by a number of states, and renamed Universal Postal Union (UPU) in 1878, when the member states succeeded in defining a single postal territory where the principle of “freedom of transit” for letters applied and mail items could be exchanged using a single rate. This did not mean that restrictions were not applied, though. Censorship has now disappeared in most countries, but Public Authorities still retain the right, in special circumstances and subject to certain rules, to open letters.

The invention of telegraphy must have caused great concern to Public Authorities, because their citizens were suddenly given the technical means to instantly communicate with anybody, first within and later even outside of their country. But theirs was a brave reaction: after the first confused restrictions, when a telegram had to be transcribed, translated and handed over on paper at the frontier between two countries before being retransmitted over the telegraph network of the neighbouring country, Public Authorities of that time took the very forward-looking attitude of agreeing on a single “information representation” standard, i.e. a single code to represent a given character. It did take some time, but eventually they got there. All this was facilitated by the establishment in 1865 of the International Telegraph Convention, one of the first examples of sovereign states ceding part of their authority to a specific supranational organisation catering for common needs. In 1885, following the invention of the telephone and the subsequent expansion of telephony, the International Telegraph Convention began to draw up international rules for telephony as well.

In 1906, after the invention of radio, the first International Radiotelegraph Convention was signed. The International Telephone Consultative Committee (CCIF), set up in 1924, the International Telegraph Consultative Committee (CCIT), set up in 1925, and the International Radio Consultative Committee (CCIR), set up in 1927 were made responsible for drawing up international standards. In 1927, the Convention allocated frequency bands to the various radio services existing at the time (fixed, maritime and aeronautical mobile, broadcasting, amateur, and experimental). In 1934 the International Telegraph Convention of 1865 and the International Radiotelegraph Convention of 1906 were merged and became the International Telecommunication Union (ITU). In 1956, the CCIT and the CCIF were amalgamated to give rise to the International Telephone and Telegraph Consultative Committee (CCITT). Today the CCITT is called ITU-T and the CCIR is called ITU-R. We now take it for granted that we can make fixed-line telephone calls and listen to analogue radio everywhere in the world, but this is the result of decades of efforts by the people who have worked in these international committees to make this happen – a unique achievement if one thinks of the belligerent attitude of countries of those times. 

In the 1930s, the UK started a television broadcasting system, with 405 horizontal scanning lines and 25 interlaced frames/second. The USA did the same in 1942 but with a system that had 525 lines and 30 interlaced frames/second. After the end of World War II, and each at different times, the other European countries introduced their television systems – all with 625 horizontal scanning lines and 25 interlaced frames/second – and were followed by the UK, which had to manage a dual system (405 and 625 lines) for several decades. Maybe because the ravages of World War II were still so vivid in people’s minds, the same system was adopted over all of Europe, possibly the only example of such a large-scale agreement on both sides of the Iron Curtain.

A complex tug-of-war started when progress in pick-up tubes, displays and electronics in general made colour television possible. The United States extended their system and adopted a nation-wide colour television standard defined by, and named after, the National Television System Committee (NTSC). In doing so they had to change the original frame frequency of 30.00 Hz to 29.97 Hz (Americans call it trial and error, and they claim it works). Japan, South Korea and Taiwan, and most countries in the American continent that had already adopted the 525-line 30 Hz television standard, soon adopted NTSC.

A few years later Europe witnessed the competition between the German-originated system called Phase Alternating Line (PAL) and the French-originated system Séquentiel Couleur à Mémoire (SECAM) that spread across countries and continents, a fact reminiscent of battles of yore but, thank God, less bloody this time. PAL and SECAM extended their influence also to the American continent, the Monroe doctrine notwithstanding. Two of the three Guyanas use PAL, but French Guiana uses SECAM, Argentina chose PAL but with a different colour subcarrier, and Brazil decided to add its own indigenous version of PAL on top of the original American 525-line 30.00 Hz TV system.

This bifurcation – the first major split in international telecommunication standards – was justified because the Very High Frequency (VHF) radio band – around 100 MHz – and later the Ultra High Frequency (UHF) radio band – a few hundred MHz – did not allow propagation beyond line of sight. Unlike Short Wave (SW) and Medium Wave (MW) radio, which could propagate for thousands of kilometres, television could be made a strictly “national business” and managed accordingly – a blessing of God for the local Public Authorities. The CCIR became a place where countries would come and inform the body of their decisions and the CCIR, much as a Notary Public, dutifully recorded the information provided. The result is that ITU-R Recommendation 624 “Television systems” is a document with over thirty pages, full of tables where minuscule and large countries alike compete in footnotes stating that they reserve the right to adopt, say, a different frequency tolerance compared to the value adopted by other countries. This is clearly not because, all of a sudden, the Maxwell equations governing propagation of radio waves start behaving differently when a border is crossed, but because of a conscious policy decision driven by the desire to protect the national manufacturing industry or to keep foreign television programs out of the national market, or both. Interestingly, though, Frequency Modulation (FM) radio, which uses the same frequency band as television and has accordingly the same propagation constraints, is the same all over the world.

All colour television systems are based on the idea of filling in some “holes” in the spectrum of the monochrome TV signal – called Y signal – with the spectrum of two colour difference signals, U = B-Y and V = R-Y. From these 3 signals a receiver can recover the three RGB colour primaries and drive the flying spot with the right colour information to the corresponding phosphors.
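
Below is a minimal sketch of that decomposition and of its inversion. The luma weights are those of ITU-R BT.601, used here purely as an illustration; actual PAL, SECAM and NTSC systems also scale and modulate the colour-difference signals in ways omitted here.

    # Luminance/colour-difference decomposition and its inverse (illustrative only).
    def rgb_to_yuv(r, g, b):
        y = 0.299 * r + 0.587 * g + 0.114 * b   # monochrome-compatible Y signal
        u = b - y                                # first colour-difference signal
        v = r - y                                # second colour-difference signal
        return y, u, v

    def yuv_to_rgb(y, u, v):
        b = u + y
        r = v + y
        g = (y - 0.299 * r - 0.114 * b) / 0.587  # G recovered from Y, R and B
        return r, g, b

    y, u, v = rgb_to_yuv(0.8, 0.4, 0.1)
    print(yuv_to_rgb(y, u, v))   # approximately (0.8, 0.4, 0.1): the primaries are recovered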

Interestingly, Public Authorities had little concern for communication means other than telecommunication and broadcasting. For instance formats for tapes, cassettes and discs have consistently and independently been defined by private enterprises, as shown by the cases of the Compact Disc (CD) or the Vertical Helix Scan (VHS) format for video cassette recording, universally adopted after the market had issued its verdict between competing technologies. The International Electrotechnical Commission (IEC), in charge of international standards for all electrical, electronic, and related technologies, has a role similar to the one played by ITU-R for television systems.

The International Organisation for Standardisation (ISO) deals with such communication standards as photography, cinematography and Information Technology (IT). The ISO work on photography and cinematography has ensured that anybody could buy a camera anywhere in the world and a film anywhere else in the world – choosing among a small number of formats – and be sure that there is a film that fits in the camera. Examples of IT standards produced or ratified by ISO are character set codes, such as the 7-bit American Standard Code for Information Interchange (ASCII), also known as (aka) ISO/IEC 646, 8-bit Latin 1 (ISO/IEC 8859-1) and the 16-bit Unicode (part of ISO/IEC 10646).
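
As a small illustration of how these character codes relate to each other, here is what Python’s built-in codecs produce for a short sample string (the string itself is of course arbitrary):

    text = "Ride"
    print(text.encode("ascii"))      # b'Ride' - 7-bit ISO/IEC 646 code, one byte per character
    print(text.encode("latin-1"))    # b'Ride' - 8-bit ISO/IEC 8859-1 code, same bytes here
    print(text.encode("utf-16-be"))  # b'\x00R\x00i\x00d\x00e' - 16-bit units per character

    print("é".encode("latin-1"))     # b'\xe9' - code 233, available in Latin 1 but not in ASCII
    # "é".encode("ascii") would raise an error: the character has no 7-bit code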


Digital Communication Is Good

Both audio and video signals can be represented as waveforms. The number of waveforms corresponding to a signal is 1 for telephone, 2 for stereo music and 3 for colour television. While it is intuitively clear that an analogue waveform can be represented to any degree of accuracy by taking a sufficiently large number of samples of the waveform, it was the discovery of Harry Nyquist, a researcher at Bell Labs in the 1920s, formalised in the theorem bearing his name, that a signal with a finite bandwidth of B Hz can be perfectly reconstructed from its samples, if the number of samples per second taken on that signal is greater than 2B. Bell Labs used to be the research centre of the Bell System that included AT&T, the telephone company operating in most of the USA, and Western Electric, the manufacturing arm (actually it was the research branch of Western Electric’s engineering department that became the Bell Labs).

Figure 1 – Signal sampling and quantisation

In the mid 1940s it became possible to build electronic computers. These were machines with thousands of electronic tubes, designed to make any sort of calculation on numbers expressed in binary form, based on the sequence of operations described in a list of instructions called a “program”. For several decades the electronic computer was used in a growing number of fields: science, business, government, accounting, inventory etc., in spite of the guess of one IBM executive in the early days that “the world would need only 4 or 5 such machines”.

Researchers working on audio and video saw the possibility of eventually using electronic computers to process samples taken from waveforms. The Nyquist theorem establishes the conditions under which using samples is statistically equivalent to using the continuous waveforms, but samples are in general real numbers, while digital electronic computers can only operate on numbers expressed with a finite number of digits. Fortunately, another piece of research carried out at Bell Labs showed that, given a statistical distribution of samples, it is possible to calculate the minimum number of levels – called quantisation levels – that must be used to represent signal samples so that the power of the error generated by the imperfect digital representation can be kept below a given value.
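
A minimal sketch of that trade-off, assuming samples uniformly distributed between -1 and +1 and a uniform quantiser (the bit depths and the number of samples are arbitrary illustrative choices): every extra bit roughly quadruples the number of levels and lowers the error power by about 6 dB.

    import math, random

    def quantise(x, n_bits):
        # Uniform quantisation of a sample in [-1, 1) onto 2**n_bits levels
        levels = 2 ** n_bits
        step = 2.0 / levels
        index = min(int((x + 1.0) / step), levels - 1)
        return -1.0 + (index + 0.5) * step       # mid-point of the chosen interval

    random.seed(0)
    samples = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

    for n_bits in (8, 12, 16):
        error_power = sum((quantise(s, n_bits) - s) ** 2 for s in samples) / len(samples)
        print(n_bits, "bits:", round(10 * math.log10(error_power), 1), "dB")
    # roughly -53 dB, -77 dB and -101 dB: about 6 dB less error power per extra bit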

So far so good, but this was “just” the theory. The other, practical but no less important, obstacle was the cost and clumsiness of electronic computers of that time. Here another invention of the Bell Laboratories – the transistor – created the conditions for another later invention – the integrated circuit. This is at the basis of the unstoppable progress of computing devices, also known as Moore’s law, which allows making more powerful and smaller integrated devices, including computers, by reducing the size of circuit geometry on silicon (i.e. how close a “wire” of one circuit can be to a “wire” of another circuit).

It is nice to think that Nyquist’s research was funded by enlightened Bell Labs managers who foresaw that one day there could be digital devices capable of handling digitised telephone signals. Such devices would be used for two fundamental functions of the telecommunication business. The first is moving bits over a transmission link, the second is routing (switching) bits through the nodes of a network. Both these functions – transmission and switching – were already performed by the networks of that time but in analogue form. 

The motivations were less exciting but no less serious. The problem that plagued analogue telephony was its unreliability, because analogue equipment performance tends to drift with time, typically because electrical components deteriorate slowly with time. A priori, it is not particularly problematic to have a complex system like the telephone network subject to error – nothing is perfect in this world – the problem is when a large number of small random drifts add up in unpredictable ways and the performance of the system degrades below acceptable limits without it being possible to point the finger at a specific cause. More than by the grand design suggested above, the drive to digitisation was caused by the notion that, if signals were converted into ones and zeroes, one could create a network where devices either worked or did not work. If this was achieved, it would have been possible to put in place procedures that would make spotting the source of malfunctioning easier. Correcting the error would then easily follow: just change the faulty piece.

In the 1960s the CCITT made its first decision on the digitisation of telephone speech: a sampling frequency of 8 kHz and 8 bits/sample for a total bitrate of 64 kbit/s. Pulse Code Modulation (PCM) was the name given to the technology that digitised signals. But digitisation had an unpleasant by-product: conversion of a signal into digital form with sufficient accuracy creates so much information that transmission or storage requires a much larger bandwidth or capacity than the one required by the original analogue signal. This was not just an issue for telecommunication, but also for broadcasting and Consumer Electronics, all of which used analogue signals for transmission – be it on air or cable – or storage, using some capacity-limited device. For computers this was not an issue – just yet – because at that time audio and video were a data type that was still too remote from practical applications if in digital form.

The benefits of digitisation did not extend just to the telco industry. Broadcasting was also a possible beneficiary because the effect of distortions on analogue radio and television signals would be greatly reduced – or would become more manageable – by conversion to digital form. The problem was again the amount of information generated in the process. Digitisation of a stereo sound signal (two channels) by sampling at 48 kHz with 16 bits/sample, one form of the so-called AES/EBU interface developed by the Audio Engineering Society (AES) and the European Broadcasting Union (EBU), generates 1,536 kbit/s. Digitisation of television by sampling the luminance information at 13.5 MHz and each of the colour-difference signals at 6.75 MHz, all with 8 bits/sample (this subsampling can be done because the eye is less demanding on colour information accuracy), generates 216 Mbit/s. Dealing with such high bitrates required special equipment that could only be used in the studio. 

In the CE domain, digital made a strong inroad at the beginning of the 1980s when Philips and Sony on the one hand, and RCA on the other, began to put on the market the first equipment that carried bits with the meaning of musical audio to consumers’ homes in the form of a 12-cm optical disc. After a brief battle, the Compact Disc (CD) defined by Philips and Sony prevailed over RCA’s. The technical specification of the CD was based on the sampling and quantisation characteristics of stereo sound: 44.1 kHz sampling and 16 bits/sample, for a total bitrate of 1.41 Mbit/s. For the first time end-users could have studio-quality stereo sound in their homes at an increasingly affordable cost, with the same quality no matter how many times the CD was played – something that only digital technologies could make possible.
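
The bitrates quoted above follow directly from the sampling parameters. A minimal Python sketch of the arithmetic – assuming nothing beyond the figures given in the text – would be:

    # Digitisation bitrates implied by the sampling parameters quoted above.

    def pcm_bitrate(sampling_hz, bits_per_sample, channels=1):
        """Raw PCM bitrate in bit/s."""
        return sampling_hz * bits_per_sample * channels

    # AES/EBU stereo sound: 48 kHz, 16 bits/sample, 2 channels
    aes_ebu = pcm_bitrate(48_000, 16, 2)          # 1,536,000 bit/s = 1,536 kbit/s

    # Studio television: Y at 13.5 MHz, U and V at 6.75 MHz, 8 bits/sample
    tv = pcm_bitrate(13_500_000, 8) + 2 * pcm_bitrate(6_750_000, 8)   # 216 Mbit/s

    # Compact Disc: 44.1 kHz, 16 bits/sample, 2 channels
    cd = pcm_bitrate(44_100, 16, 2)               # 1,411,200 bit/s ≈ 1.41 Mbit/s

    print(aes_ebu, tv, cd)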


Compressed Digital Is Better

A new technology to store 650 MByte of data on a CD had been developed based on the same basic principles used by the telco industry for its optical fibres, but the way it had been implemented had brought the price down to levels that industry could only dream of. Storing that impressive amount of data on a 12-cm physical carrier was a done deal, but transmitting the same data over a telephone line would take a very long time, because the telephone network had been designed to carry telephone signals with a nominal bandwidth of just 4 kHz. 

So, the existing telephone network was unsuitable for the future digital world, but what should the evolutionary path towards a network handling digital signals be? Two schools of thought formed: the visionaries and the realists. The first school aimed at replacing the existing network with a brand new optical network, capable of carrying hundreds of Mbit/s – maybe more – per fibre. This was technically possible and had already been demonstrated in the laboratories, but the issue was bringing the cost of that technology down to acceptable levels, as the CE industry had done with the CD. The approach of the second school was based on more sophisticated considerations (which does not imply that the technology of optical fibres is not sophisticated): if the signals converted to digital form are indeed bandwidth-limited, as they have to be if the Nyquist theorem is to be applied, contiguous samples are unlikely to be radically different from one another. If the correlation between contiguous samples can be exploited, it should be possible to reduce the number of bits needed to represent the signals without affecting their quality. 

Both approaches required investments in technology, but of very different types. The former required investing in basic material technology, whose development costs had to be borne by suppliers and whose deployment costs had to be borne by the telcos (actually, because of the limited propensity of manufacturers to invest on their own, telcos would have had to fund that research as well). The latter required investing in algorithms to reduce the bitrate of digital Audio and Video (AV) signals and in devices capable of performing what was expected to be a very high number of computations per second. In the latter case it could be expected that the investment cost would be shared with other industries. Of course either strategy, if lucidly implemented, could have led the telco industry to world domination, given the growing and assured flow of revenues that the companies providing the telephone service enjoyed at that time. But you can hardly expect such a vision from an industry accustomed to cosseting Public Authorities to preserve its monopoly. These two schools of thought have dominated the strategy landscape of the telco industry, the former having the ear of the infrastructure portion of the telcos and the latter the ear of the service portion. 

The first practical application of the second school of thought was the facsimile. The archaic Group 1 and Group 2 analogue facsimile machines required 6 and 3 minutes, respectively, to transmit an A4 page. Group 3 is digital and designed to transmit an A4 page scanned with a density of 200 Dots Per Inch (DPI) with a standard number of 1728 samples/scanning line and a number of scanning lines of about 2,300 if the same scanning density is used vertically (most facsimile terminals, however, are set to operate at a vertical scanning density of ½ the horizontal scanning density).

Therefore, if 1 bit is used to represent the intensity (black or white) of a dot, about 4 Mbits are required to represent an A4 page. It would take some 400 seconds to transmit this amount of data using a 9.6 kbit/s modem (a typical value at the time Group 3 facsimile was introduced), but this time can be reduced by a factor of about 8 (the exact reduction factor depends on the specific content of the page) using a simple but effective scheme based on the use of: 

  1. “Run lengths” of black and white samples: these are transmitted instead of individual sample values. 
  2. Code words of different lengths to represent the run lengths: their statistical distribution is not uniform, so shorter code words are assigned to the more probable run lengths. This technique is called Variable Length Coding (VLC). 
  3. Different VLC tables for black and white run lengths: black and white samples on a piece of paper have different statistics (e.g. black run lengths are more likely to be shorter than white ones). 
  4. Information from the previous scan line: if there is a black dot on a line, it is more likely that there will be a black dot on the same vertical line one horizontal scanning line below than if there had been a white dot. 
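
A minimal Python sketch of the first two of these ideas – run-length extraction followed by variable-length codes – is the following; the run-length function is generic, while the tiny code table is purely illustrative and not the actual Group 3 table:

    # Illustrative run-length coding of a binary scan line (0 = white, 1 = black).
    # The code table below is a toy example, NOT the standardised Group 3 tables.

    def run_lengths(line):
        """Return (colour, length) pairs for one scan line."""
        runs, current, count = [], line[0], 1
        for pixel in line[1:]:
            if pixel == current:
                count += 1
            else:
                runs.append((current, count))
                current, count = pixel, 1
        runs.append((current, count))
        return runs

    # Toy variable-length codes: the more probable runs get the shorter codewords
    # (hypothetical codewords, for illustration only).
    TOY_VLC = {
        (0, 1): "111", (0, 2): "110", (0, 4): "10",
        (1, 1): "01",  (1, 2): "001", (1, 4): "0001",
    }

    line = [0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0]
    runs = run_lengths(line)
    print(runs)                                  # [(0, 4), (1, 2), (0, 2), (1, 1), (0, 4)]
    print("".join(TOY_VLC[r] for r in runs))     # 12 bits instead of the original 13 samples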

Group 3 facsimile was the first great example of successful use of digital technologies in the telco terminal equipment market. 

But let’s go back to speech, the telco industry’s bread and butter. In the early years, 64 kbit/s for a digitised telephone channel was a very high bitrate on the local access line, but not so much over long distances, where bandwidth, even without optical fibres, was already plentiful. Therefore, if speech was ever to reach the telephone subscriber in digital form, a lower bitrate had to be achieved.

The so-called Differential Pulse Code Modulation (DPCM) was widely considered in the 1970s and ’80s because the technique was simple and offered moderate compression. DPCM was based on the consideration that, since the signal has a limited bandwidth, consecutive samples will not be very different from one another, and sending the VLC of the difference between two samples will statistically require fewer bits. In practice, however, instead of subtracting the previous sample from the current sample, it is more effective to subtract a prediction computed from previously reconstructed samples, as shown in the figure below. Indeed, it is possible to make quite accurate predictions because speech has some well-identified statistical characteristics given by the structure of the human vocal apparatus. Taking into account the sensitivity of the human ear, it is convenient to quantise the difference finely if the difference value is small and more coarsely if the value is larger, before applying VLC coding. The decoder is very simple: the output is obtained by adding each decoded difference to the prediction obtained by passing the previously reconstructed samples through the predictor.


Figure 1 – DPCM encoder and decoder
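
To complement the figure, here is a minimal DPCM sketch in Python; the previous-sample predictor and the uniform quantiser are deliberately simplistic illustrations, not the non-uniform schemes described above:

    # First-order DPCM with a trivial "previous reconstructed sample" predictor
    # and a uniform quantiser of step q (illustrative choices only).

    def dpcm_encode(samples, q=4):
        indices, prediction = [], 0
        for x in samples:
            diff = x - prediction                 # prediction error
            index = round(diff / q)               # quantise the difference
            indices.append(index)
            prediction = prediction + index * q   # what the decoder will reconstruct
        return indices

    def dpcm_decode(indices, q=4):
        out, prediction = [], 0
        for index in indices:
            prediction = prediction + index * q   # running sum of decoded differences
            out.append(prediction)
        return out

    samples = [0, 3, 9, 14, 16, 15, 11, 6]
    print(dpcm_decode(dpcm_encode(samples)))      # tracks the input to within q/2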

DPCM was a candidate for video compression as well, but the compression ratio of about 8:3 deemed feasible with DPCM was generally considered inadequate for the high bitrate of digital video signals. Indeed, using DPCM one could hope to reduce the 216 Mbit/s of a digital television signal down to about 70 Mbit/s, a magic number for European broadcasters because it was about 2 times 34 Mbit/s, an element of the European digital transmission hierarchy. For some time this bitrate was fashionable in the European telecom domain because it combined the two strategic approaches: DPCM running at a clock speed of 13.5 MHz was a technology within reach of the early 1980s, and 70 Mbit/s was “sufficiently high” to still justify the deployment of optical fibres to subscribers. 

To picture the atmosphere of those years, let me recount the occasion when Basilio Catania, then Director General of CSELT and himself a passionate promoter of optical networks, opened a management meeting by stating that, because 210 Mbit/s of subscriber access was going to be feasible soon (he was being optimistic: even now, 35 years later, it is only becoming feasible strategically, if not economically), and because 3 television programs were what a normal family would need, video had to be coded at 70 Mbit/s. The response to my naïve question whether the solution just presented was the doctor’s prescription was that I was no longer invited to the follow-up meetings. Needless to say, the project of bringing 3 TV channels to subscribers went nowhere. 

Another method, used in music coding, subdivides the signal bandwidth into a number of subbands, each of which is quantised with more or less accuracy depending on the sensitivity of the ear to the particular frequency band. For obvious reasons this coding method is called Subband Coding (SBC).

Yet another method uses the properties of certain linear transformations. A block of N samples can be represented as a point in an N-dimensional space, and a linear transformation can be seen as a rotation of the axes. In principle each sample can take any value within the selected quantisation range, but because samples are correlated, the points will tend to cluster within a hyper-ellipsoid in the N-dimensional space. If the axes are rotated, i.e. an “orthogonal” linear transformation is applied, each block of samples is represented by different numbers, called “transform coefficients”. In general, the first coordinate will have a large variance, because it corresponds to the most elongated axis of the ellipsoid, while the subsequent coordinates will tend to have smaller and smaller variance.


Figure 2 – Axis rotation in linear transformation

If the higher-variance transform coefficients (the u axis in the figure) are represented with higher accuracy and the lower-variance coefficients with lower accuracy, or even discarded, a considerable bit saving can be achieved without affecting the sample values too much when the original block is reconstructed approximately, from the available information, by applying the inverse transformation. 

The other advantage of transform coding, besides compression, is the higher degree of control over the number of bits used, compared with DPCM. If the number of samples is N and one needs a variable bitrate scheme between, say, 2 and 3 bits/sample, one can flexibly assign the bit payload between 2N and 3N bits per block. 

A major shortcoming of this method is that, in the selected example, one needs about N×N multiplications and additions. The large number of operations is compensated by a rather simple add-multiply logic. “Fast algorithms” need a smaller number of multiplications (about N×log2(N)), but require a considerable amount of logic to drive the computations. An additional concern is the delay intrinsic in transform coding. Even with careful scheduling of the computations, this delay roughly corresponds to the time it takes to accumulate the block of samples. If the signal is, say, music sampled at 48 kHz, and N=1,024, the delay is about 20 ms: definitely not a desirable feature for real-time communication, of some concern in the case of real-time broadcasting and of virtually no concern for playback from a storage device. 
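
A minimal Python sketch of the idea: a naive N×N DCT (hence the N×N multiply-adds mentioned above), followed by discarding the low-variance coefficients of a deliberately band-limited block; the signal and the crude bit allocation are illustrative only:

    import math

    # Naive DCT-II of a block of N samples: about N x N multiply-adds, as noted above.
    def dct(block):
        N = len(block)
        return [sum(x * math.cos(math.pi * (n + 0.5) * k / N) for n, x in enumerate(block))
                for k in range(N)]

    def idct(coeffs):
        # Inverse of the (unnormalised) DCT-II above, i.e. a scaled DCT-III.
        N = len(coeffs)
        return [(coeffs[0] / 2 + sum(coeffs[k] * math.cos(math.pi * (n + 0.5) * k / N)
                                     for k in range(1, N))) * 2 / N
                for n in range(N)]

    # A band-limited block, deliberately built from two low-frequency cosines.
    block = [math.cos(math.pi * (n + 0.5) / 16) + 0.3 * math.cos(3 * math.pi * (n + 0.5) / 16)
             for n in range(16)]

    # Keep only the first four coefficients: the crudest possible bit allocation.
    kept = [c if k < 4 else 0.0 for k, c in enumerate(dct(block))]
    print(max(abs(a - b) for a, b in zip(block, idct(kept))))   # essentially zero here

    # Delay: the whole block must be available before it can be transformed.
    print(1024 / 48_000)   # ≈ 0.021 s for N = 1,024 at 48 kHz, the "about 20 ms" above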

The analysis above applies particularly to a one-dimensional signal like audio. Pictures, however, present a different challenge. In principle the same algorithms used for the one-dimensional (1D) audio signal could be applied to pictures. However, transform coding of long blocks of 1D picture samples (called pixels, from picture element) was not the way to go, because image signals have a correlation that dies out rather quickly, unlike audio signals, which are largely oscillatory in nature and whose frequency spectrum can therefore be analysed using a large number of samples. Applying linear transformations to 2D blocks of samples was very effective, but this required storing at least 8 or 16 scanning lines (the typical block size choices) of 720 samples each (the standard number of samples of a standard television signal) – a very costly requirement in the early times (second half of the 1970s), when the arrival of the 64 kbit Random Access Memory (RAM) chip after the 16 kbit RAM chip, although much delayed, was hailed as a great technology advancement (today we have 128 Gbyte RAM chips).

Eventually, the Discrete Cosine Transform (DCT) became the linear transformation of choice for both still and moving pictures. It is therefore a source of pride for me that my group at CSELT was one of the first to investigate the potential of linear transforms for picture coding, and probably the first to do so in Europe in a non-episodic fashion. In 1979 my group implemented one of the first real-time still-picture transmission systems that used the DCT and exploited the flexibility of transform coding by allowing the user to choose between transmitting more pictures per unit of time at lower quality or fewer pictures at higher quality.

Pictures offered another dimension compared to audio. Correlation within a picture (intra-frame) was important, but much more could be expected from exploiting inter-picture (inter-frame) correlation. One of the first algorithms considered was called Conditional Replenishment (CR) coding. In this system a frame memory contains the previously coded frame. The samples of the current frame are compared line by line, using an appropriate algorithm, with those of the preceding frame. Only the samples considered to be “sufficiently different” from the corresponding samples of the previously coded frame are compressed, using intra-frame DPCM, and placed in a transmission buffer. Depending on the degree of buffer fullness, the threshold of “change detection” can be raised or lowered to produce more or fewer bits. 
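
A minimal sketch of the Conditional Replenishment idea in Python; the change-detection threshold and its link to buffer fullness are only illustrative:

    # Conditional Replenishment, reduced to its essence: only samples that differ
    # "sufficiently" from the previously coded frame are re-transmitted.

    def conditional_replenishment(current, previous, threshold):
        """Return the (position, value) updates and the resulting reconstructed frame."""
        updates = [(i, x) for i, (x, p) in enumerate(zip(current, previous))
                   if abs(x - p) > threshold]
        reconstructed = list(previous)
        for i, x in updates:
            reconstructed[i] = x        # in a real codec the update itself is DPCM-coded
        return updates, reconstructed

    previous = [10, 10, 10, 10, 10, 10]
    current  = [10, 13, 25, 26, 10,  9]

    # Raising the threshold when the transmission buffer fills up produces fewer bits.
    for threshold in (2, 8):
        updates, _ = conditional_replenishment(current, previous, threshold)
        print(threshold, updates)       # three updates at threshold 2, two at threshold 8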

A more sophisticated algorithm is so-called Motion Compensation (MC) video coding. In its block-based implementation, the encoder looks for the best match between a given block of samples of the current frame and a block of samples of the preceding frame. For practical reasons the search is performed only within a window of reduced size. Then the differences between the given block and the “motion-compensated” block of samples are encoded, again using a linear transformation. From this explanation it is clear that inter-frame coding requires the storage of a full frame, i.e. 810 Kbyte for a digital television picture of 625 lines. When the cost of RAM decreased sufficiently, it became possible to promote digital video as a concrete proposition for video distribution. 
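
A minimal block-matching sketch in Python, reduced to one dimension for brevity; the block size, the search window and the sum-of-absolute-differences criterion are common choices used here purely for illustration:

    # Block matching for motion compensation, on 1D "frames" for brevity.

    def sad(a, b):
        """Sum of absolute differences between two equal-length blocks."""
        return sum(abs(x - y) for x, y in zip(a, b))

    def best_match(block, reference, start, search_range):
        """Find the displacement within ±search_range that minimises the SAD."""
        best_d, best_cost = 0, float("inf")
        for d in range(-search_range, search_range + 1):
            pos = start + d
            if 0 <= pos and pos + len(block) <= len(reference):
                cost = sad(block, reference[pos:pos + len(block)])
                if cost < best_cost:
                    best_d, best_cost = d, cost
        return best_d, best_cost

    reference = [0, 0, 5, 9, 7, 2, 0, 0, 0, 0]    # previous frame
    current   = [0, 0, 0, 0, 5, 9, 7, 2, 0, 0]    # the same pattern, shifted by two samples

    block = current[4:8]
    d, cost = best_match(block, reference, start=4, search_range=3)
    print(d, cost)   # displacement -2 and a zero residual: only the motion vector
                     # and the (transform-coded) residual need to be transmitted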

From the 1960s a considerable amount of research was carried out on compression of different types of signals: speech, facsimile, videoconference, music, television. The papers produced at conferences or in academic journals can be counted in the hundreds of thousands and the filed patents in the tens of thousands. Several conferences developed to cater to this ever-increasing compression coding research community. At the international level, the International Conference on Acoustics, Speech and Signal Processing (ICASSP), the Picture Coding Symposium (PCS) and the more recent International Conference on Image Processing (ICIP) can be mentioned. Many more conferences exist today on special topics or at the regional/national level. Several academic journals dealing with coding of audio and video also exist. In 1988 I started “Image Communication”, a journal of the European Signal Processing Association (EURASIP). During my tenure as Editor-in-Chief, which lasted until 1999, more than 1,000 papers were submitted for review to that journal alone.

I would like to add a few words about the role that CSELT and my lab in particular had in this space. Already in 1975 my group had received the funds to build a simulation system that had A/D and D/A converters for black and white and PAL television, a solid state memory of 1 Mbyte built with 4 kbit RAM chips, connected to a 16-bit minicomputer equipped with a Magnetic Tape Unit (MTU) to move the digital video data to and from the CSELT mainframe where simulation programs could be run.

In the late 1970s the Ampex digital video tape recorder promised to offer the possibility to store an hour of digital video. This turned out not to be possible because the input/output of that video tape recorder was still analogue. With the availability of the 16 kbit RAM chips we could build a new system called Digital Image Processing System (DIPS) that boasted a PDP 11/60 with 256 Kbyte RAM interfaced to the Ampex digital video tape recorder and a 16 Mbyte RAM for real-time digital video input/output.


Figure 3 – The DIPS Image Processing System

But things had started to move fast and my lab succeeded in building a new simulation facility called LISA by securing the latest hardware: a VAX 780 interfaced to a system made of two Ampex disk drives capable of real-time input/output of digital video according to ITU-R Recommendation 656 and, later, a D1 digital tape recorder.


Figure 4 – The LISA Image Processing System


The First Digital Wailings

ITU-T has promulgated a number of standards for digital transmission, starting from the foundational standard for the digital representation of telephone speech at 64 kbit/s. This cornerstone actually comes in two flavours, because the standard specifies two different methods to digitise speech: A-law and µ-law. Both use the same sampling frequency (8 kHz) and the same number of bits/sample (8) but two different non-linear quantisation characteristics to take into account the logarithmic sensitivity of the ear to audio intensity. Broadly speaking, µ-law is used in North America and Japan and A-law is used in the rest of the world. 
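
A minimal Python sketch of the two companding curves in their continuous form; the standard itself specifies segmented, piecewise-linear approximations of these curves, so this only shows the underlying idea:

    import math

    # Continuous companding characteristics behind the two PCM laws
    # (the standard uses segmented, piecewise-linear approximations of them).
    MU, A = 255.0, 87.6

    def mu_law(x):
        """µ-law compression of a sample normalised to [-1, 1]."""
        return math.copysign(math.log(1 + MU * abs(x)) / math.log(1 + MU), x)

    def a_law(x):
        """A-law compression of a sample normalised to [-1, 1]."""
        ax = abs(x)
        if ax < 1 / A:
            y = A * ax / (1 + math.log(A))
        else:
            y = (1 + math.log(A * ax)) / (1 + math.log(A))
        return math.copysign(y, x)

    # Small amplitudes are expanded and large ones compressed, so a uniform 8-bit
    # quantiser applied to the companded value gives finer steps where the ear needs them.
    for x in (0.01, 0.1, 0.5, 1.0):
        print(x, round(mu_law(x), 3), round(a_law(x), 3))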

While any telephone subscriber can technically communicate with any other telephone subscriber in the world, there are differences in how the communication is actually set up. If the two subscribers wishing to communicate belong to the same local switch, they are directly connected via that switch. If they belong to different switches, they are connected through a long-distance link where a number of telephone channels are “multiplexed” together. The size of the multiplexer depends on the likelihood that a subscriber belonging to switch A will want to connect to a subscriber belonging to switch B at the same time. This pattern is then repeated through a hierarchy of switches. 

When telephony was analogue, multiplexers were implemented using a Frequency Division Multiplexing (FDM) technique where each telephone channel was assigned a 4 kHz slice. The hierarchical architecture of the network did not change with the use of digital techniques. The difference was in the technique used: Time Division Multiplexing (TDM) instead of FDM. In TDM a given period of time is assigned to the transmission of the 8 bits of a sample of a telephone channel, followed by a sample of the next telephone channel, and so on. Having made different choices at the starting point, Europe and the USA (with Japan also asserting its difference with yet another multiplexing hierarchy) kept on doing so when selecting transmission multiplexers: the primary USA multiplexer carries 24 speech channels – also called Time Slots (TS) – plus 8 kbit/s of other data, for a total bitrate of 1,544 kbit/s. The primary European multiplexer carries 30 speech channels plus two non-speech channels, for a total bitrate of 2,048 kbit/s. 

While both forms of multiplexing do the job they are expected to do, the structure of the 1,544 kbit/s multiplex is a bit clumsy, with 8 kbit/s inserted in an ad hoc fashion into the 1,536 kbit/s of the 24 TSs. The structure of the 2,048 kbit/s multiplex is cleaner, because the zero-th TS (TS 0) carries synchronisation, the 16th TS (TS 16) is used for network signalling purposes and the remaining 30 TSs carry the speech channels. The American transmission hierarchy is 1.5/6/45 Mbit/s and the European hierarchy is 2/8/34/140 Mbit/s.
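
The two primary bitrates follow from the frame structures just described; a small sketch of the arithmetic, assuming only the figures given in the text plus the 8,000 frames per second implied by 8 kHz sampling:

    # Primary multiplex bitrates, derived from 8,000 frames/s and 8-bit time slots.
    FRAMES_PER_SECOND = 8_000
    BITS_PER_TS = 8

    # European primary multiplex: 32 time slots (30 speech channels plus TS 0 and TS 16).
    e1 = 32 * BITS_PER_TS * FRAMES_PER_SECOND            # 2,048,000 bit/s

    # American primary multiplex: 24 time slots plus 8 kbit/s of other data.
    t1 = 24 * BITS_PER_TS * FRAMES_PER_SECOND + 8_000    # 1,544,000 bit/s

    print(e1, t1)   # 2,048 kbit/s and 1,544 kbit/s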

The bifurcation was a deliberate decision of the European PTT Administrations, each having strong links with national manufacturers (at least two of them, as was the policy at that time, so as to retain a form of competition in procurement), who feared that, by adopting a worldwide standard for digital speech and multiplexing, their industries would succumb to the more advanced American manufacturing industry. These different communication standards notwithstanding, the ability of end-users to communicate was not affected, because speech was digitised only for the purpose of core network transmission, while end-user devices continued to receive and transmit analogue speech. 

The development of Group 3 facsimile was driven by the more enlightened approach of providing an End-To-End (E2E) interoperable solution. This is not surprising because the “service-oriented” portions of the telcos and related manufacturers, many of which were not typical telco manufacturers, were the driving force behind this standard that sought to open up new services. The system is still in use today, although with a downward trend, after some 40 years of existence, with hundreds of millions of devices sold. 

The dispersed world of television standards found its unity again – sort of – when the CCIR approved Recommendations 601 and 656 related to digital television. Almost unexpectedly, agreement was found on a universal sampling frequency of 13.5 MHz for luminance and 6.75 MHz for the colour-difference signals. Thanks to the single sampling frequency, both NTSC and PAL/SECAM can be represented by a bitstream with an integer number of samples per second, per frame and per line. The number of active samples is 720 for Y (luminance) lines and 360 for U and V (colour-difference signal) lines. This format is also called 4:2:2, where the three numbers represent the ratio of the Y:U:V sampling frequencies. This sort of reunification, however, was achieved only in the studio, where Recommendation 656 is mostly confined because 216 Mbit/s is a very high bitrate, even by today’s standards. Recommendations 601 and 656 are linked to Recommendation 657, the digital video recorder standard known as D1. While this format has not been very successful market-wise, it played a fundamental role in the picture coding community because D1 was the first device available on the market that enabled storage, exchange and display of video without the degradation introduced by analogue storage devices.

ITU-T also promulgated several standards for compressed speech and video. One of them is the 1.5/2 Mbit/s videoconference standard of the “H.100 series”, currently no longer in use, but nevertheless important because it was the first international standard for a digital audio-visual communication terminal. This transmission system was originally developed by a European collaborative project called COST 211 (COST stands for Collaboration Scientifique et Technique and 211 stands for area 2 – telecommunication, project no. 11) in which I represented Italy. 

It was a remarkable achievement because the project designed a complete terminal capable of transmitting audio, video, facsimile and other data, including end-to-end signalling. The video coding, very primitive if seen with today’s eyes, was a full implementation of a DPCM-based Conditional Replenishment scheme. My group at CSELT developed one of the four prototypes, the other three being those of British Telecom, Deutsche Telekom and France Telecom (these three companies had different names at that time, because they were government-run monopolies). The prototypes were tested for interoperability using 2 Mbit/s links over communication satellites, another first. The CSELT prototype was an impressive pair of 6U racks full of Medium Scale Integration (MSI) and Large Scale Integration (LSI) circuits that even contained a specially designed Very Large Scale Integration (VLSI) circuit for change detection, made with the “advanced” – for the late 1970s – 4 µm geometry! 

Videoconferencing, international by definition because it served the needs of business relations trying to replace long-distance travel with a less time-consuming alternative, was soon confronted with the need to deal with the incompatibilities of television standards and transmission rates that nationally-oriented policies had built over the years. Since COST 211 was a European project, the original development assumed that video was PAL and transmission rate was 2,048 kbit/s. The former assumption had an important impact on the design of the codec, starting from the size of the frame memory.

By the time the work was being completed, however, it was belatedly “realised” that the world had different television standards and different transmission systems. A solution was required, unless every videoconference room in the world was to be equipped with cameras and monitors of the same standard – PAL – clearly wishful thinking. Unification of video standards is something that had been achieved – for safety reasons – in the aerospace domain, where all television equipment is NTSC, but it was not something that would ever happen in the fragmented business world of telecommunication terminals, and certainly not in the USA, where even importing non-NTSC equipment was illegal at that time. 

A solution was found by first making a television “standard conversion” from NTSC to PAL in the coding equipment, then using the PAL-based digital compression system as originally developed by COST 211, and finally outputting the signal in PAL, if that was the standard used at the receiving end, or making one more standard conversion at the output, if the receiving end was NTSC. Extension to operation at 1.5 Mbit/s was easier to achieve because the transmission buffer just tended to fill up more quickly than with a 2 Mbit/s channel, while the internal logic remained unaltered. 

At the end of the project, the COST 211 group realised that there were better performing algorithms – linear transformation with motion compensation – than the relatively simple but underperforming DPCM-based Conditional Replenishment scheme used by COST 211. This was quite a gratification for a person who had worked on transform coding soon after being hired and had always shunned DPCM as an intraframe video compression technology without a future in mass communication. Instead of doing the work as a European project and then trying to “sell” the results to the international ITU-T environment – not an easy task, as had been discovered with H.100 – the decision was made to do the work in a similar competitive/collaborative environment as COST 211 but within the international arena as a CCITT “Specialists Group”. This was the beginning of the development work that eventually gave rise to the ITU-T Recommendation H.261 for video coding at px64 kbit/s (p=1,…, 30). An extension of COST 211, called COST 211 bis, continued to play a “coordination” role of European participation in the Specialists Group.

The first problem that the Chairman Sakae Okubo, then with NTT, had to resolve was again the difference in television standards. The group decided to adopt an extension of the COST 211 approach, i.e. conversion of the input video signal to a new “digital television standard” with the number of lines of PAL and the number of frames/second of NTSC – following the commendable principle of “burden sharing” between the two main television formats. The resulting signal, digitised and suitably subsampled to 288 lines of 352 samples at a frame rate of 29.97 Hz, would then undergo the digital compression process that was later specified by H.261. The number 352 was derived from 360, ½ the 720 pixels of digital television, but with a reduced number of pixels per line because it had to be divisible by 16, a number required by the block-coding algorithm of H.261. 
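
A small sketch of the arithmetic behind the CIF numbers, assuming only what is stated above plus the fact that 625-line television has 576 active lines:

    # Deriving the CIF picture size from the constraints stated above.
    block_size = 16                       # the block-coding algorithm works on 16x16 blocks

    samples_per_line = 720 // 2           # half the 720 samples of digital television = 360
    cif_width = (samples_per_line // block_size) * block_size   # 352, the largest multiple of 16
    cif_height = 576 // 2                 # 288, half the active lines of 625-line television
    frame_rate = 29.97                    # the NTSC frame rate

    print(cif_width, cif_width // block_size, cif_height // block_size)   # 352, 22 x 18 blocks
    print(cif_width * cif_height * 8 * frame_rate / 1e6)   # ≈ 24.3 Mbit/s of raw luminance alone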

I was part of that decision, made at the Turin meeting in 1984. At that time I had already become hyper-sensitive to the differences in television standards. I thought that the troubles created by the divisive approach followed by our forefathers had taught a clear enough lesson and that the world did not need yet another, no matter how well-intentioned, television standard, even if it was confined inside the digital machine that encoded and decoded video signals. I was left alone, and the committee went on to approve what was called the “Common Intermediate Format” (CIF). This regrettable decision set aside, H.261 remains the first example of truly international collaboration in the development of a technically very complex video compression standard. 


Figure 5 – The Common Intermediate Format

The “Okubo group”, as it was soon called, made a thorough investigation of the best video coding technologies and assembled a reasonably performing video coding system for bitrates, like 64 kbit/s, that were once considered unattainable (even though some prototypes based on proprietary solutions had already been shown before). The H.261 algorithm can be considered the progenitor of most video coding algorithms commonly in use today, even though the equipment built against this standard was not particularly successful in the marketplace. The reason is that the electronics required to perform the complex calculations made the terminals bulky and expensive. No one dared make the large investments needed to manufacture integrated circuits that would have reduced the size of the terminal and hence its cost (the rule of thumb used to be that the cost of electronics is proportional to its volume). The high cost made the videoconference terminal a device centrally administered in companies, thus discouraging impulse users. Moreover, the video quality of H.261 at about 100 kbit/s – the bitrate remaining when using 2 Integrated Services Digital Network (ISDN) time slots after subtracting the bitrate required by audio (when this was actually compressed) and other ancillary signals – was far from satisfactory for business users, and consumers had already been scared away by the price of the device. Lastly, there remains the still unanswered question: do people really wish or need to see the face of the person they are talking to over the telephone on a video screen?


Digital Technologies Come Of Age

Speech digitisation was driven by the need to manage long-distance transmission systems more effectively. But the result of this drive also affected end users because, thanks to digital technologies, the random level of speech quality that had plagued telephony since its early days began to have fewer consequences for communication. Later the CCITT adopted the Group 3 facsimile standard that offered considerable improvement to end users, businesses first and general consumers later. Then Philips and Sony on the one hand, and RCA on the other, began to put on the market the first equipment that carried bits with the meaning of music to consumers’ homes. 

As in the case of speech digitisation, digital technologies in the CE space were not primarily intended to offer end users something really new. The CD was just another carrier, more compact and lighter, to distribute records with a quality claimed to be indistinguishable from the studio’s. The opinion of some consumers, however, seems to indicate that CD quality is not necessarily the issue, because some claim that the sound of the Long Playing (LP) record is better. In other words, the drivers of both digital speech and digital music were the stability and the reduced manufacturing and maintenance costs offered by digital technology, not quality. This last feature was just a by-product. 

In the same year as the CD (1982), the CCITT, with the publication of its Recommendation H.100, enabled videoconferencing through the use of 1.5/2 Mbit/s primary multiplexers as carriers of compressed videoconference streams. Videoconferencing was not unknown at that time, because several telecommunication operators, and broadcasting companies as well, had run videoconference trials – all using analogue techniques – but there was hope that H.100 would eventually enable widespread use of a form of business communication that at that time was little more than a curiosity. This was followed in the mid 1980s by the beginning of the standardisation activity that would give rise to CCITT Recommendations H.261 (video coding at px64 kbit/s) and H.221 (media multiplex), together with other CCITT Recommendations for coding speech at bitrates less than or equal to 64 kbit/s, in some cases with a speech bandwidth wider than 4 kHz. These activities were synergistic with the huge CCITT standardisation project known as ISDN that aimed to bring 144 kbit/s to subscribers using existing telephone lines. 

In the mid 1980s several CE laboratories were studying methods to digitally encode audio-visual signals for the purpose of recording them on magnetic tape. One example was the European Digital Video Recorder (DVS) project, originally a very secretive project that people expected would provide a higher-quality alternative to the analogue VHS or Betamax videocassette recorder, much as the CD was a higher-quality alternative to the LP record. Still in the area of recording, but for a radically new type of application – interactive video on compact disc – Philips and RCA were studying methods to encode video signals at bitrates of about 1.4 Mbit/s, fitting within the output bitrate of their CDs. 

Laboratories of broadcasting companies and related industries were also active in the field of audio and video coding for broadcasting purposes. The Commission Mixte pour les Transmissions Télévisuelles et Sonores (CMTT), a special group of the ITU dealing with issues of transmission of radio and television programs on telecommunication networks and now folded into ITU-T as Study Group 9, had started working on transmission of compressed digital television for “primary contribution” (i.e. transmission between studios). At the end of the 1980s RAI and Telettra, the latter an Italian manufacturer of telecommunication equipment, had developed an HDTV codec for satellite broadcasting that was used for very impressive demonstrations during the Soccer World Cup hosted by Italy in 1990. Slightly later, General Instrument (GI) showed its Digicipher II system for terrestrial HDTV broadcasting in the very narrow 6 MHz bandwidth used by American terrestrial television. 

This short list of examples shows how, at the end of the 1980s, the telco, CE and broadcasting industries had already embarked on implementations, some at the research level and some of industrial value, that were based on digital technologies and provided products and services to end users with the intention of consolidating or extending their own positioning in the communication and media businesses. People who fancy the “convergence” idea have probably noticed that the computer industry is missing from the list of canonical “converging” industries. The reason is that, even if the computer industry had been the first to make massive use of data processing techniques, in the second half of the 1980s the computing machines within reach of end users – mostly Macintoshes and IBM-compatible Personal Computers (PC) in the office, and Atari, Commodore 64 and Amiga machines at home – still needed at least one order of magnitude more processing power to be able to provide their users with natural moving video of acceptable size and natural sound of acceptable quality. In January 1988, the IBM representatives at the JPEG meeting proudly showed how an IBM AT could decode in real time a DCT-encoded still picture at the bitrate of 64 kbit/s. 

This snapshot describes what technology could achieve in the 1980s, but says nothing about the mindsets of the people who had masterminded those developments. Beyond the superficial commonality of technological solutions, there were – and, to a considerable extent, still exist today – fundamental differences of traditions, strategies and regulatory concerns among the different industries and, within each of these industries, across the different countries or regions of the world. 

The telco industry placed great value on the existence of standard solutions, but was typically weak in end-user equipment. Terminals were forced, on the one hand, to adhere to standards if they were intended to be connected to the “public network” and, on the other, had to be left to the goodwill of the manufacturing industry, because terminal equipment was outside the telcos’ purview. Manufacturers, however, had scarce inclination to invest in new terminal equipment because they were accustomed to receiving guaranteed orders from operators at prices that could hardly be described as the result of fierce competition. The interest in digital solutions was linked to the digitisation prospects of the telephone network: basic-access ISDN (144 kbit/s) for the general user and at most primary-access ISDN (1.5 or 2 Mbit/s) for the professional user, but there continued to be an underground clash between those who were driven by the need to foster the evolution of the network – here and now – and those who assumed that the world would stay unchanged and cherished the dream of bringing optical fibres with virtually unlimited bandwidth to end users some time in the future. 

The CE industry did not feel particularly constrained by the existence of standards, as shown by the adoption of 44.1 kHz as the sampling frequency of compact disc audio, selected because analogue video recorders provided an easy means to store digitally encoded audio in the early phases of development. That industry, however, had the culture and financial muscle to design and develop user equipment for mass-market deployment, often requiring sophisticated integrated circuits, when market needs so dictated. The weak point of that industry showed when functionally similar but technically incompatible equipment from several manufacturers appeared almost simultaneously on the market. Just the names V2000, Betamax and VHS – the three formats of home video cassette recorders – and the battles that raged around them should suffice to explain the concept. 

Even more complex was the attitude of the broadcasting industry. This industry was rigidly regulated in Europe and Japan, and less visibly, but equally if not more rigidly, regulated in the USA. In Europe the Commission of the European Communities (CEC) had laid down a policy of evolution of television through Multiplexed Analogue Components (MAC) towards the European version of HDTV transmission called HD-MAC, both via satellite. In the USA and Japan the policy was one of evolution from analogue NTSC to analogue HDTV. In Japan the introduction of HDTV was expected to happen via satellite, while in the USA it was expected to happen as an evolution of the terrestrial network.