
MPEG-4 Inside – Audio

MPEG-4 Audio provides complete coverage of the bitrate range from 2 to 64 kbit/s. Good-quality coded speech is obtained already at 2 kbit/s, and transparent quality for monophonic music sampled at 48 kHz with 16 bits/sample is obtained at 64 kbit/s. Three classes of algorithms are used in the standard: the first covers the low bitrate range and has been designed to encode speech; the second can be used in the midrange to encode both speech and music; the third covers the high bitrate range and can be used for any audio signal.

MPEG-4 Audio contains a large set of coding tools through which it is possible to construct several audio and speech coding algorithms:

  • MPEG-4 AAC, an extension of MPEG-2 AAC
  • TwinVQ (Transform-domain Weighted Interleave Vector Quantisation)
  • Speech coding based on Code Excited Linear Predictive (CELP) coding and on Parametric representation
  • Various usages of the Spectral Band Replication (SBR) technology to provide high quality music at ever lower bitrates, such as High Efficiency AAC (HE-AAC)
  • Various forms of audio lossless coding.

MPEG-4 AAC is MPEG-2 AAC with the addition of one tool: Perceptual Noise Substitution (PNS). This tool identifies segments of spectral coefficients that appear to be noise-like and codes them as random noise, simply by signalling that PNS is used and the value of the average noise power. A decoder uses a pseudo-random noise generator, weighted by the signalled power value, to reconstruct those coefficients.


Figure 1 – The MPEG-4 AAC encoder
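As an illustration of how a decoder might fill a PNS-coded band, here is a minimal sketch in Python; the band layout, the way the average power is signalled and the choice of noise generator are simplified assumptions, not the normative MPEG-4 AAC procedure.

    import random

    def decode_pns_band(num_coeffs, signalled_power, seed=None):
        # Fill one scale-factor band with pseudo-random noise whose average
        # power matches the value signalled in the bitstream (simplified
        # sketch, not the normative MPEG-4 AAC procedure).
        rng = random.Random(seed)
        noise = [rng.uniform(-1.0, 1.0) for _ in range(num_coeffs)]
        actual_power = sum(x * x for x in noise) / num_coeffs
        gain = (signalled_power / actual_power) ** 0.5 if actual_power > 0 else 0.0
        return [gain * x for x in noise]

    # Example: a 16-coefficient band signalled as noise with average power 0.25
    band = decode_pns_band(16, 0.25, seed=42)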

The addition of Spectral Band Replication (SBR) to MPEG-4 AAC has provided significant performance improvements. SBR consists of replicating the highband, i.e. the high-frequency part of the spectrum, from the decoded lowband. A small amount of data representing a parametric description of the highband is encoded and used in the decoding process. This data rate is far below the rate required to code the highband with conventional AAC.


Figure 2 – The SBR operating principle
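The principle in Figure 2 can be sketched as follows; the band split, the patching rule and the envelope data are illustrative assumptions, since the actual SBR algorithm operates on a QMF subband representation with adaptive patching.

    def sbr_reconstruct_highband(lowband, envelope_gains):
        # Toy illustration of the SBR principle: replicate the decoded lowband
        # spectrum into the highband and shape it with a small set of
        # transmitted envelope gains.
        highband = list(lowband)                      # replicate the lowband
        bands = len(envelope_gains)
        band_size = max(1, len(highband) // bands)
        shaped = []
        for i, coeff in enumerate(highband):
            gain = envelope_gains[min(i // band_size, bands - 1)]
            shaped.append(gain * coeff)               # apply the parametric envelope
        return shaped

    # Decoded lowband magnitudes plus a coarse highband envelope from the bitstream
    low = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.25, 0.2]
    high = sbr_reconstruct_highband(low, envelope_gains=[0.5, 0.2])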

A timeline of the development of the MPEG Audio standards is given in the figure below


Figure 3 – MPEG Audio development timeline

The MPEG-4 AAC performance has been tested against a number of alternatives at different bitrates. The results are given in Figure 4.


Figure 4 – MPEG-4 AAC Verification Tests


MPEG-4 Inside – File Format

The MPEG File Format has been designed to satisfy a set of requirements, some of which are listed below

  • Binary assets
  • Hierarchical structure
  • Backward- & forward-compatible
  • Suitable to hold timed content
  • Suitable to exchange content
  • Self-contained (may contain only the data to be exchanged, or all of it)
  • Content can extend over more than one file
  • Suitable for editing
  • Suitable for streaming
  • Can play back a local file
  • Can download and play the file
  • Object oriented
  • Separated content and metadata

The figure below depicts the MP4 File hierarchy in a specific instance

Figure 1 – An example of the MP4 File hierarchy

The meanings of some of the boxes are given below (a minimal box-walking sketch follows the list)

  • ftyp (File Type): File type, version
  • mdat (Media Data): Holds media data (several, non contiguous possible)
  • moov (Movie): Holds metadata of a presentation
  • mvhd (Movie Header): General info about the movie
  • trak (Track): Holds metadata related to one stream
  • hdlr (Handler): Stream type
  • dinf/dref (Data Information/Data Reference): Data location (this file or a remote file)
  • stbl (Sample Table): Holds metadata related to samples, sample by sample
  • stsd (Sample Description): decoder configuration for the elementary stream
  • stts (Decoding Time to Sample): DTS for each sample
  • stsz (Sample Size): Size of each sample
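
The nesting above can be explored with very little code, because every box starts with the same header. The sketch below only reads the 32-bit size / 4-character type headers of the top-level boxes (plus the 64-bit "largesize" escape) and makes no attempt to interpret the payloads; the file name is a placeholder.

    import struct

    def walk_boxes(path):
        # List the top-level boxes of an ISO Base Media / MP4 file.  Each box
        # starts with a 32-bit size and a 4-character type; size == 1 means a
        # 64-bit size follows, size == 0 means the box extends to end of file.
        with open(path, "rb") as f:
            while True:
                header = f.read(8)
                if len(header) < 8:
                    break
                size, box_type = struct.unpack(">I4s", header)
                header_len = 8
                if size == 1:                         # 64-bit "largesize"
                    size = struct.unpack(">Q", f.read(8))[0]
                    header_len = 16
                print(box_type.decode("ascii", "replace"), size)
                if size == 0:                         # box runs to the end of the file
                    break
                f.seek(size - header_len, 1)          # skip the payload

    # walk_boxes("example.mp4")  # typically prints ftyp, moov, mdat, ...

Descending into container boxes such as moov or trak works in the same way, since they simply hold further boxes.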

MPEG-4 Inside – Advanced Video Coding

During the development of MPEG-4, several liaison statements were sent to ITU-T suggesting that the two groups work together on the new MPEG-4 Visual standard and even on the MPEG-4 Audio standard, specifically on the speech coding part. As no responses were received to these offers, MPEG continued the development of MPEG-4 alone.

For several years the Video Coding Experts Group (VCEG) of Study Group 16 of ITU-T worked from the ground up on the development of new video compression technologies and achieved a breakthrough in compression performance around the turn of the century. In spite of the lack of official answers to our liaison statements from ITU-T, I decided that it would be in the industry interest to establish a working relationship. When Thomas Sikora left MPEG I appointed Gary Sullivan, VCEG rapporteur, as MPEG Video chair to achieve a belated convergence of ITU-T and MPEG efforts in the video coding area.

At the July 2001 meeting MPEG reviewed the results of video compression viewing tests designed to assess whether there was evidence for advances in video coding technology that warranted the start of a new video coding project. With the positive result of the review, a Call for Proposals was issued and in December a Joint Video Team (JVT) composed of MPEG and VCEG members was established. The objective of the JVT was similar to the one that had been established for MPEG-2. The two main differences were that the JVT would only work on video compression and that the ISO/IEC standard and the ITU recommendation would only be “technically aligned” and not “common text”.

With an intense schedule of meetings, the JVT managed to achieve the Final Draft International Standard stage of the new Advanced Video Coding (AVC) standard in March 2003. So AVC became part 10 of MPEG-4.

The AVC standard specifies

  1. A video coding layer (VCL) for efficient representation of the video content
  2. A network abstraction layer (NAL) to format the VCL representation and provide header information (Fig. 1).

The most important control data are listed below; a sketch of how a decoder tells these NAL units apart from VCL data is given after Figure 1.

  1. The sequence parameter set (SPS) that applies to an entire series of coded pictures
  2. The picture parameter set (PPS) that applies to one or more individual pictures within such a series.


Figure 1 – The AVC layers
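As a small illustration of the NAL concept, the sketch below classifies a NAL unit by reading its one-byte header; the byte layout (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type) is that of AVC, while the splitting of the byte stream into NAL units is assumed to have been done already.

    NAL_TYPES = {
        1: "coded slice (non-IDR)",   # VCL
        5: "coded slice (IDR)",       # VCL
        6: "SEI",
        7: "sequence parameter set",  # SPS
        8: "picture parameter set",   # PPS
    }

    def classify_nal_unit(nal_unit: bytes) -> str:
        # The one-byte AVC NAL unit header:
        # forbidden_zero_bit (1 bit) | nal_ref_idc (2 bits) | nal_unit_type (5 bits)
        header = nal_unit[0]
        nal_ref_idc = (header >> 5) & 0x3
        nal_unit_type = header & 0x1F
        kind = NAL_TYPES.get(nal_unit_type, f"type {nal_unit_type}")
        return f"{kind} (nal_ref_idc={nal_ref_idc})"

    # Example: 0x67 = 0_11_00111 -> an SPS carried in a reference NAL unit
    print(classify_nal_unit(bytes([0x67])))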

AVC is based on the so-called “block-based hybrid video coding” approach, where a coded video sequence consists of an independently-coded sequence of coded pictures. Its main design elements are

  1. The VCL data for each picture is encoded with a reference to its PPS header data, and the PPS data contains a reference to the SPS header data
  2. A more sophisticated intra-picture prediction exploiting dependencies between spatially-neighbouring blocks within the same picture, in addition to motion-compensated inter-picture prediction
  3. Spatial block transform coding exploiting the remaining spatial statistical dependencies within the prediction residual signal for a block region (both for inter- and intra-picture prediction)
  4. Quarter-sample accuracy and high-quality interpolation filters for motion compensation of the luma component blocks
  5. Transformation (integer approximation of the DCT to avoid drift between encoder and decoder picture representations) is defined for block sizes of 4×4 or 8×8 (a sketch of the 4×4 case is given after this list)
  6. Basic building blocks of the encoding process are macroblocks of size 16×16 down to 4×4, e.g., for motion compensation (MC) and linear transformation
  7. Adaptive de-blocking filter in the prediction loop
  8. References for prediction of any macroblock from one of up to F previously decoded pictures
  9. A picture may be split into one or several slices, i.e. sequences of macroblocks which are typically processed in the order of a raster scan
  10. Slice-wise definition of B-type, P-type, and I-type pictures
  11. Two different entropy coding mechanisms
    1. Context-Adaptive VLC (CAVLC)
    2. Context-Adaptive Binary Arithmetic Coding (CABAC)
  12. The hypothetical reference decoder (HRD) specifies how/when bits are fed to a decoder and how decoded pictures are removed from a decoder
  13. Supplemental Enhancement Information (SEI) is made available to a decoder in addition to video data
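
As an illustration of point 5 above, here is a minimal sketch of the 4×4 integer core transform; the post-scaling that AVC folds into the quantisation step is deliberately left out, so this is not the complete normative process.

    # The 4x4 integer core transform matrix used by AVC; the scaling normally
    # folded into quantisation is omitted.  Y = Cf . X . Cf^T is computed in
    # integer arithmetic, so encoder and decoder cannot drift apart.
    Cf = [
        [1,  1,  1,  1],
        [2,  1, -1, -2],
        [1, -1, -1,  1],
        [1, -2,  2, -1],
    ]

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
                for i in range(4)]

    def transpose(m):
        return [list(row) for row in zip(*m)]

    def forward_core_transform(block_4x4):
        # Apply the 4x4 integer transform to one residual block.
        return matmul(matmul(Cf, block_4x4), transpose(Cf))

    # A flat residual block transforms into a single non-zero "DC" coefficient.
    flat = [[3] * 4 for _ in range(4)]
    print(forward_core_transform(flat))   # 48 in the top-left corner, zeros elsewhere

Because the matrix contains only small integers, the transform can be computed with additions and shifts, which is one reason it replaced the floating-point DCT of earlier standards.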

Figure 2 shows the innovation brought about by point 6 of the list above


Figure 2 – Variable macroblock size in AVC

Figure 3 shows the innovation implied by point 8 in the list above


Figure 3 – Multiple reference frames in AVC

Several profiles have been defined, some of which are

  • Baseline, Main, and Extended Profiles, primarily for applications of “entertainment-quality” video, based on 8 bits/sample and 4:2:0 chroma sampling
  • Fidelity range extensions (FRExt) for applications such as content-contribution, content-distribution, studio editing and post-processing
  • Professional quality (PQ) extensions for applications requiring 4:4:4 color sampling and more than 10 bits/sample

The AVC endeavour kept its promise of halving the bitrate required by MPEG-2 Video for the same quality.

At the same meeting in which the JVT was established, MPEG started an investigation into video scalability that eventually led to the development of requirements for Scalable Video Coding (SVC). In a nutshell, these require that the encoded video stream be structured into a base layer stream, decodable by a non-scalable decoder, and one or more enhancement layer stream(s), decodable by decoders aware of the SVC standard syntax. A Call for Proposals was issued and this work item, too, was entrusted to the JVT.

The enhancement layer(s) defined by the SVC standard can actually work on top of any base layer video coding standard. 

SVC is based on a layered representation with multiple dependencies. Frame hierarchies are needed to achieve temporal scalability, so that frames that are not used as references for prediction by the layers that remain can be skipped, as indicated in Figure 4: the pictures marked “B3” can be removed to reduce the frame rate, removing those marked “B2” as well reduces it further, etc.


Figure 4 – SVC frame hierarchy
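
A toy sketch of how such a hierarchy lets a bitstream extractor drop frames without breaking prediction is given below; the frame labels and their mapping to temporal levels are assumptions chosen to match the naming in the figure.

    def extract_temporal_substream(frames, max_level):
        # Keep only the frames whose temporal level does not exceed max_level.
        # Higher-level frames are never used as references by lower-level ones,
        # so the remaining stream still decodes correctly.
        return [f for f in frames if f[1] <= max_level]

    # (frame_label, temporal_level) for one group of pictures of the hierarchy
    gop = [("I0", 0), ("B3", 3), ("B2", 2), ("B3", 3), ("B1", 1),
           ("B3", 3), ("B2", 2), ("B3", 3), ("P0", 0)]

    full    = extract_temporal_substream(gop, 3)   # full frame rate
    half    = extract_temporal_substream(gop, 2)   # "B3" frames dropped
    quarter = extract_temporal_substream(gop, 1)   # "B3" and "B2" frames dropped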

SVC offers a high degree of flexibility in terms of scalability dimensions, e.g. it supports various temporal/spatial resolutions, Signal-to-Noise (SNR)/fidelity levels and global/local Region of Interest (ROI) access. SVC performs significantly better and is much more flexible in terms of number of layers and combination of scalable modes than the scalable versions of MPEG-2 Video and MPEG-4 Visual, while the penalty in compression performance, as compared to single-layer coding, is almost negligible.

For the purpose of spatial scalability, the video is first downsampled to the required spatial resolution(s). The ratio between frame heights/widths of the respective resolutions does not need to be dyadic (a factor of two). Encoding as well as decoding starts at the lowest resolution, where an AVC-compatible “base layer” bitstream is typically used. For the respective next-higher “enhancement layer”, three decoded component types are used for inter-layer prediction from the lower layer (a simplified selection sketch follows the list):

  • Up-sampled intra-coded macroblocks;
  • Motion and mode information (aligned/stretched according to image size ratios);
  • Up-sampled residual signal in case of inter-coded macroblocks.
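
A highly simplified sketch of how an enhancement-layer macroblock might draw on these three component types is given below; the data structures, the nearest-neighbour up-sampling and the decision logic are purely illustrative and not the normative SVC process.

    def upsample(samples, scale):
        # Nearest-neighbour up-sampling placeholder (real SVC uses defined filters).
        return [s for s in samples for _ in range(scale)]

    def inter_layer_prediction(base_mb, scale):
        # Illustrative use of the three SVC inter-layer prediction components
        # for one enhancement-layer macroblock (not the normative process).
        if base_mb["intra"]:
            # 1. Up-sample the reconstructed intra-coded base-layer macroblock
            return {"pred": upsample(base_mb["recon"], scale)}
        return {
            # 2. Reuse (and stretch) the base-layer motion and mode information
            "motion": [(mv_x * scale, mv_y * scale) for mv_x, mv_y in base_mb["motion"]],
            "mode": base_mb["mode"],
            # 3. Up-sample the base-layer residual of inter-coded macroblocks
            "residual": upsample(base_mb["residual"], scale),
        }

    base_mb = {"intra": False, "mode": "16x16",
               "motion": [(2, -1)], "residual": [3, 0, -2, 1]}
    enhancement_prediction = inter_layer_prediction(base_mb, scale=2)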


Figure 5 – SVC block diagram

The seminal work on multiview video coding carried out in MPEG-2 Video was extended in MPEG-4 Visual. With AVC, Multiview Video Coding (MVC) further improved the coding of multiview video. The overall structure of MVC, defining the interfaces, is illustrated in Fig. 6.


Figure 6 – MVC model

The encoder receives N temporally synchronised video streams and generates one bitstream. The decoder receives the bitstream, decodes it and outputs N video signals that can be used for different purposes, e.g. to present a single 2D view or the multiple views needed for a stereo/3D display.

Prediction across views, as shown in Fig. 7, is used to exploit inter-camera redundancy, with the limitation that inter-view prediction is only effected from pictures of the same time instance and cannot exceed the maximum number of stored reference pictures.

Figure 7 – MVC prediction

The base view is independent of any other view and is AVC-compatible; it can be extracted to provide a compatible 2D representation of the 3D content.


The Impact Of MPEG-4

At the MPEG Vancouver meeting in July 1999 it was realised that, while interest in MPEG-4 remained high and widespread, apparently there was no industry or company willing to take on the role that Cable Labs had taken for MPEG-2, a result of the truly generic nature of the MPEG-4 standard. It was then found necessary to take another step to help kickstart MPEG-4 adoption in the marketplace. Rob Koenen, the Requirements Chair, took the task on himself and, starting from the brainstorming session at the Vancouver meeting, he led the discussions among interested people that eventually led to the establishment of the MPEG-4 Industry Forum (M4IF), again c/o Me Jacquemmoud in Geneva, in May 2000 with the goal

to further the adoption of the MPEG-4 Standard, by establishing MPEG-4 as an accepted and widely used standard among application developers, service providers, content creators and end users.

I was one of those signing the statutes (the others were Rob Koenen and Takuyo Kogure), but decided that this time I would take no official role and be just a supporter of the initiative from the outside.

At the Maui, HI meeting, the Saturday after the MPEG meeting in December 1999, M4IF kicked off the activity that eventually led to the licensing of MPEG-4 Video, Systems and Audio (in the order of publication). Working Groups were set up to discuss how an effective licensing environment could be created.

At the Amsterdam meeting in March 2000, the Saturday after the MPEG meeting in Noordwijkerhout, the M4IF Statutes (again an adaptation of the DAVIC Statutes) were approved and an announcement was issued that a patent pool would be initiated, inviting those who believed they held essential patents to submit their claims to an evaluator. At the July 2000 meeting in Newark, work on self certification and M4IF Logo procedures was initiated. At the Paris meeting in October the first discussions on interoperability tests between products were made.

While MPEG-4 was being developed, the world was living through one of its greatest – and bloodless – revolutions. The ability to send email and post HTML pages triggered the demand for ever-increasing transmission bitrates, made possible by faster telephony modems, ISDN, ADSL and cable modems. When the bitrate began to reach a few tens of kbit/s, it became possible to start offering some audio and video in streaming mode, albeit with a reduced picture size. Some of the companies that first tried this have created a strong brand. At the end of 2000 a new industry consortium called Internet Streaming Media Association (ISMA) was established with the goal

to accelerate the adoption of open standards for streaming rich media – video, audio, and associated data – over the Internet.

Pretty soon MPEG LA took over the MPEG-4 Visual licensing and eventually produced a licensing scheme. The licence is quite elaborate and this page cannot represent all the legal subtleties. Roughly, the licence distinguishes between the licensing of encoders and decoders, and the licensing of encoder and decoder use.

Encoders and decoders

  • Right to use by an end user only for encoding/decoding video transmitted to/by another end user
  • 0.25 $ per unit
    • annual cap of 1 M$
    • no charge on the first 50,000 units/year
  • Royalty: functioning product
  • Licensee: functioning product manufacturer

Encoder and decoder use

  • Options
    • 0.25 $/subscriber subject to a 1 M$ annual cap, or
    • 0.02 $/hour subject to a 1 M$ annual cap, or
    • paid-up 1 M$ annual licence (w/o reporting)
  • No charge on the first 50,000 subscribers/year
  • Royalty: only when content is offered for remuneration
  • Licensee: video provider to end-user

Again, those interested in actual access to MPEG-4 licensing should consult the licence. It can be seen that traditional device-based business models are well supported – and with a reduction of an order of magnitude of the unit fee. However, the licence includes new content streaming models with royalties based on the time paid-for content is streamed. This scheme was widely objected to and contributed to the sporadic use of MPEG-4 Visual for paid-for streaming services as opposed to proprietary streaming solutions.

In the meantime the mobile telco industry was getting ready for a major overhaul of its service offer, which was to be based on so-called Third Generation (3G) mobile. An international consortium called 3rd Generation Partnership Project (3GPP) was established that developed all the specifications needed. A parallel but formally unrelated group called 3GPP2 was also established with a similar objective.

Dolby enlarged its original MPEG-2 AAC licence to include MPEG-4 AAC.

MPEG-4 has played an important role in the new environment created by the two main forces that drove the development of the standard, digital networks and IT:

  • The World DMB Forum uses a profile of BIFS as composition technology to add other media to digital radio.
  • DivX, a company started by two French students, developed effective MPEG-4 Visual encoders and decoders that can be freely downloaded from the web. Audio is typically MP3. The company eventually developed a business model based on licensing MPEG-4 Visual implementations whose conformance had been tested by the company.
  • Several versions of AAC are also in 3G cellphones to enjoy various forms of audio services.
  • Apple selected AAC and the MPEG-4 File Format for its iTunes service.
  • AAC is also used in two digital radio services, XM and Digital Radio Mondiale (DRM) that are different from DAB.
  • The MPEG-4 File Format is universally adopted: 3G cellphones offer mobile person-to-person video communication, with the additional capability to capture, download and play, stream and store content (MPEG-4 File Format)
  • 3GPP adopted and slightly changed the MP4 file format. This format is also supported by 3G cellphones.
  • Open Font Format is widely used by the industry.

Bits And Bytes

A very general way of classifying information is based on its being structured or unstructured, i.e. on its appearing or being organised according to certain rules. Characters are a special form of structured information because they are the result of mental processes that imply a significant amount of rationalisation. Characters have been one of the first types of structured information that computers have been taught how to deal with. Represented as one (or more than one) byte, characters can be easily handled by byte-oriented processors. 

At the beginning, characters were confined within individual machines because communication between computers was still a remote need. In those days, memory was at a premium and the possibility of a 70% saving, thanks to the typical entropy value of structured text (a value that is usually obtained when zipping a text file), should have suggested more efficient forms of character representation. This did not happen, because the processing required to convert text from an uncompressed to a compressed form, and vice versa, and to manage the resulting variable length impacted on a possibly even scarcer real estate: CPU execution. In the early days of the internet the need to transmit characters over telecommunication links arose but, again, the management of Variable Length Coding information was a deterrent. SGML, a standard developed in the mid 1980s, was designed as a strictly character-based standard. The same was done for XML more than 10 years later, even though progress of technology could have allowed bolder design assumptions. 
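
The claim about the typical entropy of structured text is easy to check with any general-purpose compressor. The small sketch below measures how much DEFLATE (the algorithm behind zip and gzip) shrinks a text file; the exact figure of course depends on the input.

    import sys
    import zlib

    def fraction_saved(data: bytes) -> float:
        # Fraction of bytes saved by DEFLATE, the algorithm behind zip/gzip.
        return 1.0 - len(zlib.compress(data, 9)) / len(data)

    # Usage: python entropy_check.py some_text_file.txt
    # Long, structured text typically shrinks by well over half; short inputs less so.
    if __name__ == "__main__":
        path = sys.argv[1] if len(sys.argv) > 1 else __file__
        with open(path, "rb") as f:
            print(f"saved {fraction_saved(f.read()):.0%}")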

At an appropriate distance a picture of the Mato Grosso or the sound coming from a crowd of people can be considered as unstructured. Depending, however, on the level at which it is assessed, information can become structured: the picture of an individual tree of the Mato Grosso or an individual voice coming from a crowd of people do represent structured information. 

As humans are good at understanding highly complex structured information, one of the first dreams of Computer Science (CS) was endowing computers with the ability to process information in picture or sound signals in a way that is comparable to humans’ abilities. Teaching computers to understand that information proved to be a hard task. 

Signal Processing (SP) has traditionally dealt with (mostly) digital pictures and audio (other signals like seismograms, too, have been given a lot of attention for obvious reasons). Some Computer Science tools have also been used to make progress in this field of endeavour. In contrast with CS, however, the SP community had to deal with huge amounts of information: about 4 Mbit for a digitised sheet of paper, at least 64 kbit/s for digitised speech, about 1.5 Mbit/s for digitised stereo sound, more than 200 Mbit/s for digital video, etc. To be practically usable, these large amounts of bits had to be reduced to more manageable values and this required sophisticated coding (compression) algorithms that removed irrelevant or dispensable information.

In the early days, transmission was the main target application and these complex algorithms could only be implemented using special circuits. For the SP community the only thing that mattered were bits, and not just “static” bits, as in computer “files”, but “dynamic” bits, as indicated by the word “bitstream”. 

It is worth revisiting the basics of picture and audio compression algorithms keeping both the SP and the CS perspectives in mind. Let’s consider a set of video samples and let’s apply an algorithm – e.g. MPEG-1 Video – to convert them into one or more sets of variables. We take a macroblock of 2×2 blocks of 8×8 pixels each and we calculate one motion vector, we add some other information such as position of the block, etc. and then we apply a DCT to each (differences of) 8×8 pixel blocks. If we were to use the CS approach, all these variables would be tagged with long strings of characters (remember the verbose VRML) and the result would probably be more bits than before. In the SP approach, however, the syntax and semantics of the bitstream containing VLC-coded Motion Vectors and DCT coefficients is standardised. This means that the mapping between the binary representation and the PCM values at the output is standardised, while the equivalent of the “tagged” representation that exists in a decoder implementation is not.
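
The difference between the two approaches can be made concrete with a toy example: the same macroblock data – one position, one motion vector and a few quantised coefficients – written once as tagged text and once as a fixed-layout binary record. The field names, widths and values are invented for illustration; a real codec goes further and entropy-codes the values into variable-length bit fields.

    import json
    import struct

    # One (invented) macroblock worth of data: position, motion vector and a
    # handful of quantised DCT coefficients.
    mb = {"mb_x": 12, "mb_y": 7, "mv_x": -3, "mv_y": 5,
          "coeffs": [14, -2, 0, 1, 0, 0, -1, 0]}

    # "CS style": every value is carried together with a descriptive tag.
    tagged = json.dumps(mb).encode("utf-8")

    # "SP style": a fixed, standardised layout; only the values are sent.
    packed = struct.pack(">HHhh8h", mb["mb_x"], mb["mb_y"],
                         mb["mv_x"], mb["mv_y"], *mb["coeffs"])

    print(len(tagged), "bytes tagged vs", len(packed), "bytes packed")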

The CS community did not change its attitude when the WWW was invented. In spite of the low bitrate of the communication links used in the early years of the web, transmission of an HTML page was (and still is) done by transmitting the characters in the file. Pictures are sent using JPEG for photo-like pictures and GIF/PNG for graphic pictures (even though many authors do not understand the different purposes of the formats and use one format for the other purpose). JPEG has been created by the SP community and applies compression; Graphics Interchange Format (GIF) has been created by the CS community and only applies mild lossless compression. Portable Network Graphics (PNG), a picture representation standard developed by the W3C, applies little compression, probably because users are expected not to be concerned with their telephone bills, as PNG is said to be patent-free.

This means that the world has been misusing the telecommunication infrastructure for years by sending about three times more information than would have been strictly needed using a simple text compression algorithm. Not that the telco industry should complain about it – the importance of its role is still somehow measured by how many bits it carries – but one can see how resources can be misused because of a specific philosophical approach to problems. The transmission of web pages over the “wireless” internet used to be of interest to the telcos because bandwidth used to be narrow, and for some time there were companies offering solutions that allowed sending of “compressed” web pages to mobile phones. Each of the solutions proposed was proprietary, without any move to create a single standard, so when the hype subsided (because available bandwidth improved) those incompatible solutions melted like snow on a hot day.

A similar case can be made for VRML worlds. Since these worlds are mostly made of synthetic pictures, the development of VRML was largely carried out by representatives of the CS community. Originally it was assumed that VRML worlds would be represented by “files” stored in computers, so the standard was designed in the typical CS fashion, i.e. by using text for the tags, and long text at that because the file “had to be human readable”. The result has been that VRML files are exceedingly large and their transmission over the Internet has not been practical for a long time. The long downloading time of VRML files, even for simple worlds, was one of the reasons for the slow take-off of VRML. Instead of working on reducing the size of the file by abandoning the text-based representation, so unsuitable in a bandwidth-constrained network environment, the Web3D Consortium (as the VRML Consortium is now called) has developed a new XML-based representation of its VRML 97 standard, which keeps on using long XML tags. I fail to understand why a long XML tag should be any better than a long non-XML tag.

One of the first things MPEG, with its strong SP background, did when it used VRML 97 as the basis of its scene description technology, was to develop a binary representation of the VRML textual representation. The same was done for Face and Body Animation (FBA) and indeed this part of the MPEG-4 standard offers another good case to study the different behaviour of the CS and SP communities. As an MPEG-4 synthetic face is animated by the values of 66 parameters (Facial Animation Parameters, FAPs), a transmission scheme invented in the CS world could code the FAPs somewhat like this:

expression (type joy, intensity 7), viseme (type o, intensity 5), open_jaw (intensity 10), …, stretch_l_nose (intensity 1), …, pull_r_ear (intensity 2)

where the semantics of expression, joy, etc. is normative. Transmission would simply be effected by sending the characters exactly as they are written above. If the coding is done in the SP world, however, the first concern is to try and reduce the number of bits. One possibility is to group the 66 FAPs in 10 groups, where the state of each group is represented with 2 bits, e.g. with the following meaning (a packing sketch follows):

00  no FAP transmitted
11  all FAPs in the group are transmitted with an arithmetic coder
01  some FAPs, signalled with a sub-mask, are transmitted with an arithmetic coder
10  some FAPs, signalled with a sub-mask, are interpolated from actually transmitted FAPs
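
A sketch of how such 2-bit group states plus sub-masks could be packed into a bit string is given below; the group composition, the sub-mask format and the omission of the arithmetic coder are all simplifications for illustration.

    def pack_fap_groups(groups):
        # Pack, for each FAP group, a 2-bit state and (when needed) a per-FAP
        # sub-mask into a bit string.  The arithmetic coding of the FAP values
        # themselves is omitted.
        bits = []
        for state, submask in groups:          # state is one of 0b00, 0b01, 0b10, 0b11
            bits.append(f"{state:02b}")
            if state in (0b01, 0b10):          # only these states carry a sub-mask
                bits.append("".join(str(b) for b in submask))
        return "".join(bits)

    # Three example groups: nothing sent, everything sent, two of four FAPs sent
    stream = pack_fap_groups([(0b00, None), (0b11, None), (0b01, [1, 0, 1, 0])])
    print(stream)   # '00' + '11' + '01' + '1010'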

Another interesting example of the efforts made by MPEG to build bridges between the CS and SP worlds is given by XMT. Proposed by Michelle Kim, then with IBM Research, XMT provides an XML-based textual representation of the MPEG-4 binary composition technology. As described in Figure 1, this representation accommodates substantial portions of SMIL and Scalable Vector Graphics (SVG) of W3C, and X3D (the new format of VRML). Such a representation can be directly played back by a SMIL or VRML player, but can also be binarised to become a native MPEG-4 representation that can be played by an MPEG-4 player.

 


Figure 1 – XMT framework

As reported later in the MPEG-7 portion, BiM is another bridge that has been created between the character-based and the binary worlds. BiM has been designed as a binariser of an XML file to allow bit-efficient representation of Descriptors and Description Schemes, but it can also be used to binarise any XML file. For this reason it has been moved to MPEG-B Part 1.


Open Source Software

If art is defined as something that people take pleasure in producing, that contains something that different people agree is intrinsically beautiful and, therefore, connoisseurs can admire and talk about, some pieces of software could be defined as pieces of art. As for all arts, you have the practitioners – those who produce an artistic piece just because they like to do that and take pleasure in seeing others give nods of approval – and the professionals – those who do the same in hope or promise of receiving a reward. 

In this context there is probably nothing more emblematic than the great project that built the internet. You could see people taking pleasure in inventing protocols, debating the merits of their own ideas, and implementing them in some smart piece of software knowing other colleagues would look at it and find other smart ideas on how complex processing could be expressed with a handful of computer instructions. Add to it the sense of fulfillment that came with the idea of contributing to building the communication infrastructure of tomorrow serving the whole of mankind. Finally also add that all this happened largely in the academic environment where people worked because they had a contract and that performance review, at the time the contract work was due, just had to check how smart protocol ideas and software implementations had been. These few hints suggest how the world that produced the internet can be considered as a sort of realisation of Plato’s political ideas. 

On another occasion I have compared that memorable venture – absit iniuria verbis – with the work that my ancestors who lived in the lower parts of the Western Alps near the city of Turin had done when they cobble-stoned the paths criss-crossing the mountains behind their houses. They did that with the same zeal and sense of duty because it was more comfortable for everybody to have cobble-stoned paths instead of leaving them in the state in which the steps of millions of passers-by had molded them. Everybody tried to do his best in the hope that others would admire the quality of the work done. The only difference may be that doing that work was probably not the free decision of those mountain dwellers but more due to the local communal authority that imposed corvées on them during winter when work in the fields was minimal. 

Next to the world of people taking pleasure in writing smart computer code, however, there were other people who just dealt with this particular form of art in a more traditional way. In the early 1960s, computer manufacturers freely distributed computer programs with mainframe hardware. This was not done because those manufacturers did not value their software, but because those programs could only operate on the specific computer platform the software went with. Already in the late 1960s, manufacturers had begun to distribute their software separately from the hardware. The software was copyrighted and “license”, instead of “sale”, was the legal form under which the developer of the software entitled others to use his product. 

Copyrighted software is the type of software for which the author retains the right to control the program’s use and distribution, not unlike what authors and/or publishers have done for centuries with their books. In the 1970s “public domain” software in source code came onto the stage. This type of software has the exact opposite status of copyrighted software because, by putting the software in the public domain, the rights holder gives up ownership and the original rights holder has no say on what other people do with the software. In the rest of this page I will mention several categories of software. Of course this is a very complex subject and interested readers should seek a more extended treatment of the subject.

Since early internet times, access to User Groups and Bulletin Board Systems (BBS) was already possible. These groups passed around software for which the programmers did not expect to be paid, either because the program was small, or the authors offered no support or for other reasons such as because the author just wanted the rest of the world to see how smart he was or how good he was in donating something that benefited other people. One should not be too surprised if, besides public domain software, such user groups and BBSs passed around some pirated commercial software. 

In 1982, two major programs for the IBM PC were published: a communication program called PC-Talk by Andrew Fluegelman and a database program called PC File by Jim Knopf. Even though these were substantial programs, the authors opted for spontaneous distribution instead of marketing the programs through normal commercial channels. Users could freely copy their programs, but were reminded that if they wanted the authors to be motivated to continue producing more valuable software for others to freely use, they should send money to the authors. 

Fluegelman called this new software distribution method “Freeware” and trademarked the name. However, Fluegelman did little to continue to develop and promote PC-Talk because he lost control over PC-Talk source code when several “improved” versions of the program appeared. Unlike Fluegelman, Knopf succeeded in building a multi-million dollar database company with his PC-File. This idea set a pattern that others followed, e.g. Bob Wallace who developed a successful business with his PC-Write, a word processing program that was free to try, but required a payment if the user continued to use it. These three major applications became popular with major businesses and established the credibility of Freeware as a source of high quality, well-supported software. As the name “Freeware” had been trademarked, the user community settled on “shareware“, the name used by Bob Wallace for his PC-Write. 

In 1984, Richard Stallman formed the Free Software Foundation to promote the idea of “Free Software”. With the help of lawyers he developed the GNU’s Not Unix (GNU) General Public License (GPL) and called the licence “copyleft”. This allows use and further development of the software by others, provided the result remains available under the same terms. GNU followers, starting from the founder of the movement, like to say that GNU software is free. This adjective does not represent unambiguous categories – not just the well-known ambiguity of the English language between “gratis” and “freedom” – and therefore I will refrain from using it. I will use instead the term “GNU licence” because, even though most people would agree that the GNU licence gives users “more” rights than, say, Microsoft gives Word program users, the GNU licence is by no means “unrestricted” as the unqualified word “free” means to me. 

In summary the rights are the freedom to:

  1. Distribute copies of the software
  2. Receive the source code or be able to get it
  3. Change the software or use pieces of it in new programs

The obligations are to:

  1. Give another recipient all the rights acquired
  2. Make sure that the recipient is able to receive or get the software.

There is no warranty for GNU license software and if the software is modified, a recipient must be made aware that it is a modification. Finally any patent required to operate the software must be licensed to everybody. 

A justification of the Open Source Software (OSS) approach is that it is a more effective way of developing software. If software is Open Source, this is the thesis, its evolution is facilitated because programmers can improve it, make adaptations, fix bugs, etc. All this can happen at a speed and effectiveness that conventional software developed in corporate environments cannot match. 

The Open Source Initiative (OSI), a California not-for-profit corporation, has produced an Open Source Definition according to which OSS does not just mean access to the source code, but also that the distribution terms must comply with a set of general criteria. These are summarised below with the intention of providing a general overview of the approach. Interested readers are advised to study the official document. 

  1. Free Redistribution. There should be no restriction on selling or giving away the software as a component of a complete package containing programs from other sources, and there should be no royalty or other fee for such a sale. 
  2. Source Code. The program must include source code (or a means of obtaining it at little or no cost) and it must be possible to distribute it in both source code and compiled form. However, distribution of deliberately obfuscated source code, or in an intermediate form such as the output of a preprocessor or translator, is not allowed. 
  3. Derived Works. Modifications and derived works must be possible and they can be distributed under the same terms as the license of the original software. 
  4. Integrity of The Author’s Source Code. It must be possible to have non-modifiable source-code if the license allows the distribution of “patch files” by means of which modifications at build time are possible. It must be possible to distribute software built from modified source code. Derived works may carry a different name or version number from the original software. 
  5. No Discrimination Against Persons or Groups. Distribution should not be based on the fact that one is a specific person or belongs to a specific group.
  6. No Discrimination Against Fields of Endeavour. Distribution should not be based on the specific intended use.
  7. Distribution of Licence. A recipient should require no other license to use the software. 
  8. Licence Must Not Be Specific to a Product. It should not be possible to tie use of the software to a specific product.
  9. Licence Must Not Contaminate Other Software. The licence must not place restrictions on other software that is distributed along with the licensed software, e.g. there should not be an obligation for the other software to be open-source. 

An interesting case combining collaborative software development and standardisation in the context of a proprietary environment is the Java Community Process (JCP), started by Sun Microsystems in 1998 and revised in 2000. The Community is formed by companies, organisations or individuals who have signed the Java Specification Participation Agreement (JSPA), an agreement between each member of the Community and Sun (now Oracle) that sets out the rights and obligations of a member participating in the development of Java technology specifications in the JCP. 

Below is a brief description given for the purpose of understanding the spirit of the process. Those interested in knowing more about this environment are invited to study the JSPA. The process works through the following steps:

  1. Initiation: a specification is initiated by one or more Community members and approved for development by an Executive Committee (EC) – there are two such ECs targeting different Java markets – as a Java Specification Request (JSR). Oracle, ten ratified members and five members elected for 3 years hold seats in the ECs. Once the JSR is approved, an Expert Group (EG) is formed. JCP members may nominate an expert to serve on the EG that develops the specification. 
  2. The first draft produced by the EG, called the Community Draft, is made available for review by the Community. At the end of the review, the EC decides if the draft should proceed to the next step as a Public Draft. During this phase, the EC can preview the licensing and business terms. The Public Draft is posted on a web site, openly accessible and reviewed by anyone. The Expert Group further revises the document using the feedback received from anyone. 
  3. Eventually the EC approves the document. However, for that to happen there must be a reference implementation and an associated Technology Compatibility Kit (TCK) – what MPEG would call conformance testing bitstreams. A TCK must test all aspects of a specification that impact how compatible an implementation of that specification would be, such as the public API and all mandatory elements of the specification. 
  4. The specification, reference implementation, and TCK will normally be updated in response to requests for clarification, interpretation, enhancements and revisions. This Maintenance process is managed by the EC who reviews proposed changes to a specification and indicates those that can be carried out immediately and those that will require a revision by an EG. 

Aside from considering some elements that are specific to the Java environment, the process described is very similar to the process that MPEG has been following since MPEG-4 times.

Originally Microsoft took quite a strong position vis-à-vis OSS, claiming that OSS contains elements that undermine software companies’ business. Some of the reasons put forth are the possibility to have “forking”, i.e. the split of the code base into separate directions enabled by the possibility for anybody to modify a piece of OSS, and the risk of contaminating employees working in non-OSS companies.

As an alternative, Microsoft proposes a different approach that it calls Shared-Source Software (SSS) which is expected to encourage commercial software companies to interact with the public, and to allow them to contribute to open technology standards without losing control of their software. This basically means that the company is ready to license, possibly at no charge, some parts of its software to selected entities, such as universities for research and educational purposes, or Original Equipment Manufacturers (OEM) to assist in the development and support of their products.


MPEG And Open Source Software

I have already mentioned that MPEG had to deal with the software aspects of its work from very early on. The first MPEG-1 Video Simulation Model, fully assembled at the Tampa meeting in March 1990, was still a traditional textual description but, at the first Santa Clara, CA meeting in August 1990, the group started complementing the text of the standard with pseudo C-code. For people accustomed to writing computer programs, describing the operations performed by a codec with this means of expression was often more natural and effective than using words. 

In MPEG-1 and MPEG-2 times active participants individually developed and maintained their own simulation software. Some time later, as has been reported, the decision was made to develop reference software, i.e. a software implementation of the MPEG-1 standard. 

Seen with the eyes of a software developer, the process of standard development in MPEG-1 and MPEG-2 times was rather awkward, because the – temporally overlapping – sequence of steps was: 

  1. Produce a textual description of the standard 
  2. Translate the text to the Simulation Model software
  3. Optimise the software
  4. Translate the software back to text/pseudo C-code. 

But with MPEG-4 a new world came to the fore where the information technology mindset was in the driver’s seat. Software was no longer just a tool to develop the standard; it was becoming the tool to make many (even though not all) products based on the standard. Therefore a reversal of priorities was required, because the standard in textual form was still needed – a traditional method of expressing standards cannot be changed overnight – but for many users the standard expressed in a programming language was considered the real reference. This applied not just to those making software implementations, but also to those making more traditional hardware-based products and VLSI designs. Therefore it was decided that the software version of the standard should have the same normative status as the textual part. This decision has been maintained in all subsequent MPEG standards.

Quite independently from the formalisation of the OSS rules that were already taking place in the general IT world, as recalled here, MPEG made the decision to develop the reference software collaboratively because

  1. Better software would be obtained
  2. The scope of MPEG-4 was so large that probably no company could afford to develop the complete software implementation of the standard
  3. A software implementation made available to the industry would accelerate adoption of the standard.

Finally, a standard that had two different forms of expression would have improved quality because the removal of an ambiguity from one form of expression would help clarify possible ambiguities in the other. 

I certainly do not claim that MPEG has followed the OSI rules that define a piece of software as OSS. I am just saying that even within a peculiar environment like an SDO, driven by an industrial mindset, collaborative software development naturally ends up obeying similar rules. These are the first rules set by MPEG

  1. Every normative (decoder) and informative (encoder) component of the standard has to be implemented in software, except for external parts such as tools for shape extraction
  2. Whoever makes a proposal that is accepted must provide a software implementation and assign the copyright of the code to ISO
  3. ISO grants a license of the copyright of the code for products conforming to the standard
  4. Release of patents required to exercise the code is not required and users should not expect that the copyright release includes a patent licence. 

For each portion of the standard, a manager of Core Experiments was also appointed. This manager integrated the code of the accepted tools in the existing code base. The following companies and organisations appointed a representative for the corresponding portions of the MPEG-4 standard. 

Company     Portion of MPEG-4
Apple       MP4 File Format
ETRI        Text-to-Speech Interface
Fraunhofer  Natural audio
Microsoft   Video code in C++
MIT         Structured Audio
MoMuSys     Video code in C
Optibase    Core
Sun         MPEG-J

 

In the table, “Core” is the portion of code into which all media decoders and other components plug.

Unlike traditional OSS projects, only MPEG members can participate in the project, a consequence of ISO rules related to standards development. Discussions, however, are usually done on email reflectors that are open to anybody. 

The so-called “copyright disclaimer” that is found on all the original MPEG-4 software modules establishes the following points: 

  1. The role of the original developer and subsequent contributors to the software module
  2. The status of the software module as an implementation of a part of one or more MPEG standards
  3. The ability of users to obtain a free license from ISO/IEC to use the module or modifications of it in products claiming conformance to the MPEG standard
  4. A warning to users that use of the module may infringe existing patents
  5. A liability disclaimer for developers, contributors, their companies and ISO/IEC for use of a software module or its modifications
  6. No release of copyright for products not conforming to the MPEG standard
  7. Full rights of the original developer to
    1. Use the code for its own purpose
    2. Assign or donate the code to a third party
    3. Inhibit third parties from using the code for products that do not conform to the MPEG standard
  8. Inclusion of the copyright notice in all copies or derivative works. 

Of late, some objections have been raised to the clause that MPEG-4 reference software should be licensable only for “conforming products”. The first objection, the same that is made to OSS in general, is that access to the reference software may contaminate employees of a software company. The second objection is that the reference software of a standard should not have any restriction, e.g. it should be in the public domain. The response to this second objection is that an SDO is in the business of promoting its own standards, not of promoting the public domain software philosophy in general, much less competing standards. Besides, having that clause prevents forking, a concern for software companies as we have seen before. 

The joint project with ITU-T that led to the Advanced Video Coding (AVC) standard prompted some changes to the reference software policy. The first is the possibility for the original developer to simply donate the code without necessarily identifying himself. The second is the possibility to reuse the code in other ISO/IEC and ITU standards. The latter covers the obvious case where, say, the DCT transform code can be reused in other standards that make use of the DCT. 

The MPEG OSS approach has been extended to two other cases. One case was motivated by the fact that, while the reference software is intended to be “reference” (normative or informative as the case may be), it is not necessarily intended to be efficient. Therefore since December 1999 MPEG has been working on MPEG-4 part 7 that contains optimised code, e.g. software for optimised motion vector search, a computationally very expensive part of an encoder implementation, where savings of more than one order of magnitude can easily be achieved. Any implementer can take this code and use it under the same conditions as the rest of the reference software, with the added condition that optimised code should not require the use of patents in addition to those already existing for the standard reference software. 

More recently MPEG has started using a slightly modified version of the Berkeley Software Distribution (BSD) licence, a licence originally used to distribute a Unix-like operating system. This simply says the code may be used for anything, with the usual disclaimer that patents are not released. This new licence is particularly interesting for software companies that do not want to have liabilities when using software not developed internally.


The Communication Workflow

After talking about one scourge of the hacker community – software licensing – it is now time to say more about another scourge – patent licensing. Instead of discussing the merits or demerits of patent licensing forms, I would like to make some considerations on the entire workflow that underpins the creation of new communication forms. 

Inventions start from, well, people – inventive people, I mean. Most Public Authorities understand that, today more than ever, inventions are key ingredients of the wealth of nations and try to play a role in the creation of environments that are conducive to inventions, typically by investing in education and dissemination of information. A good example is the mechanism, emulated by other countries and adapted to their cultures, that the US government set up with Bell Labs, where AT&T had to divert a fixed proportion of its revenues to fund research. The effects of this policy were outstanding: besides creating what used to be the best telephone system in the world, an endless string of major inventions was created that benefited the well-being and further development of the entire country and the world at large.

While in the past individuals usually created inventions, today it is more common that companies hire capable individuals and give them the task of improving or innovating communication forms. Companies tend to become closed ecosystems – not a good recipe to foster inventiveness – but this inward-looking tendency is compensated by the centrifugal action of professional and scientific societies which play the role of creating neutral environments where individuals can exchange opinions and experiences, possibly leading to new ideas or inventions. In Europe this role of professional societies is somehow offset, or maybe compensated, by R&D projects funded by the European Commission. In these environments participants are freer to exchange ideas because those projects are regulated by contracts with specific IPR clauses.

Good inventions may be turned into patents. These are obviously the starting point for a company wishing to innovate its communication products or services. At this point, however, the traditional behaviours of the different industries used to diverge. 

The traditional IT and CE approach was to leverage these patents and make new proprietary products or services. This approach used to work well because, at that time, new communication devices were largely “stand alone”: people could buy a Betamax recorder and have a large selection of movies in that format, just as users of VHS had a comparable range of titles to choose from. The problem only arose when a cassette with one’s daughter’s birthday recorded in Betamax was sent to an auntie with a VHS recorder. After a more or less long period of time, with more or less hassle created for users, the eventual “industry standard” would be settled and, possibly, would get a formal standard “stamp” from ISO or, more usually, from IEC. The telecommunication and broadcasting industries went through incredible birth pains in their early years, and this convinced governments that the exploitation of inventions in these domains had to go through a formal standardisation process. For good measure, the specifications of the resulting service, technology included, were even converted into law. 

In either case, the SDO requested that the company holding patent rights would license the patents at RAND conditions. Everybody could then start manufacturing the new device or offering the new service and the rights holder or an agent would manage the licensing. If there were more than one company holding patent rights, usually one of the companies would act as licensing authority on behalf of the others. 

This modus operandi no longer suited the industries concerned. Even if a company had some smart invention, it was hard to convert it into a successful product because of the need to get the involvement of too many technologies from too many companies from too many industries, not to mention the fact that a proprietary device still had a hard time getting accepted, unless the company had a virtual monopoly in a particular field. A simple example – DVD – shows that this model was no longer sufficient. Philips and Sony had a technically very good solution but the other (Toshiba) camp had enrolled the support of a range of movie companies and won the day. For a long time the same two camps fought over the next-generation packaged media, each camp enrolling movie companies and movie companies switching sides until the Blu-ray standard emerged. This is one of the reasons behind the – now outmoded – trend to build huge conglomerates where most if not all the technology and content components required for launching new products or services were in house. It then follows as a corollary that these conglomerates tend to create walled gardens to keep users in. 

The MPEG process provided a different approach to this problem. Again we start from companies investing in R&D and making inventions that are patented. In the technical area covered by MPEG, however, a large number of components was needed to make complete solutions. Assembling these would typically require teaming up with other business players. MPEG designed the pieces of communication systems – actually only the interfaces and the protocols that are required to achieve interoperability between subsystems – providing solutions that achieved a previously agreed goal. In addition MPEG performed some other “technology integration services” because its standards could be used in pieces but also as complete solutions, i.e. as the sum of their parts. 

In other words, MPEG offered a place where R&D results, even if they were still at an unrefined stage, i.e. not yet transformed into products, were fed to a standardisation process and could become, if they satisfied the fitness and technical excellence criteria as assessed by peer review and decided by consensus, part of the common technology portfolio needed to create new communication forms.  

From what has been said before it should be clear that, far from stifling innovation, the MPEG standardisation process was the source of a virtuous circle where companies invested in innovation in the hope of a return both from the existence of new products and services shared with all other companies and industries, and from patent royalties. This worked well in principle, but the practice could be different because an MPEG standard usually required a considerable amount of IPR for its implementation. This is because, for whatever choice MPEG had to make, there were usually a number of solutions, some of which were likely to be affected by IPR. Often, getting agreement from all rights holders on reasonable licensing terms might not be easy. 

In MPEG-2 times, the North American CATV industry was kind enough to help kick start the creation of the MPEG-2 patent pool, but in the MPEG-4 Visual case there was no industry or trade association that would be available to play a similar role. Commending words must not be spared to those who engaged in the daunting task of working out licensing terms for a standard that could be used in such diverse cases as mobile and CE devices, for personal and streaming applications, in hardware and software-based solutions.

I do have some comments on the terms of the MPEG-4 Visual licensing scheme. They certainly do not strike the balance that some people think the MP3 licence does. A superficial observation is that charging for both receivers and content makes two major players unhappy. Another is the perception that the goal of collecting "licensing fees now and from anything" prevailed over the creation of a business out of which much bigger revenues could have been obtained by the licensors in the longer term.

To work fully, the MPEG model of standard development required additional bodies (in the case of MPEG-4, actually two: the industry forum and the licensing entity). The latter was the one attempting to create a one-stop shop for patent licensing according to well-identified licensing schemes.


Option 1 Video Coding

From the very beginning MPEG realised that too many companies had invested in digital media technologies for MPEG to realistically target the development of unencumbered high-performance digital media standards. That wisdom has served MPEG well for many years, thanks to the (internal) ability to integrate the best technologies and the (external) ability to provide licensing terms for packages of patents.

Does the value of that wisdom still hold today? The answer is definitely yes. We are nowhere in sight of the end point of compression, not only because there is more fat to squeeze out of media but also because capturing, and the corresponding presentation, technologies continue to improve, yielding more bits per sample, more realistic colours, more brightness, more dimensions… Unless MPEG continues to provide standards yielding the best quality money can buy, someone else will do so in its stead.

This does not mean that there is only one wisdom ruling the field. When version 1 of MPEG-4 was approved, 6 years had passed since MPEG-1 had been approved. When AVC was approved, 11 years had passed. When HEVC was approved, 21 years had passed. If 20 years ago it was foolish to try and define an unencumbered compression standard, today, with many of the old patents expired, it may be possible to put together what ISO/IEC call an "Option 1" standard, one that people can hopefully implement without the need to pay royalties. That this is a meaningful path to tread is shown by the increasing number of private companies bringing proprietary codecs to market, some of them claiming that their solution is "royalty free".

In the second half of the 2000s I started raising the issue of MPEG becoming a body handling technologies that were fast maturing, even though the research area was still very vital and capable of producing significant innovations, as we will see with the HEVC and 3D Audio standards. In July 2011 a Call for Proposals for Internet Video Coding Technologies was published. This CfP sought compression technology for video in progressive format whose compression capability substantially outperformed MPEG-2 and was comparable to the AVC Baseline Profile. The intention was to develop a specification that would include an initial profile (the "Baseline Profile") of the Option 1 type, i.e. one for which patent owners are prepared to grant a free-of-charge licence, and that might include other, possibly royalty-bearing, profiles (so-called Type 2).

The CfP also stated that "MPEG recognized, however, that a Baseline Profile may not be possible", indicating that MPEG was aware of the difficulties lying ahead.

Two submissions were received at the following October 2011 meeting. One proposed to create what was hoped to be an Option 1 standard by extracting the AVC Baseline Profile and making it a separate standard, called Web Video Coding (WebVC), against which patent declarations could be sought. This was done, but the result did not live up to expectations: a number of companies declared that they would license their patents under Fair, Reasonable and Non-Discriminatory (FRAND) conditions.

The second proposal was indeed about a process to develop an Option 1 video compression standard, substantiated by a first video codec whose performance was still below the target set by the CfP. Still, because the proposal looked interesting, an exploration activity was started, retaining the name Internet Video Coding (IVC). However, in the following two years substantial performance improvements were not reported.

To break the stalemate, a revised version of the July 2011 CfP was published in April 2013. Only one response, from Google, was received; it proposed yet another approach to the development of an Option 1 standard: the VP8 video codec, which Google licensed free of charge.

While this third attempt at the development of an Option 1 video coding standard progressed through the ISO approval process under the name Video Coding for Browsers (VCB), significant improvements were reported in the activity initiated by the second proposal submitted in response to the first CfP. At the June 2015 meeting the reported performance was equivalent to that of the AVC High Profile, hence much more than what had been requested 4 years before. At that meeting MPEG initiated the ISO approval process. In October 2017 the IVC standard was eventually approved as an FDIS.

Before getting there, however, MPEG had to deal with a few patent declarations from companies unwilling to license their patents free of charge. They were of two types: statements that identified the patents and statements that did not. The technologies referenced in the declarations of the first type were duly removed. For the others, MPEG could only commit to removing the technologies once these were identified.


A Fuller Form Of Communication

At the beginning of December 1970, I returned home after spending 30 months in Japan as a Ph.D. student at the University of Tokyo. That had been both a great and a hard period for me. Great, because during the day I was fully immersed in Japanese life as part of the Miyakawa Laboratory, and hard because I had to live on a scant (for a foreigner) scholarship of 33,000 JPY, of which 2,700 went on the rent of a room at the Foreign Student House in Komaba, near Shibuya. So I had decided to suspend my Ph.D. for a while, get a job and finish the Ph.D. once I was on firmer financial ground.

A prospective employer was CSELT, the research centre of SIP, as the national telephone company, now called Telecom Italia, was named at that time. At the job interview with Prof. Bonavoglia, then the director of CSELT, I mentioned a short video coding experience I had had in Japan in the Taki Laboratory next door, and I was assigned to that job.

It was indeed a favourable moment for such an assignment. I have already had the opportunity to mention more than once that telcos had always dreamt of giving their subscribers richer forms of communication extending the basic vocal form. Already in the late 1920s, Deutsche Post and AT&T had shown working prototypes of visual telephony systems. But Black Tuesday and, soon after, the Depression, World War II and Reconstruction had set different priorities. Eventually, in the late 1960s, after decades of talks, AT&T started a serious attempt to offer video telephony to its subscribers. The service was called Picturephone: an analogue video signal with 267 lines and a bandwidth of 1 MHz (another video format!) was transmitted over the telephone subscriber line. Everybody expected that, with the commercial success of Picturephone, the next generation of videophone service would be digital, and some telcos were even harbouring the hope of leapfrogging AT&T and going straight to a digital service.

It was not to be. AT&T devised all sorts of tricks to convince people to subscribe to the new service, like creating communities of happy picturephoners, but all was in vain and in the mid 1970s the service was discontinued. Fortunately research on video compression continued, because digital video had more applications, not to mention the fact that digital video on an integrated digital network could be more attractive, at least from the operational, if not the technological, viewpoint. In Europe the COST 211 project, started in 1974 at about the time the Picturephone service was being discontinued, led 4 different manufacturers to produce, in the early 1980s, 4 models of 2 Mbit/s videoconference terminal based on the 4 prototypes that had been independently developed by 4 telco research establishments. The 4 models offered guaranteed interoperability, because they had been tested within COST 211, and some tens of terminals were sold before production was stopped. In the USA a company called Compression Labs, Inc. (CLI) was established, and its videoconference codecs had moderate success. In Japan NTT tried to offer a videoconference service between Tokyo and Osaka, but with little success.

In the mid 1980s, ITU started the nx384 kbit/s videoconference project, but the scarce availability of 384 kbit/s accesses and a greater interest in an ISDN-based video service triggered the extension of the project to px64 kbit/s. Videophones and videoconference terminals based on H.261 and H.221 became commercially available in the early 1990s, but success was meagre. ITU-T started a successor project, leading eventually to H.263 and H.223, but the new terminals were also far from a roaring success, so much so that the two major companies making videophone products, CLI and PictureTel, no longer exist: the former was closed down and the latter was acquired by Polycom.

Interest in video communication was also shown by new players. In the early 1990s Intel launched its Indeo communication system, which has left no trace of itself. Microsoft developed NetMeeting, a Windows application for audio and video communication based on ITU-T standards, which was occasionally used. AT&T and Marconi developed and marketed two "analogue" videophones whose main feature was that they could be plugged into an analogue phone socket, but which employed sophisticated video compression and had a built-in modem. After the usual hype and a fair amount of panic among telco executives, the two devices fell into oblivion.

Toward the mid 1990s the mobile telecommunication industry started developing specifications for 3G networks. The main feature driving this development was a bitrate higher than that available on 2G networks, e.g. the 9.6 kbit/s of GSM. What for? Obviously, videotelephony on the move. This service was considered so important that Wideband CDMA (W-CDMA) was designed as a circuit-based communication system, like GSM, to support mobile videotelephony. In Italy a new 3G operator was established that boasted its credentials by advertising its ability to provide video telephony on the move.

More recently Skype has developed a new business model for free internet-based telephony. Video was an early addition to the service and some do indeed use it. Unfortunately, lack of bandwidth forces many to downgrade the audio-video call to voice only.

Cisco has invested a significant amount of money to develop a video conference system that relies on very high bandwidth and provides very high picture quality at high resolution. There has been much hype, but it is hard to see how this experience for high-end corporate users can translate into a mass phenomenon.

Why am I saying all this? Do I want to belittle my first job assignment? Do I have an agenda? Do I want visual communication to be a failure? Am I in search of a self-fulfilling prophecy? Well, not really. It is true that one of the reasons why I started MPEG was that, after more than 15 years of efforts on my part, I saw the inconclusiveness of the visual communication business I had been selected to be part of. On the other hand, I am certain that one day humans will be offered a form of communication that more fully satisfies their desire to interact with other humans physically separated from them. But I am also sure that what has been offered so far has little if anything to do with this fuller form of communication.

Still, the first questions telco managers used to ask when shown a videophone were: how good is the picture quality, how well does it compare with product xyz? Wrong questions! Not that picture quality is irrelevant, but it is the last question to ask and the last element to take into consideration when designing this alternative communication form. Better questions would be: how does the system adapt to lighting changes, how effective is echo suppression, how can parallax be compensated and, if it is a multipoint system, how is presence managed? But even these questions do not go to the core. The real questions are: what are the motivations for a consumer (business can be a different story) to be attracted by such a system, what are the classes of information elements that people want to convey, what does visual information bring that audio does not, how does this communication system fit in the spatial environment of a house, and so on.

This happens because telcos have changed their skin, but their inside is the same. Once I asked a telco executive: why are you investing in ADSL video telephony? What kind of market studies have you carried out? (Not that I believe too much in what is sometimes passed off as a "market study", but I had to start from something that could be considered common ground.) I saw a moment of panic in his eyes and then he said: because we must stop talking about broadband services and start doing something in earnest. Belatedly realising that the logic of his answer was far from compelling, he then added: because our competitor does it. With these two elements of unassailable logic laid on the table, our conversation languished.

Telco people used to be driven (fortunately less so today) by the idea that the wires underground are what drives the business (the few who dared to object to this postulate were eliminated), and it is these people who plan new services. My take is that control of the wires (or of their Hertzian equivalent) is what enables the business, but the driver is elsewhere. This kind of fuller communication, to which visual communication belongs, is the remotest thing possible from telephone wires. It has more to do with the untold reasons why a man buys a certain tie and a woman a certain skirt than it has with wires.

Much is probably to be learned from Cisco's TelePresence service, which allows "everyone, everywhere to be 'present' to make better and faster decisions through one of the most natural and lifelike communications experiences available".