Last update: 2011/08/21
More video compression standards have followed MPEG-1 Video, MPEG-2 Video and MPEG-4 Visual.
During the development of MPEG-4, several liaison statements were sent to ITU-T proposing to work together on the new MPEG-4 Visual standard and even on the MPEG-4 Audio standard, specifically on the speech coding part. As these offers received no response, MPEG continued the development of MPEG-4 alone.
For several years the Video Coding Experts Group (VCEG) of Study Group 16 of ITU-T worked from the ground up on the development of new video compression technologies and achieved a breakthrough in compression performance around the turn of the century. In spite of the lack of official answers from ITU-T to our liaison statements, I decided that it would be in the industry's interest to establish a working relationship. When Thomas Sikora left MPEG I appointed Gary Sullivan, the VCEG rapporteur, as MPEG Video chair, with the aim of achieving a belated convergence of the ITU-T and MPEG efforts in the area of video coding.
At the July 2001 meeting MPEG reviewed the results of video compression viewing tests designed to assess whether there was evidence of advances in video coding technology that warranted the start of a new video coding project. Following the positive result of the review, a Call for Proposals was issued and, in December, the Joint Video Team (JVT), composed of MPEG and VCEG members, was established.
With an intense schedule of meetings, the JVT managed to achieve the Final Draft International Standard stage of the new Advanced Video Coding (AVC) standard in March 2003. So AVC became part 10 of MPEG-4.
The AVC standard specifies
The most important control data are
Fig. 1 - The AVC layers
AVC is based on the so-called "block based hybrid video coding" approach where a coded video sequence consists of an independently-coded sequence of coded pictures.
Fig. 2 shows the innovation brought about by point 8 in the list above.
Fig. 2 - Multiple reference frames in AVC
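The idea of choosing among several stored reference frames can be illustrated with a minimal Python sketch. It is not taken from the AVC specification: the function names `sad` and `best_reference` are hypothetical, only co-located blocks are compared (no motion search), and the sum of absolute differences is assumed as the matching cost, whereas a real encoder would typically combine motion search with a rate-distortion criterion.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_reference(current_block, candidate_blocks):
    """Return (index, cost) of the stored reference block that matches best.

    candidate_blocks holds one co-located block per stored reference frame
    (an illustrative simplification: no motion search is performed).
    """
    costs = [sad(current_block, ref) for ref in candidate_blocks]
    index = min(range(len(costs)), key=costs.__getitem__)
    return index, costs[index]


current = [[10, 12], [11, 13]]
references = [
    [[0, 0], [0, 0]],        # oldest stored frame
    [[10, 12], [11, 14]],    # an earlier frame that happens to match well
    [[20, 20], [20, 20]],    # most recent frame
]
ref_idx, cost = best_reference(current, references)
```

In this toy case the best match is not the most recent frame, which is the situation in which keeping more than one reference frame pays off.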
Fig. 3 shows the innovation brought about by point 6 in the list above.
Fig. 3 - Variable macroblock size in AVC
Following requests from the industry several profiles have been defined, some of which are
The AVC endeavour kept its promise of halving the bitrate required by MPEG-2 Video for comparable quality.
At the same meeting at which the JVT was established, MPEG started an investigation into video scalability that eventually led to the development of requirements for Scalable Video Coding (SVC). In a nutshell these imply that the encoded video stream should be structured into a base layer stream, decodable by a non-scalable decoder, and one or more enhancement layer streams, decodable by a decoder conforming to the SVC standard. A Call for Proposals was issued and this work item, too, was entrusted to the JVT, even though the SVC standard can actually work on top of any base layer video coding standard.
SVC is based on a layered representation with multiple dependencies. Temporal scalability requires frame hierarchies, so that frames not used as references for the prediction of layers that are still present can be skipped. This is indicated in Fig. 4, where removing the pictures marked “B3” reduces the frame rate by a factor of 2, additionally removing those marked “B2” reduces it by a factor of 4, and so on.

Fig. 4 - SVC frame hierarchy
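The way a frame hierarchy enables temporal scalability can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SVC bitstream extraction process: it assumes a dyadic hierarchy with a group of 8 pictures, and the names `temporal_level` and `extract` are hypothetical. Under that assumption, each temporal level that is dropped halves the frame rate.

```python
def temporal_level(index, gop_size=8):
    """Temporal level of a frame in a dyadic hierarchy (0 = key picture).

    Assumes gop_size is a power of two; with gop_size=8 the levels of
    frames 0..8 are 0, 3, 2, 3, 1, 3, 2, 3, 0.
    """
    pos = index % gop_size
    if pos == 0:
        return 0
    level = gop_size.bit_length() - 1  # log2(gop_size) for a power of two
    while pos % 2 == 0:
        pos //= 2
        level -= 1
    return level


def extract(num_frames, max_level, gop_size=8):
    """Indices of the frames kept when all levels above max_level are dropped."""
    return [i for i in range(num_frames)
            if temporal_level(i, gop_size) <= max_level]


full = extract(16, 3)      # all 16 frames
half = extract(16, 2)      # highest level removed
quarter = extract(16, 1)   # two highest levels removed
```

Because no kept frame is predicted from a frame at a higher level, the remaining frames stay decodable after the higher levels are discarded.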
SVC offers a high degree of flexibility in terms of scalability dimensions, e.g. it supports various temporal/spatial resolutions, Signal-to-Noise Ratio (SNR)/fidelity levels and global/local Region of Interest (ROI) access. SVC performs significantly better and is much more flexible in terms of number of layers and combination of scalable modes than the scalable versions of MPEG-2 Video and MPEG-4 Visual, while the penalty in compression performance, as compared to single-layer coding, is almost negligible.
For the purpose of spatial scalability, the video is first downsampled to the required spatial resolution(s). The ratio between frame heights/widths of the respective resolutions does not need to be dyadic (factor of two). Encoding as well as decoding starts at the lowest resolution, where an AVC compatible “base layer” bitstream will typically be used. For the respective next-higher “enhancement layer”, three decoded component types are used for inter-layer prediction from the lower layer:

Fig. 4 - SVC block diagram
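The layered principle behind spatial scalability can be shown with a toy Python sketch. It is not the SVC inter-layer prediction scheme: it uses nearest-neighbour resampling, skips transform and quantization entirely, and the names `resample`, `encode_layers` and `decode_enhancement` are hypothetical. It only illustrates that a lower-resolution base layer can be decoded on its own, that the enhancement layer carries what the upsampled base layer fails to predict, and that the resolution ratio need not be a factor of two.

```python
def resample(image, out_h, out_w):
    """Nearest-neighbour resampling of a 2-D list to out_h x out_w."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def encode_layers(image, base_h, base_w):
    """Split an image into a base layer and an enhancement residual."""
    base = resample(image, base_h, base_w)
    prediction = resample(base, len(image), len(image[0]))
    residual = [
        [p - q for p, q in zip(row_i, row_p)]
        for row_i, row_p in zip(image, prediction)
    ]
    return base, residual


def decode_enhancement(base, residual):
    """Rebuild the full-resolution image from base layer plus residual."""
    prediction = resample(base, len(residual), len(residual[0]))
    return [
        [p + r for p, r in zip(row_p, row_r)]
        for row_p, row_r in zip(prediction, residual)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
base, residual = encode_layers(image, 2, 2)   # 3:2 ratio, not dyadic
restored = decode_enhancement(base, residual)
```

A decoder that understands only the base layer would stop after `base`; a scalable decoder adds the residual to recover the higher resolution.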
Seminal work on Multiview Video Coding was already carried out for MPEG-2, and more work was done for MPEG-4 Visual. Multiview Video Coding (MVC) was then added to AVC as an extension of the standard that provides efficient coding of multiview video. The overall structure of MVC, defining the interfaces, is illustrated in Fig. 5.

Fig. 5 - MVC model
The encoder receives N temporally synchronized video streams and generates one bitstream. The decoder receives the bitstream, decodes it and outputs N video signals that can be used for different purposes: to generate one view, N views or a stereo view.
Prediction across views, as shown in Fig. 6, is used to exploit inter-camera redundancy, with the limitations that inter-view prediction is only performed from pictures of the same time instance and that the number of reference pictures cannot exceed the maximum number of stored reference pictures.

Fig. 6 - MVC prediction
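The two constraints on prediction across views can be made concrete with a short Python sketch. It is an illustration only, not the MVC reference list construction process: pictures are modelled as hypothetical `(view, time)` pairs, the function name `reference_candidates` is invented, and the ordering (temporal references first, then inter-view references, each nearest first) is an assumption made for the example.

```python
def reference_candidates(view, time, decoded, max_refs):
    """List the pictures that may serve as references for picture (view, time).

    decoded  -- iterable of (view, time) pairs already decoded and stored
    max_refs -- maximum number of stored reference pictures

    Temporal references come from the same view at other time instances;
    inter-view references come from other views at the same time instance.
    Pictures that differ in both view and time are never candidates.
    """
    temporal = sorted(
        (p for p in decoded if p[0] == view and p[1] != time),
        key=lambda p: (abs(p[1] - time), p[1]),
    )
    inter_view = sorted(
        (p for p in decoded if p[1] == time and p[0] != view),
        key=lambda p: (abs(p[0] - view), p[0]),
    )
    return (temporal + inter_view)[:max_refs]


decoded = [(0, 0), (0, 1), (1, 0)]
refs = reference_candidates(view=1, time=1, decoded=decoded, max_refs=4)
capped = reference_candidates(view=1, time=1, decoded=decoded, max_refs=1)
```

Here picture `(0, 0)` is excluded because it belongs to another view and another time instance, and `capped` shows the effect of the limit on stored reference pictures.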
The base view is independent of any other view and is AVC compatible, so it can be extracted to provide a compatible 2D representation of the 3D content.