Coping with an Unpredictable Internet


Dynamic Adaptive Streaming over HTTP (DASH) is a media-streaming standard in which the client controls the flow of information between server and client. Clients request data over HTTP from standard web servers, which need not have any DASH-specific capabilities. The DASH standard primarily defines two formats:

The Media Presentation Description (MPD): a format to announce resource identifiers (HTTP-URLs) for Segments (media chunks) and to provide the context for these identified resources within a Media Presentation;
The Segment formats: the formats of the entity body of the HTTP response to an HTTP GET request, or to a partial HTTP GET with an indicated byte range, issued over HTTP/1.1 for a resource identified in the MPD.
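
The two request styles named in the list above, a plain GET for a whole Segment and a partial GET for a byte range, can be sketched with Python's standard library. This is a minimal illustration, not part of the DASH standard: the function name `build_segment_request` and the example.com URL are hypothetical, and the request is only constructed, never sent.

```python
from urllib.request import Request


def build_segment_request(url, first_byte=None, last_byte=None):
    """Build a GET for a whole Segment, or a partial GET if a byte range is given."""
    request = Request(url, method="GET")
    if first_byte is not None:
        # HTTP byte ranges are inclusive; an open-ended range omits the last byte.
        end = "" if last_byte is None else str(last_byte)
        request.add_header("Range", f"bytes={first_byte}-{end}")
    return request


# Hypothetical Segment URL, as it might be announced in an MPD.
whole = build_segment_request("http://example.com/video/seg-1.m4s")
part = build_segment_request("http://example.com/video/seg-1.m4s", 0, 999)

print(whole.get_header("Range"))  # None
print(part.get_header("Range"))   # bytes=0-999
```

Because both styles are ordinary HTTP requests, a standard web server that supports range requests can serve them without any DASH-specific logic.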

The MPD provides sufficient information for a “DASH Client” to provide a streaming service to the user by accessing the Segments through HTTP/1.1. Figure 1 shows a possible DASH deployment architecture. In the figure:

Solid-line boxes are referenced in the DASH standard as they host/process DASH formats;
Dashed boxes are conceptual, transparent, or outside the scope of the standard.

Figure 1 – DASH model

The collection of encoded and deliverable versions of media content, together with the appropriate description of these, forms a Media Presentation. Media content is composed of one or more contiguous media content periods in time. Each media content period is composed of one or more media content components, for example audio components in various languages and a video component. Each media content component has an assigned media content component type, for example audio or video.

Each media content component may have several encoded versions, referred to as media streams. Each media stream inherits the properties of the media content, the media content period and the media content component from which it was encoded, and in addition is assigned the properties of the encoding process, such as sub-sampling, codec parameters and encoding bitrate. This descriptive metadata is relevant for static and dynamic selection of media content components and media streams.

Figure 2 — DASH High-Level Data Model
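
As a rough companion to Figure 2, the hierarchy that the following text describes (Media Presentation, Period, Adaptation Set, Representation) can be sketched as nested data classes. All class and field names here are illustrative assumptions rather than names taken from the standard, and the throughput-based selection rule is just one simple way a client might adapt; DASH leaves that policy to the client.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Representation:
    rep_id: str
    bandwidth_bps: int  # encoding bitrate of this version
    segment_urls: List[str] = field(default_factory=list)


@dataclass
class AdaptationSet:
    content_type: str  # e.g. "video" or "audio"
    representations: List[Representation] = field(default_factory=list)


@dataclass
class Period:
    start_s: float
    adaptation_sets: List[AdaptationSet] = field(default_factory=list)


@dataclass
class MediaPresentation:
    periods: List[Period] = field(default_factory=list)


def pick_representation(adaptation_set, throughput_bps):
    """Assumed policy: highest bitrate that fits the throughput, else the lowest."""
    ordered = sorted(adaptation_set.representations, key=lambda r: r.bandwidth_bps)
    fitting = [r for r in ordered if r.bandwidth_bps <= throughput_bps]
    return fitting[-1] if fitting else ordered[0]


video = AdaptationSet("video", [
    Representation("v-low", 500_000),
    Representation("v-mid", 1_500_000),
    Representation("v-high", 4_000_000),
])
presentation = MediaPresentation([Period(0.0, [video])])

print(pick_representation(video, 2_000_000).rep_id)  # v-mid
```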

A DASH Media Presentation [1] is described by a Media Presentation Description [2] (MPD). The MPD describes the sequence of Periods [3] in time that make up the Media Presentation. A Period typically represents a media content period.

Within a Period, material is arranged into Adaptation Sets [4]. If other material is available beyond the main audio and video, for example captions or audio descriptions, then each of these may have a separate Adaptation Set. Material may also be provided in multiplexed form, in which case interchangeable versions of the multiplex may be described as a single Adaptation Set, for example an Adaptation Set containing both the main audio and main video for a Period. Each of the multiplexed components may be described individually by a Media Content Component Description.

An Adaptation Set contains a set of Representations [5]. A Representation describes a deliverable encoded version of one or several media content components, and includes one or more media streams (one for each media content component in the multiplex). Any single Representation within an Adaptation Set is sufficient to render the contained media content components. Typically, clients switch from Representation to Representation within an Adaptation Set in order to adapt to network conditions or other factors.

Within a Representation, the content may be divided in time into Segments [6]. A URL is provided for each Segment, meaning that a Segment is the largest unit of data that can be retrieved with a single HTTP request.

DASH defines different timelines. One key feature of DASH is that encoded versions of different Media Content Components [7] share a common timeline. The Presentation Time [8] of each Access Unit [9] within the media content is mapped to the global common presentation timeline, both to synchronize different media components and to enable seamless switching between different coded versions of the same media components.
This timeline is referred to as the Media Presentation Timeline [10]. The Media Segments themselves contain accurate Media Presentation timing information, enabling synchronization of components and seamless switching.

A second timeline is used to signal to clients the availability time of Segments at the specified HTTP-URLs. These times are referred to as Segment Availability Times [11] and are provided in wall-clock time. Clients typically compare the wall-clock time to the Segment Availability Times before accessing the Segments at the specified HTTP-URLs. For On-Demand services with a static MPD, the availability times of all Segments are identical. For live services, where the MPD is updated, the availability time of a Segment depends on its position in the Media Presentation Timeline.

Segments are assigned a duration, which is the duration of the media contained in the Segment when presented at normal speed. Typically all Segments in a Representation have the same or roughly similar duration. However, Segment duration may differ from Representation to Representation. A DASH presentation can be constructed with relatively short Segments (for example a few seconds), or with longer Segments, including a single Segment for the whole Representation.

Short Segments are usually required for live content, where there are restrictions on end-to-end latency. The duration of a Segment is typically a lower bound on the end-to-end latency. DASH does not allow Segments to be extended over time: a Segment is a complete and discrete unit that must be made available in its entirety.

Segments may be further subdivided into Subsegments [12], each of which contains a whole number of complete access units. There may also be media-format-specific restrictions on Subsegment boundaries; for example, in the ISO Base Media File Format a Subsegment must contain a whole number of complete movie fragments.
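
The wall-clock comparison that live clients make against Segment Availability Times, described earlier in this passage, can be sketched as follows. The sketch rests on simplifying assumptions that are not stated in the text: all Segments have the same duration, and a Segment becomes available the moment its media has been fully produced. The function names and the start time are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def segment_available_at(availability_start, segment_index, segment_duration_s):
    """Assumed model: Segment i (0-based) is available once it is fully produced."""
    offset = (segment_index + 1) * segment_duration_s
    return availability_start + timedelta(seconds=offset)


def newest_available_index(availability_start, now, segment_duration_s):
    """Index of the newest fully available Segment, or None if there is none yet."""
    elapsed = (now - availability_start).total_seconds()
    count = int(elapsed // segment_duration_s)
    return count - 1 if count > 0 else None


# Hypothetical live service that started at noon UTC with 4-second Segments.
start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
now = start + timedelta(seconds=25)

print(newest_available_index(start, now, 4.0))  # 5
```

Under this model a client that is 25 seconds into the service can request Segments 0 to 5, while Segment 6 is still being produced. This is also why the Segment duration acts as a lower bound on latency.
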
If a Segment is divided into Subsegments, these are described by a compact Segment index [13], which provides the presentation time range in the Representation and the corresponding byte range in the Segment occupied by each Subsegment. Clients may download this index in advance and then issue requests for individual Subsegments.

Clients may switch from Representation to Representation within an Adaptation Set at any point in the media. However, switching at arbitrary positions may be complicated because of coding dependencies within Representations and other factors. It is also desirable to avoid downloading ‘overlapping’ data, i.e. media for the same time period from multiple Representations. Usually, switching is simplest at a random access point in the new stream. To formalize requirements related to switching, DASH defines a codec-independent concept of Stream Access Point [14].

Segmentation and sub-segmentation may be performed in ways that make switching simpler. For example, in the very simplest cases each Segment or Subsegment begins with a random access point and the boundaries of Segments or Subsegments are aligned across the Representations of an Adaptation Set. In this case, switching Representation involves playing to the end of a (Sub)Segment of one Representation and then playing from the beginning of the next (Sub)Segment of the new Representation. The Media Presentation Description and Segment index provide various indications that describe properties of the Representations that may make switching simpler. Profiles of the DASH specification may then require these indicators to be set in certain ways, making implementation of clients for those profiles simpler at the cost of requiring the media data to obey the indicated constraints.

For On-Demand services, the MPD is a static document describing the various aspects of the Media Presentation. All Segments of the Media Presentation are available on the server once any Segment is available.
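
The time-range to byte-range mapping that a Segment index provides, described earlier in this passage, can be sketched as a sorted lookup. The in-memory layout below is an assumption made for illustration and is not the binary index format of any media container; `SubsegmentEntry`, `find_subsegment` and the byte offsets are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple


class SubsegmentEntry(NamedTuple):
    start_time_s: float  # presentation time at which the Subsegment starts
    duration_s: float
    first_byte: int
    last_byte: int       # inclusive, matching HTTP byte-range semantics


def find_subsegment(index, time_s):
    """Return the entry whose presentation time range contains time_s."""
    starts = [entry.start_time_s for entry in index]
    pos = bisect_right(starts, time_s) - 1
    if pos < 0:
        raise ValueError("time precedes the first Subsegment")
    entry = index[pos]
    if time_s >= entry.start_time_s + entry.duration_s:
        raise ValueError("time is past the end of the indexed Segment")
    return entry


# Hypothetical index: three contiguous 2-second Subsegments.
index = [
    SubsegmentEntry(0.0, 2.0, 800, 120_799),
    SubsegmentEntry(2.0, 2.0, 120_800, 251_099),
    SubsegmentEntry(4.0, 2.0, 251_100, 365_499),
]

entry = find_subsegment(index, 3.2)
print(f"bytes={entry.first_byte}-{entry.last_byte}")  # bytes=120800-251099
```

A client that has downloaded the index in advance can turn the result directly into the byte range of a partial HTTP GET.
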
For live services, however, Segments become available with time as the content is produced. The MPD may be updated regularly to reflect changes in the presentation over time; for example, Segment URLs for new Segments may be added to the MPD and those for old, no longer available Segments may be removed. However, if Segment URLs are described using a template, this updating may not be necessary except for some redundancy/failover cases.
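
To see why a template can make MPD updates unnecessary, consider the sketch below, in which the client derives each new Segment URL itself instead of reading it from a refreshed MPD. The placeholders `$RepresentationID$` and `$Number$` follow the identifier style of DASH segment templates, but the expansion here is deliberately simplified (no other identifiers, no formatting options), and the helper names and URL are hypothetical.

```python
def expand_template(template, representation_id, number):
    """Substitute the two placeholders this sketch supports."""
    return (template
            .replace("$RepresentationID$", representation_id)
            .replace("$Number$", str(number)))


def segment_urls(template, representation_id, first_number, count):
    """URLs for `count` consecutive Segments starting at `first_number`."""
    return [expand_template(template, representation_id, n)
            for n in range(first_number, first_number + count)]


# Hypothetical template, as it might appear once in an MPD.
template = "http://example.com/live/$RepresentationID$/seg-$Number$.m4s"

for url in segment_urls(template, "v-mid", 101, 3):
    print(url)
# http://example.com/live/v-mid/seg-101.m4s
# http://example.com/live/v-mid/seg-102.m4s
# http://example.com/live/v-mid/seg-103.m4s
```
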
[1] Media Presentation: collection of data that establishes a bounded or unbounded presentation of media content
[2] Media Presentation Description: formalized description for a Media Presentation for the purpose of providing a streaming service
[3] Period: interval of the Media Presentation, during which a consistent set of encoded versions of the media content is available i.e. the set of available bitrates, languages, captions, subtitles etc. does not change during a Period
[4] Adaptation Set: a set of interchangeable encoded versions of one or several Media Content Components. For example there may be one Adaptation Set for the main video component and a separate one for the main audio component
[5] Representation: collection and encapsulation of one or more media streams in a delivery format and associated with descriptive metadata
[6] Segment: unit of data associated with an HTTP-URL and optionally a byte range, meaning that the Segment is contained in the provided byte range of some larger resource
[7] Media Content Component: one continuous component of the media content with an assigned media component type that can be encoded individually into a media stream
[8] Presentation Time: a time associated to an access unit that maps it to the Media Presentation timeline
[9] Access Unit: unit of a media stream with an assigned Media Presentation time
[10] Media Presentation Timeline: concatenation of the timeline of all Periods which itself is common to all Representations in the Period
[11] Segment Availability Time: the time at which a Segment becomes available at the specified HTTP-URLs
[12] Subsegment: a portion of a Segment which contains a whole number of complete Access Units
[13] Segment index: a compact index of the time range to byte range mapping within a Media Segment separately from the MPD
[14] Stream Access Point: position in a Representation enabling playback of a media stream to be started using only the information contained in Representation data starting from that position onwards (preceded by initializing data in the Initialization Segment, if any)
