Library

DonNTU Master's student Dmitry Sergeevich Morozov

Thesis topic: A system for modeling a production process flow and organizing work with documents

Supervisor: Sergey Vasilyevich Teplinsky, Candidate of Technical Sciences, Associate Professor, Department of Computer Engineering



Original Article

Content Delivery Techniques


Content Delivery Techniques / IIS Smooth Streaming Technical Overview

Alex Zambelli, Media Technology Evangelist

Microsoft Corporation – March 2009


Media delivery on the Web today uses three general delivery methods: traditional streaming, progressive download, and adaptive streaming.


Traditional Streaming

RTSP (Real-Time Streaming Protocol) is a good example of a traditional streaming protocol. RTSP is defined as a stateful protocol, which means that from the first time a client connects to the streaming server until the time it disconnects from the streaming server, the server keeps track of the client's state. The client communicates its state to the server by issuing it commands such as PLAY, PAUSE or TEARDOWN (the first two are obvious; the last one is used to disconnect from the server and close the streaming session).
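The idea of a server tracking per-client state can be sketched as a small state machine. This is a simplified illustration, not an RTSP implementation: the state names (INIT, READY, PLAYING) and the `StreamingServer` class are assumptions made for the example, and the SETUP command, which the text above does not mention, is included only because a session has to be created before PLAY can be issued.

```python
# Simplified, illustrative session tracking for a stateful streaming server.
# State names and the class are hypothetical; only the command names are RTSP methods.
TRANSITIONS = {
    ("INIT", "SETUP"): "READY",
    ("READY", "PLAY"): "PLAYING",
    ("PLAYING", "PAUSE"): "READY",
    ("READY", "TEARDOWN"): "INIT",
    ("PLAYING", "TEARDOWN"): "INIT",
}


class StreamingServer:
    def __init__(self):
        self.sessions = {}  # session id -> current state

    def handle(self, session_id, command):
        """Apply a client command and return the session's new state."""
        state = self.sessions.get(session_id, "INIT")
        new_state = TRANSITIONS.get((state, command))
        if new_state is None:
            raise ValueError(f"{command} not allowed in state {state}")
        if new_state == "INIT":
            # TEARDOWN closes the session, so the server forgets the client.
            self.sessions.pop(session_id, None)
        else:
            self.sessions[session_id] = new_state
        return new_state


server = StreamingServer()
commands = ["SETUP", "PLAY", "PAUSE", "PLAY", "TEARDOWN"]
states = [server.handle("client-1", c) for c in commands]
print(states)
```

The point of the sketch is that the server, not the client, holds the authoritative state between requests, which is exactly what a plain HTTP server does not do.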

After a session between the client and the server has been established, the server begins sending the media as a steady stream of small packets (the format of these packets is known as RTP). The size of a typical RTP packet is 1452 bytes, which means that in a video stream encoded at 1 megabit per second (Mbps), each packet carries approximately 11.6 milliseconds of video. In RTSP the packets can be transmitted over either UDP or TCP transports. The latter is preferred when firewalls or proxies block UDP packets, but can also lead to increased latency (TCP packets are re-sent until received).
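The per-packet duration quoted above follows from simple arithmetic. The short sketch below reproduces it, assuming for simplicity that all 1452 bytes of the packet carry media (header overhead is ignored); the function name is made up for this example.

```python
def packet_duration_ms(packet_bytes, bitrate_bps):
    """Milliseconds of media carried by one packet at a constant bit rate."""
    return packet_bytes * 8 / bitrate_bps * 1000


duration = packet_duration_ms(1452, 1_000_000)
packets_per_second = 1000 / duration
print(f"{duration:.1f} ms per packet, about {packets_per_second:.0f} packets per second")
```

At 1 Mbps this works out to roughly 11.6 ms of video per packet, or about 86 packets every second.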


Figure 1. RTSP is an example of a traditional streaming protocol.


HTTP, on the other hand, is known as a stateless protocol. If an HTTP client requests some data, the server responds by sending the data, but it won't remember the client or its state. Each HTTP request is handled as a completely standalone one-time session.

Windows Media Services supports streaming over both RTSP and HTTP. But if HTTP is a stateless protocol, how can it be used for streaming? Windows Media Services uses a modified version of HTTP officially known as MS-WMSP (known in Windows Media Services as the Windows Media HTTP Streaming Protocol, or more commonly just as Windows Media HTTP). MS-WMSP uses standard HTTP for transfer of data and messages but also maintains session states, effectively turning it into a streaming protocol like RTSP. Windows Media Services has also supported RTSP streaming since 2003 (in Windows Media Services 9 Series) over both UDP and TCP. Its implementation of the protocol is publicly documented as MS-RTSP.

Silverlight only supports HTTP-based delivery from Windows Media Services.

The most important things to remember about traditional streaming protocols such as RTSP and Windows Media HTTP (MS-WMSP) are:

  • The server sends the data packets to the client at a real-time rate only—that is, the bit rate at which the media is encoded. For example, a video encoded at 500 kilobits per second (kbps) is streamed to clients at approximately 500 kbps.
  • The server only sends ahead enough data packets to fill the client buffer. The client buffer is typically between 1 and 10 seconds (the default buffer length in Windows Media Player and Silverlight is 5 seconds). This means that if you pause a streamed video and wait 10 minutes, still only approximately 5 seconds of video will have downloaded to the client in that time.
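The two points above can be put into numbers. The following is a minimal sketch under the stated assumptions (real-time delivery, a 5-second client buffer, a 500 kbps stream); the function names are hypothetical and the model ignores protocol overhead.

```python
def media_seconds_fetched_while_paused(pause_seconds, buffer_seconds=5.0):
    """A traditional streaming server stops sending once the client buffer is full."""
    return min(pause_seconds, buffer_seconds)


def bytes_for(media_seconds, bitrate_bps):
    """Size in bytes of a span of media encoded at a constant bit rate."""
    return media_seconds * bitrate_bps / 8


ahead = media_seconds_fetched_while_paused(pause_seconds=600)  # paused for 10 minutes
size = bytes_for(ahead, 500_000)
print(f"{ahead} s buffered, {size:.0f} bytes")
```

Even after a 10-minute pause, only the 5 buffered seconds (about 312,500 bytes at 500 kbps) are held on the client.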

Other examples of traditional streaming protocols include Adobe Systems' proprietary Real Time Messaging Protocol (RTMP) and RealNetworks' RTSP over Real Data Transport (RDT) protocol. The Dynamic Streaming stream-switching feature in the Adobe® Flash® Platform is based on RTMP and is, therefore, considered a traditional streaming method, not adaptive streaming.


Progressive Download

Another common form of media delivery on the Web today is progressive download, which is nothing more than a simple file download from an HTTP Web server. Progressive download is supported by most media players and platforms, including Adobe Flash, Silverlight, and Windows Media Player. The term "progressive" stems from the fact that most player clients allow the media file to be played back while the download is still in progress, before the entire file has been fully written to disk (typically to the Web browser cache). Clients that support the HTTP/1.1 specification can also seek to positions in the media file that haven't been downloaded yet by performing byte range requests to the Web server (assuming that it also supports HTTP/1.1).
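A byte range request is an ordinary HTTP GET with a `Range` header. The sketch below only builds such a request with Python's standard `urllib.request` module and does not send it; the URL and the helper name `build_range_request` are placeholders invented for the example.

```python
import urllib.request
from typing import Optional


def build_range_request(url: str, start: int, end: Optional[int] = None) -> urllib.request.Request:
    """Build a GET request for bytes start..end (inclusive), or start..EOF if end is None."""
    byte_range = f"bytes={start}-" if end is None else f"bytes={start}-{end}"
    return urllib.request.Request(url, headers={"Range": byte_range})


# Hypothetical URL; seeking to the second megabyte of the file.
req = build_range_request("http://example.com/video.wmv", 1_000_000, 1_999_999)
print(req.get_method(), req.full_url)
print("Range:", req.get_header("Range"))
```

A server that honors the header answers with status 206 (Partial Content) and only the requested bytes, which is what lets a player jump ahead without downloading everything in between.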

Popular video-sharing Web sites today, including YouTube, Vimeo, MySpace, and MSN Soapbox, almost exclusively use progressive download.

Unlike streaming servers that rarely send more than 10 seconds of media data to the client at a time, HTTP Web servers keep the data flowing until the download is complete. If you pause a progressively downloaded video at the beginning of playback and then wait, the entire video will eventually have downloaded to your browser cache, allowing you to smoothly play the whole video without any hiccups.

There is a downside to this behavior as well: if, 30 seconds into a fully downloaded 10-minute video, you decide that you don't like it and quit the video, both you and your content provider have just wasted 9 minutes and 30 seconds' worth of bandwidth. To try to mitigate this problem, IIS 7.0 provides a cool extension called Bit Rate Throttling, which allows content providers to reduce costs by throttling the download bit rate in exactly the same way that a streaming server would.
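The cost of downloading ahead can be estimated with a rough model. In the sketch below, `fast_start_s` and `rate_ratio` are hypothetical parameters chosen for illustration (an initial unthrottled burst, then delivery at a multiple of the encoded bit rate); they are not the actual configuration settings of the IIS extension.

```python
def wasted_media_seconds(video_s, watched_s, downloaded_s):
    """Seconds of media that were downloaded but never watched."""
    downloaded_s = min(downloaded_s, video_s)
    return max(0.0, downloaded_s - watched_s)


def throttled_download_s(elapsed_s, fast_start_s, rate_ratio):
    """Media seconds fetched after elapsed_s of wall-clock time under throttling.

    Simplification: the initial burst is treated as instantaneous.
    """
    return fast_start_s + elapsed_s * rate_ratio


# Viewer quits 30 seconds into a 10-minute (600 s) video.
full = wasted_media_seconds(600, 30, downloaded_s=600)
throttled = wasted_media_seconds(600, 30, throttled_download_s(30, fast_start_s=20, rate_ratio=1.0))
print(f"unthrottled: {full} s wasted, throttled: {throttled} s wasted")
```

Under these assumed settings the waste drops from 570 seconds of media to 20 seconds.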


HTTP-Based Adaptive Streaming

Adaptive streaming is a hybrid delivery method that acts like streaming but is based on HTTP progressive download. It's an advanced concept that uses HTTP rather than a new protocol. Both IIS Smooth Streaming and Move Networks Adaptive Stream are examples of adaptive streaming. Even though the two technologies use different codecs, formats, and encryption schemes, they both rely on HTTP as the transport protocol and perform the media download as a long series of very small progressive downloads, rather than one big progressive download.

In a typical adaptive streaming implementation, the video/audio source is cut into many short segments ("chunks") and encoded to the desired delivery format. Chunks are typically 2 to 4 seconds long. At the video codec level, this typically means that each chunk is cut along video GOP (Group of Pictures) boundaries (each chunk starts with a key frame) and has no dependencies on past or future chunks/GOPs. This allows each chunk to later be decoded independently of other chunks.
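Cutting along GOP boundaries can be illustrated with key-frame timestamps alone. This is a minimal sketch, assuming a sorted list of key-frame times that starts at 0; the function name and the sample timestamps are invented for the example, and no real encoder or container format is involved.

```python
def chunk_boundaries(keyframe_times, total_duration, target_s=2.0):
    """Return (start, end) pairs in seconds; every chunk starts on a key frame.

    A new chunk begins at the first key frame that lies at least target_s
    after the start of the current chunk.
    """
    starts = []
    for t in keyframe_times:
        if not starts or t - starts[-1] >= target_s:
            starts.append(t)
    ends = starts[1:] + [total_duration]
    return list(zip(starts, ends))


keyframes = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical 1-second GOPs
chunks = chunk_boundaries(keyframes, total_duration=7.0, target_s=2.0)
print(chunks)
```

Because each chunk begins on a key frame and the GOPs are closed, a decoder can start at any chunk without needing data from its neighbors. The final chunk may be shorter than the target.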

The encoded chunks are hosted on an HTTP Web server. A client requests the chunks from the Web server in a linear fashion and downloads them using plain HTTP progressive download. As the chunks are downloaded to the client, the client plays back the sequence of chunks in linear order. Because the chunks are carefully encoded without any gaps or overlaps between them, the chunks play back as a seamless video.

The "adaptive" part of the solution comes into play when the video/audio source is encoded at multiple bit rates, generating multiple chunks of various sizes for each 2-to-4-second segment of video. The client can now choose between chunks of different sizes. Because Web servers usually deliver data as fast as network bandwidth allows them to, the client can easily estimate user bandwidth and decide to download larger or smaller chunks ahead of time. The size of the playback/download buffer is fully customizable.
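One possible client-side heuristic is sketched below: measure the throughput of the last chunk download and pick the highest encoded bit rate that fits under it. The bit rate ladder, the 0.8 safety factor, and both function names are assumptions made for illustration; the article does not describe the selection logic any actual player uses.

```python
def estimate_bandwidth_bps(chunk_bytes, download_seconds):
    """Throughput observed while downloading one chunk, in bits per second."""
    return chunk_bytes * 8 / download_seconds


def choose_bitrate(available_bps, measured_bps, safety=0.8):
    """Highest available bit rate not exceeding a safety fraction of the measured bandwidth.

    Falls back to the lowest bit rate when even that one does not fit.
    """
    usable = measured_bps * safety
    candidates = [b for b in sorted(available_bps) if b <= usable]
    return candidates[-1] if candidates else min(available_bps)


ladder = [350_000, 700_000, 1_500_000, 3_000_000]  # hypothetical encoded bit rates
measured = estimate_bandwidth_bps(chunk_bytes=500_000, download_seconds=2.0)
next_bitrate = choose_bitrate(ladder, measured)
print(f"measured {measured:.0f} bps, next chunk at {next_bitrate} bps")
```

Here a 500,000-byte chunk that arrives in 2 seconds implies about 2 Mbps of bandwidth, so the next chunk is requested at 1.5 Mbps. A real player would typically also weigh buffer fullness and CPU load.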


Figure 2. Adaptive streaming is a hybrid media delivery method.


Adaptive streaming, like other forms of HTTP delivery, offers the following advantages over traditional streaming to the content distributor:

  • It's cheaper to deploy because adaptive streaming can use generic HTTP caches/proxies and doesn't require specialized servers at each node.
  • It offers better scalability and reach, reducing "last mile" issues because it can dynamically adapt to inferior network conditions as it gets closer to the user's home.
  • It lets the audience adapt to the content, rather than requiring content providers to guess which bit rates are most likely to be accessible to their audience.

It also offers the following benefits for the user:

  • Fast start-up and seek times because start-up/seeking can be initiated on the lowest bit rate before moving to a higher bit rate.
  • No buffering, no disconnects, no playback stutter (as long as the user meets the minimum bit rate requirement).
  • Seamless bit rate switching based on network conditions and CPU capabilities.
  • A generally consistent, smooth playback experience.

Microsoft created a prototype implementation of HTTP-based adaptive streaming for the NBC 2008 Beijing Summer Olympic Games Web site. To meet the project's rapid development schedule, this implementation was very straightforward. NBC used Digital Rapids and Anystream encoders to produce multiple Windows Media Video (WMV) files of different bit rates/resolutions for each source. The encoders didn't employ any new encoding tricks but merely followed strict encoding guidelines (closed GOP, fixed-length GOP, VC-1 entry point headers, and so on), which ensured exact frame alignment across the various bit rates of the same video. These WMV files were run through a post-processing tool that physically split each WMV file into thousands of 2-second chunks (files). The rest of the solution consisted of uploading the chunks to the CDN's Web servers and then building a Silverlight player that would download the chunks and play them in sequence.

With this implementation, NBC and Microsoft were able to offer a better-than-WMS streaming experience while using just simple HTTP download, with increased average content viewing times that directly translated to better advertising and monetization opportunities.

However, CDN operators lost many hours managing the millions of tiny files in their systems. Imagine: if every 2 seconds of video is split into a separate file and this is repeated for 5 available bit rates, you end up with 150 files for each minute of video. That's 13,500 files for a 90-minute soccer game!
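The file counts above can be checked with a few lines of arithmetic. The function name is made up for this sketch.

```python
import math


def chunk_file_count(duration_minutes, chunk_seconds, bitrate_count):
    """Number of chunk files when every chunk of every bit rate is a separate file."""
    chunks_per_stream = math.ceil(duration_minutes * 60 / chunk_seconds)
    return chunks_per_stream * bitrate_count


per_minute = chunk_file_count(1, 2, 5)
per_game = chunk_file_count(90, 2, 5)
print(per_minute, per_game)
```

One minute yields 30 chunks per bit rate, or 150 files across 5 bit rates; a 90-minute game yields 13,500.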

So despite the NBC Olympics site being a huge success for Silverlight and HTTP-based adaptive streaming, it quickly became apparent that to productize this solution and offer improved file-management benefits, elementary design changes were required.