Streaming Media for Web Based Training

Chad Childers
Ford Motor Company

Frank Rizzo
National Tech Team


Streaming audio became available on the web in 1995, but with the development of the Synchronized Multimedia Integration Language (SMIL) [1], the technology has reached a new level of maturity. SMIL is based on the XML standard and allows audio, video, JPEG images, and text to be integrated in a single presentation. The implications for web based pedagogy are tremendous. We now have the opportunity to do training on the desktop in a feature rich environment.

We will give a basic background in streaming media technology, discuss the standards, the current state of the art, our experience with RealNetworks[2], how to take advantage of it, and future developments, including the integration of testing engines.

Taking Web Based Training into the 21st Century

In the online world of 1999, what you can do is both empowered and constrained by the technology. A good understanding of the limits of the viewing software, end user hardware, and intervening network allows the instructional designer to make the best possible use of what is available - to push the limits while balancing speed and usability. We are now entering a whole new world, where those limits are being pushed back rapidly, and anything is possible. It is still very important to understand the limits, for now, but it is equally important to break down our preconceived notions of what is possible. For a moment, imagine that anything is possible!

The new SMIL standard allows multimedia content, including text, pictures, sound, and video to all be synchronized for a coherent learning experience. Control of all these media is contained in a simple text file, and tools are rapidly appearing.

In addition, it can greatly reduce the bandwidth required while delivering an experience similar to watching a fully interactive television channel.

Definitions, History, and Current Status

Streaming media is defined as network based data which can be used before the data file has finished transferring. If you see a picture begin to appear on your screen before the transfer completes (as is possible with PNG or some JPEG graphics), or hear an audio file start playing as soon as you click it (the most common application, with RealAudio), that is an example of streaming media.

The primary advantage of streaming is that large audio and video files can be played as they arrive on the computer rather than waiting for a large file to download, then waiting for it to play. For training in particular, this means that the user interface is much more responsive.
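
The difference between the two approaches can be illustrated with a minimal Python sketch. It is only a simulation under stated assumptions: the chunk size, the function names, and the event log are hypothetical and do not correspond to any RealNetworks API.

```python
from typing import Iterable, Iterator, List


def chunks(data: bytes, size: int) -> Iterator[bytes]:
    """Yield the file in pieces, standing in for data arriving from the network."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def download_then_play(source: Iterable[bytes]) -> List[str]:
    """Wait for the whole file to arrive, then play it."""
    events: List[str] = []
    received = []
    for chunk in source:
        received.append(chunk)
        events.append("receive")
    for _ in received:
        events.append("play")
    return events


def stream(source: Iterable[bytes]) -> List[str]:
    """Play each piece as soon as it arrives."""
    events: List[str] = []
    for _ in source:
        events.append("receive")
        events.append("play")
    return events


if __name__ == "__main__":
    clip = bytes(10_000)  # hypothetical 10,000-byte audio clip
    # Position of the first "play" event in each log:
    print(download_then_play(chunks(clip, 1_000)).index("play"))
    print(stream(chunks(clip, 1_000)).index("play"))
```

In the streamed case playback begins after the first piece arrives; in the download case it begins only after the last one.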

A streaming request can occur in a variety of ways. If a user clicks on a WWW link to a RealAudio sound file, the player will start as a plug-in, and the data will be delivered in the usual way, over the web's HTTP protocol. Or the user could have access to the Mbone, a dedicated IP multicast network, and use a special viewer to watch broadcasts of meetings or events. In the next generation of tools and protocols, however, an integrated, synchronized page including any kind of media is presented, rather than one piece at a time.

SMIL is important for several reasons: first, because it integrates all kinds of media; second, because it is an open standard that has gone through the World Wide Web Consortium (W3C) proposal process; and finally, because it is an XML language. XML, the Extensible Markup Language, is itself an open W3C standard [3]. Simply put, anything can be defined in XML, and it can be extended on the fly simply by defining new tags. If you need to add a new media type, just write the XML definition, and it's immediately usable.

A SMIL player can act like any other plug-in, and display SMIL content over an HTTP connection. But it can also subscribe to a host group and view an IP multicast, or negotiate the control connection and then open a one-way RTSP (Real Time Streaming Protocol) connection to stream the data.

The TCP/IP protocol upon which the Internet is based generally works in a one-to-one relationship. Even though all data to or from computers on a shared local network go across the same wire, they take turns, and each packet is labeled with a header telling which computer sent the packet and which computer should receive the packet. [4]
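
As a rough illustration of this addressing scheme, the following Python sketch models packets on a shared wire. The Packet class, its field names, and the addresses are hypothetical simplifications; a real IP header carries many more fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    source: str        # which computer sent the packet
    destination: str   # which computer should receive it
    payload: bytes


def packets_for(host: str, wire: List[Packet]) -> List[Packet]:
    """Every host sees the shared wire but keeps only packets addressed to it."""
    return [p for p in wire if p.destination == host]


# Hypothetical traffic on one shared local network.
wire = [
    Packet("10.0.0.1", "10.0.0.2", b"lesson page"),
    Packet("10.0.0.3", "10.0.0.1", b"request"),
    Packet("10.0.0.1", "10.0.0.2", b"audio clip"),
]
```

Here `packets_for("10.0.0.2", wire)` returns the two packets sent to that host and ignores the third.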

The TCP/IP network is reliable over a wide variety of physical networks because these packets can be retransmitted by TCP, the Transmission Control Protocol. Early papers on Real-Time Video concluded that the WWW was not suitable for this sort of media, because of the implicit delays.[5]

With IP multicast, a streaming server can send a broadcast message across the network, allowing multiple computers to receive it. New protocols such as RTP allow some packets to expire without retransmission, and new routers or IP tunneling allow the data to go across the network to multiple destinations with only one destination address in the header, which specifies a multicast host group instead of an individual computer.
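
For readers who want to see what subscribing to a host group looks like in practice, here is a minimal Python sketch using the standard socket module. The group address and port in the usage note are hypothetical, and the sketch covers only joining the group, not RTP itself.

```python
import ipaddress
import socket
import struct


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte request that asks the network stack to join a host group."""
    if not ipaddress.IPv4Address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast host group address")
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket that receives datagrams addressed to the host group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

A receiver would call, for example, `open_multicast_receiver("239.1.2.3", 5004)` and then read datagrams with `recvfrom`. The sender addresses each packet once to the group, and every subscribed computer receives a copy.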

SMIL, the best direction for Web Based Training

SMIL is a W3C open standard, which means that it builds on the existing base of XML tools and experience and allows different products from different companies, running on different computers, to interoperate seamlessly. Because the control files are plain text with tags, just like HTML, they are very easy to index and edit. SMIL also allows HTML to be used inline and can easily be extended for other applications (such as testing) within a well defined framework. Applications which support open standards tend to be available at low or no cost for academic uses, or you can select a high-end package that supports the same standard. Open standards tend to be simple and durable.

Streaming JPEG, a common component of SMIL presentations, has image quality advantages over animated GIF or video, and huge bandwidth advantages over video.

In general, bandwidth can be used far more effectively with SMIL than with other implementations. The content can be split into separate tracks, like stereo tracks on a tape. Text takes much less bandwidth than video, pictures will almost certainly take a little less, and the quality of the audio track can be increased without sacrificing performance, as would have been necessary in the past. The ability to split up text and video, yet integrate them, makes a huge difference in bandwidth, and the streaming implementation improves the user experience.
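
The track-splitting idea can be made concrete with a small budget calculation. This is a sketch only: the link speed and the per-track rates below are hypothetical figures chosen for illustration, not measurements from our projects.

```python
from typing import Dict


def remaining_bandwidth(link_kbps: float, tracks_kbps: Dict[str, float]) -> float:
    """Return the bandwidth left over after all tracks are sent (negative if over budget)."""
    return link_kbps - sum(tracks_kbps.values())


# Hypothetical presentation split into separate tracks for a 56 kbps link.
split_tracks = {
    "audio narration": 16.0,
    "streamed JPEG slides": 8.0,
    "text captions": 1.0,
}

headroom = remaining_bandwidth(56.0, split_tracks)
```

With these assumed rates the three tracks use 25 kbps, leaving 31 kbps of headroom that could be spent on a better audio encoding.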

The Real Nitty-Gritty

How to build a lesson

  1. Brainstorm on paper.
  2. Tools like DesignersEdge from Allen Communications may become useful when a reproducible methodology and workflow are required using common design templates. They can be helpful for rapid storyboarding of multiple projects with similar designs.
  3. Define the user experience
    1. What do you want them to learn? Documenting these objectives will help you define your source materials and keep the focus of your project's scope tight.
    2. How do you want them to navigate? What kind of interface do you want to give the user for navigating through the presentation? How will the information be organized on screen? Will a common look-and-feel be used for multiple presentations? Consistency between presentations helps maintain a user's focus on the ideas being taught and allows for rapid development, because the interface does not have to be redesigned for every presentation.
    3. What information will be presented, and in what order? Define the "timeline" for your presentation. What media clips will be needed, and what order will they be in? Some animations, live text, video, and audio may be playing simultaneously. These considerations affect not just the user's experience, but also the timeliness of the presentation given the user's available bandwidth. If the slides or video are not synchronized with the audio, or the video takes seven seconds to buffer while the audio takes only two, then there will be usability problems which must be fixed.
  4. Interface design (backgrounds, graphics, "window dressing")
    1. Developing a usable interface that accommodates all of your design requirements is one of the most valuable talents in the Web industry today. All of the design rules that apply to standard web pages also apply to SMIL presentations. The interface must be clean and uncluttered. A properly designed interface compels the user to explore what you've presented while leading them safely through the entire presentation without confusing them. Strong navigation skills should not be a prerequisite for a successful experience.
    2. From your final storyboard, design the graphics and text that will support your learning materials. Tools like Adobe Photoshop®, Macromedia Fireworks®, and other graphics design tools are useful here.
    3. Planning your presentation's on-screen layout is essential. A lot of web pages are designed around a 600x400 pixel maximum resolution. This is also a good maximum for SMIL presentations. Planning for the "lowest common denominator" screen resolution is always a safe bet. Again, this is a MAXIMUM recommendation. Smaller is always O.K. Streamed video usually has a maximum resolution of 320x240 pixels due to bandwidth constraints, so a 600x400 pixel resolution maximum for the entire presentation leaves a lot of room around the video for buttons, text, and other graphics.
  5. Informational content (video, audio, animations)
    1. Plan & conduct source material recording sessions
      1. Once the project storyboard and layout are finalized, it's time to build the actual "meat" of the presentation. Successfully planning and producing the actual presentation material is simple if you, the designer, have control over the material being presented. More often than not, the audio and video have to be recorded onto cassette or videotape and then digitized and encoded for use on the Web.
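
The buffering mismatch described in the timeline step above (seven seconds for video, two for audio) can be worked through with a short calculation. The function name and the idea of holding back the faster clip are illustrative assumptions; an actual player may handle preroll differently.

```python
from typing import Dict


def start_delays(buffer_seconds: Dict[str, float]) -> Dict[str, float]:
    """For each clip, return how long it must wait after it is ready
    so that every clip starts together with the slowest one."""
    slowest = max(buffer_seconds.values())
    return {name: slowest - seconds for name, seconds in buffer_seconds.items()}


# Buffering times taken from the example in the text.
delays = start_delays({"video": 7.0, "audio": 2.0})
```

Here the audio is ready five seconds before the video, so either the audio must be held back or the video must be made lighter to buffer faster.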

Producing streamed multimedia content

  1. Creating a Web-based presentation of an executive's Microsoft PowerPoint® presentation and audio narration using RealNetworks RealPresenter® is a lot simpler than having to schedule a recording session to videotape the executive actually presenting live, then having to digitally process the minutes (hours?) long video of the presentation. If the video of the executive's "talking-head" is considered a value-add to the project, then by all means use it. The objective here is to plan the multimedia source materials appropriately taking into consideration the time, resources, and funding required to realistically achieve your design goals.
  2. When dealing with video as a streamed medium, there are a lot of factors that influence the final stream quality and playback rates. The golden rule, as usual, applies to streaming video: "Garbage in, garbage out." The higher the quality of the recording used as source material, the smaller and faster the streaming video file will be. The differences in signal-to-noise ratios and overall resolution between VHS, S-VHS, 8mm, Hi8, Mini-DV, BetaCam-SC, and DVCPRO video formats directly influence the playback frame rate and encoded file size of the streaming video file. The better the format you can afford to record in, the cleaner and better your video will stream to your clients.
  3. For important high-bandwidth content, the use of a professional video production staff equipped with proper lighting and recording equipment will always yield a higher quality recording than those of us with a consumer video camera can produce. This by no means should be interpreted to mean that low budget, consumer quality equipment is incapable of producing satisfactory results. Quite the contrary. However, to provide Internet-based video streams larger than a postage stamp at acceptable quality when network bandwidth is at a minimum, starting with premium quality video recordings is essential.
  4. In addition to the format used to record the video, the format into which the video is digitized will affect the final encoder output. Always digitize video uncompressed at 30 frames per second and in stereo at 16-bit 44-kHz sampling rates. Let the streaming format encoder software have the best quality input so it has all of the data it needs to provide the highest-quality output. The more data the encoder has to work with, the fewer assumptions the compression routines need to make, and smoother, cleaner video output will result.
  5. Digital editing of video and audio sources before encoding is almost always required. Certain optimizations can be performed to provide optimal output upon playback. Applications for video and audio editing include Adobe Premiere® and SonicFoundry's SoundForge®.
  6. Streamed animations can be produced using Macromedia's Flash® technology. Flash is in widespread use for non-streamed web-based animations. The same animations can be included into your SMIL presentation after a simple encoding procedure into the RealFlash® format. Now the use for Flash animations is no longer limited to the realm of the static web page and can be unleashed into the dynamic environment of a SMIL presentation. The RealNetworks site has some good examples of RealFlash® SMIL presentations.
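
The digitizing recommendation above has real storage consequences, which a quick calculation makes visible. The frame size (320x240) and color depth (24-bit) are assumptions, since the text does not fix them, and the "44-kHz" audio rate is read here as the standard 44,100 Hz.

```python
def video_bytes_per_second(width: int, height: int,
                           bytes_per_pixel: int, frames_per_second: int) -> int:
    """Data rate of uncompressed video."""
    return width * height * bytes_per_pixel * frames_per_second


def audio_bytes_per_second(sample_rate_hz: int, bits_per_sample: int,
                           channels: int) -> int:
    """Data rate of uncompressed PCM audio."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


# Assumed capture settings: 320x240, 24-bit color, 30 frames per second.
video_rate = video_bytes_per_second(320, 240, 3, 30)
# 16-bit stereo audio at an assumed 44,100 Hz.
audio_rate = audio_bytes_per_second(44_100, 16, 2)
bytes_per_minute = (video_rate + audio_rate) * 60
```

Under these assumptions the capture runs at roughly 7 million bytes per second, or about 425 million bytes per minute, so disk space for the uncompressed source needs to be planned before the recording session.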

Encoding content to RealNetworks formats

  1. Audio and video can be encoded into the RealNetworks® RealMedia® format using a variety of 3rd party applications. The easiest encoder to use is the RealProducer® Plus G2 from RealNetworks. Many different stream options are available from the RealProducer interface. The RealMedia G2 SureStream® format option allows multiple streams at different bandwidths to be encoded into the same file. This allows the RealPlayer and RealServer to better negotiate how much data to send the player based on network performance.
    For example, a video may be encoded for 28.8K modem, 56K modem, 64K Single ISDN, 128K Dual ISDN, 220K xDSL and Cable Modem, and 150K Corporate LAN data rates all within a single file. Depending on the available network bandwidth, the player will switch between these different encoded formats dynamically as the user watches the video. This feature provides much better playback than older streaming technologies, which only adapt to changing network conditions by dropping frames or "fuzzing-out" the video into large indistinguishable blocks.
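
The general idea of switching between encodings can be sketched as follows. This is not RealNetworks' actual negotiation logic, which we do not describe here; it is a hypothetical selection rule, and it treats the labels in the example above as data rates in kilobits per second.

```python
from typing import Sequence

# Rates named in the example above, read as kbps (an assumption).
ENCODED_RATES_KBPS = (28.8, 56.0, 64.0, 128.0, 150.0, 220.0)


def select_stream(available_kbps: float,
                  rates: Sequence[float] = ENCODED_RATES_KBPS) -> float:
    """Pick the highest encoded rate that fits the measured bandwidth.
    If nothing fits, fall back to the lowest rate in the file."""
    usable = [rate for rate in rates if rate <= available_kbps]
    return max(usable) if usable else min(rates)
```

For instance, `select_stream(100.0)` picks the 64 kbps encoding, and a player that re-measures the connection periodically could call it again to move up or down as conditions change.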

Uploading files to stream server

  1. As the streaming media files are created, they need to be stored on a separate server running the RealNetworks RealServer G2 software.
    The content creator will need to place the files in a subdirectory off the mount point for the server, and will need the address and port number of the server, as well as whether the Ramgen file system, for sending temporary small files, is in use. Then the files can be linked to from a plain web page. Links can be of the format http://server/ramgen/MountPoint/virtual_directory/filename, and once within SMIL, individual components can be specified in a very similar format, rtsp://server/MountPoint/virtual_directory/filename. [6]
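
The two link formats above can be assembled with a small helper. The server name, mount point, directory, and file names in the usage note are placeholders, and the optional port parameter is our own addition for servers that do not listen on the default port.

```python
from typing import Optional


def _host(server: str, port: Optional[int]) -> str:
    """Append the port only when one is given."""
    return server if port is None else f"{server}:{port}"


def ramgen_link(server: str, mount_point: str, virtual_directory: str,
                filename: str, port: Optional[int] = None) -> str:
    """Link to place on a plain web page, following the Ramgen format in the text."""
    return (f"http://{_host(server, port)}/ramgen/"
            f"{mount_point}/{virtual_directory}/{filename}")


def rtsp_link(server: str, mount_point: str, virtual_directory: str,
              filename: str, port: Optional[int] = None) -> str:
    """Link for an individual component referenced from inside a SMIL file."""
    return (f"rtsp://{_host(server, port)}/"
            f"{mount_point}/{virtual_directory}/{filename}")
```

With placeholder values, `ramgen_link("media.example.com", "training", "lesson1", "intro.smi")` yields `http://media.example.com/ramgen/training/lesson1/intro.smi`, and `rtsp_link` with the same arguments yields the matching `rtsp://` address without the `ramgen` segment.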

Building the RealText, RealPix, and SMIL files

  1. After the content files have been created, the SMIL presentation files themselves need to be created. This is quite similar to creating an HTML page or two, except we're using SMIL instead of HTML, and the files need to live on the streaming server, not the web (HTTP) server.
  2. The SMIL technical documentation can be found at the World Wide Web Consortium Architecture for Synchronized Multimedia [1].
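
To give a feel for what such a file contains, the following Python sketch generates a minimal SMIL document with the standard xml.etree.ElementTree module. The element names (smil, head, layout, root-layout, region, body, par, video, img) come from the SMIL 1.0 Recommendation; the region names, sizes, and source addresses are illustrative assumptions, and a production file would normally be written by hand or by an authoring tool.

```python
import xml.etree.ElementTree as ET


def build_lesson_smil(video_src: str, slides_src: str) -> str:
    """Return a SMIL document that plays a video and a slide track in parallel."""
    smil = ET.Element("smil")

    # Layout: a 600x400 presentation with a 320x240 video region beside the slides.
    head = ET.SubElement(smil, "head")
    layout = ET.SubElement(head, "layout")
    ET.SubElement(layout, "root-layout", width="600", height="400")
    ET.SubElement(layout, "region", id="video",
                  left="0", top="0", width="320", height="240")
    ET.SubElement(layout, "region", id="slides",
                  left="320", top="0", width="280", height="240")

    # Body: <par> plays its children at the same time.
    body = ET.SubElement(smil, "body")
    par = ET.SubElement(body, "par")
    ET.SubElement(par, "video", src=video_src, region="video")
    ET.SubElement(par, "img", src=slides_src, region="slides")

    return ET.tostring(smil, encoding="unicode")
```

Calling `build_lesson_smil` with two `rtsp://` addresses returns the document as a string, which can be saved as a text file on the streaming server. Replacing `par` with `seq` would play the clips one after another instead of together.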


  1. what exists now
    1. tools for generating SMIL files include:
    2. what is needed to make it useful
      1. As the tools currently in beta testing evolve, they will no doubt give rise to a suite of powerful and easy-to-use tools for creating SMIL presentations. Ongoing development of SMIL, with ratification through the W3C, is needed to assure interoperability.
    3. bandwidth requirement
      1. Bandwidth between you and your target audience is the limiting factor on SMIL design. Designing SMIL presentations includes tradeoffs for each data stream sent to the player. The designer must balance stream buffering times versus compression and the number of streams trying to get loaded simultaneously. These calculations are also affected by the resolution of the data to be streamed. Resizing a video originally intended to stream at 320x240 pixels down to 160x120 will reduce your bandwidth requirements by a factor of four (assuming constant compression rates). The RealNetworks SMIL kit has exhaustive information on this topic.
    4. hardware requirements
      1. Hardware requirements vary widely depending on your application. There are four sets of hardware requirements involved: network infrastructure, web server, stream server, and client browser/player. Only three are discussed here; web server requirements and configuration are outside the scope of this document.
      2. Many of the issues discussed here are particularly important to the corporate implementers who have controlled environments into which they wish to introduce streaming technologies. The only successful way to implement streaming media in a corporate environment is to work with the network and computer infrastructure organizations within your company to understand and proactively adapt to the additional requirements imposed by the technology.
      3. Network infrastructure
        1. Neither Internet nor Intranet bandwidth demands should be underestimated. Careful analysis of current network loads and capacities can help determine how much streaming traffic can be handled before network upgrades are required to provide adequate quality of service to all users. A good example of a bad implementation was the Victoria's Secret® streaming video webcast, a product advertisement for Valentine's Day, 1999. Their Internet connection and stream server were completely overwhelmed, and the webcast was deemed a technological disaster (albeit a marketing success). The Victoria's Secret® corporation failed to provide adequate network and/or server resources for the number of users requesting the video feeds, and nobody seems to have gotten a good connection.
        2. Network upgrades are expensive and time consuming. Always consult with your network operations staff before implementing any streaming technologies on a wide-spread basis across your network.
      4. Stream server
        1. The RealNetworks RealServer G2 products run on a variety of UNIX platforms as well as Microsoft Windows NT. Hardware requirements vary depending on the expected number of users. Consult the RealNetworks website for details.
      5. Client browser/player
        1. The RealNetworks RealPlayer G2 will currently run on any PC-compatible running Windows 95, 98, or NT 4.0. Performance will vary depending on CPU speed and available memory. For fast, responsive control and playback the authors recommend at least a 166 MHz Pentium with 32 MB of RAM. This is a minimum as far as we're concerned. Slower machines will provide sub-optimal playback.
        2. PCs also need to be MPC-2 compliant and have appropriate sound cards, drivers, and headphones or speakers installed. For corporate Intranet sites, overcoming the current installed base of PCs that are not multimedia-equipped is a non-trivial challenge. For larger corporations like Ford Motor Company, retrofitting the current installed PC base with headphones would be a multi-million dollar expense. Add sound cards to that equation and the price becomes truly astronomical.
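
The resolution tradeoff mentioned under the bandwidth requirement above can be checked with simple arithmetic. The sketch assumes, as the text does, that the compression rate stays constant, so bandwidth scales with the number of pixels per frame; the function name is our own.

```python
from typing import Tuple


def bandwidth_ratio(original: Tuple[int, int], resized: Tuple[int, int]) -> float:
    """Return how many times less data the resized video needs,
    assuming bandwidth is proportional to pixels per frame."""
    original_width, original_height = original
    resized_width, resized_height = resized
    return (original_width * original_height) / (resized_width * resized_height)


factor = bandwidth_ratio((320, 240), (160, 120))
```

Halving both dimensions quarters the pixel count, which is where the factor of four comes from.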

    Future Direction

    Most currently available web based testing software requires custom format files, special software, and is not standards-based. New products such as TopClass[7] use plain text and HTML, and are much better suited to integration within a SMIL framework. Testing is the next step.

    Better integration and tools will allow the potential we see to be realized. Right now, streaming media is at the level of maturity the Web had reached in 1994. The standards are there, and the tools are coming. You can go out and use the technology now... so let us know what you do with it!

    [1] W3C Recommendation: Synchronized Multimedia Integration Language (SMIL) at
    [2] RealNetworks HTML+TIME at
    [3] W3C Recommendation: Extensible Markup Language (XML) at
    [4] Comer, Douglas E. The Internet Book: Everything You Need to Know About Computer Networking and How the Internet Works. Prentice Hall, August 1997
    [5] "Real-Time Video and Audio in the World Wide Web" by Chen, Tan, Campbell, and Li. Proceedings of the Fourth International World Wide Web Conference, December 1995.
    [6] RealServer Administration Guide at
    [7] WBT Systems TopClass Overview at