WinRTP 2.0 Design Document

Contents
- What the WinRTP package contains
- Push and Pull based Filter-Graphs
- Audio Sample Lifetime Management (Refcounts)
- Some Design Pointers and Illustrations
- Filters With Buffers
- Example 1: Mixing and Playing Three .WAV Files
- Example 2: Mixing and Playing Three .WAV Files of Different Formats
- The Ready-made WinRTP Component
- Overriding Trace Levels for All Components/Turning Off Tracing
Essentially, WinRTP is a streaming architecture for audio together with a set of ready-made building blocks that allow the user to write audio streaming applications on top of that architecture. It is written in C++ using the Win32 API, but it is written so that it should be easy to port to other platforms. It is small, fast, and quite powerful. In its present state, however, it handles only audio streaming and lacks most of the sophistication and versatility of the Microsoft DirectShow API when it comes to handling multiple different kinds of streams. On the other hand, WinRTP is much simpler than DirectShow as far as audio streaming is concerned.
The WinRTP package contains:
- A set of C++ source files that implement the streaming architecture mentioned above
- A set of C++ source files that implement various ready-made building blocks that conform to the WinRTP architecture
- Two programs – provided in source and binary
o One of them is a program that implements an audio end-point – i.e. a software component that can sink/play out RTP audio streams and can also capture audio from the microphone and transmit it as an RTP stream. This component uses the ready-made building blocks mentioned above and illustrates how the user can implement a full-fledged audio streaming application using the WinRTP architecture. Most users will probably be interested in this application and will not bother with the source code
o A test program that illustrates how to use the above application
- Documentation
The WinRTP streaming architecture:
- Is quite simple and versatile
- Imposes a minimum of restrictions – this makes it powerful, but also susceptible to problems if the programmer is not careful
- Is designed for audio streaming applications
The WinRTP architecture consists of “filters” (similar to DirectShow filters) that are connected to each other for audio streaming purposes. A filter is a block that does one of three things: create audio (source filter), process audio in a certain way (transform filter), or consume audio (renderer filter).
Source Filter examples: a filter that captures audio from the microphone of a sound card, a filter that listens to the network for an incoming RTP audio stream and extracts the audio from it
Transform Filter examples: All encoder filters and decoder filters
Renderer Filter examples: a filter that plays the audio through the sound card’s speaker, or a filter that saves the audio in a file on the disk
The connection of the various filters is called a filter-graph. Each source filter can have one or more outputs, each transform filter can have one or more inputs and one or more outputs, and each renderer filter can have one or more inputs. Every output of every source filter in the filter-graph must ultimately have a path to an input of a renderer filter.
One of the very powerful features of WinRTP is its ability to handle any number of inputs and outputs on any filter. Connections are made in a two-step process – connecting the upstream filter to the downstream filter, and then connecting the downstream filter to the upstream filter. A great advantage is that new connections can be set up even while the filter graph is running! Unlike DirectShow, WinRTP does not negotiate the audio format for connections and pins – it simply lets you connect any filter to any other filter and relies on the application writer to make sure that the connection makes sense. This is in line with the stated objective of imposing as few restrictions on the user as possible – a key factor that makes WinRTP simple and powerful.
A filter-graph where the data (audio samples) is pushed downstream to the next filter is called a push-based filter-graph. A filter-graph where downstream filters ask the upstream filters for data (audio samples) is called a pull-based filter-graph.
Audio is transported from one filter to another using “audio samples”. These are fixed size buffers that contain audio data and some headers. A typical audio sample contains 20-60 milliseconds of data.
This is the vital component that defines the behavior of WinRTP – the streaming architecture is where WinRTP differs from other frameworks like DirectShow. The WinRTP streaming architecture has been designed to be flexible. As mentioned earlier, data in a filter graph moves in the form of audio samples, originating at a source filter, then passing through zero or more transform filters, and finally being consumed by a renderer filter.
Source filters are generic filters that can generate audio data. They do not have any upstream filters (because they generate the audio themselves), but they can have one or more outputs. By output, I mean that a source filter can be connected to one or more downstream filters. A source filter sends the same generated audio data to all the downstream filters. The AudioSource class in the MTC project implements source filters. To create your own source filter, derive your class from AudioSource. All the functions other than GenerateData () are already implemented, so GenerateData () is usually the only function you have to implement.
The GenerateData () function is called whenever the source filter needs to generate audio data. You must override this function in your derived class when you create your own source filter.
The source filters can work in one of two modes – Active Mode and Passive Mode.
In active mode, the filter operates by instantiating its own worker thread. This worker thread runs in a loop, calling GenerateData () (to make the source filter generate an audio sample of data) and pushing the generated audio sample downstream to each of the directly connected downstream filters by calling their TakeNextAudioSample () function. Once the data has been pushed to all the downstream connections, it calls GenerateData () again and so on.
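In outline, the active-mode worker thread does the following (a sketch of the behavior just described):
Loop
{
- Call GenerateData () to generate one audio sample
- Push that audio sample to the first directly connected downstream filter by calling its TakeNextAudioSample () function
- Push it to the second directly connected downstream filter
- ...
- Push it to the last directly connected downstream filter
}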
To start the source filter in active mode, you must start it using the StartSourceThread () function. This in turn will call the SourceThreadStarted () function when the worker thread has been instantiated. Override the SourceThreadStarted () function to carry out any initialization that may be needed before the first GenerateData () can be called.
To stop an active source filter, you must stop it by calling StopSourceThread (). This will stop the worker thread. Before this thread exits, it will call the SourceThreadStopped () function. Override this function if you want to do any cleanup before your source filter stops
The source filter can also work in passive mode. In this mode, it does NOT instantiate any worker thread. All it does is wait for a downstream filter to call GiveNextAudioSample (). When this call arrives, GiveNextAudioSample () works as follows: it calls GenerateData () to create an audio sample of data. Then it pushes this audio sample to ALL the directly connected downstream filters (EXCEPT the filter that made this call to GiveNextAudioSample ()) by calling their TakeNextAudioSample () functions with the generated audio sample. Finally, GiveNextAudioSample () returns, handing the generated audio sample back to the caller. Thus, we see that in passive mode there is no worker thread for the source filter; instead it depends on a worker thread of some downstream filter to make the call to GiveNextAudioSample ().
To start a source filter in passive mode, you must start it with the StartSource () function. If you need to do some initialization before the filter can start generating data, then override the StartSource () function in your derived class and do this initialization in this function.
To stop a source filter running in passive mode, stop it using the StopSource () function. Override this function to implement any cleanup that your filter might need to do before it can stop
- WinRTP imposes very few restrictions, so you could start a filter in both active and passive modes simultaneously by calling both StartSource () and StartSourceThread (). Similarly, the source filter will allow all calls, independent of its running mode. For example, GiveNextAudioSample () can be called even if the source filter is in active mode. The behavior of this function call will be exactly as described earlier
- That being said, allowing the above scenarios to occur is a programming error. The programmer is responsible for making sure that the filter runs in only one mode, either active or passive, and by designing the filter graph properly he/she can make sure that things work exactly as planned
- Note from the above description that the WinRTP architecture makes sure that all downstream filters directly connected to the source filter get the same audio sample of data at the same time before the next audio sample is processed. It is important that the user understand how this happens in both active and passive modes from the above descriptions
The AudioSource class already implements all the behavior of the audio sources. So all you have to do is
- Derive your class from AudioSource
- Override and implement the GenerateData () function
- Depending on which modes your filter will run in (active or passive) you might need to optionally override the StartSource/StopSource and/or SourceThreadStarted/SourceThreadStopped functions. Note that this is not necessary if you do not have any special initialization/cleanup requirements.
- Check out the source code of WaveAudioSource, RTPAudioSource and WaveFileSource for examples of how to implement audio source filters; a minimal sketch is also shown below
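As an illustration, here is a minimal sketch of a custom source filter. The class AudioSource and the functions GenerateData (), StartSourceThread () and StopSourceThread () are the ones described above; the exact signature of GenerateData () (its return type, and how a fresh AudioSample is obtained from the AudioSampleManager) is an assumption and should be checked against the real headers.

// Sketch only – signatures and the sample-allocation call are assumptions.
class SilenceSource : public AudioSource      // hypothetical example filter
{
public:
    // Called whenever the filter must produce one audio sample of data,
    // either by its own worker thread (active mode) or from a downstream
    // filter's call to GiveNextAudioSample () (passive mode).
    virtual AudioSample *GenerateData ()
    {
        // Assumption: obtain a free audio sample (refcount 1) from the
        // AudioSampleManager; the real call may differ.
        AudioSample *pSample = NULL; /* = get a free sample from the AudioSampleManager */
        if (pSample != NULL)
        {
            // Fill the sample's data buffer with 20-60 ms of audio (here,
            // silence) and set the data size and audio format in its header.
        }
        return pSample;
    }
};

In active mode such a filter would be started with StartSourceThread () and stopped with StopSourceThread (); in passive mode, with StartSource () and StopSource ().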
Renderer filters are filters that render (sink, consume) audio samples. They do not have any downstream filters because they are the end-point for the incoming audio samples. They can have one or more upstream filters that supply audio samples to them. The base class that implements the renderer filter behavior is the AudioSink class. You should derive from AudioSink when you want to create a new renderer filter. Renderer filters can consume data in many different ways, for example by playing the audio samples out through the speaker, sending the audio samples as RTP audio packets, saving the audio samples as .wav files on disk, and so on. All functions other than RenderAudioSamples () are already implemented or have stubs, so when you write your own renderer filter, RenderAudioSamples () is the only function that you have to implement – the others are optional.
The RenderAudioSamples () function is called whenever the Renderer Filter needs to sink/render some audio. This function is called with a list of audio samples – one from each of the directly connected upstream filters. You must override this function when you implement your own renderer filter by deriving it from AudioSink
Just like an AudioSource, an AudioSink can also run in either Active mode or Passive mode
In active mode, the renderer filter creates its own worker thread (see the description of source filters). The worker thread runs in a loop:
Loop
{
- Ask for data from the first directly connected upstream filter by calling GiveNextAudioSample ()
- Ask for data from the second directly connected upstream filter by calling GiveNextAudioSample ()
- ….
- Ask for data from the last upstream filter
- Put all the received data into a list of audio samples, one from each upstream filter
- Render the data, by calling RenderAudioSamples () and passing to it this list of audio samples
}
Please note that when the renderer filter is in active mode, all the directly connected upstream filters should be in passive mode, because the renderer filter will be calling their GiveNextAudioSample () function.
To start a renderer filter in active mode, call its StartSinkThread () function. This will instantiate the worker thread that will call SinkThreadStarted () when it is ready to execute. Override SinkThreadStarted () in order to do any initialization that must be done before the renderer can start sinking data.
To stop the renderer running in active mode, call its StopSinkThread () function. This will stop the worker thread. Before exiting, the worker thread will call the SinkThreadStopped () function. Override the SinkThreadStopped function in order to implement any cleanup that your renderer filter needs to do before it can stop
In Passive mode, the renderer filter does not instantiate any worker thread, but it relies on the worker thread of some other upstream filter. In this mode, the renderer filter waits for a call to its TakeNextAudioSample () function. The parameter to this call is a list of audio samples (one from each directly connected upstream filter). This function in turn calls the RenderAudioSamples () function with the same list of audio samples. RenderAudioSamples () consumes the data and returns.
To start a renderer filter in passive mode, start it by calling StartSink (). Override this function in order to implement any initialization that the filter might need to do.
To stop a renderer filter running in passive mode, stop it by calling the StopSink () function. Override it to implement any cleanup code that the filter needs to do before stopping.
- Please see the observations for source filters above
- Make sure that the filter is running either in active mode or passive mode
- Note that in both modes, the renderer filter takes one audio sample of data from each of the directly connected upstream filters before it renders them
The AudioSink class already implements all the behavior of the audio sinks. So all you have to do is
- Derive your class from AudioSink
- Override and implement the RenderAudioSamples () function
- Depending on which modes your filter will run in (active or passive) you might need to optionally override the StartSink/StopSink and/or SinkThreadStarted/SinkThreadStopped functions. Note that this is not necessary if you do not have any special initialization/cleanup requirements
- Check out the source code of RTPAudioSink and WaveAudioSink for examples of how to implement audio sink filters; a minimal sketch is also shown below
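Again as an illustration, here is a minimal sketch of a custom renderer filter. AudioSink, RenderAudioSamples (), StartSinkThread () and StopSinkThread () are described above; exactly how the list of audio samples is passed to RenderAudioSamples () is an assumption and should be checked against the real AudioSink header.

// Sketch only – the parameter list of RenderAudioSamples () is an assumption.
class DiscardSink : public AudioSink          // hypothetical example filter
{
public:
    // Called with one audio sample from each directly connected upstream
    // filter, either by this filter's own worker thread (active mode) or by
    // an upstream filter's call to TakeNextAudioSample () (passive mode).
    virtual void RenderAudioSamples (/* list of AudioSample *, one per upstream filter */)
    {
        // Consume the samples here – e.g. write them to a device or a file.
        // A plain renderer normally does not have to AddRef ()/Release () the
        // samples; refcount handling is only needed when samples are kept
        // beyond this call (see the section on audio sample lifetimes).
    }
};

In active mode such a filter is started with StartSinkThread () and stopped with StopSinkThread (); in passive mode, with StartSink () and StopSink ().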
Transform filters are both source filters and sink filters because they can both receive data from upstream filters and send data to downstream filters. They can have any number (one or more) of directly connected input filters, as well as any number (one or more) of directly connected output filters. The AudioTransformer class implements the behavior of transform filters. It derives from both AudioSource and AudioSink.
Transform filters are of two main classes – transformers and buffers.
A transformer is a passive transform filter that takes an input audio sample, processes it somehow, and sends it downstream – all in the same thread. It can act in a “push-mode” or in “pull-mode”.
In push mode:
- An upstream filter sends data to the transformer by calling TakeNextAudioSample ()
- In response to that, the implementation of TakeNextAudioSample () asks for data from each of the other directly connected upstream filters (except the one that called TakeNextAudioSample ()) by calling their GiveNextAudioSample () function
- Once it gets one audio sample from each of the upstream filters, TakeNextAudioSample () calls TransformAudioSamples () and passes to it the list of audio samples
- TransformAudioSamples () transforms the audio sample(s) for e.g. by encoding them in a certain format, changing the volume, etc. and creates one audio sample as its output
- Once TransformAudioSamples () returns its result, TakeNextAudioSample () pushes it further downstream to all the directly connected downstream filters by calling their TakeNextAudioSample () function
In pull mode:
- A downstream filter asks for data from the transformer by calling its GiveNextAudioSample () function
- GiveNextAudioSample () in turn asks all the directly connected upstream filters for data by calling their GiveNextAudioSample () function
- It creates a list of the returned audio samples from the upstream filters (one audio sample from each upstream filter) and calls TransformAudioSamples () with it
- After TransformAudioSamples () is finished, GiveNextAudioSample () then pushes the resulting transformed data to all the other directly connected downstream filters (except the one that called GiveNextAudioSample ()) by calling their TakeNextAudioSample () function
The AudioTransformer class already implements the behavior of transformers, so it is very easy to implement a transformer. All you have to do is
- Derive your class from AudioTransformer, and
- Override the TransformAudioSamples () function
- If you have special initialization/cleanup requirements, optionally override and implement the StartTransform () and StopTransform () functions
- Please see the PCM2G711Transformer and G7112PCMTransformer code for examples of transformers
Since transformers are passive filters (i.e. they do not have their own worker thread), you should start them by calling their StartTransform () function and stop them by calling their StopTransform () function.
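Here too, a minimal sketch may help. AudioTransformer, TransformAudioSamples (), StartTransform () and StopTransform () are described above; the exact signature of TransformAudioSamples () (how the input list is passed and how the single output sample is returned) is an assumption.

// Sketch only – the signature of TransformAudioSamples () is an assumption.
class GainTransformer : public AudioTransformer   // hypothetical example filter
{
public:
    // Called with one audio sample from each directly connected upstream
    // filter; must produce exactly one output audio sample.
    virtual AudioSample *TransformAudioSamples (/* list of input samples */)
    {
        // For a single-input, in-place transform the output can simply be the
        // modified input sample (e.g. scale every 16-bit PCM value by a gain
        // factor); a mixer would instead obtain a fresh sample and write the
        // mixed audio into it.
        return NULL; // return the transformed audio sample
    }
};

A transformer like this is started with StartTransform () and stopped with StopTransform ().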
Buffers are special transform filters. Like transformers, they are passive filters, but unlike transformers, where every input generates an output, the input and output sides of a buffer are not connected. This means that a buffer is actually an audio sink and an audio source rolled into one. To the upstream filters, it looks like a passive audio sink, because it consumes audio samples and does not generate any output. To the downstream filters, it acts like a passive audio source that can generate data upon request.
- Upstream Side (where it acts like a sink)
o An upstream filter pushes data to the input side of the buffer filter by calling its TakeNextAudioSample () function
o Just like with renderer filters, TakeNextAudioSample () then asks for data from all the other directly connected upstream filters (except for the one that called TakeNextAudioSample ()) by calling their GiveNextAudioSample () function
o Once one audio sample has been received from each of the upstream filters, TakeNextAudioSample () creates a list of these audio samples, and calls RenderAudioSamples () with it
o When RenderAudioSamples () returns, so does TakeNextAudioSample ()
- Downstream Side (where it acts like a source)
o A downstream filter asks for data from the buffer filter by calling GiveNextAudioSample ()
o GiveNextAudioSample () behaves just as if the buffer were a source: it calls GenerateData ()
o When GenerateData () returns an audio sample, GiveNextAudioSample () pushes it to all the other downstream filters (except the one that called GiveNextAudioSample ()) by calling their TakeNextAudioSample () function with the generated audio sample
o Finally GiveNextAudioSample returns, returning the generated audio sample to the caller
Implementing a buffer is pretty simple, although not as easy as implementing transformers. In order to implement a buffer filter, you have to
- Derive your class from AudioTransformer as usual
- Override and implement the RenderAudioSamples () function to do what the buffer filter should do when upstream filters send data to it. Most probably you would be saving this data in some kind of a buffer/queue and returning
- Override and implement the GenerateData () function to implement what the buffer filter should do when a downstream filter's thread asks it for data. Most probably GenerateData () will extract data from the aforementioned buffer/queue and return it in an audio sample
- Check out the code for FixedSizeAudioBuffer (simple) and RTPJitterBuffer (complicated) for examples of how to implement buffer filters; a minimal sketch follows after the notes below
- Note that there are two threads that access the buffer – one from the upstream filters that send data to the buffer filter by calling RenderAudioSamples (), and the other from the downstream filters that ask for data from the buffer by calling GenerateData ()
- Note that just like transformers, buffers are passive filters, because they do not instantiate their own worker threads, but depend on worker threads from upstream and downstream filters to drive them
- Buffers need not only be queues. They can also transform their data – but you have to make sure you make the call to transform the data in either GenerateData () or RenderAudioSamples ()
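To make the two-thread structure concrete, here is a minimal sketch of a queue-based buffer filter. As before, the exact signatures of RenderAudioSamples () and GenerateData () are assumptions; the locking uses a plain Win32 CRITICAL_SECTION, and the refcount handling follows the rules described in the audio sample lifetime section below.

#include <windows.h>
#include <queue>
// Assumes the WinRTP headers declaring AudioTransformer and AudioSample.

// Sketch only – signatures and ownership details are assumptions.
class SimpleAudioBuffer : public AudioTransformer   // hypothetical example filter
{
public:
    SimpleAudioBuffer ()  { InitializeCriticalSection (&m_Lock); }
    ~SimpleAudioBuffer () { DeleteCriticalSection (&m_Lock); }

    // Upstream side: runs on an upstream filter's thread.
    virtual void RenderAudioSamples (AudioSample *pSample /* assumed: one sample or a list */)
    {
        pSample->AddRef ();               // keep the sample alive while it is queued
        EnterCriticalSection (&m_Lock);
        m_Queue.push (pSample);
        LeaveCriticalSection (&m_Lock);
    }

    // Downstream side: runs on a downstream filter's thread.
    virtual AudioSample *GenerateData ()
    {
        AudioSample *pSample = NULL;
        EnterCriticalSection (&m_Lock);
        if (!m_Queue.empty ())
        {
            pSample = m_Queue.front ();
            m_Queue.pop ();
        }
        LeaveCriticalSection (&m_Lock);
        // Whether the AddRef () taken at enqueue time is handed on to the
        // caller here or balanced by an explicit Release () depends on the
        // framework's ownership rules – see RTPJitterBuffer for the real pattern.
        return pSample;
    }

private:
    CRITICAL_SECTION          m_Lock;
    std::queue<AudioSample *> m_Queue;
};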
A common question from users is about active buffers – a case can be made for an active-mode buffer transform filter. Such a filter would, for example, enqueue the data from upstream filters and have a worker thread that drains the queue and pushes the data downstream in a timed manner. So it might seem to make sense to have an active buffer filter with its own worker thread!
The answer is usually that it is not really required. Transform filters, especially buffer transform filters, are already more complicated than transformers, or sources and sinks. Adding a worker thread complicates matters further and will probably cause more headache than it is worth. Furthermore, because WinRTP can act in both push and pull modes with equal ease, whatever one can do with an active source one can also do with an active sink. So, instead of making a buffer filter act like an active source, the user can more easily make one of the sinks connected to that buffer filter active – achieving the same end result without the complication. So far I have not come across a case where a transform filter needed to be active and there was no simpler workaround.
Still, it is possible to build such a filter if the user really wants to – all he/she has to do is call StartSink () and StartSourceThread () on the buffer filter instead of starting it with StartTransform (). Though possible, it is not recommended.
As mentioned before, data in the filter graph moves in audio samples. This section discusses audio samples and their lifetime.
Audio samples are buffers that contain audio data along with some headers. The audio sample header specifies information about the audio data, for example the format of the data, how many bytes of data there are, and so on. The AudioSample class implements audio samples; it also contains a set of helper functions. The contents of an audio sample are as follows:
- Audio sample reference count (just like Windows COM objects) – note that AudioSample is NOT a COM object although it maintains refcounts
- Audio format that describes the encoding of the audio data
- Size of audio data
- Size of the audio buffer (i.e. max amount of audio data it can hold)
- RTP header (optional – can be null). In some cases it might be useful to have access to the header of the RTP packet that contained the data in this audio sample (e.g. for packet reordering in the RTP jitter buffer). This is only usable if the audio sample was generated from an RTP packet
- Data Buffer containing the audio data
For efficiency, it makes sense to have a pool of preallocated, re-usable audio samples rather than creating them dynamically every time one is needed. The AudioSampleManager class implements such a sample pool. It always maintains a list of free audio samples, and when a filter needs an audio sample, it hands out one from the list. An audio sample given out by the AudioSampleManager always has a refcount of 1 (one). When a filter is done with an audio sample, it calls Release () on it, which reduces its refcount by one. When the refcount reaches zero, the audio sample goes back to the free list of the audio sample manager.
Lifetime management of audio samples is handled automatically by the WinRTP framework using reference counts. Buffer filters are the only case where the user might have to deal with refcounts directly. Here are the rules that WinRTP uses for refcount management:
Buffer filters are passive filters, typically driven by an active filter on one side or the other. Due to their nature, they frequently want to enqueue the received audio sample rather than consume it immediately. To make sure that the enqueued audio sample is not returned to the AudioSampleManager, the buffer filter should call AddRef () on the received audio sample and Release () it when it has no more use for it. If the user is writing a buffer filter, then he/she has to take care of this AddRef () and Release ()
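The pattern is the same as in the buffer filter sketch shown earlier; in its simplest form (names assumed) it looks like this:

// On the input side, when the sample is queued:
pReceivedSample->AddRef ();     // keep the sample out of the AudioSampleManager's free list
m_Queue.push (pReceivedSample);

// Later, when the buffer filter no longer needs the sample:
pQueuedSample->Release ();      // when the refcount reaches zero the sample
                                // returns to the AudioSampleManager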
This section contains some design pointers that should help the user arrive at a good design for the filter graph and application. Creating/designing an application using the WinRTP framework involves the following steps:
- Finding out what components (filters) are needed and implementing them
- Building the filter graph by connecting the filters in the proper way
- Deciding how the filter graph will handle timing
- Deciding which filters need to be active and which ones should be passive
- Running the filter graph
Filters are the smallest unit of processing in the filter graph, and they should be designed that way. It is much better to break a job down into many small operations and to design a filter for each of them. Designing a filter that tries to do too much is a frequent mistake. In a simple filter graph, a filter is little more than a function call, so having more filters does not have any significant performance impact. On the other hand, having many small filters creates a “library” of operations, and these small filters will frequently be reusable in many parts of the filter graph.
A common mistake is to design filters that incorporate their own buffering. It is almost always better to implement a separate buffer filter and connect it to the input side of the filter that needs buffering. This way, the buffer can be used with any filter that needs to buffer its input, and each filter won’t have to solve the problem again and again.
Timing is an especially important topic as far as filter graphs are concerned. For efficiency and audio quality, it is very important to minimize the number of threads in the filter graph and to have one filter drive the timing for all the others. Also, hardware-based timers are always preferable to software timers – timer threads are usually not a good idea, so do not use them unless you have to. One great source of timing is the sound card itself – since the sound card sources and sinks audio data at a fixed average rate, we can use it to drive the timing of the filter graph (hardware timing) rather than have timer threads. For example, if you want to mix three .wav files from disk and play them through the speaker, the following filter graph will do the job. Rather than have a timer thread, we let the graph run at full speed and let the speaker of the sound card control the timing.
This is how the above filter graph will operate
- We start each of the file sources and the mixer in passive mode
- We start the speaker in active mode
- The speaker worker thread will ask for data from the mixer which in turn will ask for data from each of the file sources, mix them and send the mixed data to the speaker
- The speaker will play the audio sample. If the data is 20ms in length, then the speaker will need 20ms to play it. This will automatically regulate the speed at which the worker thread runs
- When we are done, we first stop the speaker’s worker thread by calling StopSinkThread (), and then stop each of the file sources and the mixer in any order, because no worker thread is running any more. A code sketch of this example follows below
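In code, building and running this graph looks roughly like the following. WaveFileSource and WaveAudioSink are the filters mentioned earlier in this document; the mixer class name (MixerTransformer) and the connection helper (ConnectFilters ()) are placeholders for whatever the real API provides, and opening the .wav files is not shown.

// Sketch only – MixerTransformer and ConnectFilters () are hypothetical names;
// the start/stop calls are the ones described in this document.
// (inside some setup function)
WaveFileSource   file1, file2, file3;     // one passive source per .wav file
MixerTransformer mixer;                   // hypothetical mixing transform filter
WaveAudioSink    speaker;                 // active renderer that drives the timing

// Connect sources -> mixer -> speaker (the two-step connection is assumed to
// be wrapped by the helper).
ConnectFilters (&file1, &mixer);
ConnectFilters (&file2, &mixer);
ConnectFilters (&file3, &mixer);
ConnectFilters (&mixer,  &speaker);

file1.StartSource ();                     // passive mode
file2.StartSource ();
file3.StartSource ();
mixer.StartTransform ();                  // passive transformer
speaker.StartSinkThread ();               // active mode – its worker thread pulls the data

// ... audio plays; the sound card paces the worker thread ...

speaker.StopSinkThread ();                // stop the only worker thread first
mixer.StopTransform ();
file1.StopSource ();
file2.StopSource ();
file3.StopSource ();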
Some Notes/Observations
For Example 2 (mixing .wav files of different formats), we would need a transcoding resource. WinRTP provides one – the ACMTransformer, which is actually implemented as a buffer. Remember that this means that there will be two threads operating on it – one on the input side feeding data to it, and one on the output side extracting data from it. The ACMTransformer can convert between any two formats that Windows can convert between, because it uses the Windows transcoding API. The filter graph looks exactly as before, with one exception: between each file source and the mixer there is an ACMTransformer filter.
This is how the above filter graph will operate
Operation of the File Source Threads
Operation of the Speaker Filter Thread
Notes
Timing of the File Source Threads
One of the things that comes with WinRTP is a ready-made program, written using the WinRTP architecture, that makes it easy to implement a two-way audio streaming application. It takes the form of a COM DLL called CCNSMT.dll. Many people use the name WinRTP to mean this component, and many users have expressed interest in understanding how it operates.
Armed with the knowledge in this document, once the user understands the WinRTP streaming architecture, he/she will be able to understand CCNSMT.dll fairly easily. Just look at the two files CCNMediaTerm.h and CCNMediaTerm.cpp that implement the interface. By looking at the code and at what CCNSMT.dll does for each of the calls in its interface, the user can work out all the details of how it operates.
In short, CCNSMT.dll implements streaming by constructing two filter graphs – one for the transmit side and another for the receive side. In addition, it implements another class called FilePlay that is in turn implemented using WinRTP filters. The file play is nothing other than a file source, an ACMTransformer and a volume-setting filter connected together. This document, along with the CCNSMT.dll source code, is the best explanation of this component; writing a separate document would take too long and serve a limited purpose.
Tracing of WinRTP components is configured through the registry. There is a DLL called TraceServer.dll packaged with WinRTP that needs to be in the search path (the %SYSTEM% directory is the best bet). Trace Files are saved in the working directory of the application that uses WinRTP. Trace files are named TraceFile_0000.txt to TraceFile_0009.txt. Each file has a max size of 2 Mbytes. Tracing is circular, i.e. once TraceFile_0009.txt is full, the Tracer will wrap around and overwrite TraceFile_0000.txt and so on.
You can set the desired level of tracing by adjusting some registry keys. Trace levels are set per filter type. If there are multiple instances of a filter, the trace usually includes the value of the “this” pointer so that the different instances can be told apart. The trace level registry keys are located under
HKEY_CURRENT_USER\Software\Cisco Systems\MTC\Tracing
There is a registry key for each filter type (e.g. RTPAudioSource, WaveAudioSink, etc.). The value is a hex number that determines the current trace level. If the trace level of a statement (in the source code) is numerically equal to or lower than the value in this registry key, then the statement will be traced. If it is numerically higher than the registry key, then it will NOT be traced. Please look at the source code for details of the trace levels and their values. Important levels are:
ALL (i.e. all trace statements to be printed) = 0x1fff0000
DET (Detailed) = 0x80000
ARB (ARBITRARY level) = 0x200000
EE (Entry Exit into functions) = 0x100000
SIG (Significant) = 0x80000
STATE (State Transitions) = 0x40000
SPECIAL = 0x20000
ERR (Errors) = 0x10000
The only way to find out what will be traced and what the trace statements mean is to look at the source code for WinRTP.
“AllComponents” is a registry key that overrides the setting for every traceable component. If AllComponents is 0x00, then each component uses the tracing level specified by its own registry key. If AllComponents is anything but 0x00, then its value is used as the tracing level for every component. So setting AllComponents to 0x10000 would make every component report only traces at ERROR level or numerically lower levels, while setting AllComponents to something like 0x1 would prevent every component from tracing anything. Of course, the trace files will still be generated and they will contain some basic timestamp info, but none of the components will trace anything.
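For example, a .reg file like the following sets AllComponents so that every component traces only errors. That the values under the Tracing key are stored as DWORDs is an assumption based on the hex values described above.

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Cisco Systems\MTC\Tracing]
; 0x10000 = ERR – every component traces errors only
"AllComponents"=dword:00010000
; set AllComponents back to 0 to let the per-filter values (e.g. "RTPAudioSource") apply again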
© 2003, Cisco Systems, Inc.
THE INFORMATION HEREIN IS
PROVIDED ON AN “AS IS” BASIS, WITHOUT ANY WARRANTIES OR REPRESENTATIONS,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING WITHOUT LIMITATION, WARRANTIES OF
NONINFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.