WinRTP 2.0 Design Document

Contents
- What the WinRTP package contains
- Push and Pull based Filter-Graphs
- Audio Sample Lifetime Management (Refcounts)
- Some Design Pointers and Illustrations
- Filters With Buffers
- Example 1: Mixing and Playing Three .WAV Files
- Example 2: Mixing and Playing Three .WAV Files of Different Formats
- The Ready-made WinRTP Component
- Overriding Trace Levels for All Components/Turning Off Tracing
Essentially, WinRTP is a streaming architecture for audio together with a set of ready-made building blocks that allow the user to write audio streaming applications on top of that architecture. It is written in C++ using the Win32 API, but it is written so that it should be easy to port to other platforms. It is small, fast, and quite powerful. In its present state, however, it handles only audio streaming and lacks most of the sophistication and versatility of the Microsoft DirectShow API when it comes to handling multiple different kinds of streams. On the other hand, WinRTP is much simpler than DirectShow as far as audio streaming is concerned.
The WinRTP package contains:
- A set of C++ source files that implement the streaming architecture mentioned above
- A set of C++ source files that implement various ready-made building blocks that conform to the WinRTP architecture
- Two programs – provided in source and binary
o One of them is a program that implements an audio end-point – i.e. a software component that can sink/play out RTP audio streams and can also capture audio from the microphone and transmit it as an RTP stream. This component uses the ready-made building blocks mentioned above and illustrates how the user can implement a full-fledged audio streaming application using the WinRTP architecture. Most users will probably be interested in this application and will not bother with the source code
o A test program that illustrates how to use the above application
- Documentation
The WinRTP streaming architecture:
- Is quite simple and versatile
- Imposes a minimum of restrictions – this makes it powerful, but also susceptible to problems if the programmer is not careful
- Is designed for audio streaming applications
The WinRTP architecture consists of “filters” (similar to DirectShow filters) that are connected to each other for audio streaming purposes. A filter is a block that does one of three things: create audio (source filter), process audio in a certain way (transform filter), or consume audio (renderer filter).
Source Filter examples: a filter that captures audio from the microphone of a sound card, a filter that listens to the network for an incoming RTP audio stream and extracts the audio from it
Transform Filter examples: All encoder filters and decoder filters
Renderer Filter examples: a filter that plays the audio through the sound card’s speaker, or a filter that saves the audio in a file on the disk
The connection of the various filters is called a filter-graph. Each source filter can have one or more outputs, each transform filter can have one or more inputs and one or more outputs, and each renderer filter can have one or more inputs. Every output of every source filter in the filter-graph must ultimately have a path to an input of a renderer filter.
One of the very powerful features of WinRTP is its ability to handle any number of inputs and outputs on any filter. Connections are made in a two-step process – connecting the upstream filter to the downstream filter, and then connecting the downstream filter to the upstream filter. A great advantage is that new connections can be set up even while the filter graph is running! Unlike DirectShow, WinRTP does not negotiate the audio format for connections and pins – it simply lets you connect any filter to any other filter and relies on the application writer to make sure that the connection makes sense. This is in line with the stated objective of imposing as few restrictions on the user as possible – a key factor that makes WinRTP simple and powerful.
A filter-graph where the data (audio samples) is pushed downstream to the next filter is called a push-based filter-graph. A filter-graph where downstream filters ask the upstream filters for data (audio samples) is called a pull-based filter-graph.
Audio is transported from one filter to another using “audio samples”. These are fixed size buffers that contain audio data and some headers. A typical audio sample contains 20-60 milliseconds of data.
This is the vital component that defines the behavior of WinRTP – the streaming architecture is where WinRTP differs from other frameworks like DirectShow. The WinRTP streaming architecture has been designed to be flexible. As mentioned earlier, data in a filter graph moves in the form of audio samples, originating at a source filter, then passing through zero or more transform filters, and finally being consumed by a renderer filter.
Source filters are generic filters that can generate audio data. They do not have any upstream filters (because they generate the audio themselves), but they can have one or more outputs. By output, I mean that a source filter can be connected to one or more downstream filters. A source filter sends the same generated audio data to all the downstream filters. The AudioSource class in the MTC project implements source filters. To create your own source filter, derive your class from AudioSource. All the functions other than GenerateData () are already implemented, so GenerateData () is usually the only function you have to implement.
The GenerateData () function is called whenever the source filter needs to generate audio data. You must override this function in your derived class when you create your own source filter.
The source filters can work in one of two modes – Active Mode and Passive Mode.
In active mode, the filter operates by instantiating its own worker thread. This worker thread runs in a loop, calling GenerateData () (to make the source filter generate an audio sample of data) and pushing the generated audio sample downstream to each of the directly connected downstream filters by calling their TakeNextAudioSample () function. Once the data has been pushed to all the downstream connections, it calls GenerateData () again and so on.
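In outline, the active-mode worker thread does the following (a sketch of the behavior just described):
Loop
{
- Call GenerateData () to generate one audio sample
- Push that audio sample to the first directly connected downstream filter by calling its TakeNextAudioSample () function
- Push it to the second directly connected downstream filter
- ...
- Push it to the last directly connected downstream filter
}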
To start the source filter in active mode, you must start it using the StartSourceThread () function. This in turn will call the SourceThreadStarted () function when the worker thread has been instantiated. Override the SourceThreadStarted () function to carry out any initialization that may be needed before the first GenerateData () can be called.
To stop an active source filter, you must stop it by calling StopSourceThread (). This will stop the worker thread. Before this thread exits, it will call the SourceThreadStopped () function. Override this function if you want to do any cleanup before your source filter stops
The source filter can also work in passive mode. In this mode, it does NOT instantiate any worker thread. All it does is wait for a downstream filter to call GiveNextAudioSample (). When this call arrives, GiveNextAudioSample () works as follows: it calls GenerateData () to create an audio sample of data. Then it pushes this audio sample to ALL the directly connected downstream filters (EXCEPT the filter that made this call to GiveNextAudioSample ()) by calling their TakeNextAudioSample () functions with the generated audio sample. Finally, GiveNextAudioSample () returns, handing the generated audio sample back to the caller. Thus, we see that in passive mode there is no worker thread for the source filter; instead it depends on a worker thread of some downstream filter to make the call to GiveNextAudioSample ().
To start a source filter in passive mode, you must start it with the StartSource () function. If you need to do some initialization before the filter can start generating data, then override the StartSource () function in your derived class and do this initialization in this function.
To stop a source filter running in passive mode, stop it using the StopSource () function. Override this function to implement any cleanup that your filter might need to do before it can stop
- WinRTP imposes very few restrictions, so you could start a filter in both active and passive modes simultaneously by calling both StartSource () and StartSourceThread (). Similarly, the source filter will allow all calls, independent of its running mode. For example, GiveNextAudioSample () can be called even if the source filter is in active mode. The behavior of this function call will be exactly as described earlier
- That being said, allowing the above scenarios to occur is a programming error. The programmer is responsible for making sure that the filter runs in only one mode, either active or passive, and by designing the filter graph properly he/she can make sure that things work exactly as planned
- Note from the above description that the WinRTP architecture makes sure that all downstream filters directly connected to the source filter get the same audio sample of data at the same time before the next audio sample is processed. It is important that the user understand how this happens in both active and passive modes from the above descriptions
The AudioSource class already implements all the behavior of the audio sources. So all you have to do is
- Derive your class from AudioSource
- Override and implement the GenerateData () function
- Depending on which modes your filter will run in (active or passive) you might need to optionally override the StartSource/StopSource and/or SourceThreadStarted/SourceThreadStopped functions. Note that this is not necessary if you do not have any special initialization/cleanup requirements.
- Check out the source code of WaveAudioSource, RTPAudioSource and WaveFileSource for examples of how to implement audio source filters; a minimal sketch is also shown below
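As an illustration, here is a minimal sketch of a custom source filter. The class AudioSource and the functions GenerateData (), StartSourceThread () and StopSourceThread () are the ones described above; the exact signature of GenerateData () (its return type, and how a fresh AudioSample is obtained from the AudioSampleManager) is an assumption and should be checked against the real headers.

// Sketch only – signatures and the sample-allocation call are assumptions.
class SilenceSource : public AudioSource      // hypothetical example filter
{
public:
    // Called whenever the filter must produce one audio sample of data,
    // either by its own worker thread (active mode) or from a downstream
    // filter's call to GiveNextAudioSample () (passive mode).
    virtual AudioSample *GenerateData ()
    {
        // Assumption: obtain a free audio sample (refcount 1) from the
        // AudioSampleManager; the real call may differ.
        AudioSample *pSample = NULL; /* = get a free sample from the AudioSampleManager */
        if (pSample != NULL)
        {
            // Fill the sample's data buffer with 20-60 ms of audio (here,
            // silence) and set the data size and audio format in its header.
        }
        return pSample;
    }
};

In active mode such a filter would be started with StartSourceThread () and stopped with StopSourceThread (); in passive mode, with StartSource () and StopSource ().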
Renderer filters are filters that render (sink, consume) audio samples. They do not have any downstream filters because they are the end-point for the incoming audio samples. They can have one or more upstream filters that supply audio samples to them. The base class that implements the renderer filter behavior is the AudioSink class. You should derive from AudioSink when you want to create a new renderer filter. Renderer filters can consume data in many different ways, for example by playing the audio samples out through the speaker, sending the audio samples as RTP audio packets, saving the audio samples as .wav files on disk, and so on. All functions other than RenderAudioSamples () are already implemented or have stubs, so when you write your own renderer filter, RenderAudioSamples () is the only function that you have to implement – the others are optional.
The RenderAudioSamples () function is called whenever the Renderer Filter needs to sink/render some audio. This function is called with a list of audio samples – one from each of the directly connected upstream filters. You must override this function when you implement your own renderer filter by deriving it from AudioSink
Just like an AudioSource, an AudioSink can also run in either Active mode or Passive mode
In active mode, the renderer filter creates its own worker thread (see the description of source filters). The worker thread runs in a loop:
Loop
{
- Ask for data from the first directly connected upstream filter by calling GiveNextAudioSample ()
- Ask for data from the second directly connected upstream filter by calling GiveNextAudioSample ()
- ….
- Ask for data from the last upstream filter
- Put all the received data into a list of audio samples, one from each upstream filter
- Render the data, by calling RenderAudioSamples () and passing to it this list of audio samples
}
Please note that when the renderer filter is in active mode, all the directly connected upstream filters should be in passive mode, because the renderer filter will be calling their GiveNextAudioSample () function.
To start a renderer filter in active mode, call its StartSinkThread () function. This will instantiate the worker thread that will call SinkThreadStarted () when it is ready to execute. Override SinkThreadStarted () in order to do any initialization that must be done before the renderer can start sinking data.
To stop the renderer running in active mode, call its StopSinkThread () function. This will stop the worker thread. Before exiting, the worker thread will call the SinkThreadStopped () function. Override the SinkThreadStopped function in order to implement any cleanup that your renderer filter needs to do before it can stop
In Passive mode, the renderer filter does not instantiate any worker thread, but it relies on the worker thread of some other upstream filter. In this mode, the renderer filter waits for a call to its TakeNextAudioSample () function. The parameter to this call is a list of audio samples (one from each directly connected upstream filter). This function in turn calls the RenderAudioSamples () function with the same list of audio samples. RenderAudioSamples () consumes the data and returns.
To start a renderer filter in passive mode, start it by calling StartSink (). Override this function in order to implement any initialization that the filter might need to do.
To stop a renderer filter running in passive mode, stop it by calling the StopSink () function. Override it to implement any cleanup code that the filter needs to do before stopping.
- Please see the observations for source filters above
- Make sure that the filter is running either in active mode or passive mode
- Note that in both modes, the renderer filter takes one audio sample of data from each of the directly connected upstream filters before it renders them
The AudioSink class already implements all the behavior of the audio sinks. So all you have to do is
- Derive your class from AudioSink
- Override and implement the RenderAudioSamples () function
- Depending on which modes your filter will run in (active or passive) you might need to optionally override the StartSink/StopSink and/or SinkThreadStarted/SinkThreadStopped functions. Note that this is not necessary if you do not have any special initialization/cleanup requirements
- Check out the source code of RTPAudioSink and WaveAudioSink for examples of how to implement audio sink filters; a minimal sketch is also shown below
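Again as an illustration, here is a minimal sketch of a custom renderer filter. AudioSink, RenderAudioSamples (), StartSinkThread () and StopSinkThread () are described above; exactly how the list of audio samples is passed to RenderAudioSamples () is an assumption and should be checked against the real AudioSink header.

// Sketch only – the parameter list of RenderAudioSamples () is an assumption.
class DiscardSink : public AudioSink          // hypothetical example filter
{
public:
    // Called with one audio sample from each directly connected upstream
    // filter, either by this filter's own worker thread (active mode) or by
    // an upstream filter's call to TakeNextAudioSample () (passive mode).
    virtual void RenderAudioSamples (/* list of AudioSample *, one per upstream filter */)
    {
        // Consume the samples here – e.g. write them to a device or a file.
        // A plain renderer normally does not have to AddRef ()/Release () the
        // samples; refcount handling is only needed when samples are kept
        // beyond this call (see the section on audio sample lifetimes).
    }
};

In active mode such a filter is started with StartSinkThread () and stopped with StopSinkThread (); in passive mode, with StartSink () and StopSink ().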
Transform filters are both source filters and sink filters because they can both receive data from upstream filters and send data to downstream filters. They can have any number (one or more) of directly connected input filters, as well as any number (one or more) of directly connected output filters. The AudioTransformer class implements the behavior of transform filters. It derives from both AudioSource and AudioSink.
Transform filters are of two main classes – transformers and buffers.
A transformer is a passive transform filter that takes an input audio sample, processes it somehow, and sends it downstream – all in the same thread. It can act in a “push-mode” or in “pull-mode”.
In push mode:
- An upstream filter sends data to the transformer by calling TakeNextAudioSample ()
- In response to that, the implementation of TakeNextAudioSample () asks for data from each of the other directly connected upstream filters (except the one that called TakeNextAudioSample ()) by calling their GiveNextAudioSample () function
- Once it gets one audio sample from each of the upstream filters, TakeNextAudioSample () calls TransformAudioSamples () and passes to it the list of audio samples
- TransformAudioSamples () transforms the audio sample(s) for e.g. by encoding them in a certain format, changing the volume, etc. and creates one audio sample as its output
- Once TransformAudioSamples () returns its result, TakeNextAudioSample () pushes it further downstream to all the directly connected downstream filters by calling their TakeNextAudioSample () function
In pull mode:
- A downstream filter asks for data from the transformer by calling its GiveNextAudioSample () function
- GiveNextAudioSample () in turn asks all the directly connected upstream filters for data by calling their GiveNextAudioSample () function
- It creates a list of the returned audio samples from the upstream filters (one audio sample from each upstream filter) and calls TransformAudioSamples () with it
- After TransformAudioSamples () is finished, GiveNextAudioSample () then pushes the resulting transformed data to all the other directly connected downstream filters (except the one that called GiveNextAudioSample ()) by calling their TakeNextAudioSample () function
The AudioTransformer class already implements the behavior of transformers, so it is very easy to implement a transformer. All you have to do is
- Derive your class from AudioTransformer, and
- Override the TransformAudioSamples () function
- If you have special initialization/cleanup requirements, optionally override and implement the StartTransform () and StopTransform () functions
- Please see the PCM2G711Transformer and G7112PCMTransformer code for examples of transformers
Since transformers are passive filters (i.e. they do not have their own worker thread), you should start them by calling their StartTransform () function and stop them by calling their StopTransform () function.
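Here too, a minimal sketch may help. AudioTransformer, TransformAudioSamples (), StartTransform () and StopTransform () are described above; the exact signature of TransformAudioSamples () (how the input list is passed and how the single output sample is returned) is an assumption.

// Sketch only – the signature of TransformAudioSamples () is an assumption.
class GainTransformer : public AudioTransformer   // hypothetical example filter
{
public:
    // Called with one audio sample from each directly connected upstream
    // filter; must produce exactly one output audio sample.
    virtual AudioSample *TransformAudioSamples (/* list of input samples */)
    {
        // For a single-input, in-place transform the output can simply be the
        // modified input sample (e.g. scale every 16-bit PCM value by a gain
        // factor); a mixer would instead obtain a fresh sample and write the
        // mixed audio into it.
        return NULL; // return the transformed audio sample
    }
};

A transformer like this is started with StartTransform () and stopped with StopTransform ().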
Buffers are special transform filters. Like transformers, they are passive filters, but unlike transformers, where every input generates an output, the input and output sides of a buffer are not connected. This means that a buffer is actually an audio sink and an audio source rolled into one. To the upstream filters, it looks like a passive audio sink, because it consumes audio samples and does not generate any output. To the downstream filters, it acts like a passive audio source that can generate data upon request.
- Upstream Side (where it acts like a sink)
o An upstream filter pushes data to the input side of the buffer filter by calling its TakeNextAudioSample () function
o Just like with renderer filters, TakeNextAudioSample () then asks for data from all the other directly connected upstream filters (except for the one that called TakeNextAudioSample ()) by calling their GiveNextAudioSample () function
o Once one audio sample has been received from each of the upstream filters, TakeNextAudioSample () creates a list of these audio samples, and calls RenderAudioSamples () with it
o When RenderAudioSamples () returns, so does TakeNextAudioSample ()
- Downstream Side (where it acts like a source)
o A downstream filter asks for data from the buffer filter by calling GiveNextAudioSample ()
o GiveNextAudioSample () behaves just as if the buffer were a source: it calls GenerateData ()
o When GenerateData () returns an audio sample, GiveNextAudioSample () pushes it to all the other downstream filters (except the one that called GiveNextAudioSample ()) by calling their TakeNextAudioSample () function with the generated audio sample
o Finally GiveNextAudioSample returns, returning the generated audio sample to the caller
Implementing a buffer is pretty simple, although not as easy as implementing transformers. In order to implement a buffer filter, you have to
- Derive your class from AudioTransformer as usual
- Override and implement the RenderAudioSamples () function to do what the buffer filter should do when upstream filters send data to it. Most probably you would be saving this data in some kind of a buffer/queue and returning
- Override and implement the GenerateData () function to implement what the buffer filter should do when a downstream filter's thread asks it for data. Most probably GenerateData () will extract data from the aforementioned buffer/queue and return it in an audio sample
- Check out the code for FixedSizeAudioBuffer (simple) and RTPJitterBuffer (complicated) for examples of how to implement buffer filters; a minimal sketch follows after the notes below
- Note that there are two threads that access the buffer – one from the upstream filters that send data to the buffer filter by calling RenderAudioSamples (), and the other from the downstream filters that ask for data from the buffer by calling GenerateData ()
- Note that just like transformers, buffers are passive filters, because they do not instantiate their own worker threads, but depend on worker threads from upstream and downstream filters to drive them
- Buffers need not only be queues. They can also transform their data – but you have to make sure you make the call to transform the data in either GenerateData () or RenderAudioSamples ()
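To make the two-thread structure concrete, here is a minimal sketch of a queue-based buffer filter. As before, the exact signatures of RenderAudioSamples () and GenerateData () are assumptions; the locking uses a plain Win32 CRITICAL_SECTION, and the refcount handling follows the rules described in the audio sample lifetime section below.

#include <windows.h>
#include <queue>
// Assumes the WinRTP headers declaring AudioTransformer and AudioSample.

// Sketch only – signatures and ownership details are assumptions.
class SimpleAudioBuffer : public AudioTransformer   // hypothetical example filter
{
public:
    SimpleAudioBuffer ()  { InitializeCriticalSection (&m_Lock); }
    ~SimpleAudioBuffer () { DeleteCriticalSection (&m_Lock); }

    // Upstream side: runs on an upstream filter's thread.
    virtual void RenderAudioSamples (AudioSample *pSample /* assumed: one sample or a list */)
    {
        pSample->AddRef ();               // keep the sample alive while it is queued
        EnterCriticalSection (&m_Lock);
        m_Queue.push (pSample);
        LeaveCriticalSection (&m_Lock);
    }

    // Downstream side: runs on a downstream filter's thread.
    virtual AudioSample *GenerateData ()
    {
        AudioSample *pSample = NULL;
        EnterCriticalSection (&m_Lock);
        if (!m_Queue.empty ())
        {
            pSample = m_Queue.front ();
            m_Queue.pop ();
        }
        LeaveCriticalSection (&m_Lock);
        // Whether the AddRef () taken at enqueue time is handed on to the
        // caller here or balanced by an explicit Release () depends on the
        // framework's ownership rules – see RTPJitterBuffer for the real pattern.
        return pSample;
    }

private:
    CRITICAL_SECTION          m_Lock;
    std::queue<AudioSample *> m_Queue;
};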
A common question from users is about active buffers – a case can be made for an active-mode buffer transform filter. Such a filter would, for example, enqueue the data from upstream filters and have a worker thread that drains the queue and pushes the data downstream in a timed manner. So it might seem to make sense to have an active buffer filter with its own worker thread!
The answer is usually that it is not really required. Transform filters, especially buffer transform filters, are already more complicated than transformers, or sources and sinks. Adding a worker thread complicates matters further and will probably cause more headache than it is worth. Furthermore, because WinRTP can act in both push and pull modes with equal ease, whatever one can do with an active source one can also do with an active sink. So, instead of making a buffer filter act like an active source, the user can more easily make one of the sinks connected to that buffer filter active – achieving the same end result without the complication. So far I have not come across a case where a transform filter needed to be active and there was no simpler workaround.
Still, it is possible to build such a filter if the user really wants to – all he/she has to do is call StartSink () and StartSourceThread () on the buffer filter instead of starting it with StartTransform (). Though possible, it is not recommended.
As mentioned before, data in the filter graph moves in audio samples. This section discusses audio samples and their lifetime.
Audio samples are buffers that contain audio data along with some headers. The audio sample header specifies information about the audio data, for example the format of the data, how many bytes of data there are, and so on. The AudioSample class implements audio samples; it also contains a set of helper functions. The contents of an audio sample are as follows:
- Audio sample reference count (just like Windows COM objects) – note that AudioSample is NOT a COM object although it maintains refcounts
- Audio format that describes the encoding of the audio data
- Size of audio data
- Size of the audio buffer (i.e. max amount of audio data it can hold)
- RTP header (optional – can be null). In some cases it might be useful to have access to the header of the RTP packet that contained the data in this audio sample (e.g. for packet reordering in the RTP jitter buffer). This is only usable if the audio sample was generated from an RTP packet
- Data Buffer containing the audio data
For efficiency, it makes sense to have a pool of preallocated, re-usable audio samples rather than creating them dynamically every time one is needed. The AudioSampleManager class implements such a sample pool. It always maintains a list of free audio samples, and when a filter needs an audio sample, it hands out one from the list. An audio sample given out by the AudioSampleManager always has a refcount of 1 (one). When a filter is done with an audio sample, it calls Release () on it, which reduces its refcount by one. When the refcount reaches zero, the audio sample goes back to the free list of the audio sample manager.
Lifetime management of audio samples is handled automatically by the WinRTP framework using reference counts. Buffer filters are the only case where the user might have to deal with refcounts directly. Here are the rules that WinRTP uses for refcount management:
Buffer filters are passive filters, typically driven by an active filter on one side or the other. Due to their nature, they frequently want to enqueue the received audio sample rather than consume it immediately. To make sure that the enqueued audio sample is not returned to the AudioSampleManager, the buffer filter should call AddRef () on the received audio sample and Release () it when it has no more use for it. If the user is writing a buffer filter, then he/she has to take care of this AddRef () and Release ()
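The pattern is the same as in the buffer filter sketch shown earlier; in its simplest form (names assumed) it looks like this:

// On the input side, when the sample is queued:
pReceivedSample->AddRef ();     // keep the sample out of the AudioSampleManager's free list
m_Queue.push (pReceivedSample);

// Later, when the buffer filter no longer needs the sample:
pQueuedSample->Release ();      // when the refcount reaches zero the sample
                                // returns to the AudioSampleManager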
This section contains some design pointers that should help the user arrive at a good design for the filter graph and application. Creating/designing an application using the WinRTP framework involves the following steps:
- Finding out what components (filters) are needed and implementing them
- Building the filter graph by connecting the filters in the proper way
- Deciding how the filter graph will handle timing
- Deciding which filters need to be active and which ones should be passive
- Running the filter graph
Filters are the smallest unit of processing in the filter graph, and they should be designed that way. It is much better to break a job down into many small operations and to design a filter for each of them. Designing a filter that tries to do too much is a frequent mistake. In a simple filter graph, a filter is little more than a function call, so having more filters does not have any significant performance impact. On the other hand, having many small filters creates a “library” of operations, and these small filters will frequently be reusable in many parts of the filter graph.
A common mistake is to design filters that incorporate their own buffering. It is almost always better to implement a separate buffer filter and connect it to the input side of the filter that needs buffering. This way, the buffer can be used with any filter that needs to buffer its input, and each filter won’t have to solve the problem again and again.
Timing is an especially important topic as far as filter graphs are concerned. For efficiency and audio quality, it is very important to minimize the number of threads in the filter graph and to have one filter drive the timing for all the others. Also, hardware-based timers are always preferable to software timers – timer threads are usually not a good idea, so do not use them unless you have to. One great source of timing is the sound card itself – since the sound card sources and sinks audio data at a fixed average rate, we can use it to drive the timing of the filter graph (hardware timing) rather than have timer threads. For example, if you want to mix three .wav files from disk and play them through the speaker, the following filter graph will do the job. Rather than have a timer thread, we let the graph run at full speed and let the speaker of the sound card control the timing.
This is how the above filter graph will operate
- We start each of the file sources and the mixer in passive mode
- We start the speaker in active mode
- The speaker worker thread will ask for data from the mixer which in turn will ask for data from each of the file sources, mix them and send the mixed data to the speaker
- The speaker will play the audio sample. If the data is 20ms in length, then the speaker will need 20ms to play it. This will automatically regulate the speed at which the worker thread runs
- When we are done, we first stop the speaker’s worker thread by calling StopSinkThread (), and then stop each of the file sources and the mixer in any order, because no worker thread is running any more. A code sketch of this example follows below
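In code, building and running this graph looks roughly like the following. WaveFileSource and WaveAudioSink are the filters mentioned earlier in this document; the mixer class name (MixerTransformer) and the connection helper (ConnectFilters ()) are placeholders for whatever the real API provides, and opening the .wav files is not shown.

// Sketch only – MixerTransformer and ConnectFilters () are hypothetical names;
// the start/stop calls are the ones described in this document.
// (inside some setup function)
WaveFileSource   file1, file2, file3;     // one passive source per .wav file
MixerTransformer mixer;                   // hypothetical mixing transform filter
WaveAudioSink    speaker;                 // active renderer that drives the timing

// Connect sources -> mixer -> speaker (the two-step connection is assumed to
// be wrapped by the helper).
ConnectFilters (&file1, &mixer);
ConnectFilters (&file2, &mixer);
ConnectFilters (&file3, &mixer);
ConnectFilters (&mixer,  &speaker);

file1.StartSource ();                     // passive mode
file2.StartSource ();
file3.StartSource ();
mixer.StartTransform ();                  // passive transformer
speaker.StartSinkThread ();               // active mode – its worker thread pulls the data

// ... audio plays; the sound card paces the worker thread ...

speaker.StopSinkThread ();                // stop the only worker thread first
mixer.StopTransform ();
file1.StopSource ();
file2.StopSource ();
file3.StopSource ();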
Some Notes/Observations
For Example 2 (mixing .wav files of different formats), we would need a transcoding resource. WinRTP provides one – the ACMTransformer, which is actually implemented as a buffer. Remember that this means that there will be two threads operating on it – one on the input side feeding data to it, and one on the output side extracting data from it. The ACMTransformer can convert between any two formats that Windows can convert between, because it uses the Windows transcoding API. The filter graph looks exactly as before, with one exception: between each file source and the mixer there is an ACMTransformer filter.
This is how the above filter graph will operate
Operation of the File Source Threads
Operation of the Speaker Filter Thread
Notes
Timing of the File Source Threads
One of the things that comes with WinRTP is a ready-made program, written using the WinRTP architecture, that makes it easy to implement a two-way audio streaming application. It takes the form of a COM DLL called CCNSMT.dll. Many people use the name WinRTP to mean this component, and many users have expressed interest in understanding how it operates.
Armed with the knowledge in this document, once the user understands the WinRTP streaming architecture, he/she will be able to understand CCNSMT.dll fairly easily. Just look at the two files CCNMediaTerm.h and CCNMediaTerm.cpp that implement the interface. By looking at the code and at what CCNSMT.dll does for each of the calls in its interface, the user can work out all the details of how it operates.
In short, CCNSMT.dll implements streaming by constructing two filter graphs – one for the transmit side and another for the receive side. In addition, it implements another class called FilePlay that is in turn implemented using WinRTP filters. The file play is nothing other than a file source, an ACMTransformer and a volume-setting filter connected together. This document, along with the CCNSMT.dll source code, is the best explanation of this component; writing a separate document would take too long and serve a limited purpose.
Tracing of WinRTP components is configured through the registry. There is a DLL called TraceServer.dll packaged with WinRTP that needs to be in the search path (the %SYSTEM% directory is the best bet). Trace Files are saved in the working directory of the application that uses WinRTP. Trace files are named TraceFile_0000.txt to TraceFile_0009.txt. Each file has a max size of 2 Mbytes. Tracing is circular, i.e. once TraceFile_0009.txt is full, the Tracer will wrap around and overwrite TraceFile_0000.txt and so on.
You can set the desired level of tracing by adjusting some registry keys. Trace levels are set per filter type. If there are multiple instances of a filter, the trace usually includes the value of the “this” pointer so that the different instances can be told apart. The trace level registry keys are located under
HKEY_CURRENT_USER\Software\Cisco Systems\MTC\Tracing
There is a registry key for each filter type (e.g. RTPAudioSource, WaveAudioSink, etc.). The value is a hex number that determines the current trace level. If the trace level of a statement (in the source code) is numerically equal to or lower than the value in this registry key, then the statement will be traced. If it is numerically higher than the registry key, then it will NOT be traced. Please look at the source code for details of the trace levels and their values. Important levels are:
ALL (i.e. all trace statements to be printed) = 0x1fff0000
DET (Detailed) = 0x80000
ARB (ARBITRARY level) = 0x200000
EE (Entry Exit into functions) = 0x100000
SIG (Significant) = 0x80000
STATE (State Transitions) = 0x40000
SPECIAL = 0x20000
ERR (Errors) = 0x10000
The only way to find out what will be traced and what the trace statements mean is to look at the source code for WinRTP.
“AllComponents” is a registry key that overrides the setting for every traceable component. If AllComponents is 0x00, then each component uses the tracing level specified by its own registry key. If AllComponents is anything but 0x00, then its value is used as the tracing level for every component. So setting AllComponents to 0x10000 would make every component report only traces at ERROR level or numerically lower levels, while setting AllComponents to something like 0x1 would prevent every component from tracing anything. Of course, the trace files will still be generated and they will contain some basic timestamp info, but none of the components will trace anything.
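For example, a .reg file like the following sets AllComponents so that every component traces only errors. That the values under the Tracing key are stored as DWORDs is an assumption based on the hex values described above.

Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Cisco Systems\MTC\Tracing]
; 0x10000 = ERR – every component traces errors only
"AllComponents"=dword:00010000
; set AllComponents back to 0 to let the per-filter values (e.g. "RTPAudioSource") apply again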
© 2003, Cisco Systems, Inc.
THE INFORMATION HEREIN IS
PROVIDED ON AN “AS IS” BASIS, WITHOUT ANY WARRANTIES OR REPRESENTATIONS,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING WITHOUT LIMITATION, WARRANTIES OF
NONINFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.