October, 2003
MP4LIVE INTERNALS
Program Control
Media Flows and Nodes
Media Sources
Media Codecs
Media Frames
Media Sinks
List of Media Sources
List of Codecs
List of Media Sinks
This document provides an overview of the internals of the mp4live
application for those who intend to modify mp4live. See the README for information on using mp4live.
Program Control
The control flow of mp4live is easiest to understand by first considering
the no-GUI (--headless) mode of operation. In this mode, mp4live reads
a configuration file and then creates a "media flow" that uses this
configuration. Starting the media flow causes the appropriate threads of
execution to be started. The main program thread then sleeps for the
desired duration and, upon waking, tells the media flow to stop. Program
execution then ends.
When the mp4live GUI is active, the main program thread runs the GUI code.
In this case, GUI actions cause configuration information to be changed,
and the media flow to be started and stopped.
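The headless sequence above can be sketched as follows. Note that Config,
MediaFlow, and RunHeadless are simplified stand-ins for mp4live's real
classes in media_flow.h, not the actual API:

```cpp
// Sketch of the --headless control path: read config, create the media
// flow, start it, sleep for the desired duration, stop it, and exit.
#include <chrono>
#include <thread>

struct Config {
    int durationSeconds = 1;        // desired capture duration
};

class MediaFlow {
public:
    explicit MediaFlow(const Config& c) : config_(c) {}
    void Start() { running_ = true; }   // the real flow spawns media-node threads here
    void Stop()  { running_ = false; }  // the real flow signals its threads to exit
    bool IsRunning() const { return running_; }
private:
    Config config_;
    bool running_ = false;
};

// Mirrors the headless main-thread sequence described above.
inline bool RunHeadless(const Config& config) {
    MediaFlow flow(config);
    flow.Start();
    std::this_thread::sleep_for(std::chrono::seconds(config.durationSeconds));
    flow.Stop();
    return !flow.IsRunning();       // true once the flow has been stopped
}
```

In the GUI mode, the same Start/Stop calls are simply driven by GUI actions
instead of a fixed sleep.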
Media Flows and Nodes
The "media flow" (media_flow.h) is the top level concept that organizes
the processing activities of mp4live. A media flow is a collection of
media nodes (media_node.h), with forwarding rules between the nodes. Each
media node runs as a thread within the mp4live process. Each thread has
a message queue which can be used to control it. Currently, messages are
typically used only for starting and stopping the threads, and for notifying
media sinks when new media frames from the media sources become available.
Other coordination between threads is achieved via the shared configuration
data.
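The per-node message queue can be sketched as a small thread-safe queue that
a node thread blocks on; the names below are illustrative, not mp4live's
actual utility classes:

```cpp
// Minimal sketch of a media node's message queue: other threads Post()
// control messages, and the node thread blocks in Wait() until one arrives.
#include <condition_variable>
#include <mutex>
#include <queue>

enum class MsgType { Start, Stop, FrameAvailable };

class MessageQueue {
public:
    void Post(MsgType m) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(m);
        }
        cond_.notify_one();
    }
    MsgType Wait() {                       // node thread blocks here
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return !queue_.empty(); });
        MsgType m = queue_.front();
        queue_.pop();
        return m;
    }
private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::queue<MsgType> queue_;
};
```

A source posts a FrameAvailable message to each registered sink's queue when
a new media frame is ready.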
Media Sources
A "media source" (media_source.h) is a media node that acquires raw media
frames and processes them according to the target output(s) of the current flow
configuration. Currently, a media source may produce audio, video, or both,
and it may produce multiple output frames for a single input frame. For
example, a video source may generate both a reconstructed YUV video frame
and an MPEG-4 encoded video frame.
Since much of the media processing is shared regardless of the details of
media acquisition, the base media source class (media_source.cpp) contains
the central code to encode media and maintain timing and synchronization.
The generic media processing is somewhat over-engineered at present to
allow for transcoding scenarios where the source is pre-existing encoded
media instead of a capture device that can be configured to match the
desired output.
The generic processing for both audio and video includes acquiring raw
frames, maintaining timing and synchronization, invoking the encoders, and
forwarding the resulting media frames to the registered sinks.
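A rough sketch of this shared per-frame pattern, under assumptions; the
types and functions below are illustrative, not media_source.cpp's actual
names:

```cpp
// Assumed sketch of a source's per-frame processing: one acquired raw
// frame may yield several output frames (e.g. a reconstructed YUV frame
// for preview plus an encoded frame for RTP/mp4), all sharing one timestamp.
#include <cstdint>
#include <vector>

struct RawFrame { std::vector<uint8_t> data; uint64_t timestampUsec; };
struct OutFrame { std::vector<uint8_t> data; uint64_t timestampUsec; bool encoded; };

// Stand-in for an encoder call (e.g. MPEG-4 video).
inline OutFrame Encode(const RawFrame& in) {
    return OutFrame{ in.data, in.timestampUsec, true };
}

inline std::vector<OutFrame> ProcessFrame(const RawFrame& in, bool wantPreview) {
    std::vector<OutFrame> out;
    if (wantPreview)
        out.push_back(OutFrame{ in.data, in.timestampUsec, false });  // reconstructed frame
    out.push_back(Encode(in));                                        // encoded frame
    return out;
}
```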
Media Codecs
There are two defined types of media codecs (encoders), audio and video.
These are defined in audio_encoder.h and video_encoder.h as abstract classes
that provide simple generalized interfaces to the encoder libraries.
The media codec classes are used by the media sources to invoke the media
encoders to transform raw, uncompressed media into encoded media.
Each supported media codec derives a class from the appropriate abstract
class, and provides code to map the generic interface to that provided by
the codec library.
Each encoder type also has a number of calls to set various variables for
RTP transmission or for saving to an mp4 file; these are also defined in
audio_encoder.h and video_encoder.h.
Encoders can be added by writing a new video encoder class and adding an
entry to video_encoder_tables.cpp, or a new audio encoder class and an
entry to audio_encoder_tables.cpp. You will also need to add the correct
hooks to each routine in audio_encoder.cpp and video_encoder.cpp.
Media Frames
The output of a media source is a "media frame" (media_frame.h). This is a
reference counted structure that contains: a pointer to malloc'ed media data,
the media data length, the media type, the timestamp of the media frame, the
media frame duration (used for audio only), and the timescale of that
duration (ticks per second). If you create your own source, it is imperative
that its timestamps be synchronized with those of the other sources.
A media source constructs one or more media frames during its processing of
each acquired media frame. Each output media frame is sent in a message to
all the registered sinks of the source. Note that if there are N sinks,
N messages are created that all point to one media frame, i.e. only one
copy of the media data exists. As each media sink "frees" the media frame,
the reference count is decremented; when the reference count reaches zero,
the media frame data is freed and the media frame is destroyed.
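This lifecycle can be sketched as follows; the field and method names are
illustrative, not media_frame.h's exact API:

```cpp
// Sketch of the reference-counted frame: one copy of the media data,
// one reference per sink, data freed when the count reaches zero.
#include <cstdint>
#include <cstdlib>
#include <cstring>

class MediaFrame {
public:
    MediaFrame(const void* data, size_t len, uint64_t timestamp,
               uint32_t duration, uint32_t timescale)
        : data_(std::malloc(len)), len_(len), timestamp_(timestamp),
          duration_(duration), timescale_(timescale), refCount_(1) {
        std::memcpy(data_, data, len);
    }
    void AddRef() { ++refCount_; }          // one reference per registered sink
    // Each sink "frees" the frame; returns true when the frame is destroyed.
    bool Release() {
        if (--refCount_ == 0) {
            std::free(data_);               // free the single copy of the data
            delete this;
            return true;
        }
        return false;
    }
    uint32_t RefCount() const { return refCount_; }
private:
    ~MediaFrame() = default;                // destroyed only via Release()
    void* data_;
    size_t len_;
    uint64_t timestamp_;                    // must be synchronized across sources
    uint32_t duration_;                     // used for audio only
    uint32_t timescale_;                    // ticks per second of duration_
    uint32_t refCount_;
};
```

With N sinks, AddRef() is called N-1 times after construction, and the data
is released only when the last sink calls Release().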
List of Media Sources
V4L - video_v4l_source.cpp
Acquires YUV12 images from a V4L (Video For Linux) device.
V4L2 - video_v4l2_source.cpp
Acquires YUV12 images from a V4L2 (Video For Linux, version 2) device
(recommended).
OSS - audio_oss_source.cpp
Acquires PCM16 audio samples from an OSS (Open Sound System) device.
List of Media Sinks
MP4 - file_mp4_recorder.cpp
Writes media frames to an mp4 file.
RTP - rtp_transmitter.cpp
Transmits media frames via RTP/UDP/IP, implementing media specific RTP
payloads as defined in IETF RFCs. An adjunct to the RTP transmitter is the
SDP file writer (sdp_file.cpp), which constructs an SDP file that can be
used to tune into the RTP streams.
SDL Previewer - video_sdl_preview.cpp
Displays video frames on the local video display via the SDL multi-platform
library.
Raw Sink - file_raw_sink.cpp
Writes raw media frames (YUV12 and PCM16) to a local named pipe. This
enables sharing of the capture devices between mp4live and another
application.