Media Stream Selection (MESS)

Ericsson AB
Farogatan 6
SE - 164 80 Kista
Sweden
+46107147505
+46107175550
daniel.grondal@ericsson.com
www.ericsson.com

Ericsson AB
Farogatan 6
SE - 164 90 Kista
Sweden
+46107141311
+46107175550
bo.burman@ericsson.com
www.ericsson.com

Ericsson AB
Farogatan 6
SE- Kista 164 90
Sweden
+46107148287
magnus.westerlund@ericsson.com
www.ericsson.com

This document describes how media stream selection can be achieved in
both a conferencing scenario and peer to peer communication. To allow
endpoints to select specific media streams, all available media in the
session must be identifiable and there is a need for messages that can
be securely transported between endpoints and network nodes. This
document also describes a way to distribute the identification
information to all participating endpoints. The necessary messages can
potentially be mapped onto several different encodings, and this
document proposes one mapping that uses an extended version of the
Binary Floor Control Protocol.

Multimedia conferencing is becoming more and more important. The
setup of a multimedia conference is well defined, using for example
SIP and SDP. However, as SIP/SDP is used for session setup it leaves
little or no dynamic control over what media content to receive from
other participants during the session. This document targets this
weakness and describes functionality that grants receiving endpoints
capabilities to dynamically select what information and media content
are received from other participating clients.

These terms are commonly used throughout the document:

Media content:  Media being sent from one specific media capture
   device, such as a microphone for audio media, or a video camera for
   video media.

Endpoint:  A device that handles media and that either originates a
   number of media content, terminates a number of media content, or
   some combination of both. As an example, an RTP Mixer is considered
   an endpoint, while a simple RTP Translator that simply forwards all
   input streams is not.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.

In a communication scenario where one or more endpoints offer more
than one media content, but where a receiving endpoint cannot handle
all of that media content simultaneously, there may be a need for that
endpoint to actively and dynamically select, during an ongoing
conference,
what content to receive. A typical scenario would be a video conference
where some endpoints have multiple cameras capturing different aspects
of a room and a receiving endpoint can only render one video stream due
to e.g. hardware limitations. Today, the only way to solve this is to
have an RTP mixer handle the conference and let that choose one of the
streams based on some criteria. It is up to the RTP mixer implementation
which stream to choose, but a common criterion is some type of speaker
activity.

It would be possible to let the receiving endpoints choose which media
content(s) to receive, provided that endpoints publish information
about what media content is available to all other endpoints and that
a protocol exists to request specific media content from other
endpoints. This functionality is the target of Media Stream Selection
(MESS), which is described in this document. This document describes
how to generate
and distribute media content information in both conferencing scenarios
as well as in point to point sessions. It also describes how to set up a
control channel to send messages between endpoints and finally defines a
set of messages that can be used to handle media content requests.

This section presents some typical use cases targeted by MESS. The
scenario is an endpoint participating in a conference, receiving media
from a centralized conference node. It is assumed that all participating
endpoints have published information about what media content they are
offering. There are more available media from other participants in the
conference than what the receiving endpoint in the use case can present
simultaneously, and the conference node has some implemented policy for
selecting which media to forward.

An endpoint selects what content to receive from another endpoint
based on that endpoint's published media content information. An
endpoint can make new decisions about what content to receive
dynamically at any time during the session.

An endpoint wishes to stop receiving content from another endpoint
e.g. due to low quality or other reasons. The set of excluded media
during a session is subject to change and an endpoint can make new
decisions to exclude content dynamically at any time during the
session.

An endpoint renders received media and wants to replace the
received media with some other available media content. It can be seen
as an atomic combination of the two use-cases above, first excluding
one media content and effectively replacing it by including another.
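
As an illustration, the include, exclude and substitute use cases
described so far can be sketched as per-receiver state held by the
conference node. This is only a minimal sketch and not part of the
protocol: the class and method names are hypothetical, and it assumes
that a later decision for a media ID overrides an earlier one. Media
that has never been named stays under the node's own policy.

```python
class SelectionState:
    """Hypothetical per-receiver selection state at a conference node."""

    def __init__(self):
        self.included = set()  # media IDs that are always forwarded
        self.excluded = set()  # media IDs that are never forwarded

    def include(self, media_id):
        # Assumption: a new decision replaces any earlier one.
        self.excluded.discard(media_id)
        self.included.add(media_id)

    def exclude(self, media_id):
        self.included.discard(media_id)
        self.excluded.add(media_id)

    def substitute(self, old_id, new_id):
        # Both changes are applied in one step, so no intermediate
        # state is ever used for a forwarding decision.
        self.exclude(old_id)
        self.include(new_id)

    def should_forward(self, media_id, policy_choice):
        # Explicit decisions win; otherwise the node policy decides.
        if media_id in self.included:
            return True
        if media_id in self.excluded:
            return False
        return policy_choice


state = SelectionState()
state.include("cam1")
state.substitute("cam1", "cam2")
print(state.should_forward("cam2", policy_choice=False))
```

The `policy_choice` argument stands in for whatever the node's own
policy (e.g. speaker activity) would have decided for that media.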
An endpoint can make new substitute decisions dynamically at any time
during the session.

An endpoint no longer has any specific wish to always include or
always exclude a certain content, but wants to return the decision to
forward streams or not to the conference node. An endpoint can reset
any previously included or excluded stream at any time during the
session. At the beginning of the session, all media streams SHALL have
a state corresponding to being reset and thus be under the conference
node policy control.

An endpoint wishes to remove all previous decisions about included
and excluded media. This method is a shortcut to avoid the repeated
reset messages described in the previous use case.

To be able to identify the available media content, all different
content must be given a unique media ID. The given ID must also be
distributed to all participating endpoints. The following sections
describe how to generate such IDs and how to distribute them.

The text in SDP Media Description describes the specific case where
media description is signaled with SDP, but other signaling methods MAY
be used, in which case the mapping to SDP-specific lines and attributes
does not apply and other mandatory mappings SHOULD instead be defined
in a separate RFC.

The RTP-specific text describes the specific case when RTP is used for
media transport. Other media transports MAY be used, in which case the
mapping to RTP does not apply and other mandatory mappings SHOULD
instead be defined in a separate RFC.

To request specific media content, all involved endpoints need to
agree on how to identify different content with a unique
media ID.

No particular algorithm is specified for generating unique
media IDs, as it will depend on which media transport is used. The
main requirements on such an algorithm are that media IDs are unique
among all communicating endpoints and that all endpoints share the
same definitions on what media streams are identified by what media
IDs.

Assuming all available media content from all communicating
endpoints are associated with some kind of media ID, those media IDs
need to be distributed to endpoints wishing to actively control what
content to receive. There might also be other interesting per-media
related information that needs distribution, such as e.g. naming or
describing individual media content to aid selection.

Endpoints wishing to join a session are responsible for sending
information about media content they will make available to the other
party or parties. This is done by generating media IDs, or other
sufficiently unique identification that can be used for generation of
media IDs, for all transmitted media content. Depending on the
capabilities of the signaling protocol used, an endpoint can also have
the opportunity to convey other information than the media ID, such as
e.g. describing or naming media content explicitly.

The SIP Event Package for Conference
State defines an XML schema used for distribution of conference
information. The schema defines elements (among others) for users,
endpoints and media. The defined <media> element contains a
media ID attribute. This attribute SHALL be used to carry generated
media IDs. This means that a media ID only needs to be unique within an
endpoint context, and referring clients MUST use user information,
endpoint information, and media ID together to uniquely identify media
content. User and
endpoint information are relevant in a scenario covering multiple
users and/or endpoints (e.g. where a middle node is responsible for
forwarding requests or making decisions about media content
selection), but may be redundant for a point to point scenario.

Any description or naming of individual media content published by
endpoints (as described in the previous section) SHOULD be included in
the XML as the body of <display-text>, which is another sub element
of <media>. There may exist alternatives to obtain naming and
description information, but it will in general depend on what is
supported by the used media description protocol.

Reception of media content information depends on the context in which
the endpoint exists. In a conferencing scenario, the
distribution of media information is in general different than
distribution of media content information in a point to point session,
which must be taken into account when defining use of MESS with media
description protocols.

When RTP is used for transmission of media content, a single RTP
session can transfer a number of different media content. In such a case
every received data packet must carry an identifier, or something that
can be used as identifier, to separate individual content. Without
such an identifier it is simply not possible to demultiplex incoming
packets correctly. Using other protocols for transmission poses
similar problems when multiplexing.

In the case of RTP, SSRC could be used as the sole identifier, but
to avoid changing the ID if the SSRC changes (e.g. due to an SSRC
collision), an identifier not dependent on, but related to, SSRC is the
best choice.

RFC 4575 defines <src-id>, a sub element of <media>, which MUST be used
to carry the SSRC selected for the corresponding media content. This
enables an endpoint to do reverse look-up of media ID on incoming
packets using SSRC, or CSRC in the case media streams are aggregated
by an RTP mixer.

This section applies when SDP media description is used with RTP
Media Transport. MESS MAY be used with other media transports in SDP,
but that is out of scope for this document and SHOULD instead be
described in a separate RFC.

The generated RTP media IDs MUST be included as ssrc attributes
(described in Source-Specific SDP Attributes).

Assuming a single media in an SDP media block, using an i-line (as
described in SDP) is sufficient to name
an individual media content. If a media block carries information
about multiple SSRCs, this method is not enough to name all different
media content. For this purpose a new source-specific attribute is
proposed (previously mentioned in
draft-lennox-mmusic-sdp-source-selection-02).

The new, optional, source-specific attribute provides a textual
description of the media content represented by the SSRC included in
the attribute declaration. Its <description> has the same syntax and
semantics as the i-line <session description> in SDP, except that it
is specified per SSRC.

In the case of RTP, an intercepting node in the network could be
responsible for generating media descriptions upon reception of the
actual RTP stream. However, such a solution will suffer from the fact
that not all media may be sent to that node at all times. This would
introduce a delay of media description creation until the intercepting
node has received RTP packets from all media sources.

In cases where a Media Gateway and its controller are separate
entities (see e.g. Media Gateway Control Protocol), such as in the 3GPP
IMS split architecture where an MRFP and an MRFC exchange SDP
information, e.g. through H.248 or SIP, the
MRFC receives the SIP INVITE with SDP from participants and therefore
also information about what SSRCs the endpoint intends to use. The
MRFP will see incoming SSRCs in the actual RTP streams, but not before
any media traffic has occurred. The MRFC is also responsible for
publishing the conference XML data, e.g. as a body in a SIP NOTIFY to
subscribed endpoints. In short, the MRFC,
or any other node acting as Conference AS, has the best information
for generating and distributing media IDs and is chosen as the
responsible node.

There is no big difference in a call-out conferencing scenario
where a conferencing node calls out to invited participants. The
initial SDP will hold information about the capabilities of the
network node, and responding endpoints provide SDP answers with media
description (including SSRC) of their intended/offered media.

In a distributed conference with several involved Conferencing
ASes, and also if the 3GPP IMS split architecture is not used, the
protocol to transfer media ID and SSRC information between
Conferencing ASes / MRFCs is out of scope for this document.

A conference node SHOULD try to locate information from endpoints
that name or describe individual media content in the SDP, and include
the information in the body of the per-media <display-text> tag.
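
As an illustration of this step, the sketch below scans an SDP body for
the per-SSRC "information" attribute and produces one <media> element
per match, with the text in <display-text> and the SSRC in <src-id>. It
is a minimal sketch under stated assumptions: the attribute is assumed
to be written as "a=ssrc:<ssrc-id> information:<text>" (the general
source-attribute form of Source-Specific SDP Attributes), the function
name and the sequential media IDs are hypothetical, the SSRC value is
made up, and the conference-info XML namespace is omitted for brevity.

```python
import re
import xml.etree.ElementTree as ET

# Assumed line shape: "a=ssrc:<ssrc-id> information:<text>".
SSRC_INFO = re.compile(r"^a=ssrc:(\d+) information:(.*)$")


def media_elements(sdp_text):
    """Return one <media> element per SSRC carrying an information attribute."""
    elements = []
    for line in sdp_text.splitlines():
        match = SSRC_INFO.match(line.strip())
        if not match:
            continue
        ssrc, text = match.groups()
        # Hypothetical ID scheme: number the media in order of appearance.
        media = ET.Element("media", {"id": str(len(elements) + 1)})
        ET.SubElement(media, "display-text").text = text
        ET.SubElement(media, "src-id").text = ssrc
        elements.append(media)
    return elements


sdp = "m=video 49170 RTP/AVP 96\r\na=ssrc:11111 information:Alice cam\r\n"
for element in media_elements(sdp):
    print(ET.tostring(element, encoding="unicode"))
```

A real conference node would also fall back to other SDP fields when no
"information" attribute is present; that is left out here.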
The information SHOULD be taken from the following sources, in this
order of preference, falling back to the next source when a more
preferred one is missing:

1.  The value from an "information" SSRC attribute described above

2.  The value from an i-line within the media block

3.  The value field of a label attribute within the media block

4.  The value from an i-line at the SDP session level

Other sources of information MAY be used, MAY be more preferred, and
the <display-text> MAY also be empty. The receiving client MAY e.g. use
the <display-text> content to amend originating user/endpoint
information presented to the receiving user with the media content
specific information.

In point to point communication, endpoints could publish SSRC
information using SDP in request and response. This is e.g. valid
for the SDP in both the SIP INVITE and the corresponding 200 OK, or
in any provisional responses.

The list of published SSRCs is the list of offered media content
available for request. Also, the SDP can be searched for the
"information" attribute described above to extract information
about naming of media content.

In a conferencing scenario, the media content information is
distributed using an XML body following the schema defined in the
Conference package, e.g. carried by a SIP
NOTIFY. For use with SIP and once a client has SUBSCRIBEd for
conference information, it SHOULD be prepared to receive SIP
NOTIFYs. If the SIP NOTIFY carries this type of XML, the receiving
endpoint can extract information about media IDs and media content
descriptions by finding all <media> elements in the received
XML. This produces a valid request list of available media IDs and
their corresponding SSRC values.

To request media streams, a communication channel between the
endpoint and the node in control of the media streams needs to be set up.
This document describes use of SIP/SDP for this purpose, but other
methods MAY be used and SHOULD then be described in a separate RFC. The
basic requirements on the communication channel used for MESS are to
offer reliable transmission and a near real time response.

The Binary Floor Control Protocol (BFCP) is described in RFC 4582. BFCP
is a protocol that is likely to already be supported by
conference-aware nodes and clients. This makes it easy to extend
existing implementations to handle any newly defined message. It also
uses a reliable transport. Floor control is closely related to media
stream selection, and BFCP is thus regarded as a feasible choice.

All MESS messages defined in this document are extensions to the
existing messages described in BFCP.
This means that they are not dependent upon any other message and can
be implemented separately from legacy messages.

The legacy floor control functionality of BFCP requires additional
protocols to handle floor creation. That is not needed by MESS and is
thus out of scope for this document. A possible way is described in
SDP for BFCP.

BFCP defines 13 primitives. Implementing MESS as an extension to BFCP
requires this set of primitives to be extended with two more:
"MediaSelection", having a value of 32, and "MediaSelectionAck", having
a value of 33.
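
As an illustration, the two new primitive values can be carried in the
12-octet BFCP common header of RFC 4582 (version, reserved bits,
primitive, payload length in 4-octet units excluding the header,
conference ID, transaction ID and user ID). The following is a minimal
sketch only: the function name and the example field values are
hypothetical, and no MESS attributes are encoded.

```python
import struct

MEDIA_SELECTION = 32      # new primitive value from this document
MEDIA_SELECTION_ACK = 33  # new primitive value from this document
BFCP_VERSION = 1          # version used by RFC 4582


def pack_common_header(primitive, payload_words, conference_id,
                       transaction_id, user_id):
    """Pack a BFCP common header in network byte order (12 octets)."""
    # First octet: 3-bit version followed by 5 reserved bits set to zero.
    first_octet = BFCP_VERSION << 5
    return struct.pack("!BBHIHH", first_octet, primitive, payload_words,
                       conference_id, transaction_id, user_id)


header = pack_common_header(MEDIA_SELECTION, payload_words=0,
                            conference_id=1, transaction_id=7, user_id=42)
print(header.hex())
```

An acknowledgement would reuse the same layout with the
MEDIA_SELECTION_ACK value and the transaction ID of the request.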
MESS uses the same common header, referred to as COMMON-HEADER, as
defined in BFCP. The attributes also follow the same pattern as
described in that RFC, i.e. they are in the format Type-Length-Value.

Table 1: Media Selection Primitives

In addition to these new primitives, MESS also defines a set of new
attributes.

Table 2: Media Selection Attributes

The following is the format of the OPERATION attribute.

Operation id: This field contains a 16-bit value that identifies
an operation to be performed. Defined entries in this document are
Include, Exclude, Substitute, Reset, and Reset All.

Table 3: MESS Operations

The MEDIA-IDENTIFICATION attribute is a grouped attribute
consisting of a header, referred to as MEDIA-IDENTIFICATION-HEADER
with identification type information followed by a sequence of other
MEDIA-IDENTIFICATION attributes. The following is the format of the
MEDIA-IDENTIFICATION-HEADER.

The ID Type field is an 8-bit field describing the type of media
ID. Defined types in this document are:

Table 4: MESS Media Identification Types

The following describes the format of the grouped attribute. The
Media ID field will contain different information based on the ID
Type. The Media ID field in MEDIA-IDENTIFICATION attributes of type
"User" is only allowed to hold MEDIA-IDENTIFICATION of type
"Endpoint", and Media ID field in MEDIA-IDENTIFICATION attributes of
type "Endpoint" is only allowed to hold MEDIA-IDENTIFICATION
attributes of type "Media". The Media ID field in
MEDIA-IDENTIFICATION attributes of type "Media" holds the actual
media ID number.

This allows expression of tree-like identifications, with an
attribute of type "User" as the root node, attributes of type
"Endpoint" as its children, and attributes of type "Media" as the
leaves below each "Endpoint". The following expresses this
relationship in ABNF syntax.

The following is a description of the CHANNEL-IDENTIFICATION
attribute.

This attribute is used to identify a specific channel to/from an
endpoint.

MESS defines 5 messages used to control what media content to
receive.

Floor participants MAY use the messages in this clause without
having obtained a floor, and floor servers MAY accept the messages
from participants not owning the floor. When floor control is bypassed
in this way, the FLOOR-ID SHALL be ignored by receivers of this
message implementing this RFC, and senders implementing this RFC SHALL
set it to 0.

If a floor chair requires a floor participant to own the floor
before using the messages of this clause, they SHALL both follow
regular BFCP floor control procedures as defined in BFCP. For example, a floor participant not
allowed to access the floor will receive a BFCP Error message
containing Error Code 5 (Unauthorized Operation).

When a floor control server implementing this RFC sends a BFCP
SUPPORTED-PRIMITIVES attribute, the codes for messages defined in this
clause MUST be included in the Primitives list.

Extension attributes that may be defined in the future are referred
to as EXTENSION-ATTRIBUTE in the ABNF, similar to what was done in
section 5.3 of BFCP.

All MediaSelection messages MUST be answered with a
MediaSelectionAck. The format of the MediaSelectionAck is as
follows:

The COMMON-HEADER of such a message MUST contain the transaction
ID of the acknowledged message.

MESS Include messages are sent as BFCP messages with primitive
"MediaSelection" and the OPERATION attribute set to value
"Include". Then follows a list of media identifications representing
media streams that are always to be included from now on. Since
there might be more than one transport channel in between the
requesting node and the receiving node, the message MAY also contain
information about which transport channel to use, a channel ID. In
case RTP is used as transport, this channel ID SHOULD be a
combination of SSRC and RTP session identification. If channel ID is
missing there are no restrictions on the used transport and any
transport channel MAY be used to deliver the stream. Other
transports are out of scope for this document but need a similar
identification possibility. Requests to Include an already included
media SHALL be ignored. Note that the message is defined in a way
that makes it additive, so identifications for previously included
media SHOULD NOT be repeated in every new request.

A receiver of an include message MUST respond with a
MediaSelectionAck containing the same transaction ID.

MESS Exclude messages are sent as BFCP messages with primitive
"MediaSelection" and the OPERATION attribute set to value
"Exclude". Then follows a list of media identifications representing
media streams that are always to be excluded from now on. Requests
to Exclude an already excluded media SHALL be ignored. Note that the
message is defined in a way that makes it additive, so
identifications for previously excluded media SHOULD NOT be repeated
in every new request. The exclude message MAY contain a channel ID
that limits the exclusion to that channel, so that the excluded stream
might still be sent using any other transport channel if available. If
the channel ID is missing in the exclude message, the exclusion
applies to every channel between an endpoint and a sender.

A receiver of an exclude message MUST respond with a
MediaSelectionAck containing the same transaction ID.

MESS Substitute messages are sent as BFCP messages with primitive
"MediaSelection" and the OPERATION attribute set to "Substitute".
Then follows a list of pairs of tuples called MEDIA-TUPLE. A
MEDIA-TUPLE contains a MEDIA-IDENTIFICATION and an optional
CHANNEL-IDENTIFICATION.

The following is a formal description of MEDIA-TUPLE.

The following is a formal description of the Substitute
message.

In the list of pairs of MEDIA-TUPLEs, each pair MUST be interpreted
as follows. The first MEDIA-TUPLE defines the media stream, and
possibly a transport channel, that should be replaced and the second
MEDIA-TUPLE defines the media stream, and optionally a transport
channel, to use as a replacement for the first MEDIA-TUPLE.

Note that the included MEDIA-IDENTIFICATIONs typically need to
be of type USER-SUB-IDENTIFICATION, since they in general do not
refer to media from the same user, but other addressing MAY be
sufficient.

Since CHANNEL-IDENTIFICATION is optional and might be missing from
any MEDIA-TUPLE in the above description, such a missing attribute
should be interpreted as follows.

Neither MEDIA-TUPLE has a CHANNEL-IDENTIFICATION:  All media
   occurrences should be replaced using the already used channels.
   This is the same as an atomic version of a message series
   containing an exclude message and an include message without
   CHANNEL-IDENTIFICATION attributes.

Only the first MEDIA-TUPLE has a CHANNEL-IDENTIFICATION:  Replace the
   identified media only on the identified channel. This is the same
   as an atomic version of a message series containing an exclude
   message with a CHANNEL-IDENTIFICATION attribute and an include
   message without a CHANNEL-IDENTIFICATION attribute.

Only the second MEDIA-TUPLE has a CHANNEL-IDENTIFICATION:  Replace
   all occurrences of an identified media with the replacing media
   stream using the identified channel. This is the same as an atomic
   version of an exclude message without a CHANNEL-IDENTIFICATION
   attribute followed by an include message with a
   CHANNEL-IDENTIFICATION attribute.

Both MEDIA-TUPLEs have a CHANNEL-IDENTIFICATION:  Replace the
   identified media on the identified channel with the replacing media
   using the identified channel. This is the same as an atomic version
   of an exclude message followed by an include message, both holding
   a CHANNEL-IDENTIFICATION attribute.

A receiver of a substitute message MUST respond with a
MediaSelectionAck containing the same transaction ID.

MESS Reset messages are sent as BFCP messages with primitive
"MediaSelection" and the OPERATION attribute set to "Reset". The
message carries a list of MEDIA-IDENTIFICATION to be reset. It does
not matter if the media described by MEDIA-IDENTIFICATION has been
excluded, included or neither of them before. The result at the
floor control server is always the same: the media associated with the
received ID will no longer be subject to explicit
inclusion/exclusion. Requests to Reset an already reset media SHALL
be ignored.

A receiver of a reset message MUST respond with a
MediaSelectionAck containing the same transaction ID.

MESS Reset All messages are sent as BFCP messages with primitive
"MediaSelection" and the OPERATION attribute set to "Reset All". It
has no further attributes. The message is equivalent to a MESS Reset message
including MEDIA-IDENTIFICATION attributes of all streams that have
previously been specified in "Include", "Exclude" or as second
MEDIA-IDENTIFICATION attribute in "Substitute", effectively
releasing all existing media streams from being subject to
inclusion/exclusion. This operation can fully reset the
inclusion/exclusion state even if the requesting endpoint has lost
track of what restrictions were previously put in place.

A receiver of a reset all message MUST respond with a
MediaSelectionAck containing the same transaction ID.

This document defines an acknowledge response as well as an error
message with several different error reasons.

BFCP defines attributes for error handling. The BFCP Error message in
BFCP section 5.3.13 SHALL also be used for error reporting applicable
to this RFC.

Table 5 of BFCP defines 9 error codes used in floor control. This
document defines the following additional error codes that MAY be used
in MESS responses:

Table 5: Media Selection Error Codes

The exact reason for the failure MAY be included as UTF-8 encoded text
in the field "Error specific details" of the BFCP ERROR-CODE attribute.
The ERROR-INFO attribute MAY also be used.

RTP is a widely used protocol to transfer media. Usage of MESS when
media transport is handled using RTP might impact how RTCP reports must
be handled when excluding media. In the case where RTP Translators
exist in between endpoints and
if the RTP Transport Translators are able to adjust their forwarding
rules based on the signalling defined in this document, RTCP reporting
may become inconsistent for excluded media content. How this should be
handled is out of scope for this document. The operations described in
MESS are consistent with the operation of RTP mixers or direct end-point
to end-point topologies.

Note that the SDP in the examples below is not complete. Only
relevant parts have been included.

A client joins a conference by sending an SDP according to the
following:

In this SDP Alice explicitly names her video stream "Alice cam" by
using the new attribute defined in this document. This information is
associated with a specific SSRC.

A conferencing node in the network then sends the following SIP
NOTIFY sample body to subscribed clients.

Any subscribing endpoint that receives this information can now
actively request the "Alice cam" media from sip:alice@example.com to
be explicitly included in received media streams. This is done by
sending an Include message as defined in this document (some fields
not encoded for clarity):

The receiver of this message MUST send an acknowledgement using the
same transaction ID as soon as possible.

Following the guidelines in SDP, in SDP Grouping Framework and in RTP,
the IANA is requested to register:

o  A new source-specific attribute named "information" as defined in
   this document.

o  Add the following entries to the BFCP registry:

   *  Primitives from Table 1

   *  Attributes from Table 2

   *  Error Codes from Table 5

o  Start a new registry for this document with:

   *  Operations from Table 3

   *  Media Identification Types from Table 4

When using MESS there is a potential risk of exposing client behavior
to other participants. Consider the case where multiple endpoints
participate in a conference. Also assume that media transport is done
using RTP. If the network between endpoints contains one (or more) RTP
translators and even if MESS communication is strictly between floor
server and floor participant, the RTCP traffic to/from endpoints could
expose information about endpoints excluding other endpoints. Previously
received RTCP traffic replaced with no traffic could be leaking
information about an endpoint excluding other endpoints.

The authors thank Jonathan Lennox and Henning Schulzrinne for their
proposal of a
source-specific information attribute in the expired Internet Draft
draft-lennox-mmusic-sdp-source-selection-02.