Multi-path Label Switched Paths Signaled Using RSVP-TE
Juniper Networks1194 N. Mathilda Ave.SunnyvaleCA94089USkireeti@juniper.net
Routing
ECMP multipath TE
This document describes extensions to Resource ReSerVation
Protocol - Traffic Engineering for the set up of multi-path
Traffic Engineered Label Switched Paths (LSPs) in Multi Protocol
Label Switching and Generalized MPLS networks, i.e., LSPs that
conform to traffic engineering constraints, but follow multiple
independent paths from the source to the destination that allow
load balancing.
In selecting a protocol for setting up and signaling "tunnel"
Labeled Switched Paths (LSPs) in Multi Protocol Label Switching
(MPLS) and Generalized MPLS (GMPLS) networks, one first chooses
whether one wants Equal Cost Multi-Path (ECMP) load balancing or
Traffic Engineering (TE). For the former, one uses the Label
Distribution Protocol (LDP) (); for the
latter, the Resource ReSerVation Protocol - Traffic Engineering
(RSVP-TE) (). [Two other criteria, the
need for fast protection and the desire for less configuration,
are no longer the deciding factors they used to be, thanks to "IP
fast reroute" () and "RSVP-TE automesh"
()].
This document describes how one can set up a tunnel LSP that has
both ECMP and TE characteristics using RSVP-TE. The techniques
described in this document can be used to create a single overall
"ECMP TE LSP" to a single destination that consists of several
"sub-LSPs", each taking a different path through the network to
the same destination. The techniques can also be used to create a
single ECMP TE LSP to multiple equivalent destinations (such as
equidistant BGP nexthops announcing a common set of reachable
addresses), such that each destination is served by one or more
sub-LSPs. The techniques described here borrow the notion of
sub-LSPs from .
Several options are available for ECMP TE LSPs. One is that the
ingress Label Switching Router (LSR) computes (or otherwise
obtains) all sub-LSP paths; alternatively, LSRs along the various
paths can compute paths further downstream (using techniques such
as "loose hop expansion", as in ).
Another is that an RSVP Path message can contain information about
exactly one path through the network (or sub-LSP); alternately, a
Path message can contain information about more than one such
path. A third option is that the various paths that make up the
multi-path LSP have equal cost (or distance) from ingress to
egress (i.e., ECMP), as opposed to paths that may have differing
costs. Another option (mentioned above) is to terminate a
multi-path LSP on a single egress or on several equivalent
egresses. For now, the first of each of these alternatives is
assumed; future work can explore other choices.
The terms "tunnel", "tunnel LSP" and "LSP" all refer to a
container LSP from an ingress LSR to egress LSR(s). An LSP is
the unit of configuration, signaling and management.
An ECMP (or generally, a multi-path) TE LSP is called a
Multi-path Label Switched Path (MLSP), and consists of one or
more sub-LSPs.
A sub-LSP consists of a single path from the ingress to one
egress. A "regular" point-to-point TE LSP is equivalent to an
MLSP with a single sub-LSP.
The "downstream links" of an LSR X with respect to an MLSP Z is
the set of all links adjacent to X traversed after X by at least
one sub-LSP of MLSP Z.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described
in .
An MLSP is configured at the ingress with various constraints
typically associated with TE LSPs, such as destination LSR(s),
bandwidth (on a per-class basis, if desired), link colors, Shared
Risk Link Groups, etc. [Auto-mesh techniques
() can be used to reduce configuration;
this is not described further here.] In addition, parameters
specifically related to MLSPs, such as how many (or the maximum
number of) sub-LSPs to create, whether traffic should be split
equally across sub-LSPs or not, etc. may also be specified.
The ingress LSR can use the configuration parameters to decide how
many sub-LSPs to compute for this MLSP and what paths they should
take. Each sub-LSP MUST meet all the constraints of the MLSP
(except the bandwidth). The bandwidths (per-class, if applicable)
of all the sub-LSPs MUST add up to the bandwidth of the MLSP. If
a Path Computation Element (PCE; ) that is
multi-path LSP-aware is used, the PCE is subject to these same
requirements; how MLSP requirements are signaled to a PCE is
beyond the scope of this document.
Having computed (or otherwise obtained) the paths of all the
sub-LSPs, the ingress A then signals the MLSP by signaling all the
individual sub-LSPs across the MPLS/GMPLS network. If multiple
sub-LSPs of the same MLSP pass through LSR Y, and Y has downstream
links YP, YQ and YR for the various sub-LSPs, then Y has to load
balance incoming traffic for the MLSP across the three downstream
links in proportion to the sum of the bandwidths of the sub-LSPs
going to each downstream (see ).
One must distinguish carefully between the (signaled) bandwidth of
a sub-LSP, a static value capturing the expected or maximum
traffic on the sub-LSP, and the instantaneous traffic received on
a sub-LSP, a constantly varying quantity. Suppose there are three
sub-LSPs traversing Y, with bandwidths 10Gbps, 20Gbps and 30Gbps,
going to P, Q and R respectively. Suppose further Y receives some
traffic over each of these sub-LSPs. Y must balance this received
traffic over the three downstream links YP, YQ and YR in the ratio
1:2:3.
In an IP or LDP network, incoming traffic arriving at A headed
for B will be split equally between M and X at A. Similarly,
traffic for B arriving at Y will be split equally among P, Q and
R. If the traffic arriving at A for B is 120Gbps, then the AMB
path will carry 60Gbps, the paths AXYPTB, AXYQTB and AXYRB will
each carry 10Gbps, and the AXSB path will carry 30Gbps. We'll
call this "IP-style" load balancing.
Note: all load balancing is subject to the overriding
requirement of mapping the same "flow" to the same downstream.
(What constitutes a "flow" is beyond the scope of this
document.) This requirement takes precedence over all attempts
to balance traffic among downstreams. Thus, the statements
above (e.g., "the AMB path will carry 60Gbps") are to be
interpreted as ideal targets, not hard requirements, of load
balancing.
One can simulate the IP or LDP ECMP behavior with TE-based ECMP
by creating an MLSP with five sub-LSPs S1 through S5 taking
paths AMB, AXYPTB, AXYQTB, AXYRB and AXSB, with bandwidths
60Gbps, 10Gbps, 10Gbps, 10Gbps and 30Gbps, respectively.
With such an arrangement, the MB link carries 60Gbps while the
RB link carries just 10Gbps. If one wishes instead to carry
equal amounts of traffic on the links incoming to B, then one
could arrange the sub-LSPs S1 to S5 to have bandwidths 30Gbps,
15Gbps, 15Gbps, 30Gbps and 30Gbps, respectively. In this case,
the bandwidth on each of the four links going to B is 30Gbps,
illustrating some of the capabilities of TE-based ECMP.
Staying with this example, A has one sub-LSP of bandwidth 30Gbps
to M and four sub-LSPs of total bandwidth 90Gbps to X. Thus, A
should load balance traffic in the ratio 1:3 between the AM and
the AX links. Similarly, X has three sub-LSPs of total
bandwidth 60Gbps to Y and one sub-LSP of bandwidth 30Gbps to S,
so X should load balance traffic 2:1 between Y and S. Y has a
sub-LSP of bandwidth 15Gbps to each of P and Q and one sub-LSP
of bandwidth 30Gbps to R, so Y should load balance traffic 1:1:2
among P, Q and R, respectively. Thus, in general, TE-based ECMP
does not assume equal distribution of traffic among downstream
LSRs, unlike IP- or LDP-style ECMP.
In , there are potentially 2x3x5=30 ECMP
paths between A and B. With IP or LDP, exploiting all these
paths is straightforward, and doesn't need a lot of state. With
an MLSP as seen so far, this would require 30 sub-LSPs to
achieve equivalent load balancing. This suggests that a
different approach is needed to efficiently achieve IP-style
load balancing with TE LSPs. To this end, we introduce the
notion of "equi-bandwidth" (EB) sub-LSPs and EB MLSPs. A
sub-LSP is equi-bandwidth if its "E" bit is set (see
). An MLSP is equi-bandwidth if all
of its sub-LSPs are equi-bandwidth.
If a set of EB sub-LSPs of the same MLSP traverse an LSR S, say
to downstream links SP, SQ and SR, then S MUST attempt to load
balance traffic received on these EB sub-LSPs equally among the
links SP, SQ and SR, independent of how many sub-LSPs go over
each of these links. Furthermore, S MUST redistribute traffic
received from each of its upstream LSRs, and SHOULD redistribute
all traffic received from upstream as a whole. One can do the
former by signaling the same label to each of its upstream LSRs;
one can do the latter by signaling the same label to all
upstream LSRs (see ). For example, in
, if L sends 12Gbps of traffic to S and M
sends 18Gbps to S, S can redistribute L's traffic by sending
4Gbps to each of P, Q and R; and can similarly send 6Gbps of M's
traffic to each of P, Q and R. Alternatively, S can load
balance the aggregate 30Gbps of traffic received from L and M to
each of P, Q and R, thus sending 10Gbps to each. EB sub-LSPs
have an added benefit of not requiring unequal load balancing
across links, which may pose problems for some hardware.
Given the notion of EB sub-LSPs and EB MLSPs, A can signal an EB
MLSP Z comprised of five EB sub-LSPs E1 through E5 with the
following paths: ALSPTUB, AMSQTVB, ALSRTWB, AMSPTXB and ALSQTYB
(respectively). Then, A has two downstream links for the five
sub-LSPs, AL and AM, between which A will load balance equally.
Similarly, S has three downstream links, SP, SQ and SR; and T
has five downstreams, TU, TV, TW, TX and TY. Thus the load
balancing behavior of the MLSP will replicate IP load balancing.
The state required for an EB MLSP to achieve IP-style load
balancing is somewhat greater than for LDP LSPs, but
significantly less than that for multiple "regular" TE LSPs, or
for a non-EB MLSP.
Some of the power of TE-based ECMP was illustrated in the above
examples. Another is ability to request that all sub-LSPs avoid
links colored red. If in the example network in
, the QT link is colored red but all other
links are not, then there are four ECMP paths that satisfy these
constraints, and the traffic distribution among them will
naturally be different than it would without the link color
constraint.
One can also ask whether an MLSP with sub-LSPs is any better
than N "regular" LSPs from the same ingress to the same egress.
Here are some benefits of an MLSP:
With an MLSP, there is a single entity to provision, manage
and monitor, versus N separate entities in the case of LSPs.
A consequence of this is that with an MLSP, changes in
topology can be dealt with easily and autonomously by the
ingress LSR, by adding, changing or removing sub-LSPs to
rebalance traffic, while maintaining the same TE
constraints. With individual LSPs, such changes would
require changes in configuration, and thus are harder to
automate.
An ingress LSR, knowing that an MLSP is for load balancing,
can decide on an optimum number of sub-LSPs, and place them
appropriately across the network to optimize load balancing.
On the other hand, an ingress LSR asked to create N
independent LSPs will do so without regard to whether N is a
good number of equal cost paths, and, more importantly, may
place several of the N LSPs on the same path, defeating the
purpose of load balancing.
The EB sub-LSP mechanism will, in many cases, result in far
fewer sub-LSPs than independent LSPs and thus less control
plane state.
Finally, an MLSP will usually have less data plane state
than N independent LSPs: whenever multiple sub-LSPs traverse
a link, a single label will be used for all of them, whereas
if multiple LSPs traverse a link, each will need a separate
label.
Definition: Let G=(V, E) be a directed graph (or network), and
let A and B in V be two nodes in G. Let T be the traffic
arriving at A destined for B. T is said to be "IP-style" load
balanced if for every node X on a shortest path from A to B, the
portion of T arriving at X is split equally among all nodes Yi
that are adjacent to X and are on a shortest path from X to B.
Theorem: An MLSP can accurately mimic IP-style load balancing
between any two nodes in any network.
Proof: left to the reader.
Corollary: MLSPs provide a strictly more powerful load
balancing mechanism than IP-style load balancing.
An MLSP is identified by an LSP_TUNNEL SESSION object defined in
. All sub-LSPs of an MLSP have the same
field values in their LSP_TUNNEL SESSION object.
A sub-LSP of an MLSP is identified by the LSP_TUNNEL SESSION
object plus a new Sender Template object called the MLSP_TUNNEL
Sender Template. The MLSP_TUNNEL Sender Template comes in two
flavors, IPv4 and IPv6, shown below. The 15-bit Sub-LSP ID
uniquely identifies a sub-LSP of an MLSP, and stays the same
during the lifetime of the sub-LSP. The LSP ID may change as in
to let a sub-LSP share resources with
another incarnation of the sub-LSP, for example to reroute
and/or change bandwidths of the sub-LSP. The "E" bit defines
whether a sub-LSP is an equi-bandwidth sub-LSP (E=1) or not
(E=0). The equi-bandwidth character of a sub-LSP (i.e., the
value of the E bit) MUST remain the same from ingress to egress
as well as during the lifetime of a sub-LSP.
The MLSP_TUNNEL Filter Specification also comes in two
flavors, IPv4 and IPv6. The formats are identical to the IPv4
and IPv6 formats (respectively) of the MLSP_TUNNEL Sender
Template. The Class numbers for both are FILTER
SPECIFICATION, and the C-Types are (respectively)
MLSP_TUNNEL_IPv4 (TBD) and MLSP_TUNNEL_IPv6 (TBD).
A LSR S that receives Path messages for several sub-LSPs of the
same MLSP from the same upstream LSR SHOULD allocate the same
label for all the sub-LSPs. This simplifies load balancing for
the aggregate traffic on those sub-LSPs. If the sub-LSPs are EB
sub-LSPs, then S SHOULD allocate the same label for all EB
sub-LSPs of the same MLSP that pass through S, regardless of
which upstream LSR they come from. This allows S to load
balance the aggregate traffic received on the MLSP, as all the
MLSP traffic arrives at S with the same label. However, an LSR
that can achieve the load balancing requirements independent of
label allocation strategies is free to do so.
Since MLSPs are traffic engineered, there needs to be strict
bandwidth accounting, or admission control, on every link that
an MLSP traverses. For non-EB sub-LSPs, this is
straightforward, and analogous to regular TE LSPs. However, for
EB sub-LSPs, two new procedures are needed, one for signaling
bandwidth, and the other for admission control. First, for a
given MLSP Z, an LSR X MUST ensure (via signaling) that the
total incoming bandwidth of EB sub-LSPs of MLSP Z is divided
equally among all the downstream links of X which at least one
of the EB sub-LSPs traverses. Second, LSR X MUST ensure that,
for each upstream link of X, there is sufficient bandwidth to
accommodate all EB sub-LSPs of MLSP Z that traverse that link.
Let's take the example of , with MLSP Z
having five EB sub-LSPs E1 to E5, and say that MLSP Z is
configured with a bandwidth of 30Gbps. Here are some of the
steps involved.
LSR A, being the ingress, has no upstream links. A has two
downstream links, AL and AM. Three EB sub-LSPs of MLSP Z
traverse AL, and two traverse AM. A MUST signal a total of
15Gbps for the sub-LSPs to L, and a total of 15Gbps for the
sub-LSPs to M. The required bandwidth may be divided up
among the sub-LSPs to L (similarly, to M) in any manner so
long as the total is 15Gbps. For example, A can signal
sub-LSP E1 with 15Gbps, and sub-LSPs E3 and E5 with 0
bandwidth.
LSR L has one upstream link AL with three EB sub-LSPs with a
total bandwidth of 15Gbps. L MUST ensure that 15Gbps is
available for the AL link. If this bandwidth is not
available, L MUST send a PathErr on ALL of the EB sub-LSPs
on the AL link. Let's assume that the AL link has
sufficient bandwidth.
Next, it is up to L to decide how to divide the incoming
15Gbps among the three downstream EB sub-LSPs to S. Say L
signals sub-LSP E1 with 15Gbps, and the others with 0
bandwidth.
LSR S has two upstream links: LS with three EB sub-LSPs with
a total bandwidth of 15Gbps, and MS with two EB sub-LSPs
with a total bandwidth of 15Gbps. S MUST ensure that 15Gbps
is available for each of the LS and MS links. S has thus a
total incoming bandwidth of 30Gbps on MLSP Z. S has to
divide this equally among its downstream links SP, SQ and
SR, yielding 10Gbps each. S MUST ensure that the total
bandwidth requested on the SP link for sub-LSPs E1 and E4 is
10Gbps. S may choose to signal these sub-LSPs with 5Gbps
each. Similarly for the SQ and SR links.
There are two important points to note here. One is that the
bandwidth reservation (TSpec) for a given EB sub-LSP can (and
usually will) change hop-by-hop. The second is that as new EB
sub-LSPs are signaled for an MLSP, the bandwidth reservations
for existing EB sub-LSPs belonging to the same MLSP may have to
be updated. To minimize these updates, it is RECOMMENDED that
the first EB sub-LSP on a link be signaled with the total
required bandwidth (as far as is known), and later sub-LSPs on
the same link be signaled with 0 bandwidth.
Traffic intended to be sent over an MLSP is determined at the
ingress LSR by means outside the scope of this document, and at
transit LSRs by the label(s) assigned by the transit LSR to its
upstream LSRs. In the case of non-EB sub-LSPs, this traffic is
load balanced across downstream links in the ratio of the
bandwidths of the sub-LSPs that comprise the MLSP. In the case
of EB sub-LSPs, the traffic belonging to an MLSP from an
upstream LSR (or better still, the aggregate traffic for the
MLSP from all upstream LSRs) is load balanced equally among all
downstream links.
As noted above, the overriding concern is that flows are mapped
to the same downstream link (except when the MLSP or some
constituent sub-LSPs are changing); this is typically done by
hashing fields that define a flow, and mapping hash results to
different downstream LSRs. Hash-based load balancing typically
assumes that the numbers of flows is sufficiently large and the
bandwidth per flow is reasonably well-balanced so that the
results of hashing yields reasonable traffic distribution.
Entropy labels (
and ) can be used to improve
load balancing at intermediate nodes.
This document introduces no new security concerns in the setup and
signaling of LSPs using RSVP-TE, or in the use of the RSVP
protocol. specifies the message
integrity mechanisms for RSVP signaling. These mechanisms apply
to RSVP-TE signaling of MLSPs described in this document, and are
highly recommended pending newer mechanisms for RSVP.
The author would like to thank the Routing Protocol group at
Juniper Networks for their questions, comments and encouragement
for this proposal. While many participated, special thanks go to
Yakov Rekhter, John Drake and Rahul Aggarwal.
IANA is requested to assign two new C-Types for the Class "Sender
Template", one for the "MLSP_TUNNEL_IPv4" Sender Template and one
for the "MLSP_TUNNEL_IPv6" Sender Template.
IANA is also requested to assign two new C-Types for the Class
"Filter Specification", one for the "MLSP_TUNNEL_IPv4" Filter
Specification and one for the "MLSP_TUNNEL_IPv6" Filter
Specification.