INTERNET-DRAFT                                              Mingui Zhang
Intended Status: Informational                             Dacheng Zhang
Expires: May 3, 2012                                              Huawei
                                                        October 31, 2011


            Adaptive VLAN Assignment for Data Center RBridges
                   draft-zhang-trill-vlan-assign-02.txt

Abstract

   If RBridges are assigned as Appointed Forwarders for VLANs without
   regard to the number of MAC addresses and the traffic load of those
   VLANs, some RBridges may be overloaded while others remain lightly
   loaded. This reduces the scalability of an RBridge network and
   undermines its performance. This document designs a new protocol
   that supports adaptive VLAN assignment (that is, Appointed Forwarder
   selection) based on the forwarders' reports of their MAC table usage
   and available bandwidth.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

Mingui Zhang              Expires May 3, 2012                   [Page 1]

INTERNET-DRAFT               VLAN Assignment            October 31, 2011

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document. Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1. Introduction
      1.1. Terminology
   2. Data Center RBridge
      2.1. Scalability
      2.2. East-West Capacity Increase
      2.3. Virtualization
   3. MAC Entries Usage
   4. Load Distribution
      4.1. Egress Traffic
      4.2. Ingress Traffic
   5. TLV Definition
      5.1. MAC Entries Report sub-TLV
      5.2. Traffic Bit Rate Report sub-TLV
   6. Adaptive VLAN Assignment
   7. Security Considerations
   8. IANA Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
   Author's Addresses

1. Introduction

   Data Center Networks (DCNs) have been growing rapidly in recent
   years. In DCNs, Ethernet switches and bridges are widely used to
   interconnect servers.
   The plug-and-play operation and the simple management and
   configuration of Ethernet appeal to DCN providers. A whole DCN can
   be a single large layer 2 Ethernet, built either on a real network
   or on a virtual platform.

   Cloud Computing has grown out of DCNs. It can be seen as a virtual
   platform that allows the network resources of DCNs to be reused. DCN
   providers have developed many cloud applications, such as Amazon's
   Elastic Compute Cloud (EC2), Akamai's Application Delivery Network
   (ADN) and Microsoft's Azure.

   Cloud Computing brings new challenges to traditional Ethernet. DCNs
   are becoming too large to be carried on traditional Ethernet. The
   valuable MAC tables of bridges are exhausted when millions of MAC
   addresses must be stored. The broadcast of ARP messages consumes too
   much bandwidth and computing resources. The mobility of end stations
   makes the network dynamic, which is a heavy burden if management and
   configuration of the network require much manual work. The Spanning
   Tree Protocol (STP) used in traditional Ethernet is becoming
   outdated, since the tree offers only a single viable path for a node
   pair and this path is not always the best path (e.g., the shortest
   path).

   RBridges are designed to address these shortcomings of traditional
   Ethernet. To make use of the rich connectivity, RBridges introduce
   multi-pathing to Ethernet and so remove the single-path constraint
   of STP. Multiple points of attachment is a basic feature supported
   by RBridges and is common for Data Center Bridges. This feature not
   only increases the "east-west" capacity but also greatly enhances
   the reliability of DCNs [VL2] [SAN].

   If several RBridges are attached to a bridged LAN link at the same
   time, the Designated RBridge (DRB) is responsible for assigning each
   VLAN to one of the RBridges as its Appointed Forwarder. However, the
   current VLAN assignment is done in a one-way manner.
   The DRB assigns a VLAN to an RBridge attached to the local link
   without knowing that RBridge's available MAC table entries or
   bandwidth. The appointed forwarder does not feed back the
   utilization of its MAC table or bandwidth either.

   This document proposes to balance the load among VLANs and the load
   within a single VLAN. Two types of sub-TLVs are defined, with which
   a forwarder can report its MAC entries and its traffic bit rate
   respectively. By gathering these reports, the DRB can assign VLANs
   so that the usage of the MAC tables and the bandwidth of the
   attached RBridges are balanced among the VLANs.

1.1. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2. Data Center RBridge

   Data Center Networks have grown rapidly in recent years. Ethernet is
   widely used in data centers because of its simple management and
   plug-and-play features. However, Ethernet has shortcomings, and
   RBridges are designed to address them. This section analyzes the
   characteristics of DCNs that affect the design of RBridges and
   explains why adaptive VLAN assignment is important for RBridges used
   in DCNs.

2.1. Scalability

   In past years, a large DCN was typically composed of tens of
   thousands of servers interconnected through switches and bridges. In
   the cloud computing era, there can be as many as millions of servers
   in one DCN. Managing the MAC addresses of all these servers on
   layer 2 devices becomes more and more complex. RBridges are intended
   to replace traditional bridges. The valuable CAM tables on RBridges
   can easily be used up if they are not used carefully [CAMtbl].
   For RBridges to be widely used in DCNs, VLANs should be assigned to
   RBridges in such a way that the MAC entries of the VLANs are
   balanced across the RBridges.

2.2. East-West Capacity Increase

   The Spanning Tree Protocol (STP) in the traditional LAN blocks some
   ports of the bridges to avoid loops. However, the side effects of
   STP are obvious. The bandwidth of the links attached to blocked
   ports is not used, which wastes much of the capacity of the network.
   On the tree topology, communication between bridges of the left
   branch and bridges of the right branch must transit the single root
   bridge, which forms a "hair-pin turn". With the rapid increase in
   the number of servers in DCNs and in their traffic demand, it is
   urgent to remove the constraint of STP and enhance the "east-west"
   capacity of DCNs, which are usually richly connected.

   RBridges use multi-path routing to set up the data plane of a TRILL
   campus. Multiple RBridges may be attached to the same LAN link,
   which offers multiple access points to the LAN link. The hosts on
   this LAN link are therefore multi-homed to the TRILL campus. All the
   attached RBridges can act as the packet forwarder for the VLANs
   carried on this LAN link. In the worst case, all the VLANs are
   assigned to the same RBridge. In this scenario, the ingress capacity
   of the other RBridges is wasted. It is necessary to balance the
   traffic load of the VLANs among these RBridges through the
   assignment of the VLANs.

2.3. Virtualization

   Virtualization is important for increasing the utilization of
   network resources in DCNs. For example, VPNs can be used to separate
   the traffic of different services so that they can be carried on the
   same pool of resources. When VPNs are carried over a TRILL campus,
   RBridges can use a VLAN tag to identify each VPN. However, the use
   of VLANs multiplies the entries in the MAC tables of the RBridges.
   Since a host can be a member of several VLANs at the same time, an
   RBridge has to store multiple copies of the host's MAC address in
   its precious MAC table.

   Virtual Machines (VMs) are widely used in DCNs. A physical host can
   support multiple VMs, and each VM has to be identified by a MAC
   address that needs to be stored in the MAC tables of RBridges. This
   seriously increases the number of MAC entries in RBridges. Moreover,
   the number of VMs in a VLAN is not necessarily equal to the number
   of physical hosts. VMs are spawned or destroyed based on the demand
   of the applications. They can also migrate from one location to
   another, either as an in-service or as an out-of-service move. VMs
   therefore make the size of VLANs volatile. It is hard for a TRILL
   campus to provide one static VLAN assignment, based on the number of
   physical hosts in each VLAN, that suits all applications all the
   time. It is necessary to do VLAN assignment adaptively.

3. MAC Entries Usage

   A CAM table on a switch is expensive, which is a major constraint on
   the scalability of Ethernet [CAMtbl]. When an RBridge is used to
   connect many hosts in a large Data Center Network, the entries of
   its CAM table can easily be used up. The network should be
   interconnected carefully and the valuable MAC table entries should
   be used economically.

   RBridges support multiple points of attachment [RFC6325]. When
   RBridges are used in a DCN to form a TRILL campus, a LAN link MAY
   have multiple access points to this network. All the access RBridges
   are able to act as the packet forwarder of the VLANs carried on this
   LAN link. The DRB of this LAN link is responsible for picking one
   RBridge attached to this LAN link as the appointed forwarder for
   each VLAN-x. In other words, the DRB assigns VLAN-x to one of the
   RBridges.
   For an assigned VLAN, its forwarder is responsible not only for
   forwarding the packets but also for storing the active MAC addresses
   of the hosts on this VLAN. If the VLANs on the LAN link are not
   assigned properly, the MAC tables of some RBridges can easily be
   used up while other RBridges are left idle.

   Take Figure 3.1 as an example. Four VLANs are carried on the LAN
   link: w, x, y and z. There are two hosts in each of VLAN-w and
   VLAN-x and one host in each of VLAN-y and VLAN-z. RB1 and RB2 are
   both attached to this LAN link. RB1 is elected as the Designated
   RBridge, which is responsible for choosing the appointed forwarder
   for these VLANs. In the figure, VLAN-w and VLAN-x are assigned to
   RB1 while VLAN-y and VLAN-z are assigned to RB2. This assignment is
   not balanced, since the MAC table of RB1 has four entries while the
   MAC table of RB2 has only two. If the DRB reassigns VLAN-w to RB2
   and VLAN-y to RB1, both RBridges will have three MAC entries, which
   is a more balanced assignment.

     MAC Entries
       +-----+
       |  w  |
       +-----+
       |  w  |               MAC Entries
       +-----+ >---+           +-----+
       |  x  |     |           |  y  |
       +-----+     |     +---< +-----+
       |  x  |     |     |     |  z  |
       +-----+     |     |     +-----+
                   |     |
     DRB&AF:w,x    |     |     AF:y,z
       +-----+     |     |     +-----+
       | RB1 |-----+     +-----| RB2 |
       +-----+                 +-----+
          |                       |
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
 @                                         @
 @ +-------+ +-------+ +-------+ +-------+ @
 @ |[H] [H]| |[H] [H]| |  [H]  | |  [H]  | @
 @ +-------+ +-------+ +-------+ +-------+ @
 @  VLAN-w    VLAN-x    VLAN-y    VLAN-z   @
 @                                         @
 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
                  LAN link

           Figure 3.1: Unbalanced assignment among VLANs

   In order to assign the VLANs in a balanced way, the DRB needs to
   know the usage of the MAC tables of its appointed forwarders and the
   number of MAC addresses in each VLAN.
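   The arithmetic of the Figure 3.1 example can be sketched in a few
   lines of Python. This is only an illustration, not part of the
   protocol: the names VLAN_MACS, RBRIDGES, mac_load and imbalance are
   hypothetical, and the sketch counts only the local hosts of each
   VLAN, whereas a real MAC table also holds remote addresses.

```python
# Hosts (MAC addresses) per VLAN on the LAN link of Figure 3.1.
VLAN_MACS = {"w": 2, "x": 2, "y": 1, "z": 1}
RBRIDGES = ["RB1", "RB2"]


def mac_load(assignment, vlan_macs=VLAN_MACS, rbridges=RBRIDGES):
    """Return the MAC entries each RBridge stores for its assigned VLANs."""
    load = {rb: 0 for rb in rbridges}
    for vlan, rbridge in assignment.items():
        load[rbridge] += vlan_macs[vlan]
    return load


def imbalance(assignment):
    """Difference between the most and least loaded RBridge."""
    load = mac_load(assignment)
    return max(load.values()) - min(load.values())


# Assignment shown in Figure 3.1: RB1 forwards w and x, RB2 forwards y and z.
before = {"w": "RB1", "x": "RB1", "y": "RB2", "z": "RB2"}
# Reassignment described in the text: swap VLAN-w and VLAN-y.
after = {"w": "RB2", "x": "RB1", "y": "RB1", "z": "RB2"}

print(mac_load(before), imbalance(before))
print(mac_load(after), imbalance(after))
```

   With the figure's numbers, the first assignment leaves RB1 with four
   entries and RB2 with two, while the swapped assignment gives each
   RBridge three.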
   Since RBridges store only the active MAC addresses and a virtual
   machine can move from one location to another, the number of MAC
   entries a VLAN occupies on an RBridge varies over time. The
   assignment of the VLANs cannot be done once and for all. The DRB
   needs to do the assignment adaptively, taking the MAC table usage of
   its appointed forwarders into consideration. In Section 5.1, the MAC
   Entries Report sub-TLV is defined to deliver this kind of
   information from a forwarder to a DRB.

4. Load Distribution

   Traffic from the TRILL campus to the local LAN link is egress
   traffic, while traffic from the local LAN link to the TRILL campus
   is ingress traffic. A forwarder RBridge acts as both the ingress and
   the egress point of a VLAN's traffic. The assignment of the
   appointed forwarder for each VLAN affects both the egress and the
   ingress traffic load distribution.

4.1. Egress Traffic

   One RBridge MAY have multiple ports attached to the same local LAN
   link. These ports are called a "port group" [RFC6325]. When a DRB
   assigns a VLAN to an RBridge, the total available egress bandwidth
   of that RBridge's port group needs to be taken into consideration.
   After VLAN-x has been assigned to an RBridge, the choice of which
   port of the port group acts as the forwarding port for VLAN-x is
   entirely a local matter. Since a LAN link is an STP domain, more
   than one forwarding port for one VLAN will cause a loop. The
   forwarder MUST assign one and only one port for each VLAN. Load
   balancing can be realized by splitting the load among different
   VLANs, as suggested in Section 4.4.4 of [RFC6325].

4.2. Ingress Traffic

   After known unicast packets enter the TRILL campus at the ingress
   RBridge, they can be sent along the paths that start at this ingress
   point. Since the DRB knows the whole topology of the TRILL campus,
   it can figure out these paths as well.
   Therefore, the DRB should take the available bandwidth of these
   paths into consideration when assigning the appointed forwarder of a
   VLAN. Any assignment that is likely to congest an already busy
   ingress point or path should be avoided.

   Traffic Matrices are usually taken as the input to traffic
   engineering methods [TE]. VLAN assignment changes the Traffic Matrix
   of a TRILL campus. Therefore, adaptive VLAN assignment can work
   together with traffic engineering methods to achieve a balanced
   traffic distribution across the whole network. However, the design
   of this kind of cooperative mechanism is outside the scope of this
   document. With the sub-TLV defined in Section 5.2, the egress and
   ingress traffic of an appointed forwarder are reported to its DRB.

5. TLV Definition

   The Appointed Forwarders TLV has already been defined in [RFC6326].
   With this TLV, the DRB can appoint an RBridge on the local link to
   be the forwarder for each VLAN. However, there is no feedback from
   the appointed forwarder to tell the DRB whether the assignment is
   reasonable. Two sub-TLVs are defined in this section to provide this
   feedback channel.

5.1. MAC Entries Report sub-TLV

   The appointed forwarder uses the MAC Entries Report sub-TLV to
   report the usage of its MAC table to the DRB. It has the following
   format:

   +-+-+-+-+-+-+-+-+
   |Type=MACEtrRep |                                (1 byte)
   +-+-+-+-+-+-+-+-+
   |    Length     |                                (1 byte)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         DRB Nickname          |                (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Maximum MAC Entries      |                (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Available MAC Entries     |                (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   MAC Entries of VLAN (1)                   |  (4 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   ......                                    |  (4 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   MAC Entries of VLAN (n)                   |  (4 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where each MAC Entries of VLAN is of the form:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  RESV |        VLAN ID        |                (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   The Number of MAC Entries   |                (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o  Type: MAC Entries Report sub-TLV.

   o  Length: 6+4n bytes, where n is the number of VLANs for which the
      appointed forwarder chooses to report the number of MAC entries
      in its MAC table.

   o  DRB Nickname: The nickname of the Designated RBridge of the local
      link.

   o  Maximum MAC Entries: The maximum number of entries in the MAC
      table of the appointed forwarder.

   o  Available MAC Entries: The number of available entries in the MAC
      table of the appointed forwarder.

   o  RESV: 4 bits that MUST be sent as zero and ignored on receipt.

   o  VLAN ID: This field identifies one of the VLANs that are assigned
      to the appointed forwarder.

   o  The Number of MAC Entries: The number of MAC entries that the
      given VLAN occupies in the MAC table of the appointed forwarder.
      These entries include not only the MAC addresses of the hosts on
      the local link but also the MAC addresses from the same VLAN on
      remote links (i.e., the same virtual link).

5.2. Traffic Bit Rate Report sub-TLV

   The appointed forwarder uses the Traffic Bit Rate Report sub-TLV to
   report the bandwidth utilization of its port group to the DRB.
   This sub-TLV has the following format:

   +-+-+-+-+-+-+-+-+
   |Type=TrafficRep|                                (1 byte)
   +-+-+-+-+-+-+-+-+
   |    Length     |                                (1 byte)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         DRB Nickname          |                (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Maximum Link Bandwidth     |                (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Available Link Bandwidth    |                (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Traffic Bit Rate of VLAN (1)              |  (4 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   ......                                    |  (4 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Traffic Bit Rate of VLAN (n)              |  (4 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   where each Traffic Bit Rate of VLAN is of the form:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RESV|D|        VLAN ID        |                (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Traffic Bit Rate        |                (2 bytes)
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   o  Type: Traffic Bit Rate Report sub-TLV.

   o  Length: 6+4n bytes, where n is the number of VLAN entries for
      which the appointed forwarder chooses to report the traffic load
      on the port group.

   o  DRB Nickname: The nickname of the Designated RBridge of the local
      link.

   o  Maximum Link Bandwidth: The maximum bandwidth of the port group
      attached to the local link.

   o  Available Link Bandwidth: The available bandwidth of the port
      group attached to the local link.

   o  RESV: 3 bits that MUST be sent as zero and ignored on receipt.

   o  D: The direction of the traffic. Value "0" denotes the egress
      traffic volume and value "1" denotes the ingress traffic volume.

   o  VLAN ID: This field identifies one of the VLANs that are assigned
      to the appointed forwarder.

   o  Traffic Bit Rate: The traffic bit rate of the given VLAN, in the
      direction indicated by D, through the port group of the appointed
      forwarder attached to the local link.

6. Adaptive VLAN Assignment

   All appointed forwarders advertise the MAC Entries Report sub-TLV
   and the Traffic Bit Rate Report sub-TLV in LSPs to their DRBs.
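   As an illustration of what a forwarder would advertise, the two
   sub-TLVs of Section 5 can be packed as in the following Python
   sketch. Several points are assumptions rather than statements of
   this document: network byte order is assumed (as is usual for
   IS-IS), the type codes 4 and 5 are only the values suggested in
   Section 8, the bandwidth and bit rate fields are treated as opaque
   16-bit numbers because no unit is specified, and all function names
   are hypothetical.

```python
import struct

# Suggested sub-TLV numbers from Section 8; not IANA assignments.
MAC_ETR_REP = 4
TRAFFIC_REP = 5


def _pack_sub_tlv(sub_type, drb_nickname, maximum, available, entries):
    """Pack Type, Length, the 6 fixed bytes and the 4-byte VLAN entries."""
    value = struct.pack("!HHH", drb_nickname, maximum, available)
    value += b"".join(entries)
    if len(value) > 255:
        # Length is a single byte, so 6 + 4n <= 255 allows at most n = 62.
        raise ValueError("sub-TLV value exceeds the one-byte Length field")
    return struct.pack("!BB", sub_type, len(value)) + value


def pack_mac_entries_report(drb_nickname, max_entries, avail_entries, per_vlan):
    """per_vlan: iterable of (vlan_id, mac_entry_count) pairs."""
    entries = [
        struct.pack("!HH", vlan_id & 0x0FFF, count)  # RESV bits sent as zero
        for vlan_id, count in per_vlan
    ]
    return _pack_sub_tlv(MAC_ETR_REP, drb_nickname, max_entries,
                         avail_entries, entries)


def pack_traffic_report(drb_nickname, max_bw, avail_bw, per_vlan):
    """per_vlan: iterable of (vlan_id, is_ingress, bit_rate) triples."""
    entries = [
        # D is the bit just above the 12-bit VLAN ID: 0 = egress, 1 = ingress.
        struct.pack("!HH", (int(is_ingress) << 12) | (vlan_id & 0x0FFF), rate)
        for vlan_id, is_ingress, rate in per_vlan
    ]
    return _pack_sub_tlv(TRAFFIC_REP, drb_nickname, max_bw, avail_bw, entries)


mac_report = pack_mac_entries_report(0x1234, 4096, 4090, [(10, 4), (20, 2)])
traffic_report = pack_traffic_report(0x1234, 1000, 400, [(10, True, 300)])
print(mac_report.hex())
print(traffic_report.hex())
```

   Under these assumptions, the one-byte Length field by itself caps a
   single sub-TLV at 62 VLAN entries, independently of any MTU limit.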
   The number of VLANs in these sub-TLVs is constrained by the
   inter-RBridge link MTU, whose default value is 1470 octets. If the
   MTU is not big enough to hold the statistics of all VLANs, appointed
   forwarders choose the VLANs to report to the DRB in the order of
   most MAC entries first and highest load first. An appointed
   forwarder SHOULD report these two sub-TLVs to its DRB when its
   adjacency to the DRB enters the Report state [RFC6327] and each time
   the DRB assigns a VLAN to it. The usage of the MAC table and the
   utilization of the bandwidth of this appointed forwarder are
   calculated on the DRB. The DRB can choose an appointed forwarder for
   the local link based on a Least Usage/Utilization First (LUF)
   policy. If an RBridge on the local link is already heavily loaded,
   the DRB should refrain from assigning additional VLANs to it.

   It has been common practice to collect MAC table usage in bridged
   networks. By collecting the sub-TLVs defined in this document (which
   can be stored in a MIB [RBmib]), operators gain a view of the MAC
   table usage and bandwidth utilization of the RBridges on the LAN
   link. Based on this view, operators can easily recognize the hot
   spots of the TRILL campus. Because of host mobility, a formerly
   balanced VLAN assignment MAY become unbalanced. If a forwarder's MAC
   table or bandwidth is running out, operators can remove some VLANs
   from it and reassign them to another RBridge on the local link as
   the new forwarder.

7. Security Considerations

   The delivery of the message types defined in this document can be
   protected with the cryptographic mechanism specified in [RFC5310].

8. IANA Considerations

   This document suggests allocating two new sub-TLVs from existing
   sub-TLV sequences, namely MACEtrRep and TrafficRep.

                Section  sub-   MT Port   Router    Extended
                  NUM    TLV#   Capabil.  Capabil.  IS Reach

   MACEtrRep      5.1      4       X         -         -       *
   TrafficRep     5.2      5       X         -         -       *

9. References

9.1. Normative References

   [RFC6325]  R. Perlman, D. Eastlake, et al, "RBridges: Base Protocol
              Specification", RFC 6325, July 2011.

   [RFC6326]  D. Eastlake, A. Banerjee, et al, "Transparent
              Interconnection of Lots of Links (TRILL) Use of IS-IS",
              RFC 6326, July 2011.

   [RBmib]    A. Rijhsinghani, K. Zebrose, "Definitions of Managed
              Objects for RBridges",
              draft-ietf-trill-rbridge-mib-04.txt, work in progress.

   [RFC6327]  D. Eastlake, R. Perlman, et al, "Routing Bridges
              (RBridges): Adjacency", RFC 6327, July 2011.

9.2. Informative References

   [CAMtbl]   B. Hedlund, "Evolving Data Center Switching",
              http://internetworkexpert.s3.amazonaws.com/2010/trill1/
              TRILL-intro-part1.pdf

   [SAN]      "Configuring an iSCSI Storage Area Network Using Brocade
              FCX Switches", Brocade CONFIGURATION GUIDE, 2010.

   [VL2]      A. Greenberg, J.R. Hamilton, et al, "VL2: A scalable and
              flexible data center network", in Proceedings of ACM
              SIGCOMM, 2009.

   [TE]       M. Roughan, M. Thorup, and Y. Zhang, "Traffic Engineering
              with Estimated Traffic Matrices", in Proceedings of ACM
              IMC, 2003.

   [RFC5310]  M. Bhatia, V. Manral, T. Li, et al, "IS-IS Generic
              Cryptographic Authentication", RFC 5310, February 2009.

Author's Addresses

   Mingui Zhang
   Huawei Technologies Co.,Ltd
   HuaWei Building, No.3 Xinxi Rd., Shang-Di
   Information Industry Base, Hai-Dian District,
   Beijing, 100085 P.R. China

   Email: zhangmingui@huawei.com

   Dacheng Zhang
   Huawei Technologies Co.,Ltd
   HuaWei Building, No.3 Xinxi Rd., Shang-Di
   Information Industry Base, Hai-Dian District,
   Beijing, 100085 P.R. China

   Email: zhangdacheng@huawei.com