INTERNET DRAFT                                          S. Bandyopadhyay
draft-shyam-mshn-ipv6-06.txt                            January 03, 2012
Intended status: Proposed Standard
Expires: July 03, 2012


            Mesh Structured Hierarchical Networking and IPv6
                      draft-shyam-mshn-ipv6-06.txt

Abstract

   This document proposes an approach for reorganizing the entire
   network in a large address space. It describes how a three-tier
   mesh structured hierarchy can be established by fragmenting the
   entire space into regions and sub-regions inside each of them. It
   addresses issues that could be relevant to this architecture in the
   context of IPv6. This document also proposes an approach by which
   an IP switch based network can perform as well as an ATM network
   for the processing of real-time traffic.

Status of this memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


Bandyopadhyay             Expires July 03, 2012                 [Page 1]

Internet Draft                MSHN and IPv6             January 03, 2012


1. Introduction

   The transition from IPv4 to IPv6 is in progress. Work has been done
   to upgrade individual nodes (workstations) from IPv4 to IPv6. Also,
   there are established documents that describe how routers and
   switches can support IPv4 and IPv6 packets simultaneously in order
   to make the transition possible [1].  The CIDR [2] based
   hierarchical architecture of the existing 32-bit system is expected
   to be continued in IPv6 with a larger address space. There are
   concerns that BGP table entries are becoming too large in the
   existing system [3]. At the same time, there are proposals to
   extend the Autonomous System number from 16 bits to 32 bits to meet
   demand [4]. The challenge lies in making the transition from IPv4
   to a real IP world smooth, with the fewest changes possible. An ATM
   network performs faster than a network of IP switches, and the
   difference becomes more prominent for real-time applications. ATM
   networks, however, are at a disadvantage compared to IP-switch
   based networks as far as bandwidth usage is concerned. This
   document proposes approaches that allow an IP-switch based network
   to process real-time applications as fast as an ATM network, and a
   mesh structured hierarchical network with a flat address space for
   routing convenience.

2. A Three-tier mesh structured hierarchical network

   The existing system works with an Autonomous System (AS) layer and
   an inter-AS layer using the CIDR approach. In order to meet demand
   within the 32-bit address space, Autonomous Systems of various
   sizes maintain a CIDR based hierarchical architecture. With the
   help of NAT [5], a stub network can maintain a user ID space as
   large as a class A network and can meet its need to communicate
   with the rest of the world with very few real IP addresses. With
   the combination of CIDR and NAT applied across the entire space,
   most of the 32-bit address space is effectively used as network ID.
   This is why the 16-bit 'Autonomous System Number' has come to be
   seen as insufficient to meet the needs of a growing number of
   customers. If the same approach is continued with a larger network
   ID, the load on the switches will become too high.

   As Autonomous Systems of various sizes are supported, Autonomous
   Systems and the nodes inside them can be viewed as lying in the
   same plane within the address space. If the network can instead be
   viewed as lying in different planes, routing can be made simpler.
   If the network is designed with a fixed prefix length for the
   Autonomous System everywhere, routing information for the rest is
   confined to the other part of the network prefix. This means that
   the maximum AS size is assigned to every AS irrespective of its
   actual size. This can be made possible with the advantage
   of using a large address space and dividing it into number of regions
   of fixed sizes inside it. Thus the entire network can be viewed as
   a network of inter-AS layer nodes. Each node in the inter-AS layer
   can act only as a router in the inter-AS layer, as a router in the
   inter-AS layer with an Autonomous System attached to it at a single
   point of attachment, or as an Autonomous System with multiple
   Autonomous System border routers (ASBRs) appearing like a mesh.
   Thus a two-tier mesh structured hierarchy is established between
   the AS layer and the inter-AS layer, with each AS having a fixed
   prefix length.

   By definition, an Autonomous System is a small area within the
   entire network that maintains its own independent identity and
   communicates with the rest of the world through specific border
   routers. In a similar manner, if a larger area (say, a region or
   state) is considered as a network of Autonomous Systems that
   maintains its own identity by communicating with the rest of the
   world through border routers (say, state border routers), a mesh
   structured hierarchy can be established within the inter-AS layer.
   The inter-AS layer is then split into inter-AS-top and inter-AS-
   bottom. To maintain this hierarchy, each node of inter-AS-top needs
   to have multiple regional or state border routers (SBRs) through
   which it communicates with the rest of the world, in the same way
   that an Autonomous System maintains ASBRs. Thus, the entire network
   appears as a network of nodes of the inter-AS-top layer. To
   maintain the hierarchy, each node of inter-AS-top needs to have a
   fixed prefix length, i.e. each node of inter-AS-top is assigned a
   fixed maximum number of Autonomous System nodes.

   Thus, with a three-tier mesh structured hierarchy in the network
   layer, the network ID can be viewed as A.B.C. If pA, pB and pC are
   the prefix lengths of the inter-AS-top, inter-AS-bottom and AS
   layers respectively, there will be 2^pA nodes at the topmost layer,
   2^pB nodes at the inter-AS-bottom layer and 2^pC nodes at the AS
   layer. Thus the entire space is divided into a fixed number of
   regions and each region is divided into a fixed number of
   sub-regions. This division is supposed to be made based on
   geography, population density, demand and related factors.

   Let nMaxInterASTopNodes be the maximum number of nodes that can be
   assigned at the topmost layer, nMaxInterASBottomNodes that at the
   inter-AS-bottom layer and nMaxASNodes that at the AS layer, where
   nMaxInterASTopNodes <= 2^pA, nMaxInterASBottomNodes <= 2^pB and
   nMaxASNodes <= 2^pC.
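
   The relationship between the prefix lengths and the node limits can
   be sketched as follows. This is only an illustration: the class name
   LayerPlan, its methods and the sample prefix lengths (10, 16 and 12
   bits) are assumptions made for the example and are not mandated by
   this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerPlan:
    """Prefix lengths, in bits, of the three layers (illustrative names)."""
    p_a: int  # inter-AS-top
    p_b: int  # inter-AS-bottom
    p_c: int  # AS layer

    def capacity(self):
        # Upper bounds 2^pA, 2^pB and 2^pC on the node count of each layer.
        return (1 << self.p_a, 1 << self.p_b, 1 << self.p_c)

    def admits(self, n_max_top, n_max_bottom, n_max_as):
        # True when the chosen node limits fit inside the prefix lengths.
        cap_a, cap_b, cap_c = self.capacity()
        return (n_max_top <= cap_a
                and n_max_bottom <= cap_b
                and n_max_as <= cap_c)


plan = LayerPlan(p_a=10, p_b=16, p_c=12)   # example values only
print(plan.capacity())
print(plan.admits(1000, 60000, 1600))
```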

2.1. Route propagation

   With the hierarchy established, routing information that is
   established inside a node of inter-AS-top does not need to be
   propagated to another node of inter-AS-top. The entire routing
   information of the inter-AS-top layer needs to be propagated to the
   inter-AS-bottom layer. So, each router of the inter-AS layer will
   have two tables of information, one for inter-AS-top and another
   for the inter-AS-bottom of the inter-AS-top node that it belongs
   to. BGP (with a little modification) will work well with one rule
   applied at the SBRs: an SBR will not propagate the routing
   information of the inter-AS-bottom layer of its domain to an SBR of
   a neighboring domain, i.e. an SBR of one top layer node will
   propagate only inter-AS-top routing information to an SBR of
   another top layer node. Inside a node of inter-AS-top, routing
   information of both inter-AS-top and inter-AS-bottom needs to be
   propagated from one ASBR to a neighboring ASBR. Inside a top layer
   node A, routing information for another top layer node B will have
   two parts: one for the list of SBRs through which a packet will
   traverse from top layer node A to B, and another for the list of
   ASBRs through which the packet will traverse from one AS to another
   inside A. In terms of BGP, the AS_PATH attribute will be split into
   two parts: one for the top layer and another for the bottom layer.
   Within the same node A, routing information from one AS to another
   AS will not have any top layer information, i.e. the top layer
   information will be set to NULL.
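
   The split path and the SBR export rule described above can be
   sketched as follows. This is a toy model and not a BGP
   implementation: the names SplitPath and export_across_sbr, the
   router labels, and the use of None to stand for NULL are all
   assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SplitPath:
    # SBR hops between top layer nodes; None models the NULL top part
    # of a route that stays inside one top layer node.
    top: Optional[Tuple[str, ...]]
    # ASBR hops inside the local top layer node.
    bottom: Tuple[str, ...]


def export_across_sbr(routes):
    """Routes an SBR would hand to an SBR of a neighboring top layer node.

    Routes without a top part are withheld, and the bottom part of the
    remaining routes is stripped, so that no inter-AS-bottom information
    leaves the domain.
    """
    return {dest: SplitPath(top=path.top, bottom=())
            for dest, path in routes.items()
            if path.top is not None}


# Hypothetical table held by a router inside top layer node A.
routes_in_a = {
    "B": SplitPath(top=("SBR-A1", "SBR-B1"), bottom=("ASBR-3", "ASBR-7")),
    "A.AS5": SplitPath(top=None, bottom=("ASBR-3", "ASBR-5")),
}
exported = export_across_sbr(routes_in_a)
print(exported)
```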

   Similarly, each node of the AS layer will have three tables of
   routing entries: one for inter-AS-top, one for inter-AS-bottom and
   another for the routing information inside the Autonomous System
   itself.

   With traditional CIDR based hierarchy, a node with a shorter prefix
   can be divided into a number of nodes with longer prefixes. Each
   of these can be subdivided further in the same way, and the
   process can continue until no further division is possible. The
   point worth noting is that at each step the designer of the
   network has to anticipate future expansion so that the resource is
   never exhausted. This leads the designer to allocate far more than
   is needed, which leaves address space unused, and the concept of
   the H-D (host-density) ratio comes into play. The problem is
   aggravated once a resource does get exhausted. For example, a node
   of prefix /16 can be divided into a number of nodes of prefix /24.
   If any one of the /24 nodes is exhausted, the resources of the
   other /24 nodes cannot be used even if they are available.

   Introduction of hierarchy at the inter-AS layer reduces the size of
   the routing table substantially. If hardware resources are
   sufficient to maintain a flat address space at each layer, the
   problems related to CIDR can be avoided. With a flat address
   space, no hierarchical relationship needs to be established
   between any two
   nodes in the same layer. So, all the nodes inside each layer can be
   used till they get exhausted. With flat address space (i.e.  without
   prefix reduction), BGP tables will have nMaxInterASTopNodes +
   nMaxInterASBottomNodes entries.

   An IGP such as OSPF has provision to divide an AS into smaller
   areas. OSPF hides the topology of an area from the rest of the
   Autonomous System. This information hiding enables a significant
   reduction in routing traffic. With support for subnetting, OSPF
   attaches an IP address mask to a route to indicate the range of IP
   addresses that the route describes. This reduces the volume of
   routing traffic, because the nodes inside the range need not be
   described individually, but it introduces another level of
   hierarchy. If subnetting can be avoided in the AS layer (at the
   cost of additional computation in the SPF tree), each area can be
   configured dynamically from a free pool of addresses based on its
   requirement. So, an AS can be divided into a number of areas of
   heterogeneous sizes with nodes drawn from a free pool of address
   space.

   Similarly, the concept of area can be introduced in the inter-AS-
   bottom layer in the way it works in OSPF. The area border routers
   in the inter-AS-bottom layer have to behave exactly as an ABR
   behaves in OSPF, i.e. an area border router will hide the topology
   inside an area from the rest of the world and will distribute the
   information collected inside the area to the rest. It will also
   distribute the routing information collected from outside to the
   nodes inside. In order to implement this, the protocol running in
   the inter-AS layer (say, BGP) will have to introduce a 'cost'
   factor. This cost factor can be interpreted as the cost of
   propagating a packet from one AS to another. The protocols running
   inside the AS layer (RIP, OSPF, etc.) will have to supply the cost
   information for a packet to travel from one ASBR to another. All
   the protocols must behave in unison in supplying this information.
   The cost factor is needed by a remote node when it sends a packet
   to a node inside an area and more than one area border router is
   equidistant from that remote node. Thus the inter-AS-bottom layer
   (i.e. one inter-AS-top level node) can be divided into a number of
   areas of heterogeneous sizes with AS nodes drawn from a free pool
   of address space. BGP adopts a technique called route aggregation,
   which reduces the routing information within a message. In a
   similar manner, the introduction of areas inside the
   inter-AS-bottom layer will not only reduce the complexity of the
   protocol, but will also reduce the size of a BGP packet
   substantially.
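
   The tie-breaking role of the cost factor described above can be
   sketched as follows. This is only an illustration: the tuple layout,
   the function name pick_area_border_router and the sample values are
   assumptions, and this document does not define how distance and
   cost are encoded.

```python
def pick_area_border_router(candidates):
    """Choose an area border router for a destination inside an area.

    candidates is a list of (name, distance, cost) tuples, where
    distance is how far the router is from the remote node and cost is
    the advertised cost of reaching the destination through it. The
    nearest router wins; the cost decides among equidistant routers.
    """
    name, _distance, _cost = min(candidates, key=lambda c: (c[1], c[2]))
    return name


# Hypothetical candidates: ABR-1 and ABR-2 are equidistant.
abrs = [("ABR-1", 4, 30), ("ABR-2", 4, 12), ("ABR-3", 5, 1)]
chosen = pick_area_border_router(abrs)
print(chosen)
```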

   With this architecture, each node (router) inside an AS is
   represented as A.B.C.  Each node may or may not have a network
   attached to it, which acts as a leaf (i.e. the network will not
   act as a transit). In order to use the user-ID space properly and
   to support customer
   networks of heterogeneous sizes, the user-ID space needs to be
   divided into subnet-ID and user-ID. In effect, a VLSM (variable
   length subnet mask) type of approach has to be adopted at each node
   of an AS. So, each node of the AS layer will act as the root of a
   tree whose leaves are independent small customer networks that act
   as stubs. As the routing information of the inter-AS layer and the
   AS layer need not be passed inside any node of the VLSM tree, each
   router inside the tree should maintain a default route for any
   address outside its network. With this approach, the load on each
   router of the service providers will become negligible. Protocols
   that support VLSM with MPLS/VPN have to be implemented inside the
   tree (inside the VLSM tree, all the physical ports of a switch have
   to be configured with the subnet mask, so MPLS on top of a static
   routing table should do the rest).

   The fundamental assumptions on which this architecture rests can
   be summarized as follows:

   i) The entire network can be viewed as a network of regions or
   states, where each region or state has its own identity and
   communicates with the rest of the world through state border
   routers. Each region or state is a network of Autonomous Systems.
   Each region, as well as each Autonomous System inside it, will
   have a fixed (maximum) prefix length.

   ii) Availability of hardware resources is such that flat address
   space can be maintained at the inter-AS layer.

   Introduction of mesh-structured hierarchy at the inter-AS layer will
   have several advantages:

        o   Load at each router will get reduced substantially.
        o   Concept of CIDR style approach and complexity related to
              prefix reduction can be easily avoided.
        o   Full mesh hierarchy will make traffic evenly distributed.
        o   Physical cable connection can be optimized.
        o   Administrative issues will become easier.

2.2. Determination of prefix lengths

   With this architecture, an IP address can be described as A.B.C.D,
   where the D part represents the user ID. Each router in the
   inter-AS layer will have two tables of information, one for
   inter-AS-top and another for the inter-AS-bottom of the
   inter-AS-top node that it belongs to, whereas each node of the AS
   layer will have three tables of routing entries: one for
   inter-AS-top, one for inter-AS-bottom and another for the routing
   information inside the Autonomous System itself. In the worst case,
   a node inside an AS needs to
   maintain nMaxInterASTopNodes + nMaxInterASBottomNodes + nMaxASNodes
   entries in its routing table.
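
   The worst-case table sizes can be worked through with a short
   calculation. The sketch below is illustrative only: the function
   name worst_case_entries is hypothetical, and the prefix lengths of
   10, 16 and 12 bits are example values assumed for the calculation.

```python
def worst_case_entries(n_max_top, n_max_bottom, n_max_as=0):
    """Flat-address worst case: one entry per node in each visible layer."""
    return n_max_top + n_max_bottom + n_max_as


# Example limits assuming pA = 10, pB = 16 and pC = 12.
N_TOP, N_BOTTOM, N_AS = 2 ** 10, 2 ** 16, 2 ** 12

# A router of the inter-AS layer keeps two tables (top and bottom).
inter_as_router_entries = worst_case_entries(N_TOP, N_BOTTOM)

# A node of the AS layer keeps three tables (top, bottom and its own AS).
as_node_entries = worst_case_entries(N_TOP, N_BOTTOM, N_AS)

print(inter_as_router_entries, as_node_entries)
```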

   Areas are allocated dynamically from a free pool of address space
   more frequently at the AS layer than at the inter-AS-bottom layer.
   As OSPF supports all the features needed, it can be considered the
   default choice in the AS layer.  The existing implementation of
   OSPF (Version 2) supports subnetting, by which an entire area can
   be represented as a combination of network address and subnet
   mask. With this approach, the routing table is reduced
   substantially.  With the removal of subnetting, every node inside
   an area will have an entry in the routing table (OSPF Version 1).
   So the determining factor is the maximum number of nodes inside an
   AS that OSPF can support once subnetting support is removed. The
   prefix length of the AS layer will be determined by this factor.

   With the introduction of hierarchy in the inter-AS layer, the
   number of entries in the BGP routing table will be reduced
   substantially. Even if pA and pB are both selected as 16, the
   number of routing entries comes within the admissible range of the
   existing BGP protocol. However, it is the responsibility of IANA to
   come up with a scheme for selecting nMaxInterASTopNodes and
   nMaxInterASBottomNodes. Each top level node will have
   nMaxInterASBottomNodes nodes. It will be a waste of address space
   if each country is assigned a top level node (e.g. China has a
   population of 1,306,313,800 whereas Vatican City has only 920
   according to a census of 2006). So a moderate value of
   nMaxInterASBottomNodes is desirable, with which larger countries
   will have a number of top level nodes, e.g. each state of the USA
   can be assigned a top level node. With the introduction of areas in
   the inter-AS-bottom layer, each top level node can be divided into
   a number of areas of heterogeneous sizes. So, a group of
   neighboring countries with smaller populations can share the
   address space of a top level node. Similarly, the user-ID space has
   to be decided based on the largest area that a VLSM tree should
   span. All these issues are geopolitical and have to be decided by
   IANA.

2.2.1. A pseudo optimal distribution of prefixes in a 64bit architecture

   In order to make optimal use of cable connections, the length of
   the VLSM tree is expected to be as short as possible. Also, any
   single organization may prefer to have its user-ID space under the
   same network ID. So, a 16-bit user-ID may be insufficient for
   places like a large university campus, whereas 32 bits will be too
   large. Hence, a 24-bit user-ID is a moderate choice; it is the size
   of the class A address space in IPv4 (also used as a space for
   private IP). As published in 1998 [8], OSPF can support an area
   with 1600 routers and 30K external LSAs. So, 11 bits are needed to
   support this space (2^11 = 2048 >= 1600). With
   the assumption that OSPF can support much more address space with the
   advancement of hardware technology as well as to keep the space open
   for future expansions, 12 bits are assigned for the AS layer. 16 bits
   are assigned for the inter-AS-bottom layer. So, if on average a
   16-bit equivalent space is used within the user-ID space and an
   8-bit equivalent number of nodes is used inside an AS (2^8 = 256,
   i.e. 16% of 1600), a top level node (with a 16-bit equivalent
   number of AS nodes) will generate 2^40 IP addresses, which gives
   8629 IP addresses per person in Japan (with a population of
   127,417,200; Japan is at the 10th position from the top in the
   population list of the world). So, even if all the countries with
   population less than or equal to that of Japan are assigned a top
   level node and all the provinces/states of countries with larger
   populations are assigned a top level node each, the total number of
   nodes will come well under 1024. If a number of neighboring
   countries with smaller populations share a top level node, the
   total number of top level nodes will come down further.  This
   suggests that a 62-bit equivalent space
   (10(pA)+16(pB)+12(pC)+24(user-id)) will be good enough for unicast
   addresses. This distribution expects OSPF to support 65K (64K+1K)
   external LSAs.
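
   The arithmetic behind these figures can be checked with a few lines
   of code. The sketch below only restates the numbers given in the
   text; the variable names are chosen for illustration.

```python
JAPAN_POPULATION = 127_417_200      # figure quoted in the text

USER_ID_BITS_USED = 16              # average use of the 24-bit user-id
AS_NODE_BITS_USED = 8               # 2^8 = 256 nodes used inside an AS
AS_COUNT_BITS = 16                  # AS nodes in one top level node

# Share of the 1600-router OSPF figure that 256 nodes represent.
utilization = 2 ** AS_NODE_BITS_USED / 1600

# Addresses generated by one top level node under these assumptions.
addresses_per_top_node = 2 ** (USER_ID_BITS_USED
                               + AS_NODE_BITS_USED
                               + AS_COUNT_BITS)
addresses_per_person = addresses_per_top_node // JAPAN_POPULATION

# Size of the proposed unicast space: pA + pB + pC + user-id.
unicast_bits = 10 + 16 + 12 + 24

# External LSAs implied by the distribution: 64K + 1K.
external_lsas = 2 ** 16 + 2 ** 10

print(utilization, addresses_per_top_node, addresses_per_person)
print(unicast_bits, external_lsas)
```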

   The 64-bit address space may be divided into two 63-bit blocks as
   follows:

   i. Global unicast addresses with the most significant bit set to 0.
   In order to separate the router address space from the host
   computers of customer networks, routers may be assigned the prefix
   01 whereas host computers have the prefix 00. With the three-tier
   hierarchy, the network ID is represented as A.B.C.  Any router
   inside the VLSM tree, including the root, will have an address
   01A.B.C.router-id, whereas a host interface inside a customer
   network will be represented as 00A.B.C.uid.

   As the number of nodes representing routers in the provider network
   will be far smaller than the user-ID space of the customer
   networks, and in order to keep more space for unicast addresses of
   customer networks as well as to keep the option open for future
   expansion, the entire 63-bit address space with the MSB set to 0
   is instead assigned to customer networks for unicast addresses.
   So, the distribution will look like
   10(pA)+17(pB)+12(pC)+24(user-id).  The router address space will
   be assigned from the address space with the MSB set to 1.
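
   One way to picture the 10(pA)+17(pB)+12(pC)+24(user-id) layout is
   as bit fields packed into a 64-bit integer below the MSB. The
   sketch below is an illustration under that assumption; the field
   order, the names FIELDS, pack and unpack, and the sample values are
   hypothetical and are not a wire format defined by this document.

```python
# Field widths, most significant first: MSB flag, A, B, C, user-id.
FIELDS = (("msb", 1), ("A", 10), ("B", 17), ("C", 12), ("uid", 24))


def pack(**values):
    """Pack the named fields into one 64-bit integer (missing fields are 0)."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    addr = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        addr = (addr << width) | value
    return addr


def unpack(addr):
    """Split a 64-bit integer back into its named fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = addr & ((1 << width) - 1)
        addr >>= width
    return fields


total_bits = sum(width for _, width in FIELDS)
addr = pack(A=5, B=70000, C=300, uid=0xABCDEF)   # MSB 0: customer unicast
print(total_bits, hex(addr), unpack(addr))
```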

   ii. Address space with the MSB set to 1 will be distributed among
   the remaining uses.  This distribution will be based on the
   requirements and the work that has already been done in connection
   with IPv6, with two additional requirements:

   a) Router address space: Any node in the router address space will be
   designated with a prefix followed by A.B.C.router-id. The prefix will
   be determined based on the distribution of the 63 bit address space.


   b) Provider independent address space: This space will be used for
   customers who would like to retain their numbers even after
   changing providers. Each of these addresses has to be mapped to an
   address from the global unicast address space. For customers who
   would like to have mobility support, the mapped address can be
   considered the "Home Address" of the mobile node as defined in the
   specification of "IP Mobility Support" [9].

2.2.2. Whether to go for a two-tier or three-tier hierarchy

   Establishment of hierarchy in the inter-AS layer reduces the number
   of BGP entries to a great extent, but leads to inefficient use of
   address space for geopolitical reasons. If the hierarchy in the
   inter-AS space is removed, the entire 26-bit (10+16) space will be
   available for a single layer and the inter-AS space will be used
   to its full extent, but the number of external LSAs (and/or the
   number of entries in the BGP table) will increase dramatically.
   So, it depends on how many external LSAs OSPF can support. BGP
   expects the packet length to be limited to 4096 bytes. BGP manages
   to work within this limit through prefix reduction in the CIDR
   based environment.  As the number of inter-AS nodes increases, BGP
   has to change this limit in order to work in a flat address space.
   The alternative is to divide the inter-AS space into a number of
   areas as defined in section 2.1. The area border routers will
   advertise the aggregated information to the rest of the world. BGP
   may have to incorporate both options at the same time.  As the
   number of nodes in the inter-AS layer increases, the inter-AS
   space has to be split into two separate planes in order to reduce
   the number of entries in the routing table.  So, the two-tier
   hierarchy can be considered an interim state on the way to a
   three-tier hierarchy.  If currently available data shows that it
   is good enough for the present need, it is worth examining to what
   extent it can support future needs. Assignment of inter-AS nodes
   in the two-tier hierarchy should be based on geographical
   distribution as if it were part of a three-tier hierarchy.
   Otherwise, introduction of the three-tier hierarchy in the future
   will become another difficult task. According to a report from
   2011, BGP supports ~400,000 entries in the routing table. With this
   growing trend, BGP may have to change the packet length limit even
   in a CIDR based environment. With the introduction of the two-tier
   hierarchy, the number of entries in the routing table will come
   down drastically, and with the three-tier approach it will come
   down further.

2.3. Issues related to Satellite communications

   Establishment of hierarchy in the inter-AS layer requires that the
   only way any two autonomous systems in two different top level nodes
   communicate is through their SBRs. If two autonomous systems inside
   the same top level node communicate through satellite, it will be
   considered a direct link between them. Whenever autonomous system
   'ASa' of top level node 'A' communicates with autonomous system
   'ASb' of top level node 'B' through satellite, they have to go
   through their state border routers, i.e. a satellite port inside
   'A' that communicates with a satellite port inside 'B' will be
   considered a state border router. If multiple such ports exist
   inside node 'A', all of them will be equidistant from any port
   inside 'B'. This requires any satellite port inside 'B' to have
   prior knowledge of the list of autonomous systems that are under
   the purview of each port inside 'A'. So, all the satellite ports of
   'A' have to exchange this group information with all the satellite
   ports of 'B' and vice versa.  Each such group of autonomous systems
   can be considered a cluster of autonomous systems inside an area of
   a top level node. If the number of such ports is small, some
   heuristics can be applied while assigning AS numbers in order to
   reduce the processing time during the circuit establishment phase.
   It becomes difficult to maintain such heuristics once the number of
   such ports becomes large. So, in the case of satellite
   communication, the advantage of establishing hierarchy inside the
   inter-AS layer diminishes as the number of satellite ports
   increases. If a private corporation maintains its own satellite
   channel to communicate between its offices at distant locations,
   all of these offices are considered to be under the user-ID space
   of its network. Service providers that provide satellite services
   to end-site customers can operate in the usual manner, as they
   provide connections to customer networks that act as stubs.

2.4. A proposed solution for multihoming

   The following solution is based on a new IP specification with a
   64-bit architecture and an additional "address field" in the IP
   header, which requires modification of the TCP/IP stack. This field
   can be interpreted as "forwarding address" or "final destination"
   depending on its use. The forwarding address carries the address
   based on which the packet is forwarded, and is changed by
   intermediate routers as needed. So, the IP header will have three
   addresses:

      Source Address: Address of the originating source
      Destination Address: Address of the final destination
      Forwarding Address: Address used for forwarding a packet. It may
      be the final destination or a router that will forward the packet
      towards the final destination.

   Consider a customer site with addresses A1.B1.C1.D11 to A1.B1.C1.D12
   as provided by provider P1 and A2.B2.C2.D21 to A2.B2.C2.D22 as
   provided by provider P2 such that D12 - D11 = D22 - D21. Let us
   consider that all the interfaces inside the customer site will be
   configured with network id A1.B1.C1 as primary address and with
   A2.B2.C2 as secondary address. Inside the customer network, all
   the hosts will use their primary addresses to communicate with
   each other. Let R1 and R2 be the routers that are connected to the
   provider networks P1 and P2 respectively. Whenever R1 receives a
   packet from outside, it forwards it to the host without any
   modification, but R2 needs to maintain a mapping table that maps
   addresses with net id A2.B2.C2 to A1.B1.C1 in order to forward
   packets to their destinations. As D12 - D11 = D22 - D21, the
   mapping should not be a big task.  Let us consider a host H with
   primary address A1.B1.C1.u1 and secondary address A2.B2.C2.u2, and
   two remote hosts, RH1 with IP address RA1.RB1.RC1.Ru1 and RH2 with
   IP address RA2.RB2.RC2.Ru2, both single homed, that wish to
   communicate with host H. RH1 sees H as A1.B1.C1.u1 and RH2 sees it
   as A2.B2.C2.u2.
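
   Because the two address blocks have the same size, the mapping kept
   by R2 could be as simple as a fixed offset. The sketch below
   assumes that each host occupies the same relative position in both
   blocks; this document does not require that, and the names Block
   and secondary_to_primary as well as the numeric ranges are
   hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    net_id: str   # network id, e.g. "A1.B1.C1"
    first: int    # first user-id in the block (D11 or D21)
    last: int     # last user-id in the block (D12 or D22)


def secondary_to_primary(uid2, primary, secondary):
    """Map a user-id in the secondary block to (net id, user-id) in the
    primary block, assuming equal offsets in both blocks."""
    if primary.last - primary.first != secondary.last - secondary.first:
        raise ValueError("the two blocks must have the same size")
    if not secondary.first <= uid2 <= secondary.last:
        raise ValueError("address is outside the secondary block")
    return primary.net_id, primary.first + (uid2 - secondary.first)


p1_block = Block("A1.B1.C1", first=100, last=163)     # from provider P1
p2_block = Block("A2.B2.C2", first=4000, last=4063)   # from provider P2
mapped = secondary_to_primary(4010, p1_block, p2_block)
print(mapped)
```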

   Whenever RH1 initiates a connection with H, it fills a packet with
   both the fields "Destination Address" as well as "Forwarding Address"
   with A1.B1.C1.u1.  The packet gets received by R1 and forwards the
   same to H without any modification. While replying, host H sets the
   fields "Source Address" as the "Destination Address" of the receiving
   packet as A1.B1.C1.u1; "Destination address" as the "Source Address"
   of the receiving packet as RA1.RB1.RC1.Ru1 and the "Forwarding
   Address" as the address of the router based on the "Destination
   Address" of the receiving packet. i.e, R1 if A1.B1.C1.u1 or R2 if
   A2.B2.C2.u2. Router R1 compares "Forwarding Address" with the
   "Destination Address". If they are not the same, it replaces the
   "Forwarding Address" with the "Destination Address" and forwards the
   same to RH1.

   Whenever RH2 initiates a connection with H, it fills both the
   "Destination Address" and the "Forwarding Address" fields of the
   packet with A2.B2.C2.u2.  R2 receives the packet, replaces the
   "Forwarding Address" with the mapped address of the host from the
   mapping table and forwards the packet to H.  While replying, host H
   sets the "Source Address" to the "Destination Address" of the
   received packet (A2.B2.C2.u2), the "Destination Address" to the
   "Source Address" of the received packet (RA2.RB2.RC2.Ru2) and the
   "Forwarding Address" to the address of the router selected by the
   "Destination Address" of the received packet, i.e. R1 for A1.B1.C1.u1
   or R2 for A2.B2.C2.u2. Router R2 compares the "Forwarding Address"
   with the "Destination Address". If they are not the same, it replaces
   the "Forwarding Address" with the "Destination Address" and forwards
   the packet to RH2.
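
   The two exchanges above can be sketched in a few lines of Python.
   This is only an illustrative model under stated assumptions, not
   part of the proposal: the Packet class, the function names and the
   use of the strings "R1" and "R2" as router addresses are invented
   for the example, and the mapping table of R2 is reduced to a single
   dictionary entry.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    source: str
    destination: str
    forwarding: str


# Hypothetical values for the multihomed host H and its two routers.
H_PRIMARY = "A1.B1.C1.u1"
H_SECONDARY = "A2.B2.C2.u2"
ROUTER_FOR = {H_PRIMARY: "R1", H_SECONDARY: "R2"}
R2_MAPPING = {H_SECONDARY: H_PRIMARY}  # assumed shape of R2's mapping table


def router_inbound(router, pkt):
    """R1 forwards unchanged; R2 rewrites the Forwarding Address."""
    if router == "R2":
        return replace(pkt, forwarding=R2_MAPPING[pkt.forwarding])
    return pkt


def host_reply(received):
    """H swaps source and destination and names the matching router."""
    return Packet(
        source=received.destination,
        destination=received.source,
        forwarding=ROUTER_FOR[received.destination],
    )


def router_outbound(pkt):
    """The router restores the Forwarding Address before sending out."""
    if pkt.forwarding != pkt.destination:
        return replace(pkt, forwarding=pkt.destination)
    return pkt
```

   Passing a request from RH2 through router_inbound("R2", ...),
   host_reply() and router_outbound() in that order follows the second
   exchange described above; the "Destination Address" seen by H stays
   A2.B2.C2.u2 throughout, which is what lets H pick R2 for the reply.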

   Host H also has to configure either R1 or R2 as its default router.
   The default router will be used as the "Forwarding Address" while
   initiating a communication with the outside world. "Source Address"
   will be filled based on which provider the default router is
   associated with, i.e. A1.B1.C1.u1 if it is R1 and A2.B2.C2.u2 if it
   is R2. Hosts can select either R1 or R2 based on the status of the
   link with the outside world. Different hosts may select different
   routers at the same time, each based on its own choice.
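
   A minimal sketch of this selection follows. The function names, the
   link_up dictionary and the fallback order are assumptions made for
   the example; the text above only requires that the "Source Address"
   matches the provider of the chosen default router.

```python
# Hypothetical addresses of host H, one per provider-facing router.
SOURCE_FOR_ROUTER = {"R1": "A1.B1.C1.u1", "R2": "A2.B2.C2.u2"}


def choose_default_router(link_up, preferred="R1"):
    """Pick the preferred router if its outside link is up, else another."""
    if link_up.get(preferred):
        return preferred
    for router, up in link_up.items():
        if up:
            return router
    raise RuntimeError("no router has a working link to the outside world")


def outbound_header(remote_address, default_router):
    """Header fields H fills in when it initiates a communication."""
    return {
        "source": SOURCE_FOR_ROUTER[default_router],
        "destination": remote_address,
        "forwarding": default_router,
    }
```

   For instance, choose_default_router({"R1": False, "R2": True})
   selects R2, and outbound_header() then fills the "Source Address"
   with A2.B2.C2.u2.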

   For all single homed customer networks, the field "Forwarding
   Address" will be filled with the value of "Destination Address" and
   the rest will work as usual.

   Traditionally the field "Destination Address" is used to make
   forwarding decisions. So, "Forwarding Address" could instead be
   termed "Destination Address" and "Destination Address" could be
   termed "Final Destination". The naming is just a matter of
   convenience.

   With the introduction of "Forwarding Address" in the IP packet
   header, applications that tunnel packets just to forward them to a
   different location can avoid the cost of tunneling.

   2.4.1. Multihoming and misuse of address space

   With multihoming, the same host is identified through as many service
   providers as the customer network is connected to, leading to a
   misuse of the global unicast address space. With the approach of
   "Addressing follows topology" this redundancy cannot be avoided. The
   alternative approach would be "Topology follows addressing" with a
   provider independent address space. To support a provider independent
   address space, the system needs a separate address space for the
   provider network along with the address space for the customer
   networks, with a suitable mapping scheme between them.  There are
   approaches that make use of a 128-bit address space in order to
   support services for a 64-bit address space. So, redundancy seems to
   be unavoidable.  Also, whenever a host needs to communicate with a
   remote one in a provider independent address space, it has to issue a
   query to find the location of the latter. If the entire address space
   is made provider independent, this query may become costly when
   dealing with a large address space.

   Statistics based on the distribution of Section 2.2.1 suggest an
   average use of a 16-bit address space out of the 24-bit space (i.e.
   one out of 256) assigned for user-id. So, even if all the customer
   sites opt for multihoming, address space should not be a problem with
   a suitable distribution in a 64-bit architecture.
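
   The "one out of 256" figure is simply 2^16 / 2^24. The short sketch
   below repeats that arithmetic and, as an assumption of this example
   only, scales it linearly with the number of providers a site is
   connected to; the function and parameter names are hypothetical.

```python
from fractions import Fraction

USER_ID_BITS = 24       # bits assigned for user-id (from the text)
AVERAGE_USED_BITS = 16  # average bits actually used (from the text)


def user_id_utilisation(providers_per_site=1):
    """Fraction of the user-id space consumed (assumed linear in providers)."""
    used = providers_per_site * 2 ** AVERAGE_USED_BITS
    return Fraction(used, 2 ** USER_ID_BITS)
```

   With one provider per site this gives 1/256; with four providers per
   site it would still be only 1/64 under the same assumption.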

3. Processing of real time packets (QoS issue)

   This section attempts to arrive at a way for an IP switch based
   network to operate in the most user-friendly manner to transport data
   traffic (IP) as well as real time (RT) traffic (as RTP [6] packets)
   in the existing 32-bit system.

   In IP routing/switching, the entire packet is collected at the
   intermediate router/switch and forwarded based on the forwarding
   table. Inside the switch/router, the variable length IP packet is
   fragmented into smaller frames at the ingress side. The frames are
   transported through the switching fabric with a proper priority
   mechanism (to support QoS), then reassembled at the egress side and
   passed to the media for the next hop.

   In ATM, packets are fragmented into small cells at the ingress edge
   device. The entire packet is transported as a stream of cells and
   collected at the egress edge device. The success of ATM over IP
   routing, as far as speed is concerned, is due to the fact that
   latency is reduced because the entire packet is not collected,
   fragmented and reassembled at the intermediate nodes. So, in an IP
   switch based network, better performance can be expected if RT
   packets can pass without being fragmented inside the switch, i.e. one
   RT packet has to fit inside one internal frame of the switch fabric.
   Additionally, to make this approach successful, a maximum size of the
   MPLS label stack has to be defined.  Inside the switch, all IP
   packets will be assumed to carry the same number of MPLS labels,
   whether they actually carry one or the maximum. In fact, to reduce
   overhead, this limit should be the minimum number of labels needed to
   support all the features of MPLS, i.e. label stacking of depth n
   (without limit) needs modification.

   If the minimum frame size that fits one RTP packet is selected,
   overhead becomes too high because of the very large packet header (40
   bytes: 20 bytes IP + 8 bytes UDP + 12 bytes RTP). Conversely, if a
   large frame size is used, fragmentation loss becomes too high for
   small packets (say, 40-byte IP packets). So, a compromise is needed,
   based on the IP packet size distribution. The frame size is selected
   to minimize the sum of the overhead due to fragmentation loss of data
   packets and the overhead due to the headers of RT packets.

   Studies show that IP data packets of three sizes are the most common:
          ~50% packets of size 40 bytes (TCP ACK),
          ~20% packets of size 576 bytes (path MTU set by X.25) and
          ~30% packets of size 1500 bytes (path MTU set by Ethernet).
   Other packet sizes are rare compared to these three categories and
   are almost evenly distributed. For simplicity of calculation, only
   traffic of these three categories is considered. Payload of
   the data traffic is the actual IP packet size where as the payload of
   RT traffic is the payload inside the RTP packet.

   If totBytes bytes are to be transported across the Internet and
   dataPcnt is the percentage of data traffic,

        data traffic = totBytes*dataPcnt/100 and
        RT traffic   = totBytes*(100-dataPcnt)/100;

   Of the data packets, 50% are 40 bytes long, 20% are 576 bytes long
   and 30% are 1500 bytes long.

   If totDataPkts is the total number of data packets,
      totDataPkts*(50*40/100 + 20*576/100 + 30*1500/100) =
                                   totBytes*dataPcnt/100;
   or, totDataPkts*58520 = totBytes*dataPcnt;

   Let totBytes = 58520*100, for ease of calculation;
   i.e.  totDataPkts = dataPcnt*100;
      40 bytes packets = 50*totDataPkts/100 i.e. 50*dataPcnt
      576 bytes packets = 20*totDataPkts/100 i.e. 20*dataPcnt
      1500 bytes packets = 30*totDataPkts/100 i.e. 30*dataPcnt

   RT traffic = totBytes * (100 - dataPcnt)/100
              = 58520 * (100-dataPcnt) bytes;
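
   The arithmetic above can be checked with a short Python sketch. The
   function and constant names are chosen for this example only; exact
   fractions are used so that the mean packet size of 585.2 bytes is
   not rounded.

```python
from fractions import Fraction

# Share of data packets (in %) for each packet size in bytes, from the text.
MIX = {40: 50, 576: 20, 1500: 30}
TOT_BYTES = 58520 * 100  # normalisation chosen in the text


def mean_data_packet_size():
    """Weighted mean size of a data packet in bytes."""
    return sum(Fraction(size * share, 100) for size, share in MIX.items())


def data_packet_counts(data_pcnt, tot_bytes=TOT_BYTES):
    """Number of data packets of each size for a given data percentage."""
    data_bytes = Fraction(tot_bytes * data_pcnt, 100)
    tot_data_pkts = data_bytes / mean_data_packet_size()
    return {size: tot_data_pkts * share / 100 for size, share in MIX.items()}


def rt_traffic_bytes(data_pcnt, tot_bytes=TOT_BYTES):
    """Bytes of real time traffic for a given data percentage."""
    return Fraction(tot_bytes * (100 - data_pcnt), 100)
```

   For dataPcnt = 80, data_packet_counts(80) yields 4000, 1600 and 2400
   packets of 40, 576 and 1500 bytes respectively (i.e. 50, 20 and 30
   times dataPcnt), and rt_traffic_bytes(80) yields 58520*20 bytes.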

   If n is the depth of the MPLS label stack, the actual size inside the
   switch of a
           40 bytes packet = 40+4*n bytes,
           576 bytes packet = 576+4*n bytes and
           1500 bytes packet = 1500+4*n bytes.

   Let frameSize be the payload of a frame (excluding the frame header)
   inside the switch. If an RT packet fits exactly inside frameSize,

        RT packet payload = (frameSize-40-4*n) bytes;

   Total overhead = packet header overhead (of RT packets) +
                    fragmentation overhead (of data packets);

   If overhead is plotted for frameSize = 40+4*n+1 to 1500+4*n for
   different values of dataPcnt (dataPcnt = 80 to 100), minima of
   overhead are found at frameSize = (84, 101, 118, 126 and 152) for
   n==3, at frameSize = (119, 127 and 152) for n==4 and at frameSize =
   (118, 127 and 152) for n==5.
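
   The sweep over frameSize can be sketched as follows. The text does
   not spell out how each overhead term is counted, so this example
   makes two assumptions of its own: the header overhead of an RT
   packet is its 40+4*n header bytes, and the fragmentation overhead of
   a data packet is the unused space in its last frame. The function
   names are hypothetical, and the sketch is not claimed to reproduce
   the minima quoted above.

```python
# Share of data packets (in %) for each packet size in bytes, from the text.
DATA_MIX = {40: 50, 576: 20, 1500: 30}
RTP_HEADERS = 40   # 20 bytes IP + 8 bytes UDP + 12 bytes RTP
LABEL_BYTES = 4    # bytes per MPLS label, as used in the text


def total_overhead(frame_size, n, data_pcnt):
    """Overhead in bytes for totBytes = 58520 * 100."""
    labels = LABEL_BYTES * n
    rt_payload = frame_size - RTP_HEADERS - labels
    if rt_payload <= 0:
        raise ValueError("frame too small to carry any RT payload")

    # Header overhead (assumed): 40 + 4*n bytes per RT packet.
    rt_bytes = 58520 * (100 - data_pcnt)
    header_overhead = (rt_bytes / rt_payload) * (RTP_HEADERS + labels)

    # Fragmentation overhead (assumed): padding in the last frame.
    frag_overhead = 0
    for size, share in DATA_MIX.items():
        packets = share * data_pcnt
        in_switch = size + labels
        frames = -(-in_switch // frame_size)  # integer ceiling
        frag_overhead += packets * (frames * frame_size - in_switch)

    return header_overhead + frag_overhead


def best_frame_size(n, data_pcnt):
    """Frame size in the swept range with the smallest total overhead."""
    candidates = range(40 + 4 * n + 1, 1500 + 4 * n + 1)
    return min(candidates, key=lambda f: total_overhead(f, n, data_pcnt))
```

   Calling best_frame_size(n, dataPcnt) for n = 3, 4, 5 and dataPcnt
   from 80 to 100 gives one candidate per curve; a different definition
   of either overhead term will move the minima.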

   Actual IP traffic data has to be collected to get the best result.
   As dataPcnt increases, the minima are found at a lower frameSize,
   whereas the higher range of frameSize gives better results for lower
   dataPcnt. With average IP packet size 585 bytes, switches will
   encounter a loss of 4*(n-1) bytes for packets that will need only one
   label.

   In order to make this scheme work, a standard for the maximum label
   stack size has to be defined. The RTP packet size also has to be
   standardized. The same scheme is applicable to all switching systems
   where IP packets are transported on a hop by hop basis, unlike the
   way it works in ATM networks.

3.1. Dual mode operation

   Inside the ingress as well as the egress card, packets need to follow
   certain functional steps. In order to maximize throughput, a series
   of processing units work in pipeline mode for these operations.
   Ingress service cards need to act in dual mode to process RT packets
   and non-RT packets, i.e. RT packets should follow a direct path that
   does not need fragmentation and related complexities before they are
   sent to the VOQs (virtual output queues, from where packets are
   picked up to be sent to the switching fabric), whereas other packets
   need to follow a different path for fragmentation operations. This
   prevents an RT packet from being blocked by the fragmentation of
   non-RT packets that arrived in the service card before it. So, merely
   matching the RT packet size to the frameSize of the switch fabric
   will not achieve the speed of ATM switches.
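
   The dual mode behaviour can be illustrated with a small software
   model. It is only a sketch: a real ingress card does this in
   hardware pipelines, and the IngressCard class, its single VOQ and
   the is_rt flag are assumptions made for the example.

```python
from collections import deque


def segment(packet, frame_size):
    """Split a packet (bytes) into frames of at most frame_size bytes."""
    return [packet[i:i + frame_size] for i in range(0, len(packet), frame_size)]


class IngressCard:
    """Toy model: RT packets bypass the fragmentation path."""

    def __init__(self, frame_size):
        self.frame_size = frame_size
        self.voq = deque()          # frames ready for the switching fabric
        self.frag_queue = deque()   # non-RT packets waiting for segmentation

    def receive(self, packet, is_rt):
        if is_rt:
            if len(packet) > self.frame_size:
                raise ValueError("RT packet must fit in one frame")
            self.voq.append(packet)         # direct path, no fragmentation
        else:
            self.frag_queue.append(packet)  # slower fragmentation path

    def run_fragmenter_once(self):
        """Segment one waiting non-RT packet and queue its frames."""
        if self.frag_queue:
            self.voq.extend(segment(self.frag_queue.popleft(), self.frame_size))
```

   In this model an RT packet that arrives after a 1512-byte data
   packet still reaches the VOQ first, because the data packet is held
   in the fragmentation queue until run_fragmenter_once() is called.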

   Simulation studies show that significant improvement is achieved once
   RT packets are sent directly to the VOQs after label processing.  It
   would be worthwhile for hardware designers to study whether the
   entire set of data can be placed into queues based on priority, with
   segmentation done in each queue in parallel before the frames are put
   into their respective VOQs. The entire operation would be a lot
   costlier, but simulation results show that in such a case RT packets
   need not be restricted to fixed size cells. Standardization of label
   stack depth need not be imposed either.

4. Refinements over existing IPv6 specification

   As IPv6 was envisioned long before some of the newer technologies,
   e.g. MPLS, came into the picture, some refinements can be made to the
   existing specification. These considerations relate to bandwidth
   usage and performance inside switches. The previous section shows
   that a smaller packet size gives better results for the processing of
   RT packets. So, it is desirable to keep the IP packet header as small
   as possible.

   As described earlier, evaluation of the parameters
   nMaxInterASTopNodes, nMaxInterASBottomNodes and nMaxASNodes is
   geo-political and has to be decided by IANA. Once these parameters
   are determined by mutual agreement, the values of pA, pB, pC and the
   prefix length of the user id can be determined. If the total length
   comes out to be less than 128, the length of the IP header will be
   reduced accordingly.

   The 'flow label' field of the IPv6 packet header may not be of any
   use when MPLS is in use. ATM used to have 4 priority classes. The
   first specification of IPv6, RFC 1883, used a 4-bit priority field
   along with a 24-bit flow label field. These two were changed to an
   8-bit traffic class field and a 20-bit flow label field in the
   current specification, RFC 2460.  Too many priority classes may
   increase the complexity of processing inside switches. If the traffic
   class field of the IPv6 header is reduced to 4 bits, as it was in RFC
   1883, and the 'flow label' field is removed, another three bytes can
   be saved from the IPv6 header.
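
   The three byte figure follows from the field widths of the first
   32-bit word of the header. The sketch below repeats the count; the
   dictionary names are chosen for this example, and the proposed
   layout assumes that the 4-bit version field is kept unchanged.

```python
# First 32-bit word of the IPv6 header, field widths in bits.
RFC_1883 = {"version": 4, "priority": 4, "flow_label": 24}
RFC_2460 = {"version": 4, "traffic_class": 8, "flow_label": 20}
# Layout suggested in the text (assumed): 4-bit class, no flow label.
PROPOSED = {"version": 4, "traffic_class": 4}


def bits(layout):
    """Total width of a header layout in bits."""
    return sum(layout.values())


def bytes_saved(current, proposed):
    """Whole bytes saved when moving from one layout to another."""
    saved, remainder = divmod(bits(current) - bits(proposed), 8)
    if remainder:
        raise ValueError("saving is not a whole number of bytes")
    return saved
```

   bytes_saved(RFC_2460, PROPOSED) evaluates to 3, since the first word
   shrinks from 32 bits to 8 bits.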

   The 'Hop Limit' field has an 8-bit value in the existing
   specification. The role of this field needs to be discussed properly
   in the context of a large address space.

4.1. Distributed processing and Multicasting

   With the inherent hierarchy involved in this architecture,
   distributed applications can also be structured in a suitable manner.
   Say, for a commonly used web based application, there will be a
   master level server at every top level node. Any change to the
   application has to be synchronized among these master level servers
   first. There might be servers at the middle layer (inside each
   inter-AS-bottom) inside each top level node. Once the changes are
   reflected at the master node, all the servers at the middle layer
   need to update themselves from their master level node. This will
   reduce network traffic substantially. The inherent hierarchy in the
   architecture will also help in establishing multicast trees in a
   similar manner. Work on these issues can progress only after this
   architecture is approved.
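
   The two phase update described above can be sketched as an ordered
   list of transfers. The topology dictionary, the server names and the
   propagate() function are hypothetical; the sketch only shows the
   ordering (masters first, then middle layer servers from their own
   master) and does not model any real synchronization protocol.

```python
def propagate(topology, origin):
    """Return the ordered (sender, receiver) update steps.

    topology maps each master level server to the list of middle layer
    servers inside the same top level node; origin is the master where
    the change was first made.
    """
    steps = []
    # Phase 1: synchronize the change among the master level servers.
    for master in topology:
        if master != origin:
            steps.append((origin, master))
    # Phase 2: each middle layer server updates from its own master.
    for master, middle_servers in topology.items():
        for server in middle_servers:
            steps.append((master, server))
    return steps
```

   With topology = {"T1": ["T1.a", "T1.b"], "T2": ["T2.a"]} and origin
   "T1", only the single T1-to-T2 transfer crosses top level nodes; all
   remaining transfers stay inside one top level node.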

5. Expected changes at the application layer

   IP packets of size 576 in most cases come from TCP layers that do not
   determine the maximum path MTU and take the default value that dates
   from X.25. The 576 factor can be corrected very easily by setting the
   path MTU to 1500. Considering that label switched paths between two
   arbitrary network points do not change very frequently for any
   particular type of packet, most applications are expected to become
   UDP based with negative ACK. TCP in turn might go through changes.
   Once this comes into effect, the number of 40
   bytes packets will come down drastically. Switch fabric frame size
   needs to be determined keeping these two factors in mind along with
   changes in the IP packet header. With the existing 32-bit system,
   frame sizes (excluding the frame header) of 152 and 127 are the most
   viable solutions in general for label stack depths of 3, 4 and 5.

6. IANA Considerations

   This is a first level draft for a proposed standard. Hence, IANA
   actions should come into play at a later stage, if needed.

7. Security Considerations

   This document does not include any security related issues.

8. Acknowledgments

   The author would like to thank Professor Amitava Datta of the
   University of Western Australia for his review and constructive
   comments.

9. Normative References

   [1]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for
        IPv6 Hosts and Routers", RFC 4213, October 2005.

   [2]  Fuller, V. and T. Li, "Classless Inter-Domain Routing (CIDR):
        The Internet Address Assignment and Aggregation Plan", RFC 4632,
        August 2006.

   [3]  Huston, G., "Commentary on Inter-Domain Routing in the
        Internet", RFC 3221, December 2001.

   [4]  Vohra, Q. and E. Chen, "BGP Support for Four-octet AS Number
        Space", RFC 4893, May 2007.

   [5]  Srisuresh, P. and K. Egevang, "Traditional IP Network Address
        Translator (Traditional NAT)", RFC 3022, January 2001.

   [6]  Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
        "RTP: A Transport Protocol for Real-Time Applications", RFC
        3550, July 2003.

   [7]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
        Networks (VPNs)", RFC 4364, February 2006.

   [8]  Moy, J., "OSPF Standardization Report", RFC 2329, April 1998.

   [9]  Perkins, C., "IP Mobility Support for IPv4, Revised", RFC 5944,
        November 2010.

10. Informative References

   [10] Postel, J., "Internet Protocol", STD 5, RFC 791,
        September 1981.

   [11] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)",
        RFC 1771, March 1995.

   [12] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6)
        Specification", RFC 1883, December 1995.

   [13] Moy, J., "OSPF Version 2", STD 54, RFC 2328, April 1998.

   [14] Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6)
        Specification", RFC 2460, December 1998.

   [15] Rosen, E., Viswanathan, A. and R. Callon, "Multiprotocol
        Label Switching Architecture", RFC 3031, January 2001.


11. Author's Address
Shyam Bandyopadhyay
HL No 205/157/7, Inda
Kharagpur 721305
India

Phone: +91 3222 225137
e-mail: shyamb66@gmail.com


Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.