Network Working Group                                          A. Dalela
Internet Draft                                             Cisco Systems
Intended status: Informational                                 M. Hammer
Expires: July 2012                                       January 4, 2012

           Service Orchestration Protocol (SOP) Requirements
                   draft-dalela-orchestration-00.txt

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

This Internet-Draft will expire on July 4, 2012.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Dalela Expires July 4, 2012 [Page 1]
Internet-Draft SOP Requirements January 2012

Abstract

Cloud services need to interoperate across cloud providers, service vendors and private/public domains. To enable this interoperability, there is a need for a standard protocol for exchanging service information. This draft describes requirements for such a protocol.

Current cloud implementations expose application level APIs, which are not syntactically and semantically compatible with each other.
One approach to making cloud services interoperate is to standardize the protocol while leaving the API definition implementation specific. Standard protocols have been used widely in the Internet and can be extended to cloud. Use of such protocols is compatible with existing cloud APIs, which can exchange information in a standard protocol. New APIs may also be developed using a standard protocol. In this way, diverse APIs could interoperate across service providers, service vendors and service users.

Table of Contents

1. Introduction
2. Conventions used in this document
3. Terms and Acronyms
4. Interoperability Scenarios
5. Cloud Open Source and Open Standards
6. Is Cloud Control an Internet Problem?
7. Overview of Standard Work
8. Deficiencies of Current Models
   8.1. Service Discovery
   8.2. Service Publishing
   8.3. Persistent Identities
   8.4. Blocking Calls
   8.5. Transaction Support
   8.6. Interactive Behaviors
9. Extensibility Considerations
   9.1. Service-Independent Components
   9.2. Service-Dependent Components
10. Protocol Requirements
11. Separating Control and Policy Planes
12. Service Management Policies
   12.1. Routing Policies
   12.2. Security Policies
   12.3. Service Policies
13. Architecture Requirements
14. IANA Considerations
15. Conclusions
16. References
   16.1. Normative References
   16.2. Informative References
17. Acknowledgments

1. Introduction

Cloud computing has become important for the on-demand delivery of a variety of services, broadly called XaaS, such as Infrastructure, Platform and Software as a Service [NIST]. Users of such services may be individuals, enterprises, content providers, or other cloud providers. These users need to be able to request and manage services seamlessly across private, public, hybrid, or community clouds. Lack of interoperability across these domains will lead to new kinds of cloud silos, which will in turn hinder economies of scale.

Current cloud deployments use web-services (SOAP or REST) to deliver services over the Internet. Each provider exposes different APIs that generally do not interoperate, because each API has different syntax and semantics. To interoperate, we must either converge on one API format, or translate between them. Both alternatives are hard. API translations are difficult because APIs have different semantics. Converging to one API means current services may be broken. We want to maintain diverse APIs, while enabling interoperability.

Historically, in the Internet, different APIs have interoperated through the use of standard protocols. Basically, we separate the network view of information from the application view.
The network carries information via protocols while applications consume information via APIs. Web-services equate the network view of information with the application view. Basically, each API has its own packet format which is derived from the API, and changes to API syntax or semantics will change the packet format. This is at the root of interoperability issues.

As applications proliferate, each API will project its view of information into the network. As a result, there will be as many communications "protocols" as there are applications. This is contrary to the (unstated) assumption in the Internet that there are far fewer protocols than there are applications, so that many applications can communicate using the same protocols.

To remedy this problem we should separate the network and application views of information and design them independently. Applications may design APIs in many ways and two applications should communicate using a standard protocol whether or not they use the same API. This document describes requirements for such a standard protocol.

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [RFC2119].

In this document, these words will appear with that interpretation only when in ALL CAPS. Lower case uses of these words are not to be interpreted as carrying RFC-2119 significance.

3. Terms and Acronyms

+------------+          +--------------------------------------+
|  Customer  |<-------->|               Provider               |
| +--------+ |          | +----------------------------------+ |
| |  User  | |          | | Service=X +--------------------+ | |
| |        |**************|           |Product-1 (Vendor-A)| | |
| |        | |          | |           +--------------------+ | |
| |        | |          | +----------------------------------+ |
| |        | |          | +----------------------------------+ |
| |        |**************| Service=Y +--------------------+ | |
| |        | |          | |           |Product-2 (Vendor-A)| | |
| |        | |          | |           +--------------------+ | |
| |        | |          | |           +--------------------+ | |
| |        | |          | |           |Product-3 (Vendor-B)| | |
| |        | |          | |           +--------------------+ | |
| +--------+ |          | +----------------------------------+ |
+------------+          +--------------------------------------+

              Fig-1: Cloud Ecosystem and Relations

Provider: A Provider is a supplier of cloud services who offers these services to cloud Customers and Users, per some business agreement.

Service: Any virtual instance of a hardware or software product that can be owned by a Customer or User for their personal use.

Vendor: A Vendor is a hardware/software product vendor who provides the technology implementation of a service. In some cases, Providers and Vendors may be the same business entity.

Product: A unit of software or hardware that is sold by the Vendor to the Provider to be made available as a service.

Customer: A business entity that enters into an agreement with a Provider to source cloud services for their users. A customer would be an enterprise that buys cloud services. A customer would define policies for service and may authenticate its users. Customers may also be called Subscribers of a cloud provider's services.

User: A user is the end consumer of cloud services. Users belong to the customer and place service requests on the provider.
These requests are controlled by Customers who could be enterprises, individuals or other cloud providers who source services from one cloud and provide them to another.

Virtual Provider: A Provider who does not host or manage services, but redirects requests to other providers who do that. A Virtual Provider has customers but does not operate services.

Orchestration: This is the act of creating, modifying, moving or deleting services. It may involve one or more actions performed in sequence or in parallel. These actions could be invoked on hardware and software services, or even on other cloud providers.

Service Domain Name (SDN): This is a dot-separated notation to represent service names hierarchically. For example, a virtual machine can be represented as iaas.compute.virtual. Each SDN will be associated with a set of service specific attributes.

4. Interoperability Scenarios

The following interoperability scenarios should be covered by the protocol. We list them here because interoperability may mean different things depending on the context.

Scenario S-1. Users Interoperate Across Cloud Providers.

Users must be able to use cloud services from different cloud providers in the same way. This allows a user to move across providers, or source the same service in a different geography from a different provider. In Figure-2, a user accesses the compute service across various providers in the same way.
            |              |                 |               |
  Customer  |   Virtual    |   Provider 1    |  Provider 2   |
            |   Provider   |                 |               |
   +--------------------------------------S-1-------+
   |        |              |                 |      |        |
   |  +-----S-1-----+      |                 |      |
   |  |     |       |      |                 |      |        |
 +-------+  |  +-------+   |    +-------+    |    +--------+ |
 | Cloud |  |  | Cloud |--------| Cloud |    |    |  Cloud | |
 |Control|  |  |Control|   |    |Control|    |    | Control| |
 +-------+  |  +-------+   |    +-------+    |    +--------+ |
     |      |              |        |        |        |      |
 +-------+  |              |   +---------+   |   +---------+ |
 | User  |  |              |   | Compute |   |   | Compute | |
 +-------+  |              |   +---------+   |   +---------+ |
     *      |              |        *        |        *      |
     *      | service usage|        *        |        *      |
     **************************************************      |
            |              |                 |               |

   Fig-2: Scenario S-1 - User Interoperates Across Providers

Scenario S-2. Users Interoperate Between Private and Public Clouds.

Users should be able to interoperate private clouds with those in the provider domain. This could involve moving workloads between private and public clouds. It also means creating virtual services in the public cloud in the same way as in the private cloud. In Figure-3, a user creates compute in the private and public clouds in the same way. Private storage is accessed from public and private compute.
[Diagram: Customer, Virtual Provider, Provider 1 and Provider 2 columns, each with a Cloud Control element. S-2 links run from the Customer's Cloud Control to the other domains and to the Customer's private Compute. "move workload" arrows join the Compute instances in the Customer, Provider 1 and Provider 2 domains, and usage lines connect the User and the private Storage to those Compute instances.]

   Fig-3: Scenario S-2 - User Interoperates Across Private and Public

Scenario S-3. Providers Interoperate With Other Providers.

Providers should be able to interoperate their services with other providers. This could mean sourcing each other's services when demand suddenly grows, or using one vendor's services as backup or for disaster recovery during an outage. Providers might agree to host services in each other's clouds in a follow-the-sun model, where workload moves between providers located in different geographies. There could be a "provider of providers" - a virtual provider that sources services across different providers by using interoperability. In Figure-4, a provider sources the storage service to complement their compute service, and offers compute and storage as a bundle to the user.
            |              |                 |               |
  Customer  |   Virtual    |   Provider 1    |  Provider 2   |
            |   Provider   |                 |               |
            |              |                 |               |
 +-------+  |  +-------+   |    +-------+    |    +--------+ |
 | Cloud |-----| Cloud |--S-3---| Cloud |---S-3---|  Cloud | |
 |Control|  |  |Control|   |    |Control|    |    | Control| |
 +-------+  |  +-------+   |    +-------+    |    +--------+ |
     |      |              |        |        |        |      |
     |      |              |        |        |        |      |
     |      |              |        |        |        |      |
 +-------+  |              |   +---------+   |   +---------+ |
 | User  |*********************| Compute |*******| Storage | |
 |       |  |              |   +---------+   |   +---------+ |
 +-------+  |              |                 |               |
            |              |                 |               |

   Fig-4: Scenario S-3 - Providers Interoperate Amongst Providers

Scenario S-4. Providers Interoperate Services Across Service Tiers.

A cloud provider may deliver many kinds of services, layered on top of one another. For instance, SaaS may use PaaS, which in turn may use IaaS, network and security services, etc. Since cloud providers build services incrementally, it should be possible to interoperate services across these tiers, without having to build a new IaaS system for every new PaaS, or a new PaaS for every SaaS. In Figure-5, a provider sources IaaS services for their PaaS from another provider in the same way as they source them internally.
            |              |                 |               |
  Customer  |   Virtual    |   Provider 1    |  Provider 2   |
            |   Provider   |                 |               |
            |              |                 |               |
 +-------+  |  +-------+   |    +-------+    |    +--------+ |
 | Cloud |-----| Cloud |--------| Cloud |---S-4---|  Cloud | |
 |Control|  |  |Control|   |    |Control|    |    | Control| |
 +-------+  |  +-------+   |    +-------+    |    +--------+ |
     |      |              |           |     |        |      |
     |      |              |       S-4 |     |    S-4 |      |
     |      |              |           |     |        |      |
 +-------+  |              | +------+  |     |     +------+  |
 | User  |*******************| SaaS |**************| IaaS |  |
 |       |  |              | +------+  |     |     +------+  |
 +-------+  |              |    *      |     |        *      |
            |              |    *     S-4    |        *      |
            |              | +------+  |     |        *      |
            |              | | PaaS |--+     |        *      |
            |              | +------+******************      |
            |              |    *      |     |               |
            |              |    *     S-4    |               |
            |              | +------+  |     |               |
            |              | | IaaS |--+     |               |
            |              | +------+        |               |

   Fig-5: Scenario S-4 - Providers Interoperate Across Tiers

Scenario S-5. Providers Interoperate Across Service Vendors.

A cloud provider may source a service from more than one vendor. Examples of these include compute virtualization, storage, network, security, etc. A customer's existing orchestration solution should be able to orchestrate multi-vendor products and services. In Figure-6, providers deliver a service using offerings from multiple vendors in the same way. These inter-vendor services may also be connected.
            |              |                 |                  |
  Customer  |   Virtual    |   Provider 1    |    Provider 2    |
            |   Provider   |                 |                  |
            |              |                 |                  |
 +-------+  |  +-------+   |    +-------+    |    +-------+     |
 | Cloud |-----| Cloud |--------| Cloud |---S-4---| Cloud |     |
 |Control|  |  |Control|   |    |Control|    |    |Control|     |
 +-------+  |  +-------+   |    +-------+    |    +-------+     |
     |      |              |        |        |        |         |
     |      |              |    /  S-5       |       S-5   \    |
     |      |              |   /    |        |        |     \   |
 +-------+  |              |  +  +-------+   |    +-------+  +  |
 | User  |********************|**|Compute|********|Compute|  |  |
 |       |  |              |  |  |Vendor1|** |  **|Vendor3|  |  |
 +-------+  |              |  |  +-------+ * |  * +-------+  |  |
            |              | S-5 +-------+ * |  * +-------+ S-5 |
            |              |  |  |Storage| * |  * |Storage|  |  |
            |              |  +--|Vendor2|** |  **|Vendor4|--+  |
            |              |     +-------+   |    +-------+     |

   Fig-6: Scenario S-5 - Providers Interoperate Across Service Vendors

The above scenarios are illustrative and non-exhaustive. There could be many permutations of the above scenarios. Standardization will benefit users, vendors and providers - the total cloud ecosystem.

5. Cloud Open Source and Open Standards

Some efforts towards cloud openness today are focused on Open Source implementations of cloud services. This leads to the question of the relation between Open Source and Open Standards, as different ways to achieve interoperability.

Obviously, cloud will not be totally open or closed source. The key problem in cloud is not the ability to inspect and modify code, which open source enables, but to integrate services, both open and closed. To integrate Open and Closed Source services, Open Standards are required, which may be implemented as Open Source. Open Standards do not detract from Open Source.

On the other hand, lack of open standards can make open source less attractive because there can be many open source implementations that are incompatible. Within an implementation, various versions may be incompatible.
This means that Open Source alone cannot solve problems of interoperability unless everyone converges to a common code base and contributes their private changes back into the common base. This is impossible to mandate and unlikely to happen. Open Standard implementations, on the other hand, will be interoperable even when implementations are enhanced in different ways. So Open Standards enhance rather than detract from the benefits of Open Source.

The key problem for cloud is service integration across vendors, providers and customers. These services will be Open Source, Closed Source, or multiple variations of Open Source. Integrating the variety of services is best done through Open Standards.

6. Is Cloud Control an Internet Problem?

Given that problems of cloud interoperability need to be addressed through standards, it may not be obvious that they need to be addressed by IETF. Why is cloud control an IETF problem?

First, to create, modify or move a distributed system, orchestrators need to know network topology. For instance, if firewall rules have to be installed for a VM, they must be installed on a device that lies in the "path" to the VM. To know which firewall lies on the path to a given VM, topology needs to be known. Similarly, if bandwidth needs to be provisioned between two sites, it is necessary to know which routers are at the edge of the two sites so that bandwidth can be provisioned between those routers.

Likewise, if a VM is moved from one location to another, all associated network port configuration (such as VLAN or policy) needs to be dragged along with the VM. That requires the orchestrator to know which port the VM was attached to, and where it is going to move next. In some cases, the VLAN and policy may need to be provisioned not just on the access ports but also on the trunk ports to permit the packet flow. That requires the orchestrator to know which access ports are mapped to which trunks.
To ensure that performance of a VM does not degrade after a move, it may be necessary to determine whether sufficient bandwidth is available at the destination location before the move is made. That requires knowledge of the paths that will be used and whether those paths are congested. An orchestrator may need to assess the "distance" between the compute and network storage and between the user's location and the service's location for optimal performance.

There are also cases when knowledge of topology is needed for network optimization. For example, the network paths may not be optimal after a VM move, and the paths may need to be re-provisioned. Such things are common with multicast and broadcast traffic that uses trees. During outages, network topologies are dynamically reconfigured. Recovery procedures must be aware of this network reconfiguration.

The above examples illustrate a close relation between network information and orchestration of services. These two are currently treated as separate domains, and they need to be linked.

Second, cloud service discovery is about knowing the capability of devices in the Internet. Today, IP routing allows us to discover the location of IP addresses, but not their capabilities. For instance, the same IP address can belong to a PC, a router, a storage array, an IP-TV, a mobile phone, etc. Network protocols don't tell us the "semantics" of the IP address, namely what that IP address can "do". This of course is not a new problem, but cloud makes this problem very important.

Cloud is about the ability to know which capabilities are available where in the network. This would be achieved if some protocol advertises capabilities of IP addresses. Ideally, the systems that advertise addresses and those that advertise capability should be linked because the capability is of the address. To reach that capability, we need to translate it into an address.
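
As a minimal sketch of the link between capability and address described above, the following Python keeps a table of advertised capabilities keyed by hierarchical service names of the kind defined in Section 3, and translates a requested capability into the addresses advertised under it. The class and method names are hypothetical, the addresses come from the 192.0.2.0/24 documentation range, and nothing here is part of any defined protocol.

```python
from collections import defaultdict


class CapabilityTable:
    """Toy table linking advertised capabilities to addresses (illustrative only)."""

    def __init__(self):
        # capability name -> set of addresses that advertised it
        self._by_name = defaultdict(set)

    def advertise(self, name, address):
        """Record that `address` offers the capability called `name`."""
        self._by_name[name].add(address)

    def resolve(self, requested):
        """Return sorted addresses advertised at or below the requested name."""
        prefix = requested + "."
        found = set()
        for name, addresses in self._by_name.items():
            # Match the exact name or any name nested under it.
            if name == requested or name.startswith(prefix):
                found |= addresses
        return sorted(found)


table = CapabilityTable()
table.advertise("iaas.compute.virtual", "192.0.2.10")
table.advertise("iaas.compute.virtual", "192.0.2.11")
table.advertise("iaas.network.services.firewall", "192.0.2.20")

print(table.resolve("iaas.compute"))
print(table.resolve("iaas.network.services.firewall"))
```

Matching on whole name components (note the trailing dot in the prefix) keeps a request for "iaas.comp" from accidentally matching "iaas.compute".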
When a service is yet to be created, it needs to be referred to by its capability because the DN or IP for that service is yet to be created. This capability can be advertised by some service orchestrator that can create the service based on a request.

In the Internet, a service naming mechanism is needed to advertise and request services by their "type" instead of DN or IP (DN and IP are useful for advertising and requesting services that exist). These names can have a dot-separated structure similar to DNS names or IP addresses but need to belong to a separate address space. We can call these "type" names Service Domain Names or SDNs.

Third, a cloud user may not care about the IP or DN of a service. What users care about is the "type" of service they are looking for. This service may be fulfilled anywhere in the network. The user will issue a request referencing the SDN, and would expect the request to be automatically routed to its correct destination. This is possible if SDNs have been advertised in the network. A user can forward a request to a service-aware router, and the router will map the request to its destination.

Mappings between service types and addresses can be done at the edges of the Internet, allowing users to be unaware of IP addresses and the Internet to be unaware of services. A variety of policy controls can be built at the network edges to determine how a service "prefix" is mapped into an IP "prefix".

The problem of routing based on "types" is similar to routing based on IP addresses. In both cases, addresses need to be discovered, aggregated by some meaningful prefix, and advertised to routers upstream. These similarities imply that service routing can be implemented in ways similar to Internet routing in the past.

Fourth, thus far the link between capability and address has been made for services that are already created, generally within an administrative domain.
For instance, it is possible to use DNS to discover the address of a printer or email server. Cloud deals with creation of services on-demand. This discovery over the Internet needs somewhat different abilities, such as policy control, routing, billing of services, authentication, security from denial of service, SLA announcements, etc. There is a greater amount of complexity in advertising service information, publishing service interest, policies to control per-user services, etc. However, these issues are similar to things that have been done in IETF earlier.

In summary, orchestration needs to know network topology. The network can learn and advertise service capabilities like IP addresses. A mapping between addresses and capability is needed to perform service request routing. Such mappings have been created in the past, but just not to the extent required for cloud. The problem is both relevant for IETF and optimally solved within IETF.

7. Overview of Standard Work

To run the service exchange network over the current Internet, three important enhancements to the current schemes are proposed.

First, we need a service naming convention that addresses services by their "types" rather than by their DN or IP addresses. This naming system should also be hierarchical, in order to aggregate service types into "classes" of services. For instance, virtual machines may be referred to by the name iaas.compute.virtual and firewalls by the name iaas.network.services.firewall. Each class of service may be associated with one or more attributes, or may be further divided into sub-classes, or sub-sub-classes, with suitable names. We can refer to these names as Service Domain Names or SDNs.

Second, we need a protocol that advertises SDNs and routes service requests based upon these SDNs.
This protocol will facilitate service aggregation based on names, service discovery, advertisement, selective publishing and indication of service interest, besides mechanisms to route the request based on where it can be fulfilled. We can refer to this as a Service Routing System (SRS).

The SRS needs to map service "prefixes" into IP "prefixes" and will interact with a policy based control system, where users, customers or providers can define rules for routing requests to a destination. The SRS discovers services and their locations and provides the mapping between Service Names and IP or DN. Using this mapping it is possible to identify the service by its name as well as type. The SN, DN and IP names are orthogonal name spaces. That is, any SN may map to any DN, which may in turn map to any IP.

Third, there is a need for a common format to specify service attributes. This common format can be XML, and it is needed to define cross-service-domain orchestration rules. For example, in a L3 network, the IP of a host must belong to the subnet configured on the switch. The IP access-list on the switch must permit the IP address on the host. The ports open on a host must also be open on the firewall. The file systems accessible to a VLAN must align with the VLAN configured on the host access interface. The user-ids provisioned on the server must be available for authentication on the network storage. The speed of the virtual host interface must be equal to the bandwidth allowed to the host on the virtual or physical network interface. The virtual MAC allocated to a VM must not clash with any other virtual or physical MACs allocated anywhere else on the VLAN. The authentication system must use the tenant-id on the network in combination with the user-id on the host. These relations represent semantic "rules" of orchestration.
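
To illustrate how a few of the rules above could be checked once services are described in one common format, here is a minimal Python sketch. It assumes the common-format descriptions have already been parsed into plain dictionaries; the field names (ip, subnet, open_ports, mac, vlan_macs) and the sample values are hypothetical and are not taken from any schema.

```python
import ipaddress

# Hypothetical common-format descriptions of three services.
host = {"ip": "192.0.2.25", "open_ports": {22, 443}, "mac": "02:00:00:00:00:01"}
switch = {"subnet": "192.0.2.0/24", "vlan_macs": {"02:00:00:00:00:02"}}
firewall = {"open_ports": {22, 80, 443}}


def check_rules(host, switch, firewall):
    """Return a list of violated rules; an empty list means all rules hold."""
    violations = []
    # Rule: the IP of a host must belong to the subnet configured on the switch.
    if ipaddress.ip_address(host["ip"]) not in ipaddress.ip_network(switch["subnet"]):
        violations.append("host IP is outside the switch subnet")
    # Rule: the ports open on a host must also be open on the firewall.
    if not host["open_ports"] <= firewall["open_ports"]:
        violations.append("host port is not open on the firewall")
    # Rule: the virtual MAC must not clash with another MAC on the VLAN.
    if host["mac"] in switch["vlan_macs"]:
        violations.append("virtual MAC clashes with another MAC on the VLAN")
    return violations


print(check_rules(host, switch, firewall))
```

Each rule compares attributes from two different service domains, which is only straightforward here because all three descriptions share one representation.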
Today, we can't express these rules because information schemas across domains are incompatible. In effect, expressing them requires us to map some parameter in some CLI to some OID in another MIB, or some attribute in some XML schema to some TLV in another protocol, or the value of a resource in a GUI to a range specified via another API.

If all services are described in a common format (such as XML) then orchestration rules can be easily specified. This will allow rapid customization of services by defining orchestration rules in a high-level language rather than programming in a low-level language.

8. Deficiencies of Current Models

Cloud deployments today use HTTP web-services (SOAP and REST) to distribute service information and manage services. Web-services were designed for distributed application objects, where one object executes requests on other objects. This leads to the question of whether treating cloud orchestration as a distributed application object is the right approach to thinking about cloud services. In this section we will describe limitations of the web-service model. The web-service model is constrained by the capabilities of HTTP in service discovery, publishing and transaction management.

8.1. Service Discovery

HTTP was designed to connect clients to servers, but not designed for clients and servers to discover each other. HTTP assumes that client-server discovery happens through other mechanisms. The Universal Description Discovery and Integration (UDDI) web-service standard, for instance, defines registries where providers could publish their services, but this mechanism is manual and not widely used.

In the cloud network, operators require services to be automatically discovered and advertised to consumers. Dynamic service discovery is also needed because as services are allocated or de-allocated, capacity dynamically changes.
Manually detecting these changes would be nearly impossible for any large deployment.

HTTP does not have procedures by which a network of clients and servers can DISCOVER others and ADVERTISE their presence. HTTP allows a client to connect to a server after it has been discovered.

8.2. Service Publishing

With millions of possible services, users may rarely be interested in all such services. They may instead define selected types of service "interest" and expect to be "notified" when new services of interest are available.

HTTP does not support SUBSCRIBE and PUBLISH mechanisms by which a client can SUBSCRIBE to select interests and would be notified of new services through a PUBLISH. To know of the existence of new services, a Client must query a registry periodically. This makes service publishing a synchronous phenomenon and can be very hard to scale if millions of users query available services at regular intervals.

To scale service publishing, it is necessary to make publishing an asynchronous phenomenon. HTTP is not designed to deal with asynchronous publishing.

8.3. Persistent Identities

HTTP loses the identity of a client after a transaction (such as GET or POST) has been completed. This means that every new transaction has to be authenticated and may require a new key-exchange. When millions of service instances have to advertise their presence or publish capabilities periodically, it is imperative that the underlying control protocol can maintain identity information persistently across these multiple transactions.

For instance, in the Session Initiation Protocol [SIP], users REGISTER with a SIP Proxy, at which time they are authenticated. Subsequent session initiations don't require authentication. The identity established at the time of registration can be used across all transactions.
This mechanism can be very useful as a single sign-on capability because, after registering once, no other service requires the user to be authenticated again. The user can interact with all services by using the identity established during the registration. HTTP does not enable this because authentication is done by the server.

8.4. Blocking Calls

In a web-service call, a client blocks waiting for a response from the server. There is no mechanism for the client to time out on a request, or cancel the request midway. If the server fails to respond to the request, the client must separately terminate the connection. This is not ideal because the server may in fact be taking a longer period of time to fulfill the request. When requests are used to orchestrate complex services, a server needs to send provisional responses indicating that a "session is in progress".

When a service involves multiple independent but related components (such as network, storage and compute), failure in one component may render the entire service unusable. In such cases, it is necessary to cancel the request midway. HTTP blocks waiting for the server to respond and cannot cancel on-going transactions. The only mechanism to terminate the transaction midway is to close the HTTP connection, which can then result in leaked resources or incomplete actions.

8.5. Transaction Support

Complex orchestration scenarios need to treat multiple operations as a single atomic "transaction". For instance, an orchestration request may allocate compute, storage, network and security resources in a single request. Unless all of these operations have succeeded, the resulting service is not useful and must be cancelled as a whole. If all operations have succeeded, then they must be committed as a whole. Complex orchestrations thus need transaction support.

There are two ways to build this transaction support. First, each service can have its own transactions and cancelations.
Second, transactions can be available natively in the orchestration protocol. The first approach is very complex, and the preferred route is to have transaction support in the protocol.

HTTP does not have the ability to create transactions. An HTTP request-response is atomic and considered complete individually. One HTTP request-response is independent of prior or subsequent request-responses even to the same server, let alone another server. Orchestration requires the ability to correlate request-responses across multiple servers and commit or cancel them as a whole.

If an orchestrator that uses HTTP web-services fails after making a request, the client will believe that the transaction has failed, while the service nodes continue to allocate resources towards completion. The client cannot be billed for the service, although the services would be created. To address reliability issues, each service must build application level transactions, and these will rapidly grow as services are modified. A native mechanism at the protocol level is required to address this.

8.6. Interactive Behaviors

Incompatibilities between a cloud request and cloud policies, or partial failures in service orchestration, may require an orchestrator to prompt a user with questions and/or confirmation before proceeding. For example, if a VM has been allocated but the requested amount of network storage is not available, the orchestrator may need to prompt the user to allocate a reduced amount of storage.

Such interactive behaviors need to pause a transaction while waiting for a confirmation from a user. HTTP does not allow a server to make another client connection to ask this question during an on-going transaction. Also, if the question is passed as a provisional response to the user, a user's response would be treated as a new request.
HTTP has no scheme to tie a request to an earlier request, as all requests are independent.

9. Extensibility Considerations

One of the key issues in standardizing service orchestration is how this standard can be extended for service variety. To make the orchestration standard extensible to many services, we need to separate things that are service independent from those that are service dependent. Through this separation, it would be possible to extend a service protocol to transmit information about a variety of diverse services. This separation is described below.

9.1. Service-Independent Components

- Orchestration Verbs. Regardless of the kind of service that is being offered, there is a need for service Discovery, Creation, Modification, Deletion, Migration, etc. There is also a need for Confirming and Canceling requests midway through a transaction or indicating Successes and Failures upstream. Cloud involves many such useful "verbs" which are service independent. Whether we are creating a VM, VPN or Disk, the "CREATE" verb can be used to indicate the operation of service creation. This common "CREATE" can be used for a variety of create tasks, and its meaning can depend on the receiver. Defining the verb once eliminates the need to redefine the same operation for each new service. A collection of such verbs can be standardized for any service to use.

- Transaction Nouns. To construct orchestration message transactions, there is a need to address messages to destinations and identify their source, match requests with responses, bundle multiple such messages into a single complex exchange, sequence requests in the correct order with sequence numbers, have message fields to identify type of content and content lengths, common procedures for challenge and authentication of requestors, and many other such transaction level functions.
Like orchestration verbs, these are service independent and can be standardized without limiting service diversity and flexibility.

- Workflow and Task Language. Different users will request different combinations of services. One user might request a VM with only an IP address, but another user may also require storage allocation, bandwidth reservation, a secure firewall and a VPN to be set up automatically when a VM is allocated. To accommodate the variety of service requests, a generic mechanism to define Workflows is required. A Workflow identifies a set of tasks to be performed for service orchestration. Users or providers may define Workflows at various levels of abstraction. Hence, it is important to distinguish Workflows from actual Tasks. A Workflow might equal one Task, or it might comprise several Tasks bundled as a single request. A service independent language to describe Tasks and Workflows is needed. A User should be able to refer to Workflows and Tasks using unique identifiers.

- Service Domain Names. To name services, a classification scheme is required. Classification allows us to combine attributes across similar types of services. We can take an object oriented approach for defining service domains. For example, "network" can be a root domain, "switching", "routing" and "network-services" can be child domains of the root "network" domain, "security" and "packet inspection" can be child domains of the "network-services" domain, etc. Child domains may inherit properties of the parent domain. A child domain may override the parent domain's attributes by redefining them in the child domain. Once domain naming is well understood, service Proxies only need to advertise domains, with references to well-understood domain schemas. Users who request services will know what they are requesting based on the domain name of the service. They will also know each domain's attributes. This abstracts a service implementation from the service user.
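The object oriented domain naming described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not defined by this draft: the ServiceDomain class, the dot-separated qualified names, and the "billing-unit" and "sla" attributes are all hypothetical. The sketch only shows a child domain inheriting its parent's attributes and overriding one of them.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ServiceDomain:
    """A node in a hypothetical service domain hierarchy (not defined by the draft)."""
    name: str
    parent: Optional["ServiceDomain"] = None
    attributes: Dict[str, str] = field(default_factory=dict)

    def qualified_name(self) -> str:
        """Join domain names from the root down; the dot separator is an assumption."""
        if self.parent is None:
            return self.name
        return f"{self.parent.qualified_name()}.{self.name}"

    def resolve(self) -> Dict[str, str]:
        """Return inherited attributes, with this domain's own values applied last."""
        inherited = self.parent.resolve() if self.parent else {}
        return {**inherited, **self.attributes}


# "network" is the root; "security" is a child of "network-services".
network = ServiceDomain(
    "network", attributes={"billing-unit": "per-hour", "sla": "standard"}
)
network_services = ServiceDomain("network-services", parent=network)
security = ServiceDomain(
    "security", parent=network_services, attributes={"sla": "premium"}
)
```

Here security.resolve() carries "billing-unit" down from the root unchanged, while the "sla" value is the one redefined in the child domain. A Proxy in this sketch would only need to advertise the qualified name, since the attributes follow from the domain schema.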
9.2. Service-Dependent Components

- Service Domain Parameters. Each service domain can have its own service specific parameters. They can reuse existing parameters by inheriting an existing domain. Domain parameters are inputs into a request, and effectively can be used like parameters being passed into APIs. Each domain may be associated with its own schema so that an orchestrator that does not understand a domain can still validate the request before forwarding it. The parameters of a domain can be defined in a sufficiently generalized way to apply to a wide variety of services in that domain.

- Vendor Specific Domains. Some services might not be standardized through well-defined domain definitions. These definitions cannot be understood by all clients or users. They may however be understood between select network end-points that choose to use such definitions. Using Vendor Specific Domains, experimental or customized domains may be defined.

10. Protocol Requirements

A protocol that supports service variety must separate service-independent and service-dependent parts of information. The service-dependent and service-independent information may be carried in the same message. This section describes needed capabilities for various service-independent and service-dependent functions.

P-1. N-way transactions - an orchestration controller will need to perform multi-domain (e.g. storage, compute, network, etc.) service operations. The protocol should be able to stitch these varieties of service domains into a single context. All transactions in the client-server model are 2-way, so this needs a new protocol.

P-2. It should be possible to sequence and parallelize messages within a single context. Sequencing or parallelization would depend on the specific needs of a particular kind of service.
For instance, compute and network services may be provisioned in parallel, while workload movement across geographical regions must take place sequentially. Accordingly, the responses to such requests may also be received in sequential or parallel fashion.

P-3. When using requests in a parallel or sequential fashion, it should be possible to "commit" these operations as a whole. If errors are encountered in any one of the transactions, it should be possible to "cancel" the entire service context as a whole.

P-4. For reliability, the protocol should support timers and timeouts on requests. These timers may be used to expect a response to a request within the specified timeframe. When the timer expires, recovery actions should be possible. This is also useful in case of network failures, where on-going transactions can be automatically reversed. Through use of timers and automated reversal, failures would not result in leaked resources, incorrect accounting, etc.

P-5. The protocol should support explicit mechanisms to advertise services and discover other service agents in a network. That is, configuration of service agents should be minimized and the protocol should facilitate automated discovery and advertisement.

P-6. The protocol should support selective propagation of service information through use of publish-subscribe mechanisms. It should be possible for a client to request specific kinds of service information that it supports and expects to know about.

P-7. It should be possible to define workflows and tasks at various levels of abstraction. Some users will prefer abstract requests that are translated to concrete requests at some point before fulfillment. Others may prefer to define every service parameter themselves. The protocol must be able to support both these cases.

P-8.
The protocol must support the CRUD (Create, Read, Update and Delete) operations to transact services, after discovery of agents and selective service exchange. These operations are part of HTTP and should be present in the new protocol as well.

P-9. It should be possible to refer to services using standard names. Use of standard names establishes convention on how services will be referred to, which in turn facilitates interoperable service publishing, advertising, discovery and requests.

P-10. It should be possible to associate each service name with service-specific properties. These properties may be mandatory or optional. It should be possible to re-use these properties by inheriting a service name into another service name.

11. Separating Control and Policy Planes

Each service may be customized according to a variety of needs such as customer profile, user roles, location awareness, service design, SLAs, etc. The set of rules that are used to customize a service represents the "policy plane", as these rules specify how a service must be designed. This policy must interact with the protocol messages ("control plane") to control service orchestration.

There are two broad approaches in which policy and control can interact. First, we might collapse the difference between control and policy, and just have a single plane that is designed for specific services. Second, we might separate control and policy planes, and allow independent evolution of policy and control planes. These options and their relative merits are discussed below.

In many orchestration schemes, the policy and control planes are collapsed into one. The orchestrator is designed and pre-programmed to automate a few types of services. This scheme works well if the desired service variety is small. Basically, for a small number of service types, a few service templates can be hardcoded and published to users.
Users may choose from amongst available service templates to create services on-demand. A service template defines a set of business rules using which services would be created, deleted, modified or moved. If pre-defined rules meet the requirements of users, this is a huge simplification over manual service creation, and a good starting point for service automation.

+----------+                                       +----------+
|  Policy  |                                       |  Policy  |
+----------+                                       +----------+
     |                                                  |
     |                                                  |
+----------+     +----------+     +----------+     +----------+
|  Client  |<--->|  Server  |     |  Client  |<--->|  Server  |
+----------+     +----------+     +----------+     +----------+
Option (a) Policy at Clients:     Option (b) Policy at Servers:
   Client Mgmt Complexity            Server Mgmt Complexity

+----------+            +----------+           +----------+
|  Client  |<-----------|  Policy  |---------->|  Server  |
+----------+            +----------+           +----------+
    Option (c) In-Band Policies - Complexity Centralized

             Figure-7 Policy Deployment Models

However, as the service variety grows, this approach cannot scale because the number of orchestrators will increase linearly with the number of service types, and the complexity in each orchestrator will increase exponentially with customization of business rules. It then becomes necessary to separate definition of business rules ("policy") from execution of rules ("control"). Interoperable control requires a protocol, and interoperable policy requires an abstract high-level language to define orchestration rules. If the language of rules and the protocol have been separated and standardized, then the hurdles to deploying new services are significantly reduced.

There are still multiple policy deployment options where policy is deployed at different points in the network, and these options can make important differences to the ease of service management. Different policy deployment options are shown in Figure-7.
First, policy may be attached to the user, such that users tune their personalized policies about services. Second, policy may be attached to each service, and the hardware-software vendor must provide a configurable system for controlling the policy of each service, which the provider will have to customize to suit the needs of their deployment. Third, policy may be attached to the orchestrator, where it may be defined either by provider or customer or jointly.

The key difference between these options is who controls the service. Client-based policies are totally in control of clients. Server-based policies are in provider control, but require the provider to individually manage policies on each service instance. When services are created dynamically, these service instances may have to download policies dynamically and refresh them when policies change. Dynamic changes to policies may disrupt existing services unless each server has the intelligence to process policy rules per request.

If common policies have to be implemented across a set of clients, then these clients must be updated with the new policy rules. There must also be intelligence in client or server to deal with policy inconsistencies across clients and servers. All this entails a significant amount of complexity in implementing and managing services.

Orchestrator-based policies in contrast are easy to manage because they can be controlled at a few network points. When policies change, the client and server don't have to be updated because policies are enforced at run time. Orchestrator policies can also be controlled either by provider or customer or jointly. It is architecturally important to place this control at the right point in the network to facilitate the best control scenarios. Orchestrator-based policy control is thus more flexible and easier to manage than the alternatives.

When policies are attached to orchestrators, clients and servers remain unaware of policy.
Policy is now enforced at a small number of customer and provider edges. While the total number of policy rules remains unchanged, the complexity in managing these rules is reduced by centralizing the intelligence to define and apply policies. Challenges related to policy consistency are also addressed.

To apply these policies, client requests must be intercepted, policy transformed and policy routed before they reach the server. The clients and servers don't need to be aware of this behavior. The rules for controlling service requests can be defined through configuration in a policy server. An orchestrator can then download policy rules for a service, and execute those rules in real-time.

The separation of the control and policy planes allows the same control plane to be re-used for a variety of policies. Policies can be defined through configuration instead of being programmed in the orchestrator, and a common control plane can be used to orchestrate a variety of services.

Through this separation, a service orchestrator becomes a "Programmable Orchestrator", because it does not hardcode service logic. Rather, orchestrators can be "programmed" through policies defined by users in a user-friendly language. This approach eases service creation and customization of existing services while reducing overall management complexity.

12. Service Management Policies

This section describes different types of policies that might be used in cloud services. A few of these policies are being employed in the industry today, while many of them are desired features of cloud services in future. The totality of these policy types creates a level of complexity that cannot be deployed by embedding policy in client or server. These policies should exist in a separate policy plane that interacts with the control plane.

12.1.
Routing Policies

A service may be sourced from multiple destinations, and to route a request to the correct destination, various types of routing policies may be applied. For example, a service request may be routed to the geographically nearest provider. Or, it might be routed to a location that offers the cheapest service rate, or to a different location based on time of day. There might be routing rules based on SLAs. Each user's request may be routed differently based on their roles. There could be rules specific to a type of service, or routing may be determined by the locations that have the necessary capacity. Routing may be determined by legal or governmental regulations.

These rules may be dynamically changed, and different rules may apply to different types of services, users, locations, roles, etc. The provider and customer may independently or jointly define these policies, and enforce them at customer edge, provider edge, or both.

12.2. Security Policies

Security in the context of services encompasses a broad spectrum of issues spanning authentication, authorization and accounting (AAA). For instance, a customer may authenticate its users based on internal user-databases, while a provider owns the authorization and accounting of the service request. Or, a customer may own user-specific authorization and authentication while the provider owns the accounting. As users join or leave a customer, the provider may not own user-specific authentication and policies.

The AAA functions are best performed at the provider or customer edges. First, each service should not be required to do AAA; it is inefficient and complex. Second, service nodes must be protected from DoS attacks by preventing unauthorized requests from entering the network. Third, services may only be accounted as a bundle (e.g. network, compute and storage form a single usable service bundle) and not individually.
Fourth, request logging for business analytics is best done at the network edges and not in individual services.

A provider may also wish to hide network topology of services, and may abstract locations from user-visibility. For instance, a provider may publish one interface to access all services although these services are orchestrated by service-specific orchestrators, and these orchestrators may be situated in different locations.

12.3. Service Policies

Complex services require coordination of multiple resources. A VM for instance may need network attached storage, network based security and network quality of service. The VM service may be regarded as incomplete without the combination of all services. But much of this is a matter of policy. Some VMs may require network attached storage, while others don't. Some VMs may need firewalls, while others may just need encryption of data. Some services may need a specific amount of network bandwidth to be available.

Policies associated with services can be abstracted from clients and servers. Accordingly, when a client requests a VM, the request may be modified to include storage, security and quality of service requests before it reaches the server. Likewise, if a user is not authorized to request high-end services, their requests might be automatically downgraded to the appropriate grade of service. This is a function of policies that a provider and customer define.

This means that an AllocateVM request may do different things for different classes of users. Users may be upgraded or downgraded in the level of services, while using the same AllocateVM request. This means that the syntax and semantics of a request are not fixed in advance. Rather, they are determined based on context, and different factors may be used to modify these requests in transit.
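The in-transit modification of an AllocateVM request can be sketched as follows. This is a minimal illustration under stated assumptions, not a mechanism defined by this draft: the POLICIES table, the "standard" and "premium" user classes, the grade names and the request fields other than the AllocateVM operation name are all hypothetical. The sketch shows the same client request being expanded and, where necessary, downgraded before it reaches the server.

```python
from typing import Any, Dict

# Hypothetical per-class policy: components to add to a request and the
# highest grade of service that the user class is allowed to receive.
POLICIES: Dict[str, Dict[str, Any]] = {
    "standard": {"add": {"storage_gb": 50}, "max_grade": "silver"},
    "premium": {"add": {"storage_gb": 500, "firewall": True}, "max_grade": "gold"},
}
GRADES = ["bronze", "silver", "gold"]  # ordered from lowest to highest


def apply_policy(request: Dict[str, Any], user_class: str) -> Dict[str, Any]:
    """Return the request as the server would see it after policy is applied."""
    policy = POLICIES[user_class]
    shaped = dict(request)  # copy, so the client's original request is untouched
    # Add components that policy requires but the client did not ask for.
    for key, value in policy["add"].items():
        shaped.setdefault(key, value)
    # Downgrade the grade of service if it exceeds what this class may request.
    requested = shaped.get("grade", policy["max_grade"])
    if GRADES.index(requested) > GRADES.index(policy["max_grade"]):
        requested = policy["max_grade"]
    shaped["grade"] = requested
    return shaped


# The same client request yields different server-side requests per user class.
client_request = {"operation": "AllocateVM", "grade": "gold"}
standard_view = apply_policy(client_request, "standard")
premium_view = apply_policy(client_request, "premium")
```

In this sketch the "standard" user's request arrives at the server with storage added and the grade lowered to "silver", while the "premium" user's request keeps the "gold" grade and gains storage and a firewall. Neither the client nor the server contains the policy table; it would live at the orchestrator.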
It is important to restrict the syntax and semantics of a request from an end-user perspective. It is also important to offload this restriction from the service itself. Thus, a server should be able to support a superset of request parameters, to allow any user to access the service in different ways. But each client may only request a well-defined subset of those parameters, based on prior customer or provider defined policies or SLAs. The validation and tweaking of request parameters in a user-specific manner should be controlled by policy in transit.

In effect, the requests that a client makes and the requests that a server receives can be very different based upon the policies that modify the request in the middle.

13. Architecture Requirements

The general principle embedded in the following requirements is sub-system re-use by identifying common requirements and avoiding duplication for every new service (XaaS) needing to be deployed.

A-1. To ease the creation of varied services, there SHOULD be a separation between policy and protocol. Policy MUST deal with abstract rules about which components make up a service, and how those individual components must be created, deleted, modified or moved. Protocol MUST deal with the execution of these rules.

A-2. The interaction between policy and protocol SHOULD take place at the service orchestrators. Embedding this interaction in the client and server increases complexity and makes it harder to deploy new services or customize existing ones.

A-3. The Policy control MUST contain rules for service Authorization and Accounting. That is, it must have rules about which users are allowed to access which services, how services are customized for users, and the user-specific charging rules to be applied.

A-4. Orchestration MUST be able to use the same Identity Management infrastructure for all services. Authentication should be performed by a coherent system across current and new applications.
That is, each new service should not require new sets of mechanisms. Rather, existing support systems should be extensible. Note that this may also span both provisioning and use of any particular service.

A-5. Orchestration MUST be able to utilize the same Accounting system across multiple services. New accounting systems should not be required for each service. Rather, the orchestrator MUST be able to use the same accounting system to create charging records.

A-6. Orchestration MUST be able to integrate with existing Fault management systems. Orchestrators MAY offload and/or automate intelligence to recover from failures.

A-7. Orchestration MUST be able to integrate with existing Performance management systems. Orchestrators MAY offload and/or automate intelligence to recover from performance issues.

A-8. Orchestration MUST be able to use common Operational Support Systems (OSS) such as DNS, DHCP and BOOTP systems.

A-9. Orchestration MUST be able to integrate with existing Business Support Systems (BSS) for customer support, billing and/or provisioning of new customers. This is to enable a single customer interface for all services.

14. IANA Considerations

Not applicable.

15. Conclusions

Interoperable ways of creating, delivering and consuming services are essential for cloud. To create this interoperability, there is a need for an open standard protocol for exchanging service information. This document captures the requirements for such a protocol. We envision that such a protocol can be an essential ingredient of Cloud Controllers / Proxies to exchange services across multiple private, public, hosted, community and other clouds.

16. References

16.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

16.2.
Informative References

[NIST] DRAFT Cloud Computing Synopsis and Recommendations, http://csrc.nist.gov/publications/drafts/800-146/Draft-NIST-SP800-146.pdf

[SIP] Session Initiation Protocol, http://www.ietf.org/rfc/rfc3261.txt

17. Acknowledgments

This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

Ashish Dalela
Cisco Systems
Cessna Business Park
Bangalore
India 560037
Email: adalela@cisco.com

Mike Hammer
Reston
Virginia
USA 20190
Email: mphmmr@gmail.com

Monique Morrow
Cisco Systems [Switzerland] GmbH
Richistrasse 7
CH-8304 Wallisellen
Switzerland
Email: mmorrow@cisco.com

Peter Tomsu
Cisco Systems Austria GmbH
30 Floor, Millennium Tower
Handelskai 94-96
A-1200 Vienna
Austria
Email: ptomsu@cisco.com