Philip Kershaw

Saturday 26 January 2013

Helix Nebula, the Science Cloud - Results and User Engagement Meeting

A write-up from the notes I took at this meeting last week at ESRIN. This is my first closer look at this project (http://helix-nebula.eu/) after hearing bits and pieces along the way. It's interesting on a number of levels - application of science use cases to cloud, academic-industrial partnership, federating clouds, service management and the European landscape - policy, legal issues and politics.

Helix Nebula has big ambitions, with the aim to make it the first port of call for cloud provision for the research community by 2020: development as a multi-tenant open market place, the creation of an eco-system to facilitate growth for both science and industry. Science users are the initial target with use cases but this is fully expected to expand to a wider user base. One member of the consortium noted the increased attendance at meetings over the course of the project as an encouraging sign and there has been a growth in membership from 20 to 34 members over the past year. There are three large players from the science community: CERN, EMBL (European Molecular Biology Laboratory) and ESA. Collaboration is seen as an essential element to meet the goals, drawing from the various stakeholders: IT providers, SMEs, science and space organisations and EC support and projects. Some of the commercial companies represented: Atos, CloudSigma, T-Systems, Terradue, Logica. EGI represents cloud provision via the academic community.

I've heard the argument put before for a collective approach as the only means to develop something which is in any way comparable with the established big cloud providers. Here the case was put forward that drawing the collective resources of multiple providers together can build scale. A new one to me was the aspect of risk sharing. A federated approach has the potential to build something stronger since it has the advantage of diversity over a single cloud provider with comparable resource. There are also political aspects: EC compliance re. privacy, policy and ethics and the ability to meet these by hosting resources exclusively within the EU's boundaries.

Technical Architecture

The technical architecture has been developed by representatives from the supply side, a Tech-Arch group. This was a little concerning but then I found out that there are actually two groups, with a complementary Serve-Arch group so that the demand side is represented also. It was reassuring to hear that the user interests are represented in this way, but could there just have been one group, and how does it work in practice having the two? Hearing more about the requirements analysis, it sounded more holistic, with user stories created around a set of actors identified from both supply and demand side.

A service-enabling framework is the blueprint: a federated system of cloud providers, the so-called 'Blue-Box' (apparently so-called because when first put forward the relevant PowerPoint slides had a default blue scheme!). This is an interface between the many consumers and providers, harmonising the interfaces of the various cloud providers. This was all starting to sound very familiar coming from my experience with Contrail, and this was confirmed on 'opening' the Blue-Box. It revealed users at the 'north' end, an SOA to manage the various federation-level cloud capabilities in the middle and then below this, at the 'south' end, APIs to the various cloud provider interfaces. This system is being constructed via a roadmap involving three releases. This is building capability incrementally using reference technologies as building blocks and identifying low-hanging fruit from which to prioritise work. The federation layer is then a thin one. I can see major issues to tackle - for example management of SLAs across multiple providers - but the approach at least starts with what is possible.
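To make the north/middle/south layering a little more concrete, here is a minimal sketch in Python 3. It is my own illustration and not taken from the Helix Nebula architecture: the class names, the two providers and the start_vm call are all hypothetical. Each adapter hides one provider's native API at the 'south' end, and the federation object in the middle only routes requests coming from the 'north' end.

```python
from abc import ABC, abstractmethod


class ProviderAdapter(ABC):
    """South-end adapter hiding one provider's native API (hypothetical)."""

    name = "abstract"

    @abstractmethod
    def start_vm(self, image, cores):
        """Start a VM and return a provider-specific identifier."""


class ProviderA(ProviderAdapter):
    name = "provider-a"

    def start_vm(self, image, cores):
        # A real adapter would call the provider's own API here.
        return "a-%s-%dcore" % (image, cores)


class ProviderB(ProviderAdapter):
    name = "provider-b"

    def start_vm(self, image, cores):
        return "b:%s:%d" % (image, cores)


class Federation:
    """Thin middle layer: one call signature for every registered provider."""

    def __init__(self, adapters):
        self._adapters = {adapter.name: adapter for adapter in adapters}

    def providers(self):
        return sorted(self._adapters)

    def start_vm(self, provider, image, cores=1):
        # North-end users name a provider; the layer only routes the request.
        vm_id = self._adapters[provider].start_vm(image, cores)
        return {"provider": provider, "id": vm_id}


federation = Federation([ProviderA(), ProviderB()])
```

Keeping the middle layer this thin is what makes it buildable, but it also means anything a provider offers beyond the shared start_vm signature is invisible from the north end.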

I was encouraged to hear about this federated approach but coming to the meeting my feeling was that it's surely not in the interests of commercial cloud providers to harmonise their interfaces and make a federation with their other competitors possible. There's further inertia though: in creating a generic interface for a number of different cloud providers, there's a danger that the federation layer generalises or dilutes their individual capabilities. All cloud providers are not made equal and each may have specific capabilities which make them stand out and give them a particular market edge. At worst, a cloud provider might be being shoe-horned into a system that dilutes this or gives a vanilla or sub-optimal exposure of their functionality through the federation layer to the users sitting at the North-end.

This brings me to the input from EGI. I would think there is a greater potential to develop a federated system since it's strongly in the interest of the academic sector. Matteo Turilli (OeRC) gave an overview of current developments. Private clouds are being deployed within EGI with twenty providers from across Europe. A cloud profile has been developed as a standard and a testbed established aiming to build a federation across the private clouds represented in the community. The heritage of a production Grid is providing an infrastructure upon which to build. Solutions for federated identity management are already established with Shibboleth and GSI. I don't see support for such technology amongst commercial cloud providers. I suppose there is no incentive but it means that at the interface between federation layer and cloud provider, the best the federation can do is cache the user's credentials for each cloud provider and hope the user doesn't mind. To paraphrase from a social networking site, 'you can trust us with your password, we won't pass it on to anyone else'. ;)

OCCI 1.1 has been adopted for the cloud standard and various cloud solutions are in use: OpenNebula, OpenStack, WNoDes, Okeanos. IaaS is the initial service model being offered augmented with various services including capability for VM sharing. Demonstrations of the federated system are carried out every six months with 3 completed to date. How does this fit in with the Helix Nebula Blue-box? A strategy of a thin federation layer is being adopted to avoid the complex problems that will inevitably arise with deeper integration. There are problems of course. OCCI does not seem to be getting much traction in the commercial community and the perception seemed to be that it does not have the level of granularity in its interface to communicate more detailed user needs. jclouds was mooted as a possible alternative. It too exposes a REST-based interface.

Flagships

There were presentations from each of the three flagships: CERN, EMBL and ESA. I don't have space to go through these in depth here but the presentations are available on the Helix Nebula site. CERN's use case used results from the ATLAS experiment. Interesting that processing jobs had low i/o. Network bandwidth between providers and CERN was not an issue. ESA's case seemed to be an expansion of the SSEP (Supersites Exploitation Platform) looking at disaster response scenarios where processing needs to be very rapid. EMBL's large scale genome analysis was particularly impressive handling 100,000s of jobs auto-provisioned on 50 node clusters. GlusterFS was used for shared file system and StarCluster from MIT for processing. This manages provisioning of images and setting up of a cluster with capability for fluctuating workloads. It also integrates with Sun Grid Engine. EMBL also had a nice user interface.

It was revealing hearing some feedback on the supplier-side perspective to the flagships. The technology stacks in use by the cloud providers included:

StratusLab OpenNebula with KVM hypervisor
Zimory with VMware and vCloud
Abiquo with ESX and KVM hypervisors

Customers were presented with four different interfaces, re-emphasising the need for a federation layer. There's a need to find a means to share VMs between different vendor flavours. Other issues that came up: the differing accounting models between suppliers, the need for logic to handle workload assignment and scaling between clouds and, perhaps most interesting, that revenues were small for suppliers, who need to convince management and stakeholders of the longer term benefits and opportunities.

For future work, the Tech-Arch carried out some analysis of candidate technologies picking out SlipStream (Open Source, European) and enStratus (proprietary, US) from a list which also included OpenNebula and BonFIRE.

More Flagships are on the way so it will be interesting to follow.

Monday 3 September 2012

ndg_oauth for PyPI

After a long silence this is a catch up on things OAuth-related. Firstly, a follow up to announce that ndg_oauth has now finally moved to PyPI for its first 'official' release. There are separate client and server packages, the latter containing the functionality for both authorisation and resource server:

http://pypi.python.org/pypi/ndg-oauth-client/
http://pypi.python.org/pypi/ndg-oauth-server/

Before any more on that though I should really say something about the recent news with OAuth 2.0, something that colleagues nudged me about. It's disappointing but then maybe signs were there. When I first became familiar with the draft for 2.0, it looked like it would be easier to implement and it would meet a broader set of use cases. Two things appealed particularly: the simplification of the message flow and the use of transport-layer instead of message-level security. Having had experience wrestling with different implementations of WS-Security to get them to interoperate, and particularly XMLSec digital signature, it was a relief not to have to deal with that in this case. Also, transport layer security is established in the Grid security and federated identity worlds and so is a known quantity.

All the same, the reported problems with OAuth 1.0 and signatures seem a little surprising. Consistent canonicalisation of a HTTP header for OAuth 1.0 seems a much easier prospect vs. XML signature. It hardly seems such a difficult requirement to implement. Message level security also provides some protection for the redirects in use: securing at the transport layer will give me secure channels between user agent and authorisation server and user agent and OAuth client but the content can still be tampered with in between these interactions at the user agent itself.
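For a sense of how much canonicalisation OAuth 1.0 actually asks for, here is a minimal Python 3 sketch of the HMAC-SHA1 signature as I understand it from RFC 5849. It is an illustration only, not code from any of the implementations mentioned here; the function names are my own, and it simplifies by taking parameters as a dict, so repeated parameter names are not handled.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode leaving only unreserved characters untouched."""
    return quote(value, safe="~")


def signature_base_string(method, base_url, params):
    # Parameters are encoded, sorted and joined so every party builds the
    # same string regardless of the order in which they were supplied.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    normalised = "&".join("%s=%s" % pair for pair in pairs)
    return "&".join([method.upper(), pct(base_url), pct(normalised)])


def hmac_sha1_signature(base_string, consumer_secret, token_secret=""):
    # The key is the two encoded secrets joined with '&'.
    key = "%s&%s" % (pct(consumer_secret), pct(token_secret))
    digest = hmac.new(key.encode("ascii"), base_string.encode("ascii"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

The whole of the canonicalisation is the sort and the percent-encoding, which is a good deal less than XML signature demands.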

Our use of OAuth has definitely been taking it beyond the simple web-based use cases that I've seen for 1.0. When I say 'our' I'm thinking of its application for Contrail and some interest shown by others in the academic research community. In our case, we can exploit version 2.0 to address the multi-hop delegation required to fit together the various actors in the layered architecture for Contrail or it could be applied to manage delegation across multiple organisations in a federated infrastructure like ESGF. These cases need support for non-browser based clients and for actions executed asynchronously, removed from direct user interactions. These are in effect enterprise use cases for the academic communities involved.

As I looked over the draft specification with colleague Richard and he started an implementation, it became clear that there was an opening of possibilities to support many use cases, but with it a concern that when it came to an implementation there were gaps to fill in and choices to be made about how best to implement. It was clear from this that it would need careful profiling if it was to be of wider use in our communities, but this is familiar ground in geospatial informatics and I'm sure elsewhere too.

A consensus is badly needed but I think that's practicable and can be a process. By making an implementation, and working with other collaborators we are making the first steps towards standardising a version 2.0 profile for use with a Short-Lived Credential Service. This was the original motivating driver - to provide a means for delegation of user X.509 certificates. So returning to the ndg_oauth implementation, it aims to fit this purpose but at the same time provide a baseline of more generic functionality that others can extend and build on. More to follow! ...

Tuesday 24 April 2012

OAuth Service Discovery

This is picking up from where I left off in an earlier post about OAuth. To recap, this fronts a short-lived credential service with an OAuth 2.0 interface. This is now being integrated into Contrail, described in this paper just submitted following ISGC. There are a few more design considerations to give some thought to though for its wider deployment. As well as use for managing delegation around entities in a cloud federation with Contrail, we are applying this to CEDA's applications and web services. We have a full Python implementation of both client and server side for OAuth 2.0 which I hope to push out to PyPI soon. We're using this to secure our CEDA OGC Web Processing Service and Web Map Service. Here though, I want to focus on how we might integrate with the federated identity management system for the Earth System Grid Federation (ESGF). This is foremost given our involvement with CMIP5.

It's firstly a question of service discovery, something that need not be addressed for Contrail. The Contrail system consists of a federation layer which abstracts a set of underlying cloud provider interfaces. Users can manage resources from each of these collectively through the federation's interface. The access control system manages identity at the federation level. A user authenticates with a single federation IdP. When it comes to the OAuth delegation interface, there are single OAuth Authorisation server and Resource server instances associated with the IdP.

Now consider ESGF. There are many IdPs integrated into the federation, so how does the OAuth infrastructure map to this? Imagine, for example, a Web Processing Service hosted at CEDA. A user invokes a process. This process will access secured data from an OPeNDAP service. The WPS will therefore need a delegated credential. When the user requests the process, the WPS triggers the OAuth flow, redirecting their browser to an OAuth Server, but how does the WPS know where that service is? A simple approach would be to configure the WPS with an endpoint in its start up settings. This server could be:

1. A global one for the whole federation: clearly this won't scale given the deployment across many organisations
2. Associated with the given Service Provider, in this case CEDA
3. Associated with the user's IdP

2. could work but now consider the other steps in the flow: the user is prompted to authenticate with the OAuth server and approve the delegation i.e. allow the WPS to get a credential on their behalf. In a federated system, the user could be from anywhere, they will need to sign in with their IdP. In this case then, the user enters their OpenID at the OAuth Authorisation server. Already, this is sounding complicated and potentially confusing for the user: they've already been redirected away from the WPS to a central OAuth Authorisation server at CEDA, now they will be redirected again to their IdP probably somewhere else.

Also, consider the approval process. Rather than having to approve the delegation every single time the user invokes the WPS process, they would like the system to remember their approval decision. The OAuth authorisation server could record a set of user approvals associated with a profile stored there. However, if we manage approvals at CEDA, this information will only be scoped within the bounds of CEDA's services. If the user now goes to use services at another service provider, they will need to record a fresh set of delegation approvals. Not a smooth user experience.

Turning to 3., this could scale better. All approval decisions would be centralised with the OAuth Authorisation server associated with their IdP. However, there is now a service discovery problem. Looking at the protocol flow again, the WPS does not know the location of the user's OAuth Authorisation server endpoint. Given ESGF's already established use of OpenID, an obvious solution is to leverage Yadis. During OpenID sign in, the Relying Party HTTP GETs the user's OpenID URL returning an XRDS document. This contains the service endpoint for the OpenID Provider. This has already been extended for ESGF to include the MyProxy and Attribute Service endpoints. I'm told that there's some debate in the community about choice of discovery technology with SWD (Simple Web Discovery) as an alternative. Given ESGF's legacy with XRD it makes logical sense to add in the address for the OAuth Authorisation server as an entry into the XRDS document returned.


<?xml version="1.0" encoding="UTF-8"?>
<xrds:XRDS xmlns:xrds="xri://$xrds" xmlns="xri://$xrd*($v*2.0)">
  <XRD>
    <Service priority="0">
      <Type>http://specs.openid.net/auth/2.0/signon</Type>
      <Type>http://openid.net/signon/1.0</Type>
      <URI>https://openid.provider.somewhere.ac.uk</URI>
      <LocalID>https://somewhere.ac.uk/openid/PJKershaw</LocalID>
    </Service>
    <Service priority="1">
      <Type>urn:esg:security:myproxy-service</Type>
      <URI>socket://myproxy-server.somewhere.ac.uk:7512</URI>
      <LocalID>https://somewhere.ac.uk/openid/PJKershaw</LocalID>
    </Service>
    <Service priority="10">
      <Type>urn:esg:security:oauth-authorisation-server</Type>
      <URI>https://oauth-authorisation-server.somewhere.ac.uk</URI>
      <LocalID>https://somewhere.ac.uk/openid/PJKershaw</LocalID>
    </Service>
    <Service priority="20">
      <Type>urn:esg:security:attribute-service</Type>
      <URI>https://attributeservice.somewhere.ac.uk</URI>
      <LocalID>https://somewhere.ac.uk/openid/PJKershaw</LocalID>
    </Service>
  </XRD>
</xrds:XRDS>

Bringing it together, the delegation flow would start with the Service Provider presenting an interface to enable the user to select their IdP. This could be by entering a full OpenID URL or preferably picking one from a list of trusted IdPs. The SP could then GET the URL and extract the OAuth Authorisation Server endpoint from the XRDS document returned. From there, the standard OAuth flow would proceed as before.
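The extraction step could look something like the following Python 3 sketch using only the standard library. It assumes the XRDS document has already been fetched, reuses the urn:esg:security:oauth-authorisation-server type from the example document, and the function name find_service_uri is my own invention. Service priority ordering is ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Default namespace of the XRD elements in the XRDS document.
XRD_NS = "{xri://$xrd*($v*2.0)}"
OAUTH_TYPE = "urn:esg:security:oauth-authorisation-server"

SAMPLE = """<xrds:XRDS xmlns:xrds="xri://$xrds" xmlns="xri://$xrd*($v*2.0)">
  <XRD>
    <Service priority="10">
      <Type>urn:esg:security:oauth-authorisation-server</Type>
      <URI>https://oauth-authorisation-server.somewhere.ac.uk</URI>
    </Service>
  </XRD>
</xrds:XRDS>"""


def find_service_uri(xrds_text, service_type):
    """Return the URI of the first Service of the given type, or None."""
    root = ET.fromstring(xrds_text)
    for service in root.iter(XRD_NS + "Service"):
        children = list(service)
        types = [c.text for c in children if c.tag == XRD_NS + "Type"]
        uris = [c.text for c in children if c.tag == XRD_NS + "URI"]
        if service_type in types and uris:
            return uris[0]
    return None
```

Called as find_service_uri(SAMPLE, OAUTH_TYPE), this returns the authorisation server address, and None when the IdP does not advertise one, in which case the SP would have to fall back on a configured endpoint.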

Thursday 9 February 2012

New Python HTTPS Client

I've added a new HTTPS client to the ndg_* list of packages in PyPI. ndg_httpsclient, as it's become, has been on a todo list for a long time. I've wanted to make use of all the features PyOpenSSL offers with the convenience of wrapping it in the standard httplib and urllib2 interfaces.

Here's a simple example using the urllib2 interface. First create an SSL context to set verification of the peer:

>>> from OpenSSL import SSL
>>> ctx = SSL.Context(SSL.SSLv3_METHOD)
>>> verify_callback = lambda conn, x509, errnum, errdepth, preverify_ok: preverify_ok 
>>> ctx.set_verify(SSL.VERIFY_PEER, verify_callback)
>>> ctx.load_verify_locations(None, './cacerts')

Create an opener adding in the context object and GET the URL. The custom-built opener adds in a new PyOpenSSL-based HTTPSContextHandler.

>>> from ndg.httpsclient.urllib2_build_opener import build_opener
>>> opener = build_opener(ssl_context=ctx)
>>> res = opener.open('https://localhost/')
>>> print res.read()

The verify callback above is just a placeholder. For a more comprehensive implementation, ndg_httpsclient includes a callback with support for checking of the peer FQDN against the subjectAltName in the certificate. If subjectAltName is absent, it defaults to an attempted match against the certificate subject CommonName.
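The matching rule itself is simple enough to sketch as a plain function. This is an illustration of the rule as just described, not the code in ndg_httpsclient: the function name is hypothetical, the names are assumed to have been extracted from the certificate already, and wildcard names are not handled.

```python
def peer_name_matches(hostname, subject_alt_names, common_name):
    """Check hostname against subjectAltName entries, else fall back to CN."""
    wanted = hostname.lower()
    if subject_alt_names:
        # subjectAltName present: it alone decides and the CN is ignored.
        return wanted in [name.lower() for name in subject_alt_names]
    # subjectAltName absent: attempt a match against the CommonName.
    return common_name is not None and common_name.lower() == wanted
```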

The callback is implemented as a class which is a callable. This means that you can instantiate it, configuring the required settings and then pass the resulting object direct to the context's set_verify:

>>> from ndg.httpsclient.ssl_peer_verification import ServerSSLCertVerification
>>> verify_callback = ServerSSLCertVerification(hostname='localhost')
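To show why a callable class is convenient here, this is a stripped-down sketch of the pattern. It is not the ServerSSLCertVerification implementation: the class name CommonNameCheck is hypothetical and it only compares the subject CommonName, with no subjectAltName support. The __call__ signature is the one PyOpenSSL passes to a verify callback, as in the lambda earlier.

```python
class CommonNameCheck:
    """Callable usable as a PyOpenSSL verify callback (simplified sketch)."""

    def __init__(self, hostname):
        # Settings are fixed at instantiation, before the SSL handshake.
        self.hostname = hostname

    def __call__(self, conn, x509, errnum, errdepth, preverify_ok):
        if not preverify_ok:
            # OpenSSL has already rejected this certificate.
            return False
        if errdepth != 0:
            # CA certificates in the chain: keep OpenSSL's own verdict.
            return True
        # Depth 0 is the peer certificate: compare its CommonName.
        return x509.get_subject().commonName == self.hostname
```

An instance such as CommonNameCheck('localhost') can then be handed to set_verify in place of the lambda.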

To get the subjectAltName support I needed pyasn1 with some help from this query to correctly parse the relevant certificate extension. So adding this into the context creation steps above:

>>> from OpenSSL import SSL
>>> ctx = SSL.Context(SSL.SSLv3_METHOD)
>>> verify_callback = ServerSSLCertVerification(hostname='localhost')
>>> ctx.set_verify(SSL.VERIFY_PEER, verify_callback)
>>> ctx.load_verify_locations(None, './cacerts')

The package will work without pyasn1 but then you lose the subjectAltName support. Warning messages will flag this up. I can pass this context object to the urllib2 style opener as before, or use the httplib interface:

>>> from ndg.httpsclient.https import HTTPSConnection
>>> conn = HTTPSConnection('localhost', port=4443, ssl_context=ctx)
>>> conn.connect()
>>> conn.request('GET', '/')
>>> resp = conn.getresponse()
>>> resp.read()

A big thank you to Richard for his help getting this package written and ready for use. Amongst other things he's added a suite of convenience wrapper functions and a command line script.

Wednesday 23 November 2011

OAuth MyProxy Mash up

This is a revisit of OAuth from two perspectives, the first in that it was under consideration for the MashMyData project and the second, because this is looking at using OAuth with MyProxyCA, something already exploited with the CILogon project.

MashMyData explored solutions for delegation in a secured workflow involving a portal, an OGC Web Processing Service and OPeNDAP. The aim was, where possible, to re-use existing components, a large part of which was the federated access control infrastructure developed for the Earth System Grid Federation. ESGF supports single sign-on with PKI short-lived credentials but we had no use case requiring user delegation. When it came to MashMyData the traditional solution with Proxy certificates won out. So why bother revisiting OAuth?

OAuth 2.0 simplifies the protocol flow compared with OAuth 1.0. Where OAuth 1.0 used signed artifacts in the HTTP headers, with OAuth 2.0, message confidentiality and integrity can be taken care of with SSL.
Custom SSL verification code is required for consumers to correctly verify certificate chains involving proxy certificates. We can dispense with this as OAuth uses an independent mechanism for delegation.

There are a number of different ways you could apply OAuth. Here we're looking at MyProxyCA, a short-lived credential service (SLCS). Usually, a user would make a request to the service for a new credential (X.509 certificate) and authenticate with username/password. Applying OAuth, the user gives permission to another entity to retrieve a credential representing them on their behalf.

There's a progression to this solution as follows:

1) MyProxyCA alters the standard MyProxy to enable certificates to be dynamically issued from a CA configured with the service. User's input credentials (username/password) are authenticated against a PAM or SASL plugin. This could link to a user database or some other mechanism. MyProxyCA issues EECs.
2) Front MyProxyCA with a web service interface and you gain all the benefits - tools, middleware, support and widespread use of HTTP.
3) If you have a HTTP interface you can easily front it with OAuth.

Step 2) has been developed for CILogon and there is also MyProxyWebService. This is how 3) could look:

The user agent could be a browser or command line client or some other HTTP rich client application. The OAuth 2.0 Authorisation Server is implemented in this case as middleware fronting the online CA. The user must authenticate with this so that the online CA can ensure that this user has the right to delegate to the client. I would envisage it also keeping a record of previously approved clients for the user so that approvals can be made without user intervention should they wish. I like this ability for the user to manage which clients may be delegated to offline of the process of brokering credentials.

Once the client has obtained an authorisation grant, it generates a key pair and puts the public key in a certificate request to despatch to the online CA. The latter verifies the authorisation grant and returns an OAuth access token in the form of a new user certificate delegated to the client. The client can then use this in requests to other services where they need to act on behalf of the user. A big thank you to Jim Basney for his feedback in iterating towards this solution.
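The two steps can be modelled with a toy sketch in Python 3. Everything here is a stand-in of my own: the class and method names are hypothetical, the grant is a random string, and where a real online CA would sign an X.509 certificate from the certificate request, a plain dict is returned instead.

```python
import secrets


class ToyOnlineCA:
    """Toy stand-in for the authorisation server plus online CA."""

    def __init__(self):
        self._grants = {}

    def authorise(self, user, client_id):
        # Step 1: the authenticated user approves delegation to the client.
        grant = secrets.token_urlsafe(16)
        self._grants[grant] = (user, client_id)
        return grant

    def issue_certificate(self, grant, client_id, public_key):
        # Step 2: the client redeems the grant with its certificate request.
        # The grant is removed on first use so that it cannot be replayed.
        user, approved_client = self._grants.pop(grant, (None, None))
        if user is None or approved_client != client_id:
            raise PermissionError("invalid authorisation grant")
        # A real CA would sign an X.509 certificate; a dict stands in here.
        return {"subject": user, "delegated_to": client_id,
                "public_key": public_key}
```

Note that the private key never leaves the client: only the public key travels with the request, which is the same property the real certificate request gives.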

There is the question remaining of the provenance of credentials: how do I distinguish this delegated certificate from one obtained directly by the user? With proxy certificates you at least have the chain of trust going back to the original user certificate issuer and the CN component of the certificate subject name added for each delegation step. For ESGF, we added a SAML attribute assertion into the certificate extension for issued EECs. There's a profile for this with VOMS which we hope to adopt for Contrail and the same idea has been used elsewhere. I'd like to be able to express provenance information in this same space. I would think it would likewise be as an attribute assertion but it needs some more thought. The certificate extension content could be extended with additional SAML to meet a couple of other objectives:

add an authorisation decision statement to restrict the scope of resources that the issued certificate is allowed to access
add level of assurance information via a SAML Authentication Context.

Thursday 20 October 2011

Federated Clouds

Imagine the ability to seamlessly manage independent resources on multiple cloud providers with a single interface. There are some immediate benefits to consider: avoiding vendor lock-in, migration of a resource from one cloud to another, replication of data ...

You might be excused for thinking it's a little ambitious but a colleague on the Contrail project drew my attention to this article on Cloud Brokering. As Lorenzo said, you don't have to pay for the full article to get the gist but it seems from a rudimentary search that a number of commercial products have already ventured into this area:

http://www.datacenterknowledge.com/archives/2009/07/27/cloud-brokers-the-next-big-opportunity/

Federating clouds is a core objective of Contrail, and from what I heard at the Internet of Services meeting I attended last month, there's plenty of research interest in this topic. Picking out some points raised in the discussions (with some of my own thoughts mixed in):

the importance of federating clouds for the European model. Cloud infrastructures deployed in smaller member states can't match the resources available to the large US enterprises but if those smaller infrastructures are joined in a federation their resources can be pooled to make something greater.
Standards are essential for federated clouds to succeed (an obvious point really) but that existing standards such as OVF and OCCI provide incomplete coverage of what is needed across the spectrum of cloud architecture related concerns.
The problem of funding and continuity of work, general to many research efforts but cloud technology by its nature surely needs a long term strategy for it to flourish.
The need for longer term research goals with 10-15 year gestation, short-term goals will be absorbed by commercial companies. There's a danger of simply following rather than leading.

So on the last point then, it's all right to be ambitious ;)

Friday 7 October 2011

Federated Identity Workshop Coming up...

This is a plug for a workshop on Federated Identity Management for Scientific Collaborations coming up at RAL 2-3 November:

http://indico.cern.ch/conferenceDisplay.py?ovw=True&confId=157486

It follows the first of its kind, held earlier this year at CERN, which brought together experts in the field and representatives from a range of different scientific communities to present the state of play for federated identity management in each of the various fields and draw together a roadmap for future development. See the minutes for a full account.

Picking out just a few themes that were of interest to me: inter-federation trust came up a number of times and the need for services to translate credentials from one domain to another. I read that as a healthy sign that a) various federated identity management systems have bedded down and become established and b) that there is not a fight of competing security technologies for one to take over all, rather a facing up to the realities of how we can make it work so that these co-exist alongside each other.

Credential translation brings in another two interesting issues: provenance and levels of assurance, which also arose independently in some of the talks and discussions. If I have a credential that is the result of a translation of another credential from a different domain, how much information is transferred between the two? Is it lossy? Are the identity concepts and various attributes semantically the same? The same issues arise, perhaps to a lesser degree, with delegation technologies.

Levels of assurance is another issue that is surely going to crop up more and more as different authentication mechanisms are mixed together in systems: the same user can enter a federated system with different methods, so how do we ensure that they are assigned access rights accordingly? Some complicated issues to tackle but the fact that they can begin to be addressed shows the progress that has been made building on the foundations of established federated systems.