Friday, 10 December 2010

Mash My Security for MashMyData

MashMyData is underway. This proof of concept project is exploring the provision of an online environment in which scientific users can upload their data and intercompare it with environmental datasets gathered from a variety of independent data services. These services secure access to their datasets, so managing authentication/authorisation credentials across potentially multiple hops between services presents a significant challenge. For a project whose primary focus is data mash up, this involves a fair quantity of security-related mash up too.

A recent code sprint with project partners has brought into sharp focus how we can address and implement our use case in a short space of time. There's much to tell about the implementation details, but for now here is a higher-level overview of the use case...

A MashMyData Portal provides the user environment for data mash up. It supports OpenID-based single sign-on, enabling authentication within the scope of the portal, but the portal itself must broker access to other services on the user's behalf. An initial study investigated both OAuth and the classic Grid-based solution of proxy certificates. I'm keen to explore OAuth more extensively but, surprisingly for me at least, the latter was easier to realise within the scope of the code sprint. This is due in no small part to the head start it was given: MashMyData leverages the existing security system developed for the Earth System Grid (ESG). In this system, services support both OpenID and PKI-based authentication. The latter fits nicely with the Grid paradigm of individual short-term user certificates.

At this point, though, it's worth taking a step back to look at how OpenID might fit into this scenario. Considerable time was spent on this question during the development of the ESG security architecture: you could argue the case for an OpenID approach, in which the user is authenticated via the portal at the secondary service. By its nature, though, OpenID is unsuited to cases where the client is not a browser, especially when you consider that any given OpenID Provider can impose any number of steps in its user interface and still adhere to its interface with the OpenID Relying Party. This makes it difficult to manage in our case, where a programmatic interface has no user interaction.

Back to the PKI-based approach, then. Each ESG Identity Provider deploys a MyProxy service to enable users to obtain short-term credentials, but with the MashMyData portal the user has already authenticated via OpenID, and we don't want them to have to sign in again via MyProxy. We can, however, translate their signed-in status into a short-term certificate. This approach has already been employed by projects like SARoNGS and by the system devised for federated login to TeraGrid. The diagram below shows how MyProxy can be employed:

The user signs in at the MashMyData portal and invokes CEDA's (Centre for Environmental Data Archival) Python-based WPS (OGC Web Processing Service) implementation to execute a job. The WPS requires authentication, so the portal calls the Credential Translation Service to obtain a short-term certificate that represents the user and authenticates them at the WPS. (I'm leaving authorisation out of this for simplicity; authorisation is enforced on all services.) The translation service is in fact a MyProxy service configured with a CA. For the purposes of the MashMyData demonstrator, certain CEDA services have been configured to trust this CA.
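
Because the translation service is an ordinary MyProxy server, the portal's call to it is a standard MyProxy logon (GET). The sketch below shows roughly what that request looks like, assuming the MyProxy v2 text protocol: newline-separated KEY=VALUE fields terminated by a NUL byte, sent over TLS (MyProxy listens on port 7512 by default). The exact framing, the example OpenID and the 12-hour default lifetime are assumptions for illustration; check the MyProxy protocol documentation before relying on them.

```python
"""Sketch of a MyProxy logon (GET) request, as the portal might send it to
the Credential Translation Service.

Assumptions: MyProxy v2 text protocol, i.e. newline-separated KEY=VALUE
fields terminated by a NUL byte, sent over TLS after the client has written
a single '0' byte. Networking and TLS are deliberately left out.
"""

MYPROXY_VERSION = "MYPROXYv2"
GET_COMMAND = 0        # command code for a logon/GET request
RESPONSE_OK = "0"      # RESPONSE value the server uses for success


def build_logon_request(username: str, passphrase: str = "",
                        lifetime_s: int = 12 * 3600) -> bytes:
    """Serialise a GET request.

    With the custom OpenID-accepting PAM module, `username` carries the
    user's OpenID and the passphrase plays no part in authentication.
    """
    fields = [
        f"VERSION={MYPROXY_VERSION}",
        f"COMMAND={GET_COMMAND}",
        f"USERNAME={username}",
        f"PASSPHRASE={passphrase}",
        f"LIFETIME={lifetime_s}",
    ]
    return "\n".join(fields).encode("utf-8") + b"\n\x00"


def parse_response(raw: bytes) -> dict:
    """Parse a reply like b'VERSION=MYPROXYv2\\nRESPONSE=0\\n\\x00' into a dict
    mapping each key to the list of values seen (ERROR lines may repeat)."""
    reply: dict = {}
    for line in raw.rstrip(b"\x00").decode("utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            reply.setdefault(key, []).append(value)
    return reply


def is_ok(reply: dict) -> bool:
    """True if the server reported success."""
    return reply.get("RESPONSE") == [RESPONSE_OK]
```

In a real client these bytes would be written to a TLS connection (for example one created with Python's `ssl` module) authenticated with the portal's own certificate, and a successful reply would be followed by the certificate request and response exchange.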

Usually in this mode, the MyProxyCA responds to MyProxy logon requests by authenticating the input username/password against a given PAM service module, which might, for example, link to a user database. In this case, however, a custom PAM module accepts the user's OpenID, and the MyProxy service 'authenticates' against this alone and returns an End Entity Certificate to the portal. The portal can then use this in its request to the WPS. The obvious question here is: given such a permissive policy, what is to stop anyone requesting as many credentials as they like? The answer is that only the portal needs access to this service, so access can be restricted to it alone.
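
The overall acceptance rule can be sketched in Python. In practice the two checks would live in different places: the OpenID check in the custom PAM module (written in C against the PAM API, which only sees the username and password), and the client check in the MyProxy server configuration or at the network level. The sketch assumes the portal is recognised by the distinguished name of the client certificate it presents over TLS; the post doesn't say how the restriction is enforced, and the DN and function names are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical DN of the certificate the MashMyData portal presents over TLS.
PORTAL_DNS = frozenset({"/O=MashMyData/OU=Portal/CN=portal.example.org"})


def looks_like_openid(identifier: str) -> bool:
    """Crude PAM-side check: the username is an http(s) URL with a host."""
    parsed = urlparse(identifier)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def accept_logon(username: str, client_dn: str,
                 allowed_clients: frozenset = PORTAL_DNS) -> bool:
    """Permissive policy: any well-formed OpenID is accepted, but only when
    the request comes from an allowed client (the portal)."""
    if client_dn not in allowed_clients:
        return False          # nobody but the portal may obtain certificates
    return looks_like_openid(username)
```

The point of the split is that the permissive OpenID check is only safe because the client check runs first; on its own it would hand a certificate to anyone who can type a URL.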

Next, the job at the WPS itself needs to retrieve data from CEDA's Python-based OPeNDAP service, PyDAP. The portal pre-empts this step by priming a second MyProxy server with a delegated user credential, which the WPS can then retrieve. This second MyProxy server is configured in an alternative, more conventional mode in which it acts as a repository for short-term credentials. The portal adds a new credential to this repository so that it is available to the WPS or to any other service that has been allocated retrieval rights. In this process, a put request from the portal makes the MyProxy server create a new key pair and return a certificate signing request. The portal signs this using the user certificate it obtained previously, and the resulting proxy certificate is uploaded to the MyProxy server's repository.
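
Conceptually, the second MyProxy server behaves like the small repository modelled below: the portal stores a credential under the user's name with a limited lifetime, and only services on a statically configured list may retrieve it. This is a toy model of the access rules, not MyProxy's implementation: key generation, CSR signing and delegation are reduced to an opaque `proxy_chain` value, and all names are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


@dataclass
class StoredCredential:
    # Signed proxy certificate and chain, as uploaded after the put request.
    # The matching private key stays on the server and is never handed out.
    proxy_chain: bytes
    expires_at: float


@dataclass
class CredentialRepository:
    """Toy model of a MyProxy server used as a credential repository."""
    authorised_retrievers: FrozenSet[str]          # static configuration
    _store: Dict[str, StoredCredential] = field(default_factory=dict,
                                                init=False, repr=False)

    def put(self, username: str, proxy_chain: bytes, lifetime_s: int,
            now: Optional[float] = None) -> None:
        """Store (or replace) the delegated credential for `username`."""
        now = time.time() if now is None else now
        self._store[username] = StoredCredential(proxy_chain, now + lifetime_s)

    def get(self, username: str, retriever_dn: str,
            now: Optional[float] = None) -> StoredCredential:
        """Release a credential only to an authorised retriever.

        A real MyProxy GET would instead sign a fresh proxy, using the stored
        key, for a key pair generated by the retriever.
        """
        now = time.time() if now is None else now
        if retriever_dn not in self.authorised_retrievers:
            raise PermissionError(f"{retriever_dn} is not an authorised retriever")
        cred = self._store.get(username)
        if cred is None or cred.expires_at <= now:
            raise LookupError(f"no valid credential stored for {username}")
        return cred
```

The explicit `now` parameter only exists to make expiry easy to exercise; callers would normally omit it.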

With this in place, the WPS can execute a MyProxy logon to obtain a proxy certificate with which to authenticate to the PyDAP service. In fact, any number of services can be configured in a chain. Some interesting issues remain to consider:
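
On the retrieving side, the WPS generates its own key pair, sends a certificate signing request once its logon request is accepted, and receives the signed proxy plus its chain. Assuming the server returns these as a single count byte followed by the DER-encoded certificates back to back (check the MyProxy protocol documentation for the exact framing), the client has to split the blob itself using the DER length fields. A minimal standard-library sketch:

```python
import ssl


def _der_element_size(buf: bytes, offset: int) -> int:
    """Total size (tag + length octets + content) of the DER element at offset."""
    if offset + 2 > len(buf):
        raise ValueError("truncated DER header")
    first = buf[offset + 1]
    if first < 0x80:                       # short form: length in one octet
        return 2 + first
    n = first & 0x7F                       # long form: next n octets hold length
    if n == 0 or offset + 2 + n > len(buf):
        raise ValueError("indefinite or truncated DER length")
    content_len = int.from_bytes(buf[offset + 2:offset + 2 + n], "big")
    return 2 + n + content_len


def split_certificates(payload: bytes) -> list:
    """Split '<count byte><DER cert>...<DER cert>' into individual DER blobs."""
    if not payload:
        raise ValueError("empty payload")
    expected, offset, certs = payload[0], 1, []
    while offset < len(payload):
        size = _der_element_size(payload, offset)
        if offset + size > len(payload):
            raise ValueError("truncated certificate")
        certs.append(payload[offset:offset + size])
        offset += size
    if len(certs) != expected:
        raise ValueError(f"expected {expected} certificates, got {len(certs)}")
    return certs


def to_pem(certs: list) -> str:
    """Concatenate DER certificates as PEM, e.g. for a client credential file."""
    return "".join(ssl.DER_cert_to_PEM_cert(c) for c in certs)
```

The PEM output, together with the WPS's own private key, is what an SSL client library would then need to present the proxy to PyDAP; that service in turn has to be configured to accept proxy certificates.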
  1. Services must be able to consume proxy certificates.  This needs special configuration, something I've discussed previously.  
  2. The MyProxy delegation service has a static configuration determining which services are authorised to retrieve user proxy credentials. On both this point and the previous one, an OAuth-based solution might provide a better alternative; you might also throw away proxy certificates altogether, which would remove an SSL configuration overhead.
  3. How do services discover the MyProxy service endpoint, so that they know where to get delegated credentials from? For the moment this is a static configuration, but this information could be passed from the client. Another alternative would be to add it to the OpenID Provider's Yadis document so that it can be discovered by invoking an HTTP GET on the user's OpenID URL. Extending the Yadis document has already been exploited with ESG, but this assumes that OpenID Providers are able to include such customisation, which would obviously break interoperability with vanilla OpenID Providers.
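
For the third option, a service could fetch the user's Yadis (XRDS) document with an HTTP GET on the OpenID URL (Yadis clients send `Accept: application/xrds+xml`) and look for a service entry advertising MyProxy. The parsing step is sketched below. The service type URI and the `socket://` endpoint form are hypothetical placeholders: this would be a custom extension, so the OpenID Provider and relying services would have to agree on the actual values.

```python
import xml.etree.ElementTree as ET

XRD_NS = "xri://$xrd*($v*2.0)"   # default namespace of <XRD> content in XRDS
# Hypothetical type URI for a MyProxy service entry (custom extension).
MYPROXY_SERVICE_TYPE = "urn:example:security:myproxy-service"


def find_service_uris(xrds_xml: str,
                      service_type: str = MYPROXY_SERVICE_TYPE) -> list:
    """Return URIs of <Service> elements of the given type, best priority first.

    Lower priority values win; a missing priority attribute sorts last.
    """
    root = ET.fromstring(xrds_xml)
    found = []
    for service in root.iter(f"{{{XRD_NS}}}Service"):
        types = {(t.text or "").strip()
                 for t in service.findall(f"{{{XRD_NS}}}Type")}
        if service_type not in types:
            continue
        priority = float(service.get("priority", "inf"))
        for uri in service.findall(f"{{{XRD_NS}}}URI"):
            if uri.text and uri.text.strip():
                found.append((priority, uri.text.strip()))
    return [uri for _, uri in sorted(found)]
```

Because the lookup only adds a new `<Service>` entry, ordinary OpenID relying parties would simply ignore it; the interoperability problem lies in getting vanilla OpenID Providers to publish it at all.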
The code sprint has provided a great demonstration of the flexibility of MyProxy and of how quickly something can be assembled to meet the given use case.
