Saturday 26 January 2013

Helix Nebula, the Science Cloud - Results and User Engagement Meeting


A write-up from the notes I took at this meeting last week at ESRIN.  This is my first closer look at this project (http://helix-nebula.eu/) after hearing bits and pieces along the way.  It's interesting on a number of levels - application of science use cases to cloud, academic-industrial partnership, federating clouds, service management and the European landscape - policy, legal issues and politics.

Helix Nebula has big ambitions, aiming to make it the first port of call for cloud provision for the research community by 2020: development as a multi-tenant open marketplace and the creation of an eco-system to facilitate growth for both science and industry.  Science users and their use cases are the initial target, but this is fully expected to expand to a wider user base.  One member of the consortium noted the increased attendance at meetings over the course of the project as an encouraging sign, and membership has grown from 20 to 34 members over the past year.  There are three large players from the science community: CERN, EMBL (European Molecular Biology Laboratory) and ESA.  Collaboration is seen as essential to meeting these goals, drawing on the various stakeholders: IT providers, SMEs, science and space organisations, and EC support and projects.  Some of the commercial companies represented: Atos, CloudSigma, T-Systems, Terradue, Logica, with EGI representing cloud provision from the academic community.
I've heard the argument made before that a collective approach is the only means of developing something in any way comparable with the established big cloud providers.   Here the case was put forward that pooling the resources of multiple providers can build that scale. A new one to me was the aspect of risk sharing: a federated approach has the potential to build something stronger, since it has the advantage of diversity over a single cloud provider with comparable resources. There are also political aspects: EC compliance regarding privacy, policy and ethics, and the ability to meet these by hosting resources exclusively within the EU's boundaries.

Technical Architecture

The technical architecture has been developed by a Tech-Arch group made up of representatives from the supply side. This was a little concerning at first, but it turns out there are actually two groups: a complementary Serve-Arch group represents the demand side. It was reassuring to hear that user interests are represented in this way, but could there have been just one group, and how does having two work in practice? Hearing more about the requirements analysis, it sounded more holistic, with user stories created around a set of actors identified from both the supply and demand sides.

The blueprint is a service-enabling framework, a federated system of cloud providers: the so-called 'Blue-Box' (apparently so called because, when first put forward, the relevant PowerPoint slides had a default blue scheme!). It is an interface between the many consumers and providers, harmonising the interfaces of the various cloud providers. This was all starting to sound very familiar from my experience with Contrail, and that was confirmed on 'opening' the Blue-Box: users at the 'north' end, an SOA managing the various federation-level cloud capabilities in the middle, and below this at the 'south' end APIs to the various cloud provider interfaces.  The system is being constructed via a roadmap of three releases, building capability incrementally, using reference technologies as building blocks and identifying low-hanging fruit from which to prioritise work.  The federation layer is then a thin one.  I can see major issues to tackle - for example, management of SLAs across multiple providers - but the approach at least starts with what is possible.
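
To make the shape of this concrete, here is a minimal sketch of how I picture a thin federation layer of this kind: a harmonised adapter per provider at the 'south' end and a single entry point for users at the 'north' end. The class and method names are my own invention for illustration, not anything from the Helix Nebula roadmap.

```python
from abc import ABC, abstractmethod


class ProviderAdapter(ABC):
    """South-end adapter wrapping one cloud provider's native API."""

    @abstractmethod
    def start_vm(self, image_id: str, cores: int, memory_gb: int) -> str:
        """Start a VM and return a provider-local identifier."""

    @abstractmethod
    def stop_vm(self, vm_id: str) -> None:
        """Terminate a running VM."""


class FederationLayer:
    """North-end entry point: registers adapters and delegates to them."""

    def __init__(self) -> None:
        self._providers: dict[str, ProviderAdapter] = {}

    def register(self, name: str, adapter: ProviderAdapter) -> None:
        self._providers[name] = adapter

    def deploy(self, provider: str, image_id: str, cores: int, memory_gb: int) -> str:
        # Keep the federation logic thin: translate the request and delegate,
        # leaving the hard parts (SLAs, accounting, scheduling) to later releases.
        vm_id = self._providers[provider].start_vm(image_id, cores, memory_gb)
        return f"{provider}:{vm_id}"
```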

I was encouraged to hear about this federated approach, but coming to the meeting my feeling was that it's surely not in the interests of commercial cloud providers to harmonise their interfaces and make a federation with their competitors possible. There's further inertia though: in creating a generic interface across a number of different cloud providers, there's a danger that the federation layer generalises or dilutes their individual capabilities. All cloud providers are not made equal, and each may have specific capabilities which make it stand out and give it a particular market edge. At worst, a cloud provider might find itself shoe-horned into a system that dilutes this, giving a vanilla or sub-optimal exposure of its functionality through the federation layer to the users sitting at the north end.
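
As a purely hypothetical illustration of that lowest-common-denominator problem: once requests have to fit a common schema, anything provider-specific either has to be dropped or pushed through an untyped escape hatch, which is exactly the 'vanilla' exposure I mean. All field names below are invented.

```python
# Hypothetical generic deploy request as seen by a federation layer.  The
# common schema captures what every provider can do, while provider-specific
# differentiators end up in an opaque 'extensions' bag -- or are simply lost.
generic_request = {
    "image_id": "ubuntu-12.04",
    "cores": 4,
    "memory_gb": 8,
    "extensions": {             # provider-specific knobs that don't generalise
        "cpu_model": "custom",  # e.g. a provider that lets you pick CPU models
        "burst_pricing": True,  # e.g. a provider with a special pricing scheme
    },
}
```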

This brings me to the input from EGI. I would argue there is greater potential to develop a federated system here, since it's strongly in the interest of the academic sector. Matteo Turilli (OeRC) gave an overview of current developments. Private clouds are being deployed within EGI, with twenty providers from across Europe. A cloud profile has been developed as a standard and a testbed established, aiming to build a federation across the private clouds represented in the community. The heritage of a production Grid provides an infrastructure upon which to build.  Solutions for federated identity management are already established with Shibboleth and GSI.  I don't see support for such technology amongst commercial cloud providers.  I suppose there is no incentive, but it means that at the interface between the federation layer and a cloud provider, the best the federation can do is cache the user's credentials for each cloud provider and hope the user doesn't mind.  To paraphrase from a social networking site, 'you can trust us with your password, we won't pass it on to anyone else'. ;)
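
A rough sketch of what that ends up looking like, assuming no federated identity support on the provider side: the federation layer becomes a password vault, holding a copy of the user's native credentials for every cloud it brokers to. The class below is illustrative only, not any real Helix Nebula or EGI component.

```python
class CredentialStore:
    """Illustrative per-provider credential cache held by a federation layer."""

    def __init__(self) -> None:
        # user -> provider -> (username, api_key); a real implementation would
        # need encryption at rest, auditing and revocation.
        self._vault: dict[str, dict[str, tuple[str, str]]] = {}

    def store(self, user: str, provider: str, username: str, api_key: str) -> None:
        self._vault.setdefault(user, {})[provider] = (username, api_key)

    def credentials_for(self, user: str, provider: str) -> tuple[str, str]:
        # The federation replays these on the user's behalf -- the 'trust us
        # with your password' model, as opposed to Shibboleth/GSI-style
        # delegation where the provider accepts a federated assertion.
        return self._vault[user][provider]
```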

OCCI 1.1 has been adopted as the cloud standard and various cloud solutions are in use: OpenNebula, OpenStack, WNoDes, Okeanos.  IaaS is the initial service model being offered, augmented with various services including a capability for VM sharing.  Demonstrations of the federated system are carried out every six months, with three completed to date.   How does this fit in with the Helix Nebula Blue-Box? A strategy of a thin federation layer is being adopted to avoid the complex problems that will inevitably arise with deeper integration. There are problems, of course. OCCI does not seem to be getting much traction in the commercial community, and the perception seemed to be that it does not have the level of granularity in its interface to communicate more detailed user needs. jclouds, a Java library that abstracts the REST APIs of many different providers, was mooted as a possible alternative.
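
For a flavour of what the OCCI interface looks like in practice, here is a rough sketch of creating a compute resource through OCCI 1.1's HTTP text rendering. The endpoint URL and token are placeholders, and individual sites vary in how they handle authentication.

```python
import requests

OCCI_ENDPOINT = "https://occi.example.org:8787"  # placeholder endpoint

headers = {
    "Content-Type": "text/occi",
    # The Category header identifies the OCCI 'compute' kind being instantiated.
    "Category": ('compute; '
                 'scheme="http://schemas.ogf.org/occi/infrastructure#"; '
                 'class="kind"'),
    # Attributes of the requested resource are carried as X-OCCI-Attribute.
    "X-OCCI-Attribute": "occi.compute.cores=2, occi.compute.memory=4.0",
    "X-Auth-Token": "REPLACE-WITH-SITE-TOKEN",  # authentication is site-specific
}

response = requests.post(f"{OCCI_ENDPOINT}/compute/", headers=headers)
response.raise_for_status()
# A successful create returns the URI of the new resource.
print("New compute resource:", response.headers.get("Location"))
```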

Flagships

There were presentations from each of the three flagships: CERN, EMBL and ESA.   I don't have space to go through these in depth here, but the presentations are available on the Helix Nebula site.   CERN's use case used results from the ATLAS experiment.   Interesting that the processing jobs had low I/O; network bandwidth between providers and CERN was not an issue.  ESA's case seemed to be an expansion of the SSEP (Supersites Exploitation Platform), looking at disaster response scenarios where processing needs to be very rapid.   EMBL's large-scale genome analysis was particularly impressive, handling 100,000s of jobs auto-provisioned on 50-node clusters.  GlusterFS was used for the shared file system and StarCluster from MIT for processing; this manages provisioning of images and the setting up of a cluster, with capability for fluctuating workloads.  It also integrates with Sun Grid Engine.  EMBL also had a nice user interface.
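
For anyone unfamiliar with StarCluster, the pattern as I understood it looks roughly like the sketch below: bring up a cluster, farm jobs out through Sun Grid Engine, and add nodes when the queue backs up. This is not EMBL's actual pipeline; the cluster name, job script and threshold are all made up.

```python
import subprocess

CLUSTER = "genome-cluster"        # defined in a StarCluster config file
JOB_SCRIPT = "analyse_sample.sh"  # hypothetical SGE job script
SAMPLES = [f"sample_{i:06d}" for i in range(100_000)]

# Bring up the cluster (master plus compute nodes sharing a common volume).
subprocess.run(["starcluster", "start", CLUSTER], check=True)

# Submit one SGE job per sample; in practice this would run on the master node.
for sample in SAMPLES:
    subprocess.run(["qsub", "-N", sample, JOB_SCRIPT, sample], check=True)

# Crude elasticity: add a node when too many jobs are pending.  A real setup
# would use StarCluster's load balancer rather than a hand-rolled check.
pending_jobs = subprocess.run(["qstat", "-s", "p"], capture_output=True,
                              text=True).stdout.count("\n")
if pending_jobs > 1000:
    subprocess.run(["starcluster", "addnode", CLUSTER], check=True)
```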

It was revealing to hear some feedback from the supplier-side perspective on the flagships.  The technology stacks in use by the cloud providers included:
• StratusLab OpenNebula with KVM hypervisor
• Zimory with VMware and vCloud
• Abiquo with ESX and KVM hypervisors
Customers were presented with four different interfaces, re-emphasising the need for a federation layer.   There's a need to find a means of sharing VMs between different vendor flavours.  Other issues that came up: the differing accounting models between suppliers, the need for logic to handle workload assignment and scaling between clouds and, perhaps most interesting, the small revenues for suppliers and the need to convince management and stakeholders of the longer-term benefits and opportunities.
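
The workload-assignment point is the one I'd most like to see worked through. Even a toy version of the logic, sketched below with invented providers and prices, immediately runs into the accounting problem: you can only pick the 'cheapest' provider if the suppliers' differing charging models can be reduced to something comparable.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    free_cores: int
    price_per_core_hour: float  # in reality each supplier accounts differently


def assign(batch_cores: int, providers: list[Provider]) -> Provider:
    """Pick the cheapest provider that can still absorb the whole batch."""
    candidates = [p for p in providers if p.free_cores >= batch_cores]
    if not candidates:
        raise RuntimeError("no single provider has capacity; the batch would need splitting")
    return min(candidates, key=lambda p: p.price_per_core_hour)


providers = [
    Provider("provider-a", free_cores=128, price_per_core_hour=0.05),
    Provider("provider-b", free_cores=512, price_per_core_hour=0.07),
]
print(assign(200, providers).name)  # -> provider-b
```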

For future work, the Tech-Arch group carried out some analysis of candidate technologies, picking out SlipStream (open source, European) and enStratus (proprietary, US) from a list which also included OpenNebula and BonFIRE.

More flagships are on the way, so it will be interesting to follow how they develop.