Friday 23 March 2007

Web Services and securing access to large datasets

One of the challenges for the DEWS project has been the transfer of large datasets using web services. This is a well-known problem in the web services world, but it is complicated in this case by how we secure the transfer. The conflict in our case comes here:

  • For DEWS we want to use standards where possible. For securing web services this has been interpreted as using SOAP with WS-Security. We would like to use digital signatures where possible. [Message confidentiality may be required but is likely to be impracticable for transfer of large datasets for performance reasons. We want to be able to authenticate clients downloading data, but we accept that we can't stop an attacker listening on a port as data is accessed. This may sound weak, but at least if clients are authenticated an attacker can't initiate retrieval of a dataset of their choosing. They can only listen out for an authorised user's request.]
  • ... but we're using Geoserver (an implementation of OGC Services) to serve some of our datasets. It doesn't use SOAP (it uses a more RESTful style of interface) and it's not secured.
To address this we've created a Gatekeeper web service. This acts as a proxy for Geoserver, protecting it by applying security constraints to requests and passing only the valid ones on to Geoserver. We've built the Gatekeeper as a standard WebSphere based web service. It uses WS-Security to give us digital signatures and can be run over HTTPS to give confidentiality where necessary.

Transfer of a large binary dataset is fine if you leave out the Gatekeeper and our adoption of WS-Security(!): Geoserver receives an HTTP request by GET or POST and returns the binary data back along the same channel, setting the content type so that the client will understand it.

Now with the Gatekeeper and a SOAP interface placed between Geoserver and the client it's not as straightforward. Tooling like WebSphere generates client and server stubs that expect SOAP in, SOAP out, not SOAP in, binary data out as we would want.

Unfortunately we've needed to make some decisions on this problem quickly. The options we've considered are:

  • SOAP in, binary data out: adapt the interface to return binary data instead of a SOAP response.
  • Use SOAP attachments.
  • Do the transfer out of band of the web service: the SOAP response returns a data download URI for later retrieval by the client.

Taking them in turn...

SOAP in Binary Data out

Tooling for SOAP based web services creates interfaces that expect SOAP, not binary data. To go with this option we would need some way to modify the server output and the client's handling of the response. That part of the stack is effectively a black box in WebSphere, and I suspect in alternatives such as Axis too. Even if we got it working, it's not a standards based solution.

SOAP Attachments

This keeps us standards based, but which technology do we adopt? MTOM seems the one to go for, but no one on the technical team has much experience with it or with any of the alternatives for SOAP attachments. Or is there something else?
More importantly, can it cope with 120 MB datasets(!)? Probably not, but maybe we could divide the dataset into smaller chunks. Also, how do we handle error recovery? Is there a built-in mechanism? Taking both these things into account implies some intelligence on the part of the client and so a more heavyweight solution on the client side. This is something we want to avoid. The client should be as simple as possible so that potential users of the system have minimum overhead to get set up and connected.
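
To make the chunking idea concrete, here is a minimal Python sketch. It is not MTOM and not project code: it only shows splitting a binary stream into fixed-size chunks and recording a SHA-256 digest for each, so that a receiver could detect a damaged chunk and ask for just that one again. The chunk size and the function names `iter_chunks` and `verify_chunk` are assumptions made for illustration.

```python
import hashlib
import io

CHUNK_SIZE = 8 * 1024 * 1024  # assumed chunk size (8 MiB); not a project value


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (index, data, sha256_hex) for each chunk read from a binary stream."""
    index = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield index, data, hashlib.sha256(data).hexdigest()
        index += 1


def verify_chunk(data, expected_digest):
    """Return True if a received chunk matches the digest the sender advertised."""
    return hashlib.sha256(data).hexdigest() == expected_digest


if __name__ == "__main__":
    # Tiny in-memory stand-in for a dataset, split into 4-byte chunks.
    for index, data, digest in iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4):
        print(index, len(data), verify_chunk(data, digest))
```

Even this small amount of bookkeeping (tracking chunk indexes, comparing digests, re-requesting failures) would have to live in the client, which is exactly the overhead we would like to avoid.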

We definitely need to know more about the limitations of this solution when applied to large datasets.

Out of Band Transfer

Handle the transfer of the binary data outside the bounds of the web service by returning to the client a URI to the dataset.

The advantages are that it's a simple solution and there are existing tools to handle download and error recovery. There are immediate security concerns with it though. An attacker could snoop the URI of the dataset returned by the Gatekeeper. This first problem can easily be addressed by encrypting the channel using HTTPS. That still leaves the main problem with this idea: how does the server authenticate the client making a request to the URI? If it can't do this, anyone can download the data.
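
On the error recovery point, plain HTTP already offers a way to resume an interrupted download through the standard Range request header. The following Python sketch illustrates that; it assumes the server supports range requests, it shows no authentication at all, and the function names and the example URL are hypothetical.

```python
import os
import urllib.request


def build_resume_request(url, partial_path):
    """Build a GET request that resumes after the bytes already saved on disk."""
    request = urllib.request.Request(url, method="GET")
    already = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    if already > 0:
        # Standard HTTP Range header: ask for everything from byte `already` on.
        request.add_header("Range", f"bytes={already}-")
    return request, already


def resume_download(url, partial_path, block_size=64 * 1024):
    """Fetch the rest of the resource into partial_path, restarting if needed."""
    request, _ = build_resume_request(url, partial_path)
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the range; 200 means it is sending
        # the whole resource again, so the partial file is overwritten.
        mode = "ab" if response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            while True:
                block = response.read(block_size)
                if not block:
                    break
                out.write(block)
```

A client would call something like `resume_download("https://example.org/data.bin", "data.bin.part")` again after a dropped connection. Off-the-shelf download tools do the same thing, which is part of the appeal of this option.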

Added to this is the fact that the preferred method of retrieval from Geoserver in our case is GET rather than POST. GET is preferred because it will be more straightforward for the client. This decision rules out one possible solution, which is to use XML security to sign the POST message. Besides, if we went with this, why not dispense with the Gatekeeper's SOAP interface and use a REST style one? ... but we've already said that we want to stick to standards where possible and WS-Security is what we've picked here.

That leaves the problem of trying to secure a GET request. One alternative we considered was authentication by client IP address, but this quickly runs into problems when you consider that some clients can't guarantee a consistent IP address or operate behind a site proxy.

Another approach is for the client to sign some component of the GET request:
  1. Client makes authorisation request to Gatekeeper over SOAP interface
  2. Gatekeeper checks and returns the URI for download of the data and a unique security token
  3. Client makes a GET request to the URI sending arguments including the security token, a signature of the token and the Distinguished Name of the X.509 certificate to be used to verify the signature. (We should perhaps also include the name of the signature algorithm)
  4. The server checks that the security token is valid
  5. The server checks the DN against the certificates held in a key store.
  6. If recognised it verifies the signature using the public key from the certificate -> if verified, it knows the request has come from a valid source.
All GET arguments are URL-safe base64 encoded.
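
Steps 3 to 6 can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the project's test script or production code: the argument names (`token`, `sig`, `dn`, `alg`), the function names and the default algorithm name are all made up for the example. Because the Python standard library has no public key signing, the actual signing and verification are passed in as callables, which in practice would wrap an RSA implementation and the certificate's public key.

```python
import base64
from urllib.parse import parse_qs, urlencode, urlsplit


def b64url(raw):
    """URL-safe base64 encode bytes to an ASCII string."""
    return base64.urlsafe_b64encode(raw).decode("ascii")


def unb64url(text):
    """Decode a URL-safe base64 string back to bytes."""
    return base64.urlsafe_b64decode(text.encode("ascii"))


def build_download_url(uri, token, cert_dn, sign, algorithm="SHA1withRSA"):
    """Client side (step 3): add token, signature, DN and algorithm as GET args.

    `sign` is a caller-supplied function: token bytes -> signature bytes.
    """
    args = {
        "token": b64url(token),
        "sig": b64url(sign(token)),
        "dn": b64url(cert_dn.encode("utf-8")),
        "alg": b64url(algorithm.encode("ascii")),
    }
    separator = "&" if "?" in uri else "?"
    return uri + separator + urlencode(args)


def check_download_request(url, valid_tokens, keystore):
    """Server side (steps 4 to 6): True only if token, DN and signature check out.

    `keystore` maps a DN to a verify(token, signature, algorithm) callable,
    standing in for a lookup of the certificate's public key.
    """
    query = parse_qs(urlsplit(url).query)
    try:
        token = unb64url(query["token"][0])
        signature = unb64url(query["sig"][0])
        cert_dn = unb64url(query["dn"][0]).decode("utf-8")
        algorithm = unb64url(query["alg"][0]).decode("ascii")
    except (KeyError, ValueError):
        return False  # missing or malformed argument
    if token not in valid_tokens:  # step 4: is the token one we issued?
        return False
    verify = keystore.get(cert_dn)  # step 5: do we recognise the DN?
    if verify is None:
        return False
    return bool(verify(token, signature, algorithm))  # step 6
```

Sending only the DN keeps the URL short, at the cost of requiring the server to already hold every client certificate in its key store.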

I've tried this out with a simple Python test script; the production code will be Java. If we implement the server side with a servlet, I'm told we can add extra security constraints, such as a time limit on its duration, and we can log when a download starts and stops.
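
One way to read the time limit is as a lifetime on the security token itself. The Python sketch below takes that reading, and additionally assumes each token may be used only once; both are assumptions for illustration, as are the class name `TokenStore` and the five minute default. The production servlet would be Java and may well work differently.

```python
import secrets
import time


class TokenStore:
    """Issue single-use download tokens that expire after a fixed lifetime."""

    def __init__(self, lifetime_seconds=300, clock=time.monotonic):
        self.lifetime = lifetime_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._expiry = {}   # token bytes -> time after which it is rejected

    def issue(self):
        """Create a fresh random token and remember when it expires."""
        token = secrets.token_bytes(32)
        self._expiry[token] = self.clock() + self.lifetime
        return token

    def redeem(self, token):
        """Return True once for a known, unexpired token; False otherwise."""
        expires_at = self._expiry.pop(token, None)
        return expires_at is not None and self.clock() <= expires_at
```

The Gatekeeper would call `issue()` when it returns the download URI, and the download handler would call `redeem()` before sending any data, which is also a natural place to log the start of a download.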

This is not a standards based solution; it's a custom one. But an initial look at Google Authentication has given me reason to believe it uses something similar:

http://code.google.com/apis/accounts/AuthForWebApps.html

Besides this, all the security steps up to this point (attribute retrieval, role mapping and authorization) are achieved over SOAP based services secured with WS-Security. Only the final step is different.

The client side will require some coding in order to generate the digital signature but this should be straightforward using the relevant Java OpenSSL library.

This is the chosen solution unless SOAP with attachments proves much, much easier.
