Philip Kershaw: 2009

Thursday 6 August 2009

WSGI Architecture for NERC DataGrid Security

Over the past year I've been porting the Python-based security system for NERC DataGrid to a WSGI-based architecture. This is paying huge dividends in the modularity of the code and in the ability to apply flexible deployment configurations, especially when used together with Paste Deploy.

Key parts of the architecture are the authentication and authorisation handlers, triggered by the 401 and 403 HTTP response codes respectively, together with a URI-based access control policy. I've used the Python security middleware package AuthKit to help me put this together. One thing I've been meaning to do is to lay this out in a simple example. This first snippet gives an overview:


from authkit.authenticate.multi import MultiHandler
from paste.httpserver import serve

# Innermost layer: apply the URI based access control policy to myApp
app = AuthorisationPolicyMiddleware(myApp)

# Intercept "401 Unauthorized" responses
app = MultiHandler(app)
app.add_method("checkerID", AuthenticationHandlerMiddleware)
app.add_checker("checkerID", AuthenticationHandlerMiddleware.trigger)

# Intercept "403 Forbidden" responses
app = MultiHandler(app)
app.add_method("checkerID", AuthorisationHandlerMiddleware)
app.add_checker("checkerID", AuthorisationHandlerMiddleware.trigger)

serve(app, host='0.0.0.0', port=9080)

The application to be protected, myApp, is a WSGI callable defined elsewhere. It is wrapped in a number of pieces of middleware, chained together to form a pipeline that intercepts requests to the application. On the last line it is served using Paste.

The first middleware component listed, AuthorisationPolicyMiddleware, checks the requested URI against a policy. If the user is not authorised, it sets an HTTP "403 Forbidden" response, bypassing myApp.

Following this, there are two pieces of middleware making use of AuthKit's authkit.authenticate.multi.MultiHandler. The MultiHandler accepts two key inputs: a checker, a function which determines the criteria for intercepting a request, and a method, a piece of WSGI middleware which determines what action to take once an intercept has been made.

In the first case, a class method AuthenticationHandlerMiddleware.trigger has been defined to intercept HTTP "401 Unauthorized" status codes. The AuthenticationHandlerMiddleware itself determines the action taken:

import logging
log = logging.getLogger(__name__)

class AuthenticationHandlerMiddleware(object):
    """Handler for HTTP 401 Unauthorized responses"""

    triggerStatus = "401 Unauthorized"

    def __init__(self, global_conf, **app_conf):
        pass

    def __call__(self, environ, start_response):
        log.info("AuthenticationHandlerMiddleware access denied response ...")
        response = "HTTP 401 Unauthorised response intercepted"
        start_response('200 OK', [('Content-type', 'text/plain'),
                                  ('Content-length', str(len(response)))])
        return [response]

    @classmethod
    def trigger(cls, environ, status, headers):
        """Return True if the response status should be intercepted"""
        if status == cls.triggerStatus:
            log.info("Authentication Trigger caught status [%s]",
                     cls.triggerStatus)
            return True
        else:
            return False

In the above, the middleware simply outputs a message but it effectively provides a hook to trigger a login or other authentication interface.

A second MultiHandler is in place to handle HTTP "403 Forbidden" responses. This follows a similar pattern:


class AuthorisationHandlerMiddleware(object):
    """Handler for HTTP 403 Forbidden responses"""

    triggerStatus = "403 Forbidden"

    def __init__(self, global_conf, **app_conf):
        pass

    def __call__(self, environ, start_response):
        log.info("AuthorisationHandlerMiddleware access denied response ...")
        response = "HTTP 403 Forbidden response intercepted"
        start_response('200 OK', [('Content-type', 'text/plain'),
                                  ('Content-length', str(len(response)))])
        return [response]

    @classmethod
    def trigger(cls, environ, status, headers):
        if status == cls.triggerStatus:
            log.info("Authorisation Trigger caught status [%s]",
                     cls.triggerStatus)
            return True
        else:
            return False

The trigger method returns True to signal to the MultiHandler that it should intercept the response and invoke AuthorisationHandlerMiddleware to deliver an access denied message.
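To make the checker and method roles concrete, here is a minimal sketch of the interception idea. It is not AuthKit's implementation: StatusInterceptMiddleware, is_forbidden, denied_handler, header_list and call are hypothetical names made up for illustration. It assumes Python 3 (so response bodies are bytes) and an application that calls start_response before it returns.

```python
class StatusInterceptMiddleware(object):
    """Hypothetical, simplified stand-in for a checker/method pairing."""

    def __init__(self, app, checker, handler):
        self.app = app          # downstream WSGI application
        self.checker = checker  # callable(environ, status, headers) -> bool
        self.handler = handler  # WSGI callable used when the checker fires

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold the downstream response back until the checker has seen it
            captured['status'] = status
            captured['headers'] = headers

        body = self.app(environ, capture)
        if self.checker(environ, captured['status'], captured['headers']):
            # Discard the downstream response and let the handler answer
            if hasattr(body, 'close'):
                body.close()
            return self.handler(environ, start_response)

        # Checker did not fire: pass the original response through unchanged
        start_response(captured['status'], captured['headers'])
        return body


def header_list(body):
    return [('Content-Type', 'text/plain'),
            ('Content-Length', str(len(body)))]


def forbidden_app(environ, start_response):
    body = b"403 Forbidden"
    start_response("403 Forbidden", header_list(body))
    return [body]


def ok_app(environ, start_response):
    body = b"hello"
    start_response("200 OK", header_list(body))
    return [body]


def denied_handler(environ, start_response):
    body = b"Access denied"
    start_response("200 OK", header_list(body))
    return [body]


def is_forbidden(environ, status, headers):
    return status == "403 Forbidden"


def call(app, path):
    """Invoke a WSGI app in-process and return (status, body)."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen['status'] = status

    body = b"".join(app({'PATH_INFO': path}, start_response))
    return seen['status'], body


intercepted = call(
    StatusInterceptMiddleware(forbidden_app, is_forbidden, denied_handler), '/')
untouched = call(
    StatusInterceptMiddleware(ok_app, is_forbidden, denied_handler), '/')
```

The checker only ever sees the status and headers, which is why a simple string comparison against "403 Forbidden" or "401 Unauthorized" is enough to drive the trigger methods shown above.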

This next snippet shows myApp, which is effectively a test harness to demonstrate the middleware behaviour:

def myApp(environ, start_response):
    """Test application to be secured"""

    if environ['PATH_INFO'] == "/test_401":
        status = "401 Unauthorized"
        response = status

    elif environ['PATH_INFO'] == "/test_403":
        status = "403 Forbidden"
        response = status

    elif environ['PATH_INFO'] == "/test_secured":
        # Only reached if AuthorisationPolicyMiddleware lets the request through
        status = "200 OK"
        response = "Secured URI"

    else:
        status = "404 Not Found"
        response = status

    log.info("Application is setting [%s] response..." % status)
    start_response(status,
                   [('Content-type', 'text/plain'),
                    ('Content-length', str(len(response)))])

    return [response]

As set up above,

http://localhost:9080/test_401 will trigger the authentication middleware and
http://localhost:9080/test_403 the authorisation middleware.
The last, http://localhost:9080/test_secured, demonstrates the access control policy implemented in AuthorisationPolicyMiddleware:

class AuthorisationPolicyMiddleware(object):
    """Apply a security policy based on the URI requested"""

    def __init__(self, app):
        self.securedURIs = ['/test_secured']
        self.app = app

    def __call__(self, environ, start_response):
        if environ['PATH_INFO'] in self.securedURIs:
            log.info("Path [%s] is restricted by the Authorisation policy" %
                     environ['PATH_INFO'])
            status = "403 Forbidden"
            response = status
            start_response(status, [('Content-type', 'text/plain'),
                                    ('Content-length', str(len(response)))])
            return [response]
        else:
            return self.app(environ, start_response)

The middleware has a policy consisting of a list of URIs to be secured, held in the securedURIs attribute. In practice this could link to a policy file, a database or some other interface. The __call__ method intercepts requests for URIs which match the policy and returns an HTTP 403 response. This in turn brings into play the AuthorisationHandlerMiddleware handler, triggering it to return an access denied response.
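As a rough illustration of how the policy could sit behind some other interface, the sketch below passes the policy in as a callable rather than hard-coding a list. PluggablePolicyMiddleware, is_restricted, open_app and request are hypothetical names, not part of the NERC DataGrid code, and the example is written for Python 3 (bytes bodies).

```python
class PluggablePolicyMiddleware(object):
    """Hypothetical variant: the policy is any callable taking the path."""

    def __init__(self, app, is_restricted):
        self.app = app
        self.is_restricted = is_restricted

    def __call__(self, environ, start_response):
        if self.is_restricted(environ.get('PATH_INFO', '')):
            body = b"403 Forbidden"
            start_response("403 Forbidden",
                           [('Content-Type', 'text/plain'),
                            ('Content-Length', str(len(body)))])
            return [body]
        return self.app(environ, start_response)


def open_app(environ, start_response):
    """Stand-in for the application being protected."""
    body = b"open"
    start_response("200 OK", [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


# The policy is a set lookup here; a file or database query could be
# wrapped in a function with the same one-argument signature.
restricted_paths = frozenset(['/test_secured'])
app = PluggablePolicyMiddleware(open_app, restricted_paths.__contains__)


def request(app, path):
    """Call a WSGI app in-process and return (status, body)."""
    seen = {}

    def start_response(status, headers, exc_info=None):
        seen['status'] = status

    body = b"".join(app({'PATH_INFO': path}, start_response))
    return seen['status'], body


secured = request(app, '/test_secured')
other = request(app, '/anything_else')
```

Keeping the policy behind a plain callable means the middleware itself does not need to change when the source of the policy does.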

The complete example is in the NERC DataGrid Subversion repository.

Monday 3 August 2009

Python List Utility Classes

I've been adapting some Java code to Python recently and wanted some tighter control over list elements than the default list type provides. I've extended list with two custom classes. The first, TypedList, restricts list elements to a given type or types, e.g.

>>> t=TypedList(float)
>>> t+=[9]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 34, in __iadd__
TypeError: List items must be of type float

The existing array type in the Standard Library gives a similar capability but is restricted to a limited set of primitive types. With TypedList you can put in any type ...


class TypedList(list):
    """Extend list type to allow only items of a given type.  Supports
    any type, whereas the array type in the Standard Library is restricted
    to only a limited set of primitive types
    """

    def __init__(self, elementType, *arg, **kw):
        """@type elementType: type/tuple
        @param elementType: object type or types which the list is allowed to
        contain.  If more than one type, pass as a tuple
        """
        self.__elementType = elementType
        super(TypedList, self).__init__(*arg, **kw)

    def _getElementType(self):
        return self.__elementType

    elementType = property(fget=_getElementType,
                           doc="The allowed type or types for list elements")

    def extend(self, iter):
        # Take a copy first so that a one-shot iterator is not exhausted by
        # the type check before the items are added
        items = list(iter)
        for i in items:
            if not isinstance(i, self.__elementType):
                raise TypeError("List items must be of type %s" %
                                (self.__elementType,))

        return super(TypedList, self).extend(items)

    def __iadd__(self, iter):
        # Copy for the same reason as in extend()
        items = list(iter)
        for i in items:
            if not isinstance(i, self.__elementType):
                raise TypeError("List items must be of type %s" %
                                (self.__elementType,))

        return super(TypedList, self).__iadd__(items)

    def append(self, item):
        if not isinstance(item, self.__elementType):
            raise TypeError("List items must be of type %s" %
                            (self.__elementType,))

        return super(TypedList, self).append(item)
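For comparison, this short sketch shows how the Standard Library array type mentioned above behaves. The variable names are only for illustration. Note one difference from the TypedList(float) session earlier: array('d') quietly converts an int to a float instead of rejecting it.

```python
from array import array

doubles = array('d')   # 'd' is the typecode for C doubles
doubles.append(9)      # an int is accepted and stored as a float
coerced = doubles[0]

try:
    doubles.append('nine')   # a str cannot be converted to a double
    rejected_str = False
except TypeError:
    rejected_str = True

try:
    array(dict)   # arbitrary Python types are not valid typecodes
    accepts_any_type = True
except TypeError:
    accepts_any_type = False
```

So array enforces a type per typecode, but only for the fixed set of primitive typecodes it defines, which is the gap TypedList is meant to fill.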

For the second class I wanted a way of avoiding the addition of duplicate elements to a list:


>>> u=UniqList()
>>> u.append('a')
>>> u
['a']
>>> u.append('a')
>>> u
['a']

It silently ignores the duplicate element. It would be straightforward to alter to raise an exception if this was the preferred behaviour. Here's the class:


class UniqList(list):
    """Extended version of list type to enable a list with unique items.
    If an item is added that is already present then it is silently omitted
    from the list
    """
    def extend(self, iter):
        # Add items one at a time so that duplicates within iter itself
        # are also omitted
        for i in iter:
            self.append(i)

    def __iadd__(self, iter):
        self.extend(iter)
        return self

    def append(self, item):
        if item in self:
            return None

        return super(UniqList, self).append(item)
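If an exception is the preferred behaviour, the change might look something like the sketch below. StrictUniqList is a hypothetical name, and the choice of ValueError is an assumption rather than anything from the original code.

```python
class StrictUniqList(list):
    """Hypothetical variant of UniqList that rejects duplicates loudly."""

    def append(self, item):
        if item in self:
            raise ValueError("Duplicate list item: %r" % (item,))
        super(StrictUniqList, self).append(item)

    def extend(self, items):
        # Items before the first duplicate are kept if an error is raised
        for item in items:
            self.append(item)

    def __iadd__(self, items):
        self.extend(items)
        return self


s = StrictUniqList()
s.append('a')
s += ['b']

try:
    s.append('a')
    raised = False
except ValueError:
    raised = True
```

Routing extend and += through append keeps the duplicate check in one place.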

Thursday 23 July 2009

PyDev, PyLint and Refactoring

I've been using PyDev, the Eclipse plugin for Python, for some time now and it's certainly improving with each new release (I'm currently on 1.4.2). There are a couple of features I've been getting some benefit from recently.

The first is PyLint. I actually broke my Eclipse set-up for this a while ago, but getting it reinstated has reminded me how useful it is. One thing I hate about interpreted languages is the huge scope for runtime errors. The PyLint plugin gives immediate feedback in the margin of the editor window with error information and hints. This is saving me a lot of time down the line. What's the metric for time spent fixing a bug immediately vs. at the other end of the scale trying to pick through code on a production deployment? :) Picking up references to undeclared variables is particularly useful, especially in error blocks or other places that test coverage might miss. There's no doubt some messages can be annoying, and getting a mark out of 10 for your code might not appeal to all. Fortunately, you can edit the settings for the PyLint command line that's executed and explicitly filter out warnings you don't like.

The second feature I've revisited is the PyDev refactoring options. I have had a low expectation of these, perhaps unfairly, but there are a couple of potential time savers. Take this simple class:



class MyTest(object):
    def doSomething(self):
        self.__a = None
        self.__b = None
        self.__c = []

Left like this, PyLint will warn me that I haven't set up these attributes in an __init__ method. ;) PyDev refactoring to the rescue :) ... If I now right click in the editor and pick "Refactoring" -> "Generate Constructor using Fields ...", I'm taken through a series of steps in a dialog. I can get:


class MyTest(object):
    def __init__(self, a, b, c):
        self.__a = a
        self.__b = b
        self.__c = c

    def doSomething(self):
        self.__a = None
        self.__b = None
        self.__c = []

Not that exciting, but it could save some typing. More useful for me is the "Generate Properties..." option, which generates properties for new-style classes together with their getters and setters:


class MyTest(object):
    def __init__(self, a, b, c):
        self.__a = a
        self.__b = b
        self.__c = c

    def getA(self):
        return self.__a

    def getB(self):
        return self.__b

    def getC(self):
        return self.__c

    def setA(self, value):
        self.__a = value

    def setB(self, value):
        self.__b = value

    def setC(self, value):
        self.__c = value

    def delA(self):
        del self.__a

    def delB(self):
        del self.__b

    def delC(self):
        del self.__c

    def doSomething(self):
        self.__a = None
        self.__b = None
        self.__c = []

    a = property(getA, setA, delA, "A's Docstring")

    b = property(getB, setB, delB, "B's Docstring")

    c = property(getC, setC, delC, "C's Docstring")

It could be improved on - there are no explicit keyword arguments in the calls to property() - but it could save some time with the boilerplate. :)
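For reference, this is roughly what the generated code could look like with the property() arguments named explicitly. It is a hand-written sketch, not PyDev output, trimmed to a single attribute.

```python
class MyTest(object):
    def __init__(self, a):
        self.__a = a

    def getA(self):
        return self.__a

    def setA(self, value):
        self.__a = value

    def delA(self):
        del self.__a

    # Same property as in the generated code, with each argument named
    a = property(fget=getA, fset=setA, fdel=delA, doc="A's Docstring")


t = MyTest(1)
t.a = 2
value_after_set = t.a
```

Naming fget, fset, fdel and doc makes it easier to leave one out, for example to make a read-only property by passing only fget and doc.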
