Friday, April 15, 2011

On Google AppEngine and keeping one's options open

As I've mentioned previously, one thing I've been looking at recently is making sure I had a caching abstraction which would allow me to flick fairly seemlessly between caching implementations.

So far I have working :-

  • Hazelcast (using their distributed Map with ttl support).
  • Memcached (spymemcached client library and xmemcached client library).
  • Google AppEngine Memcached.
[note to self - add links]

Now the last one led me down an interesting road but caused me some pain, as both with hazelcast and memcached of normal proportion, I was able happily enough to have a gets and puts model using 'long' version Ids. Google AppEngine (GAE) memcached though, stuffed that plan, as for some reason they went 'clever' on the API and instead wrap the version identifier in an instance of an interface 'IdentifiableValue'.

Cue Tim going through and having to refactor the API *again* to change over in a similar fashion in order to be able to play nice with this.

Now I know there's going to be a collision with this API later on, and I'm definitely tending down a memcached style route for the caching, maybe eventually Hazelcast will sink quietly into the sunset, and I'll add some batch gets/puts and memcached counter support... but that's for the future, once I've got some way further with getting heavier into seeing what I need to do to get this trainset to scale.

... but this isn't what I mean by keeping my options open in the title of the post.

What I did realise was that firstly while I could build the GAE memcached client library, it would be largely pointless as things stood as you can only really use it if your app is within the GAE environment. Secondly I realised that the same testing of assumptions which shook things out with the caching component could have similar useful effects on the APIs of other components.

and thirdly ... whilst I doubt it'll ever see the light of day in production, if I've built up the app structure in a fashion compatible with GAE, I could actually have a backup destination of GAE for the application.

Oh how clever I felt ... right up until I realised just how many elements were there and needed GAE equivalents implementing... but I got through it, and can now deploy a working front end app to GAE, giving me that tiny bit of extra comfort that I don't have all my eggs in one basket.

There were some interesting elements though worth sharing with the different elements

1. Maven support
Last time I looked at Google Appengine, integrating with maven was clunky to say the least. This has definitely improved, and whilst I did get a little caught out and confused with the SDK installation as a mavenised thing (hint: gae:unpack), needed before doing handy things like uploading a built war project to appengine (hint: gae:update), it made sense in the end.

2. Deployments
The archtecture split up is certainly different due to appengine's restrictions, and because essentially its elephants all the way down (i.e. webapps). What I had intended to split out as a backend webapp splatting out jobs into the job queue using spring-quartz, instead in Appengine fits better as handler REST resources triggered by Appengine cron.xml task definitions in WEB.INF... not necessarily better, or worse, just different. I may come to like it quite a bit, who knows.

Also, I had been heavily tilted towards a parameterized deployment in the style favoured by the likes of Amazon Elastic Beanstalk. Handily enough, that can be largely matched in Appengine by putting environment variables into the appengine.xml file in WEB-INF, but I could see if I really wanted to use it, I'd probably want to do some more work to get these params injected in via the maven build so I could use maven to say 'clean build update -D<some options>' to build and deploy specific environments - right now its hardcoded for a particular env setup in the xml files ... which is naughty, but certainly fixable.

3. Task queues
The Appengine version of queueing service made my head twist and spin significantly. Going from easy enough SQS or JMS queues to generating tasks as URLs which then get integrated by providing the appropriate handler and chugging on that seems simple enough, but caught me out in unexpected ways, particularly with functional testing.

In theory, there's the simplified process for generating DeferredTasks, but it generates a problem itself, in that the tasks run essentially by implementing 'run' by implementing 'DeferredTask' would have to somehow get their service contexts re-injected - turned out for my purposes easier to just take my medicine and implement a handler to hook into a servlet, one way or another.

4. The big daddy - GAE Datastore
I looked hard and squinty eyed at the various wrappers for ORM or whatever around GAE's very bigtable-ish datastore, but in the end found I had the easiest run at it just using the low-level API.

I had looked into this before through the lens of JPA, and got myself all kinds of confused. Perhaps its been the experience of working with SimpleDB for the last year or so, and escaping the evil clutches of SQL, but it all seemed to make a great deal of sense, and the combination of a little bit of transactionality (and bubbling back up to the app if you need to have another go at it due to transaction conflicts), a simple but rather powerful query model, modelled in straight java (I actually prefer this to having to assemble the SimpleDB selects), and the actual underlying schemaless datatastore 'as a service' was a very quick port from SimpleDB, and for the most part felt very 'right'.

So ... quite liking Google's AppEngine right now, and the tooling around it was a bit of a pleasure to work with.

Bet if I REALLY had to use it in production I'd find some fun niggles though :D


Monday, April 4, 2011

What's in your <license>?

Not very long ago, and not very far away, a conversation opened up about checking the licenses used on dependencies, to avoid issues with pulling in dependencies with toxic licensing in relation to commercial work, or at least pulling them in by accident.

Whilst I suspect this is not exactly a 'new' issue, its certainly one which people haven't been particularly inclined to deal with in a systematic way in the past.

Now, as I'm now doing a lot of my work inside the bounding box of structured projects, and as I had indeed seen a dependency pull in with GPLish consequences, I thought I'd look-see if maven could actually be our friend here, and help deal with this systematic issue - heck its dealing with dependency pulls in a systematic way, so why not the license checks too?

Well as it turns out, inside the structure of a maven pom, there is a licenses element, which can contain one (well actually zero but lets not be picky) or more license strings, like this :-

       <name>The Apache Software License, Version 2.0</name>  

Great! I'll just go dig for the maven plugin to validate my project dependencies against their license definitions and bob will, indeed, become my father's brother.

Well, it turns out, not so much.

First of all, it didn't look like there was such a plugin (apart from Apache RAT Maven plugin, but that's specifically to Apache projects) - nevermind, I should be able put something together ... so...

.. I did, and here it is the maven-license-validator-plugin.

I'm not going to bang this drum too hard, as its actually very little code at all - very very little indeed. But it works. Which is nice.

Anyway, that being done, I've come to realise just how much of a fricking mess the whole 'license' element is in in maven.

Take a look at this :-

             <value>Apache License v2</value>  
             <value>Common Public License Version 1.0</value>  
             <value>The Apache Software License, Version 2.0</value>  
             <value>The Apache Software License, Version 2.0</value>  
             <value>Apache Software License - Version 2.0</value>  
             <value>Apache License, Version 2.0</value>  
             <value>Apache License Version 2.0</value>  
             <value>Apache License</value>  
             <value>Apache 2</value>  
             <value>CDDL 1.1</value>  
             <value>COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL) Version 1.0</value>  
             <value>Common Development and Distribution License (CDDL) v1.0</value>  
             <value>Public Domain</value>  
             <value>Bouncy Castle Licence</value>  
             <value>BSD style</value>  
             <value>Google Web Toolkit Terms</value>  
             <value>ICU License</value>  
             <value>Revised BSD</value>  
             <value>org.slf4j:slf4j-api:jar:1.5.6</value> <!-- no license info - confirmed at MIT Licensed -->  

1. How many ways are there to write 'Apache License V2'?! I know I could write up funky regexp to support picking out the variants being used and reduce the number of elements, but <sigh/>, and in any case I'd be worried to then have an overly-relaxed check miss some strange wording

2. Sun/Oracle and not putting any license in their maven artifacts - now as I understand it getting those artifacts into Central was a fight ... but no license ... grr.

Anyway - I'm seeing vibes indicating that this is an area the gods of Maven are starting to focus on, but as I can see it, its going to be a long hard road towards cleaning this area of maven up and reaching that clear nirvana of easily being able to go 'yep', licenses are a-ok.