Friday, April 15, 2011

On Google AppEngine and keeping one's options open

As I've mentioned previously, one thing I've been looking at recently is making sure I have a caching abstraction which allows me to flick fairly seamlessly between caching implementations.

So far I have working :-

  • Hazelcast (using their distributed Map with ttl support).
  • Memcached (spymemcached client library and xmemcached client library).
  • Google AppEngine Memcached.
[note to self - add links]

Now the last one led me down an interesting road but caused me some pain. With both Hazelcast and ordinary memcached, I was happily able to have a gets and puts model using 'long' version IDs. Google AppEngine (GAE) memcached stuffed that plan though, as for some reason they went 'clever' on the API and instead wrap the version identifier in an instance of an interface, 'IdentifiableValue'.

Cue Tim going through and having to refactor the API *again* to change over in a similar fashion in order to be able to play nice with this.
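To make that refactor a bit more concrete, here is a minimal sketch of what an opaque-handle API could look like. All of the names (VersionedValue, CasCache, InMemoryCasCache) are made up for illustration and are not the actual API from my codebase; the in-memory backend exists only so the sketch runs on its own. The idea is that callers never see a raw 'long', so a GAE-backed implementation could plausibly wrap the 'IdentifiableValue' it gets from getIdentifiable and hand it back to putIfUntouched.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Opaque handle returned by gets(); callers pass it back unchanged to cas(). */
interface VersionedValue<V> {
    V getValue();
}

/** Hypothetical cache abstraction: no raw 'long' version ids leak to callers. */
interface CasCache<K, V> {
    VersionedValue<V> gets(K key);
    void put(K key, V value);
    boolean cas(K key, VersionedValue<V> previous, V newValue);
}

/** In-memory stand-in backend, present only to make the sketch runnable. */
final class InMemoryCasCache<K, V> implements CasCache<K, V> {

    /** The version id lives inside the handle, where each backend can shape it freely. */
    private static final class Entry<V> implements VersionedValue<V> {
        final V value;
        final long version;

        Entry(V value, long version) {
            this.value = value;
            this.version = version;
        }

        @Override
        public V getValue() {
            return value;
        }
    }

    private final ConcurrentMap<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final AtomicLong versions = new AtomicLong();

    @Override
    public VersionedValue<V> gets(K key) {
        return store.get(key); // null when the key is absent
    }

    @Override
    public void put(K key, V value) {
        store.put(key, new Entry<>(value, versions.incrementAndGet()));
    }

    @Override
    public boolean cas(K key, VersionedValue<V> previous, V newValue) {
        if (!(previous instanceof Entry)) {
            return false; // handle did not come from this backend
        }
        Entry<V> expected = (Entry<V>) previous;
        // Entry does not override equals, so this only succeeds if the stored
        // entry is still the exact one the caller read.
        return store.replace(key, expected, new Entry<>(newValue, versions.incrementAndGet()));
    }
}
```

A caller does a gets, works out the new value, then calls cas with the handle it was given; a false return means somebody else got there first and it is time to re-read and try again.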

Now I know there's going to be a collision with this API later on, and I'm definitely tending down a memcached style route for the caching, maybe eventually Hazelcast will sink quietly into the sunset, and I'll add some batch gets/puts and memcached counter support... but that's for the future, once I've got some way further with getting heavier into seeing what I need to do to get this trainset to scale.

... but this isn't what I mean by keeping my options open in the title of the post.

What I did realise was that firstly while I could build the GAE memcached client library, it would be largely pointless as things stood as you can only really use it if your app is within the GAE environment. Secondly I realised that the same testing of assumptions which shook things out with the caching component could have similar useful effects on the APIs of other components.

and thirdly ... whilst I doubt it'll ever see the light of day in production, if I've built up the app structure in a fashion compatible with GAE, I could actually have a backup destination of GAE for the application.

Oh how clever I felt ... right up until I realised just how many elements were there and needed GAE equivalents implementing... but I got through it, and can now deploy a working front end app to GAE, giving me that tiny bit of extra comfort that I don't have all my eggs in one basket.

There were some interesting points worth sharing about the different elements though:

1. Maven support
Last time I looked at Google Appengine, integrating with maven was clunky to say the least. This has definitely improved, and whilst I did get a little caught out and confused with the SDK installation as a mavenised thing (hint: gae:unpack), needed before doing handy things like uploading a built war project to appengine (hint: gae:update), it made sense in the end.

2. Deployments
The architecture split is certainly different due to Appengine's restrictions, and because essentially it's elephants all the way down (i.e. webapps). What I had intended to split out as a backend webapp splatting out jobs into the job queue using spring-quartz instead fits better in Appengine as handler REST resources triggered by Appengine cron.xml task definitions in WEB-INF... not necessarily better, or worse, just different. I may come to like it quite a bit, who knows.

Also, I had been heavily tilted towards a parameterized deployment in the style favoured by the likes of Amazon Elastic Beanstalk. Handily enough, that can be largely matched in Appengine by putting environment variables into the appengine-web.xml file in WEB-INF, but I could see if I really wanted to use it, I'd probably want to do some more work to get these params injected in via the maven build so I could use maven to say 'clean build update -D<some options>' to build and deploy specific environments - right now it's hardcoded for a particular env setup in the xml files ... which is naughty, but certainly fixable.
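As a rough illustration of the parameterized style, something like the following keeps the app code ignorant of where the parameters came from. This is a sketch only: DeploymentConfig and the CACHE_BACKEND parameter name are hypothetical, and it assumes the values declared in the deployment descriptor surface through System.getenv, as they do on Beanstalk-style setups.

```java
import java.util.Map;

/** Hypothetical holder for deployment parameters; not tied to any one platform. */
final class DeploymentConfig {
    private final Map<String, String> env;

    DeploymentConfig(Map<String, String> env) {
        this.env = env;
    }

    /** Assumption: the platform exposes the parameters as process environment variables. */
    static DeploymentConfig fromSystem() {
        return new DeploymentConfig(System.getenv());
    }

    /** Returns the parameter, or the fallback when it is unset or empty. */
    String get(String name, String fallback) {
        String value = env.get(name);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    /** Fails fast at startup rather than limping along with a missing setting. */
    String require(String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Missing deployment parameter: " + name);
        }
        return value;
    }
}
```

Passing the map in through the constructor means tests can feed in whatever environment they like, and swapping the source (system properties, a maven-filtered file) later only touches the factory method.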

3. Task queues
The Appengine version of queueing service made my head twist and spin significantly. Going from easy enough SQS or JMS queues to generating tasks as URLs which then get integrated by providing the appropriate handler and chugging on that seems simple enough, but caught me out in unexpected ways, particularly with functional testing.

In theory, there's the simplified process of generating DeferredTasks, but that generates a problem of its own: tasks written by implementing 'run' on a 'DeferredTask' would have to somehow get their service contexts re-injected. For my purposes it turned out easier to just take my medicine and implement a handler hooked into a servlet, one way or another.
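The re-injection problem is easier to see in code. The sketch below uses a plain serializable Runnable as a stand-in for a deferred task (it does not touch the real GAE classes), and every name in it (Indexer, ReindexTask, ServiceRegistry, TaskCodec) is hypothetical. A task gets serialized when it is queued and rebuilt before it runs, so any injected service held in a transient field comes back as null; the static registry lookup shown here is one possible workaround, not the route I took.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical service that a task needs at run time. */
interface Indexer {
    void index(String documentId);
}

/** Hypothetical static lookup, standing in for "get me back into my service context". */
final class ServiceRegistry {
    private static final Map<String, Object> SERVICES = new ConcurrentHashMap<>();

    private ServiceRegistry() {}

    static void register(String name, Object service) {
        SERVICES.put(name, service);
    }

    static <T> T lookup(String name, Class<T> type) {
        Object service = SERVICES.get(name);
        if (service == null) {
            throw new IllegalStateException("No service registered as " + name);
        }
        return type.cast(service);
    }
}

/** Stand-in for a deferred task: serialized on enqueue, deserialized before run(). */
final class ReindexTask implements Runnable, Serializable {
    private static final long serialVersionUID = 1L;

    private final String documentId;   // plain data survives the round trip
    private transient Indexer indexer; // injected service does not

    ReindexTask(String documentId, Indexer indexer) {
        this.documentId = documentId;
        this.indexer = indexer;
    }

    boolean hasInjectedIndexer() {
        return indexer != null;
    }

    @Override
    public void run() {
        // After deserialization the field is null, so fall back to the registry.
        Indexer target = (indexer != null)
                ? indexer
                : ServiceRegistry.lookup("indexer", Indexer.class);
        target.index(documentId);
    }
}

/** Simulates what a queue does to a task between enqueue and execution. */
final class TaskCodec {
    private TaskCodec() {}

    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T task) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Task could not be round-tripped", e);
        }
    }
}
```

Static lookups like this work but sit awkwardly next to a dependency-injected app, which is a good part of why a servlet handler that already lives inside the service context felt like the simpler medicine.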

4. The big daddy - GAE Datastore
I looked hard and squinty eyed at the various wrappers for ORM or whatever around GAE's very bigtable-ish datastore, but in the end found I had the easiest run at it just using the low-level API.

I had looked into this before through the lens of JPA, and got myself all kinds of confused. Perhaps it's been the experience of working with SimpleDB for the last year or so, and escaping the evil clutches of SQL, but this time it all seemed to make a great deal of sense. The combination of a little bit of transactionality (and bubbling back up to the app if you need to have another go at it due to transaction conflicts), a simple but rather powerful query model, modelled in straight Java (I actually prefer this to having to assemble the SimpleDB selects), and the actual underlying schemaless datastore 'as a service' made for a very quick port from SimpleDB, and for the most part felt very 'right'.
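The "have another go at it" part can be sketched as a small retry helper. This is an illustration rather than my actual code: TransactionRetry is a made-up name, and the unit of work is assumed to begin, use and commit its own datastore transaction. It leans on the fact that the datastore signals a transaction conflict by throwing java.util.ConcurrentModificationException.

```java
import java.util.ConcurrentModificationException;
import java.util.function.Supplier;

/** Hypothetical helper: re-runs a transactional unit of work when it hits a conflict. */
final class TransactionRetry {
    private TransactionRetry() {}

    /**
     * Runs the unit of work up to maxAttempts times. Only conflicts are retried;
     * any other exception propagates straight back up to the caller.
     */
    static <T> T withRetries(int maxAttempts, Supplier<T> unitOfWork) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ConcurrentModificationException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return unitOfWork.get();
            } catch (ConcurrentModificationException conflict) {
                last = conflict; // somebody else committed first; go round again
            }
        }
        throw last; // out of attempts: let the app decide what to do
    }
}
```

The unit of work has to be safe to run more than once, since each attempt starts from a fresh read of the entities involved.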

So ... quite liking Google's AppEngine right now, and the tooling around it was a bit of a pleasure to work with.

Bet if I REALLY had to use it in production I'd find some fun niggles though :D


