Wednesday, July 27, 2011

SimpleDB Leaderboards: the ongoing story...

Optimization Update

Just a quick update today.

Following some advice from AWS peeps, I've modified the service so it's possible to configure a leaderboard service instance so that each leaderboard uses its own unique domain.

Coupling that with re-running the tests on an EC2 instance, rather than from my desktop machine (which goes through 2 proxies and only then out to the SimpleDB servers), has produced some interesting and hopeful new timings for regenerating the predictors for a leaderboard.

The test sample is a single leaderboard with approximately 343,000 entries and 6 columns of data, with the test set to generate predictors for only one column.

The same predictor generation was run against this data set in each configuration, and the time taken in seconds is recorded below.

Service type                          Local (s)   EC2 (s)
Domain per leaderboard                224         68
Single domain for all leaderboards    276         129

The results were interesting and rather hopeful. The predictor in these tests is set to generate 1000 sample data points, and to do this it uses the read-skip-read(...) routine, so each predictor generation makes on the order of 2000 round-trip requests to SimpleDB. These take ~60ms each when running on EC2 versus >120ms when running locally, which means that running the test locally was hiding a big chunk of the computational cost of having multiple leaderboards in a single SDB domain. Running directly on an EC2 instance, the shared domain takes almost 100% longer to generate the predictor than a domain per leaderboard (129s versus 68s), compared with only about 23% longer locally (276s versus 224s).
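To make the comparison concrete, here is a small sketch that recomputes the relative differences from the timings in the table. The class and method names are my own for illustration; only the four numbers come from the test runs.

```java
import java.util.Locale;

final class PredictorTimings {

    /** Percentage by which 'after' is lower than 'before'. */
    static double percentDecrease(double before, double after) {
        return (before - after) / before * 100.0;
    }

    /** Percentage by which 'slower' exceeds 'faster'. */
    static double percentLonger(double faster, double slower) {
        return (slower - faster) / faster * 100.0;
    }

    public static void main(String[] args) {
        // Timings in seconds, copied from the table above.
        double localPerDomain = 224, localShared = 276;
        double ec2PerDomain = 68, ec2Shared = 129;

        System.out.printf(Locale.ROOT,
                "Local: shared domain takes %.0f%% longer; domain per leaderboard saves %.0f%%%n",
                percentLonger(localPerDomain, localShared),
                percentDecrease(localShared, localPerDomain));
        System.out.printf(Locale.ROOT,
                "EC2:   shared domain takes %.0f%% longer; domain per leaderboard saves %.0f%%%n",
                percentLonger(ec2PerDomain, ec2Shared),
                percentDecrease(ec2Shared, ec2PerDomain));
    }
}
```

The absolute gap between the two configurations is similar in both environments (52s locally, 61s on EC2), but once the network latency shrinks that gap becomes a much larger share of the total.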

According to the AWS experts who suggested this approach, this is because SimpleDB can optimize queries that operate against a single column rather than several. The queries look like the following (domain per leaderboard at top, older shared domain across leaderboards below).

[Embedded query listing not preserved]
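As a rough illustration of the difference in shape only (these are not the actual queries from the service), the sketch below builds one select of each kind. The domain names, the `leaderboard` and `score` attribute names, and the zero-padded value are all hypothetical; the select syntax itself (backtick-quoted names, single-quoted values, `order by` on an attribute that also appears in a predicate) is standard SimpleDB.

```java
final class LeaderboardQueryShapes {

    /**
     * Domain per leaderboard: the domain itself identifies the board,
     * so the where clause touches only the sort column.
     * Inputs are assumed to be trusted and are not escaped here.
     */
    static String domainPerLeaderboard(String domain, String column, String from) {
        return String.format(
                "select * from `%s` where `%s` >= '%s' order by `%s` asc limit 1",
                domain, column, from, column);
    }

    /**
     * Shared domain: every query also has to filter on a (hypothetical)
     * `leaderboard` attribute, so two columns are involved.
     */
    static String sharedDomain(String domain, String leaderboardId, String column, String from) {
        return String.format(
                "select * from `%s` where `leaderboard` = '%s' and `%s` >= '%s' order by `%s` asc limit 1",
                domain, leaderboardId, column, from, column);
    }

    public static void main(String[] args) {
        // SimpleDB compares values as strings, hence the zero-padded score.
        System.out.println(domainPerLeaderboard("lb_daily", "score", "0000100"));
        System.out.println(sharedDomain("leaderboards", "daily", "score", "0000100"));
    }
}
```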


String concatenation avoidance
Incidentally, there was another change I made during this evolution. I'd been seeing the SimpleDB queries getting somewhat evil, and harder to avoid making a mess of when formed via string + string (or StringBuilder).
Having seen how comparatively clean the process is with the Google App Engine datastore Java API (I couldn't comment for Go/Python), using their QueryBuilder class to prepare queries programmatically through a nice fluent API (and hence avoiding manual stringiness), I realised the time had come to knock up something similar for use here.

Currently it's only being used in the leaderboard-simpledb code, and nowhere else, but it's certainly been worth the (very small) effort: the Java class is less than 200 lines long, and that's with my formatting style, which is very whitespace-ish.
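For a flavour of the idea, here is a much smaller sketch of a fluent select builder. It is not the actual class from the leaderboard-simpledb code: the name `SelectQueryBuilder` and every method on it are made up for illustration, and it only covers equality and range predicates joined with `and`. The quoting rules (doubling backticks in names and single quotes in values) are the standard SimpleDB ones.

```java
import java.util.ArrayList;
import java.util.List;

final class SelectQueryBuilder {
    private final String domain;
    private final List<String> predicates = new ArrayList<>();
    private String orderBy;   // rendered "order by" target, or null if unset
    private Integer limit;    // null means "use SimpleDB's default"

    private SelectQueryBuilder(String domain) {
        this.domain = domain;
    }

    static SelectQueryBuilder from(String domain) {
        return new SelectQueryBuilder(domain);
    }

    SelectQueryBuilder whereEquals(String attribute, String value) {
        return where(attribute, "=", value);
    }

    SelectQueryBuilder whereAtLeast(String attribute, String value) {
        return where(attribute, ">=", value);
    }

    private SelectQueryBuilder where(String attribute, String op, String value) {
        predicates.add(quoteName(attribute) + " " + op + " " + quoteValue(value));
        return this;
    }

    SelectQueryBuilder orderBy(String attribute, boolean ascending) {
        orderBy = quoteName(attribute) + (ascending ? " asc" : " desc");
        return this;
    }

    SelectQueryBuilder limit(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("limit must be positive");
        }
        limit = n;
        return this;
    }

    /** Renders the select expression; all quoting happens in one place. */
    String build() {
        StringBuilder sb = new StringBuilder("select * from ").append(quoteName(domain));
        if (!predicates.isEmpty()) {
            sb.append(" where ").append(String.join(" and ", predicates));
        }
        if (orderBy != null) {
            sb.append(" order by ").append(orderBy);
        }
        if (limit != null) {
            sb.append(" limit ").append(limit);
        }
        return sb.toString();
    }

    private static String quoteName(String s) {
        return "`" + s.replace("`", "``") + "`";
    }

    private static String quoteValue(String s) {
        return "'" + s.replace("'", "''") + "'";
    }
}
```

Usage would then look something like `SelectQueryBuilder.from("lb_daily").whereAtLeast("score", "0000100").orderBy("score", true).limit(1).build()`, with no hand-assembled quotes or stray spaces to get wrong.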

I'd definitely recommend doing this early though. As a result I'm probably going to change the API wrapper I have for SimpleDB to force the use of QueryBuilder instances, rather than direct Strings, as the argument to SDB selects... but this is going to make many other things go 'booom!' and have me go through and fix all my fail.

Hey ho though, such is life and learning.

TTFN

Tim

