Wednesday, July 27, 2011

SimpleDB Leaderboards: the ongoing story...

Optimization Update

Just a quick update today.

Following some advice from AWS peeps, I've modified the service so it's possible to configure a leaderboard service instance to have each leaderboard use its own unique domain.

Coupling that with re-running some tests on an EC2 instance (rather than on my desktop machine, through 2 proxies, and only then out to the SimpleDB servers) has generated some interesting and hopeful new timings for regenerating the predictors for a leaderboard.

The test sample here is a single leaderboard with approximately 343,000 entries and 6 columns of data, with the test set to generate predictors for only one column.

The same predictor generation was run against this data set in each configuration, with the time taken in seconds recorded below.

Service type                          Local    EC2
Domain per leaderboard                  224     68
Single domain for all leaderboards      276    129

The results which came out of this were interesting and rather hopeful. The predictor in these tests is set to generate 1000 sample data points, and to do this it runs the read-skip-read(...) routine, so each predictor generation makes on the order of 2000 round-trip requests to SimpleDB. These take ~60ms from EC2 versus >120ms locally, which meant that running the test locally was hiding a big chunk of the computational cost of having multiple leaderboards in a single SDB domain. Running directly on an EC2 instance, the domain-per-leaderboard setup gets the predictor generation time down to nearly half (68s versus 129s).

According to the AWS experts who suggested this approach, this is because SimpleDB can optimize queries whose predicate operates against a single attribute rather than several - so the queries look like the following (domain per leaderboard first, the older shared domain across leaderboards second).

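For illustration only, with an xp sort column and made-up domain names rather than the real ones:

```
Domain per leaderboard - the predicate touches a single attribute:

    select * from `lb-Week-21` where `xp` is not null order by `xp` desc

Shared domain across leaderboards - the predicate spans two attributes:

    select * from `leaderboards` where `leaderboardId` = 'Week-21'
        and `xp` is not null order by `xp` desc
```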


String concatenation avoidance
Incidentally, there was another change I made during this evolution. I'd been watching the SimpleDB queries get somewhat evil, and ever easier to make a cock of when forming them via string + string (or StringBuilder).
Having seen how comparatively clean the process is using the Google AppEngine datastore Java API (couldn't comment for go/python), with its QueryBuilder class and queries prepared programmatically via a nice fluent API (hence avoiding manual stringiness), I realised the time had come to knock up something similar for use here.

Currently it's only being used in the leaderboard-simpledb code, and nowhere else, but it's certainly been worth the (very small) effort - the Java class is less than 200 lines long, and that's with my formatting style, which is very whitespace-ish.
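The class itself isn't reproduced here, but a minimal sketch of the shape of such a builder (the names and methods below are illustrative, not the actual API) looks something like:

```java
// Illustrative sketch of a fluent SimpleDB select builder - names and
// methods here are made up, not the actual class from the leaderboard code.
public final class QueryBuilder {

    private final String domain;
    private final StringBuilder where = new StringBuilder();
    private String orderBy;
    private boolean descending;
    private int limit = -1;

    private QueryBuilder(String domain) { this.domain = domain; }

    public static QueryBuilder selectFrom(String domain) {
        return new QueryBuilder(domain);
    }

    public QueryBuilder whereEquals(String attribute, String value) {
        return appendCondition(quoteName(attribute) + " = " + quoteValue(value));
    }

    public QueryBuilder whereIsNotNull(String attribute) {
        return appendCondition(quoteName(attribute) + " is not null");
    }

    public QueryBuilder orderByDescending(String attribute) {
        this.orderBy = attribute;
        this.descending = true;
        return this;
    }

    public QueryBuilder limit(int limit) {
        this.limit = limit;
        return this;
    }

    // Renders the final select expression, quoting names and values so the
    // caller never does any manual string assembly.
    public String build() {
        StringBuilder sb = new StringBuilder("select * from ").append(quoteName(domain));
        if (where.length() > 0) sb.append(" where ").append(where);
        if (orderBy != null) sb.append(" order by ").append(quoteName(orderBy))
                               .append(descending ? " desc" : " asc");
        if (limit >= 0) sb.append(" limit ").append(limit);
        return sb.toString();
    }

    private QueryBuilder appendCondition(String condition) {
        if (where.length() > 0) where.append(" and ");
        where.append(condition);
        return this;
    }

    // SimpleDB quoting: names in backquotes, values in single quotes,
    // with embedded quote characters doubled.
    private static String quoteName(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    private static String quoteValue(String value) {
        return "'" + value.replace("'", "''") + "'";
    }
}
```

A caller would then assemble a select along the lines of:

```java
String select = QueryBuilder.selectFrom("leaderboards")
        .whereEquals("leaderboardId", "Week-21")
        .whereIsNotNull("xp")
        .orderByDescending("xp")
        .limit(100)
        .build();
```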

I'd definitely recommend doing this early though. As a result I'm probably going to change the API wrapper I have for SimpleDB to force use of QueryBuilder instances rather than raw Strings as the argument to SDB selects... but this is going to make many other things go 'booom!' and have me go through and fix all my fail.

Hey ho though, such is life and learning.

TTFN

Tim


Thursday, July 14, 2011

A Leaderboard Storage implementation using Amazon SimpleDB

Summary


This post is an outline of a component design which uses SimpleDB as a backing store for a leaderboard system.

It is still a work-in-progress, but has already had various stress test operations performed on it.

This document was written in order to provide a little insight into one way of tackling this problem domain, as well as to let me clarify a design I had already described a few times internally but never really written a summary for.

The implementation of the service itself is in Java, using the aws-java-sdk to communicate with SimpleDB via a wrapper library written for the project. The wrapper provides monitoring information (via JMX) to keep live tabs on SimpleDB usage and issues, and handles some of the dynamic request ramping/throttling we previously found we needed when making use of AWS services.


N.B. YMMV - This is certainly not the only way to do it, and I am not endorsing it as the only way, or indeed a necessarily optimal way, to do your leaderboards.



TL;DR
Multiple leaderboards per domain. A cron task updates a rank-prediction curve per leaderboard sort column. Rank is estimated from the current score when retrieving results.

Domain Structure

An application-level instance of SimpleDBLeaderboardService maps to a single SimpleDB domain. This is used for one or more individual leaderboards.

SimpleDB's domain size limits (10 GB of data per domain) are a fundamental limiting factor on the amount of data within a SimpleDBLeaderboardService instance.

Depending on the usage pattern, there could be a small number of unique leaderboards, each with a large number of individual entries, or a larger number of leaderboards with fewer entries.

If there is a need for both many leaderboards and many rows, then the implementation would have to be expanded to allow finer-grained splitting out of leaderboards, with the simplest amendment being one domain per leaderboard (though this would be at the cost of eating up domains rather rapidly).

As you can see below, there are relatively few attributes inherent in the system, with most being the runtime-determined, leaderboard-specific sorted and unsorted columns.

  • itemName() - SimpleDB item UID, in this case leaderboardId + ownerId. Example: Week-21_1070385706236341127
  • leaderboardId - Unique identifier for the leaderboard within the service domain. Example: Week-21
  • timestamp - Timestamp of when the score for this row was last updated. Example: 2011-05-23T19:30:52.709
  • owner - Entity which posted the score (e.g. a user or clan UID). Example: 1070385706236341127
  • <sorted attribute> - Value of a named attribute which is a valid column for rankings (e.g. xp, accuracy, rescues, damage, kills). Padded, fixed-accuracy, and offset so that the natural sort order of the string value applies. Example: 0000000600
  • <unsorted attribute> - Value of a named attribute which is not a column used for ranking (e.g. shots, damage). Not padded/offset, as natural sort ordering need not apply. Example: 1600000
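Since SimpleDB compares attribute values as strings, the sorted columns have to be stored in a form whose string ordering matches the numeric ordering. A minimal sketch of such an encoding, with the width and offset as assumptions chosen to match the 0000000600 example above:

```java
// SimpleDB sorts lexicographically, so sorted-column scores are stored
// zero-padded to a fixed width; an offset is added so that negative scores
// would still sort correctly. Width and offset here are illustrative.
static final int PAD_WIDTH = 10;
static final long OFFSET = 0L; // e.g. 1000000000L if scores may go negative

static String encodeSorted(long score) {
    return String.format("%0" + PAD_WIDTH + "d", score + OFFSET);
}

static long decodeSorted(String value) {
    return Long.parseLong(value) - OFFSET;
}
```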


Posting Scores
When a client posts game results, some internal processing happens outside the scope of this description, which can ultimately result in some leaderboard columns being updated.

The part relevant to the implementation of the SimpleDBLeaderboardService is the part where the application :-
  • Retrieves the current row data for the specified LeaderboardId and owner UID.
  • Accumulates data for columns which are accumulators (e.g. shots_fired)
  • Conditionally updates other columns which take either the 'lowest' or 'highest' value, if the score post contains a better score than the current value for that column (e.g. for a 'lowest' column such as time_to_complete, a posted value of 60 would not replace a stored 40).
  • Posts any updated sorted/unsorted attribute values
Posting always has to be a read of the existing leaderboard data for that user followed by a possible write if there is anything to update, as the service makes use of SimpleDB conditional writes to ensure transactionality on the update: the timestamp of the read-in data is used as a condition check if there was a previous entry, and the absence of any timestamp attribute is the condition if there was not. If the write fails its condition check, the process has to be repeated from the read onwards.
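A minimal sketch of that conditional write using the aws-java-sdk (the attribute names match the table above; the surrounding method shape and retry contract are illustrative):

```java
import java.util.Arrays;
import java.util.List;

import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.model.PutAttributesRequest;
import com.amazonaws.services.simpledb.model.ReplaceableAttribute;
import com.amazonaws.services.simpledb.model.UpdateCondition;

// previousTimestamp is null when the initial read found no existing row.
// Returns true if the write succeeded, false if the caller must re-read
// and repeat the whole read-modify-write cycle.
static boolean conditionalPost(AmazonSimpleDB sdb, String domain, String itemName,
                               String previousTimestamp, String newTimestamp,
                               String encodedScore) {
    UpdateCondition condition = (previousTimestamp == null)
            // no previous entry: succeed only if no timestamp attribute exists yet
            ? new UpdateCondition().withName("timestamp").withExists(false)
            // previous entry: succeed only if the timestamp is still what we read
            : new UpdateCondition("timestamp", previousTimestamp, true);

    List<ReplaceableAttribute> updates = Arrays.asList(
            new ReplaceableAttribute("timestamp", newTimestamp, true),
            new ReplaceableAttribute("xp", encodedScore, true));
    try {
        sdb.putAttributes(new PutAttributesRequest(domain, itemName, updates, condition));
        return true;
    } catch (AmazonServiceException e) {
        if ("ConditionalCheckFailed".equals(e.getErrorCode())) {
            return false; // collided with a concurrent update
        }
        throw e;
    }
}
```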


Retrieving Scores
Score data retrieved from the leaderboard consists of 4 main elements, and can be pulled using several different view types available from the SimpleDBLeaderboardService.

The elements are :-
  • Systemic data such as LeaderboardId, Owner Id and timestamp.
  • Unsorted stats column data.
  • Sorted stats column data.
  • Ranking information for a specified sorted column.
The SimpleDBLeaderboardService implementations of the different views are as follows :-

1. Positions for owner
Returns just the ranks for one or more sort columns for a single specified owner on a particular leaderboard.

The service does this by retrieving current score data for the specified user, then for each specified sort column gets the rank from the PositionCalculator (see Rank Predictors below) specified for the service instance for that column.

2. Result for owner
Returns just the current leaderboard result (sorted + unsorted column values) for a single specified owner (no ranking information returned).

The service does this by retrieving current score data for the specified user.

3. Results around owner
Returns a pivot of leaderboard results of the people at the ranks around that of the specified owner for a specified sort column.

The service does this by :-
  1. Retrieving the results data for the specified user.
  2. Requesting the specified user's rank for that column from the PositionCalculator (see Rank Predictors below) specified for the service instance.
  3. Executing a query which retrieves (in order of the specified sort column) a fixed number of rows with scores higher than that of the initial user.
  4. Executing a query which retrieves a fixed number of rows with scores lower than that of the initial user.
  5. Assembling the results into a single response list in order (better score rows, user row, worse score rows), using the rank of the pivot user to fill in the ranking information for the other rows (the two pivot queries are sketched below).
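For illustration, the two pivot queries from steps 3 and 4 look something like this, with xp as the sort column and '0000000600' as the pivot user's encoded score (both illustrative):

```
Rows with better scores, closest to the pivot first (reversed for display):

    select * from `leaderboards` where `leaderboardId` = 'Week-21'
        and `xp` > '0000000600' order by `xp` asc limit 5

Rows with worse scores, closest to the pivot first:

    select * from `leaderboards` where `leaderboardId` = 'Week-21'
        and `xp` < '0000000600' order by `xp` desc limit 5
```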
4. Results for owners
Returns current leaderboard results for a set of specified owners, optionally including ranks for a specified sort column.

The service does this by executing a series of individual requests for results for each owner, then requesting each owner's rank for that column from the PositionCalculator (see Rank Predictors below) specified for the service instance.

It is likely this request will want/need optimisation at a later stage, either to parallelise the requests, or to instead make a single query of the form 'SELECT * from domainname WHERE itemName() IN ###names###', which may perform better.

5. Results paged
Returns an ordered page of leaderboard results for a specified sort column, along with a cursor for retrieving further results.

The service does this by performing a SELECT query using the specified column name in an ORDER BY clause, limiting by the requested page size. If an application-'opaque' cursor was provided, the start rank for the current page and the SimpleDB nextToken to use in the select are extracted from it. The response includes an updated cursor for further pagination.

As such this request never refers to the PositionCalculator (see Rank Predictors below), instead relying on the fact that either no cursor was supplied and we are starting at rank #1, or an encoded rank was provided within the cursor supplied with the request.
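What that opaque cursor might look like internally (the encoding here is an assumption, not the actual wire format):

```java
import javax.xml.bind.DatatypeConverter;

// Hypothetical cursor encoding: the start rank of the next page plus
// SimpleDB's nextToken, joined and Base64-encoded so the client sees a
// single opaque string it can hand back unmodified.
static String encodeCursor(long nextStartRank, String nextToken) {
    String raw = nextStartRank + "|" + nextToken;
    return DatatypeConverter.printBase64Binary(raw.getBytes());
}

static String[] decodeCursor(String cursor) {
    String raw = new String(DatatypeConverter.parseBase64Binary(cursor));
    return raw.split("\\|", 2); // [0] = start rank, [1] = SimpleDB nextToken
}
```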


Parallelism

Retrieval requests of all forms run in pretty much constant time regardless of the concurrency of requests coming into the system, as they each make use of SimpleDB largely independently (YMMV disclaimer here!).

This is mainly down to avoiding coupling between requests acting in parallel, and to the fact that connections are not too precious a commodity when communicating with SimpleDB. With the current setup, each front-end server instance communicating with SimpleDB has a pool of 100 http connections it can put into use for leaderboard traffic, and this can be increased either directly by growing the pool, or indirectly by splitting the pools (particularly useful if one set of leaderboards starts starving out everyone else!).
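Sizing that pool with the aws-java-sdk is near enough a one-liner (the method shape here is illustrative; 100 is just our current choice):

```java
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.AmazonSimpleDBClient;

// One pool of 100 http connections for leaderboard traffic; constructing a
// second client with its own ClientConfiguration gives a separate pool.
static AmazonSimpleDB newLeaderboardClient(String accessKey, String secretKey) {
    ClientConfiguration config = new ClientConfiguration().withMaxConnections(100);
    return new AmazonSimpleDBClient(new BasicAWSCredentials(accessKey, secretKey), config);
}
```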

There is some more parallelism which could be applied to the retrieval process, mainly in the request which gets multiple users' data in a single call; this currently performs 'n' queries equal to the number of users in the request. This is mitigated somewhat though, as it does not go through the 'select' mechanism but is instead a straightforward 'getItem' request, which in terms of billed CPU time is much cheaper than a number of 'select's.

Write throughput when posting scores is limited not so much by the amount of per-leaderboard activity as by the rate at which posts can be made to a leaderboard before concurrent updates collide, out-of-order updates are detected, and back-offs need to start being done. This is almost certainly going to affect clan rather than user leaderboards: the use-case which generates concurrent updates to a single user's leaderboard rows is far rarer than that of score posts from different users all attempting to update a single clan's leaderboard row at the same time.

Largely though, even though a single score post in our current setup will usually result in updates to multiple leaderboards (for example overall all-time, overall this week, overall today, per-mission all-time, per-mission today ...), these are done in series rather than in parallel as part of a single job-processing operation, as there are expected to be many other jobs being processed separately and in parallel with it (and setting up each leaderboard update as its own queued job in that end of the system may drive me bonkers when debugging).

Therefore the main worry is the retrieval of ranks for these, as described below.


Rank Predictors

The initial prototypical implementation to provide rank information in the responses involved making the other requests to form up the dataset for the response, followed by looping round and essentially executing a big count to find the rank, along the following lines.

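The count in question is of the rows with a better score than the user's; roughly as follows (attribute names as per the domain table above, the encoded score and query shape illustrative - SimpleDB returns partial counts with a nextToken on long-running counts, hence the loop):

```java
import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.model.SelectRequest;
import com.amazonaws.services.simpledb.model.SelectResult;

// Rank = 1 + number of rows with a better (higher) encoded score.
static long nativeRank(AmazonSimpleDB sdb, String encodedScore) {
    String query = "select count(*) from `leaderboards`"
            + " where `leaderboardId` = 'Week-21' and `xp` > '" + encodedScore + "'";
    long better = 0;
    String nextToken = null;
    do {
        SelectResult result = sdb.select(new SelectRequest(query).withNextToken(nextToken));
        better += Long.parseLong(result.getItems().get(0)
                .getAttributes().get(0).getValue()); // the 'Count' attribute
        nextToken = result.getNextToken();
    } while (nextToken != null);
    return better + 1;
}
```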

Clearly, this is meh-fine for low-traffic uses, or if correct ranking is more important than any semblance of performance, but it does at least provide a 'native' fallback.

When testing on larger datasets though, it rapidly became obvious this was going to be painfully slow, particularly when querying for the rank of people in the >100,000 rank bracket (where I usually am on most games).

As a result of this, we implemented and tested an alternative way to provide estimated ranking information.

This is based on periodically sampling the full dataset to refresh a prediction curve, as follows :-

  1. Determine the total number of entries in the leaderboard
  2. Divide the total entries by the required sample count to get the 'skip size'
  3. Generate a query statement for 'skip' and for 'retrieve', using the same WHERE clause (ordering on the sort column)
  4. Execute the retrieve query with a LIMIT of 1 to retrieve a single row's column value - store this along with the rank (1)
  5. Execute the skip query using the nextToken from the previous query
  6. Repeat the retrieve query (LIMIT 1) for a single row, storing the value along with the rank (1 + skip size)
  7. Repeat skip + retrieve until the end of the leaderboard
  8. Generate a score predictor to determine estimated position based on a provided score value (a sketch of this loop follows)
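A sketch of that loop, using the (illustrative) shared-domain query shape and the decodeSorted helper sketched in the Domain Structure section. The skip query exploits the fact that a count(*) select with a LIMIT returns a nextToken advanced past the counted rows, which is much cheaper than selecting the data itself:

```java
import java.util.TreeMap;

import com.amazonaws.services.simpledb.AmazonSimpleDB;
import com.amazonaws.services.simpledb.model.SelectRequest;
import com.amazonaws.services.simpledb.model.SelectResult;

// Samples (score, rank) pairs down the leaderboard: read one row, skip
// 'skipSize' rows via a count(*) query, read the next row, and so on.
static TreeMap<Long, Long> samplePredictorPoints(AmazonSimpleDB sdb, int skipSize) {
    String where = " where `leaderboardId` = 'Week-21'"
            + " and `xp` is not null order by `xp` desc";
    String retrieveQuery = "select `xp` from `leaderboards`" + where + " limit 1";
    String skipQuery = "select count(*) from `leaderboards`" + where + " limit " + skipSize;

    TreeMap<Long, Long> scoreToRank = new TreeMap<Long, Long>();
    String nextToken = null;
    long rank = 1;
    while (true) {
        SelectResult row = sdb.select(new SelectRequest(retrieveQuery).withNextToken(nextToken));
        if (row.getItems().isEmpty()) break; // fell off the end of the leaderboard
        long score = decodeSorted(row.getItems().get(0).getAttributes().get(0).getValue());
        scoreToRank.put(score, rank);

        SelectResult skip = sdb.select(new SelectRequest(skipQuery).withNextToken(nextToken));
        nextToken = skip.getNextToken();
        if (nextToken == null) break; // nothing left to skip past
        rank += skipSize;
    }
    return scoreToRank;
}
```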
This predictor can be (and is) generated periodically by any job-processing backend node, which then shares the data via a clustered caching mechanism (memcached or hazelcast, for example), so that client-facing nodes only ever get their rankings from a predictor - never having to generate the predictor on the fly (which isn't fast by any means).
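The predictor built from those samples can then be as simple as a linear interpolation between the two nearest sampled scores - a sketch:

```java
import java.util.Map;
import java.util.TreeMap;

// scoreToRank maps sampled (decoded) scores to their sampled ranks; the
// highest sampled score corresponds to rank 1.
static long estimateRank(TreeMap<Long, Long> scoreToRank, long score) {
    Map.Entry<Long, Long> worse = scoreToRank.floorEntry(score);    // nearest sample at/below
    Map.Entry<Long, Long> better = scoreToRank.ceilingEntry(score); // nearest sample at/above
    if (better == null) return 1;                                   // beats every sampled score
    if (worse == null) return scoreToRank.firstEntry().getValue();  // below every sampled score
    if (better.getKey().equals(worse.getKey())) return better.getValue();
    // Interpolate the rank linearly between the two neighbouring samples.
    double fraction = (double) (better.getKey() - score)
            / (better.getKey() - worse.getKey());
    return Math.round(better.getValue() + fraction * (worse.getValue() - better.getValue()));
}
```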

N.B. This still needs a slight tweak, as tests against real stats from a previous title showed the predictor was pretty accurate, but only once past the initial top 'N' scores (say 500 of 250,000). As such an update may be made which either increases the number of samples taken at the top end, or instead just reverts to the NativePositionCalculator if the estimated rank is near the top.


Alternative implementations may be added in the future, which could include :-
  • A singleton-updated inverted index for rankings - the downside could involve temporarily showing a ranking for a user based on their old score rather than their current one, unless we can think of a way to still do lookups on this index against scores, getting the 'nearest' 2 scores if no exact score match is found and interpolating (or somesuch).
  • Using a different nature of datastore, such as Cassandra or MongoDB, which does this sort of sorting natively (although semi-ironically Cassandra sorts but really doesn't want to provide the ranking).