Blog Archives

Comment Systems and Time-Ordering at Scale

Time is a tricky concept in software development, especially in large distributed systems. Not only do you have to worry about the intricacies of computer representations for time itself (and there are many), but you also have to deal with the fact that it’s nigh-impossible to perfectly sync system clocks across multiple servers. This, of course, results in some interesting problems.

One example of a large distributed system is Google+. Obviously, there are quite a few servers that power the Google+ webapp. When a user adds a comment to a post, one of those many servers will be handling the AJAX request to add the comment, and it’ll be talking to one of a number of servers in Google’s backing data store.

Of course, each of those servers has a separate system clock. If those system clocks are even a second or two off from one another, you can get situations where two comments are timestamped in opposite order from when they were actually submitted – because the first comment was processed by a server that had a lagging system clock. You can see an example of that in this public comment thread. Notice how the 2nd comment displayed is actually a reply to the 3rd, and the 11th a reply to the 12th.

Now, this could be a fun little blog post just pointing that out and calling it done – “here’s the reality you have to deal with when you scale this large.” As it turns out, however, there’s a way this particular problem could be solved. Let’s take a look.

Defining the problem

First, let’s define the part of the problem we actually care about. In general, we don’t actually care about perfect time ordering – if two people post unrelated comments within a second or two of one another, it’s fine to display them in either order. What we actually care about is that replies to comments are shown after the comment to which they are replying. If comment B is a reply to comment A, we want to show comment B after comment A. That’s all we really care about – beyond that, the existing rough time ordering is fine.

Outlining the solution

So how can we do this without requiring incredibly precise time synchronization? The trick here is that we’re going to send some extra info from the client to the server (and store it as part of the newly created comment). Specifically, at the same time as we send the contents of a new comment to the server, we’ll also send along a second integer value – call it “parent” for now – that we’ll store along with the comment. Why is this important? Because it’s what will let us help decide ordering when we go to create the comment list.

What will “parent” contain? Simple: the id of the last existing comment the client already loaded. For instance, if the post was loaded with three comments, which had ids 1, 3, and 4 respectively, then the client would send a “parent” value of 4 when it posted a new comment.

When fetching the comment list, we’ll start out how you’d expect: grab a list of all the relevant comments from the data store.  To generate the ordered comment list, we’ll first take all the comments with an empty “parent” value and add them to our result, ordered by timestamp. Next, we’ll take all the comments with a “parent” value corresponding to comments already in our result, and insert them into the result based on two rules:

  1. A comment must be inserted somewhere after its “parent” comment P.
  2. A comment must be inserted after any comments with an earlier timestamp, except where this would violate #1.

We keep repeating this until we have no more comments left to add to the ordering.

Why does this work? It works because of the simple fact that in order for someone to reply to a comment, they first have to read it – which means that their client has to have loaded the older comment. By sending that knowledge to the server, we can create what’s called a partially ordered set out of the comments. With that partial ordering, we can then generate a final ordering that meets our desired goals. The algorithm outlined above is basically a pared-down adaptation of a vector clock.

Proof of concept

I’ve created a small proof of concept as a Python script. Here’s the output (notice, in particular, the ordering of #3 and #4):

#0 (t=0)
#1 (t=2)
#2 (t=4) - reply to #1
#3 (t=6)
#4 (t=5) - reply to #3
#5 (t=8)
#6 (t=16)

The script is just a quick example thrown together in under an hour – I make no guarantees on whether it’s the most optimal code or not. Feel free to play around with it.

Closing notes

There is a trade-off involved in this: the algorithm that generates this ordering is worst-case O(n²), compared to just sorting the list based on timestamp in O(n log n). For most scenarios, however, this is acceptable – in the case of Google+, it’s extremely unlikely to be a problem, given that G+ comment threads cap out at something like 500 replies. With such a small n, the difference in time is negligible. The sorting can also be done client-side if desired – no extra server processing is required.