TL;DR
I built a real-time tournament leaderboard system using Go, Redis Sorted Sets, and Server-Sent Events. It streams live updates to clients and can handle hundreds of concurrent connections and game submissions per second. This project helped me learn Go, experiment with observability using Prometheus + Grafana, and optimize a system for high concurrency. Repo
Some time ago I started working on a real-time leaderboard and learned a lot from building it. This post goes through the details: how the system works, how I improved it, and what I learned along the way.
Motivation
This project started from a conversation I had with my friend Ritu Raj about game servers in MMORPGs. I tried to dig deeper into the topic but couldn’t find good resources, so I decided to look at actual MMORPG code. Since there weren’t any good open source examples, I settled for the next closest thing - Lichess.
Lichess is an online chess platform that handles ~100,000 games at any time of the day. It’s completely free to play (run by a non-commercial org), all the code is open source, and the community is pretty nice too. I’ve used it for years (mostly for puzzles), and after some digging into the code and talking to the people who made it, I realized that it’s a huge system and a bit too complex to replicate at my current skill level.
So I started by recreating one component of the site: the real-time leaderboard used in its tournaments.
The problem
Real-time leaderboards are read-heavy applications, but write throughput and latency need to be optimized too. There are high-frequency updates (especially in MMORPG-style games), concurrent reads and writes, and a need for low latency (although this requirement is nowhere close to some other stuff I’ve built).
I’ve been an avid user of Python, and I’d like to think I’m somewhat decent at it (though I do doubt that sometimes). I did not believe that this would be the right language for these requirements, so I looked for a tool that’d provide me the necessary performance without having a steep learning curve, which led me to Go.
Architecture
Core components
- Application layer - a simple http server
- Data layer - Redis for sorted set operations (eventually replaced by Go implementation)
- Real time communication - the most interesting technical aspect, based on server-sent events (SSE)
APIs
APIs intended for game server use:
- POST /submit-game: Endpoint for game servers to publish results of a game, with a predefined schema in order to update scores.
APIs intended for end user use:
- GET /stream-leaderboard
- GET /leaderboard
- GET /player-stats
- GET /game
These provide all the necessary interactions with outside components, but the main thing to look at is the streaming API, which enables real-time updates. Most of the discussion will be about it, since streaming is expected to be the most frequent access pattern by a huge margin in these kinds of systems, so it’s the critical path worth optimizing.
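To make the streaming path concrete, here’s a minimal sketch of what a Gin-based handler for GET /stream-leaderboard can look like. This is my reconstruction, not the repo’s code: the update struct, channel buffer size, and the broadcaster it would hook into are all assumptions.

```go
package main

import (
	"io"

	"github.com/gin-gonic/gin"
)

// LeaderboardUpdate is an assumed payload shape for a single update event.
type LeaderboardUpdate struct {
	Player string  `json:"player"`
	Score  float64 `json:"score"`
	Rank   int64   `json:"rank"`
}

func main() {
	r := gin.Default()

	// GET /stream-leaderboard: each client gets its own channel of updates.
	r.GET("/stream-leaderboard", func(c *gin.Context) {
		updates := make(chan LeaderboardUpdate, 16) // hypothetical per-client buffer
		// In the real service this channel would be registered with a
		// broadcaster on connect and unregistered on disconnect.

		c.Writer.Header().Set("Content-Type", "text/event-stream")
		c.Writer.Header().Set("Cache-Control", "no-cache")
		c.Writer.Header().Set("Connection", "keep-alive")

		// c.Stream keeps calling this function until it returns false
		// or the client disconnects.
		c.Stream(func(w io.Writer) bool {
			select {
			case u, ok := <-updates:
				if !ok {
					return false
				}
				c.SSEvent("leaderboard", u) // writes an "event: leaderboard" SSE frame
				return true
			case <-c.Request.Context().Done():
				return false
			}
		})
	})

	r.Run(":8080")
}
```

The per-connection goroutine mostly sits blocked on its channel, which is what keeps a large number of idle SSE clients cheap.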
Read path
Write path
Why this tech stack?
Since one of the primary motivations for this project was to learn concurrency and observability, I decided to go with a tech stack I’m not familiar with.
- Go - It’s fast, great support for concurrency, has strong tooling, and is well-suited for systems programming (plus good benchmarking tools).
- Sorted Sets - Natural data structure for this use case. Adding, updating and deleting entries all cost O(log(N)) time, and range queries are supported too (needed to get the top k players quickly).
- Redis - Has a great sorted set implementation already. It’s also incredibly fast, although I’d love to see how it compares to a sorted set implementation in native Go (the sketch after this list shows the operations involved).
- Gin - Lightweight web framework that keeps things simple while giving useful abstractions like middleware and routing.
- Server-Sent Events - Needed a way to stream updates unidirectionally. Polling and WebSockets were my other options, but those are generally considered more resource intensive for this one-way, server-to-client use case.
- Prometheus + Grafana - To monitor system behavior and visualize metrics like request rates, Redis latency, memory usage, etc.
- Zap (Uber) - For structured logging, to make logs easier to search and filter later.
- Docker (and Compose) - Containerization: spinning up the whole system with simple commands and helping with reproducible dashboards. The later testing setup also used Docker networking to run multiple client containers (each container gets its own IP).
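As a concrete example of the sorted set operations mentioned above, here’s a minimal sketch using go-redis. The key name and member format are my assumptions, not necessarily what the repo uses.

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Write path: bump a player's score when a game result comes in.
	// ZINCRBY is O(log N).
	if err := rdb.ZIncrBy(ctx, "leaderboard", 25, "player:42").Err(); err != nil {
		panic(err)
	}

	// Read path: top 10 players with scores, highest first.
	// ZREVRANGE ... WITHSCORES is O(log N + k) for k results.
	top, err := rdb.ZRevRangeWithScores(ctx, "leaderboard", 0, 9).Result()
	if err != nil {
		panic(err)
	}
	for i, z := range top {
		fmt.Printf("#%d %v -> %.0f\n", i+1, z.Member, z.Score)
	}

	// Rank of a single player (0-based), useful for /player-stats.
	rank, err := rdb.ZRevRank(ctx, "leaderboard", "player:42").Result()
	if err == nil {
		fmt.Println("rank:", rank+1)
	}
}
```

Swapping this layer out for a native Go sorted set later only has to preserve these few operations.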
Concurrency
Observability
This initially started as a way to monitor performance. Zap (Uber) is used for logs, Prometheus for metrics, and Grafana as the dashboard on top of the Prometheus metrics.
Some examples of the metrics used:
- Counter ():
- Gauge ():
- Histogram:
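The specific metric names didn’t make it into this draft, but here’s a hedged sketch of how these three metric types are typically registered with Prometheus’s Go client. All names and example values below are illustrative assumptions, not the repo’s.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Counter: monotonically increasing, e.g. total game submissions.
	gamesSubmitted = promauto.NewCounter(prometheus.CounterOpts{
		Name: "leaderboard_games_submitted_total",
		Help: "Total number of game results submitted.",
	})

	// Gauge: goes up and down, e.g. currently connected SSE clients.
	sseClients = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "leaderboard_sse_clients",
		Help: "Number of currently connected SSE clients.",
	})

	// Histogram: latency distribution, e.g. Redis call duration.
	redisLatency = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "leaderboard_redis_call_seconds",
		Help:    "Latency of Redis sorted set operations.",
		Buckets: prometheus.DefBuckets,
	})
)

func main() {
	gamesSubmitted.Inc()
	sseClients.Set(1)
	redisLatency.Observe(0.0021)

	// Prometheus scrapes this endpoint.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":2112", nil)
}
```

In the real service these would presumably be updated inside the handlers and the Redis wrapper rather than in main, with /metrics exposed on the existing server.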
One thing that bugged me while testing this app was having to recreate the Grafana dashboard every time I started it up.
The usual methods for dashboard persistence involve Helm charts and something called Jsonnet, which felt far too complex for the thing I’m building.
Searching for options led me to a simple way of provisioning reproducible dashboards with just a JSON file and a change to the Docker Compose setup, which I’m gonna document soon at pranshu-raj.me/reproducible-grafana-json
Performance
Benchmarking setup
Initially I used a simple Go script which opened up multiple SSE clients, and that got up to about 800 connections before failing. This felt incredibly odd, since I wasn’t seeing any clear reason for the failures (all system indicators were normal). I tried to cut down resource usage - at this point the server was running about 7-10 goroutines per connection (don’t ask - I don’t know why this happened) - but things didn’t improve even after lots of tweaks to memory and CPU usage.
Eventually I decided to try a different testing method. I checked out the common load test tools (k6, vegeta, wrk), but none of them had anything specific for SSE. So I went back to one of the earliest inspirations for this project - Eran Yanay’s 1 million websocket connections repository - and adapted his test script for SSE, which got me to 15,400 connections on Windows. Still not what I wanted, but a huge win nevertheless.
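For reference, the core of an SSE load-test client can be sketched like this: open N plain HTTP connections to the stream endpoint and keep reading. This is my rough reconstruction rather than the adapted script itself; the endpoint URL and connection count are placeholders.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const target = "http://localhost:8080/stream-leaderboard" // placeholder URL
	const clients = 1000                                      // placeholder count

	var open int64
	var wg sync.WaitGroup

	// Periodically report how many streams are currently open.
	go func() {
		for range time.Tick(5 * time.Second) {
			fmt.Println("open connections:", atomic.LoadInt64(&open))
		}
	}()

	for i := 0; i < clients; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := http.Get(target)
			if err != nil {
				return
			}
			defer resp.Body.Close()
			atomic.AddInt64(&open, 1)

			// Read SSE frames line by line; we only care about holding
			// the connection open, so the data is discarded.
			scanner := bufio.NewScanner(resp.Body)
			for scanner.Scan() {
			}
			atomic.AddInt64(&open, -1)
		}()
	}

	fmt.Println("dialing", clients, "SSE clients...")
	wg.Wait()
}
```

Running several copies of this from different client containers (each with its own IP) is what the Docker networking fix later builds on.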
Still, this didn’t have the memory or CPU util issues that might have stopped more connections, so I decided to dig around. It turns out that windows has a limit for outbound ports (something I later learned also exists on Linux), so I switched over to Linux and got to 28,232 connections.
28,232 is a very peculiar number - it’s exactly the size of Linux’s default ephemeral port range (32768-60999), which caps how many outbound connections a single client IP can hold open to one server address. That’s what the Docker networking fix below gets around, by giving each client container its own IP.
Issues faced (and how they were/will be fixed)
- Docker builds taking too much time (and space) - fix documentation
- Broadcast not working as intended (messages sent to only one client) - implement fixed fanout (see the sketch after this list)
- 28,232 connection limit - some Docker networking magic
- Load test script not working correctly - built a new one based off of Eran Yanay’s Gophercon talk
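The fan-out pattern I mean is roughly the following: a single hub that copies each update to every registered client channel, instead of handing each message to just one receiver. A minimal sketch, with illustrative struct and method names that aren’t necessarily the repo’s:

```go
package main

import (
	"fmt"
	"sync"
)

// Update is an assumed leaderboard update payload.
type Update struct {
	Player string
	Score  float64
}

// Hub fans out every published update to all subscribed clients.
type Hub struct {
	mu      sync.RWMutex
	clients map[chan Update]struct{}
}

func NewHub() *Hub {
	return &Hub{clients: make(map[chan Update]struct{})}
}

// Subscribe registers a new client channel (one per SSE connection).
func (h *Hub) Subscribe() chan Update {
	ch := make(chan Update, 16)
	h.mu.Lock()
	h.clients[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

// Unsubscribe removes the channel when the client disconnects.
func (h *Hub) Unsubscribe(ch chan Update) {
	h.mu.Lock()
	delete(h.clients, ch)
	h.mu.Unlock()
	close(ch)
}

// Publish copies the update to every client. Receiving from a single
// shared channel would deliver each message to only one client - the
// bug described in the list above.
func (h *Hub) Publish(u Update) {
	h.mu.RLock()
	defer h.mu.RUnlock()
	for ch := range h.clients {
		select {
		case ch <- u:
		default: // drop for slow clients instead of blocking the broadcaster
		}
	}
}

func main() {
	hub := NewHub()
	ch := hub.Subscribe()
	hub.Publish(Update{Player: "player:42", Score: 1500})
	u := <-ch
	fmt.Println(u.Player, u.Score)
	hub.Unsubscribe(ch)
}
```

Each SSE handler would call Subscribe on connect and Unsubscribe when the request context is cancelled.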
Experimental features
- Historical querying through event streaming to Postgres (a TSDB would be fine too)
- Initial value update (push a leaderboard update when a client connects)
- Sorted set implementation using skip list
Planned features
Extensions
- Create and connect with matchmaking and playing services (probably use bots for playing)
- Horizontal scaling of servers
- Test how far this can scale (I believe 1M connections is doable, given enough laptops to run clients on)