
Building a scalable real-time leaderboard


TL;DR

I built a real-time tournament leaderboard system using Go, Redis Sorted Sets, and Server-Sent Events. It streams live updates to clients and can handle hundreds of concurrent connections and game submissions per second. This project helped me learn Go, experiment with observability using Prometheus + Grafana, and optimize a system for high concurrency. Repo


Some time ago I started working on a real-time leaderboard and learned a lot from building it. This post goes through the details: how I built it, how I improved it, and what I learned along the way.

Motivation


This project started from a conversation with my friend Ritu Raj about game servers in MMORPGs. I tried to dig deeper into the topic but couldn’t find any resources, so I decided to look at actual MMORPG code. Since there weren’t any good open-source examples, I settled for the next closest thing - Lichess.

Lichess is an online chess platform that handles ~100,000 games at any time of the day. It’s completely free to play (run by a non-commercial org), all the code is open source, and the community is pretty nice too. I’ve used it for years (mostly for puzzles), and after some digging into the code and talking to the people who made it, I realized that it’s a huge system and a bit too complex to replicate at my current skill level.

So I started by recreating one component of the site: the real-time leaderboard used in its tournaments.

The problem

Real-time leaderboards are read-heavy applications, but write throughput and latency need to be optimized too. There are high-frequency updates (especially in MMORPGs), concurrent reads and writes, and a need for low latency (although this requirement is nowhere near as strict as in some other things I’ve built).

I’ve been an avid user of Python, and I’d like to think I’m somewhat decent at it (though I do doubt that sometimes). I didn’t believe it would be the right language for these requirements, so I looked for a tool that would give me the necessary performance without a steep learning curve, which led me to Go.

Architecture

Core components

APIs

APIs intended for game server use:

APIs intended for end user use:

These provide all the necessary interactions with outside components, but the main thing to look at is the streaming API, which enables the real-time updates. Most of the discussion will focus on it: streaming is expected to be the most frequently used access pattern by a huge margin in these kinds of systems, so it’s the critical path worth optimizing.
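To make the streaming side concrete, below is a minimal sketch of what an SSE endpoint looks like with Go’s standard library. The handler, the subscribe hook, and the payload format are placeholders for illustration, not the actual code from the repo:

```go
package leaderboard

import (
	"fmt"
	"net/http"
)

// StreamHandler returns an SSE endpoint. subscribe hands each client its
// own channel of serialized leaderboard updates plus a cleanup function;
// the fan-out behind it is out of scope for this sketch.
func StreamHandler(subscribe func() (<-chan []byte, func())) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}

		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		w.Header().Set("Connection", "keep-alive")

		updates, cancel := subscribe()
		defer cancel()

		for {
			select {
			case <-r.Context().Done():
				return // client disconnected
			case msg := <-updates:
				fmt.Fprintf(w, "data: %s\n\n", msg)
				flusher.Flush()
			}
		}
	}
}
```

The flusher is the important part: without flushing after each event, the response stays buffered and clients see nothing until the connection closes.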

Read path
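The read side is essentially a thin layer over Redis: fetching the top N players of a tournament is a single sorted-set range query. Here’s a sketch of what that looks like with the go-redis client, assuming an illustrative key layout of leaderboard:&lt;tournamentID&gt; (not necessarily the actual schema in the repo):

```go
package leaderboard

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// Entry is one leaderboard row as returned to readers.
type Entry struct {
	Player string
	Score  float64
}

// TopN returns the n highest-scoring players for a tournament, using a
// single ZREVRANGE WITHSCORES against the sorted set.
func TopN(ctx context.Context, rdb *redis.Client, tournamentID string, n int64) ([]Entry, error) {
	key := "leaderboard:" + tournamentID
	rows, err := rdb.ZRevRangeWithScores(ctx, key, 0, n-1).Result()
	if err != nil {
		return nil, err
	}
	entries := make([]Entry, 0, len(rows))
	for _, z := range rows {
		entries = append(entries, Entry{Player: z.Member.(string), Score: z.Score})
	}
	return entries, nil
}
```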

Write path
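The write side mirrors it: a game submission becomes a single ZINCRBY against the same sorted set (or a ZADD, if submissions carry absolute scores). Continuing the sketch above, with the same package, imports, and key-layout assumption:

```go
// SubmitScore applies a game result by incrementing the player's score in
// the sorted set; it returns the player's new total.
func SubmitScore(ctx context.Context, rdb *redis.Client, tournamentID, player string, points float64) (float64, error) {
	key := "leaderboard:" + tournamentID
	return rdb.ZIncrBy(ctx, key, points, player).Result()
}
```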

Why this tech stack?

Since one of the primary motivations for this project was to learn concurrency and observability, I decided to go with a tech stack I wasn’t familiar with.

Concurrency

Observability

This initially started as a way to monitor performance.

I use Uber’s Zap for logs, Prometheus for metrics, and Grafana as a dashboard over the Prometheus metrics.
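Wiring the metrics side up in Go is mostly boilerplate: define the metrics with client_golang and expose a /metrics endpoint for Prometheus to scrape. The metric names below are illustrative rather than the exact ones used in the project:

```go
package metrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Metric names are illustrative, not the exact ones used in the project.
var (
	ActiveStreams = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "leaderboard_active_sse_connections",
		Help: "Number of currently open SSE streams.",
	})

	ScoreSubmissions = promauto.NewCounter(prometheus.CounterOpts{
		Name: "leaderboard_score_submissions_total",
		Help: "Total number of score submissions received.",
	})
)

// Serve exposes /metrics for Prometheus to scrape; Grafana then charts
// whatever Prometheus has collected.
func Serve(addr string) error {
	http.Handle("/metrics", promhttp.Handler())
	return http.ListenAndServe(addr, nil)
}
```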

Some examples of the metrics used:

One thing that bugged me while testing this app was having to recreate the Grafana dashboard every time I started it up.

The usual methods for dashboard persistence use Helm charts and something called Jsonnet, which felt far too complex for what I’m building.

Searching for options led me to a simple way of provisioning reproducible dashboards with just a JSON file and a change to the Docker Compose setup, which I’m going to document soon at pranshu-raj.me/reproducible-grafana-json.

Performance

Benchmarking setup

Initially I used a simple Go script that opened multiple SSE clients, which got up to about 800 connections. This felt incredibly odd to me, since I wasn’t seeing any clear reason for failure (all system indicators were normal). I tried reducing resource usage, which at that point was about 7-10 goroutines per connection (don’t ask - I don’t know why that happened), but things didn’t improve even after lots of tweaks to memory and CPU usage.
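For reference, a stripped-down version of that kind of test client looks something like the sketch below; the endpoint and connection count are placeholders, and this is not the exact script I used:

```go
package main

import (
	"bufio"
	"log"
	"net/http"
	"sync"
)

func main() {
	const numClients = 1000                    // placeholder target
	const url = "http://localhost:8080/stream" // placeholder endpoint

	var wg sync.WaitGroup
	for i := 0; i < numClients; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			resp, err := http.Get(url)
			if err != nil {
				log.Printf("client %d: %v", id, err)
				return
			}
			defer resp.Body.Close()

			// Keep reading SSE lines so the connection stays open;
			// the content itself is discarded.
			scanner := bufio.NewScanner(resp.Body)
			for scanner.Scan() {
			}
		}(i)
	}
	wg.Wait()
}
```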

Eventually I decided to try a different testing method. I checked out the common load-testing tools (k6, vegeta, wrk), but none of them had anything specific for SSE. So I went back to one of the earliest inspirations for this project - Eran Yanay’s 1 million WebSocket connections repository - adapted his test script for SSE, and got to 15,400 connections on Windows. This was still not what I wanted, but a huge win nevertheless.

Still, there were no memory or CPU utilization issues that would explain connections being capped there, so I decided to dig around. It turns out that Windows has a limit on outbound ports (something I later learned also exists on Linux), so I switched over to Linux and got to 28,232 connections.

28,232 is a very peculiar number: Linux’s default ephemeral port range is 32768-60999, which is exactly 28,232 ports, so the test client was simply exhausting its outbound ports to the server.

Issues faced (and how they were/will be fixed)

Experimental features

Planned features

Extensions


