Last month, I built a real-time leaderboard system in Go that could handle 28,232 concurrent SSE connections - pretty impressive for a Python developer’s first Go project.
Then I ran a test with actual score updates and watched 99.99% of my messages disappear into the void.
Here’s how I learned about Go’s concurrency model the hard way, and why a deep understanding of channels, goroutines, and mutexes can make the difference between a system that works in testing and one that works in production.
The goal
I started building a real-time leaderboard to put into practice the Go concepts I had been reading about. This kind of system shows up in plenty of real applications, most commonly in games to show real-time rankings.
This was structured purely as a learning project, but the focus shifted toward scaling, benchmarking, and observability. I wanted it to cross 10k concurrent connections on a single server.
The system
Since this was structured as a learning project, almost all the tech used was new to me.
I tried to choose the most sensible tech stack, and ended up with:
- Go for the backend (high concurrency, more fine-grained control, and lower resource usage than Python)
- Redis for its sorted set data structure (a short sketch follows this list). I could have implemented one in pure Go, but that’s overkill at this stage.
- Prometheus and Grafana for observability - I wanted to learn how observability works and get hands-on experience with it.
- Docker and Docker Compose for coordinating everything. I wrote a blog post on Docker optimization for this project.
- PostgreSQL for persistent storage and historical queries (in the works).
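Redis sorted sets map neatly onto a leaderboard: ZADD inserts or updates a player’s score, and ZREVRANGE (with scores) pulls the top N in ranking order. A minimal sketch of the idea, assuming the go-redis v9 client (the key and player names here are just illustrative):

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// ZADD: insert or update a player's score in the sorted set.
	rdb.ZAdd(ctx, "leaderboard", redis.Z{Score: 420, Member: "alice"})
	rdb.ZAdd(ctx, "leaderboard", redis.Z{Score: 377, Member: "bob"})

	// ZREVRANGE ... WITHSCORES: top 10 players, highest score first.
	top, err := rdb.ZRevRangeWithScores(ctx, "leaderboard", 0, 9).Result()
	if err != nil {
		panic(err)
	}
	for rank, entry := range top {
		fmt.Printf("#%d %v (%.0f)\n", rank+1, entry.Member, entry.Score)
	}
}
```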
The overall architecture is as described in the diagram.
Given the use case, access patterns, and scaling concerns, I decided to opt for SSE (Server-Sent Events) to push real-time leaderboard updates. The job is essentially to send the same leaderboard to a large number of clients (potentially 1000+), so there were a few ways I could implement this, [tradeoffs considered here].
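For anyone unfamiliar with SSE: it’s just a long-lived HTTP response with Content-Type: text/event-stream, where the server keeps writing "data: <payload>" frames separated by blank lines. A client can be as small as this sketch (the endpoint path is a placeholder, not necessarily the project’s actual route):

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Placeholder endpoint - point this at wherever the stream is exposed.
	resp, err := http.Get("http://localhost:8080/stream-leaderboard")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Each event arrives as a "data: ..." line followed by a blank line.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "data: ") {
			fmt.Println("leaderboard update:", strings.TrimPrefix(line, "data: "))
		}
	}
}
```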
I started implementing the main SSE handling code, which I did progressively in stages, improving and learning along the way.
Naive implementation
In this version, the SSE handler function itself contains the Redis read call and the JSON marshaling on every tick - there is no check for whether the data changed since the previous read (no change detection / dedup).
Click to view code
package backend

import (
    "encoding/json"
    "fmt"
    "leaderboard/src/metrics"
    "leaderboard/src/redisclient"
    "time"

    "github.com/gin-gonic/gin"
)

func StreamLeaderboard(c *gin.Context) {
    c.Writer.Header().Set("Content-Type", "text/event-stream")
    c.Writer.Header().Set("Cache-Control", "no-cache")
    c.Writer.Header().Set("Connection", "keep-alive")
    c.Writer.Flush()

    metrics.ActiveSSEConnections.Inc()

    ticker := time.NewTicker(2 * time.Second)
    defer ticker.Stop()

    for {
        select {
        case <-ticker.C:
            results, err := redisclient.GetTopNPlayers(c, "leaderboard", 10)
            if err != nil {
                continue
            }
            data, _ := json.Marshal(results)
            fmt.Fprintf(c.Writer, "data: %s\n\n", data)
            c.Writer.Flush()
        case <-c.Request.Context().Done():
            return
        }
    }
}
This has a load of issues.
- Redis reads are coupled to the handler - every SSE connection issues its own read every 2 seconds, even though the same data goes out to everyone. At the 10k-connection target, that’s roughly 5,000 identical Redis queries per second, which quickly turns into a scaling bottleneck.
- No deduplication - every single read gets marshaled and sent over the network, even when nothing changed.
- Poor error handling - Redis errors are silently skipped and the marshaling error is ignored entirely.
Added deduplication, better monitoring
This version adds a basic deduplication measure - the freshly marshaled JSON is compared against the last payload sent, and nothing is written if they match. This cuts down the data sent and helps reduce network usage.
There’s also the addition of other metrics to help understand how things are working and what failures are occurring in the app.
Config variables are used more widely in this update, reducing the number of hardcoded values. A few hardcoded values still remain and may need to be moved into config later.
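The config package itself isn’t shown in this post. The snippets below reference fields like config.AppConfig.Leaderboard.TopPlayersLimit, config.AppConfig.Server.BroadcastBufferSize and config.AppConfig.Server.PollingIntervalSeconds, so here is a hypothetical sketch of the shape they assume (names inferred from the call sites, not copied from the real project):

```go
// Hypothetical sketch of leaderboard/src/config - field names are guessed
// from how they're used in the handlers, not taken from the actual repo.
package config

type ServerConfig struct {
	BroadcastBufferSize    int // buffer length for broadcast/client channels
	PollingIntervalSeconds int // how often the broadcaster polls Redis
}

type LeaderboardConfig struct {
	TopPlayersLimit int // how many top players to fetch and stream
}

type Config struct {
	Server      ServerConfig
	Leaderboard LeaderboardConfig
}

// AppConfig would be populated from a file or environment at startup.
var AppConfig Config
```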
Changes:
- Deduplication by comparing JSON objects
- Replace hardcoded values with config entries
- Improved Prometheus metrics
Click to view code
import (
"encoding/json"
"fmt"
+ "leaderboard/src/config"
"leaderboard/src/metrics"
"leaderboard/src/redisclient"
"time"
)
func StreamLeaderboard(c *gin.Context) {
+ metrics.ConcurrentClients.Inc()
+ defer metrics.ConcurrentClients.Dec()
+
c.Writer.Header().Set("Content-Type", "text/event-stream")
c.Writer.Header().Set("Cache-Control", "no-cache")
c.Writer.Header().Set("Connection", "keep-alive")
c.Writer.Flush()
metrics.ActiveSSEConnections.Inc()
+ defer metrics.ActiveSSEConnections.Dec()
- ticker := time.NewTicker(2 * time.Second)
+ ticker := time.NewTicker(5 * time.Second)
defer ticker.Stop()
+ var lastData []byte
+
for {
select {
case <-ticker.C:
- results, err := redisclient.GetTopNPlayers(c, "leaderboard", 10)
+ results, err := redisclient.GetTopNPlayers(c, "leaderboard", int64(config.AppConfig.Leaderboard.TopPlayersLimit))
+ if err != nil {
+     metrics.RedisOperationErrors.WithLabelValues("get_top_players").Inc()
+ }
+ jsonStart := time.Now()
+ data, err := json.Marshal(results)
+ metrics.JSONMarshalDuration.Observe(time.Since(jsonStart).Seconds())
+
if err != nil {
- continue
+ metrics.JSONErrors.WithLabelValues("marshal").Inc()
+ return
+ }
+ if !jsonEqual(data, lastData) {
+ fmt.Fprintf(c.Writer, "data: %s\n\n", data)
+ c.Writer.Flush()
+ metrics.SSEMessagesSent.Inc()
+ lastData = data
}
-
- data, _ := json.Marshal(results)
- fmt.Fprintf(c.Writer, "data: %s\n\n", data)
- c.Writer.Flush()
case <-c.Request.Context().Done():
+ metrics.DroppedSSEConnections.Inc()
return
}
}
+}
+
+func jsonEqual(a, b []byte) bool {
+ return string(a) == string(b)
}
Issues:
- Comparing full JSON payloads byte by byte is expensive (and the blobs here can get large).
- Marshaling still happens on every tick, and every connection still does its own Redis read, which means high QPS on Redis.
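A side note before moving on: the metrics package is used throughout but never defined in this post. The counters, gauges and histograms referenced above would be declared with promauto roughly like this (a sketch with guessed metric names, labels and buckets):

```go
// Hypothetical sketch of leaderboard/src/metrics - names and labels are
// guesses that mirror the call sites, not the project's actual definitions.
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	ActiveSSEConnections = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "sse_active_connections",
		Help: "Currently open SSE connections.",
	})

	SSEMessagesSent = promauto.NewCounter(prometheus.CounterOpts{
		Name: "sse_messages_sent_total",
		Help: "Total SSE messages written to clients.",
	})

	RedisOperationErrors = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "redis_operation_errors_total",
		Help: "Redis errors, labeled by operation.",
	}, []string{"operation"})

	JSONErrors = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "json_errors_total",
		Help: "JSON errors, labeled by stage (marshal/unmarshal).",
	}, []string{"stage"})

	JSONMarshalDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "json_marshal_duration_seconds",
		Help:    "Time spent marshaling the leaderboard payload.",
		Buckets: prometheus.DefBuckets,
	})

	// ConcurrentClients, DroppedSSEConnections, FilledSSEChannels, etc.
	// would follow the same pattern.
)
```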
Decoupling Redis reads (separate goroutine), dedup improvements
Instead of having every client hammer Redis, I implemented a broadcast pattern using goroutines and channels (single producer, multiple consumers).
Changes:
- Separate goroutine for Redis polling
- Deduplication using hashing (SHA256)
- JSON marshaling done only when data is changed
Click to view code
package backend
import (
+ "context"
+ "crypto/sha256"
"encoding/json"
"fmt"
"leaderboard/src/config"
"leaderboard/src/metrics"
"leaderboard/src/redisclient"
+ "sync"
"time"
"github.com/gin-gonic/gin"
)
+type LeaderboardUpdate struct {
+ Data []byte
+ Hash [32]byte
+}
+
+type LeaderboardBroadcaster struct {
+ // channel for all SSE conns to listen to get lb updates
+ broadcastChan chan LeaderboardUpdate
+
+ ctx context.Context
+ cancel context.CancelFunc
+ wg sync.WaitGroup
+}
+
+func CreateLeaderboardBroadcaster() *LeaderboardBroadcaster {
+ ctx, cancel := context.WithCancel(context.Background())
+
+ lb := &LeaderboardBroadcaster{
+ // make the channel buffered - clients may be slow, messages can pile up
+ broadcastChan: make(chan LeaderboardUpdate, config.AppConfig.Server.BroadcastBufferSize),
+ ctx: ctx,
+ cancel: cancel,
+ }
+
+ lb.wg.Add(1)
+ go lb.detectLeaderboardChanges()
+ return lb
+}
+
+func (lb *LeaderboardBroadcaster) StopBroadcast() {
+ lb.cancel()
+ lb.wg.Wait()
+ close(lb.broadcastChan)
+}
+
+func (lb *LeaderboardBroadcaster) GetBroadcastChannel() <-chan LeaderboardUpdate {
+ return lb.broadcastChan
+}
+
+// package level var
+var broadcaster *LeaderboardBroadcaster
+
+func SetBroadcaster(b *LeaderboardBroadcaster) {
+ broadcaster = b
+}
+
+func (lb *LeaderboardBroadcaster) detectLeaderboardChanges() {
+ defer lb.wg.Done()
+
+ // TODO: check this time conversion
+ ticker := time.NewTicker(time.Duration(config.AppConfig.Server.PollingIntervalSeconds) * time.Second)
+ defer ticker.Stop()
+
+ var lastHash [32]byte
+
+ for {
+ select {
+ case <-ticker.C:
+ results, err := redisclient.GetTopNPlayers(lb.ctx, "leaderboard", int64(config.AppConfig.Leaderboard.TopPlayersLimit))
+ if err != nil {
+ metrics.RedisOperationErrors.WithLabelValues("get_top_players").Inc()
+ config.Error("Failed to fetch leaderboard from Redis.", map[string]any{"Error": err, "source": "/stream-leaderboard"})
+ continue
+ }
+
+ resultString := fmt.Sprintf("%+v", results)
+ currentHash := sha256.Sum256([]byte(resultString))
+
+ if currentHash != lastHash {
+ lastHash = currentHash
+
+ jsonStart := time.Now()
+ jsonData, err := json.Marshal(results)
+ if err != nil {
+ config.Error("JSON marshaling error",
+ map[string]any{"Error": err, "source": "/stream-leaderboard", "results": results})
+ metrics.JSONErrors.WithLabelValues("marshal").Inc()
+ continue
+ }
+ metrics.JSONMarshalDuration.Observe(time.Since(jsonStart).Seconds())
+
+ update := LeaderboardUpdate{
+ Data: jsonData,
+ Hash: currentHash,
+ }
+
+ // non blocking send
+ select {
+ case lb.broadcastChan <- update:
+ default:
+ }
+ }
+ case <-lb.ctx.Done():
+ return
+ }
+ }
+}
+
func StreamLeaderboard(c *gin.Context) {
metrics.ConcurrentClients.Inc()
defer metrics.ConcurrentClients.Dec()
c.Writer.Flush()
metrics.ActiveSSEConnections.Inc()
+ config.Info("New SSE conn", map[string]any{"Num active clients": metrics.ActiveSSEConnections})
defer metrics.ActiveSSEConnections.Dec()
- ticker := time.NewTicker(5 * time.Second)
- defer ticker.Stop()
-
- var lastData []byte
+ broadcastChan := broadcaster.GetBroadcastChannel()
for {
select {
- case <-ticker.C:
- results, err := redisclient.GetTopNPlayers(c, "leaderboard", int64(config.AppConfig.Leaderboard.TopPlayersLimit))
- if err != nil {
- metrics.RedisOperationErrors.WithLabelValues("get_top_players").Inc()
- }
- jsonStart := time.Now()
- data, err := json.Marshal(results)
- metrics.JSONMarshalDuration.Observe(time.Since(jsonStart).Seconds())
-
- if err != nil {
- metrics.JSONErrors.WithLabelValues("marshal").Inc()
+ case update, ok := <-broadcastChan:
+ if !ok {
+ // channel closed
return
}
- if !jsonEqual(data, lastData) {
- fmt.Fprintf(c.Writer, "data: %s\n\n", data)
- c.Writer.Flush()
- metrics.SSEMessagesSent.Inc()
- lastData = data
- }
+ fmt.Fprintf(c.Writer, "data: %s\n\n", update.Data)
+ c.Writer.Flush()
+ metrics.SSEMessagesSent.Inc()
case <-c.Request.Context().Done():
metrics.DroppedSSEConnections.Inc()
+ config.Info("Closed SSE conn", map[string]any{"open": metrics.ActiveSSEConnections})
return
}
}
}
-
-func jsonEqual(a, b []byte) bool {
- return string(a) == string(b)
-}
Benchmarking this with a Go script that opens persistent SSE connections (based on Eran Yanay’s GopherCon talk) got me to 28,232 concurrent connections.
I was pretty happy with these results, especially since the connections maxed out due to hitting the Linux limit on outbound (ephemeral) ports (the default range, 32768-60999, is exactly 28,232 ports), not due to memory or CPU bottlenecks.
Then I realized I had a major bug which I forgot to test for.
Issues:
- The fan-out implementation is incorrect - a value sent on a Go channel is delivered to exactly one receiver, not to all of them. So any update pushed onto the shared channel is consumed by a single client goroutine, and every other client never gets the message (the small demo after this list makes this concrete).
- Error handling gaps.
- No initial message is sent (messages only go out when the ticker fires and the data has changed). The client should receive some leaderboard snapshot on connect, even if it’s slightly stale.
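To make the fan-out bug concrete: a Go channel is a queue, not a broadcast bus. In this toy example, three "clients" read from the same channel, but the single value sent reaches exactly one of them:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	updates := make(chan string)
	var wg sync.WaitGroup

	// Three "clients" all reading from the same channel.
	for i := 1; i <= 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for msg := range updates {
				fmt.Printf("client %d received: %s\n", id, msg)
			}
		}(i)
	}

	// One "broadcast": this value is delivered to exactly ONE receiver,
	// not to all three - channels give queue semantics, not fan-out.
	updates <- "leaderboard update #1"

	close(updates)
	wg.Wait()
}
```

Only one of the three clients prints the update; the other two never see it - which is exactly what happened to 99.99% of my messages.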
Fixed Fan Out
The previous version pushed all broadcast messages onto a single channel, so each message was effectively received by just one goroutine (one client). That defeats the purpose of the app: every single client needs to receive the same message.
Changes:
- One buffered channel per client, stored in a map
- Mutex applied whenever map is modified, preventing race conditions
- Leaderboard change leads to broadcasting message to all channels
Alternative implementations might use a pub/sub system (Redis Pub/Sub, or a managed provider), but that’s probably overkill for this use case.
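For reference, a Redis Pub/Sub variant would look roughly like this with go-redis (a sketch only - every subscriber gets its own copy of each published message, which is the fan-out behaviour a plain Go channel doesn’t give you):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Producer side (e.g. the change-detection goroutine): publish each new
	// leaderboard payload once; Redis delivers it to every subscriber.
	go func() {
		ticker := time.NewTicker(2 * time.Second)
		defer ticker.Stop()
		for range ticker.C {
			rdb.Publish(ctx, "leaderboard_updates", `[{"player":"alice","score":420}]`)
		}
	}()

	// Consumer side (one subscription per SSE handler, or one shared).
	sub := rdb.Subscribe(ctx, "leaderboard_updates")
	defer sub.Close()

	for msg := range sub.Channel() {
		fmt.Println("got update:", msg.Payload)
	}
}
```

I stuck with the in-process map of per-client channels instead, shown in the diff below.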
Click to view code
-type LeaderboardBroadcaster struct {
- // channel for all SSE conns to listen to get lb updates
- broadcastChan chan LeaderboardUpdate
+type Client struct {
+ ID int64
+ channel chan LeaderboardUpdate
+ ctx context.Context
+ cancel context.CancelFunc
+}
- ctx context.Context
- cancel context.CancelFunc
- wg sync.WaitGroup
+type LeaderboardBroadcaster struct {
+ clients map[int64]*Client
+ clientsMutex sync.RWMutex
+ ctx context.Context
+ cancel context.CancelFunc
+ wg sync.WaitGroup
+ clientCounter int64
}
func CreateLeaderboardBroadcaster() *LeaderboardBroadcaster {
lb := &LeaderboardBroadcaster{
// make the channel buffered - clients may be slow, messages can pile up
- broadcastChan: make(chan LeaderboardUpdate, config.AppConfig.Server.BroadcastBufferSize),
- ctx: ctx,
- cancel: cancel,
+ clients: make(map[int64]*Client),
+ ctx: ctx,
+ cancel: cancel,
}
lb.wg.Add(1)
return lb
}
-func (lb *LeaderboardBroadcaster) GetBroadcastChannel() <-chan LeaderboardUpdate {
- return lb.broadcastChan
+// Create new channel for client, add to map
+func (lb *LeaderboardBroadcaster) AddClient() (*Client, <-chan LeaderboardUpdate) {
+ lb.clientsMutex.Lock()
+ lb.clientCounter++
+ ctx, cancel := context.WithCancel(lb.ctx)
+
+ client := &Client{
+ ID: lb.clientCounter,
+ ctx: ctx,
+ cancel: cancel,
+ channel: make(chan LeaderboardUpdate, config.AppConfig.Server.BroadcastBufferSize),
+ }
+ lb.clients[lb.clientCounter] = client
+ lb.clientsMutex.Unlock()
+
+ return client, client.channel
+}
+
+// remove specific client channel - closed connection
+func (lb *LeaderboardBroadcaster) RemoveClient(client *Client) {
+ lb.clientsMutex.Lock()
+ defer lb.clientsMutex.Unlock()
+
+ if _, exists := lb.clients[client.ID]; exists {
+ delete(lb.clients, client.ID)
+ client.cancel()
+ close(client.channel)
+ }
+}
+
+// broadcastToAllClients sends an update to all connected clients
+func (lb *LeaderboardBroadcaster) broadcastToAllClients(update LeaderboardUpdate) {
+ lb.clientsMutex.RLock()
+
+ var clientsToRemove []*Client
+
+ // what to do in case Client channel is full, skip this client (to be changed later - add channel clearing mechanism + alerting)
+ for _, client := range lb.clients {
+ select {
+ case client.channel <- update:
+ // sent
+ case <-client.ctx.Done():
+ // clean
+ clientsToRemove = append(clientsToRemove, client)
+ default:
+ metrics.FilledSSEChannels.Inc()
+ }
+ }
+ lb.clientsMutex.RUnlock()
+
+ if len(clientsToRemove) > 0 {
+ lb.clientsMutex.Lock()
+ for _, client := range clientsToRemove {
+ if _, exists := lb.clients[client.ID]; exists {
+ delete(lb.clients, client.ID)
+ client.cancel()
+ close(client.channel)
+ }
+ }
+ lb.clientsMutex.Unlock()
+ }
}
// package level var
broadcaster = b
+// poll redis, dedup leaderboard values, push to broadcast to all clients
func (lb *LeaderboardBroadcaster) detectLeaderboardChanges() {
defer lb.wg.Done()
-
- // TODO: check this time conversion
ticker := time.NewTicker(time.Duration(config.AppConfig.Server.PollingIntervalSeconds) * time.Second)
defer ticker.Stop()
Hash: currentHash,
}
- // non blocking send
- select {
- case lb.broadcastChan <- update:
- default:
- }
+ lb.broadcastToAllClients(update)
}
case <-lb.ctx.Done():
return
c.Writer.Flush()
metrics.ActiveSSEConnections.Inc()
- config.Info("New SSE conn", map[string]any{"Num active clients": metrics.ActiveSSEConnections})
+ config.Info("New SSE conn", map[string]any{})
defer metrics.ActiveSSEConnections.Dec()
- broadcastChan := broadcaster.GetBroadcastChannel()
+ client, channel := broadcaster.AddClient()
+ defer broadcaster.RemoveClient(client)
for {
select {
- case update, ok := <-broadcastChan:
+ case update, ok := <-channel:
if !ok {
// channel closed
return
fmt.Fprintf(c.Writer, "data: %s\n\n", update.Data)
c.Writer.Flush()
metrics.SSEMessagesSent.Inc()
-
case <-c.Request.Context().Done():
metrics.DroppedSSEConnections.Inc()
config.Info("Closed SSE conn", map[string]any{"open": metrics.ActiveSSEConnections})
Other Mistakes
- Incorrect testing script: bugs can occur in any piece of code, tests included. My initial load-testing script handled only about 300-400 concurrent connections and took forever to add clients; I got tired of waiting for it to cross a certain level and almost gave up, thinking my server code was the bottleneck.
Later I looked at Eran Yanay’s implementation of the testing script from his million-WebSocket-connections talk (Going Infinite), took inspiration from it, and created a version for SSE. This worked: I immediately got thousands of connections, and with improvements to my code, crossed 28k.
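The load-testing script itself isn’t reproduced here, but the core idea is small: open as many long-lived SSE connections as possible from goroutines and keep counting what comes back. A stripped-down sketch (endpoint and connection count are placeholders, and client-side limits like ulimit and ephemeral ports still apply):

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
	"sync/atomic"
	"time"
)

func main() {
	const target = "http://localhost:8080/stream-leaderboard" // placeholder
	const numConns = 1000                                      // placeholder

	var open, events int64

	for i := 0; i < numConns; i++ {
		go func() {
			resp, err := http.Get(target)
			if err != nil {
				return
			}
			defer resp.Body.Close()
			atomic.AddInt64(&open, 1)
			defer atomic.AddInt64(&open, -1)

			// Count every "data:" frame this connection receives.
			scanner := bufio.NewScanner(resp.Body)
			for scanner.Scan() {
				if strings.HasPrefix(scanner.Text(), "data:") {
					atomic.AddInt64(&events, 1)
				}
			}
		}()
	}

	// Periodically report open connections and events received so far.
	for {
		time.Sleep(5 * time.Second)
		fmt.Printf("open=%d events=%d\n", atomic.LoadInt64(&open), atomic.LoadInt64(&events))
	}
}
```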
TODOs:
- Reduce Grafana and Prometheus intervals
- Add retries to SSE send, add jitter to prevent thundering herd
- Throughput metrics - per time interval
- Buffer flush mechanism
- Send a leaderboard update immediately on connecting to the SSE stream (rough sketch after this list)
- Validation for score updates - with proper error messages
- Authentication for game servers
- Postgres integration (leaderboard historical data - aggregation)
- Redis/database based id to username translation
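On the item about sending a leaderboard update immediately on connect, the rough idea is to cache the most recent update in the broadcaster and replay it into each new client’s channel when it registers. A standalone toy sketch of that idea (not the project’s actual broadcaster types):

```go
package main

import (
	"fmt"
	"sync"
)

type Update struct{ Data []byte }

// Minimal broadcaster demonstrating "replay last snapshot on connect".
type Broadcaster struct {
	mu      sync.RWMutex
	clients map[int]chan Update
	nextID  int
	last    *Update // cached most-recent update
}

func NewBroadcaster() *Broadcaster {
	return &Broadcaster{clients: make(map[int]chan Update)}
}

func (b *Broadcaster) Broadcast(u Update) {
	b.mu.Lock()
	b.last = &u // remember for future subscribers
	for _, ch := range b.clients {
		select {
		case ch <- u:
		default: // skip slow clients
		}
	}
	b.mu.Unlock()
}

func (b *Broadcaster) AddClient() (int, <-chan Update) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.nextID++
	ch := make(chan Update, 8)
	b.clients[b.nextID] = ch
	// Replay the cached snapshot so a new client sees data immediately.
	if b.last != nil {
		ch <- *b.last // buffered, so this won't block
	}
	return b.nextID, ch
}

func main() {
	b := NewBroadcaster()
	b.Broadcast(Update{Data: []byte(`[{"player":"alice","score":420}]`)})

	// A client that connects *after* the broadcast still gets the snapshot.
	_, ch := b.AddClient()
	fmt.Printf("on connect: %s\n", (<-ch).Data)
}
```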