
backpressure

Published: at 05:30 PM

Backpressure is one of those things that can make or break a distributed system, and a lot of the tech around us handles it in elegant ways.

I recently got the chance to deal with it while building my real-time leaderboard, where I had to account for it so that clients get the best possible experience.

So what is it, really?

There are two competing definitions.

  1. A technique used to regulate the transmission of messages (or events, packets).
  2. The capability of producers of messages to overload consumers (from Baeldung's blog).

Though both are correct, I prefer the second definition and will use that throughout this post.

Backpressure happens when your system can’t keep up with the amount of work being thrown at it.

Fast producers overwhelm slow consumers

Why is this an issue?

Here's a small list of issues that can occur if backpressure isn't handled correctly:

  1. Unbounded queues grow until memory runs out and the process crashes.
  2. Latency keeps climbing as messages sit longer in buffers.
  3. Messages get dropped or time out, leading to data loss.
  4. Failures cascade as overloaded components take down their neighbours.


When does backpressure occur?

Let’s first define the system in which backpressure will be encountered, then do a code prototype and discuss strategies for resolving it.

There are three components to this system:

  1. Producer: creates messages and initiates sending them to the consumer.
  2. Messaging system: receives messages from the producer and forwards them to the consumer. (This may not be a separate component; it can also just be the system's network buffers.)
  3. Consumer: receives messages from the messaging system and processes them.

// TODO: add backpressure diagram, tetris image

Things work fine if the rate at which messages are created by the producer is less than or equal to the rate at which messages are processed by the consumer. If the creation rate exceeds the rate of consumption, we have a problem.

I like to think of this in terms of games like Tetris: at first the blocks arrive slowly and you're able to process (move and rotate) them easily. As time goes on, the blocks arrive faster and faster until they overwhelm you, and at some point it's game over.


How to fix it?

There are a bunch of ways this can be handled, depending on the system and message constraints.

1. Slow down producers

The consumer sends a signal to the producer asking it to slow down. This applies where the rate of messages can be controlled, and where the consumer can be given control over it.

In Go, this can be implemented using a channel to signal when the message rate should drop.

Slowing down producers by sending signal

TCP does something similar, which is discussed below.
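A minimal sketch of this signaling in Go, assuming a buffered channel between producer and consumer plus a dedicated `slowDown` channel for the feedback signal (all names here are illustrative, not taken from the leaderboard project):

```go
package main

import (
	"fmt"
	"time"
)

// produce sends n messages on out, pausing briefly whenever the
// consumer signals on slowDown.
func produce(out chan<- int, slowDown <-chan struct{}, n int) {
	for i := 0; i < n; i++ {
		select {
		case <-slowDown:
			// The consumer asked us to back off.
			time.Sleep(10 * time.Millisecond)
		default:
		}
		out <- i
	}
	close(out)
}

// consume drains out, asking the producer to slow down whenever the
// buffer looks nearly full.
func consume(out <-chan int, slowDown chan<- struct{}, bufCap int) int {
	received := 0
	for range out {
		received++
		if len(out) >= bufCap-1 {
			select {
			case slowDown <- struct{}{}:
			default: // a slow-down signal is already pending
			}
		}
		time.Sleep(time.Millisecond) // simulate slow processing
	}
	return received
}

func main() {
	out := make(chan int, 4)
	slowDown := make(chan struct{}, 1)
	go produce(out, slowDown, 20)
	fmt.Println("received:", consume(out, slowDown, 4))
}
```

Note that the `slowDown` channel has capacity 1 and the consumer sends into it non-blockingly, so signaling never stalls the consumer itself.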

Tradeoffs

This only works when the producer's rate can actually be controlled. The feedback signal also couples the consumer to the producer and adds end-to-end latency.

2. Drop existing messages

If the messages already sitting in the queue are less important than the ones being sent by the producer, the existing messages can be dropped. The exact dropping strategy (drop oldest, drop all, priority-based, etc.) depends on the use case.

Dropping existing messages in the buffer

This is the approach I've used in my real-time leaderboard, since only the final state matters, not the intermediate states. If the producer were throttled instead, all clients would receive leaderboards from older intervals. Skipping a few intermediate states (which the client never received anyway) and directly sending the latest state to slow clients is the better solution.
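A drop-oldest variant can be sketched in Go with a buffered channel, evicting the oldest buffered message when a send would block. The helper name is hypothetical, not code from the leaderboard project:

```go
package main

import "fmt"

// offerDropOldest tries to enqueue v into buf. If buf is full, it
// discards the oldest buffered message to make room, so the newest
// state always gets through.
func offerDropOldest(buf chan int, v int) (dropped bool) {
	for {
		select {
		case buf <- v:
			return dropped
		default:
			// Buffer full: evict the oldest message and retry.
			select {
			case <-buf:
				dropped = true
			default:
			}
		}
	}
}

func main() {
	buf := make(chan int, 3)
	for i := 1; i <= 5; i++ {
		offerDropOldest(buf, i)
	}
	close(buf)
	for v := range buf {
		fmt.Println(v) // the oldest values (1 and 2) were dropped
	}
}
```

In a single-producer setting this keeps the buffer holding the most recent states, which is exactly what a leaderboard-style consumer wants.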

Tradeoffs

Intermediate messages are lost, so this only suits cases where stale data can be safely discarded. Consumers that need every message can't use it.

3. Drop incoming messages

Probably the simplest method: stop accepting messages from the producer until space has freed up, without explicitly telling it to slow down. On the producer side this can be combined with retries and checks; if retries exceed a certain limit, the producer can throttle itself without any explicit communication.

Dropping incoming messages
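In Go this is a one-line non-blocking send: a `select` with a `default` branch. A minimal sketch, with hypothetical helper names:

```go
package main

import "fmt"

// tryEnqueue attempts a non-blocking send into buf; if the buffer is
// full, the incoming message is dropped and false is returned so the
// producer can decide whether to retry.
func tryEnqueue(buf chan int, v int) bool {
	select {
	case buf <- v:
		return true
	default:
		return false // no space: drop the new message
	}
}

func main() {
	buf := make(chan int, 3)
	accepted := 0
	for i := 1; i <= 5; i++ {
		if tryEnqueue(buf, i) {
			accepted++
		}
	}
	fmt.Println("accepted:", accepted) // only 3 fit; 2 were dropped
}
```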

Tradeoffs

Messages are silently lost unless producers retry, and the producer may waste work creating messages that are never accepted.

4. Increase consumers

An example of this is an async task queue for processing documents (or a scalable notification system, which works similarly).

System when worker pool is used

There will be a pool of workers that can be scaled up or down based on the volume of incoming messages; an intermediate consumer may be used just to assign the messages (tasks) to workers. This pattern is used in several systems, Nginx among them.
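A worker pool can be sketched in Go as goroutines draining a shared channel; raising the worker count raises the consumption rate. This is an illustrative sketch, not code from any of the systems mentioned:

```go
package main

import (
	"fmt"
	"sync"
)

// processAll fans tasks out to nWorkers goroutines reading from a
// shared channel and returns how many tasks were processed.
func processAll(tasks []int, nWorkers int) int {
	ch := make(chan int)
	var wg sync.WaitGroup
	var mu sync.Mutex
	processed := 0

	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range ch {
				_ = t * t // stand-in for real work
				mu.Lock()
				processed++
				mu.Unlock()
			}
		}()
	}

	// The intermediate consumer: hands each task to whichever worker is free.
	for _, t := range tasks {
		ch <- t
	}
	close(ch)
	wg.Wait()
	return processed
}

func main() {
	fmt.Println("processed:", processAll([]int{1, 2, 3, 4, 5}, 3))
}
```

Because the channel is unbuffered, a task is only handed over when some worker is ready, so the pool size directly bounds the in-flight work.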

Tradeoffs

Scaling out adds cost and coordination overhead, and it only helps if the work can be parallelized. It also raises the ceiling rather than removing it.


WarpStream reference

WarpStream is a diskless, Apache Kafka-compatible streaming platform. Their blog has a great post on this topic, which I referred to to deepen my understanding of it.

// TODO: talk about the cases I’ve not covered in my explanation, and how they think of and handle it.

Dealing with rejection (in distributed systems)


How TCP deals with backpressure

TCP has flow control built in: the receiver advertises a receive window in every ACK, telling the sender how many more bytes it can buffer. As the receiver's buffer fills, the window shrinks, and a zero window stops the sender entirely until space frees up. This is the same slow-down-the-producer idea from strategy 1, baked into the protocol.


What other systems implement this

Backpressure is a recurring theme in distributed systems. It shows up quite prominently in reactive streams libraries (RxJava, Project Reactor), gRPC's flow control, Node.js streams, and message brokers like Kafka and RabbitMQ.


Further reading