Article


Brief introduction to the Topos Reliable Broadcast protocol

In this post, we will give a brief introduction to reliable broadcast and how we use it in Topos.

The first thing to mention about reliable broadcast is that it can be implemented in the asynchronous model, meaning that the correctness of the algorithm is independent of any timing assumptions. Under asynchrony, given that the adversary is bounded by a threshold (typically, in a system of nn processes, we assume an adversary who can only control ff processes such that 3fn13f \leq n-1), the protocol is not vulnerable to the adversary delaying messages.

What is reliable broadcast?

The properties of a reliable broadcast are the following:

  • Validity: If a correct process pp broadcasts a message mm, then every correct process eventually delivers mm.
  • Consistency: If a correct process pp delivers a message mm and another correct process qq delivers a message mm', then m=mm = m'.
  • Totality: If a correct process delivers a message, then every correct process eventually delivers a message.

The combination of the Consistency and Totality properties makes up for the agreement property.

Given these properties, let’s build a reliable broadcast!

We will show a simple and efficient reliable broadcast which tolerates a 0f<n30 \leq f < \frac{n}{3} fraction of Byzantine processes in a system of a total of nn processes.

The broadcast primitive is composed of steps during which processes wait until they received enough messages before sending certain messages themselves:

  • Step 1 (Broadcast): a process broadcasts a message mm to the rest of the system.
  • Step 2 (Echo): upon reception of the message mm for the first time, a process echoes the message to all the processes with an (Echo,m)(Echo,m) message.
  • Step 3 (Ready): upon reception of more than n+f2\frac{n+f}{2} (Echo,m)(Echo,m) messages or f+1f+1 (Ready,m)(Ready, m) messages, a correct process sends a (Ready,m)(Ready,m) message to everyone.
  • Step 4 (Delivery): upon receiving 2f+12f+1 (Ready,m)(Ready,m) messages, a correct process delivers the message mm.

Performance wise, this primitive has quadratic message complexity so it does not scale well.

But more on this later!

Now that we have our reliable broadcast, let’s prove its correctness:

Proof (Validity): If the sender of message mm is correct, then all correct processes will send an (Echo,m)(Echo,m) message. All correct processes receive at least nfn-f (Echo,m)(Echo,m) messages and since nf>n+f2n-f > \frac{n+f}{2}, it triggers the emission of nfn-f (Ready,m)(Ready,m) messages, which in turn will trigger delivery of the message mm by those processes.

Proof (Consistency): For a correct process pp to deliver message mm, it needs to receive 2f+12f+1 (Ready,m)(Ready,m) messages. A set Q1Q_1 of cardinality Q1=2f+1\vert Q_1\vert = 2f+1 is a Byzantine quorum and two such quorums intersect in at least one correct process because from Q1=Q2=2f+1\vert Q_1 \vert = \vert Q_2 \vert =2f+1, it follows that Q1Q2f+1\vert Q_1 \cap Q_2 \vert \geq f+1. Now consider another correct process qq which similarly received 2f+12f+1 (Ready,m)(Ready,m') messages for message mm'. Since the quorums of pp and qq intersect in a correct process, this correct process sent the same ReadyReady message to both pp and qq, hence m=mm = m'.

Proof (Totality): Again, to deliver a message mm, a correct process pp must have received 2f+12f+1 (Ready,m)(Ready,m) messages. This means that at least f+1f+1 of such ReadyReady messages came from correct processes meaning that these processes have already received more than n+f2\frac{n+f}{2} (Echo,m)(Echo,m) messages or f+1f+1 (Ready,m)(Ready, m) messages (as per Step 3 above). By the Consistency property, these processes did not send ReadyReady messages for a different message mm’ so it is not possible that message mm’ could have n+f2\frac{n+f}{2} (Echo,m)(Echo,m') messages nor f+1f+1 (Ready,m)(Ready,m’) messages.

We have shown that this simple reliable broadcast is correct!

Can we do better?

As we have seen throughout this post, the reliable broadcast primitive has quadratic message complexity. Thus, if the number of processes increases, the number of messages explodes and processes become overwhelmed. A process can only handle so much, so these algorithms are not scalable. Direct communication between participants incur very large communication cost. A natural way to reduce communication cost between processes is to have hierarchical communication, such that each process only communicates with a relatively small number of other processes (compared to the size of the system). The per-process load is reduced, at the cost of communication latency. For instance, if the processes are organized in a tree structure, the communication latency is logarithmic in the size of the system.

In Topos, the Transmission Control Engine (TCE) uses gossip to propagate messages, hence processes are organized in a tree to minimize communication overhead. Furthermore, the Topos Reliable Broadcast trades deterministic guarantees for probabilistic ones, while preserving good reliability guarantees, as they allow to further reduce the overhead which makes it particularly well suited for large systems. By making the protocol probabilistic, we need to revise the properties of the reliable broadcast.

We need weaker properties to account for this transition from a deterministic to a probabilistic algorithm. Given a failure probability ϵ\epsilon (e.g., ϵ<1012\epsilon < 10^{-12}), our three properties become:

  • Validity: If a correct process pp broadcasts a message mm, then every correct process eventually delivers mm with probability at least (1ϵ)(1 − \epsilon).
  • Consistency: If a correct process delivers a message mm and another correct process delivers a message mm’, then m=mm = m' with probability at least (1ϵ)(1 − \epsilon).
  • Totality: If a correct process delivers a message, then every correct process eventually delivers a message with probability at least (1ϵ)(1 − \epsilon).

Probabilistic broadcast has weaker guarantees than its deterministic counterparts but the failure probability ϵ\epsilon can be made arbitrarily small.

Furthermore, it allows the TCE to scale very easily as the per-process communication is only logarithmic and the total communication complexity is quasilinear in the size of the system!

In a future post, we will dive in the core details of the Topos Reliable Broadcast.


We highly recommend checking out Introduction to Reliable and Secure Distributed Programming!

Latest Articles


Cheetah: A STARK-friendly elliptic curve for fast native and in-circuit computations

For Topos’s trustless and privacy-preserving cross-subnet communication, Toposware requires subnets to prove the validity of their internal state transitions to the rest…

Brief introduction to the Topos Reliable Broadcast protocol

In this post, we will give a brief introduction to reliable broadcast and how we use it in Topos. The first thing to mention about reliable broadcast is that it can be…

ICE-FROST: Identifiable Cheating Entity FROST Signature Protocol

Cryptography is at the heart of any blockchain project. A good deal of their useful properties (immutability, security, public verifiability...) comes from cryptographic…

© Toposware,
2022 All Rights Reserved

Privacy Policy

General

HomeAboutCareers