The Practical Byzantine Fault Tolerance (PBFT) protocol is a state machine replication consensus protocol that provides fault tolerance under Byzantine failures.

In concept, it is similar to how Raft works.

  • It uses replicas, assuming replicas may be faulty.
    • Raft uses replicas assuming replicas may crash.
  • It uses a quorum of 2f+1 out of 3f+1 replicas for consensus (see the quorum-size sketch after this list). With 3f+1 replicas, a simple majority is not sufficient: it may be driven by faulty nodes. A quorum of 2f+1 ensures that correct nodes form a majority of any agreeing quorum.
    • Raft uses a simple majority quorum of f+1 out of 2f+1 replicas for consensus.
  • It must also handle malicious nodes. Raft must handle crashed/delayed nodes or lost/delayed messages.
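
To make the numbers above concrete, here is a minimal Go sketch (the helper names are just for illustration) that prints the replica count and quorum size each protocol needs to tolerate f faults:

```go
package main

import "fmt"

// Standard sizing: PBFT needs n = 3f+1 replicas with a 2f+1 quorum;
// Raft needs n = 2f+1 replicas with a simple-majority quorum of f+1.
func pbftSizes(f int) (n, quorum int) { return 3*f + 1, 2*f + 1 }
func raftSizes(f int) (n, quorum int) { return 2*f + 1, f + 1 }

func main() {
	for f := 1; f <= 3; f++ {
		pn, pq := pbftSizes(f)
		rn, rq := raftSizes(f)
		fmt.Printf("f=%d  PBFT: n=%d quorum=%d   Raft: n=%d quorum=%d\n", f, pn, pq, rn, rq)
	}
}
```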

Specification

Views

PBFT divides execution into views. These are functionally similar to terms in Raft and help manage leadership and synchronisation in the system. The primary for a view is chosen by a round-robin rule: the primary is replica p = v mod n (p: primary replica, v: view number, n: number of nodes). Elections don’t work here because faulty nodes can collude to deny service.
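A minimal sketch of this round-robin rule, assuming replicas are numbered 0..n-1:

```go
package main

import "fmt"

// Deterministic primary selection: the primary for view v is replica
// v mod n. No election is held.
func primary(view, n int) int { return view % n }

func main() {
	const n = 4 // 3f+1 replicas with f = 1
	for v := 0; v < 6; v++ {
		fmt.Printf("view %d -> primary is replica %d\n", v, primary(v, n))
	}
}
```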

Construction

PBFT assumes a particular attack model, which specifies what it assumes attackers can and cannot do.

  • An attacker can:
    • Run arbitrary code on a faulty node.
    • Control all faulty nodes and know their cryptographic keys (so it can coordinate them and speak for them, even over secure channels).
    • Read any message and temporarily delay any message.
  • An attacker cannot:
    • Control more than f nodes. This requires nodes to have different implementations so they don’t share the same bugs/vulnerabilities.
    • Impersonate correct nodes, i.e., guess cryptographic keys of correct nodes or break signed messages.

Now, we lay out the network characteristics:

  • One or more clients.
  • 3f+1 servers (replicas), of which up to f can be faulty. All nodes have public/private key pairs and know the identities of the other nodes. All nodes also use digital signatures to sign/authenticate messages (see the signing sketch after this list).
  • Basic protocol: the client sends a request to invoke an operation. Replicas execute the operation and send a reply to the client with the result. The client then waits for the result from the replicas.
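
The construction relies on digital signatures to authenticate messages. As a rough illustration, here is a small Go sketch using Ed25519 from the standard library (the message format is made up) showing how a replica or client signs a message and how a receiver verifies it:

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

func main() {
	// Each node holds a key pair and signs every protocol message,
	// so a correct node's messages cannot be forged by the attacker.
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}

	msg := []byte(`{"type":"REQUEST","op":"write(x,1)","client":"c1","timestamp":42}`)
	sig := ed25519.Sign(priv, msg)

	// Receivers verify the signature against the sender's known public key.
	fmt.Println("signature valid:", ed25519.Verify(pub, msg, sig))

	// A tampered message fails verification.
	tampered := []byte(`{"type":"REQUEST","op":"write(x,999)","client":"c1","timestamp":42}`)
	fmt.Println("tampered valid:", ed25519.Verify(pub, tampered, sig))
}
```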

Core protocol:

  • Client sends a request to invoke an operation to all replicas.
    • Recall: for SMR, requests must be deterministic.
  • Each replica will execute the operation and send a reply to the client with the result.
  • The client will wait for matching results from 2f+1 replicas, i.e., a supermajority of replicas (see the reply-counting sketch after this list). This ensures that there’s some overlap in the quorums: in the worst case the 2f+1 replies include all f faulty nodes, yet the remaining f+1 replies still come from correct nodes, a majority of the quorum.
    • It can’t wait for all 3f+1 replicas, because that prevents any liveness if even one replica doesn’t reply.
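
A rough sketch of this client-side wait, using the 2f+1 matching-reply threshold from these notes (the types and names are illustrative, not the paper's exact interface):

```go
package main

import "fmt"

// Reply is one replica's response to a request.
type Reply struct {
	Replica int
	Result  string
}

// acceptResult returns a result once `need` replicas have reported the
// same value, counting each replica at most once per result value.
func acceptResult(replies <-chan Reply, need int) string {
	counts := map[string]map[int]bool{} // result -> set of replicas reporting it
	for r := range replies {
		if counts[r.Result] == nil {
			counts[r.Result] = map[int]bool{}
		}
		counts[r.Result][r.Replica] = true
		if len(counts[r.Result]) >= need {
			return r.Result
		}
	}
	return ""
}

func main() {
	f := 1
	replies := make(chan Reply, 4)
	// Replica 3 is faulty and lies; the other three agree.
	for _, r := range []Reply{{0, "ok"}, {3, "bad"}, {1, "ok"}, {2, "ok"}} {
		replies <- r
	}
	close(replies)
	fmt.Println("accepted:", acceptResult(replies, 2*f+1))
}
```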

What about concurrent requests? We use a primary replica to pick an order. But since the primary itself can be faulty:

  • It can ignore a client request, so the client may need to send requests to all replicas.
    • It can send requests to different replicas in different orders, so replicas need to communicate with each other to ensure they received the same requests in the same order.
    • It can send an incorrect result to the client, so replicas need to send their results directly to the client.

Limitations

PBFT has a few key limitations:

  • It requires independent node implementations, so that node failures are also independent.

    • Consider the case where all replicas run the same implementation and a logic bug causes a segfault on a certain input sequence. Then all nodes segfault at once, i.e., the failures aren’t node-independent.
  • The identity and number of replicas must be known at start-time, typically assigned by some central authority.

    • In other words, membership in the network is closed. Bitcoin is an example of a BFT protocol that has an open membership.

Protocol Notes

  • View changes: when the primary is suspected to be faulty, the replicas run a view-change protocol to move to the next view; the solution is structurally quite similar to two-phase commit (2PC). A replica executes an operation only once it has collected a quorum of 2f+1 matching COMMIT messages for it.
  • The commit step is what guarantees that the operation will actually be executed, even across view changes.
  • f+1 matching messages guarantee that at least one of them came from a correct node.
  • Leaders (primaries) are chosen deterministically via the round-robin rule above, so the correctness premise is slightly different from Raft, which elects its leaders.
  • Summary of the contrast with Raft: in Raft, only the leader sends messages to the backups, and the backups trust the leader. Here, we don’t trust the leader, so the backups must also talk to each other.
  • Latency: faster than 2PC; roughly 2 RTTs, since the 4 one-way messages of the normal case amount to 2 round trips.
  • PBFT in practice: the authors implemented an NFS server on top of PBFT. In practice, NFS is I/O-bound and only really uses the CPU for cryptographic operations, so the CPU is largely free; performance is not bad compared to non-cryptographic, non-fault-tolerant protocols.
  • Quorums in PBFT are supermajorities (2f+1 of 3f+1), not simple majorities (see the sketch below).
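
As a final sketch, here are the quorum checks a replica performs in the normal-case phases (pre-prepare, prepare, commit). The state layout is illustrative, but the thresholds follow the standard prepared / committed-local predicates from the paper:

```go
package main

import "fmt"

// Per-request state a replica tracks for one (view, sequence number) slot.
type phaseState struct {
	prepares map[int]bool // replicas from which a matching PREPARE was seen
	commits  map[int]bool // replicas from which a matching COMMIT was seen
}

// prepared: the primary's pre-prepare plus 2f matching prepares from other
// replicas, i.e. 2f+1 replicas in total agree on the request and its order.
func prepared(s phaseState, f int) bool { return len(s.prepares) >= 2*f }

// committed-local: 2f+1 matching commits; once this holds, the replica
// executes the operation and sends its reply directly to the client.
func committedLocal(s phaseState, f int) bool { return len(s.commits) >= 2*f+1 }

func main() {
	f := 1
	s := phaseState{
		prepares: map[int]bool{1: true, 2: true},          // 2f prepares from backups
		commits:  map[int]bool{0: true, 1: true, 2: true}, // 2f+1 commits
	}
	fmt.Println("prepared:", prepared(s, f))
	fmt.Println("committed-local:", committedLocal(s, f))
}
```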