The Practical Byzantine Fault Tolerance (PBFT) protocol is a state machine replication consensus protocol that provides fault tolerance under Byzantine failures.
Conceptually, it is similar to how Raft works:
- It uses replicas, assuming replicas may be faulty.
- Raft uses replicas assuming replicas may crash.
- It uses a quorum of 2f+1 out of n = 3f+1 replicas for consensus. A simple majority is not sufficient, as it may be driven by faulty nodes; a quorum of 2f+1 ensures that a majority of the correct nodes agree as well.
- Raft uses a quorum of replicas for consensus.
- It must also handle malicious nodes. Raft must handle crashed/delayed nodes or lost/delayed messages.
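The replica-count arithmetic above can be checked directly. A minimal sketch (variable names are my own, not from the notes): with f faulty replicas, quorums of 2f+1 out of n = 3f+1 overlap in at least one correct node, and each quorum contains a majority of the correct nodes.

```python
f = 2                 # number of faulty replicas tolerated (illustrative)
n = 3 * f + 1         # total replicas PBFT requires
quorum = 2 * f + 1    # quorum (supermajority) size

# Two quorums of size 2f+1 drawn from n = 3f+1 replicas must share at
# least 2*quorum - n = f+1 replicas, so at least one shared replica is
# correct (at most f are faulty).
overlap = 2 * quorum - n
assert overlap == f + 1

# A quorum contains at least quorum - f = f+1 correct replicas, which is
# a majority of the 2f+1 correct replicas in the system.
correct_in_quorum = quorum - f
assert correct_in_quorum == f + 1
assert correct_in_quorum > (n - f) / 2
```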
Specification
Views
PBFT divides execution into views. These are functionally similar to terms in Raft and help manage leadership and synchronisation in the system. The primary is selected via a round-robin protocol: p = v mod n (p: primary replica, v: view number, n: number of nodes). Elections don’t work here because faulty nodes can collude to deny service.
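The round-robin rule can be sketched as a one-liner (the function name is my own): the primary for view v is simply replica v mod n, so every node computes the same leader locally with no election.

```python
def primary_for_view(v: int, n: int) -> int:
    """Round-robin primary selection: p = v mod n.

    v: view number, n: number of replicas (symbols from the text).
    Every replica evaluates this locally, so no election messages are
    needed and faulty nodes cannot collude to deny service via voting.
    """
    return v % n

# With n = 4 replicas, views rotate the primary 0, 1, 2, 3, 0, ...
assert [primary_for_view(v, 4) for v in range(5)] == [0, 1, 2, 3, 0]
```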
Construction
PBFT assumes a particular attack model, which specifies what it assumes attackers can and cannot do.
- An attacker can:
- Run arbitrary code on a faulty node.
- Control all faulty nodes and know their cryptographic keys (so secure channels don’t help for faulty nodes).
- Read any message and temporarily delay any message.
- An attacker cannot:
- Control more than f nodes. This requires nodes to have different implementations so that they don’t share the same bugs/vulnerabilities.
- Impersonate correct nodes, i.e., guess cryptographic keys of correct nodes or break signed messages.
Now, we construct the network characteristics:
- One or more clients.
- n = 3f+1 servers (replicas), of which at most f can be faulty. All nodes have public-key pairs and know the identities of the other nodes. All nodes also use digital signatures to sign/authenticate messages.
- Basic protocol: client sends a request to invoke an operation. Replicas execute the operation and send a reply to the client with the result. Client then waits for the result from the replicas.
Core protocol:
- Client sends a request to invoke an operation to all replicas.
- Recall: for SMR, requests must be deterministic.
- Each replica will execute the operation and send a reply to the client with the result.
- The client will wait for matching results from 2f+1 replicas, i.e., a supermajority of replicas. This ensures that there’s some overlap in the quorums: in the worst case the replies might include all f faulty nodes and still contain a majority (f+1) of the correct nodes.
- It can’t be all 3f+1 replicas, because that would prevent any liveness if a single replica fails to reply.
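The client-side wait can be sketched as follows (the function and the reply shape are assumptions for illustration): the client tallies replies and accepts a result only once 2f+1 replicas report the same value.

```python
from collections import Counter

def accept_reply(replies, f):
    """Return the agreed result once 2f+1 replicas sent the same value,
    or None if no value has reached a supermajority yet.

    replies: iterable of (replica_id, result) pairs, at most one per
    replica (a hypothetical shape, not the paper's message format)."""
    tally = Counter(result for _, result in replies)
    for result, count in tally.most_common(1):
        if count >= 2 * f + 1:
            return result
    return None

# With f = 1 (quorum of 3), two matching replies are not yet enough:
assert accept_reply([(0, "ok"), (1, "ok"), (2, "bad")], f=1) is None
assert accept_reply([(0, "ok"), (1, "ok"), (2, "bad"), (3, "ok")], f=1) == "ok"
```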
What about concurrent requests? We use a primary replica to pick an order. But since the primary itself can be faulty:
- It can ignore a client request, so the client may need to send requests to all replicas.
- It can send requests to different replicas in a different order, so replicas need to communicate with each other to ensure they received the same request in the same order.
- It can send an incorrect result to the client, so replicas need to send the result directly to the client.
Limitations
PBFT has a few key limitations:
- It requires independent node implementations, so that node failures are also independent.
  - Consider the case where a logic bug causes a segfault given a certain input sequence. Then all nodes will segfault, i.e., the failures are not independent across nodes.
- The identity and number of replicas must be known at start time, typically assigned by some central authority.
  - In other words, membership in the network is closed. Bitcoin is an example of a BFT protocol that has open membership.
- View changes
  - The view-change protocol is quite similar to 2PC.
  - “Replica should execute an operation”: double-check the numbers here.
- PBFT protocol
  - The commit step guarantees operation execution.
  - f+1 matching messages mean at least one of them came from a correct node.
  - Leaders are chosen deterministically, so the correctness premise is slightly different from Raft’s.
- Summary
  - Note what we do in Raft: only the leader sends messages to backups, and the backups trust the leader.
  - Here, we don’t trust the leader, so backups must talk to each other.
  - Latency: faster than 2PC; about 2 RTTs, because there are 4 one-way requests.
  - PBFT in practice: the authors implemented an NFS server. In practice, NFS is I/O-bound and only really uses the CPU for cryptographic operations, so the CPU is mostly free. Performance is not bad compared to non-cryptographic, non-fault-tolerant protocols.
  - Quorum = supermajority (2f+1 out of 3f+1).
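Putting these notes together, the normal case can be traced as simple arithmetic for one request (a toy, no-fault sketch; the phase names come from PBFT, everything else here is assumed):

```python
f = 1
n = 3 * f + 1          # 4 replicas
quorum = 2 * f + 1     # 3

view, seq = 0, 1
primary = view % n     # deterministic leader for this view

# Pre-prepare: the primary assigns sequence number `seq` to the request
# and broadcasts it to the n-1 backups.
backups = n - 1

# Prepare: each backup broadcasts PREPARE; a replica is "prepared" once
# it holds the pre-prepare plus 2f matching prepares (2f+1 votes total).
prepares_held = backups            # no faults, so every PREPARE arrives
prepared = prepares_held >= 2 * f  # plus the pre-prepare gives 2f+1

# Commit: prepared replicas broadcast COMMIT and execute the operation
# after collecting 2f+1 matching commits; this is the step that fixes
# the operation's position even across view changes.
commits_held = n
committed = prepared and commits_held >= quorum
assert committed

# Every replica then replies to the client directly, since backups do
# not trust the primary to relay the result.
```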