The Practical Byzantine Fault Tolerance (PBFT) protocol is a state machine replication consensus protocol that provides fault tolerance under Byzantine failures.

In concept, it is similar to how Raft works.

  • It uses replicas, assuming replicas may be faulty.
    • Raft uses replicas assuming replicas may crash.
  • It uses a quorum of 2f+1 out of 3f+1 replicas for consensus (see the quorum-size sketch after this list). With 3f+1 replicas, a simple majority is not sufficient: it may be driven by faulty nodes. A quorum of 2f+1 ensures that correct nodes form a majority of any agreeing quorum.
    • Raft uses a simple majority quorum of f+1 out of 2f+1 replicas for consensus.
  • It must also handle malicious nodes. Raft must handle crashed/delayed nodes or lost/delayed messages.
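
To make the numbers above concrete, here is a minimal Go sketch (the helper names are just for illustration) that prints the replica count and quorum size each protocol needs to tolerate f faults:

```go
package main

import "fmt"

// Standard sizing: PBFT needs n = 3f+1 replicas with a 2f+1 quorum;
// Raft needs n = 2f+1 replicas with a simple-majority quorum of f+1.
func pbftSizes(f int) (n, quorum int) { return 3*f + 1, 2*f + 1 }
func raftSizes(f int) (n, quorum int) { return 2*f + 1, f + 1 }

func main() {
	for f := 1; f <= 3; f++ {
		pn, pq := pbftSizes(f)
		rn, rq := raftSizes(f)
		fmt.Printf("f=%d  PBFT: n=%d quorum=%d   Raft: n=%d quorum=%d\n", f, pn, pq, rn, rq)
	}
}
```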

Specification

Views

PBFT divides execution into views. These are functionally similar to terms in Raft and help manage leadership and synchronisation in the system. The primary for a view is chosen by a round-robin rule: the primary is replica p = v mod n (p: primary replica, v: view number, n: number of nodes). Elections don’t work here because faulty nodes can collude to deny service.
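A minimal sketch of this round-robin rule, assuming replicas are numbered 0..n-1:

```go
package main

import "fmt"

// Deterministic primary selection: the primary for view v is replica
// v mod n. No election is held.
func primary(view, n int) int { return view % n }

func main() {
	const n = 4 // 3f+1 replicas with f = 1
	for v := 0; v < 6; v++ {
		fmt.Printf("view %d -> primary is replica %d\n", v, primary(v, n))
	}
}
```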

Construction

PBFT assumes a particular attack model, which specifies what it assumes attackers can and cannot do.

  • An attacker can:
    • Run arbitrary code on a faulty node.
    • Control all faulty nodes and know their cryptographic keys (so it can coordinate them and speak for them, even over secure channels).
    • Read any message and temporarily delay any message.
  • An attacker cannot:
    • Control more than f nodes. This requires nodes to have different implementations so they don’t share the same bugs/vulnerabilities.
    • Impersonate correct nodes, i.e., guess cryptographic keys of correct nodes or break signed messages.

Now, we lay out the network characteristics:

  • One or more clients.
  • 3f+1 servers (replicas), of which up to f can be faulty. All nodes have public/private key pairs and know the identities of the other nodes. All nodes also use digital signatures to sign/authenticate messages (see the signing sketch after this list).
  • Basic protocol: the client sends a request to invoke an operation. Replicas execute the operation and send a reply to the client with the result. The client then waits for the result from the replicas.
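
The construction relies on digital signatures to authenticate messages. As a rough illustration, here is a small Go sketch using Ed25519 from the standard library (the message format is made up) showing how a replica or client signs a message and how a receiver verifies it:

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

func main() {
	// Each node holds a key pair and signs every protocol message,
	// so a correct node's messages cannot be forged by the attacker.
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}

	msg := []byte(`{"type":"REQUEST","op":"write(x,1)","client":"c1","timestamp":42}`)
	sig := ed25519.Sign(priv, msg)

	// Receivers verify the signature against the sender's known public key.
	fmt.Println("signature valid:", ed25519.Verify(pub, msg, sig))

	// A tampered message fails verification.
	tampered := []byte(`{"type":"REQUEST","op":"write(x,999)","client":"c1","timestamp":42}`)
	fmt.Println("tampered valid:", ed25519.Verify(pub, tampered, sig))
}
```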

Core protocol:

  • Client sends a request to invoke an operation to all replicas.
    • Recall: for SMR, requests must be deterministic.
  • Each replica will execute the operation and send a reply to the client with the result.
  • The client will wait for matching results from 2f+1 replicas, i.e., a supermajority of replicas (see the reply-counting sketch after this list). This ensures that there’s some overlap in the quorums: in the worst case the 2f+1 replies include all f faulty nodes, yet the remaining f+1 replies still come from correct nodes, a majority of the quorum.
    • It can’t wait for all 3f+1 replicas, because that prevents any liveness if even one replica doesn’t reply.
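
A rough sketch of this client-side wait, using the 2f+1 matching-reply threshold from these notes (the types and names are illustrative, not the paper's exact interface):

```go
package main

import "fmt"

// Reply is one replica's response to a request.
type Reply struct {
	Replica int
	Result  string
}

// acceptResult returns a result once `need` replicas have reported the
// same value, counting each replica at most once per result value.
func acceptResult(replies <-chan Reply, need int) string {
	counts := map[string]map[int]bool{} // result -> set of replicas reporting it
	for r := range replies {
		if counts[r.Result] == nil {
			counts[r.Result] = map[int]bool{}
		}
		counts[r.Result][r.Replica] = true
		if len(counts[r.Result]) >= need {
			return r.Result
		}
	}
	return ""
}

func main() {
	f := 1
	replies := make(chan Reply, 4)
	// Replica 3 is faulty and lies; the other three agree.
	for _, r := range []Reply{{0, "ok"}, {3, "bad"}, {1, "ok"}, {2, "ok"}} {
		replies <- r
	}
	close(replies)
	fmt.Println("accepted:", acceptResult(replies, 2*f+1))
}
```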

What about concurrent requests? We use a primary replica to pick an order. But since the primary itself can be faulty:

  • It can ignore a client request, so the client may need to send requests to all replicas.
    • It can send requests to different replicas in different orders, so replicas need to communicate with each other to ensure they received the same requests in the same order.
    • It can send an incorrect result to the client, so replicas need to send their results directly to the client.

Limitations

PBFT has a few key limitations:

  • It requires independent node implementations, so that node failures are also independent.

    • Consider the case where all replicas run the same implementation and a logic bug causes a segfault on a certain input sequence. Then all nodes segfault at once, i.e., the failures aren’t node-independent.
  • The identity and number of replicas must be known at start-time, typically assigned by some central authority.

    • In other words, membership in the network is closed. Bitcoin is an example of a BFT protocol that has an open membership.

Protocol Notes

  • View changes: when the primary is suspected to be faulty, the replicas run a view-change protocol to move to the next view; the solution is structurally quite similar to two-phase commit (2PC). A replica executes an operation only once it has collected a quorum of 2f+1 matching COMMIT messages for it.
  • The commit step is what guarantees that the operation will actually be executed, even across view changes.
  • f+1 matching messages guarantee that at least one of them came from a correct node.
  • Leaders (primaries) are chosen deterministically via the round-robin rule above, so the correctness premise is slightly different from Raft, which elects its leaders.
  • Summary of the contrast with Raft: in Raft, only the leader sends messages to the backups, and the backups trust the leader. Here, we don’t trust the leader, so the backups must also talk to each other.
  • Latency: faster than 2PC; roughly 2 RTTs, since the 4 one-way messages of the normal case amount to 2 round trips.
  • PBFT in practice: the authors implemented an NFS server on top of PBFT. In practice, NFS is I/O-bound and only really uses the CPU for cryptographic operations, so the CPU is largely free; performance is not bad compared to non-cryptographic, non-fault-tolerant protocols.
  • Quorums in PBFT are supermajorities (2f+1 of 3f+1), not simple majorities (see the sketch below).
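
As a final sketch, here are the quorum checks a replica performs in the normal-case phases (pre-prepare, prepare, commit). The state layout is illustrative, but the thresholds follow the standard prepared / committed-local predicates from the paper:

```go
package main

import "fmt"

// Per-request state a replica tracks for one (view, sequence number) slot.
type phaseState struct {
	prepares map[int]bool // replicas from which a matching PREPARE was seen
	commits  map[int]bool // replicas from which a matching COMMIT was seen
}

// prepared: the primary's pre-prepare plus 2f matching prepares from other
// replicas, i.e. 2f+1 replicas in total agree on the request and its order.
func prepared(s phaseState, f int) bool { return len(s.prepares) >= 2*f }

// committed-local: 2f+1 matching commits; once this holds, the replica
// executes the operation and sends its reply directly to the client.
func committedLocal(s phaseState, f int) bool { return len(s.commits) >= 2*f+1 }

func main() {
	f := 1
	s := phaseState{
		prepares: map[int]bool{1: true, 2: true},          // 2f prepares from backups
		commits:  map[int]bool{0: true, 1: true, 2: true}, // 2f+1 commits
	}
	fmt.Println("prepared:", prepared(s, f))
	fmt.Println("committed-local:", committedLocal(s, f))
}
```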