Phase King vs Raft vs Paxos vs ZooKeeper
The comparisons and notes were written in collaboration with Tanel and GPT 5.3.
This page compares four important approaches to agreement in distributed systems:
- Phase King – a synchronous Byzantine agreement algorithm used primarily in theory
- Paxos – a foundational consensus protocol with strong theoretical guarantees
- Raft – a practical, leader-based consensus algorithm designed for understandability
- ZooKeeper (Zab) – a distributed coordination service built on atomic broadcast
There are also:
- A short introductory comparison of blockchain consensus algorithms with the ones above.
- A final chapter: an overview of a compromise algorithm, PBFT (Practical Byzantine Fault Tolerance).
Use cases:
- Phase King: not used in practice.
- Raft: typically embedded as a library inside a system
- Paxos: a family of protocols, adapted per system
- ZooKeeper: an external coordination service
NB! No consensus algorithm can guarantee termination in a fully asynchronous system.
All practical systems (Raft, Paxos, Zab) rely on partial synchrony and timeouts.
Contents
- 1 Comparison to blockchain consensus algorithms
- 2 Do Real Systems Use the Phase King Algorithm?
- 3 Partial Synchrony in Raft
- 4 Phase King vs Raft Consensus Algorithms
- 5 Raft vs Paxos: Use Cases, Assumptions, and Risks
- 6 ZooKeeper vs Raft and Paxos
- 7 How PBFT Differs from Phase King, Raft, Paxos, and ZooKeeper
Comparison to blockchain consensus algorithms
Blockchain consensus can be viewed as an extension of classical distributed consensus:
- Adds Byzantine fault tolerance (as does phase king)
- Removes trusted membership assumptions
- Introduces incentives and cryptography
- Often retains core ideas:
- Leaders / proposers
- Quorums
- Replicated logs
- Raft prioritizes simplicity and performance in controlled environments
- PBFT extends consensus to Byzantine settings with stronger guarantees but higher cost
- Nakamoto consensus sacrifices efficiency for openness and decentralization
| Aspect | Raft | PBFT (Practical Byzantine Fault Tolerance) | Nakamoto Consensus (Proof-of-Work) |
|---|---|---|---|
| Fault model | Crash faults | Byzantine faults | Byzantine + economic adversary |
| Network model | Partial synchrony | Partial synchrony | Fully asynchronous (probabilistic guarantees) |
| Participants | Fixed, known set | Fixed, known set | Open, permissionless |
| Identity model | Static identities | Static identities | Pseudonymous (Sybil-prone) |
| Sybil resistance | Not needed | Not needed | Required (PoW) |
| Fault tolerance | <math>f < n/2</math> | <math>f < n/3</math> | Depends on hash power (< 50% attacker) |
| Consensus type | Leader-based log replication | Leader-based voting protocol | Longest-chain rule |
| Leader role | Strong leader | Primary (leader) with view changes | Miner proposes blocks (implicit leader) |
| Communication pattern | Leader → followers | All-to-all voting (multiple phases) | Gossip-based block propagation |
| Message complexity | <math>O(n)</math> per entry | <math>O(n^2)</math> per decision | <math>O(n)</math> gossip (amortized) |
| Latency | 1–2 RTTs | Multiple rounds (≥2–3 RTTs) | Minutes (block confirmation time) |
| Finality | Immediate (deterministic) | Immediate (deterministic) | Probabilistic (confidence over time) |
| Throughput | High | Moderate | Low |
| Energy usage | Low | Low | Very high (mining) |
| Incentives | None | None | Economic incentives (block rewards, fees) |
| Typical use | Databases, coordination systems | Permissioned blockchains | Public blockchains (e.g., Bitcoin) |
| Examples | etcd, Consul | Tendermint, Hyperledger Fabric | Bitcoin, early Ethereum |
Note:
A Sybil attack is when one participant pretends to be many. Example:
- One attacker creates 1000 nodes
- System treats them as 1000 independent voters
- Attacker now controls the system
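The arithmetic behind that example can be made concrete. A minimal sketch, where the node counts are the illustrative ones from the list above:

```python
# Sybil attack arithmetic: one attacker spawning many fake identities.
honest = 10          # genuinely independent participants (illustrative)
sybils = 1000        # identities created by a single attacker

total = honest + sybils
attacker_share = sybils / total
print(f"Attacker controls {attacker_share:.1%} of the vote")

# Under one-identity-one-vote, the attacker decides every majority outcome.
# This is why open-membership systems need Sybil resistance (e.g. PoW).
assert attacker_share > 0.5
```

Proof-of-work sidesteps this by weighting influence by hash power rather than identity count, so creating extra identities gains the attacker nothing.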
Do Real Systems Use the Phase King Algorithm?
Short Answer
No — real-world distributed systems do not use the Phase King algorithm.
Why Phase King Is Not Used in Practice
1. Unrealistic Timing Assumptions
- Requires a fully synchronous system:
- Known upper bounds on message delays
- Lock-step execution of rounds across all nodes
- Real systems are typically asynchronous or only partially synchronous
- Even small timing violations can break correctness
2. High Communication Cost
- Message complexity: <math>O(n^2 f)</math>
- Each round requires all-to-all communication
- Does not scale well to larger systems
3. Strong (and Expensive) Fault Model
- Designed to tolerate Byzantine faults:
- Nodes may lie, equivocate, or collude
- Most real systems assume weaker fault models:
- Crash faults (nodes stop but do not misbehave)
- Byzantine tolerance is used only when necessary due to cost
4. Lack of Practical Ecosystem
- No widely adopted implementations
- Not integrated with:
- Storage systems
- RPC frameworks
- Production infrastructure
- Exists mainly in textbooks and theoretical research
What Real Systems Use Instead
Crash Fault-Tolerant Systems
- Raft
- Paxos / Multi-Paxos
- Used in systems such as:
- etcd
- Consul
- ZooKeeper (Zab protocol, Paxos-like)
Byzantine Fault-Tolerant Systems
- Practical Byzantine Fault Tolerance (PBFT)
- Tendermint
- HotStuff
- These protocols:
- Work under partial synchrony
- Are optimized for real-world networks
- Avoid strict round-based synchrony like Phase King
Partial Synchrony in Raft
Raft assumes a partially synchronous system:
- There is no known fixed upper bound on message delays
- However, after some unknown point in time, the system becomes "well-behaved":
- Message delays become bounded
- Processing speeds stabilize
- Communication is reliable enough for progress
A partially synchronous system can be understood as having two phases:
- Unstable period
- Messages may be delayed arbitrarily long
- Nodes may timeout incorrectly
- Leader elections may repeatedly fail
- Stable period
- Message delays are bounded by some unknown value <math>\Delta</math>
- Timeouts begin to work reliably
- A leader can be elected and maintain authority
The transition point between these phases is unknown and not detectable by the algorithm.
Why Partial Synchrony Is Needed
In a fully asynchronous system:
- There is no bound on message delay
- It is impossible to distinguish between:
- A slow node
- A failed node
This leads to the FLP impossibility result:
- No deterministic consensus algorithm can guarantee termination in a fully asynchronous system
Raft avoids this by assuming eventual synchrony.
How Raft Uses Partial Synchrony
Raft relies on timeouts for leader election:
- Nodes start elections if they do not hear from a leader
- Randomized timeouts reduce the chance of election conflicts
During the unstable period:
- Multiple nodes may start elections simultaneously
- Elections may fail repeatedly
During the stable period:
- One node times out first
- It becomes leader
- Other nodes receive its messages in time and follow it
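The timeout mechanism above can be sketched as follows. This is a minimal, hedged sketch: the 150–300 ms range is a commonly cited choice for Raft, not a mandated constant, and the node names are illustrative.

```python
import random

# Sketch of Raft-style randomized election timeouts. Each follower picks a
# fresh random timeout; the first one to expire starts an election.
# Randomization makes simultaneous candidacies (split votes) unlikely.

ELECTION_TIMEOUT_MS = (150, 300)  # illustrative, commonly used range

def new_election_timeout(rng: random.Random) -> float:
    """Pick a fresh randomized timeout for one follower."""
    return rng.uniform(*ELECTION_TIMEOUT_MS)

rng = random.Random(42)  # seeded only to make this sketch reproducible
timeouts = {node: new_election_timeout(rng) for node in ("n1", "n2", "n3")}

# During the stable period, the node whose timer fires first becomes candidate.
first_candidate = min(timeouts, key=timeouts.get)
print(first_candidate, sorted(timeouts.values()))

assert all(150 <= t <= 300 for t in timeouts.values())
```

During the unstable period the same mechanism misfires: timers expire before a healthy leader's messages arrive, which is exactly the repeated-election behavior described above.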
Phase King vs Raft Consensus Algorithms
Big Picture
| Aspect | Phase King | Raft |
|---|---|---|
| Model | Synchronous | Partially synchronous / asynchronous |
| Faults | Byzantine (arbitrary/malicious) | Crash faults only |
| Goal | Theoretical agreement under worst-case faults | Practical replicated state machine |
| Typical Use | Theory / teaching Byzantine agreement | Real systems (e.g., etcd, Consul) |
Assumptions
Phase King
- Fully synchronous system
- Known upper bound on message delay
- Lock-step rounds
- Handles Byzantine faults
- Nodes may lie, equivocate, or collude
- Requires: <math>n \ge 3f + 1</math>
Raft
- Partial synchrony
- No fixed delay bound, but eventually messages are timely
- Handles crash faults only
- Nodes may stop but do not behave maliciously
- Requires: <math>n \ge 2f + 1</math>
Guarantees
Phase King
- Agreement: All correct nodes decide the same value
- Validity: If all correct nodes start with the same value, that value is chosen
- Termination: Always terminates in a fixed number of rounds
Raft
- Safety (always):
- No two leaders commit different values at the same log index
- Logs remain consistent across nodes
- Liveness (eventual):
- Progress occurs once the network stabilizes
Algorithm Structure
Phase King
- Runs in <math>f + 1</math> rounds
- Each round:
- Nodes broadcast their values
- A majority value is computed
- A designated king node enforces a decision if needed
- The king resolves ambiguity caused by Byzantine nodes
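The phase structure above can be simulated for binary values. A hedged sketch in the Berman–Garay style: n = 4, f = 1 (so <math>n \ge 3f + 1</math> holds), one Byzantine node that equivocates, and f + 1 = 2 phases. The Byzantine node's value function is an arbitrary illustrative choice; the king's broadcast is modeled by recomputing the (honest) king's own majority.

```python
# Sketch of the Phase King algorithm for binary values.
# Nodes 0-2 are honest; node 3 is Byzantine and sends different values
# to different receivers. King of phase k is node k.

N, F = 4, 1
HONEST = [0, 1, 2]
values = {0: 0, 1: 1, 2: 1}  # honest nodes start with conflicting inputs

def byzantine_value(phase: int, receiver: int) -> int:
    # Illustrative equivocation: different receivers see different values.
    return (phase + receiver) % 2

for phase in range(F + 1):
    king = phase
    new_values = {}
    for i in HONEST:
        # Round 1: gather one value per node, compute majority and its count.
        received = [values[j] for j in HONEST] + [byzantine_value(phase, i)]
        maj = max((0, 1), key=received.count)
        mult = received.count(maj)
        # Round 2: the king broadcasts its own majority value as a tiebreaker.
        if king in HONEST:
            king_recv = [values[j] for j in HONEST] + [byzantine_value(phase, king)]
            king_val = max((0, 1), key=king_recv.count)
        else:
            king_val = byzantine_value(phase, i)  # a faulty king may equivocate
        # Keep own majority only if overwhelming (> n/2 + f); else follow king.
        new_values[i] = maj if mult > N // 2 + F else king_val
    values = new_values

print("decisions:", values)
assert len(set(values.values())) == 1  # all honest nodes agree
```

With f + 1 phases, at least one phase has an honest king, and that phase forces all honest nodes onto a common value which later phases preserve.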
Raft
- Continuous protocol consisting of:
- Leader election
- Log replication
- Commit mechanism
- Leader-based design:
- Leader sends log entries to followers
- Entries are committed once a majority acknowledges
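The majority-acknowledgment rule can be sketched as a simple predicate. Function and variable names here are illustrative, loosely following Raft's `matchIndex` bookkeeping rather than any particular implementation:

```python
# Sketch of Raft's commit rule: an entry at a given log index is committed
# once the leader knows a majority of the cluster stores it.

def majority(n: int) -> int:
    return n // 2 + 1

def committed(match_index: dict[str, int], entry_index: int, n: int) -> bool:
    """match_index: highest log index known replicated on each follower.
    The leader counts itself as one replica of every entry it holds."""
    replicas = 1 + sum(1 for idx in match_index.values() if idx >= entry_index)
    return replicas >= majority(n)

# 5-node cluster: entry 7 is on the leader and three followers -> committed.
# Entry 9 is only on the leader and one follower -> not yet committed.
match = {"f1": 8, "f2": 7, "f3": 3, "f4": 9}
assert committed(match, 7, n=5) is True   # leader + f1 + f2 + f4 = 4 >= 3
assert committed(match, 9, n=5) is False  # leader + f4 = 2 < 3
```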
Efficiency
Phase King
- Rounds: <math>f + 1</math>
- Messages per round: <math>O(n^2)</math>
- Total messages: <math>O(n^2 f)</math>
- High communication overhead
- Requires strict synchrony
- Not used in practice
Raft
- Normal operation:
- <math>O(n)</math> messages per log entry
- Leader election:
- <math>O(n)</math> messages
- Latency:
- Typically 1–2 round-trip times (RTTs)
- Efficient and scalable in practice
Key Differences
Fault Model
- Phase King: tolerates Byzantine faults
- Raft: tolerates crash faults only
Timing Model
- Phase King: requires strict synchrony
- Raft: assumes eventual synchrony
Practicality
- Phase King: theoretical and pedagogical
- Raft: widely used in production systems
Mechanism
- Phase King: rotating "king" authority per round
- Raft: stable leader-based replication
Complexity
- Phase King: <math>O(n^2 f)</math>
- Raft: <math>O(n)</math> per operation
Intuition
- Phase King answers:
- "Can we reach agreement even if some nodes are actively malicious?"
- Raft answers:
- "Can we build a reliable system from nodes that may crash?"
Raft vs Paxos: Use Cases, Assumptions, and Risks
Raft and Paxos are both consensus algorithms designed for crash fault-tolerant distributed systems. They provide similar theoretical guarantees but differ significantly in design philosophy, usability, and typical deployment scenarios.
Where assumptions differ
Raft
- Strong leadership:
- A single leader manages log replication and client interaction
Paxos
- No fixed leader in basic Paxos:
- Roles (proposer, acceptor, learner) are logically separate
- Leader optimization (Multi-Paxos) is commonly used in practice
Design Philosophy
Raft
- Explicit goal: understandability
- Decomposes consensus into:
- Leader election
- Log replication
- Safety rules
- Strong, stable leader simplifies reasoning
Paxos
- Minimal and abstract
- Focuses on safety properties with minimal assumptions
- More flexible but harder to understand and implement correctly
Risks and Failure Modes
Raft
- Leader dependency:
- System performance depends on leader health
- Leader failure triggers re-election delays
- Availability risks:
- Requires a majority of nodes to be available
- Network partitions can stall progress
- Simplicity tradeoff:
- Less flexible for certain optimizations compared to Paxos variants
Paxos
- Liveness challenges:
- Basic Paxos can experience repeated conflicts between proposers
- Requires leader optimization (Multi-Paxos) for efficiency
- Operational difficulty:
- Harder to debug and reason about in production systems
Efficiency and Performance
Raft
- Leader-based replication:
- <math>O(n)</math> messages per log entry
- Typically commits in 1–2 round-trip times (RTTs)
- Predictable performance under normal operation
Paxos
- Basic Paxos:
- Multiple phases per decision
- Higher latency without optimization
- Multi-Paxos (common in practice):
- Similar efficiency to Raft after leader is established
- <math>O(n)</math> messages per decision
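The two phases of Basic Paxos can be sketched for a single decision. This is a sequential, single-proposer sketch under stated simplifications (no message loss, no concurrent proposers); class and method names are illustrative, not from any particular implementation:

```python
# Sketch of single-decree Paxos: prepare/promise, then accept.

class Acceptor:
    def __init__(self):
        self.promised = -1    # highest proposal number promised
        self.accepted = None  # (number, value) pair, if any

    def prepare(self, n):
        """Phase 1b: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return ("promise", self.accepted)
        return ("reject", None)

    def accept(self, n, value):
        """Phase 2b: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return "accepted"
        return "reject"

def propose(acceptors, n, value):
    """One proposer round: prepare, then accept with the constrained value."""
    promises = [a.prepare(n) for a in acceptors]
    granted = [p for p in promises if p[0] == "promise"]
    if len(granted) <= len(acceptors) // 2:
        return None  # no majority promised: retry with a higher n
    # Safety rule: adopt the value of the highest-numbered accepted proposal.
    prior = [acc for _, acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]
    acks = sum(1 for a in acceptors if a.accept(n, value) == "accepted")
    return value if acks > len(acceptors) // 2 else None

accs = [Acceptor() for _ in range(3)]
assert propose(accs, n=1, value="A") == "A"  # first proposal chooses A
assert propose(accs, n=2, value="B") == "A"  # later proposals must keep A
```

The second assertion shows the safety rule at work: once "A" is chosen, every later proposer is forced to re-propose "A". Multi-Paxos amortizes the prepare phase by letting a stable leader skip straight to the accept phase for subsequent decisions.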
Key Differences Summary
| Aspect | Raft | Paxos |
|---|---|---|
| Design goal | Understandability | Minimality and flexibility |
| Leadership | Strong, explicit leader | Implicit or optional (explicit in Multi-Paxos) |
| Ease of implementation | Easier | Difficult |
| Typical use | Production systems, teaching | Research, highly optimized systems |
| Risk profile | Operational simplicity, leader bottleneck | High complexity, subtle correctness issues |
Practical Insight
- Raft is generally preferred for new systems due to its clarity and strong engineering model.
- Paxos (especially Multi-Paxos) remains influential and is used in systems where fine-grained control and optimization are required.
ZooKeeper vs Raft and Paxos
Overview
Apache ZooKeeper is a distributed coordination service rather than a standalone consensus algorithm. Internally, it uses a consensus protocol called Zab (ZooKeeper Atomic Broadcast), which is similar in spirit to leader-based consensus algorithms such as Raft and Multi-Paxos.
What ZooKeeper Is
- A distributed coordination service providing:
- Configuration management
- Naming/registry services
- Distributed synchronization (locks, leader election)
- Exposes a filesystem-like hierarchical data model (znodes)
- Clients interact with a replicated state maintained by a cluster
Underlying Algorithm
- ZooKeeper uses Zab (ZooKeeper Atomic Broadcast)
- Zab is:
- Leader-based
- Crash fault-tolerant (not Byzantine)
- Designed for high-throughput broadcast of updates
Assumptions
ZooKeeper (Zab)
- Partial synchrony
- Crash fault model
- Majority quorum:
- <math>n \ge 2f + 1</math>
- Single leader at a time
Comparison
| Aspect | ZooKeeper (Zab) | Raft | Paxos |
|---|---|---|---|
| Fault model | Crash | Crash | Crash |
| Timing | Partial synchrony | Partial synchrony | Partial synchrony |
| Leader | Yes | Yes | Optional (explicit in Multi-Paxos) |
Key intuition:
- Zab ≈ a specialized form of Multi-Paxos optimized for ordered broadcast
- Very similar operational model to Raft:
- One leader
- Followers replicate state
- Majority acknowledgment required
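Zab's ordered broadcast rests on its transaction identifiers: every update carries a zxid combining a leader epoch and a per-epoch counter, and a new leader bumps the epoch. The tuple encoding below is a simplification of the real 64-bit zxid (high bits epoch, low bits counter):

```python
# Sketch of Zab's total order on transactions via (epoch, counter) zxids.

def zxid(epoch: int, counter: int) -> tuple[int, int]:
    return (epoch, counter)

log = [zxid(1, 1), zxid(1, 2), zxid(2, 1), zxid(2, 2)]
# Lexicographic comparison on (epoch, counter) yields the delivery order.
assert log == sorted(log)
assert zxid(2, 1) > zxid(1, 99)  # a new epoch dominates any old counter
```

Because any transaction from a new leader sorts after everything from older epochs, followers can never interleave an old leader's updates with the new leader's, which is the ordered-broadcast guarantee Zab is built around.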
Where ZooKeeper Is Used
ZooKeeper is widely used as a coordination backbone in distributed systems:
- Apache Kafka:
- Broker metadata and controller election (historically)
- Apache Hadoop:
- Coordination of distributed components
- Apache HBase:
- Master election and region management
- Apache Solr (SolrCloud):
- Cluster coordination
Typical use cases:
- Leader election
- Distributed locks
- Service discovery
- Configuration storage
Intuition
- ZooKeeper:
- "Provide a shared coordination service for distributed systems."
- Raft:
- "Provide an understandable consensus building block."
- Paxos:
- "Provide a minimal, flexible foundation for consensus."
How PBFT Differs from Phase King, Raft, Paxos, and ZooKeeper
Practical Byzantine Fault Tolerance (PBFT) is a consensus algorithm designed to tolerate Byzantine (malicious) faults in partially synchronous systems. It bridges the gap between:
- Theoretical Byzantine algorithms (e.g., Phase King)
- Practical crash-fault systems (Raft, Paxos, ZooKeeper)
PBFT is widely used as a foundation for modern Byzantine fault-tolerant and permissioned blockchain systems.
Examples:
- Tendermint
- Hyperledger Fabric (variants)
Fault Model
| Algorithm | Fault Model |
|---|---|
| Phase King | Byzantine faults |
| PBFT | Byzantine faults |
| Raft | Crash faults only |
| Paxos | Crash faults only |
| ZooKeeper (Zab) | Crash faults only |
- PBFT tolerates arbitrary (malicious) behavior
- Requires <math>n \ge 3f + 1</math> nodes
- Crash-fault algorithms only require <math>n \ge 2f + 1</math>
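The cost of the stronger fault model shows up directly in cluster sizing. A small sketch of the two bounds above:

```python
# Sketch of minimum cluster sizes for a target fault tolerance f.

def min_nodes_crash(f: int) -> int:
    return 2 * f + 1  # Raft, Paxos, Zab

def min_nodes_byzantine(f: int) -> int:
    return 3 * f + 1  # PBFT, Phase King

for f in (1, 2, 3):
    print(f"f={f}: crash needs {min_nodes_crash(f)}, "
          f"Byzantine needs {min_nodes_byzantine(f)}")

# Tolerating a single Byzantine node already needs 4 replicas instead of 3.
assert min_nodes_crash(1) == 3
assert min_nodes_byzantine(1) == 4
```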
Timing Assumptions
| Algorithm | Timing Model |
|---|---|
| Phase King | Fully synchronous |
| PBFT | Partial synchrony |
| Raft | Partial synchrony |
| Paxos | Partial synchrony |
| ZooKeeper | Partial synchrony |
Algorithm Structure
PBFT
- Leader-based (called the primary)
- Each decision proceeds in multiple phases:
- Pre-prepare
- Prepare
- Commit
- Requires agreement from <math>2f + 1</math> nodes
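The <math>2f + 1</math> threshold applies per phase: a replica advances through prepare and commit only after collecting that many matching messages (counting its own). A minimal sketch, with illustrative function names:

```python
# Sketch of PBFT's per-phase quorum check.

def pbft_quorum(n: int) -> int:
    f = (n - 1) // 3  # max Byzantine faults tolerable at this cluster size
    return 2 * f + 1

def phase_complete(matching_msgs: int, n: int) -> bool:
    """A replica moves to the next phase (e.g. prepare -> commit) once it
    holds 2f + 1 matching messages, its own included."""
    return matching_msgs >= pbft_quorum(n)

# n = 4 tolerates f = 1, so each phase needs 2*1 + 1 = 3 matching messages.
assert pbft_quorum(4) == 3
assert phase_complete(3, n=4) is True
assert phase_complete(2, n=4) is False
```

Intuitively, any two quorums of size <math>2f + 1</math> in a cluster of <math>3f + 1</math> overlap in at least <math>f + 1</math> replicas, so they always share at least one correct replica even if all f faulty ones sit in the intersection.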