Clustering

This document outlines the design for FlowMQ's clustering capabilities. The architecture is built on the modern cloud-native principle of separating stateless compute from stateful storage, which provides superior scalability, fault tolerance, and operational simplicity compared to traditional stateful broker designs.


Goals

  • High Availability: The system must remain fully operational for both reads and writes during broker failures.
  • Fault Tolerance & Durability: No single component failure should result in data loss. All committed writes must be durable.
  • Scalability: The system must scale horizontally by adding more brokers to handle increased client connections and throughput, and the storage layer must scale to handle increased data volume.
  • Operational Simplicity: Broker failures should not require complex recovery procedures. Replacing a failed broker should be a trivial operation.

Stateless Brokers

The fundamental design choice in FlowMQ is that the brokers themselves are stateless. They do not store any message data or consumer state on their local disks. Instead, they act as a highly scalable "gateway" fleet, responsible for protocol translation, authentication, and routing requests to the appropriate stateful backend layers.

This architecture consists of three distinct layers:


System Components

Broker Layer (Stateless)

This layer consists of a pool of identical, stateless broker nodes.

  • Role: They handle all client connections, perform authentication/authorization, translate protocols (e.g., AMQP to FlowMQ's internal representation), and orchestrate the reads and writes to the backend layers.
  • Scalability: This layer can be scaled up or down instantly by adding or removing broker instances in response to load.
  • Fault Tolerance: If a broker fails, it is simply removed from the pool. Clients will transparently reconnect to another healthy broker. Since no state is stored on the broker, its failure has no impact on data durability or system availability.
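
To make the "no local state" property concrete, below is a minimal sketch of a broker's publish path in Go. The MetadataStore and LogStore interfaces and their method names are illustrative assumptions, not FlowMQ's actual API; the point is that the broker struct holds only clients to the backend layers, so any instance can be replaced at any time.

    // Illustrative sketch only: a stateless broker holds clients to the two
    // backend layers and keeps no message data or consumer state of its own.
    package broker

    import "context"

    // MetadataStore is an assumed interface over the Metadata Layer.
    type MetadataStore interface {
        // QueuesBoundTo returns the queues bound to a topic.
        QueuesBoundTo(ctx context.Context, topic string) ([]string, error)
        // AppendPointer atomically appends a payload pointer to a queue's message list.
        AppendPointer(ctx context.Context, queue, ptr string) error
    }

    // LogStore is an assumed interface over the Data Storage Layer.
    type LogStore interface {
        // AppendPayload durably writes a payload and returns a pointer to it.
        AppendPayload(ctx context.Context, payload []byte) (ptr string, err error)
    }

    // Broker holds only backend clients, so a failed instance is simply replaced.
    type Broker struct {
        Meta MetadataStore
        Logs LogStore
    }

    // Publish routes a message: durable payload write first, then one metadata
    // transaction per bound queue. The producer is acked only after this returns.
    func (b *Broker) Publish(ctx context.Context, topic string, payload []byte) error {
        queues, err := b.Meta.QueuesBoundTo(ctx, topic)
        if err != nil {
            return err
        }
        ptr, err := b.Logs.AppendPayload(ctx, payload)
        if err != nil {
            return err
        }
        for _, q := range queues {
            if err := b.Meta.AppendPointer(ctx, q, ptr); err != nil {
                return err
            }
        }
        return nil
    }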

Metadata Layer

This is the "brain" of the cluster and the source of truth for all system state.

  • Role: It stores all configuration (topics, bindings), consumer state (queue message lists, stream consumer offsets), and pointers to where message data is located in the Data Storage Layer.
  • Implementation: This layer is a highly available, strongly consistent key-value store or database. It could be built on a consensus-based store like etcd or a distributed database like FoundationDB or CockroachDB.
  • Consistency: All operations that change state (e.g., consuming a message from a queue) are performed as atomic transactions in this layer to ensure consistency.
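
As a sketch of what such an atomic state change looks like, the snippet below expresses the "lease" operation against a hypothetical transactional key-value interface. The KV and Txn interfaces and the key layout (queue/<name>/head, queue/<name>/inflight/...) are assumptions for illustration; an etcd or FoundationDB client would supply the real transaction primitive, and a real layout would track a list of pointers rather than a single head key.

    // Illustrative sketch: leasing the next message of a queue as one atomic
    // transaction in the Metadata Layer. KV, Txn, and the key layout are assumed.
    package metadata

    import (
        "context"
        "errors"
        "time"
    )

    // Txn exposes reads and writes that commit together or not at all.
    type Txn interface {
        Get(ctx context.Context, key string) ([]byte, error)
        Put(ctx context.Context, key string, value []byte) error
        Delete(ctx context.Context, key string) error
    }

    // KV runs fn as a single atomic transaction (retried on conflict by the store).
    type KV interface {
        Transact(ctx context.Context, fn func(tx Txn) error) error
    }

    var ErrEmpty = errors.New("queue is empty")

    // LeaseNext pops the head pointer of a queue and records an invisibility
    // deadline, so no other consumer can receive the same message meanwhile.
    func LeaseNext(ctx context.Context, kv KV, queue string, visibility time.Duration) (string, error) {
        var ptr string
        err := kv.Transact(ctx, func(tx Txn) error {
            head, err := tx.Get(ctx, "queue/"+queue+"/head")
            if err != nil {
                return err
            }
            if len(head) == 0 {
                return ErrEmpty
            }
            ptr = string(head)
            deadline := time.Now().Add(visibility).Format(time.RFC3339)
            if err := tx.Put(ctx, "queue/"+queue+"/inflight/"+ptr, []byte(deadline)); err != nil {
                return err
            }
            // The head removal and the lease record commit as one unit.
            return tx.Delete(ctx, "queue/"+queue+"/head")
        })
        return ptr, err
    }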

Data Storage Layer

This layer is responsible for the durable, long-term storage of message payloads.

  • Role: It provides an append-only, replicated log storage service. When a broker receives a message, it writes the payload to this layer and receives a durable pointer (e.g., a segment ID and offset) in return.
  • Implementation: This is typically a distributed log store optimized for high-throughput, sequential writes, such as Apache BookKeeper, or a cloud-native object store such as AWS S3.
  • Durability: Replication and data durability are entirely handled within this layer (e.g., BookKeeper's ensemble replication or S3's inherent multi-AZ durability), completely offloading the responsibility from the brokers.
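
A short sketch of the contract the brokers rely on from this layer: an append that returns a durable pointer, and a read by pointer. SegmentPointer and LogStore are illustrative names invented for this example, not the API of BookKeeper or S3.

    // Illustrative sketch of the Data Storage Layer contract seen by brokers.
    package storage

    import (
        "context"
        "fmt"
    )

    // SegmentPointer identifies a payload by segment ID and byte offset,
    // matching the "segment ID and offset" pointer described above.
    type SegmentPointer struct {
        SegmentID string
        Offset    int64
    }

    func (p SegmentPointer) String() string {
        return fmt.Sprintf("%s@%d", p.SegmentID, p.Offset)
    }

    // LogStore is the append-only, replicated log brokers write payloads to.
    // Replication is internal to the implementation (e.g., BookKeeper ensembles
    // or an object store's multi-AZ durability), never handled by the broker.
    type LogStore interface {
        Append(ctx context.Context, payload []byte) (SegmentPointer, error)
        Read(ctx context.Context, ptr SegmentPointer) ([]byte, error)
    }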

Message Flow Example (Queue)

  1. A Producer sends a message for topic orders/new to any broker.
  2. The Broker authenticates the client. It queries the Metadata Layer to find which queues are bound to orders/new.
  3. The Broker writes the message payload to the Data Storage Layer, which returns a durable pointer (e.g., ptr-123).
  4. The Broker executes a transaction in the Metadata Layer to append ptr-123 to the list of messages for the target queue (e.g., order-processing-queue).
  5. The Broker sends an ack to the Producer.
  6. Later, a Consumer connects to another Broker and asks for a message from order-processing-queue.
  7. That Broker executes an atomic "lease" transaction in the Metadata Layer to get ptr-123 from the queue and mark it as invisible to other consumers.
  8. The Broker uses ptr-123 to retrieve the payload from the Data Storage Layer and sends it to the Consumer.
  9. The Consumer processes the message and sends an ack back.
  10. The Broker executes a final transaction in the Metadata Layer to permanently delete ptr-123 from the queue's list.
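
Tying steps 6 through 10 together, here is a hedged sketch of the consume path as a broker might implement it. QueueMetadata, PayloadReader, Consumer, and their methods are assumptions made up for this example, not FlowMQ's real API.

    // Illustrative sketch of the consume path (steps 6-10 above). All interfaces
    // and method names are assumed for this example.
    package broker

    import "context"

    // QueueMetadata covers the two Metadata Layer transactions in the flow.
    type QueueMetadata interface {
        LeaseNext(ctx context.Context, queue string) (ptr string, err error) // step 7
        Delete(ctx context.Context, queue, ptr string) error                 // step 10
    }

    // PayloadReader fetches payloads from the Data Storage Layer by pointer.
    type PayloadReader interface {
        Read(ctx context.Context, ptr string) ([]byte, error) // step 8
    }

    // Consumer delivers a payload and returns once the client has acked it.
    type Consumer interface {
        Deliver(ctx context.Context, payload []byte) error // steps 8-9
    }

    // ConsumeOne delivers exactly one message from the queue to the consumer.
    func ConsumeOne(ctx context.Context, meta QueueMetadata, logs PayloadReader, c Consumer, queue string) error {
        ptr, err := meta.LeaseNext(ctx, queue) // atomic lease; message becomes invisible
        if err != nil {
            return err
        }
        payload, err := logs.Read(ctx, ptr) // payload fetched only on delivery
        if err != nil {
            return err
        }
        if err := c.Deliver(ctx, payload); err != nil {
            // Without an ack the lease eventually expires and the message
            // becomes visible to other consumers again.
            return err
        }
        return meta.Delete(ctx, queue, ptr) // final transaction removes the pointer
    }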

High Availability Summary

  • Broker Failure: Trivial. The failed node is removed from the load balancer. Clients reconnect to healthy nodes. No data loss. No failover process.
  • Metadata Layer Failure: Handled by the underlying consensus or database replication of that layer.
  • Data Storage Layer Failure: Handled by the internal replication mechanism of the distributed log/object store.

This architecture moves all the complexity of state management and replication into specialized backend systems, allowing the broker fleet to remain simple, stateless, and easy to scale.