Configuration
Discovery
To enable clustering, the Connectors need to be able to discover each other. Clustering is supported on the following platforms:

- AWS ECS Fargate (Terraform)
- Kubernetes (Helm)
For AWS ECS deployments, configure IAM permissions to allow Connector instances to list and describe ECS tasks. This enables them to discover other instances in the same service. See our AWS Terraform example for a complete configuration.
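As a sketch, the IAM permissions for task discovery might look like the following. The statement `Sid` and the `Resource` scope are illustrative; `ecs:ListTasks` and `ecs:DescribeTasks` are the actions implied above, and you may want to scope `Resource` more tightly in production:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ConnectorClusterDiscovery",
      "Effect": "Allow",
      "Action": [
        "ecs:ListTasks",
        "ecs:DescribeTasks"
      ],
      "Resource": "*"
    }
  ]
}
```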
Network
Connectors communicate with each other over port 7946 (TCP and UDP).
Configure the security group to allow inter-instance communication on the cluster port. See our AWS Terraform example for a complete configuration.
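A minimal Terraform sketch of the required rules, assuming the Connector instances share one security group (the resource name `aws_security_group.connector` is hypothetical):

```hcl
# Allow cluster gossip traffic between Connector instances on port 7946.
# TCP and UDP are opened separately; the source is the same security group,
# so only instances in the group can reach each other on this port.
resource "aws_security_group_rule" "cluster_tcp" {
  type                     = "ingress"
  from_port                = 7946
  to_port                  = 7946
  protocol                 = "tcp"
  security_group_id        = aws_security_group.connector.id
  source_security_group_id = aws_security_group.connector.id
}

resource "aws_security_group_rule" "cluster_udp" {
  type                     = "ingress"
  from_port                = 7946
  to_port                  = 7946
  protocol                 = "udp"
  security_group_id        = aws_security_group.connector.id
  source_security_group_id = aws_security_group.connector.id
}
```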
Example
Here’s an example policy that uses the cluster’s shared state to enforce a rate limit on S3 bucket access: it denies access to sensitive-bucket if the bucket has been accessed more than five times in the last minute by the same user, across all Connector instances.
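The behavior described above can be sketched in Python (illustrative only; the actual policy engine and its syntax are not shown here). Each access is counted in a sliding window keyed by user and bucket; in a real cluster the counter state would be replicated between instances via gossip:

```python
import time
from collections import defaultdict

class SharedRateLimiter:
    """Sliding-window rate limiter over cluster-shared state (sketch).

    In a real cluster the `events` map would be replicated between
    instances via gossip; here it is just a local dict.
    """

    def __init__(self, limit=5, window=60.0):
        self.limit = limit               # max accesses per window
        self.window = window             # window length in seconds
        self.events = defaultdict(list)  # (user, bucket) -> access timestamps

    def allow(self, user, bucket, now=None):
        now = time.time() if now is None else now
        key = (user, bucket)
        # Drop accesses that fell out of the sliding window.
        self.events[key] = [t for t in self.events[key] if now - t < self.window]
        if len(self.events[key]) >= self.limit:
            return False  # over the limit: deny this access
        self.events[key].append(now)
        return True

limiter = SharedRateLimiter(limit=5, window=60.0)
results = [limiter.allow("alice", "sensitive-bucket", now=i) for i in range(7)]
# First five accesses within the window pass; the rest are denied.
```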
How It Works
Instances discover each other via service discovery (e.g. ECS task listing, Kubernetes pod listing) and communicate over port 7946 (TCP and UDP) using a gossip protocol. Each node maintains a membership list and shared state such as rate limit counters and cache entries.

Node Failure and Recovery
When a cluster member becomes unresponsive:

- Other nodes detect the failure through missed gossip heartbeats
- The node is marked as failed in the membership list
- After 1 minute, the failed node is removed from the cluster state
- Remaining healthy nodes continue operating with shared state preserved
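The failure-handling steps above can be sketched as a small membership tracker. The 1-minute removal delay comes from the text; the class and method names are illustrative:

```python
ALIVE, FAILED = "alive", "failed"
REMOVAL_DELAY = 60.0  # failed nodes are removed from cluster state after 1 minute

class Membership:
    """Tracks cluster members and prunes failed ones (sketch)."""

    def __init__(self):
        self.members = {}  # node_id -> (state, since_timestamp)

    def heartbeat(self, node_id, now):
        self.members[node_id] = (ALIVE, now)

    def mark_failed(self, node_id, now):
        # Called when gossip heartbeats from node_id are missed.
        self.members[node_id] = (FAILED, now)

    def prune(self, now):
        # Remove nodes that have been failed for longer than REMOVAL_DELAY.
        self.members = {
            n: (state, since)
            for n, (state, since) in self.members.items()
            if not (state == FAILED and now - since >= REMOVAL_DELAY)
        }

m = Membership()
m.heartbeat("node-a", now=0.0)
m.heartbeat("node-b", now=0.0)
m.mark_failed("node-b", now=10.0)  # heartbeats missed: marked failed
m.prune(now=30.0)                  # too early: node-b still in the list
m.prune(now=70.0)                  # 60s elapsed: node-b is removed
```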
When a new instance joins (or a failed instance rejoins) the cluster:

- The instance discovers existing members via service discovery
- It sends a join request to known peers
- A peer sends recent state updates from its event buffer (up to 10,000 events)
- The node can resume normal operation with up-to-date shared state
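A sketch of the join sequence, assuming a bounded event buffer on each peer. The 10,000-event cap comes from the text; the class and key names are illustrative:

```python
from collections import deque

EVENT_BUFFER_SIZE = 10_000  # each peer retains up to 10,000 recent updates

class Peer:
    """Existing cluster member that can bootstrap a joining node (sketch)."""

    def __init__(self):
        self.state = {}                                # replicated shared state
        self.events = deque(maxlen=EVENT_BUFFER_SIZE)  # recent state updates

    def apply(self, key, value):
        self.state[key] = value
        self.events.append((key, value))

    def handle_join(self, joiner):
        # Replay buffered updates so the joiner catches up on shared state.
        for key, value in self.events:
            joiner.state[key] = value

peer = Peer()
peer.apply("ratelimit:alice:sensitive-bucket", 3)
peer.apply("cache:resource-42", "hit")

new_node = Peer()
peer.handle_join(new_node)  # new node receives recent updates and can resume
```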
Consistency Model
The cluster uses eventual consistency via gossip. State updates are broadcast to all nodes and ordered using a Lamport clock for causal ordering. There is a brief propagation delay before all nodes have the same state. The Control Plane remains the source of truth for all configuration (policies, resources, users).

Rate limiting counters may be slightly inaccurate during the gossip propagation window. For strict rate limiting, this means a user might get a few extra requests through before the limit is enforced across all nodes.
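The Lamport-clock ordering mentioned above can be sketched as follows: each node stamps its updates with a counter, and a receiver applies a remote update only if its stamp is newer than the last one seen for that key. Breaking ties by node id is a common convention and an assumption here, as are the class and key names:

```python
class LamportNode:
    """Orders gossiped state updates with a Lamport clock (sketch)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.clock = 0
        self.state = {}    # key -> value
        self.stamps = {}   # key -> (clock, node_id) of the applied update

    def local_update(self, key, value):
        self.clock += 1
        stamp = (self.clock, self.node_id)
        self.state[key] = value
        self.stamps[key] = stamp
        return key, value, stamp  # broadcast this tuple to peers

    def receive(self, key, value, stamp):
        # Lamport rule: advance the local clock past the incoming timestamp.
        self.clock = max(self.clock, stamp[0]) + 1
        # Apply only if the update is newer than the one already applied
        # (tuple comparison: clock first, node id breaks ties).
        if stamp > self.stamps.get(key, (0, "")):
            self.state[key] = value
            self.stamps[key] = stamp

a, b = LamportNode("a"), LamportNode("b")
update = a.local_update("ratelimit:alice", 5)
b.receive(*update)                            # b converges to a's value
b.receive("ratelimit:alice", 1, (1, "a"))     # not newer: ignored
```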