Once a Connector is deployed, keeping it running smoothly requires occasional maintenance. This guide covers day-to-day operational tasks: upgrades, health monitoring, and what to do when things go wrong.

Upgrading the Connector

The Connector is distributed as a Docker image and follows semantic versioning (e.g., 1.35.0). Updates are applied by pulling the new image version and restarting the container; there is no in-place auto-update mechanism.
Configuration changes (policies, resources, listeners, users, etc.) are pushed from the Control Plane to the Connector in real time: they usually don’t require a restart or an upgrade.

Version Checking

Before upgrading, check which version is currently running. The version is reported in three places:
  • OTLP Metrics: the service.version attribute on all emitted metrics
  • Connector Logs: the version is logged at startup
  • Console: the “Connected Instances” section on the Connector page shows the version

Upgrade Procedure

  1. Update the Connector image tag in your ECS task definition
  2. Deploy the new task definition
  3. ECS performs a rolling update: it starts new tasks with the new image, waits for them to pass health checks, then drains and stops old tasks
  4. Verify the new version is running via health metrics
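Step 1 might look like the following task definition fragment. This is an illustrative sketch only: the family, container name, image path, and health check values are placeholders, not shipped defaults.

```json
{
  "family": "formal-connector",
  "containerDefinitions": [
    {
      "name": "connector",
      "image": "<your-registry>/connector:1.35.0",
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -fsS http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      },
      "stopTimeout": 30
    }
  ]
}
```

Registering this revision and updating the service triggers the rolling update described in steps 2–3; `stopTimeout` gives the old tasks time to shut down gracefully before ECS kills them.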

Lifecycle

The Connector is designed for zero-downtime deployments.

Health Checks

The Connector exposes two HTTP health check endpoints on port 8080:
Endpoint       Purpose                                         Success   Failure
GET /health    Liveness probe: is the process running?         200 OK    No response
GET /ready     Readiness probe: is initialization complete?    200 OK    503 Service Unavailable
The readiness endpoint returns 503 until the Connector has connected to the Control Plane, loaded its configuration, and started all listeners.
The Quickstart deployment options ship with probes preconfigured against these health check endpoints.
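For a self-managed Kubernetes deployment, the probes could be wired up as follows. The timing values here are examples to adapt, not shipped defaults; only the paths and port come from the table above.

```yaml
# Illustrative probe configuration for the Connector container.
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
```

Because /ready returns 503 until the Connector has connected to the Control Plane and started all listeners, the readiness probe keeps new instances out of the load balancer rotation until they can actually serve traffic.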

Graceful Shutdown

When the Connector receives a SIGTERM or SIGINT signal (e.g., during a rolling update or manual stop), it performs a graceful shutdown:
  1. Stops accepting new connections on all listeners
  2. Leaves the cluster gracefully, notifying other Connector instances
  3. Stops the telemetry exporter
  4. Closes the cache and cleans up resources
  5. Shuts down the health check server
Use multiple Connector instances behind a load balancer to ensure clients can reconnect to a healthy instance during rolling updates.
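The shutdown ordering above can be sketched as a signal handler. This is a minimal illustration of the documented sequence; the step names are hypothetical placeholders, not the Connector's actual internals.

```python
import signal

# Hypothetical names mirroring the documented shutdown order;
# the real implementation is internal to the Connector.
SHUTDOWN_STEPS = [
    "stop_listeners",      # 1. stop accepting new connections
    "leave_cluster",       # 2. notify peer instances and leave the cluster
    "stop_telemetry",      # 3. stop the telemetry exporter
    "close_cache",         # 4. close the cache and clean up resources
    "stop_health_server",  # 5. shut down the health check server last
]

completed = []

def graceful_shutdown(signum=None, frame=None):
    """Run every shutdown step in order, even if one fails."""
    for step in SHUTDOWN_STEPS:
        try:
            completed.append(step)  # placeholder for the real work
        except Exception:
            pass  # a failing step must not block the remaining ones

# Wire the handler to SIGTERM and SIGINT, as the Connector does.
signal.signal(signal.SIGTERM, graceful_shutdown)
signal.signal(signal.SIGINT, graceful_shutdown)
```

Note that the health check server is stopped last, so orchestrator probes keep getting answers until the instance has fully drained.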

Disaster Recovery

The Connector is designed to be mostly stateless, which makes disaster recovery straightforward. All configuration (policies, resources, users, etc.) is stored in the Control Plane and pushed to the Connector on startup and in real time, so any lost instance can simply be replaced. The only local data that can be lost is:
Data            Location        Impact
Log spool       /formal/logs    Buffered logs are lost if the disk isn’t persistent
Cluster state   In-memory       Rate limit counters and cache reset; rebuilds via gossip on rejoin
When running multiple instances with clustering enabled, cluster state is shared and rebuilds automatically on rejoin.

Recovery Scenarios

Scenario: A Connector instance stops unexpectedly.
Impact: Clients connected to the failed instance are disconnected.
Recovery:
  1. The orchestrator (ECS/Kubernetes) automatically starts a replacement instance
  2. The new instance connects to the Control Plane and loads configuration
  3. Clients reconnect via the load balancer to a healthy instance
  4. If clustering is enabled, the new instance rejoins the cluster and receives state replay
Data loss: In-memory state (rate limit counters, cache) is lost unless clustering is enabled. Buffered logs on persistent disk are recovered on restart.
Scenario: All Connector instances stop unexpectedly.
Impact: All client connections are dropped.
Recovery:
  1. The orchestrator starts new instances
  2. Each instance connects to the Control Plane and loads full configuration
  3. Instances discover each other and form a new cluster
  4. Clients reconnect via the load balancer
Data loss: All in-memory state (rate limit counters, cache) is lost. Buffered logs on disk spool are recovered on restart.
Scenario: The Formal Control Plane becomes unavailable.
Impact: The Connector continues operating with its last known configuration. Configuration updates are not received, logs are spooled to disk, and new instances cannot start.
Recovery:
  • The Connector automatically reconnects when the Control Plane is restored
  • Pending configuration updates are received
  • Buffered logs are flushed
Data loss: None, if a persistent disk is available for the log spool.
Scenario: Network issues prevent some Connector instances from communicating with each other.
Impact: If clustering is enabled, each partition continues operating independently with its own local state; rate limiting and cache sharing only work within each partition.
Recovery:
  • Nodes rejoin when the partition heals
  • State is reconciled via event replay
  • Configuration and policies are not affected
Data loss: None.
Scenario: Network issues prevent the Connector from reaching the Control Plane.
Impact: The Connector continues operating with its last known configuration. Configuration updates are not received, logs are spooled to disk, and new instances cannot start.
Recovery:
  • The Connector automatically reconnects when network access is restored
  • Pending configuration updates are received
  • Buffered logs are flushed
Data loss: None, if a persistent disk is available for the log spool.

Best Practices for Resilience

Deploy at least 2 Connector instances behind a load balancer for high availability. Distribute across availability zones for fault tolerance.
Set up probes to ensure traffic is only routed to healthy instances. See Health Checks for details.
Mount a persistent volume to /formal/logs to buffer logs during outages. A typical log entry is ~1 KB — at 100 requests/sec, 10 GB provides over a day of buffer. See Log Spool for details.
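The sizing rule of thumb above works out as follows; the figures are the same illustrative ones (1 KB entries, 100 requests/sec, 10 GB of disk), not a guarantee about your workload.

```python
# Back-of-the-envelope log spool sizing.
entry_size_bytes = 1_000        # ~1 KB per log entry (assumed average)
requests_per_sec = 100          # sustained request rate
spool_bytes = 10 * 1_000**3     # 10 GB of persistent disk

bytes_per_sec = entry_size_bytes * requests_per_sec  # 100 KB/s of logs
buffer_seconds = spool_bytes / bytes_per_sec         # 100,000 seconds
buffer_hours = buffer_seconds / 3600                 # ~27.8 hours

print(f"{buffer_hours:.1f} hours of buffer")
```

At that rate the spool absorbs roughly 28 hours of logs, which is where the “over a day of buffer” figure comes from; scale the volume linearly with your request rate.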
Set up alerts on Connector metrics: heartbeat gaps, memory usage spikes, connection count anomalies, and missed Control Plane pings.
Always use rolling update strategies (the default in ECS and Kubernetes) to avoid downtime during upgrades. Ensure readiness probes are configured so traffic isn’t routed to instances that haven’t finished starting.