Leader Election

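TinySystems modules scale horizontally by running several replicas and using Kubernetes-based leader election to decide which replica may write cluster state. Component authors need to understand this split to build components that behave correctly with more than one replica.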

Why Leader Election?

When a module runs as multiple replicas, every pod would otherwise try to write the same custom resources:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    PROBLEM: MULTIPLE REPLICAS                                │
└─────────────────────────────────────────────────────────────────────────────┘

Without leader election:

   Pod A                    Pod B                    Pod C
     │                        │                        │
     │ Update TinyNode ───────│────────────────────────│
     │                        │ Update TinyNode ───────│
     │                        │                        │ Update TinyNode
     │                        │                        │
     ▼                        ▼                        ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          CONFLICT!                                           │
│   All pods try to update the same CRs                                       │
│   Race conditions, lost updates, inconsistent state                         │
└─────────────────────────────────────────────────────────────────────────────┘

With leader election:

   Pod A (LEADER)           Pod B (READER)           Pod C (READER)
     │                        │                        │
     │ Update TinyNode        │ Watch only             │ Watch only
     │ Process signals        │ Handle messages        │ Handle messages
     │                        │                        │
     ▼                        ▼                        ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          CONSISTENT                                          │
│   Only leader writes to CRs                                                 │
│   All pods handle incoming messages                                         │
└─────────────────────────────────────────────────────────────────────────────┘
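
The conflict case can be reproduced without a cluster. The sketch below is a minimal simulation, assuming an optimistic-concurrency store similar in spirit to how the Kubernetes API server rejects updates that carry a stale resourceVersion. The `store`, `object`, and `simulate` names are hypothetical and exist only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict mimics the error returned when an update is based on a stale version.
var errConflict = errors.New("conflict: stale resourceVersion")

// object is a stand-in for a custom resource such as a TinyNode.
type object struct {
	resourceVersion int
	status          string
}

// store is a hypothetical in-memory replacement for the API server.
type store struct{ current object }

// get returns a copy of the stored object.
func (s *store) get() object { return s.current }

// update accepts the write only if it was based on the latest version.
func (s *store) update(o object) error {
	if o.resourceVersion != s.current.resourceVersion {
		return errConflict
	}
	o.resourceVersion++
	s.current = o
	return nil
}

// simulate lets every pod read the same version and then write its own status.
// It returns how many writes were accepted and how many were rejected.
func simulate(pods []string) (accepted, rejected int) {
	s := &store{current: object{resourceVersion: 1}}

	// All pods read before any of them writes.
	snapshots := make([]object, len(pods))
	for i := range pods {
		snapshots[i] = s.get()
	}
	for i, pod := range pods {
		snapshots[i].status = "updated by " + pod
		if err := s.update(snapshots[i]); err != nil {
			rejected++
			continue
		}
		accepted++
	}
	return accepted, rejected
}

func main() {
	accepted, rejected := simulate([]string{"pod-a", "pod-b", "pod-c"})
	fmt.Printf("accepted=%d rejected=%d\n", accepted, rejected) // prints "accepted=1 rejected=2"
}
```

With a single writer the rejected count drops to zero, which is the property leader election is meant to provide.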

Kubernetes Lease-Based Election

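TinySystems uses Kubernetes Lease objects for leader election, through the client-go leaderelection package. The result is exposed as an atomic flag that flips as leadership is gained or lost: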

go
// cli/run.go
func setupLeaderElection(ctx context.Context, namespace, moduleName, podName string) (*atomic.Bool, error) {
    isLeader := &atomic.Bool{}

    // Create lease lock
    lock, err := resourcelock.New(
        resourcelock.LeasesResourceLock,
        namespace,
        fmt.Sprintf("%s-lock", utils.SanitizeResourceName(moduleName)),
        nil,
        coreClient.CoordinationV1(),
        resourcelock.ResourceLockConfig{
            Identity: utils.SanitizeResourceName(podName),
        },
    )
    if err != nil {
        return nil, err
    }

    // Start leader election
    go leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
        Lock:            lock,
        LeaseDuration:   15 * time.Second,
        RenewDeadline:   10 * time.Second,
        RetryPeriod:     2 * time.Second,
        Callbacks: leaderelection.LeaderCallbacks{
            OnStartedLeading: func(ctx context.Context) {
                log.Info("became leader")
                isLeader.Store(true)
            },
            OnStoppedLeading: func() {
                log.Info("stopped leading")
                isLeader.Store(false)
            },
            OnNewLeader: func(identity string) {
                log.Info("new leader elected", "leader", identity)
            },
        },
    })

    return isLeader, nil
}

The Lease Resource

yaml
apiVersion: coordination.k8s.io/v1
kind: Lease
metadata:
  name: common-module-v1-lock
  namespace: tinysystems
spec:
  holderIdentity: common-module-pod-abc123
  leaseDurationSeconds: 15
  acquireTime: "2024-01-15T10:30:00Z"
  renewTime: "2024-01-15T10:30:10Z"
  leaderTransitions: 5

Checking Leadership

Components check leadership through the context passed to Handle, using utils.IsLeader:

go
import "github.com/tiny-systems/module/pkg/utils"

func (c *Component) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    if port == v1alpha1.ControlPort {
        // Only leader should process control actions
        if !utils.IsLeader(ctx) {
            return nil  // Ignore on non-leader pods
        }

        // Leader-only logic
        c.startOperation()
    }
    return nil
}

Leader Responsibilities

Only the leader pod should:

| Action | Why Leader Only |
| --- | --- |
| Update TinyModule status | Avoid conflicting updates |
| Update TinyNode status | Single source of truth |
| Process TinySignal CRs | Prevent duplicate execution |
| Expose ports to Ingress | Single ingress configuration |
| Write to shared metadata | Consistent state |

Reader Responsibilities

All pods (including leader) should:

| Action | Why All Pods |
| --- | --- |
| Watch CRs for changes | Stay in sync |
| Handle incoming messages | Load distribution |
| Apply local reconciliation | Maintain state |
| Run gRPC server | Accept cross-module calls |

Leader Election Flow

┌─────────────────────────────────────────────────────────────────────────────┐
│                       LEADER ELECTION FLOW                                   │
└─────────────────────────────────────────────────────────────────────────────┘

1. STARTUP
   ┌────────────────────────────────────────────────────────────────────────┐
   │  All pods try to acquire the Lease                                     │
   │  Only one succeeds (becomes leader)                                    │
   │  Others become readers                                                  │
   └────────────────────────────────────────────────────────────────────────┘


2. LEADER ACTIVE
   ┌────────────────────────────────────────────────────────────────────────┐
   │  Leader renews lease every 2 seconds (must succeed within 10 seconds)  │
   │  Leader updates CRs and processes signals                              │
   │  Readers watch and handle messages                                     │
   └────────────────────────────────────────────────────────────────────────┘


3. LEADER FAILURE
   ┌────────────────────────────────────────────────────────────────────────┐
   │  Leader pod dies or network partition                                  │
   │  Lease expires after 15 seconds                                        │
   └────────────────────────────────────────────────────────────────────────┘


4. NEW ELECTION
   ┌────────────────────────────────────────────────────────────────────────┐
   │  Remaining pods compete for lease                                      │
   │  One becomes new leader                                                │
   │  System continues operating                                            │
   └────────────────────────────────────────────────────────────────────────┘

Failover Timing

Leader dies

     │ ◀─── Up to 15 seconds (lease duration)


Lease expires

     │ ◀─── Up to 2 seconds (retry period)


New leader elected

     │ ◀─── Immediate


System operational

Total failover time: about 17 seconds in the worst case (15 s lease duration + 2 s retry period)

Using IsLeader in Components

Ticker Component Example

go
func (t *Ticker) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    if port == v1alpha1.ControlPort {
        // Only leader starts the ticker
        if !utils.IsLeader(ctx) {
            return nil
        }

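        // Use the comma-ok form so an unexpected payload cannot panic the handler
        control, ok := msg.(Control)
        if !ok {
            return fmt.Errorf("unexpected control message type %T", msg)
        }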
        if control.Start {
            go t.startEmitting(ctx, output)
        } else if control.Stop {
            t.stopEmitting()
        }
    }
    return nil
}

HTTP Server Example

go
func (s *Server) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    if port == v1alpha1.ReconcilePort {
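        node, ok := msg.(v1alpha1.TinyNode)
        if !ok {
            return fmt.Errorf("unexpected reconcile message type %T", msg)
        }

        // Read the published port from metadata (all pods).
        // Named publishedPort so it does not shadow the port parameter.
        publishedPort := node.Status.Metadata["http-server-port"]

        if utils.IsLeader(ctx) && publishedPort == "" {
            // Leader starts the server and publishes the port it bound to
            actualPort := s.startServer()
            output(ctx, v1alpha1.ReconcilePort, func(n *v1alpha1.TinyNode) {
                if n.Status.Metadata == nil {
                    n.Status.Metadata = map[string]string{}
                }
                n.Status.Metadata["http-server-port"] = strconv.Itoa(actualPort)
            })
        } else if publishedPort != "" {
            // All pods use the published port
            s.startOnPort(publishedPort)
        }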
    }
    return nil
}

Controller-Level Leadership

Controllers also check leadership, here by reading an atomic flag (r.IsLeader) instead of the context:

go
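func (r *TinyNodeReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    // Fetch the TinyNode being reconciled; a deleted object is not an error
    node := &v1alpha1.TinyNode{}
    if err := r.Get(ctx, req.NamespacedName, node); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // All pods reconcile locally
    r.Scheduler.Update(ctx, node)

    // Only leader updates status
    if !r.IsLeader.Load() {
        return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
    }

    // Leader-only: update status, returning the error so the request is retried
    if err := r.Status().Update(ctx, node); err != nil {
        return ctrl.Result{}, err
    }
    return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
}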

Testing Leadership

For local development with a single replica, the election can be bypassed with an environment variable so that the pod always acts as leader:

go
// Local development: always leader
if os.Getenv("FORCE_LEADER") == "true" {
    isLeader.Store(true)
    return isLeader, nil
}

Best Practices

1. Don't Assume Leadership

go
// Bad: Assumes will always be leader
func (c *Component) Handle(...) {
    c.updateClusterState()  // May not be leader!
}

// Good: Check leadership
func (c *Component) Handle(ctx context.Context, ...) {
    if utils.IsLeader(ctx) {
        c.updateClusterState()
    }
}

2. Handle Leadership Changes

go
type Component struct {
    cancelFunc context.CancelFunc
    mu         sync.Mutex
}

func (c *Component) Handle(ctx context.Context, ...) {
    if port == v1alpha1.ReconcilePort {
        c.mu.Lock()
        defer c.mu.Unlock()

        leader := utils.IsLeader(ctx)
        if leader && c.cancelFunc == nil {
            // Just became leader: run leader-only work under its own cancellable context
            workCtx, cancel := context.WithCancel(ctx)
            c.cancelFunc = cancel
            go c.startLeaderOnlyWork(workCtx)
        } else if !leader && c.cancelFunc != nil {
            // Lost leadership: stop the leader-only work
            c.cancelFunc()
            c.cancelFunc = nil
        }
    }
}

3. Idempotent Leader Operations

go
func (c *Component) Handle(ctx context.Context, output module.Handler, ...) {
    if utils.IsLeader(ctx) {
        // Idempotent: safe to call multiple times
        output(ctx, v1alpha1.ReconcilePort, func(n *v1alpha1.TinyNode) {
            if n.Status.Metadata["initialized"] != "true" {
                n.Status.Metadata["initialized"] = "true"
                // Do initialization...
            }
        })
    }
}
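
The idempotency check can be isolated into a small function over the metadata map. The sketch below assumes the metadata is a `map[string]string`, as the examples on this page suggest, and also guards against a nil map, which would otherwise panic on write. `markInitialized` is a hypothetical helper.

```go
package main

import "fmt"

// markInitialized sets the "initialized" flag exactly once.
// It returns the (possibly newly allocated) map and whether this call
// was the one that performed the initialization.
func markInitialized(metadata map[string]string) (map[string]string, bool) {
	if metadata == nil {
		// Writing to a nil map panics, so allocate on first use.
		metadata = make(map[string]string)
	}
	if metadata["initialized"] == "true" {
		return metadata, false
	}
	metadata["initialized"] = "true"
	return metadata, true
}

func main() {
	meta, first := markInitialized(nil)
	fmt.Println(first) // true: initialization runs

	meta, again := markInitialized(meta)
	fmt.Println(again) // false: second call changes nothing

	fmt.Println(meta["initialized"]) // true
}
```

Because the second call is a no-op, a new leader can safely repeat the operation after a failover.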
