Horizontal Scaling

This guide covers best practices for building TinySystems components that scale horizontally across multiple pod replicas.

Scaling Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                      HORIZONTALLY SCALED MODULE                              │
└─────────────────────────────────────────────────────────────────────────────┘

                    Kubernetes Deployment
                    replicas: 3

         ┌─────────────────┼─────────────────┐
         │                 │                 │
         ▼                 ▼                 ▼
    ┌─────────┐       ┌─────────┐       ┌─────────┐
    │  Pod 1  │       │  Pod 2  │       │  Pod 3  │
    │ (Leader)│       │(Reader) │       │(Reader) │
    └────┬────┘       └────┬────┘       └────┬────┘
         │                 │                 │
         └─────────────────┼─────────────────┘

                    Kubernetes Service
                    (Load Balancing)

                    ┌──────┴──────┐
                    │             │
              gRPC Traffic    HTTP Traffic
              (from other     (from Ingress)
               modules)

Design Principles

1. Stateless by Default

Components should be stateless; all state comes from:

  • Incoming messages
  • Settings (via settings port)
  • TinyNode metadata (shared state)
go
// Good: Stateless processing
func (c *Component) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    input := msg.(Input)
    result := transform(input, c.settings)  // Only uses input + settings
    return output(ctx, "output", result)
}

// Bad: Local state
func (c *Component) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    input := msg.(Input)
    c.cache[input.ID] = input  // This won't be shared across pods!
    return nil
}

2. CR-Based Shared State

For state that must be shared across all pods, store it in the TinyNode CR:

go
// Use TinyNode metadata
func (c *Component) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    if port == v1alpha1.ReconcilePort {
        node := msg.(v1alpha1.TinyNode)

        // Read shared state (all pods)
        c.sharedConfig = node.Status.Metadata["config"]

        // Write shared state (leader only)
        if utils.IsLeader(ctx) {
            output(ctx, v1alpha1.ReconcilePort, func(n *v1alpha1.TinyNode) {
                n.Status.Metadata["last-update"] = time.Now().Format(time.RFC3339)
            })
        }
    }
    return nil
}

3. Idempotent Operations

Operations should be safe to repeat:

go
func (c *Component) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    targetPort := c.settings.Port  // desired state comes from settings

    // Idempotent: check current state before acting
    if c.currentPort == targetPort {
        return nil  // Already done
    }

    // Perform the action only when needed
    c.startOnPort(targetPort)
    return nil
}

4. Leader-Only Writes

Only the leader modifies cluster state:

go
func (c *Component) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    // All pods: handle messages
    if port == "input" {
        return c.processInput(ctx, output, msg)
    }

    // Leader only: update status
    if port == v1alpha1.ReconcilePort && utils.IsLeader(ctx) {
        c.updateClusterState(ctx, output)
    }

    return nil
}

Scaling Patterns

Pattern 1: Load-Balanced Processing

All pods process incoming messages:

go
// All pods can handle this
func (c *Processor) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    if port == "input" {
        result := c.process(msg.(Input))
        return output(ctx, "output", result)
    }
    return nil
}

Traffic is distributed across the replicas by the module's Kubernetes Service, as sketched below.
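
A plain Service selecting the module's pods is enough to spread connections across replicas; a sketch (the name and port numbers are illustrative, and the Helm chart normally renders this for you):

yaml
apiVersion: v1
kind: Service
metadata:
  name: mymodule
spec:
  selector:
    app.kubernetes.io/name: mymodule
  ports:
    - name: grpc
      port: 9000
      targetPort: 9000

Note that long-lived gRPC connections stick to one pod, so balancing happens per connection rather than per request unless clients reconnect periodically or use client-side load balancing.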

Pattern 2: Coordinated Server

The leader assigns a single port; every pod then listens on it:

go
func (c *Server) handleReconcile(ctx context.Context, output module.Handler, node v1alpha1.TinyNode) any {
    configuredPort := getPortFromMetadata(node)

    if configuredPort == 0 && utils.IsLeader(ctx) {
        // Leader: assign port
        port := c.startOnRandomPort()
        publishPort(ctx, output, port)
        return nil
    }

    if configuredPort > 0 && c.currentPort != configuredPort {
        // All pods: use assigned port
        c.startOnPort(configuredPort)
    }

    return nil
}
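
getPortFromMetadata and publishPort are not SDK calls; a minimal sketch, assuming the port travels through TinyNode metadata under an illustrative "port" key (mirroring the shared-state example above):

go
import "strconv"

// Read the assigned port from shared CR metadata; 0 means not assigned yet
func getPortFromMetadata(node v1alpha1.TinyNode) int {
    p, _ := strconv.Atoi(node.Status.Metadata["port"])
    return p
}

// Leader publishes its chosen port for the other pods to pick up
func publishPort(ctx context.Context, output module.Handler, port int) {
    output(ctx, v1alpha1.ReconcilePort, func(n *v1alpha1.TinyNode) {
        n.Status.Metadata["port"] = strconv.Itoa(port)
    })
}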

Pattern 3: Singleton Operations

Only the leader performs certain operations:

go
func (c *Scheduler) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    if port == v1alpha1.ControlPort {
        if !utils.IsLeader(ctx) {
            return nil  // Only leader handles control
        }

        control := msg.(Control)
        if control.Start {
            c.startScheduledJobs(ctx, output)
        }
    }
    return nil
}

Pattern 4: Sharded Processing

Distribute work by key:

go
func (c *Sharded) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    input := msg.(Input)

    // Determine if this pod should handle this key
    podIndex := getPodIndex()
    totalPods := getTotalPods()
    targetPod := hash(input.Key) % totalPods

    if podIndex != targetPod {
        // Not our shard - forward to correct pod
        return c.forwardToShard(ctx, input, targetPod)
    }

    // Our shard - process
    return c.process(ctx, output, input)
}
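
getPodIndex, getTotalPods, and hash are placeholders; one sketch, assuming the module runs as a StatefulSet (so the pod ordinal is in the hostname) and the chart injects the replica count as an environment variable:

go
import (
    "hash/fnv"
    "os"
    "strconv"
    "strings"
)

// Hypothetical: parse the StatefulSet ordinal from the hostname, e.g. "mymodule-2" -> 2
func getPodIndex() int {
    host, _ := os.Hostname()
    parts := strings.Split(host, "-")
    idx, _ := strconv.Atoi(parts[len(parts)-1])
    return idx
}

// Hypothetical: replica count injected by the chart as an env var
func getTotalPods() int {
    n, _ := strconv.Atoi(os.Getenv("REPLICA_COUNT"))
    if n < 1 {
        return 1
    }
    return n
}

// Stable, non-negative hash for shard selection
func hash(key string) int {
    h := fnv.New32a()
    h.Write([]byte(key))
    return int(h.Sum32() & 0x7fffffff)
}

forwardToShard is likewise component-specific; with plain Deployments (no stable ordinals), a work queue or a consistent-hashing library is usually a better fit than hostname parsing.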

Helm Values for Scaling

yaml
# values.yaml
replicaCount: 3

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

# Horizontal Pod Autoscaler
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
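
With autoscaling.enabled, the chart would render a HorizontalPodAutoscaler along these lines (a sketch of the autoscaling/v2 shape; the exact template is chart-specific):

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mymodule
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mymodule
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70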

Graceful Scaling

Scale Up

1. New pod starts
2. Starts watching TinyModule CRs
3. Discovers other modules
4. Starts receiving gRPC traffic
5. Processes messages

No special handling is needed; Kubernetes and the SDK take care of it.

Scale Down

1. Pod receives SIGTERM
2. Context is cancelled
3. Pod stops accepting new requests
4. In-flight requests complete
5. Pod terminates
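
If your module wires its own entrypoint, the usual Go pattern (a sketch, not SDK-specific code) is to derive the root context from SIGTERM so cancellation reaches every handler:

go
import (
    "context"
    "os"
    "os/signal"
    "syscall"
)

func main() {
    // Cancelled automatically when Kubernetes sends SIGTERM on scale-down
    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
    defer stop()

    runModule(ctx) // hypothetical entrypoint; hand ctx to all handlers
}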

Handle context cancellation:

go
func (c *Component) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    for _, item := range items {
        select {
        case <-ctx.Done():
            return ctx.Err()  // Graceful shutdown
        default:
            c.process(item)
        }
    }
    return nil
}

Monitoring Scaled Deployments

Metrics per Pod

go
import "go.opentelemetry.io/otel/metric"

var (
    messagesProcessed = metric.NewInt64Counter("messages_processed")
)

func (c *Component) Handle(ctx context.Context, ...) any {
    // Increment counter with pod label
    messagesProcessed.Add(ctx, 1, attribute.String("pod", os.Getenv("HOSTNAME")))
    return nil
}

Leader Status

go
var isLeaderGauge, _ = meter.Int64Gauge("is_leader")

func updateLeaderMetric(isLeader bool) {
    value := int64(0)
    if isLeader {
        value = 1
    }
    // Record 1 while this pod holds leadership, 0 otherwise
    isLeaderGauge.Record(context.Background(), value)
}
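
One natural place to keep the gauge current (assuming leadership is checked per message with utils.IsLeader) is the reconcile branch of Handle:

go
// Keep the leader gauge in sync whenever a reconcile message arrives
if port == v1alpha1.ReconcilePort {
    updateLeaderMetric(utils.IsLeader(ctx))
}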

Common Pitfalls

1. Local Caching

go
// Bad: Cache not shared
type Component struct {
    cache map[string]Result
}

// Good: Use TinyNode metadata or external cache
func (c *Component) getFromMetadata(node v1alpha1.TinyNode, key string) string {
    return node.Status.Metadata[key]
}

2. Assuming Single Instance

go
// Bad: Assumes a single instance
var globalCounter int

// Good: Accept per-pod counting (see Metrics per Pod above) or use
// distributed storage such as TinyNode metadata
func (c *Component) Handle(ctx context.Context, output module.Handler, port string, msg any) any {
    // Either: don't count here, or count per pod and aggregate in the backend
    return nil
}

3. File-Based State

go
// Bad: File not shared between pods
f, _ := os.Create("/data/state.json")

// Good: Use ConfigMap, Secret, or TinyNode metadata
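
For small amounts of shared state, a sketch of the CR-backed alternative (reusing the leader-write pattern from earlier; the "state" metadata key and the c.state field are illustrative):

go
// Leader persists small JSON state in the CR instead of a local file
if utils.IsLeader(ctx) {
    output(ctx, v1alpha1.ReconcilePort, func(n *v1alpha1.TinyNode) {
        b, _ := json.Marshal(c.state)
        n.Status.Metadata["state"] = string(b)
    })
}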

4. In-Memory Timers

go
// Bad: Timer only runs on one pod
time.AfterFunc(5*time.Minute, doWork)

// Good: Use leader-only timer with state in CR
if utils.IsLeader(ctx) {
    go c.runTimerLoop(ctx, output)
}
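
runTimerLoop is not shown above; a minimal sketch that also respects graceful shutdown might look like this:

go
// Hypothetical leader-only loop; exits when the pod's context is cancelled
func (c *Component) runTimerLoop(ctx context.Context, output module.Handler) {
    ticker := time.NewTicker(5 * time.Minute)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            doWork() // the same work the AfterFunc above would have run
        }
    }
}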

Checklist for Scalable Components

  • [ ] All processing is stateless or uses CR metadata
  • [ ] Leader-only operations check utils.IsLeader(ctx)
  • [ ] Context cancellation is respected
  • [ ] Operations are idempotent
  • [ ] No local file storage for shared state
  • [ ] No global variables for shared data
  • [ ] Graceful shutdown implemented
  • [ ] Metrics include pod identifier

Next Steps

Build flow-based applications on Kubernetes