Observability
Observability in TinySystems covers logging, metrics, and tracing. This guide explains how to instrument your modules and how to monitor them in production.
Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                             OBSERVABILITY STACK                              │
└─────────────────────────────────────────────────────────────────────────────┘

  Module Pods                        Collectors                  Backends

  ┌─────────────────────┐
  │ Logs (stdout/stderr)│──────▶  Fluentd/Fluent Bit  ──────▶  Loki/ES
  └─────────────────────┘
  ┌─────────────────────┐
  │ Metrics (Prometheus)│──────▶  Prometheus Scrape   ──────▶  Prometheus
  └─────────────────────┘
  ┌─────────────────────┐
  │ Traces (OTLP)       │──────▶  OTEL Collector      ──────▶  Tempo/Jaeger
  └─────────────────────┘
```
Logging
Structured Logging
Use structured logging for better searchability:
```go
import (
    "context"
    "fmt"

    "sigs.k8s.io/controller-runtime/pkg/log"
)

func (c *Component) Handle(ctx context.Context, output module.Handler, port string, msg any) error {
    logger := log.FromContext(ctx)
    logger.Info("processing message",
        "port", port,
        "type", fmt.Sprintf("%T", msg),
    )

    if err := c.process(ctx, output, msg); err != nil {
        logger.Error(err, "processing failed",
            "port", port,
            "input", msg,
        )
        return err
    }
    return nil
}
```
Log Levels
| Level | Use Case | Example |
|---|---|---|
| Error | Failures requiring attention | Connection failed, data corruption |
| Info | Normal operations | Message processed, state changed |
| Debug | Development details | Full payload, timing |
| V(1) | Verbose | Every function entry/exit |
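With the logr-style logger used above, the Debug and V(1) rows are both expressed through V(n) verbosity rather than named levels: higher n means noisier output. A minimal sketch, reusing the logger from the Handle example (the variables are illustrative):

```go
logger := log.FromContext(ctx)

// Error: failures that require attention.
logger.Error(err, "connection failed", "endpoint", endpoint)

// Info: normal operations.
logger.Info("message processed", "port", port)

// V(n): debug and verbose detail; gate noisy output behind higher verbosity.
logger.V(1).Info("full payload", "payload", msg)
```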
Log Context
Add context to all logs:
```go
func (c *Component) Handle(ctx context.Context, output module.Handler, port string, msg any) error {
    logger := log.FromContext(ctx).WithValues(
        "component", c.GetInfo().Name,
        "node", c.nodeName,
    )
    logger.Info("handling message", "port", port)

    // All subsequent logs made through this logger include component and node.
    return nil
}
```
Metrics
Prometheus Metrics
Define metrics for your component:
```go
import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    messagesProcessed = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "tinysystems_messages_processed_total",
            Help: "Total number of messages processed",
        },
        []string{"component", "port", "status"},
    )

    processingDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "tinysystems_processing_duration_seconds",
            Help:    "Message processing duration",
            Buckets: prometheus.DefBuckets,
        },
        []string{"component", "port"},
    )

    activeConnections = promauto.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "tinysystems_active_connections",
            Help: "Number of active connections",
        },
        []string{"component"},
    )
)
```
Using Metrics
```go
func (c *Component) Handle(ctx context.Context, output module.Handler, port string, msg any) error {
    start := time.Now()
    defer func() {
        duration := time.Since(start).Seconds()
        processingDuration.WithLabelValues(c.name, port).Observe(duration)
    }()

    err := c.process(ctx, output, msg)

    status := "success"
    if err != nil {
        status = "error"
    }
    messagesProcessed.WithLabelValues(c.name, port, status).Inc()

    return err
}
```
Standard Metrics
Recommended metrics for all components:
| Metric | Type | Labels | Description |
|---|---|---|---|
| _messages_total | Counter | component, port, status | Messages processed |
| _duration_seconds | Histogram | component, port | Processing time |
| _errors_total | Counter | component, error_type | Errors by type |
| _active_operations | Gauge | component | In-flight operations |
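The first two rows correspond to the messagesProcessed and processingDuration metrics defined earlier. A minimal sketch of the remaining two, assuming the same promauto/prometheus packages and the tinysystems_ prefix used above (the exact names here are illustrative):

```go
var (
    // Errors by type; keep error_type values bounded (e.g. "timeout", "decode").
    errorsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "tinysystems_errors_total",
            Help: "Total number of processing errors by type",
        },
        []string{"component", "error_type"},
    )

    // In-flight operations: Inc() on entry, Dec() when the operation completes.
    activeOperations = promauto.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "tinysystems_active_operations",
            Help: "Number of in-flight operations",
        },
        []string{"component"},
    )
)
```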
Tracing
OpenTelemetry Integration
```go
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/trace"
)

var tracer = otel.Tracer("my-module")

func (c *Component) Handle(ctx context.Context, output module.Handler, port string, msg any) error {
    ctx, span := tracer.Start(ctx, "Handle",
        trace.WithAttributes(
            attribute.String("component", c.name),
            attribute.String("port", port),
        ),
    )
    defer span.End()

    if err := c.process(ctx, output, msg); err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
        return err
    }

    span.SetStatus(codes.Ok, "")
    return nil
}
```
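For spans to reach the OTEL Collector shown in the overview, the module needs a tracer provider wired to an exporter at startup. A minimal sketch, assuming an OTLP/gRPC collector endpoint taken from the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable; the initTracing function name is illustrative:

```go
import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    "go.opentelemetry.io/otel/propagation"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// initTracing registers the global tracer provider and propagator.
// Call it once during module startup and defer the returned shutdown func.
func initTracing(ctx context.Context) (func(context.Context) error, error) {
    // Exports spans over OTLP/gRPC; the endpoint comes from
    // OTEL_EXPORTER_OTLP_ENDPOINT unless overridden with options.
    exporter, err := otlptracegrpc.New(ctx)
    if err != nil {
        return nil, err
    }

    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
    otel.SetTracerProvider(tp)

    // W3C trace context headers so traces cross module boundaries.
    otel.SetTextMapPropagator(propagation.TraceContext{})

    return tp.Shutdown, nil
}
```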
Trace Context Propagation
Traces propagate across modules:
```go
// Automatic via gRPC metadata.
// Manual for HTTP:
func (c *HTTPClient) doRequest(ctx context.Context, req *http.Request) (*http.Response, error) {
    // Inject the current trace context (traceparent/tracestate) into the outgoing headers.
    otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))
    return c.client.Do(req)
}
```
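On the receiving side of an HTTP hop, extract the incoming headers before starting a span so the trace continues instead of starting a new root. A minimal sketch, assuming the propagator registered in the setup sketch above and the package-level tracer; handleHTTP is an illustrative handler name:

```go
func (c *Component) handleHTTP(w http.ResponseWriter, r *http.Request) {
    // Pull the upstream trace context (traceparent/tracestate) out of the headers.
    ctx := otel.GetTextMapPropagator().Extract(r.Context(), propagation.HeaderCarrier(r.Header))

    // Spans started from ctx are now children of the caller's span.
    ctx, span := tracer.Start(ctx, "handleHTTP")
    defer span.End()

    // ... process the request with ctx so downstream work stays in the same trace ...
    _ = ctx
    w.WriteHeader(http.StatusOK)
}
```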
Health Endpoints
Liveness Probe
```go
http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusOK)
    w.Write([]byte("ok"))
})
```
Readiness Probe
```go
func (c *Component) readinessHandler(w http.ResponseWriter, r *http.Request) {
    if !c.isReady() {
        w.WriteHeader(http.StatusServiceUnavailable)
        w.Write([]byte("not ready"))
        return
    }
    w.WriteHeader(http.StatusOK)
    w.Write([]byte("ready"))
}

func (c *Component) isReady() bool {
    return c.settings.Initialized && c.connections.AllHealthy()
}
```
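Wiring it together: a minimal sketch of an HTTP server that exposes the probes above plus the Prometheus /metrics endpoint on port 8080, which is what the scrape annotations in the next section point at. The serveObservability name, the /readyz path, and the port are illustrative:

```go
import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func (c *Component) serveObservability() error {
    mux := http.NewServeMux()

    // Liveness: the process is up.
    mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
        w.Write([]byte("ok"))
    })

    // Readiness: settings are initialized and connections are healthy.
    mux.HandleFunc("/readyz", c.readinessHandler)

    // Prometheus metrics registered via promauto above.
    mux.Handle("/metrics", promhttp.Handler())

    return http.ListenAndServe(":8080", mux)
}
```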
Kubernetes Configuration
Pod Annotations
```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
```
ServiceMonitor
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-module
spec:
  selector:
    matchLabels:
      app: my-module
  endpoints:
    - port: metrics
      interval: 15s
```
Dashboards
Grafana Dashboard
```json
{
  "title": "Module Overview",
  "panels": [
    {
      "title": "Messages/sec",
      "type": "graph",
      "targets": [
        {
          "expr": "rate(tinysystems_messages_processed_total[5m])",
          "legendFormat": "{{component}} - {{port}}"
        }
      ]
    },
    {
      "title": "Processing Latency p99",
      "type": "graph",
      "targets": [
        {
          "expr": "histogram_quantile(0.99, rate(tinysystems_processing_duration_seconds_bucket[5m]))",
          "legendFormat": "{{component}}"
        }
      ]
    }
  ]
}
```
Alerting
Prometheus Alerts
```yaml
groups:
  - name: tinysystems
    rules:
      - alert: HighErrorRate
        expr: |
          rate(tinysystems_messages_processed_total{status="error"}[5m])
            / rate(tinysystems_messages_processed_total[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate in {{ $labels.component }}"

      - alert: HighLatency
        expr: |
          histogram_quantile(0.99, rate(tinysystems_processing_duration_seconds_bucket[5m])) > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High latency in {{ $labels.component }}"
```
Best Practices
1. Use Labels Wisely
```go
// Good: low cardinality
messagesProcessed.WithLabelValues(componentName, port, status)

// Bad: high cardinality (unbounded values)
messagesProcessed.WithLabelValues(componentName, userID, requestID)
```
2. Log at Appropriate Levels
```go
// Development: verbose detail behind a verbosity level
logger.V(1).Info("detailed debug info", "data", fullPayload)

// Production: concise, structured events
logger.Info("message processed", "count", 1)
logger.Error(err, "processing failed")
```
3. Include Trace IDs in Logs
```go
logger.Info("processing",
    "traceID", trace.SpanFromContext(ctx).SpanContext().TraceID().String(),
)
```
4. Monitor Business Metrics
```go
var (
    ordersCreated = promauto.NewCounter(prometheus.CounterOpts{
        Name: "orders_created_total",
    })
    orderValue = promauto.NewHistogram(prometheus.HistogramOpts{
        Name:    "order_value_dollars",
        Buckets: []float64{10, 50, 100, 500, 1000},
    })
)
```
Next Steps
- Debugging - Debug issues
- Testing Components - Test coverage
- Horizontal Scaling - Scale monitoring