
Troubleshooting Clusters

This guide helps you diagnose and resolve common cluster connection and operation issues.

Connection Issues

Cluster Not Connecting

Symptoms:

  • Status shows "Disconnected" or "Error"
  • Cannot deploy modules
  • Cluster dashboard shows no data

Diagnosis:

bash
# Check if kubectl can reach the cluster
kubectl cluster-info

# Check namespace exists
kubectl get namespace tinysystems

# Check service account
kubectl get serviceaccount tinysystems-controller -n tinysystems

Solutions:

  1. API Server Unreachable

    • Verify cluster is running
    • Check network connectivity
    • Confirm API server URL is correct
  2. Invalid Kubeconfig

    • Regenerate kubeconfig
    • Verify token is valid
    • Check certificate hasn't expired
  3. Namespace Missing

    bash
    kubectl create namespace tinysystems
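
The diagnosis commands above can be chained so that the first failing check points at the matching solution. The following is a minimal sketch that assumes the namespace and service account names used in this guide; `check_connection` is a hypothetical helper name, not a TinySystems command.

```shell
# Run the connection checks in order and stop at the first failure.
# check_connection is an illustrative helper, not part of TinySystems.
check_connection() {
  if ! kubectl cluster-info >/dev/null 2>&1; then
    echo "API server unreachable or kubeconfig invalid"
    return 1
  fi
  if ! kubectl get namespace tinysystems >/dev/null 2>&1; then
    echo "namespace tinysystems is missing"
    return 1
  fi
  if ! kubectl get serviceaccount tinysystems-controller -n tinysystems >/dev/null 2>&1; then
    echo "service account tinysystems-controller is missing"
    return 1
  fi
  echo "cluster, namespace, and service account look healthy"
}
```

Run `check_connection` from a shell that uses the same kubeconfig you registered with TinySystems; a non-zero exit status means one of the checks failed.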

Authentication Errors

Symptoms:

  • "Unauthorized" errors
  • "Forbidden" responses
  • Token-related failures

Diagnosis:

bash
# Check that the token secret exists (this does not validate the token itself)
kubectl get secret tinysystems-controller-token -n tinysystems

# Test authorization by impersonating the service account
kubectl auth can-i get pods -n tinysystems \
  --as system:serviceaccount:tinysystems:tinysystems-controller

Solutions:

  1. Expired Token

    bash
    # Recreate token secret
    kubectl delete secret tinysystems-controller-token -n tinysystems
    
    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: tinysystems-controller-token
      namespace: tinysystems
      annotations:
        kubernetes.io/service-account.name: tinysystems-controller
    type: kubernetes.io/service-account-token
    EOF
    
    # Then update the kubeconfig stored in TinySystems with the new token
  2. Missing Service Account

    bash
    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: tinysystems-controller
      namespace: tinysystems
    EOF
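
After the token secret is recreated, the new token has to be read back before the kubeconfig can be updated. A minimal sketch, assuming the secret name used above; `print_controller_token` is a hypothetical helper name. The control plane fills in the token shortly after the secret is created, so the output may be empty for a moment.

```shell
# Print the decoded bearer token stored in the service-account token secret.
# print_controller_token is an illustrative helper name.
print_controller_token() {
  kubectl get secret tinysystems-controller-token -n tinysystems \
    -o jsonpath='{.data.token}' | base64 --decode
}
```

The output is a credential, so avoid pasting it into tickets or logs.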

Permission Denied

Symptoms:

  • Cannot create/update resources
  • "Forbidden" errors for specific actions
  • Module deployment fails

Diagnosis:

bash
# Check RBAC
kubectl get clusterrolebinding tinysystems-controller

# Test specific permissions
kubectl auth can-i create pods -n tinysystems \
  --as system:serviceaccount:tinysystems:tinysystems-controller

kubectl auth can-i create tinynodes -n tinysystems \
  --as system:serviceaccount:tinysystems:tinysystems-controller

Solutions:

  1. Missing ClusterRole

    bash
    # Apply RBAC rules (see Connecting Your Cluster guide)
    kubectl apply -f tinysystems-rbac.yaml
  2. Wrong Namespace Binding

    • Verify RoleBinding references correct namespace
    • Check ClusterRoleBinding for cluster-wide access
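
Testing permissions one at a time gets tedious, so the `kubectl auth can-i` checks above can be looped over several verbs and resources. This is a sketch only: the verb and resource lists are assumptions, and `check_controller_permissions` is a hypothetical helper name. Your own user needs impersonation rights for `--as` to work.

```shell
# Print one line per verb/resource pair, impersonating the controller.
# The verb and resource lists are illustrative; adjust them to your RBAC rules.
check_controller_permissions() {
  sa="system:serviceaccount:tinysystems:tinysystems-controller"
  for resource in pods deployments tinynodes tinymodules; do
    for verb in get list create update delete; do
      # can-i prints "yes" or "no" and exits non-zero on "no".
      answer=$(kubectl auth can-i "$verb" "$resource" -n tinysystems --as "$sa" 2>/dev/null) || true
      echo "$verb $resource: ${answer:-no}"
    done
  done
}
```

Any line ending in `no` names a permission that is missing from the role bound to the service account.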

Module Issues

Module Not Deploying

Symptoms:

  • Module stuck in "Deploying" state
  • Pods not created
  • Timeout errors

Diagnosis:

bash
# Check module resource
kubectl get tinymodules -n tinysystems

# Check deployments
kubectl get deployments -n tinysystems

# Check for pending pods
kubectl get pods -n tinysystems

# Check events
kubectl get events -n tinysystems --sort-by='.lastTimestamp'

Solutions:

  1. Insufficient Resources

    bash
    # Check node resources
    kubectl describe nodes | grep -A 5 "Allocated resources"
    
    # Reduce module resource requests or add nodes
  2. Image Pull Failure

    bash
    # Check pod status
    kubectl describe pod <pod-name> -n tinysystems
    
    # Look for ImagePullBackOff
    
    # If private registry, create pull secret
    kubectl create secret docker-registry regcred \
      --docker-server=<registry> \
      --docker-username=<user> \
      --docker-password=<token> \
      -n tinysystems
  3. CRD Not Installed

    bash
    # Check CRDs exist
    kubectl get crd | grep tinysystems
    
    # Install CRDs if missing (usually done by module deployment)
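
Creating the `regcred` pull secret does not make pods use it; the secret must be referenced by the pod spec or by the service account the pods run as. The sketch below patches a service account. Which account the module pods use is an assumption here (it defaults to `default`), and `attach_pull_secret` is a hypothetical helper name.

```shell
# Attach an image pull secret to a service account so that new pods using
# that account pull with it. The service account name is an assumption:
# pass the one your module pods actually run as.
attach_pull_secret() {
  service_account="${1:-default}"
  secret_name="${2:-regcred}"
  kubectl patch serviceaccount "$service_account" -n tinysystems \
    -p "{\"imagePullSecrets\": [{\"name\": \"$secret_name\"}]}"
}
```

Only pods created after the patch pick up the secret, so restart the affected deployment afterwards.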

Module Crash Loop

Symptoms:

  • Pod repeatedly restarting
  • CrashLoopBackOff status
  • Module shows unhealthy

Diagnosis:

bash
# Check pod status
kubectl get pods -n tinysystems -l app.kubernetes.io/name=<module-name>

# View logs
kubectl logs -n tinysystems <pod-name> --previous

# Describe pod
kubectl describe pod -n tinysystems <pod-name>

Solutions:

  1. Configuration Error

    • Check environment variables
    • Verify settings in TinyModule resource
    • Review module documentation
  2. Memory Limit Too Low

    bash
    # Check OOMKilled
    kubectl describe pod <pod-name> -n tinysystems | grep -A 5 "Last State"
    
    # Increase memory limit
  3. Health Check Failing

    • Check readiness/liveness probes
    • Verify health endpoints work
    • Adjust probe timing
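
To tell an out-of-memory kill apart from a configuration error without reading the full `describe` output, the last termination reason can be read directly from the pod status. A minimal sketch; `last_termination_reasons` is a hypothetical helper name.

```shell
# Print "<container>: <reason>" for every container in the pod. The reason
# is empty when no previous termination is recorded.
last_termination_reasons() {
  pod="$1"
  kubectl get pod "$pod" -n tinysystems \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{"\n"}{end}'
}
```

A reason of `OOMKilled` points at the memory limit, while `Error` usually means the process itself exited and the `--previous` logs are the next place to look.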

Module Not Responding

Symptoms:

  • Module deployed but not processing
  • gRPC connection failures
  • Timeout on message delivery

Diagnosis:

bash
# Check module logs
kubectl logs -n tinysystems deployment/<module-name>

# Check service
kubectl get svc -n tinysystems

# Test gRPC connectivity
kubectl exec -it <debug-pod> -- grpcurl -plaintext <service>:50051 list

Solutions:

  1. Service Misconfigured

    bash
    # Verify service selectors match pod labels
    kubectl get svc <module-name> -n tinysystems -o yaml
  2. Network Policy Blocking

    bash
    # Check network policies
    kubectl get networkpolicies -n tinysystems
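
A quick way to confirm a selector mismatch is to compare the service's selector with the endpoints Kubernetes has resolved for it. A sketch, assuming the service is named after the module; `show_service_backends` is a hypothetical helper name.

```shell
# Show a service's selector next to the pod IPs it currently routes to.
# An empty endpoints line usually means the selector matches no ready pods.
show_service_backends() {
  service="$1"
  printf 'selector:  '
  kubectl get svc "$service" -n tinysystems -o jsonpath='{.spec.selector}'
  printf '\nendpoints: '
  kubectl get endpoints "$service" -n tinysystems \
    -o jsonpath='{.subsets[*].addresses[*].ip}'
  printf '\n'
}
```

If the endpoints line is empty, compare the selector with the output of `kubectl get pods -n tinysystems --show-labels`.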

Node Issues

TinyNode Not Processing

Symptoms:

  • Node created but not working
  • No execution logs
  • Messages not flowing

Diagnosis:

bash
# Check TinyNode status
kubectl get tinynodes -n tinysystems <node-name> -o yaml

# Look for errors
kubectl get tinynodes -n tinysystems <node-name> -o jsonpath='{.status.error}'

# Check module for the component
kubectl get tinymodules -n tinysystems

Solutions:

  1. Module Not Found

    • Verify module is deployed
    • Check component name is correct
    • Ensure module version matches
  2. Configuration Error

    • Review edge configurations
    • Check expression syntax
    • Verify port names exist
  3. Scheduling Issue

    • The module replica handling the node might not be the elected leader
    • Check leader election status
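
Leader election status can often be inspected through Lease objects. The sketch below assumes the module uses Lease-based leader election in the `tinysystems` namespace, which this guide does not confirm; `show_leader_leases` is a hypothetical helper name.

```shell
# List Lease objects with their current holder and last renewal time.
# Assumes Lease-based leader election in the tinysystems namespace.
show_leader_leases() {
  kubectl get leases -n tinysystems \
    -o custom-columns='NAME:.metadata.name,HOLDER:.spec.holderIdentity,RENEWED:.spec.renewTime'
}
```

A holder that names a pod which no longer exists, or a renewal time that has stopped advancing, suggests the election is stuck.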

TinyNode Stuck

Symptoms:

  • Node in pending state
  • No status update
  • ObservedGeneration not updating

Diagnosis:

bash
# Check TinyNode resource
kubectl get tinynode <name> -n tinysystems -o yaml

# Compare metadata.generation with status.observedGeneration
# If they differ and stay different, reconciliation is stuck

# Check controller logs
kubectl logs -n tinysystems deployment/<module-name> | grep <node-name>

Solutions:

  1. Recreate Node

    bash
    # Delete and recreate
    kubectl delete tinynode <name> -n tinysystems
    # TinySystems will recreate from flow definition
  2. Restart Module

    bash
    kubectl rollout restart deployment/<module-name> -n tinysystems
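
Before deleting a node or restarting a module, it helps to confirm that reconciliation is actually behind. The sketch below compares the two generation fields; it assumes the TinyNode status exposes `observedGeneration`, as the symptoms above imply, and `check_reconciled` is a hypothetical helper name.

```shell
# Compare metadata.generation with status.observedGeneration for a TinyNode.
# Assumes the TinyNode status exposes observedGeneration.
check_reconciled() {
  name="$1"
  gens=$(kubectl get tinynode "$name" -n tinysystems \
    -o jsonpath='{.metadata.generation} {.status.observedGeneration}')
  generation=${gens%% *}   # text before the first space
  observed=${gens#* }      # text after the first space (may be empty)
  if [ "$generation" = "$observed" ]; then
    echo "reconciled (generation $generation)"
  else
    echo "not reconciled: generation=$generation observedGeneration=${observed:-unset}"
    return 1
  fi
}
```

A mismatch right after an edit is normal; it only indicates a stuck node if it persists across repeated checks.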

Resource Issues

Out of Memory

Symptoms:

  • Pods OOMKilled
  • Module performance degradation
  • Node-level memory pressure

Solutions:

  1. Increase Pod Memory

    • Update module deployment
    • Increase memory limits
  2. Add Nodes

    • Scale cluster nodes
    • Add more capacity
  3. Optimize Flows

    • Process smaller batches
    • Use Split for arrays
    • Avoid loading large payloads into memory at once
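
The first solution can be sketched with standard kubectl commands. `kubectl top` requires metrics-server in the cluster, the `512Mi` default is a placeholder rather than a recommendation, and both function names are hypothetical. If TinySystems manages the deployment, a manual change may be overwritten on the next module update.

```shell
# Show per-pod memory usage, highest first (requires metrics-server).
show_memory_usage() {
  kubectl top pods -n tinysystems --sort-by=memory
}

# Raise the memory limit on a module deployment. The default of 512Mi is a
# placeholder, not a recommended value.
raise_memory_limit() {
  module="$1"
  limit="${2:-512Mi}"
  kubectl set resources deployment "$module" -n tinysystems --limits="memory=$limit"
}
```

For example, `raise_memory_limit <module-name> 1Gi` triggers a rolling update of the deployment with the new limit.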

CPU Throttling

Symptoms:

  • Slow message processing
  • High latency
  • CPU limit reached

Solutions:

  1. Increase CPU Limits

    • Update module deployment
    • Allow more CPU
  2. Scale Horizontally

    • Add more replicas
    • Distribute load
  3. Optimize Processing

    • Simplify expressions
    • Cache repeated calculations
    • Use async where appropriate
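
Throttling can be confirmed from inside the container by reading the cgroup CPU statistics, and relieved by adding replicas. This sketch assumes the container image ships `sh`, `cat`, and `grep`-compatible output, and that the cgroup files are at their usual paths; both function names are hypothetical.

```shell
# Print the throttling counters from the container's cgroup CPU statistics.
# Tries the cgroup v2 path first, then the cgroup v1 path.
show_cpu_throttling() {
  pod="$1"
  kubectl exec "$pod" -n tinysystems -- sh -c \
    'cat /sys/fs/cgroup/cpu.stat 2>/dev/null || cat /sys/fs/cgroup/cpu/cpu.stat' \
    | grep -E '^(nr_periods|nr_throttled|throttled_(usec|time)) '
}

# Scale a module deployment to the given number of replicas.
scale_module() {
  module="$1"
  replicas="$2"
  kubectl scale deployment "$module" -n tinysystems --replicas="$replicas"
}
```

A `nr_throttled` value that keeps growing relative to `nr_periods` indicates the CPU limit is being hit regularly.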

Network Issues

DNS Resolution Failing

Symptoms:

  • Cannot reach external services
  • Internal service discovery fails
  • Name resolution errors

Diagnosis:

bash
# Test DNS from pod
kubectl exec -it <pod> -n tinysystems -- nslookup kubernetes.default

# Check CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns

Solutions:

  1. CoreDNS Not Running

    • Check and restart CoreDNS pods
    • Verify CoreDNS configuration
  2. Network Policy Blocking DNS

    • Allow egress to kube-dns
    • Port 53 UDP/TCP
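
The second solution can be expressed as a NetworkPolicy. The sketch below prints a manifest that allows DNS egress from every pod in the namespace to the `kube-dns` pods in `kube-system`. The policy name is an assumption, the namespace label relies on the `kubernetes.io/metadata.name` label that recent Kubernetes versions set automatically, and clusters with a node-local DNS cache may need a different rule.

```shell
# Print a NetworkPolicy that allows DNS egress (port 53 UDP/TCP) to kube-dns.
# The policy name allow-dns-egress is illustrative.
dns_egress_policy() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: tinysystems
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}
```

Review the manifest, then apply it with `dns_egress_policy | kubectl apply -f -`.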

External Connectivity Issues

Symptoms:

  • Cannot reach external APIs
  • HTTP client timeouts
  • Connection refused

Diagnosis:

bash
# Test from pod
kubectl exec -it <pod> -n tinysystems -- curl -v https://api.example.com

# Check egress
kubectl get networkpolicies -n tinysystems

Solutions:

  1. Network Policy Too Restrictive

    • Allow required egress
    • Open necessary ports
  2. Firewall/Security Groups

    • Check cloud provider firewall
    • Allow outbound traffic
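
To see where an outbound request stalls, curl's timing variables can separate name lookup, TCP connect, and TLS handshake. A sketch, assuming `curl` is available in the container image; `probe_from_pod` is a hypothetical helper name.

```shell
# Request a URL from inside a pod and print the status code with per-phase timings.
probe_from_pod() {
  pod="$1"
  url="$2"
  kubectl exec "$pod" -n tinysystems -- curl -sS -o /dev/null --max-time 10 \
    -w 'http=%{http_code} dns=%{time_namelookup}s connect=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n' \
    "$url"
}
```

A timeout before the connect phase completes usually points at a network policy or firewall rule, while a failure during name lookup points back at DNS.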

Debug Checklist

When issues occur:

  1. Check Status

    bash
    kubectl get all -n tinysystems
  2. View Events

    bash
    kubectl get events -n tinysystems --sort-by='.lastTimestamp'
  3. Check Logs

    bash
    kubectl logs -n tinysystems deployment/<module> --tail=100
  4. Describe Resources

    bash
    kubectl describe tinynode <name> -n tinysystems
    kubectl describe pod <name> -n tinysystems
  5. Test Connectivity

    bash
    kubectl exec -it <pod> -n tinysystems -- /bin/sh

Getting Help

If issues persist:

  1. Collect diagnostics:

    bash
    kubectl get all -n tinysystems -o yaml > diagnostics.yaml
    kubectl get events -n tinysystems > events.txt
    kubectl logs -n tinysystems deployment/<module> > logs.txt
  2. Check TinySystems status page

  3. Contact support with diagnostics
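
The collection step can be wrapped into one function that writes each output to its own file and bundles the result. This is a sketch: the directory and file names are assumptions, and `collect_diagnostics` is a hypothetical helper name. `kubectl get all` typically omits custom resources, so TinyNodes and TinyModules are fetched separately.

```shell
# Collect resources, events, and logs into a directory and archive it.
# Prints the path of the resulting archive.
collect_diagnostics() {
  module="$1"
  out_dir="${2:-tinysystems-diagnostics}"
  mkdir -p "$out_dir"
  kubectl get all -n tinysystems -o yaml > "$out_dir/resources.yaml" 2>&1
  kubectl get tinynodes,tinymodules -n tinysystems -o yaml > "$out_dir/tiny-resources.yaml" 2>&1
  kubectl get events -n tinysystems --sort-by='.lastTimestamp' > "$out_dir/events.txt" 2>&1
  kubectl logs -n tinysystems "deployment/$module" --tail=1000 > "$out_dir/logs.txt" 2>&1
  tar -czf "$out_dir.tar.gz" "$out_dir"
  echo "$out_dir.tar.gz"
}
```

Review the files before sharing them, since environment variables in the resource dump and log lines can contain sensitive values.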
