Troubleshooting Clusters
This guide helps you diagnose and resolve common cluster connection and operation issues.
Connection Issues
Cluster Not Connecting
Symptoms:
- Status shows "Disconnected" or "Error"
- Cannot deploy modules
- Cluster dashboard shows no data
Diagnosis:
# Check if kubectl can reach the cluster
kubectl cluster-info
# Check namespace exists
kubectl get namespace tinysystems
# Check service account
kubectl get serviceaccount tinysystems-controller -n tinysystems
Solutions:
API Server Unreachable
- Verify cluster is running
- Check network connectivity
- Confirm API server URL is correct
Invalid Kubeconfig
- Regenerate kubeconfig
- Verify token is valid
- Check certificate hasn't expired
Namespace Missing
kubectl create namespace tinysystems
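The three diagnosis commands for this section can be chained so that the first failing check is reported. The following is a minimal sketch, not an official tool: the function name `check_cluster_connection` and the `NAMESPACE` and `SERVICE_ACCOUNT` environment variables are hypothetical, and the defaults are simply the names used in this guide.

```shell
# Sketch: run the connection checks in order and stop at the first failure.
# NAMESPACE and SERVICE_ACCOUNT are hypothetical overrides; the defaults
# are the names used throughout this guide.
check_cluster_connection() {
  local ns="${NAMESPACE:-tinysystems}"
  local sa="${SERVICE_ACCOUNT:-tinysystems-controller}"

  if ! kubectl cluster-info >/dev/null 2>&1; then
    echo "FAIL: API server unreachable (check kubeconfig, network, server URL)"
    return 1
  fi
  if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
    echo "FAIL: namespace $ns is missing"
    return 1
  fi
  if ! kubectl get serviceaccount "$sa" -n "$ns" >/dev/null 2>&1; then
    echo "FAIL: service account $sa is missing in $ns"
    return 1
  fi
  echo "OK: cluster reachable, namespace and service account present"
}
```

Running `check_cluster_connection` with no arguments uses the default names; the reported failure points to the matching solution above.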
Authentication Errors
Symptoms:
- "Unauthorized" errors
- "Forbidden" responses
- Token-related failures
Diagnosis:
# Check that the token secret exists (this alone does not prove the token is still valid)
kubectl get secret tinysystems-controller-token -n tinysystems
# Test authentication
kubectl auth can-i get pods -n tinysystems \
--as system:serviceaccount:tinysystems:tinysystems-controller
Solutions:
Expired Token
# Recreate token secret
kubectl delete secret tinysystems-controller-token -n tinysystems
kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: tinysystems-controller-token
  namespace: tinysystems
  annotations:
    kubernetes.io/service-account.name: tinysystems-controller
type: kubernetes.io/service-account-token
EOF
# Update kubeconfig in TinySystems
Missing Service Account
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tinysystems-controller
  namespace: tinysystems
EOF
Permission Denied
Symptoms:
- Cannot create/update resources
- "Forbidden" errors for specific actions
- Module deployment fails
Diagnosis:
# Check RBAC
kubectl get clusterrolebinding tinysystems-controller
# Test specific permissions
kubectl auth can-i create pods -n tinysystems \
--as system:serviceaccount:tinysystems:tinysystems-controller
kubectl auth can-i create tinynodes -n tinysystems \
--as system:serviceaccount:tinysystems:tinysystems-controller
Solutions:
Missing ClusterRole
# Apply RBAC rules (see Connecting Your Cluster guide)
kubectl apply -f tinysystems-rbac.yaml
Wrong Namespace Binding
- Verify RoleBinding references correct namespace
- Check ClusterRoleBinding for cluster-wide access
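To find exactly which permission is missing, the `kubectl auth can-i` checks from the diagnosis can be looped over a set of verbs and resources. This is a sketch under assumptions: the function name is hypothetical, and the resource and verb lists are illustrative guesses, not the authoritative set of permissions the controller needs (see the Connecting Your Cluster guide for the actual RBAC rules).

```shell
# Sketch: report every verb/resource pair the controller service account
# is denied. The resource and verb lists below are illustrative assumptions.
check_controller_permissions() {
  local ns="${1:-tinysystems}"
  local subject="system:serviceaccount:${ns}:tinysystems-controller"
  local denied=0
  local resource verb

  for resource in pods deployments services tinynodes tinymodules; do
    for verb in get list create update delete; do
      # kubectl auth can-i exits non-zero when the action is not allowed
      if ! kubectl auth can-i "$verb" "$resource" -n "$ns" --as "$subject" >/dev/null 2>&1; then
        echo "DENIED: $verb $resource"
        denied=$((denied + 1))
      fi
    done
  done

  echo "$denied permission(s) denied"
  [ "$denied" -eq 0 ]
}
```

The function returns a non-zero status if anything is denied, so it can gate a deployment script. Impersonation with `--as` requires that your own user is allowed to impersonate service accounts.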
Module Issues
Module Not Deploying
Symptoms:
- Module stuck in "Deploying" state
- Pods not created
- Timeout errors
Diagnosis:
# Check module resource
kubectl get tinymodules -n tinysystems
# Check deployments
kubectl get deployments -n tinysystems
# Check for pending pods
kubectl get pods -n tinysystems
# Check events
kubectl get events -n tinysystems --sort-by='.lastTimestamp'
Solutions:
Insufficient Resources
# Check node resources
kubectl describe nodes | grep -A 5 "Allocated resources"
# Reduce module resource requests or add nodes
Image Pull Failure
# Check pod status
kubectl describe pod <pod-name> -n tinysystems
# Look for ImagePullBackOff
# If private registry, create pull secret
kubectl create secret docker-registry regcred \
  --docker-server=<registry> \
  --docker-username=<user> \
  --docker-password=<token> \
  -n tinysystems
CRD Not Installed
# Check CRDs exist
kubectl get crd | grep tinysystems
# Install CRDs if missing (usually done by module deployment)
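When the namespace is busy, the pod and event listings from the diagnosis can be narrowed with field selectors so that only likely blockers remain. A small sketch follows; the function name `summarize_deploy_blockers` is hypothetical, while the `status.phase` and `type` field selectors are standard kubectl features.

```shell
# Sketch: show only pods that are not Running and only Warning events.
# Note that completed (Succeeded) pods are also listed by the first query.
summarize_deploy_blockers() {
  local ns="${1:-tinysystems}"

  echo "== Pods not in Running phase =="
  kubectl get pods -n "$ns" --field-selector=status.phase!=Running

  echo "== Warning events (oldest first) =="
  kubectl get events -n "$ns" --field-selector type=Warning --sort-by='.lastTimestamp'
}
```

Warning events such as FailedScheduling or failed image pulls usually indicate which of the solutions above applies.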
Module Crash Loop
Symptoms:
- Pod repeatedly restarting
- CrashLoopBackOff status
- Module shows unhealthy
Diagnosis:
# Check pod status
kubectl get pods -n tinysystems -l app.kubernetes.io/name=<module-name>
# View logs from the previous (crashed) container
kubectl logs -n tinysystems <pod-name> --previous
# Describe pod
kubectl describe pod -n tinysystems <pod-name>
Solutions:
Configuration Error
- Check environment variables
- Verify settings in TinyModule resource
- Review module documentation
Memory Limit Too Low
# Check OOMKilled
kubectl describe pod <pod-name> -n tinysystems | grep -A 5 "Last State"
# Increase memory limit
Health Check Failing
- Check readiness/liveness probes
- Verify health endpoints work
- Adjust probe timing
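Instead of scanning `kubectl describe` output, the last termination reason can be read directly from the pod status and used to choose between the solutions above. This is a minimal sketch with a hypothetical function name; it relies only on the standard `lastState.terminated.reason` field of the pod's container statuses.

```shell
# Sketch: classify a crash-looping pod by its last termination reason.
last_termination_reason() {
  local pod="$1"
  local ns="${2:-tinysystems}"
  local reason

  reason="$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}')"

  case "$reason" in
    *OOMKilled*) echo "$pod: OOMKilled, raise the memory limit" ;;
    "")          echo "$pod: no previous termination recorded" ;;
    *)           echo "$pod: last exit reason: $reason, check logs with --previous" ;;
  esac
}
```

A reason of `Error` together with failing probes in the pod events usually points to a configuration or health check problem rather than a memory limit.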
Module Not Responding
Symptoms:
- Module deployed but not processing
- gRPC connection failures
- Timeout on message delivery
Diagnosis:
# Check module logs
kubectl logs -n tinysystems deployment/<module-name>
# Check service
kubectl get svc -n tinysystems
# Test gRPC connectivity
kubectl exec -it <debug-pod> -- grpcurl -plaintext <service>:50051 list
Solutions:
Service Misconfigured
# Verify service selectors match pod labels
kubectl get svc <module-name> -n tinysystems -o yaml
Network Policy Blocking
# Check network policies
kubectl get networkpolicies -n tinysystems
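A quick way to confirm the Service Misconfigured case is to check whether the service has any ready endpoints: a service whose selector matches no ready pods has an empty endpoint list. The sketch below uses a hypothetical function name and assumes the service is named after the module, as in the command above.

```shell
# Sketch: report whether a service currently routes to any ready pod.
service_has_endpoints() {
  local svc="$1"
  local ns="${2:-tinysystems}"
  local ips

  ips="$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')"

  if [ -z "$ips" ]; then
    echo "$svc has no ready endpoints: compare the service selector with pod labels"
    return 1
  fi
  echo "$svc endpoints: $ips"
}
```

If endpoints are listed but gRPC calls still time out, a network policy is the more likely cause.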
Node Issues
TinyNode Not Processing
Symptoms:
- Node created but not working
- No execution logs
- Messages not flowing
Diagnosis:
# Check TinyNode status
kubectl get tinynodes -n tinysystems <node-name> -o yaml
# Look for errors
kubectl get tinynodes -n tinysystems <node-name> -o jsonpath='{.status.error}'
# Check module for the component
kubectl get tinymodules -n tinysystems
Solutions:
Module Not Found
- Verify module is deployed
- Check component name is correct
- Ensure module version matches
Configuration Error
- Review edge configurations
- Check expression syntax
- Verify port names exist
Scheduling Issue
- Module might not be leader
- Check leader election status
TinyNode Stuck
Symptoms:
- Node in pending state
- No status update
- ObservedGeneration not updating
Diagnosis:
# Check TinyNode resource
kubectl get tinynode <name> -n tinysystems -o yaml
# Compare metadata.generation with status.observedGeneration
# If different, reconciliation is stuck
# Check controller logs
kubectl logs -n tinysystems deployment/<module-name> | grep <node-name>
Solutions:
Recreate Node
# Delete and recreate
kubectl delete tinynode <name> -n tinysystems
# TinySystems will recreate from flow definition
Restart Module
kubectl rollout restart deployment/<module-name> -n tinysystems
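The generation comparison from the diagnosis can be scripted rather than read by eye from the YAML. In Kubernetes the resource generation lives at `metadata.generation`. The sketch below assumes the TinyNode status exposes `observedGeneration` at `status.observedGeneration`, as the symptoms above suggest; the function name is hypothetical.

```shell
# Sketch: compare the TinyNode's generation with the generation the
# controller last observed. A mismatch means reconciliation is lagging.
tinynode_reconciled() {
  local name="$1"
  local ns="${2:-tinysystems}"
  local gen observed

  gen="$(kubectl get tinynode "$name" -n "$ns" \
    -o jsonpath='{.metadata.generation}' 2>/dev/null)" || gen=""
  observed="$(kubectl get tinynode "$name" -n "$ns" \
    -o jsonpath='{.status.observedGeneration}' 2>/dev/null)" || observed=""

  if [ -z "$gen" ]; then
    echo "$name: not found in namespace $ns"
    return 1
  fi
  if [ "$gen" = "$observed" ]; then
    echo "$name: in sync at generation $gen"
  else
    echo "$name: generation $gen, observed ${observed:-none}; reconciliation lagging or stuck"
    return 1
  fi
}
```

A brief mismatch right after an edit is normal. Treat the node as stuck only if the values still differ after the controller has had time to reconcile.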
Resource Issues
Out of Memory
Symptoms:
- Pods OOMKilled
- Module performance degradation
- Node-level memory pressure
Solutions:
Increase Pod Memory
- Update module deployment
- Increase memory limits
Add Nodes
- Scale cluster nodes
- Add more capacity
Optimize Flows
- Process smaller batches
- Use Split for arrays
- Avoid loading large data
CPU Throttling
Symptoms:
- Slow message processing
- High latency
- CPU limit reached
Solutions:
Increase CPU Limits
- Update module deployment
- Allow more CPU
Scale Horizontally
- Add more replicas
- Distribute load
Optimize Processing
- Simplify expressions
- Cache repeated calculations
- Use async where appropriate
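For the Increase Pod Memory and Increase CPU Limits solutions, one way to change the limits from the command line is `kubectl set resources`. This is a sketch under assumptions: the function name and the example values are hypothetical, and if the deployment is generated from a TinyModule resource, the controller may overwrite manual changes, in which case the limits belong in the module configuration instead.

```shell
# Sketch: raise CPU and memory limits on a module deployment, then wait
# for the rollout. Manual edits may be reverted if a controller owns
# the deployment (assumption: verify for your setup).
raise_module_limits() {
  local module="$1" cpu="$2" memory="$3"
  local ns="${4:-tinysystems}"

  kubectl set resources "deployment/$module" -n "$ns" \
    --limits="cpu=$cpu,memory=$memory" || return 1
  kubectl rollout status "deployment/$module" -n "$ns" --timeout=120s
}
```

For example, `raise_module_limits <module-name> 500m 512Mi` sets a limit of half a CPU core and 512 MiB of memory. The new limits must not be lower than the existing resource requests.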
Network Issues
DNS Resolution Failing
Symptoms:
- Cannot reach external services
- Internal service discovery fails
- Name resolution errors
Diagnosis:
# Test DNS from pod
kubectl exec -it <pod> -n tinysystems -- nslookup kubernetes.default
# Check CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
Solutions:
CoreDNS Not Running
- Check and restart CoreDNS pods
- Verify CoreDNS configuration
Network Policy Blocking DNS
- Allow egress to kube-dns
- Port 53 UDP/TCP
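The DNS egress rule described above can be expressed as a NetworkPolicy. The following sketch prints a manifest that can be reviewed before applying it. The function name and the policy name `allow-dns-egress` are hypothetical, and the selectors assume that the DNS pods carry the `k8s-app: kube-dns` label in `kube-system` and that namespaces have the automatic `kubernetes.io/metadata.name` label (recent Kubernetes versions set it by default).

```shell
# Sketch: print a NetworkPolicy allowing all pods in the namespace to
# reach cluster DNS on port 53 over UDP and TCP.
dns_egress_policy() {
  local ns="${1:-tinysystems}"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: $ns
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}
```

After reviewing the output, it can be applied with `dns_egress_policy | kubectl apply -f -`. Network policies are additive, so this policy only adds DNS egress and does not open any other traffic.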
External Connectivity Issues
Symptoms:
- Cannot reach external APIs
- HTTP client timeouts
- Connection refused
Diagnosis:
# Test from pod
kubectl exec -it <pod> -n tinysystems -- curl -v https://api.example.com
# Check egress
kubectl get networkpolicies -n tinysystems
Solutions:
Network Policy Too Restrictive
- Allow required egress
- Open necessary ports
Firewall/Security Groups
- Check cloud provider firewall
- Allow outbound traffic
Debug Checklist
When issues occur:
Check Status
kubectl get all -n tinysystems
View Events
kubectl get events -n tinysystems --sort-by='.lastTimestamp'
Check Logs
kubectl logs -n tinysystems deployment/<module> --tail=100
Describe Resources
kubectl describe tinynode <name> -n tinysystems
kubectl describe pod <name> -n tinysystems
Test Connectivity
kubectl exec -it <pod> -n tinysystems -- /bin/sh
Getting Help
If issues persist:
Collect diagnostics:
kubectl get all -n tinysystems -o yaml > diagnostics.yaml
kubectl get events -n tinysystems >> diagnostics.yaml
kubectl logs -n tinysystems deployment/<module> >> diagnostics.yaml
Check TinySystems status page
Contact support with diagnostics
Next Steps
- Cluster Requirements - Verify requirements
- Connecting Your Cluster - Reconnect if needed
- Cluster Management - Manage your cluster