Troubleshooting Guide
Common Issues and Solutions
1. Webhook Not Responding
Symptoms:
- Pods fail to create with timeout errors
- Events show webhook timeout
- kubectl get pods hangs
Diagnosis:
# Check webhook pod status
kubectl get pods -n irsa-system
# View webhook logs
kubectl logs -n irsa-system -l app=irsa-webhook
# Check webhook service
kubectl get svc -n irsa-system irsa-webhook
kubectl get endpoints -n irsa-system irsa-webhook
Solutions:
- Webhook pods not running:
  kubectl describe pods -n irsa-system -l app=irsa-webhook
  # Fix image pull issues, resource constraints, etc.
- TLS certificate issues:
  # Regenerate certificates
  ./generate-certs.sh
  kubectl rollout restart deployment -n irsa-system irsa-webhook
- Service not routing correctly:
  # Check if service selectors match pod labels
  kubectl get svc -n irsa-system irsa-webhook -o yaml
  kubectl get pods -n irsa-system -l app=irsa-webhook --show-labels
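The selector comparison in the last item can be done mechanically instead of by eye. The following is a minimal sketch, assuming the Service uses a plain key=value selector such as app=irsa-webhook; the helper name selector_matches is hypothetical and not part of the webhook project.

```shell
# selector_matches SELECTOR LABELS
# Both arguments are comma-separated key=value lists, the format shown in the
# LABELS column of `kubectl get pods --show-labels`.
# Succeeds only if every selector pair also appears among the pod labels.
# (Hypothetical helper for illustration.)
selector_matches() (
  selector=$1
  labels=",$2,"            # wrap in commas so each pair can be matched whole
  IFS=,
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;      # this pair is present, keep checking
      *) exit 1 ;;         # a selector pair is missing from the pod labels
    esac
  done
  exit 0
)

# Example:
#   selector_matches "app=irsa-webhook" "app=irsa-webhook,pod-template-hash=5d8c7" \
#     && echo "selector matches" || echo "selector does NOT match"
```

If the check fails, the Service has no pods to route to, which is consistent with an empty result from kubectl get endpoints.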
2. Pods Not Being Mutated
Symptoms:
- Pods create successfully but don't have injected configuration
- Environment variables missing
- Volume not mounted
Diagnosis:
# Check if ServiceAccount has annotation
kubectl get sa <service-account-name> -o yaml | grep vultr.com/role-arn
# Check webhook configuration
kubectl get mutatingwebhookconfiguration irsa-webhook -o yaml
# View webhook logs for the specific pod creation
kubectl logs -n irsa-system -l app=irsa-webhook --tail=100
Solutions:
- ServiceAccount annotation missing:
  kubectl annotate sa <service-account-name> \
    vultr.com/role-arn="arn:aws:iam::123456789012:role/your-role"
- Namespace excluded from webhook: Check the namespaceSelector in the MutatingWebhookConfiguration:
  kubectl get mutatingwebhookconfiguration irsa-webhook -o yaml
- Webhook not receiving requests:
  # Check webhook logs for incoming requests
  kubectl logs -n irsa-system -l app=irsa-webhook --tail=50
  # Verify webhook configuration matches service
  kubectl get mutatingwebhookconfiguration irsa-webhook -o jsonpath='{.webhooks[0].clientConfig}'
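An annotation that is present but malformed can look the same as a missing one from the pod's point of view. The sketch below checks that a value has the general shape of an IAM role ARN, as in the example annotation above. It assumes the webhook expects a standard role ARN; whether the webhook itself validates the value is not stated in this guide, and the helper name is_role_arn is hypothetical.

```shell
# is_role_arn VALUE
# Succeeds if VALUE looks like arn:<partition>:iam::<12-digit account>:role/<name>.
# This is a shape check only; it does not confirm that the role exists.
# (Hypothetical helper for illustration.)
is_role_arn() {
  printf '%s\n' "$1" |
    grep -Eq '^arn:aws(-[a-z]+)*:iam::[0-9]{12}:role/[A-Za-z0-9+=,.@_/-]+$'
}

# Example:
#   arn=$(kubectl get sa <service-account-name> \
#     -o jsonpath='{.metadata.annotations.vultr\.com/role-arn}')
#   is_role_arn "$arn" || echo "annotation value is not a role ARN: $arn"
```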
3. RBAC Permission Errors
Symptoms:
- Webhook logs show "forbidden" or "unauthorized" errors
- Error fetching ServiceAccounts
Diagnosis:
# Check webhook ServiceAccount permissions
kubectl auth can-i get serviceaccounts \
--as=system:serviceaccount:irsa-system:irsa-webhook \
--all-namespaces
# View RBAC resources
kubectl get clusterrole irsa-webhook -o yaml
kubectl get clusterrolebinding irsa-webhook -o yaml
Solutions:
- Missing RBAC permissions:
  # Reapply RBAC configuration
  kubectl apply -f deploy.yaml
- ServiceAccount not bound to role:
  kubectl get clusterrolebinding irsa-webhook -o yaml
  # Verify subjects include the correct ServiceAccount
4. TLS/Certificate Issues
Symptoms:
- "x509: certificate signed by unknown authority"
- "TLS handshake error"
- Webhook returns 401 or 403
Diagnosis:
# Check certificate in secret
kubectl get secret -n irsa-system irsa-webhook-certs -o yaml
# Verify CA bundle in webhook config
kubectl get mutatingwebhookconfiguration irsa-webhook \
-o jsonpath='{.webhooks[0].clientConfig.caBundle}' | base64 -d
Solutions:
- Regenerate certificates:
  ./generate-certs.sh
- Manually update CA bundle:
  CA_BUNDLE=$(kubectl get secret -n irsa-system irsa-webhook-certs \
    -o jsonpath='{.data.ca\.crt}')
  kubectl patch mutatingwebhookconfiguration irsa-webhook \
    --type='json' \
    -p="[{'op': 'replace', 'path': '/webhooks/0/clientConfig/caBundle', 'value':'${CA_BUNDLE}'}]"
- Verify certificate SANs:
  kubectl get secret -n irsa-system irsa-webhook-certs \
    -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -text -noout
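When the webhook is referenced through a Service, the API server connects to it as <service>.<namespace>.svc, so the serving certificate needs that DNS name among its SANs. The sketch below derives the expected name and searches the openssl text output for it. The helper names expected_san and has_san are hypothetical, and the service and namespace names are the ones used throughout this guide.

```shell
# expected_san SERVICE NAMESPACE
# Prints the DNS name the API server uses for a Service-backed webhook.
expected_san() {
  printf '%s.%s.svc\n' "$1" "$2"
}

# has_san DNS_NAME
# Reads `openssl x509 -text -noout` output on stdin and succeeds only if
# DNS_NAME is listed exactly as a DNS SAN (not merely as a prefix of one).
has_san() {
  tr ',' '\n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    grep -Fxq "DNS:$1"
}

# Example:
#   kubectl get secret -n irsa-system irsa-webhook-certs \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -text -noout |
#     has_san "$(expected_san irsa-webhook irsa-system)" || echo "SAN missing"
```

A missing SAN typically shows up as an x509 error in the API server's webhook call rather than in the webhook's own logs.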
5. AWS Credential Issues
Symptoms:
- Pods can't authenticate with AWS
- "Unable to locate credentials" error
- "InvalidIdentityToken" error from AWS STS
Diagnosis:
# Check injected environment variables
kubectl exec <pod-name> -- env | grep AWS
# Verify token file exists
kubectl exec <pod-name> -- ls -la /var/run/secrets/vultr.com/serviceaccount/
# Check token contents (first 50 chars)
kubectl exec <pod-name> -- head -c 50 /var/run/secrets/vultr.com/serviceaccount/token
# Test AWS STS
kubectl exec <pod-name> -- aws sts get-caller-identity
Solutions:
- Token not mounted:
  - Verify pod has the volume and volume mount
  - Check webhook logs for mutation
  - Delete and recreate the pod
- IAM role trust policy issue: Ensure your IAM role has the correct trust policy:
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "Federated": "arn:aws:iam::YOUR_ACCOUNT_ID:oidc-provider/YOUR_OIDC_PROVIDER"
        },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringEquals": {
            "YOUR_OIDC_PROVIDER:aud": "vultr"
          }
        }
      }
    ]
  }
- Wrong audience in token:
  - Verify the projected token has audience "vultr"
  - Check webhook configuration uses correct tokenAudience constant
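To verify the audience, the token's payload can be decoded locally, since a projected ServiceAccount token is a JWT whose second dot-separated segment is base64url-encoded JSON. The following is a minimal sketch, assuming a base64 command that accepts -d (as in GNU coreutils); the helper name jwt_payload is hypothetical.

```shell
# jwt_payload
# Reads a JWT on stdin and prints its decoded JSON payload.
# The signature is NOT verified; this is for inspection only.
# (Hypothetical helper for illustration.)
jwt_payload() (
  # Take the second segment and map the base64url alphabet back to base64
  payload=$(cut -d. -f2 | tr '_-' '/+')
  # base64url omits trailing "=" padding; restore it so base64 -d accepts it
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
)

# Example (the "aud" claim in the output should contain "vultr"):
#   kubectl exec <pod-name> -- cat /var/run/secrets/vultr.com/serviceaccount/token | jwt_payload
```

The token is a live credential, so decode it on a trusted machine and avoid pasting it into shared logs or online decoders.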
6. Performance Issues
Symptoms:
- Pod creation is slow
- Webhook timeout warnings
- High resource usage
Diagnosis:
# Check webhook resource usage
kubectl top pods -n irsa-system
# View webhook latency in logs
kubectl logs -n irsa-system -l app=irsa-webhook | grep "Processing pod"
# Check for throttling
kubectl describe pods -n irsa-system -l app=irsa-webhook
Solutions:
- Increase webhook timeout:
  kubectl patch mutatingwebhookconfiguration irsa-webhook \
    --type='json' \
    -p='[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": 30}]'
- Scale webhook deployment:
  kubectl scale deployment -n irsa-system irsa-webhook --replicas=3
- Increase resource limits: Edit deploy.yaml and increase CPU/memory limits:
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 512Mi
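One constraint worth keeping in mind for the timeout patch: admissionregistration.k8s.io/v1 only accepts timeoutSeconds values from 1 to 30, so 30 is already the ceiling. The sketch below builds the same JSON patch as above but rejects out-of-range values before they reach the API server. The helper name timeout_patch is hypothetical.

```shell
# timeout_patch SECONDS
# Prints a JSON patch that sets timeoutSeconds on the first webhook entry,
# or fails if SECONDS is not an integer from 1 to 30.
# (Hypothetical helper for illustration.)
timeout_patch() {
  case "$1" in
    # Reject empty input, leading zeros, non-digits, and 3+ digit values
    ''|0*|*[!0-9]*|???*)
      echo "timeoutSeconds must be an integer from 1 to 30" >&2
      return 1 ;;
  esac
  if [ "$1" -gt 30 ]; then
    echo "timeoutSeconds must be an integer from 1 to 30" >&2
    return 1
  fi
  printf '[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": %s}]\n' "$1"
}

# Example:
#   kubectl patch mutatingwebhookconfiguration irsa-webhook \
#     --type='json' -p="$(timeout_patch 15)"
```

If pod creation still times out at 30 seconds, raising the timeout is no longer an option and scaling or resource changes are the remaining levers.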
7. JSON Patch Generation Errors
Symptoms:
- "Failed to generate patches" in webhook logs
- Malformed patch errors
- Array index out of bounds
Diagnosis:
# Enable verbose logging (add to deployment)
# Set LOG_LEVEL=debug in environment
# Check specific pod that failed
kubectl logs -n irsa-system -l app=irsa-webhook --tail=100 | grep -A 10 "Failed"
Solutions:
- Review pod specification:
  - Ensure pod spec is valid JSON
  - Check for unusual container configurations
- Update webhook logic:
  - Fix any bugs in generatePatches function
  - Add error handling for edge cases
8. Multiple Webhooks Conflict
Symptoms:
- Pod mutations from other webhooks interfering
- Unexpected pod configuration
- Volume/env var conflicts
Diagnosis:
# List all mutating webhooks
kubectl get mutatingwebhookconfigurations
# Check webhook order
kubectl get mutatingwebhookconfigurations -o yaml | grep -A 5 "name:"
Solutions:
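- Adjust webhook order: In practice, mutating webhooks are invoked in alphabetical order of their configuration names, although Kubernetes does not guarantee this ordering. metadata.name is immutable, so the configuration cannot be renamed with kubectl patch; recreate it under a new name instead:
  # Export the current configuration
  kubectl get mutatingwebhookconfiguration irsa-webhook -o yaml > irsa-webhook-config.yaml
  # Edit the file: set metadata.name (for example, 01-irsa-webhook) and remove
  # metadata.resourceVersion, metadata.uid, and metadata.creationTimestamp
  kubectl delete mutatingwebhookconfiguration irsa-webhook
  kubectl create -f irsa-webhook-config.yaml
  # Other commands in this guide refer to the configuration as irsa-webhook
  # and would need the new name after this change
- Add reinvocationPolicy:
  webhooks:
    - name: irsa.vultr.com
      reinvocationPolicy: IfNeeded # or Never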
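To preview the name-based ordering described above before renaming anything, the configuration names can be sorted byte-wise. This is a sketch only: it assumes the API server orders configurations by plain string comparison of their names, which is an implementation detail rather than a documented guarantee. The helper name webhook_order is hypothetical.

```shell
# webhook_order
# Reads webhook configuration names on stdin (one per line) and prints them
# numbered in byte-wise sorted order.
# (Hypothetical helper for illustration.)
webhook_order() {
  LC_ALL=C sort | awk '{ printf "%d %s\n", NR, $0 }'
}

# Example:
#   kubectl get mutatingwebhookconfigurations -o name | webhook_order
```

Digits sort before lowercase letters in this ordering, which is why a numeric prefix such as 01- moves a configuration toward the front.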
Debug Commands Cheat Sheet
# View all webhook-related resources
kubectl get all -n irsa-system
kubectl get mutatingwebhookconfiguration irsa-webhook
kubectl get clusterrole irsa-webhook
kubectl get clusterrolebinding irsa-webhook
# Test webhook directly
kubectl run test-pod --image=nginx --dry-run=client -o yaml | \
kubectl create -f - --namespace=default
# Watch webhook logs in real-time
kubectl logs -n irsa-system -l app=irsa-webhook -f
# Check webhook pod health
kubectl get pods -n irsa-system -l app=irsa-webhook -o wide
kubectl describe pods -n irsa-system -l app=irsa-webhook
# View recent events
kubectl get events -n irsa-system --sort-by='.lastTimestamp'
# Test ServiceAccount annotation
kubectl get sa -A -o jsonpath='{range .items[?(@.metadata.annotations.vultr\.com/role-arn)]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.metadata.annotations.vultr\.com/role-arn}{"\n"}{end}'
# Validate webhook configuration
kubectl get mutatingwebhookconfiguration irsa-webhook -o yaml | grep -E "(caBundle|service|path|port)"
Getting Help
If you're still experiencing issues:
- Collect diagnostic information:
  # Run this script and save output
  kubectl get all -n irsa-system > diagnostics.txt
  kubectl logs -n irsa-system -l app=irsa-webhook --tail=200 >> diagnostics.txt
  kubectl get mutatingwebhookconfiguration irsa-webhook -o yaml >> diagnostics.txt
  kubectl get events -n irsa-system >> diagnostics.txt
- Check webhook version:
  kubectl get deployment -n irsa-system irsa-webhook -o jsonpath='{.spec.template.spec.containers[0].image}'
- Review logs with timestamps:
  kubectl logs -n irsa-system -l app=irsa-webhook --timestamps=true --tail=100
- Test in isolation:
  - Create a separate test namespace
  - Deploy a simple test pod
  - Monitor webhook behavior
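The isolation test can be scripted so it is repeatable. The sketch below renders a Namespace, an annotated ServiceAccount, and a pod that prints any AWS-prefixed environment variables. The function name render_test_manifest, the pod name irsa-test-pod, and the busybox image are all assumptions made for illustration; the annotation key is the one used earlier in this guide.

```shell
# render_test_manifest NAMESPACE SERVICE_ACCOUNT ROLE_ARN
# Prints a multi-document YAML manifest for an isolated mutation test.
# (Hypothetical helper; pod name and image are illustrative choices.)
render_test_manifest() (
  ns=$1
  sa=$2
  role_arn=$3
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ${ns}
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ${sa}
  namespace: ${ns}
  annotations:
    vultr.com/role-arn: "${role_arn}"
---
apiVersion: v1
kind: Pod
metadata:
  name: irsa-test-pod
  namespace: ${ns}
spec:
  serviceAccountName: ${sa}
  restartPolicy: Never
  containers:
    - name: main
      image: busybox
      command: ["sh", "-c", "env | grep AWS; sleep 3600"]
EOF
)

# Example:
#   render_test_manifest irsa-test test-sa \
#     "arn:aws:iam::123456789012:role/your-role" | kubectl apply -f -
#   kubectl logs -n irsa-test irsa-test-pod
#   kubectl delete namespace irsa-test   # clean up afterwards
```

If the pod log is empty, the pod was not mutated; check that the test namespace is not excluded by the webhook's namespaceSelector before looking elsewhere.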
Prevention
Best Practices to Avoid Issues:
- Always test in a non-production cluster first
- Set failurePolicy: Ignore during initial deployment
- Monitor webhook performance and logs
- Keep certificates up to date (rotate every 365 days)
- Use resource limits to keep the webhook from consuming excessive CPU and memory
- Implement readiness and liveness probes
- Scale webhook deployment for high-traffic clusters
- Document all ServiceAccount annotations