Enterprise Operator Crashloop (Missing gcp-storage Crossplane Endpoints)
Overview
The enterprise-operator pod enters a crashloop state when the GCP storage crossplane provider pod
is scaled down to 0 replicas. This causes conversion webhook failures for
BucketIAMMember.storage.gcp.upbound.io resources.
Symptoms
- enterprise-operator pod crashlooping
- Error in logs: no endpoints available for service "gcp-storage"
- AppConfigs not syncing
- Conversion webhook failures for storage.gcp.upbound.io/v1beta1
Error Message
failed to load initial state of resource BucketIAMMember.storage.gcp.upbound.io:
conversion webhook for storage.gcp.upbound.io/v1beta1, Kind=BucketIAMMember failed:
Post "https://gcp-storage.crossplane.svc:9443/convert?timeout=30s":
no endpoints available for service "gcp-storage"
Root Cause
The gcp-storage Crossplane provider pod was intentionally scaled to 0 replicas (typically for operational requirements such as the Citadel soft deletion retention rule >= 8 days). When BucketIAMMember events trigger conversions, the enterprise-operator cannot reach the conversion webhook endpoint, so the conversion fails and the operator crashloops.
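To confirm that the conversion webhook for this resource really points at the gcp-storage service, you can inspect the CRD's conversion config directly. This is a sketch; the CRD name is inferred from the error message and may differ in your cluster:

```shell
# Show which service the conversion webhook for BucketIAMMember targets.
# Expected to print something like:
#   {"name":"gcp-storage","namespace":"crossplane","path":"/convert","port":9443}
kubectl get crd bucketiammembers.storage.gcp.upbound.io \
  -o jsonpath='{.spec.conversion.webhook.clientConfig.service}'
```

If the printed service matches the one in the error message, the missing endpoints on that service are the cause.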
Impact
- AppConfigs fail to sync
- Potential impact on heartbeats and communications to mothership
- Critical: Gateway client certificate rotation may fail if enterprise-operator is down (check certificate expiry with the diagnostic command below)
Prerequisites
- kubectl access to the cluster
- Access to the crossplane namespace
- Permissions to edit DeploymentRuntimeConfig resources
Diagnostic Steps
1. Verify the crashloop
Check enterprise-operator logs:
kubectl logs -n alphasense <enterprise-operator-pod> | grep "gcp-storage"
2. Check gcp-storage service endpoints
kubectl -n crossplane get svc gcp-storage -o yaml
Look for the selector:
spec:
selector:
pkg.crossplane.io/provider: provider-gcp-storage
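You can also check the service's endpoints directly; an empty ENDPOINTS column confirms that no pod matches the selector:

```shell
# An empty ENDPOINTS column means no running pod backs the service,
# which matches the "no endpoints available" error.
kubectl -n crossplane get endpoints gcp-storage
```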
3. Check if gcp-storage pods are running
kubectl -n crossplane get pods | grep gcp-storage
If no pods are returned, the provider is scaled to 0.
4. Check gateway certificate expiry (Critical)
kubectl -n mothership get secret gateway-client-certificate -o json | jq -r '.data."tls.crt"' | base64 -d | openssl x509 -noout -text | grep "Not After"
If the certificate expires soon (within 3 days), the enterprise-operator must be running to rotate it.
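As an alternative to reading the "Not After" date by eye, openssl's -checkend flag can test the 3-day window mechanically. A sketch, assuming the same secret name and namespace as above:

```shell
# Exit code 0 if the cert is still valid in 3 days (259200 s), non-zero if not.
kubectl -n mothership get secret gateway-client-certificate -o json \
  | jq -r '.data."tls.crt"' | base64 -d \
  | openssl x509 -noout -checkend 259200 \
  && echo "certificate OK for at least 3 more days" \
  || echo "certificate expires within 3 days - operator must be up to rotate it"
```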
5. Check the DeploymentRuntimeConfig
kubectl get DeploymentRuntimeConfig gcp-upbound-storage -o yaml
Look for the replicas field in spec.deploymentTemplate.spec.replicas.
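A jsonpath query can pull out just that field without scanning the full YAML:

```shell
# Prints the current replica count from the DeploymentRuntimeConfig;
# "0" confirms the provider has been scaled down.
kubectl get deploymentruntimeconfig gcp-upbound-storage \
  -o jsonpath='{.spec.deploymentTemplate.spec.replicas}'
```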
Resolution
Option 1: Re-enable the gcp-storage crossplane provider (Recommended)
Step 1: Get the current DeploymentRuntimeConfig
kubectl get DeploymentRuntimeConfig gcp-upbound-storage -o yaml > gcp-upbound-storage-backup.yaml
Step 2: Edit the DeploymentRuntimeConfig
kubectl edit DeploymentRuntimeConfig gcp-upbound-storage
Step 3: Change replicas: 0 to replicas: 1
spec:
deploymentTemplate:
spec:
replicas: 1 # Change from 0 to 1
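If you prefer a non-interactive change (for example, from a script or a change ticket), a merge patch achieves the same edit without opening an editor:

```shell
# Non-interactive equivalent of the edit above: set replicas to 1.
kubectl patch deploymentruntimeconfig gcp-upbound-storage \
  --type merge \
  -p '{"spec":{"deploymentTemplate":{"spec":{"replicas":1}}}}'
```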
Step 4: Verify the pod is running
kubectl -n crossplane get pods | grep gcp-storage
Wait until the pod shows 1/1 Running.
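Instead of polling manually, kubectl wait can block until the pod is Ready, using the same label the service selector matches on:

```shell
# Blocks until the provider pod reports Ready, or fails after 2 minutes.
kubectl -n crossplane wait --for=condition=Ready pod \
  -l pkg.crossplane.io/provider=provider-gcp-storage \
  --timeout=120s
```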
Step 5: Verify enterprise-operator recovers
kubectl logs -n alphasense <enterprise-operator-pod> --tail=50
You should see successful sync messages.
Step 6: Verify heartbeat to mothership - Only for Alphasense
Contact Alphasense Support to monitor heartbeat. Check monitoring/logs for successful heartbeat messages after the fix.