For any failure, you can deploy a debug mode instance to SSH in and investigate.

Deployment failures

Container stuck in “Deploying” or “Pending”

The enclave may be taking longer than usual to boot, or the container image is large.

Check your image size. Large images take longer to pull. Keep your Docker image lean: use multi-stage builds and minimal base images.

Wait a few minutes. Initial deployments typically take 2-10 minutes for CPU-only workloads. If it’s been more than 10 minutes, try deleting and redeploying.

Container goes to “Failed”

The enclave started but your application didn’t pass health checks. The most common cause is a port mismatch: make sure your app listens on the port you configured as upstream-port in tinfoil-config.yml. If your config sets PORT=8080 but your app listens on 3000, health checks will fail.

Check your startup command. If your container’s entrypoint or CMD is misconfigured, it may exit immediately. SSH into a debug mode instance to test manually.

Check for missing environment variables or secrets. If your app requires a DATABASE_URL or API key at startup and it’s not configured, it will crash. The dashboard warns about missing secrets during deployment, so make sure you’ve created and selected all required ones.
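To confirm a port mismatch from inside a debug mode instance (or locally), a small TCP probe is often enough. The following is a minimal sketch, not part of any Tinfoil tooling: the function name is_listening is hypothetical, and 8080 is only the example port from the text above.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Example value only: use the port you configured as upstream-port.
    port = 8080
    state = "listening" if is_listening(port) else "NOT listening"
    print(f"port {port}: {state}")
```

If the probe reports that nothing is listening on the configured port while your app is running, the app is most likely bound to a different port or to a different interface.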

“Invalid configuration” during validation

The dashboard validates your tinfoil-config.yml before deploying.

Invalid CPU or memory values. CPU must be one of: 2, 4, 8, 16, 32. Memory must be one of: 8192, 16384, 32768, 65536, 131072 (in MB). See resource limits.

Org quota exceeded. You can have up to 10 containers per org and 2 instances per repository. Check your current usage in the Active tab.

Name already taken. Container names must be unique within your org. The dashboard checks this in real time.
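The allowed CPU and memory values above can be checked before you push a config change. This is a minimal sketch that mirrors the documented lists; the function name resource_errors is hypothetical, and the dashboard's own validation remains authoritative.

```python
# Allowed values as listed in this guide.
ALLOWED_CPUS = {2, 4, 8, 16, 32}
ALLOWED_MEMORY_MB = {8192, 16384, 32768, 65536, 131072}


def resource_errors(cpus: int, memory_mb: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means both values are allowed."""
    errors = []
    if cpus not in ALLOWED_CPUS:
        errors.append(f"cpus={cpus} is not one of {sorted(ALLOWED_CPUS)}")
    if memory_mb not in ALLOWED_MEMORY_MB:
        errors.append(f"memory={memory_mb} is not one of {sorted(ALLOWED_MEMORY_MB)} (MB)")
    return errors


if __name__ == "__main__":
    # A common slip is giving memory in GB (8) instead of MB (8192).
    for problem in resource_errors(cpus=4, memory_mb=8):
        print(problem)
```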

Container image can’t be pulled

Missing SHA256 digest. Images must include a SHA256 digest (e.g. image:tag@sha256:...). See the configuration reference.

Private registry without credentials. If your image is in a private registry, add credentials in the Registry Auth tab.

Registry credentials expired or revoked. If credentials are configured but the registry rejects them, you’ll see an “expired or been revoked” error and an amber banner on the Active tab. Update your credentials in the Registry Credentials tab with a fresh token. See the registry auth troubleshooting guide for common causes.

Image tag doesn’t exist. Double-check the Git tag you selected. The image may not have been built for that tag yet.

Wrong image reference. Make sure the image field in tinfoil-config.yml matches your actual registry path (e.g. ghcr.io/myorg/my-app, not myorg/my-app).
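A quick way to catch a missing or truncated digest is to check that the image reference ends in @sha256: followed by 64 lowercase hexadecimal characters. The sketch below assumes that form; the function name has_sha256_digest is hypothetical, and it checks only the shape of the reference, not whether the image exists in the registry.

```python
import re

# A SHA256 digest is 64 lowercase hexadecimal characters.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def has_sha256_digest(image_ref: str) -> bool:
    """Return True if the image reference ends with a well-formed sha256 digest."""
    return DIGEST_RE.search(image_ref.strip()) is not None


if __name__ == "__main__":
    # Placeholder digest for illustration only.
    pinned = "ghcr.io/myorg/my-app:v1@sha256:" + "a" * 64
    print(has_sha256_digest(pinned))
    print(has_sha256_digest("ghcr.io/myorg/my-app:v1"))
```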

Config format issues

Started from scratch instead of the template. The tinfoil-containers-template includes the correct shim-version, cvm-version, and routing configuration. If you wrote your config from scratch, compare it against the template.

Missing routing config. Every container needs a shim section with listen-port, upstream-port, and paths. Without it, no traffic can reach your container.

Paths not exposed. Only paths listed in paths are reachable. If your app serves /api/v1/users but you only listed /health, API requests will be rejected.
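The routing requirement above can be checked mechanically once the config is loaded into a dictionary. This sketch assumes the config has already been parsed (for example with a YAML library of your choice) and that shim is a top-level mapping; the function name missing_shim_keys and the port values in the example are hypothetical.

```python
# Keys the guide says every shim section needs.
REQUIRED_SHIM_KEYS = ("listen-port", "upstream-port", "paths")


def missing_shim_keys(config: dict) -> list[str]:
    """Return the required shim keys that are absent from the parsed config."""
    shim = config.get("shim")
    if not isinstance(shim, dict):
        # No shim section at all: every required key is missing.
        return list(REQUIRED_SHIM_KEYS)
    return [key for key in REQUIRED_SHIM_KEYS if key not in shim]


if __name__ == "__main__":
    # Illustrative values only; take the real ones from the template.
    config = {"shim": {"listen-port": 443, "upstream-port": 8080}}
    print(missing_shim_keys(config))
```

The check covers only the presence of the keys. Whether a given request path is actually reachable still depends on how the shim matches the entries in paths, which this sketch does not model.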

Runtime issues

Application works locally but not in the enclave

Network differences. Inside the enclave, your app runs in an isolated network. External services (databases, APIs) must be reachable over the public internet; there’s no VPC peering or private network access.

Filesystem is ephemeral. Don’t rely on writing to disk for persistence. Data written to the local filesystem is lost when the enclave restarts.

No GPU access by default. Containers run on CPUs unless your org has GPU access enabled. CPU-only workloads should not depend on CUDA or GPU libraries.

CUDA version mismatch. If your container image requires a CUDA version newer than the enclave’s driver supports, it will fail to start or crash. SSH into a debug mode instance and run nvidia-smi to check the installed driver version and the maximum supported CUDA version. Then pick a container image built for a compatible CUDA version (e.g. cu128 for CUDA 12.8, cu130 for CUDA 13.0).
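The CUDA rule above reduces to a version comparison: the version your image was built for must not be newer than the maximum version reported by nvidia-smi. The sketch below applies that rule as stated in this guide and nothing more; the function names are hypothetical, and you supply both version strings yourself.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Parse 'major.minor' (or just 'major') into a tuple of integers."""
    parts = text.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def image_cuda_supported(image_cuda: str, driver_max_cuda: str) -> bool:
    """Return True if the image's CUDA version is not newer than the driver's maximum."""
    return parse_version(image_cuda) <= parse_version(driver_max_cuda)


if __name__ == "__main__":
    # Example: a cu130 image on a driver whose reported maximum is CUDA 12.8.
    print(image_cuda_supported("13.0", "12.8"))
```

Comparing parsed integers rather than raw strings matters here: as strings, "12.10" would sort before "12.8".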

Secrets not available at runtime

Secret not selected during deployment. Creating a secret in the Secrets tab doesn’t automatically make it available to all containers. You must select which secrets a container can access when deploying.

Secret name mismatch. The secret name in the dashboard must exactly match the name in your tinfoil-config.yml secrets list. Names are case-sensitive.

Stale secrets. If you update a secret’s value after deploying, the running container still has the old value. Redeploy to pick up changes. The dashboard shows a stale secrets indicator when this happens.
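Because names are case-sensitive, a mismatch such as database_url versus DATABASE_URL is easy to miss by eye. The sketch below compares two lists you supply yourself: the names in your tinfoil-config.yml secrets list and the names shown in the dashboard. The function name diagnose_secret_names is hypothetical, and nothing here talks to the dashboard.

```python
def diagnose_secret_names(expected: list[str], available: list[str]) -> dict[str, str]:
    """Map each expected name that has no exact match to a short diagnosis."""
    available_set = set(available)
    # Index available names by lowercase form to spot case-only differences.
    by_lower = {name.lower(): name for name in available}
    problems = {}
    for name in expected:
        if name in available_set:
            continue
        near_match = by_lower.get(name.lower())
        if near_match is not None:
            problems[name] = f"case mismatch: found '{near_match}'"
        else:
            problems[name] = "missing"
    return problems


if __name__ == "__main__":
    # Example names only.
    print(diagnose_secret_names(["DATABASE_URL", "API_KEY"], ["database_url"]))
```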

Container is slow or unresponsive

If your app is CPU- or memory-constrained, it may respond slowly or be OOM-killed. Try scaling up (see resource options). Also note that if your application has a slow startup (large frameworks, JVM warmup, model loading), the first requests after deployment may be slow.

Update issues

Blue-green update stuck

The new version follows the same lifecycle as a fresh deployment. If it’s stuck in “Deploying”, the same troubleshooting steps for deployment failures apply. You can always click Cancel Update to abort without affecting the running version.

Updated version fails health checks

Click Cancel Update to keep the current version running. Deploy a debug mode instance with the new tag to investigate, fix the issue, and try the update again.

Getting help

If you’re stuck, reach out at [email protected].