Troubleshooting - Tinfoil Documentation

For any failure, you can deploy a debug mode instance to SSH in and investigate.

Deployment failures

Container stuck in “Deploying” or “Pending”

The enclave may be taking longer than usual to boot, or the container image may be large.

- Check your image size. Large images take longer to pull. Keep your Docker image lean: use multi-stage builds and minimal base images.
- Wait a few minutes. Initial deployments typically take 2-10 minutes for CPU-only workloads. If it’s been more than 10 minutes, try deleting and redeploying.

Container goes to “Failed”

The enclave started but your application didn’t pass health checks.

- Check for a port mismatch. This is the most common cause. Make sure your app listens on the port you configured as upstream-port in tinfoil-config.yml. If your config sets PORT=8080 but your app listens on 3000, health checks will fail.
- Check your startup command. If your container’s entrypoint or CMD is misconfigured, it may exit immediately. SSH into a debug mode instance to test manually.
- Check for missing environment variables or secrets. If your app requires a DATABASE_URL or API key at startup and it’s not configured, it will crash. The dashboard warns about missing secrets during deployment, so make sure you’ve created and selected all required ones.
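When testing manually on a debug mode instance, a small polling loop can confirm whether the app answers on the expected port. The following is a minimal sketch, not a Tinfoil tool: the helper name wait_for_health, port 8080, and the /health path are assumptions borrowed from the examples on this page, and it assumes curl is available in the environment.

```shell
# wait_for_health URL [ATTEMPTS]
# Hypothetical helper: polls URL once per second until it returns a
# successful HTTP status, or gives up after ATTEMPTS tries (default 10).
wait_for_health() {
  url="$1"
  attempts="${2:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failures, -s: quiet, --max-time: per-request timeout
    if curl -fs --max-time 2 -o /dev/null "$url"; then
      echo "healthy after $i attempt(s): $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "no healthy response from $url after $attempts attempt(s)" >&2
  return 1
}

# Example (assumed port and path): wait_for_health http://localhost:8080/health 30
```

If this fails on the port you set as upstream-port but succeeds on another port, the app and the config likely disagree about which port to use.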

“Invalid configuration” during validation

The dashboard validates your tinfoil-config.yml before deploying. Common causes of a validation error:

- Invalid CPU or memory values. CPU must be one of: 2, 4, 8, 16, 32. Memory must be one of: 8192, 16384, 32768, 65536, 131072 (in MB). See resource limits.
- Org quota exceeded. You can have up to 10 containers per org and 2 instances per repository. Check your current usage in the Active tab.
- Name already taken. Container names must be unique within your org. The dashboard checks this in real time.
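To catch bad resource values before pushing a config, the allowed CPU and memory values listed above can be checked locally. This is only an illustrative sketch: validate_resources is a hypothetical helper, not part of the Tinfoil CLI, and it checks just these two values rather than the whole config.

```shell
# validate_resources CPUS MEMORY_MB
# Hypothetical pre-flight check using the allowed values from this page.
validate_resources() {
  case "$1" in
    2|4|8|16|32) ;;
    *) echo "invalid CPU count: $1 (expected 2, 4, 8, 16, or 32)" >&2
       return 1 ;;
  esac
  case "$2" in
    8192|16384|32768|65536|131072) ;;
    *) echo "invalid memory: $2 MB (expected 8192, 16384, 32768, 65536, or 131072)" >&2
       return 1 ;;
  esac
  echo "ok: $1 CPUs, $2 MB"
}

# Example: validate_resources 4 8192
```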

Container image can’t be pulled

- Missing SHA256 digest. Images must include a SHA256 digest (e.g. image:tag@sha256:...). See the configuration reference.
- Private registry without credentials. If your image is in a private registry, add credentials in the Registry Auth tab.
- Registry credentials expired or revoked. If credentials are configured but the registry rejects them, you’ll see an “expired or been revoked” error and an amber banner on the Active tab. Update your credentials in the Registry Credentials tab with a fresh token. See the registry auth troubleshooting guide for common causes.
- Image tag doesn’t exist. Double-check the Git tag you selected. The image may not have been built for that tag yet.
- Wrong image reference. Make sure the image field in tinfoil-config.yml matches your actual registry path (e.g. ghcr.io/myorg/my-app, not myorg/my-app).
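A quick local check can tell whether an image reference is pinned to a digest before you deploy. The sketch below uses a hypothetical helper, has_sha256_digest, which only checks the shape of the reference (a trailing @sha256: followed by 64 lowercase hex characters). It does not contact the registry, so it cannot tell whether the digest or tag actually exists.

```shell
# has_sha256_digest IMAGE_REF
# Hypothetical helper: succeeds if IMAGE_REF ends in @sha256:<64 hex chars>.
has_sha256_digest() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Example (image name taken from this page, digest elided):
#   has_sha256_digest "ghcr.io/myorg/my-app:v1@sha256:..." || echo "image is not pinned"
# If Docker is available, the digest of a pulled image can be read with:
#   docker inspect --format '{{index .RepoDigests 0}}' ghcr.io/myorg/my-app:v1
```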

Config format issues

- Started from scratch instead of the template. The tinfoil-containers-template includes the correct cvm-version and routing configuration. If you wrote your config from scratch, compare it against the template.
- Missing routing config. Every container needs a shim section with upstream-port and paths. Without it, no traffic can reach your container.
- Paths not exposed. Only paths listed in paths are reachable. If your app serves /api/v1/users but you only listed /health, API requests will be rejected.

Runtime issues

Application works locally but not in the enclave

- Network differences. Inside the enclave, your app runs in an isolated network. External services (databases, APIs) must be reachable over the public internet; there’s no VPC peering or private network access.
- Filesystem is ephemeral. Don’t rely on writing to disk for persistence. Data written to the local filesystem is lost when the enclave restarts.
- No GPU access by default. Containers run on CPUs unless your org has GPU access enabled. CPU-only workloads should not depend on CUDA or GPU libraries.
- CUDA version mismatch. If your container image requires a CUDA version newer than the enclave’s driver supports, it will fail to start or crash. SSH into a debug mode instance and run nvidia-smi to check the installed driver version and the maximum supported CUDA version. Then pick a container image built for a compatible CUDA version (e.g. cu128 for CUDA 12.8, cu130 for CUDA 13.0).
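For the CUDA check, the maximum supported CUDA version appears in the header of the nvidia-smi output as "CUDA Version: X.Y". As a rough sketch, the hypothetical helper below extracts that value and prints it in the cuXY form used in the examples above. Treat the result as a starting point for choosing an image, not as a guarantee of compatibility.

```shell
# cuda_tag_from_smi
# Hypothetical helper: reads nvidia-smi output on stdin and prints a tag
# such as cu128 (for "CUDA Version: 12.8"). Prints nothing if no version is found.
cuda_tag_from_smi() {
  sed -n 's/.*CUDA Version: \([0-9][0-9]*\)\.\([0-9][0-9]*\).*/cu\1\2/p' | head -n 1
}

# Usage on a debug mode instance with GPU access:
#   nvidia-smi | cuda_tag_from_smi
```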

Secrets not available at runtime

- Secret not selected during deployment. Creating a secret in the Secrets tab doesn’t automatically make it available to all containers. You must select which secrets a container can access when deploying.
- Secret name mismatch. The secret name in the dashboard must exactly match the name in your tinfoil-config.yml secrets list. Names are case-sensitive.
- Stale secrets. If you update a secret’s value after deploying, the running container still has the old value. Redeploy to pick up changes. The dashboard shows a stale secrets indicator when this happens.
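A fail-fast check at the top of your container’s startup script can turn a silent crash into a clear log line naming the missing secret. The sketch below assumes that selected secrets reach your app as environment variables, which this page implies but does not spell out. require_env is a hypothetical helper, and DATABASE_URL and API_KEY are example names only.

```shell
# require_env NAME...
# Hypothetical helper: reports every listed variable that is unset or empty
# and returns non-zero if any are missing. Names are case-sensitive.
require_env() {
  missing=0
  for name in "$@"; do
    # POSIX-compatible indirection: read the variable whose name is in $name.
    # Only pass trusted, literal variable names to this function.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, before starting the app:
#   require_env DATABASE_URL API_KEY || exit 1
```

Because the lookup is case-sensitive, a secret created as api_key will not satisfy a check for API_KEY, which mirrors the name mismatch described above.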

Container is slow or unresponsive

If your app is CPU- or memory-constrained, it may respond slowly or be OOM-killed. Try scaling up (see resource options). Also note that if your application starts slowly (large frameworks, JVM warmup, model loading), the first requests after deployment may be slow.

Update issues

See Updating & lifecycle for the full update flow. Common failure modes:

New tag doesn’t appear in the picker

The Tinfoil Release workflow (or its auto-triggered publish phase) hasn’t completed. Check your repo’s Actions tab — if either phase failed, fix the issue and re-run the failed workflow. The tag shows up in the dashboard only after both phases succeed and tinfoil-release-publish.yml creates the GitHub release.

Blue-green update stuck

The new version follows the same lifecycle as a fresh deployment. If it’s stuck in “Deploying”, the same troubleshooting steps for deployment failures apply. You can always click Cancel Update to abort without affecting the running version.

Updated version fails health checks

Click Cancel Update to keep the current version running. Deploy a debug mode instance with the new tag to investigate, fix the issue, and try the update again.

Inspecting from the CLI

The Tinfoil CLI is often the fastest way to triage a deployment without leaving the terminal:

tinfoil container list                        # find the failing container
tinfoil container get my-api                  # status, host, error message
tinfoil container update status my-api        # for stuck blue-green updates
tinfoil container metrics my-api --time 1h    # CPU / GPU / memory utilization (JSON)

Pair tinfoil container get -o json with jq to script health checks against your fleet.

Getting help

If you’re stuck, reach out at [email protected].
