
Overview

The Models tab prepares Hugging Face model weights for use inside a Tinfoil Container. It does not deploy an inference server by itself. Instead, it creates a verified model-weight artifact and gives you the models: block to add to tinfoil-config.yml. Use this when you are deploying a GPU inference container, such as vLLM, and want the model weights to be pinned and verified separately from the Docker image.
Your Docker image still needs to contain the inference server runtime. The Models tab prepares the weights that the runtime will load.

Why this exists

Enclave attestation proves what code and configuration were present when the enclave booted. Model weights are usually loaded from disk after boot, so they need their own integrity commitment. Tinfoil uses Modelwrap to turn a pinned Hugging Face commit into a read-only model package with a dm-verity root hash. The enclave config commits to that root hash, and dm-verity verifies each disk read while the inference server loads the model. For the full technical explanation, read How Tinfoil Proves Exactly What Model Is Running.
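
Tinfoil performs this verification automatically inside the enclave. Purely for intuition, the following is roughly what a dm-verity mount looks like with the standard veritysetup tool; the device names and root hash are placeholders, and this is an illustration of the mechanism, not a step in the Tinfoil workflow:

# Open a dm-verity device: every read through /dev/mapper/model is
# checked against the Merkle tree committed to by <root_hash>.
veritysetup open /dev/vdb model /dev/vdc <root_hash>
mount -o ro /dev/mapper/model /mnt/model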

Prepare weights

  1. Open the Tinfoil Dashboard
  2. Go to Tinfoil Containers > Models
  3. Enter the Hugging Face repo in owner/model form
  4. Use the auto-filled commit, or paste a specific commit SHA (one way to look up a SHA is shown after these steps)
  5. Add an HF token if the repo is gated or private
  6. Click Prepare weights
Large models can take several minutes to wrap. When the job finishes, copy the generated models: block into your config repo.
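
If you need a commit SHA for step 4, note that Hugging Face repos are ordinary git repos, so you can query them directly. A sketch, using the example repo from this page:

# List the commit at the tip of the default branch.
git ls-remote https://huggingface.co/google/gemma-4-31B-it HEAD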

Add the model block

The generated block looks like this:
tinfoil-config.yml
models:
  - name: "gemma-4-31b-it"
    repo: "google/gemma-4-31B-it@419b2efe421994fdfd3394e621983d4cc511cd4f"
    mpk: "0900ca6b913db0036792149d3ea5862986d66a6964b010e998f56fbb7e1276ab_62578683904_59fe9787-ed93-577a-9fd9-a7804c932a11"
The mpk value is generated by Tinfoil and includes the model root hash, verity offset, and verity UUID. Keep it exactly as generated.

Point your server at the mounted model

At boot, Tinfoil verifies the model artifact and mounts it read-only under /tinfoil/mpk. In your inference server command, use:
tinfoil-config.yml
command: [
  "--model", "/tinfoil/mpk/mpk-0900ca6b913db0036792149d3ea5862986d66a6964b010e998f56fbb7e1276ab",
  "--served-model-name", "gemma-4-31b-it",
  "--port", "8001"
]
The path uses only the root hash portion of the mpk value:
/tinfoil/mpk/mpk-<root_hash>
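
If you assemble the command programmatically, a small shell sketch like this derives the mount path; it assumes the underscore-separated <root_hash>_<offset>_<uuid> layout visible in the example mpk value above:

# Keep everything before the first underscore (the root hash).
MPK="0900ca6b913db0036792149d3ea5862986d66a6964b010e998f56fbb7e1276ab_62578683904_59fe9787-ed93-577a-9fd9-a7804c932a11"
ROOT_HASH="${MPK%%_*}"
echo "/tinfoil/mpk/mpk-${ROOT_HASH}"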

Example vLLM config

tinfoil-config.yml
cvm-version: 0.7.5
cpus: 16
memory: 65536
gpus: 1

models:
  - name: "gemma-4-31b-it"
    repo: "google/gemma-4-31B-it@419b2efe421994fdfd3394e621983d4cc511cd4f"
    mpk: "0900ca6b913db0036792149d3ea5862986d66a6964b010e998f56fbb7e1276ab_62578683904_59fe9787-ed93-577a-9fd9-a7804c932a11"

containers:
  - name: "inference"
    image: "vllm/vllm-openai:v0.14.1@sha256:..."
    runtime: nvidia
    gpus: all
    ipc: host
    restart: always
    command: [
      "--model", "/tinfoil/mpk/mpk-0900ca6b913db0036792149d3ea5862986d66a6964b010e998f56fbb7e1276ab",
      "--served-model-name", "gemma-4-31b-it",
      "--port", "8001"
    ]
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:8001/health"]
      interval: 30s
      timeout: 5s
      start_period: 30m

shim:
  upstream-port: 8001
  paths:
    - /v1/chat/completions
    - /v1/models
    - /health
After editing the config, commit it, tag a release, wait for Build and Attest to complete, then deploy or update from the All Containers tab.
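
Once the container is running, a quick smoke test against the paths exposed by the shim might look like this; the hostname is a placeholder for your deployment's address:

curl -sf https://your-deployment.example.com/health
curl -s https://your-deployment.example.com/v1/models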

Updating weights

To update a model, prepare the new Hugging Face commit from the Models tab, replace the repo and mpk values in tinfoil-config.yml, then tag and deploy a new release.
Keep each deployment pinned to a specific Hugging Face commit. Avoid relying on a moving default branch for production workloads.
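
For example, moving the deployment above to a newer commit changes only two lines of the models: block; the placeholders below stand in for the values a fresh Prepare weights run produces:

tinfoil-config.yml
models:
  - name: "gemma-4-31b-it"
    repo: "google/gemma-4-31B-it@<new_commit_sha>"
    mpk: "<new_mpk_value_from_models_tab>"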