# Document processing

> Learn how to use Tinfoil for secure and private document processing.

## Document Processing API

Tinfoil's document processing service extracts structured Markdown from uploaded documents, including PDFs, DOCX, PPTX, XLSX, HTML, CSV, and images. The entire service runs inside a [secure enclave](/containers/overview), and the VLM used for OCR and visual extraction runs in its own secure enclave, so your documents are never exposed to any operator. Born-digital PDFs are parsed using [MuPDF](https://mupdf.com/) inside a sandboxed subprocess with no network access, no environment variables, and no filesystem access. Scanned pages and images are sent to the VLM for OCR.

You can use document processing in two ways:

* Call `/v1/convert/file` directly when you want extracted Markdown (or page images) back from the document service.
* Send a base64-encoded file through the OpenAI-compatible `/v1/responses` or `/v1/chat/completions` APIs. Tinfoil privately converts the attachment and forwards either Markdown (for text-only models) or per-page Markdown plus page images (for vision-capable models) to the model. You can override the default with the optional [`tinfoil_mode`](#4-override-the-pdf-processing-mode) field.

<Note>
  **Current scope:** OpenAI-compatible file input support currently accepts base64 `file_data` only. `file_id` and the `/v1/files` upload flow are not supported.
</Note>

### 1. Convert A Document Directly

The document processing endpoint accepts `multipart/form-data` requests at `/v1/convert/file`. Upload one or more files with field name `files`.

You can control extraction behavior with the `mode` query parameter:

| Mode             | Description                                                                                                                  |
| ---------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `text` (default) | Markdown from the text layer. VLM OCR only for scanned pages.                                                                |
| `vision`         | Text plus VLM OCR for scanned pages and VLM visual descriptions (tables, charts, diagrams, formulas) for born-digital pages. |
| `images`         | Per-page text plus page images as base64 PNG. No VLM.                                                                        |
| `raw`            | Text layer only. No VLM, no image rendering.                                                                                 |
| `vlm`            | Full-page VLM OCR on every page.                                                                                             |
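
A minimal sketch of selecting a mode. The `convertPath` helper is a hypothetical name, not part of the Tinfoil SDK; it checks the mode against the table above and appends it as the `mode` query parameter, as in `/v1/convert/file?mode=images`.

```javascript
// Modes listed in the table above.
const CONVERT_MODES = ['text', 'vision', 'images', 'raw', 'vlm']

// Hypothetical helper: returns the request path for a given extraction mode.
function convertPath(mode = 'text') {
  if (!CONVERT_MODES.includes(mode)) {
    throw new Error(`Unknown mode "${mode}"; expected one of ${CONVERT_MODES.join(', ')}`)
  }
  // Leave the parameter off for the default mode.
  if (mode === 'text') return '/v1/convert/file'
  return `/v1/convert/file?${new URLSearchParams({ mode })}`
}
```

The returned string can be passed to `client.fetch` in place of the literal path, for example `client.fetch(convertPath('vlm'), { method: 'POST', body: formData })`.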

<CodeGroup>
  ```javascript JavaScript theme={"dark"}
  import { SecureClient } from 'tinfoil'
  import fs from 'fs'

  const client = new SecureClient()

  const fileBuffer = fs.readFileSync('doc.pdf')
  const blob = new Blob([fileBuffer], { type: 'application/pdf' })

  const formData = new FormData()
  formData.append('files', blob, 'doc.pdf')

  // Default mode — fast, no VLM for born-digital PDFs
  const response = await client.fetch('/v1/convert/file', {
    method: 'POST',
    body: formData,
  })

  const result = await response.json()
  // result.document.md_content contains the converted Markdown
  console.log(result.document.md_content)
  ```

  ```rust Rust theme={"dark"}
  // Requires reqwest with the "multipart" feature enabled.
  use tinfoil::Client;
  use tokio::fs;

  #[tokio::main]
  async fn main() -> Result<(), Box<dyn std::error::Error>> {
      let client = Client::new_default().await?;

      let pdf_bytes = fs::read("doc.pdf").await?;
      let part = reqwest::multipart::Part::bytes(pdf_bytes)
          .file_name("doc.pdf")
          .mime_str("application/pdf")?;
      let form = reqwest::multipart::Form::new()
          .part("files", part)
          .text("model", "doc-upload")
          .text("to_formats", "md")
          .text("from_formats", "pdf")
          .text("pipeline", "standard");

      let secure = client.secure_client();
      let response = secure.http_client()?
          .post(format!("{}/v1/convert/file", secure.base_url()))
          .bearer_auth(secure.api_key())
          .multipart(form)
          .send()
          .await?
          .error_for_status()?
          .json::<serde_json::Value>()
          .await?;

      println!("{}", response["document"]["md_content"].as_str().unwrap_or(""));
      Ok(())
  }
  ```
</CodeGroup>

The response includes the extracted Markdown. When you upload a single file, the result is in `document`:

```json theme={"dark"}
{
  "document": {
    "md_content": "# Title\n\nExtracted text..."
  },
  "status": "success",
  "processing_time": 1.23
}
```

In `images` mode, each page includes its extracted text, a base64-encoded PNG, and a scanned/born-digital flag:

```json theme={"dark"}
{
  "document": {
    "md_content": "# Title\n\nExtracted text...",
    "pages": [
      { "page": 1, "text": "# Title\n\nExtracted text...", "image": "iVBORw0KGgo...", "is_scanned": false },
      { "page": 2, "text": "",                              "image": "iVBORw0KGgo...", "is_scanned": true },
      { "page": 3, "text": "## Conclusion\n\n...",          "image": "iVBORw0KGgo...", "is_scanned": false }
    ]
  },
  "status": "success",
  "processing_time": 2.45
}
```

`text` mirrors the per-page slice of `md_content`; pure scans come back with an empty `text` field.
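
Because `images` mode does not call the VLM, scanned pages arrive without text. The sketch below uses a hypothetical helper name to list those pages, so that you can decide whether to convert the file again in a mode that performs OCR (such as `text` or `vlm`).

```javascript
// Hypothetical helper: page numbers that are flagged as scanned and have no text.
// Expects the `document` object of an `images`-mode response.
function pagesWithoutText(doc) {
  return (doc.pages ?? [])
    .filter((p) => p.is_scanned && !p.text)
    .map((p) => p.page)
}
```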

When uploading multiple files, the response uses `documents` (an array) instead of `document`:

```json theme={"dark"}
{
  "documents": [
    { "md_content": "# First document..." },
    { "md_content": "# Second document..." }
  ],
  "status": "success",
  "processing_time": 3.21
}
```
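
Client code that accepts either shape can normalize the response first. A minimal sketch, with `documentsFrom` as a hypothetical helper name:

```javascript
// Hypothetical helper: always returns an array of converted documents,
// whether the response used `document` (one file) or `documents` (several).
function documentsFrom(result) {
  if (Array.isArray(result.documents)) return result.documents
  if (result.document) return [result.document]
  return []
}
```

With this in place, `documentsFrom(result).map((d) => d.md_content)` yields the Markdown for every uploaded file regardless of how many were sent.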

#### Pairing `images` Mode With A Vision Model

For a vision-capable model (e.g. `kimi-k2-6`, `gemma4-31b`), interleave each page's text and image and wrap the raw `image` base64 as a data URI:

```javascript theme={"dark"}
const convertResp = await client.fetch('/v1/convert/file?mode=images', {
  method: 'POST',
  body: formData,
})
const { document } = await convertResp.json()

const content = [{ type: 'text', text: '[Attached file: doc.pdf]' }]
for (const p of document.pages) {
  const label = p.is_scanned ? `Page ${p.page} (scanned):` : `Page ${p.page}:`
  content.push({ type: 'text', text: p.text ? `${label}\n${p.text}` : label })
  content.push({
    type: 'image_url',
    image_url: { url: `data:image/png;base64,${p.image}` },
  })
}
content.push({ type: 'text', text: 'What is this PDF about?' })

const visionResp = await client.fetch('/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'kimi-k2-6',
    messages: [{ role: 'user', content }],
  }),
})
```

This recovers visual elements that text extraction discards — illustrations, diagrams, color-coding, page decorations, and other layout cues — while still giving the model accurate, parser-extracted text.

When you instead attach a PDF as base64 `file_data` on `/v1/responses` or `/v1/chat/completions` with a vision-capable model, Tinfoil performs this same per-page interleave automatically.

### 2. Use File Inputs With The Responses API

If you want OpenAI-compatible file attachments, send a base64-encoded file in an `input_file` content part on `/v1/responses`.

<CodeGroup>
  ```javascript JavaScript theme={"dark"}
  import { SecureClient } from 'tinfoil'
  import fs from 'fs'

  const client = new SecureClient()
  const fileData = fs.readFileSync('doc.pdf').toString('base64')

  const response = await client.fetch('/v1/responses', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-oss-120b',
      input: [
        {
          role: 'user',
          content: [
            {
              type: 'input_file',
              filename: 'doc.pdf',
              file_data: `data:application/pdf;base64,${fileData}`,
            },
            {
              type: 'input_text',
              text: 'Summarize this document in 3 bullet points.',
            },
          ],
        },
      ],
    }),
  })

  const result = await response.json()
  console.log(result.output_text)
  ```

  ```rust Rust theme={"dark"}
  use base64::{engine::general_purpose::STANDARD as B64, Engine as _};
  use serde_json::json;
  use tinfoil::Client;
  use tokio::fs;

  #[tokio::main]
  async fn main() -> Result<(), Box<dyn std::error::Error>> {
      let client = Client::new_default().await?;
      let file_data = B64.encode(fs::read("doc.pdf").await?);

      let secure = client.secure_client();
      let response = secure.http_client()?
          .post(format!("{}/v1/responses", secure.base_url()))
          .bearer_auth(secure.api_key())
          .json(&json!({
              "model": "gpt-oss-120b",
              "input": [{
                  "role": "user",
                  "content": [
                      {
                          "type": "input_file",
                          "filename": "doc.pdf",
                          "file_data": format!("data:application/pdf;base64,{}", file_data),
                      },
                      { "type": "input_text", "text": "Summarize this document in 3 bullet points." }
                  ]
              }]
          }))
          .send()
          .await?
          .error_for_status()?
          .json::<serde_json::Value>()
          .await?;

      println!("{}", response["output_text"].as_str().unwrap_or(""));
      Ok(())
  }
  ```
</CodeGroup>

For binary formats such as PDF, DOCX, PPTX, and images, Tinfoil processes the attachment through the private document-processing backend before forwarding it to the model. By default, the router chooses what to forward for each attachment based on the routed model:

| Routed model   | Default PDF / image behavior                       |
| -------------- | -------------------------------------------------- |
| Vision-capable | Per-page interleaved Markdown **and** page images. |
| Text-only      | Markdown only, for speed.                          |

You can check whether a model is vision-capable via the `multimodal` field on `GET /v1/models`. DOCX, PPTX, XLSX, HTML, CSV, and plain text attachments are always forwarded as extracted Markdown regardless of the routed model. You can override the default per attachment with [`tinfoil_mode`](#4-override-the-pdf-processing-mode).
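
The shape of the `GET /v1/models` response is not shown on this page. The sketch below assumes an OpenAI-style list body (`{ data: [{ id, multimodal }] }`); both that shape and the `isVisionCapable` name are assumptions to adjust against the real response.

```javascript
// Hypothetical helper. Assumes the parsed body of GET /v1/models looks like
// { data: [{ id: '...', multimodal: true | false, ... }] }.
function isVisionCapable(modelsResponse, modelId) {
  const entry = (modelsResponse.data ?? []).find((m) => m.id === modelId)
  return Boolean(entry && entry.multimodal)
}
```

A model that is missing from the list is treated as text-only, which matches the more conservative default of forwarding Markdown only.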

### 3. Use File Inputs With Chat Completions

The OpenAI-compatible Chat Completions shape uses `type: "file"` with a nested `file` object.

<CodeGroup>
  ```javascript JavaScript theme={"dark"}
  import { SecureClient } from 'tinfoil'
  import fs from 'fs'

  const client = new SecureClient()
  const fileData = fs.readFileSync('doc.pdf').toString('base64')

  const response = await client.fetch('/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'gpt-oss-120b',
      messages: [
        {
          role: 'user',
          content: [
            {
              type: 'file',
              file: {
                filename: 'doc.pdf',
                file_data: `data:application/pdf;base64,${fileData}`,
              },
            },
            {
              type: 'text',
              text: 'Summarize this document in 3 bullet points.',
            },
          ],
        },
      ],
    }),
  })

  const result = await response.json()
  console.log(result.choices[0].message.content)
  ```

  ```rust Rust theme={"dark"}
  use base64::{engine::general_purpose::STANDARD as B64, Engine as _};
  use serde_json::json;
  use tinfoil::Client;
  use tokio::fs;

  #[tokio::main]
  async fn main() -> Result<(), Box<dyn std::error::Error>> {
      let client = Client::new_default().await?;
      let file_data = B64.encode(fs::read("doc.pdf").await?);

      let body = client.chat_relaxed().request()
          .model("gpt-oss-120b")
          .messages([json!({
              "role": "user",
              "content": [
                  {
                      "type": "file",
                      "file": {
                          "filename": "doc.pdf",
                          "file_data": format!("data:application/pdf;base64,{}", file_data),
                      }
                  },
                  { "type": "text", "text": "Summarize this document in 3 bullet points." }
              ]
          })]);

      let response = client.chat_relaxed().create(body).await?;
      println!("{}", response.content().unwrap_or(""));
      Ok(())
  }
  ```
</CodeGroup>

### 4. Override The PDF Processing Mode

Set the optional Tinfoil-specific `tinfoil_mode` field directly on the file content part to override the auto-default — for example to force VLM full-page OCR on a low-quality scan:

```javascript theme={"dark"}
const response = await client.fetch('/v1/responses', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'gpt-oss-120b',
    input: [{
      role: 'user',
      content: [
        {
          type: 'input_file',
          filename: 'scan.pdf',
          file_data: `data:application/pdf;base64,${fileData}`,
          tinfoil_mode: 'vlm',
        },
        { type: 'input_text', text: 'Extract every line as plain text.' },
      ],
    }],
  }),
})
```

The router consumes the field and strips it before the request is forwarded, so the upstream model never sees it.

| Value            | Behavior                                                                                            |
| ---------------- | --------------------------------------------------------------------------------------------------- |
| `auto` (default) | `images` for vision-capable models, `text` for text-only models.                                    |
| `text`           | Markdown from the text layer; VLM OCR only on scanned pages.                                        |
| `vision`         | Markdown plus VLM visual descriptions for figures, charts, and tables.                              |
| `images`         | Per-page interleaved Markdown and images. Requires a vision-capable model; returns `400` otherwise. |
| `raw`            | Text layer only. No VLM, no image rendering.                                                        |
| `vlm`            | Full-page VLM OCR on every page. Highest quality, slowest.                                          |

`tinfoil_mode` only affects PDF and image attachments; for DOCX, PPTX, XLSX, HTML, CSV, and plain text the field has no effect.

<Note>
  `tinfoil_mode` is a Tinfoil-specific extension and is not understood by OpenAI's API. If your code needs to target both Tinfoil and OpenAI from the same request body, omit the field.
</Note>

On Chat Completions the field nests inside the `file` object alongside `filename` and `file_data`:

```json theme={"dark"}
{
  "type": "file",
  "file": {
    "filename": "doc.pdf",
    "file_data": "data:application/pdf;base64,...",
    "tinfoil_mode": "text"
  }
}
```
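
Since the field sits at a different depth in each API, a small builder can keep the two shapes consistent. This is a sketch under the shapes shown above; `filePart` and its option names are hypothetical, not part of the Tinfoil SDK.

```javascript
// Hypothetical helper: builds a file content part for either API.
// `api` is 'responses' or 'chat'; `base64` is the already-encoded file body.
function filePart(api, { filename, mimeType, base64, tinfoilMode }) {
  const file = {
    filename,
    file_data: `data:${mimeType};base64,${base64}`,
  }
  // Attach the Tinfoil-specific field only when a mode is requested, so the
  // same body can be sent to APIs that do not understand it.
  if (tinfoilMode) file.tinfoil_mode = tinfoilMode

  if (api === 'responses') return { type: 'input_file', ...file }
  if (api === 'chat') return { type: 'file', file }
  throw new Error(`Unknown api "${api}"; expected "responses" or "chat"`)
}
```

For example, `filePart('responses', { filename: 'scan.pdf', mimeType: 'application/pdf', base64: fileData, tinfoilMode: 'vlm' })` produces the same content part as the `/v1/responses` example in this section.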

### Supported Formats

| Format                      | Extraction                   |
| --------------------------- | ---------------------------- |
| PDF (born-digital)          | MuPDF text layer to Markdown |
| PDF (scanned)               | VLM OCR                      |
| DOCX, PPTX, HTML, XLSX, CSV | Server-side parsers          |
| Images                      | VLM OCR                      |
| Markdown, text, JSON, XML   | Passthrough                  |

### Errors And Limits

Per request: up to 10 files, 50 MB each, `multipart/form-data` only.
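
These limits can be checked on the client before uploading. The sketch below is illustrative only: the helper name is hypothetical, and it assumes "50 MB" means 50 × 1024 × 1024 bytes, which this page does not specify.

```javascript
const MAX_FILES = 10
const MAX_BYTES = 50 * 1024 * 1024 // assumption: "50 MB" read as 50 MiB

// Hypothetical helper: returns a list of problems, empty when the upload
// fits the documented limits. Each entry in `files` needs `name` and `size`
// (in bytes), which File and Blob-like objects provide.
function checkUploadLimits(files) {
  const problems = []
  if (files.length > MAX_FILES) {
    problems.push(`too many files: ${files.length} > ${MAX_FILES}`)
  }
  for (const f of files) {
    if (f.size > MAX_BYTES) problems.push(`${f.name} exceeds 50 MB`)
  }
  return problems
}
```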

All non-2xx responses carry a JSON body of the form `{"error": "<message>"}`.
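
A minimal sketch of turning such a response into a readable message. The helper name is hypothetical, and the fallback for non-JSON bodies is a defensive assumption rather than documented behavior.

```javascript
// Hypothetical helper: formats an error from the status code and raw body text.
function errorMessage(status, bodyText) {
  try {
    const parsed = JSON.parse(bodyText)
    if (parsed && typeof parsed.error === 'string') {
      return `HTTP ${status}: ${parsed.error}`
    }
  } catch {
    // Body was not JSON; fall through to the raw text.
  }
  return `HTTP ${status}: ${bodyText}`
}
```

In the JavaScript examples on this page, it could be used as `if (!response.ok) throw new Error(errorMessage(response.status, await response.text()))`.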

`/health` reports the state of each pipeline component. The two example responses below show a healthy service and one whose VLM is unavailable:

```json theme={"dark"}
{ "status": "ok",       "router": true, "sidecar": true, "vlm": true  }
{ "status": "degraded", "router": true, "sidecar": true, "vlm": false }
```
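
A monitoring script might reduce that payload to the components that are down. A minimal sketch with a hypothetical helper name, assuming every key other than `status` is a boolean component flag as in the examples above:

```javascript
// Hypothetical helper: names of components reported as unavailable.
function unavailableComponents(health) {
  return Object.entries(health)
    .filter(([key, value]) => key !== 'status' && value === false)
    .map(([key]) => key)
}
```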

### Attestation

The document processing API uses the same attestation mechanism as other Tinfoil services. `SecureClient`, used in the examples above, verifies attestation automatically.

<CardGroup cols={2}>
  <Card title="Try Private Chat" icon="comment" href="https://tinfoil.sh/chat">
    Experience document upload in our private chat interface with real-time privacy verification.
  </Card>

  <Card title="Configuration Repo" icon="github" href="https://github.com/tinfoilsh/confidential-doc-upload">
    View the open-source configuration for Tinfoil's confidential document processing service.
  </Card>
</CardGroup>