Documentation Index

Fetch the complete documentation index at: https://docs.tinfoil.sh/llms.txt

Use this file to discover all available pages before exploring further.

Overview

Tinfoil’s web search lets models augment their answers with fresh information from the web. Search runs inside a secure enclave and talks directly to Exa, a search provider with Zero Data Retention (ZDR). That gives you:
  • Query privacy: queries go from the enclave to Exa over TLS. Tinfoil never sees the query contents in plaintext.
  • User anonymity from the search provider: all users share a single enclave-held API key, so Exa only sees the enclave’s IP address.
  • Legal protections: Exa’s ZDR agreement ensures queries are never written to persistent storage or sent to external subprocessors.
  • Optional PII protection: a safeguard model can block outgoing queries that contain sensitive information before they are sent to Exa.
  • Optional prompt-injection protection: a safeguard model can scan search results and fetched pages for prompt-injection attempts and filter them out before the model sees them.
You turn web search on per request. The response you get back is a normal chat completion (or Responses API response) with citations already attached; you do not need to handle any tool calls yourself.
Read our blog post on private AI web search for background.

Choosing an API surface

You can enable web search on either of Tinfoil’s OpenAI-compatible endpoints:
  • /v1/responses — recommended. Search progress and citations are surfaced through OpenAI’s native web_search_call items and response.web_search_call.* / response.output_text.annotation.added streaming events.
  • /v1/chat/completions — supported for compatibility. Citations are surfaced on the final message and on delta.annotations in streaming. The OpenAI Chat Completions spec has no native live-progress event for web search, so if you want a live progress UI there, see Streaming progress markers.
If you want to drive the tool loop yourself — for example, from a custom agent runtime — call the underlying MCP web-search server directly instead.

Quick start

Enable web search by adding web_search_options to a Chat Completions request, or a web_search tool to a Responses request. Optionally add pii_check_options and/or prompt_injection_check_options to enable the safety filters.

Chat Completions (/v1/chat/completions)

import { TinfoilAI } from 'tinfoil';

const client = new TinfoilAI({
  apiKey: process.env.TINFOIL_API_KEY,
});

const stream = await client.chat.completions.create({
  model: '<MODEL_NAME>',
  messages: [
    { role: 'user', content: 'What are the latest developments in quantum computing?' },
  ],
  web_search_options: {},
  pii_check_options: {},              // Optional: block queries containing PII
  prompt_injection_check_options: {}, // Optional: filter injected instructions from results
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices?.[0]?.delta;

  // Citations arrive as annotation deltas alongside content.
  if (delta?.annotations) {
    for (const annotation of delta.annotations) {
      if (annotation.type === 'url_citation') {
        const { title, url } = annotation.url_citation;
        console.log(`\nCitation: ${title} - ${url}`);
      }
    }
  }

  if (delta?.content) {
    process.stdout.write(delta.content);
  }
}

Responses (/v1/responses)

import { TinfoilAI } from 'tinfoil';

const client = new TinfoilAI({
  apiKey: process.env.TINFOIL_API_KEY,
});

const stream = await client.responses.create({
  model: '<MODEL_NAME>',
  input: 'What are the latest developments in quantum computing?',
  tools: [{ type: 'web_search' }],
  stream: true,
});

for await (const event of stream) {
  switch (event.type) {
    case 'response.web_search_call.in_progress':
    case 'response.web_search_call.searching':
    case 'response.web_search_call.completed':
      console.log(event.type, event.item_id);
      break;
    case 'response.output_text.delta':
      process.stdout.write(event.delta);
      break;
    case 'response.output_text.annotation.added':
      console.log('\nCitation:', event.annotation.title, event.annotation.url);
      break;
  }
}

Request options

Top-level request fields

| Field | API | Required | Description |
| --- | --- | --- | --- |
| web_search_options | Chat Completions | Yes, to enable | Enables web search. Accepts the tuning fields listed below. |
| tools: [{ "type": "web_search", ... }] | Responses | Yes, to enable | Enables web search. Per-tool fields match web_search_options. |
| pii_check_options | Both | No | Block queries containing PII from being sent to Exa. Presence of the key enables the filter. |
| prompt_injection_check_options | Both | No | Filter prompt-injection attempts out of search results and fetched pages. Presence of the key enables the filter. |
| include: ["web_search_call.action.sources"] | Responses | No | Opt in to populating action.sources on web_search_call output items with the URLs each search returned. |

Search tuning fields

These fields are all optional. They have the same meaning on both APIs. On Chat Completions they go under web_search_options.<field>; on Responses they go on the web_search tool entry (tools[].<field>).
| Field | Type | Description |
| --- | --- | --- |
| search_context_size | "low" \| "medium" \| "high" | Retrieval-depth tier. low favors short highlight snippets, medium is the default, and high pulls more results and longer content per result. |
| user_location | object | Approximate location context. Only approximate.country (ISO 3166-1 alpha-2) is honored today. |
| filters.allowed_domains | string[] | Restrict search results to these hostnames. |
| filters.excluded_domains | string[] | Drop these hostnames from search results. |
| content_mode | "highlights" \| "text" | Override what each result carries. Defaults to the tier choice implied by search_context_size. |
| max_content_chars | integer | Per-result character budget for returned content. Defaults to the tier choice implied by search_context_size. |
| category | string | Narrow search to a topical category (for example, news or research paper). |
| start_published_date | YYYY-MM-DD or RFC 3339 | Only return results published on or after this date. |
| end_published_date | YYYY-MM-DD or RFC 3339 | Only return results published on or before this date. |
| max_age_hours | integer | Only return results from the last N hours. |
Example combining several options on Chat Completions:
{
  "model": "<MODEL_NAME>",
  "messages": [{ "role": "user", "content": "Best restaurants near me" }],
  "web_search_options": {
    "search_context_size": "medium",
    "user_location": {
      "type": "approximate",
      "approximate": { "country": "US" }
    },
    "filters": {
      "allowed_domains": ["yelp.com", "tripadvisor.com"]
    },
    "max_age_hours": 720,
    "category": "news"
  }
}
The equivalent on the Responses API:
{
  "model": "<MODEL_NAME>",
  "input": "Best restaurants near me",
  "tools": [{
    "type": "web_search",
    "search_context_size": "medium",
    "user_location": {
      "type": "approximate",
      "approximate": { "country": "US" }
    },
    "filters": {
      "allowed_domains": ["yelp.com", "tripadvisor.com"]
    },
    "max_age_hours": 720,
    "category": "news"
  }],
  "include": ["web_search_call.action.sources"]
}
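The two request bodies above differ only in where the tuning fields live, so translating between them is mechanical. A minimal sketch of that mapping (this helper is illustrative, not part of the Tinfoil SDK):

```typescript
// Sketch: build the Responses-API equivalent of a Chat Completions
// web-search request. The tuning fields move unchanged from
// web_search_options onto the web_search tool entry.
type WebSearchOptions = Record<string, unknown>;

function toResponsesBody(
  model: string,
  input: string,
  webSearchOptions: WebSearchOptions,
  includeSources = false,
) {
  const body: Record<string, unknown> = {
    model,
    input,
    tools: [{ type: 'web_search', ...webSearchOptions }],
  };
  if (includeSources) {
    body.include = ['web_search_call.action.sources'];
  }
  return body;
}
```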
Safety filters are opt-in per request. Include pii_check_options to block PII-bearing queries before they reach Exa, and prompt_injection_check_options to filter injection attempts out of search results and fetched pages. Both are independent; enabling one does not enable the other.

Response format

Chat Completions

The response is a standard OpenAI chat.completion. Citations appear in two places:
  • inline in the assistant text as plain markdown links ([label](url)),
  • as structured url_citation annotations on the assistant message, whose start_index / end_index span the label text.
Non-streaming example:
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Quantum error correction reached a milestone in 2025 [Nature](https://www.nature.com/article/...).",
      "annotations": [{
        "type": "url_citation",
        "url_citation": {
          "url": "https://www.nature.com/article/...",
          "title": "Nature article title",
          "start_index": 44,
          "end_index": 50
        }
      }]
    },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 123, "completion_tokens": 456, "total_tokens": 579 }
}
In streaming mode, the same citations arrive as delta.annotations entries interleaved with delta.content:
{
  "choices": [{
    "index": 0,
    "delta": {
      "annotations": [{
        "type": "url_citation",
        "url_citation": {
          "url": "https://www.nature.com/article/...",
          "title": "Nature article title",
          "start_index": 44,
          "end_index": 50
        }
      }]
    }
  }]
}
The Chat Completions stream is a standard chat.completion.chunk stream — there are no custom top-level events on this API. If you want live search progress on Chat Completions, see Streaming progress markers.
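If you buffer the stream yourself, text and citations can be collected with a small reducer. A sketch, assuming the delta shapes shown above:

```typescript
// Sketch: accumulate streamed assistant text and url_citation
// annotations from chat.completion.chunk deltas.
interface UrlCitation { url: string; title: string; start_index: number; end_index: number }
interface Delta {
  content?: string;
  annotations?: { type: string; url_citation: UrlCitation }[];
}

function makeAccumulator() {
  let text = '';
  const citations: UrlCitation[] = [];
  return {
    push(delta: Delta) {
      if (delta.content) text += delta.content;
      for (const a of delta.annotations ?? []) {
        if (a.type === 'url_citation') citations.push(a.url_citation);
      }
    },
    result() {
      return { text, citations };
    },
  };
}
```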

Responses

The response.output array contains a web_search_call item for each search or page fetch, followed by the assistant message item. Citations are attached to each output_text content part as flat url_citation annotations.
{
  "id": "resp_...",
  "object": "response",
  "status": "completed",
  "output": [
    {
      "type": "web_search_call",
      "id": "ws_abc123",
      "status": "completed",
      "action": {
        "type": "search",
        "query": "latest quantum error correction results 2025"
      }
    },
    {
      "type": "web_search_call",
      "id": "ws_def456",
      "status": "completed",
      "action": {
        "type": "open_page",
        "url": "https://www.nature.com/article/..."
      }
    },
    {
      "type": "message",
      "role": "assistant",
      "status": "completed",
      "content": [{
        "type": "output_text",
        "text": "Quantum error correction reached a milestone in 2025 [Nature](https://www.nature.com/article/...).",
        "annotations": [{
          "type": "url_citation",
          "url": "https://www.nature.com/article/...",
          "title": "Nature article title",
          "start_index": 44,
          "end_index": 50
        }]
      }]
    }
  ],
  "usage": { "input_tokens": 123, "output_tokens": 456, "total_tokens": 579 }
}
The annotation shape is flat on Responses ({type, url, start_index, end_index, title}) whereas Chat Completions nests it under url_citation. When the request opts in with include: ["web_search_call.action.sources"], each search-kind web_search_call also carries the URLs the search returned:
{
  "type": "web_search_call",
  "id": "ws_abc123",
  "status": "completed",
  "action": {
    "type": "search",
    "query": "latest quantum error correction results 2025",
    "sources": [
      { "type": "url", "url": "https://www.nature.com/article/..." },
      { "type": "url", "url": "https://arxiv.org/abs/..." }
    ]
  }
}
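Walking the output array to gather every URL a response touched is a straightforward fold over the two item kinds. A sketch, assuming the shapes shown above:

```typescript
// Sketch: collect the URLs each search returned (when the request
// opted in to sources) and the URLs cited in the final message.
type OutputItem =
  | { type: 'web_search_call'; action?: { sources?: { url: string }[] } }
  | { type: 'message'; content: { type: string; annotations?: { type: string; url: string }[] }[] };

function collectUrls(output: OutputItem[]) {
  const sources: string[] = [];
  const citations: string[] = [];
  for (const item of output) {
    if (item.type === 'web_search_call') {
      for (const s of item.action?.sources ?? []) sources.push(s.url);
    } else if (item.type === 'message') {
      for (const part of item.content) {
        for (const a of part.annotations ?? []) {
          if (a.type === 'url_citation') citations.push(a.url);
        }
      }
    }
  }
  return { sources, citations };
}
```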

Streaming events (Responses)

Responses streaming emits OpenAI’s standard events. The ones relevant to web search are:
| Event | When |
| --- | --- |
| response.output_item.added | A web_search_call item is surfaced, initially with status: "in_progress". |
| response.web_search_call.in_progress | A search or page fetch is starting. |
| response.web_search_call.searching | A search is being sent to the provider. Page fetches do not emit this event. |
| response.web_search_call.completed | A search or fetch finished successfully. Not emitted on failure. |
| response.output_item.done | Terminal event for the item. event.item.status is completed or failed. |
Assistant text streams through the usual response.output_text.delta and response.output_text.done events. Citations arrive as response.output_text.annotation.added:
{
  "type": "response.output_text.annotation.added",
  "output_index": 2,
  "item_id": "msg_...",
  "content_index": 0,
  "annotation_index": 0,
  "annotation": {
    "type": "url_citation",
    "url": "https://www.nature.com/article/...",
    "title": "Nature article title",
    "start_index": 44,
    "end_index": 50
  }
}
The stream terminates with a single response.completed event carrying the full output and usage.

Failure and blocked statuses

A search or page fetch that fails surfaces with status: "failed" on the terminal web_search_call item (OpenAI’s web_search_call.status enum is in_progress, searching, completed, failed). When a search is blocked by the PII filter, the spec-visible status is still failed because the OpenAI enum has no blocked value. The distinct “blocked by safety filter” signal is exposed through an optional _tinfoil sidecar that Tinfoil-aware clients can read to render a different affordance.

Optional: streaming progress markers (Chat Completions)

The OpenAI Chat Completions spec has no native event for web-search progress. If you want a live progress UI on Chat Completions (a spinner, a “searching the web…” line, a per-URL fetch indicator), you can opt in to Tinfoil progress markers. Send the request header:
X-Tinfoil-Events: web_search
When markers are enabled, progress payloads are carried as tagged JSON inside the normal delta.content of chat.completion.chunk frames. Each marker is a standalone line:
<tinfoil-event>{"type":"tinfoil.web_search_call","item_id":"ws_...","status":"in_progress","action":{"type":"search","query":"..."}}</tinfoil-event>
The payload shape is:
{
  "type": "tinfoil.web_search_call",
  "item_id": "ws_abc123",
  "status": "in_progress" | "completed" | "failed" | "blocked",
  "action": { "type": "search", "query": "..." }
             | { "type": "open_page", "url": "..." },
  "error": { "code": "blocked_by_safety_filter" | "tool_error" },
  "sources": [{ "url": "...", "title": "..." }]
}
  • status follows a simple lifecycle: one in_progress marker when a search or fetch starts, followed by one terminal marker (completed, failed, or blocked).
  • action.type is search for web searches and open_page for page fetches. A search that triggers multiple page fetches produces one marker pair per URL.
  • error is only present on failed and blocked markers.
  • sources is only present on terminal markers for search calls that produced results. Each entry is a {url, title} pair attributing a citation to this specific call. title may be an empty string; clients should fall back to the URL or hostname in that case.
The same marker pairs are also prefixed onto the assistant content string in non-streaming Chat Completions responses when the header is set, so you can render an identical progress timeline from either mode.
Clients that do not parse the markers will render them as text inside the assistant message. Either parse and strip them, or leave the header off to get a pristine stream with no markers.
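One possible TypeScript shape for the marker payload, derived from the fields above (the type names are ours, not part of the Tinfoil SDK):

```typescript
// Sketch: a type and runtime guard for Tinfoil progress markers.
// Field names follow the payload shape documented above.
type SearchAction = { type: 'search'; query: string };
type OpenPageAction = { type: 'open_page'; url: string };

interface WebSearchMarker {
  type: 'tinfoil.web_search_call';
  item_id: string;
  status: 'in_progress' | 'completed' | 'failed' | 'blocked';
  action: SearchAction | OpenPageAction;
  error?: { code: 'blocked_by_safety_filter' | 'tool_error' };
  sources?: { url: string; title: string }[];
}

function isWebSearchMarker(value: unknown): value is WebSearchMarker {
  const v = value as Partial<WebSearchMarker> | null;
  return !!v
    && v.type === 'tinfoil.web_search_call'
    && typeof v.item_id === 'string'
    && typeof v.status === 'string'
    && typeof v.action === 'object' && v.action !== null;
}
```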

Consuming markers on the client

A single regex is enough to extract and strip markers. The leading and trailing newlines are absorbed so the text before and after a marker collapses seamlessly:
\n?<tinfoil-event>[\s\S]*?</tinfoil-event>\n?
const MARKER_RE = /\n?<tinfoil-event>([\s\S]*?)<\/tinfoil-event>\n?/g;

// For each chat.completion.chunk you receive from the SSE stream,
// run its delta.content through this helper before rendering it.
function onContentDelta(content: string) {
  const visible = content.replace(MARKER_RE, (_full, payload) => {
    try {
      const evt = JSON.parse(payload);
      if (evt.type === 'tinfoil.web_search_call') {
        renderSearchProgress(evt); // your own progress UI
      }
    } catch {
      // Ignore malformed markers; treat them as plain text.
    }
    return '';
  });
  if (visible) appendAssistantText(visible);
}
The marker channel is Tinfoil-specific. If you want a portable progress UI, prefer the Responses API’s native response.web_search_call.* events instead.

Optional: the _tinfoil sidecar

On the Responses API, web_search_call.status is restricted to OpenAI’s enum: in_progress, searching, completed, failed. Tinfoil exposes richer information through an optional vendor-extension field named _tinfoil that rides alongside the envelope:
{
  "type": "web_search_call",
  "id": "ws_...",
  "status": "failed",
  "action": { "type": "search", "query": "..." },
  "_tinfoil": {
    "status": "blocked",
    "error": { "code": "blocked_by_safety_filter" }
  }
}
Rules:
  • _tinfoil is only present on failed calls. Successful calls omit it entirely.
  • _tinfoil.status is only present when the unfiltered status differs from the envelope status. Today that means it appears when a call was blocked by safety filters.
  • _tinfoil.error.code is present on every failed call. Known codes are blocked_by_safety_filter and tool_error.
Strict OpenAI SDKs treat unknown object keys as no-ops, so _tinfoil is invisible unless you read it explicitly. The sidecar appears on:
  • the non-streaming response.output[*] web_search_call items,
  • the streaming response.output_item.done.item for a web_search_call.
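Folding the envelope status and the sidecar into a single display status can be done with a small helper. A sketch, assuming the rules above:

```typescript
// Sketch: derive a client-facing status for a web_search_call item,
// reading the optional _tinfoil sidecar to distinguish blocked calls.
interface WebSearchCall {
  status: 'in_progress' | 'searching' | 'completed' | 'failed';
  _tinfoil?: { status?: 'blocked'; error?: { code: string } };
}

function displayStatus(call: WebSearchCall): 'in_progress' | 'completed' | 'failed' | 'blocked' {
  if (call.status === 'failed') {
    // A blocked call still reports "failed" in the spec-visible status;
    // the sidecar carries the unfiltered status.
    return call._tinfoil?.status === 'blocked' ? 'blocked' : 'failed';
  }
  return call.status === 'searching' ? 'in_progress' : call.status;
}
```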

Optional: usage reporting

To have aggregated token usage surfaced as an HTTP response header or trailer, send:
X-Tinfoil-Request-Usage-Metrics: true
Tinfoil returns the usage summary in X-Tinfoil-Usage-Metrics:
  • as a response header for non-streaming requests,
  • as an HTTP trailer for streaming requests.
If you also want the final usage-bearing chat.completion.chunk on streaming Chat Completions, request it through standard OpenAI stream_options:
{
  "stream": true,
  "stream_options": { "include_usage": true }
}

PII protection

The pii_check_options field prevents search queries containing sensitive personally identifiable information from being sent to Exa. When PII is detected, the query is blocked and the model responds without search results for that turn. Blocked PII types:
  • Government IDs: social security numbers, tax IDs, passport numbers, driver’s licenses, voter IDs, national IDs
  • Financial: bank account numbers, credit card numbers, IBANs
  • Contact: personal email addresses, personal phone numbers, home addresses
  • Linkable identifiers: VINs, license plates, device serial numbers
  • Identifying combinations: name + date of birth, name + address, or other combinations that identify a specific person
Not blocked:
  • Names alone
  • Dates of birth alone
  • Business or corporate contact information
  • Public figures’ public information
Generic descriptions without identifying details are allowed. A blocked search surfaces as:
  • On the Responses API, a web_search_call output item with status: "failed" and a _tinfoil.status: "blocked" sidecar.
  • On Chat Completions with progress markers enabled, a marker with status: "blocked" and error.code: "blocked_by_safety_filter".

Prompt-injection protection

The prompt_injection_check_options field runs a safeguard model over each search snippet and fetched page before the content is handed back to the responding model. Results and pages that contain instructions aimed at hijacking the model (for example “ignore previous instructions”, embedded tool-use directives, or credential-exfiltration prompts) are dropped. When every result for a query is filtered out, the model sees an empty result set for that search and answers without web grounding. Fetch failures due to injection filtering surface as status: "failed" on the corresponding web_search_call item. Prompt-injection filtering is opt-in per request. Omit the field to skip the check.

Multi-turn conversations (Responses API)

The Responses API supports previous_response_id to continue a conversation across turns. Prior search results and fetched pages are carried forward so the model can build on them:
{
  "model": "<MODEL_NAME>",
  "input": "Can you tell me more about the first result?",
  "previous_response_id": "resp_abc12345",
  "tools": [{ "type": "web_search" }]
}
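In practice this means threading each response's id into the next request. A sketch of a small conversation tracker (illustrative, not part of the Tinfoil SDK):

```typescript
// Sketch: carry previous_response_id across turns so each request
// continues the conversation. Feed it the id of each completed
// response before building the next request.
function makeConversation(model: string) {
  let previousResponseId: string | undefined;
  return {
    nextRequest(input: string) {
      return {
        model,
        input,
        tools: [{ type: 'web_search' }],
        ...(previousResponseId ? { previous_response_id: previousResponseId } : {}),
      };
    },
    record(responseId: string) {
      previousResponseId = responseId;
    },
  };
}
```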