Documentation Index
Fetch the complete documentation index at: https://docs.tinfoil.sh/llms.txt
Use this file to discover all available pages before exploring further.
Available models and capabilities are subject to change. If you require SLA guarantees, specific model availability, or long-term production usage, please contact us to discuss your needs. We’re also happy to work with you to add support for your desired model.
Chat models use the OpenAI chat completions API.
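As a minimal sketch of what an OpenAI-style chat completions call looks like, the request below is built with only the Python standard library. The endpoint URL and the TINFOIL_API_KEY environment variable name are assumptions for illustration; check the documentation index above for the actual endpoint and authentication details. The request body itself follows the standard OpenAI chat completions format.

```python
# Sketch of an OpenAI-style chat completions request to a Tinfoil model.
# The API URL and env-var name below are assumptions, not confirmed values.
import json
import os
import urllib.request

API_URL = "https://api.tinfoil.sh/v1/chat/completions"  # assumed endpoint


def build_payload(model: str, prompt: str) -> dict:
    """Build a standard OpenAI chat completions request body."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }


def chat(model: str, prompt: str) -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme: bearer token from an env var.
            "Authorization": f"Bearer {os.environ['TINFOIL_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

In practice, any OpenAI-compatible SDK can send the same request by pointing its base URL at the Tinfoil endpoint, since only the host differs from a standard chat completions call.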

DeepSeek V4 Pro
deepseek-v4-pro

Parameters: 1.6T total (49B activated)
Context: 800K tokens on Tinfoil
Strengths: Long-context reasoning, coding, math, agentic tasks, and efficient MoE inference with hybrid attention
Structured Outputs: Structured response formatting support
Best for: Very long-context reasoning, complex coding and math, and agentic workflows that need a large working context
Configuration repo: tinfoilsh/confidential-deepseek-v4-pro
Long Context: The upstream model card describes one-million-token context support; this Tinfoil deployment is configured for an 800K-token context window.
GLM 5.1
glm-5-1

Parameters: 754B (40B active)
Context: 200K tokens
Strengths: State-of-the-art agentic engineering, long-horizon tool use, sustained reasoning over hundreds of iterations
Structured Outputs: Structured response formatting support
Best for: Agentic engineering tasks, complex coding workflows, repo-level code generation, and long-running tool-use sessions
Configuration repo: tinfoilsh/confidential-glm-5-1
Kimi K2.6
kimi-k2-6

Parameters: 1T total (32B activated)
Context: 256K tokens
Strengths: Long-horizon coding, image and video understanding, generates code and interfaces from visual inputs, large-scale agent orchestration, strong tool calling
Structured Outputs: Structured response formatting support
Best for: Agentic coding, design-to-code workflows, multimodal applications, and long-running tool-based tasks that benefit from strong reasoning
Configuration repo: tinfoilsh/confidential-kimi-k2-6
Vision + Language: Supports text, image, and video inputs with native reasoning and tool calling for agentic workflows.
Gemma 4 31B
gemma4-31b

Parameters: 31B
Context: 256K tokens
Strengths: Built-in thinking mode, image understanding, native function calling, multilingual support for 35+ languages
Structured Outputs: Structured response formatting support
Best for: Reasoning tasks, coding, image analysis, and agentic workflows with tool calling
Configuration repo: tinfoilsh/confidential-gemma4-31b
Vision + Language: Processes text and image inputs. Features step-by-step reasoning with configurable thinking mode.
GPT-OSS 120B
gpt-oss-120b

Parameters: 117B (5.1B active)
Context: 131K tokens
Strengths: Configurable reasoning effort levels, full chain-of-thought access, built-in capabilities including function calling, web browsing, and Python code execution
Structured Outputs: Structured response formatting support
Best for: Production use cases requiring configurable reasoning and tool use
Configuration repo: tinfoilsh/confidential-gpt-oss-120b

Llama 3.3 70B
llama3-3-70b

Parameters: 70B
Context: 128K tokens
Strengths: Multilingual, dialogue-optimized, function calling
Structured Outputs: Structured response formatting support
Best for: Conversational AI applications and complex dialogue systems
Configuration repo: tinfoilsh/confidential-llama-mistral-qwen-turbo
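Several of the models above advertise function calling. In the OpenAI chat completions format this is expressed through the tools parameter; the sketch below shows the shape of such a request. The get_weather function is a hypothetical example, not part of any Tinfoil API.

```python
# Sketch: declaring a tool for a model that supports function calling.
# The tool schema follows the OpenAI chat completions "tools" convention;
# the function name and parameters are illustrative only.
def build_tool_request(model: str, prompt: str) -> dict:
    """Chat completions body that offers the model one callable tool."""
    weather_tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example function
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool],
    }
```

When the model decides to use the tool, the response carries a tool_calls entry with the chosen function name and JSON-encoded arguments, which the caller executes before sending the result back in a follow-up message.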
Structured Outputs: All chat models support structured outputs for reliable data extraction and API integration. Full JSON schema validation available in Python, Node, and Go SDKs. See the Structured Outputs Guide for implementation examples.
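As one way structured outputs can be requested, the sketch below uses the OpenAI-style json_schema response format for a small contact-extraction task. The schema and field names are illustrative; see the Structured Outputs Guide for the exact forms each Tinfoil SDK accepts.

```python
# Sketch: requesting schema-validated output via the OpenAI-style
# "response_format" parameter. Schema and fields are illustrative only.
import json


def build_extraction_request(model: str, text: str) -> dict:
    """Chat completions body asking for a JSON object matching a schema."""
    schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
        },
        "required": ["name", "email"],
        "additionalProperties": False,
    }
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": f"Extract the contact details: {text}"},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "contact", "strict": True, "schema": schema},
        },
    }


def parse_contact(message_content: str) -> dict:
    """The reply's message content is a JSON string matching the schema."""
    return json.loads(message_content)
```

With strict schema validation, the returned message content can be parsed directly into typed application data without defensive post-processing.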