Documentation Index

Fetch the complete documentation index at: https://docs.tinfoil.sh/llms.txt

Use this file to discover all available pages before exploring further.

Available models and capabilities are subject to change. If you require SLA guarantees, specific model availability, or long-term production usage, please contact us to discuss your needs. We’re also happy to work with you to add support for your desired model.
Chat models use the OpenAI chat completions API.
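Because the API is OpenAI-compatible, any OpenAI-style client can talk to it once pointed at the Tinfoil endpoint. The sketch below builds a minimal chat completions request body; the base URL shown is an assumption for illustration (check your Tinfoil dashboard or docs for the actual endpoint), and the model ID comes from the catalog below.

```python
import json

# Assumed endpoint for illustration -- verify against the Tinfoil docs.
BASE_URL = "https://inference.tinfoil.sh/v1"

# An OpenAI-style chat completions request body. An OpenAI SDK client
# configured with base_url=BASE_URL and your Tinfoil API key would send
# this payload to POST {BASE_URL}/chat/completions.
request_body = {
    "model": "llama3-3-70b",  # any model ID from the catalog below
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize confidential computing in one sentence."},
    ],
    "max_tokens": 256,
}

print(json.dumps(request_body, indent=2))
```

The same payload shape works for every chat model listed on this page; only the `model` field changes.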

DeepSeek
DeepSeek V4 Pro
deepseek-v4-pro
Parameters: 1.6T total (49B activated)
Context: 800K tokens on Tinfoil
Strengths: Long-context reasoning, coding, math, agentic tasks, and efficient MoE inference with hybrid attention
Structured Outputs: Structured response formatting support
Best for: Very long-context reasoning, complex coding and math, and agentic workflows that need a large working context
Configuration repo: tinfoilsh/confidential-deepseek-v4-pro
Long Context: The upstream model card describes one-million-token context support; this Tinfoil deployment is configured for an 800K-token context window.

Z.AI
GLM-5.1
glm-5-1
Parameters: 754B (40B active)
Context: 200K tokens
Strengths: State-of-the-art agentic engineering, long-horizon tool use, sustained reasoning over hundreds of iterations
Structured Outputs: Structured response formatting support
Best for: Agentic engineering tasks, complex coding workflows, repo-level code generation, and long-running tool-use sessions
Configuration repo: tinfoilsh/confidential-glm-5-1

Moonshot
Kimi K2.6
kimi-k2-6
Parameters: 1T total (32B activated)
Context: 256K tokens
Strengths: Long-horizon coding, image and video understanding, generating code and interfaces from visual inputs, large-scale agent orchestration, strong tool calling
Structured Outputs: Structured response formatting support
Best for: Agentic coding, design-to-code workflows, multimodal applications, and long-running tool-based tasks that benefit from strong reasoning
Configuration repo: tinfoilsh/confidential-kimi-k2-6
Vision + Language: Supports text, image, and video inputs with native reasoning and tool calling for agentic workflows.

Google DeepMind
Gemma 4 31B
gemma4-31b
Parameters: 31B
Context: 256K tokens
Strengths: Built-in thinking mode, image understanding, native function calling, multilingual support for 35+ languages
Structured Outputs: Structured response formatting support
Best for: Reasoning tasks, coding, image analysis, and agentic workflows with tool calling
Configuration repo: tinfoilsh/confidential-gemma4-31b
Vision + Language: Processes text and image inputs. Features step-by-step reasoning with configurable thinking mode.

OpenAI
GPT-OSS 120B
gpt-oss-120b
Parameters: 117B (5.1B active)
Context: 131K tokens
Strengths: Configurable reasoning effort levels, full chain-of-thought access, built-in capabilities including function calling, web browsing, and Python code execution
Structured Outputs: Structured response formatting support
Best for: Production use cases requiring configurable reasoning and tool use
Configuration repo: tinfoilsh/confidential-gpt-oss-120b

Llama
Llama 3.3 70B
llama3-3-70b
Parameters: 70B
Context: 128K tokens
Strengths: Multilingual, dialogue-optimized, function calling
Structured Outputs: Structured response formatting support
Best for: Conversational AI applications and complex dialogue systems
Configuration repo: tinfoilsh/confidential-llama-mistral-qwen-turbo
Structured Outputs: All chat models support structured outputs for reliable data extraction and API integration. Full JSON schema validation available in Python, Node, and Go SDKs. See the Structured Outputs Guide for implementation examples.
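A structured-outputs request follows the OpenAI `response_format` convention with a JSON schema. The sketch below is illustrative only: the schema name, fields, and model choice are assumptions, and the Structured Outputs Guide is the authoritative reference for the SDK-level API.

```python
import json

# Illustrative extraction schema -- the field names here are made up
# for this example, not taken from the Tinfoil docs.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["name", "email"],
    "additionalProperties": False,
}

# OpenAI-style chat completions body asking the model to reply with
# JSON that validates against the schema above.
request_body = {
    "model": "llama3-3-70b",
    "messages": [
        {"role": "user",
         "content": "Extract the contact info: Ada Lovelace <ada@example.com>"},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "contact", "strict": True, "schema": schema},
    },
}

print(json.dumps(request_body, indent=2))
```

With this request shape, the completion's message content is a JSON string you can parse and validate against the same schema on the client side.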