Chat models

Available models and capabilities are subject to change. If you require SLA guarantees, specific model availability, or long-term production usage, please contact us to discuss your needs. We’re also happy to work with you to add support for your desired model.

Chat models use the OpenAI chat completions API.

DeepSeek V4 Pro

deepseek-v4-pro

Parameters: 1.6T total (49B activated)Context: 800K tokens on TinfoilStrengths: Long-context reasoning, coding, math, agentic tasks, and efficient MoE inference with hybrid attentionStructured Outputs: Structured response formatting supportBest for: Very long-context reasoning, complex coding and math, and agentic workflows that need a large working contextConfiguration repo: tinfoilsh/confidential-deepseek-v4-pro

Long Context: The upstream model card describes one-million-token context support; this Tinfoil deployment is configured for an 800K-token context window.

GLM-5.1

glm-5-1

Parameters: 754B (40B active)Context: 200K tokensStrengths: State-of-the-art agentic engineering, long-horizon tool use, sustained reasoning over hundreds of iterationsStructured Outputs: Structured response formatting supportBest for: Agentic engineering tasks, complex coding workflows, repo-level code generation, and long-running tool-use sessionsConfiguration repo: tinfoilsh/confidential-glm-5-1

Kimi K2.6

kimi-k2-6

Parameters: 1T total (32B activated)Context: 256K tokensStrengths: Long-horizon coding, image and video understanding, generates code and interfaces from visual inputs, large-scale agent orchestration, strong tool callingStructured Outputs: Structured response formatting supportBest for: Agentic coding, design-to-code workflows, multimodal applications, and long-running tool-based tasks that benefit from strong reasoningConfiguration repo: tinfoilsh/confidential-kimi-k2-6

Vision + Language: Supports text, image, and video inputs with native reasoning and tool calling for agentic workflows.

Gemma 4 31B

gemma4-31b

Parameters: 31BContext: 256K tokensStrengths: Built-in thinking mode, image understanding, native function calling, multilingual support for 35+ languagesStructured Outputs: Structured response formatting supportBest for: Reasoning tasks, coding, image analysis, and agentic workflows with tool callingConfiguration repo: tinfoilsh/confidential-gemma4-31b

Vision + Language: Processes text and image inputs. Features step-by-step reasoning with configurable thinking mode.

GPT-OSS 120B

gpt-oss-120b

Parameters: 117B (5.1B active)Context: 131K tokensStrengths: Configurable reasoning effort levels, full chain-of-thought access, built-in capabilities including function calling, web browsing, and Python code executionStructured Outputs: Structured response formatting supportBest for: Production use cases requiring configurable reasoning and tool useConfiguration repo: tinfoilsh/confidential-gpt-oss-120b

Llama 3.3 70B

llama3-3-70b

Parameters: 70BContext: 128K tokensStrengths: Multilingual, dialogue-optimized, function callingStructured Outputs: Structured response formatting supportBest for: Conversational AI applications and complex dialogue systemsConfiguration repo: tinfoilsh/confidential-llama-mistral-qwen-turbo

Structured Outputs: All chat models support structured outputs for reliable data extraction and API integration. Full JSON schema validation available in Python, Node, and Go SDKs. See the Structured Outputs Guide for implementation examples.

Getting Started

Model catalog

Tinfoil SDKs

Tinfoil Containers

Guides

Verification & Attestation

Tutorials

Admin API

Resources

Getting Started

Model catalog

Tinfoil SDKs

Tinfoil Containers

Guides

Verification & Attestation

Tutorials

Admin API

Resources

Documentation Index