Skip to main content
Available models and capabilities are subject to change. If you require SLA guarantees, specific model availability, or long-term production usage, please contact us to discuss your needs. We’re also happy to work with you to add support for your desired model.
Chat models use the OpenAI chat completions API.

Z.AI
GLM-5.1
glm-5-1
Parameters: MoE (FP8 quantized)Context: 202K tokensStrengths: State-of-the-art agentic engineering, long-horizon tool use, sustained reasoning over hundreds of iterations, function callingStructured Outputs: Structured response formatting supportBest for: Agentic engineering tasks, complex coding workflows, repo-level code generation, and long-running tool-use sessionsConfiguration repo: tinfoilsh/confidential-glm-5-1

Moonshot
Kimi K2.5
kimi-k2-5
Parameters: 1T total (32B activated)Context: 256K tokensStrengths: Unified vision and text processing, image analysis, generates code from screenshots and mockups, parallel task execution across specialized sub-agentsStructured Outputs: Structured response formatting supportBest for: Building applications that process visual inputs, converting designs to code, orchestrating complex workflows with multiple parallel agentsConfiguration repo: tinfoilsh/confidential-kimi-k2-5
Vision + Language: Jointly trained on images and text. Handles visual reasoning tasks and can spawn coordinated sub-agents for complex problems.

Google DeepMind
Gemma 4 31B
gemma4-31b
Parameters: 31BContext: 256K tokensStrengths: Built-in thinking mode, image understanding, native function calling, multilingual support for 140+ languagesStructured Outputs: Structured response formatting supportBest for: Reasoning tasks, coding, image analysis, and agentic workflows with tool callingConfiguration repo: tinfoilsh/confidential-gemma4-31b
Vision + Language: Processes text and image inputs. Features step-by-step reasoning with configurable thinking mode.

OpenAI
GPT-OSS 120B
gpt-oss-120b
Parameters: 117BContext: 128K tokensStrengths: Configurable reasoning effort levels, full chain-of-thought access, built-in capabilities including function calling, web browsing, and Python code executionStructured Outputs: Structured response formatting supportBest for: Production use cases requiring configurable reasoning and tool useConfiguration repo: tinfoilsh/confidential-gpt-oss-120b

Llama
Llama 3.3 70B
llama3-3-70b
Context: 128K tokensStrengths: Multilingual, dialogue-optimized, function callingStructured Outputs: Structured response formatting supportBest for: Conversational AI applications and complex dialogue systemsConfiguration repo: tinfoilsh/confidential-llama-mistral-qwen-turbo
Structured Outputs: All chat models support structured outputs for reliable data extraction and API integration. Full JSON schema validation available in Python, Node, and Go SDKs. See the Structured Outputs Guide for implementation examples.