Reasoning effort
Reasoning models can spend extra tokens thinking through a problem before they answer. The OpenAI-compatiblereasoning_effort parameter controls how much of that thinking the model does. Higher effort generally improves quality on hard tasks at the cost of more latency and output tokens.
Pass reasoning_effort as a string on the chat completions request. Use a reasoning-capable model and a value it supports.
Swift’s
ReasoningEffort enum covers none, minimal, low, medium, and high; pass other values with .customValue("xhigh"). Rust’s ReasoningEffort enum covers none, minimal, low, medium, high, and xhigh.Supported values per model
The accepted values differ by model. Sending an unsupported value returns a400 error.
| Model | Type | Supported reasoning_effort values |
|---|---|---|
deepseek-v4-pro | Chat | none, minimal, low, medium, high, xhigh, max |
glm-5-2 | Chat | none, minimal, low, medium, high, xhigh, max |
kimi-k2-6 | Chat / Vision | none, minimal, low, medium, high, xhigh, max |
gemma4-31b | Chat / Vision | none, minimal, low, medium, high, xhigh, max |
qwen3-vl-30b | Vision | none, minimal, low, medium, high, xhigh, max |
gpt-oss-120b | Chat | low, medium, high |
gpt-oss-safeguard-120b | Safety | low, medium, high |
none disables reasoning and effort increases up to max. The gpt-oss models use OpenAI’s Harmony response format, which defines only low, medium, and high; sending none, minimal, xhigh, or max to these models returns a 400 error.
llama3-3-70b is not a reasoning model. It accepts the parameter without error but does not produce a reasoning trace.
Reading the reasoning trace
The model’s thinking is returned in thereasoning field of the response message, separate from the final answer in content. Higher effort produces a longer trace.
Available models and their supported values can change. Query the models endpoint to see which models report
"reasoning": true.
