Reasoning effort

Reasoning models can spend extra tokens thinking through a problem before they answer. The OpenAI-compatible reasoning_effort parameter controls how much of that thinking the model does. Higher effort generally improves quality on hard tasks at the cost of higher latency and more output tokens.

Pass reasoning_effort as a string on the chat completions request. Use a reasoning-capable model and a value that model supports.

from tinfoil import TinfoilAI

client = TinfoilAI(api_key="<YOUR_API_KEY>")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    reasoning_effort="medium",
    messages=[
        {"role": "user", "content": "What is 17 * 23? Think step by step."}
    ],
)

print(response.choices[0].message.content)

In Swift, the ReasoningEffort enum covers none, minimal, low, medium, and high; pass any other value with .customValue, for example .customValue("xhigh"). In Rust, the ReasoningEffort enum covers none, minimal, low, medium, high, and xhigh.

Supported values per model

The accepted values differ by model. Sending an unsupported value returns a 400 error.

Model	Type	Supported `reasoning_effort` values
`deepseek-v4-pro`	Chat	`none`, `minimal`, `low`, `medium`, `high`, `xhigh`, `max`
`glm-5-2`	Chat	`none`, `minimal`, `low`, `medium`, `high`, `xhigh`, `max`
`kimi-k2-6`	Chat / Vision	`none`, `minimal`, `low`, `medium`, `high`, `xhigh`, `max`
`gemma4-31b`	Chat / Vision	`none`, `minimal`, `low`, `medium`, `high`, `xhigh`, `max`
`qwen3-vl-30b`	Vision	`none`, `minimal`, `low`, `medium`, `high`, `xhigh`, `max`
`gpt-oss-120b`	Chat	`low`, `medium`, `high`
`gpt-oss-safeguard-120b`	Safety	`low`, `medium`, `high`

On the standard scale, none disables reasoning and effort increases from minimal up to max. The gpt-oss models use OpenAI’s Harmony response format, which defines only low, medium, and high; sending none, minimal, xhigh, or max to these models returns a 400 error.

The llama3-3-70b model is not a reasoning model. It accepts the parameter without error but does not produce a reasoning trace.
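To avoid a round trip that ends in a 400, the table above can be mirrored in a small client-side check. The sketch below is an illustration only and is not part of the Tinfoil SDK: SUPPORTED_EFFORT, check_effort, and nearest_supported_effort are hypothetical names, and the hard-coded values are copied from the table, so they go stale if the supported sets change.

```python
# Hypothetical client-side guard. Values are copied from the table above,
# not fetched from the API, so they can go stale.
STANDARD_SCALE = ("none", "minimal", "low", "medium", "high", "xhigh", "max")
HARMONY_SCALE = ("low", "medium", "high")

SUPPORTED_EFFORT = {
    "deepseek-v4-pro": STANDARD_SCALE,
    "glm-5-2": STANDARD_SCALE,
    "kimi-k2-6": STANDARD_SCALE,
    "gemma4-31b": STANDARD_SCALE,
    "qwen3-vl-30b": STANDARD_SCALE,
    "gpt-oss-120b": HARMONY_SCALE,
    "gpt-oss-safeguard-120b": HARMONY_SCALE,
}


def check_effort(model: str, effort: str) -> str:
    """Return effort unchanged, or raise ValueError if the model would reject it."""
    supported = SUPPORTED_EFFORT.get(model)
    if supported is None:
        raise ValueError(f"no known reasoning_effort values for {model!r}")
    if effort not in supported:
        raise ValueError(
            f"{model!r} does not accept reasoning_effort={effort!r}; "
            f"use one of {supported}"
        )
    return effort


def nearest_supported_effort(model: str, effort: str) -> str:
    """Map effort to the closest accepted value by position on the standard scale."""
    supported = SUPPORTED_EFFORT[model]
    if effort in supported:
        return effort
    rank = STANDARD_SCALE.index(effort)  # raises ValueError for unknown values
    return min(supported, key=lambda v: abs(STANDARD_SCALE.index(v) - rank))
```

With this sketch, check_effort("gpt-oss-120b", "xhigh") raises before any request is sent, while nearest_supported_effort("gpt-oss-120b", "xhigh") falls back to "high". The fallback is a design choice rather than documented behavior: it turns none into low for the gpt-oss models, which still reason, so prefer the strict check when disabling reasoning matters.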

Reading the reasoning trace

The model’s thinking is returned in the reasoning field of the response message, separate from the final answer in content. Higher effort produces a longer trace.

# Reuses the client created in the first example.
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    reasoning_effort="high",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)

message = response.choices[0].message
# The thinking trace and the final answer arrive in separate fields.
print("Reasoning:", message.reasoning)
print("Answer:", message.content)
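Code that runs against several models may also meet messages with no trace, for example from a model that does not reason. The following is a minimal sketch of a defensive reader. It assumes the reasoning attribute may be missing, None, or empty in that case, which this page does not specify; split_message is a hypothetical helper, and the SimpleNamespace objects are made-up stand-ins for response.choices[0].message.

```python
from types import SimpleNamespace


def split_message(message):
    """Return (reasoning, content); reasoning is None when no trace is present."""
    # Assumption: a missing trace may show up as an absent attribute,
    # None, or an empty string. All three are normalized to None.
    reasoning = getattr(message, "reasoning", None)
    return (reasoning or None, message.content)


# Made-up stand-ins shaped like response.choices[0].message.
with_trace = SimpleNamespace(reasoning="17 * 20 + 17 * 3 = 391", content="391")
without_trace = SimpleNamespace(content="391")

print(split_message(with_trace))
print(split_message(without_trace))
```

Returning None for a missing trace lets callers branch with a plain `if reasoning:` instead of catching AttributeError.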

Available models and their supported values can change. Query the models endpoint to see which models report "reasoning": true.
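Once the model list is fetched, picking out the reasoning-capable entries is a simple filter. The sketch below assumes each entry has already been decoded into a dict with an "id" key and an optional boolean "reasoning" key; the exact response shape of the models endpoint is not described here, so treat both the shape and the reasoning_model_ids helper as hypothetical. The sample entries are invented for illustration.

```python
def reasoning_model_ids(models):
    """Return the ids of entries that report "reasoning": true."""
    # Entries without the key, or with any value other than True, are skipped.
    return [m["id"] for m in models if m.get("reasoning") is True]


# Invented sample data; real entries come from the models endpoint.
sample_models = [
    {"id": "example-reasoner", "reasoning": True},
    {"id": "example-chat", "reasoning": False},
    {"id": "example-embedder"},
]

print(reasoning_model_ids(sample_models))
```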