Structured Outputs
Structured outputs constrain model responses to a specific format, such as a JSON schema, a regex pattern, or a predefined set of choices. Tinfoil uses vLLM’s guided decoding, which filters out next-token candidates that would violate the format, so the generated text conforms to the constraint without format-repair post-processing.
Use response_format with the json_schema type for JSON outputs, or structured_outputs for choice and regex constraints. In Python, pass structured_outputs via extra_body. In JavaScript, set it directly on the request body, because the OpenAI Node SDK does not support extra_body.
Benefits
- Format Enforcement: Token-level filtering ensures outputs match your exact format
- Type Safety: Works with Pydantic (Python), Zod (TypeScript), and native types in Go
- No Post-Processing: Outputs conform to the requested format as generated, provided generation is not cut off by a token limit
- Constrained Sampling: At each step, only tokens that keep the output valid can be selected
- Multiple Backends: Supports xgrammar and guidance backends
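To make the token-level filtering concrete, here is a toy sketch of the idea behind guided decoding for a choice constraint. It is not vLLM's implementation: the vocabulary, the scores, and the helper names allowed_next_tokens and constrained_greedy_decode are all hypothetical, and real backends work on tokenizer vocabularies and compiled grammars rather than string prefixes.

```python
import math


def allowed_next_tokens(prefix: str, choices: list[str], vocab: list[str]) -> set[str]:
    """Return the tokens that keep `prefix + token` a prefix of some choice."""
    return {
        tok for tok in vocab
        if any(choice.startswith(prefix + tok) for choice in choices)
    }


def constrained_greedy_decode(scores: dict[str, float], choices: list[str]) -> str:
    """Greedy decoding over toy scores, masking tokens that break the constraint."""
    vocab = list(scores)
    output = ""
    while output not in choices:
        allowed = allowed_next_tokens(output, choices, vocab)
        if not allowed:
            raise ValueError("vocabulary cannot spell any of the choices")
        # Disallowed tokens are masked to -inf, so they can never be selected.
        masked = {tok: (s if tok in allowed else -math.inf) for tok, s in scores.items()}
        output += max(masked, key=masked.get)
    return output


# Toy scores: the unconstrained favourite is "great", which is not a valid choice.
toy_scores = {"great": 5.0, "neg": 2.0, "pos": 1.5, "itive": 1.0, "ative": 0.5}
result = constrained_greedy_decode(toy_scores, ["positive", "negative"])
print(result)  # negative
```

The highest-scoring token is never emitted because it cannot lead to a valid choice. The output is still driven by the model's scores among the valid tokens, which is why the constraint controls the format but not the answer.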
Quick Start
Here are basic examples for each structured output type:
Choice
Restrict output to a predefined list:
from tinfoil import TinfoilAI

client = TinfoilAI(api_key="<YOUR_API_KEY>")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
    # structured_outputs is passed through extra_body in Python
    extra_body={"structured_outputs": {"choice": ["positive", "negative"]}},
)
print(response.choices[0].message.content)
Regex
Enforce regex patterns for formatted outputs:
from tinfoil import TinfoilAI

client = TinfoilAI(api_key="<YOUR_API_KEY>")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Generate an example email address for Alan Turing, who works in Enigma. End in .com."}
    ],
    # The whole output must match this pattern
    extra_body={"structured_outputs": {"regex": r"\w+@\w+\.com"}},
)
print(response.choices[0].message.content)
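The same pattern can be reused on the client as a sanity check, which also helps when designing a pattern before sending it. The sketch below uses Python's re module and a hypothetical helper named matches_pattern. It assumes the decoding backend interprets this particular pattern the same way Python does, which may not hold for every regex feature.

```python
import re

EMAIL_PATTERN = r"\w+@\w+\.com"  # same pattern as in the request above


def matches_pattern(text: str, pattern: str = EMAIL_PATTERN) -> bool:
    """Return True only if the entire string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches_pattern("alan@enigma.com"))         # True
print(matches_pattern("alan.turing@enigma.com"))  # False: \w+ does not allow "."
print(matches_pattern("alan@enigma.org"))         # False: must end in .com
```

Testing a few strings this way shows what the pattern excludes. Here, for instance, dotted local parts such as alan.turing are ruled out, so the model cannot produce them.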
JSON
Use response_format with json_schema type for reliable JSON generation:
from enum import Enum

from pydantic import BaseModel
from tinfoil import TinfoilAI


class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"


class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType


client = TinfoilAI(api_key="<YOUR_API_KEY>")

# Derive the JSON schema from the Pydantic model
json_schema = CarDescription.model_json_schema()

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Output a JSON object with the brand, model, and car_type of the most iconic car from the 90's."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "car-description",
            "schema": json_schema,
        },
    },
)
print(response.choices[0].message.content)
Prompt explicitly for JSON. While the json_schema constraint enforces valid JSON structure, the model produces more reliable results when your prompt explicitly requests JSON output and describes the expected fields. For example, use “Output a JSON object with…” rather than just “Generate…”.
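One way to keep the prompt and the schema in sync is to generate the field list in the prompt from the schema itself. The following is a minimal sketch: build_json_prompt is a hypothetical helper, and car_schema is a simplified, hand-written stand-in for the output of CarDescription.model_json_schema().

```python
def build_json_prompt(task: str, schema: dict) -> str:
    """Append an explicit JSON instruction listing the schema's top-level fields."""
    fields = list(schema.get("properties", {}))
    if not fields:
        return f"{task} Output a JSON object."
    return f"{task} Output a JSON object with the fields: {', '.join(fields)}."


# Simplified schema for illustration; a real Pydantic schema also carries
# titles and $defs entries for nested types.
car_schema = {
    "type": "object",
    "properties": {
        "brand": {"type": "string"},
        "model": {"type": "string"},
        "car_type": {"type": "string"},
    },
    "required": ["brand", "model", "car_type"],
}

prompt = build_json_prompt("Describe the most iconic car from the 90's.", car_schema)
print(prompt)
```

The resulting string can be used as the user message content, so that adding a field to the schema also updates the prompt.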
Advanced Features
Whitespace Pattern Override
Customize whitespace handling in JSON decoding by combining response_format with extra_body:
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "my-schema",
            "schema": json_schema,
        },
    },
    # Whitespace allowed between JSON tokens: spaces, tabs, and newlines
    extra_body={
        "structured_outputs": {
            "whitespace_pattern": r"[ \t\n]*"
        }
    },
)
Complex Nested Schemas
Build complex nested structures with Pydantic:
import json

from pydantic import BaseModel
from tinfoil import TinfoilAI


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str


class Employee(BaseModel):
    name: str
    age: int
    email: str | None
    addresses: list[Address]


class Company(BaseModel):
    name: str
    founded_year: int
    employees: list[Employee]
    headquarters: Address


client = TinfoilAI(api_key="<YOUR_API_KEY>")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Output a JSON object for a company profile. The company is named TechCorp and has 2 employees. Include the company name, founded_year, employees (each with name, age, email, and addresses), and headquarters address."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "company",
            "schema": Company.model_json_schema(),
        },
    },
)

company_data = json.loads(response.choices[0].message.content)
print(f"Company: {company_data['name']}, Employees: {len(company_data['employees'])}")
Best Practices
Markdown-Wrapped Responses: Some models may wrap JSON responses in markdown code blocks (```json ... ```). Strip the formatting before parsing the JSON.
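A minimal sketch of that stripping step, using only the standard library. The helper name strip_markdown_fence is hypothetical, and the sample string stands in for response.choices[0].message.content. It handles a single fence around the whole response and returns unfenced text unchanged.

```python
import json
import re

# Matches three backticks, an optional language tag, the payload,
# and three closing backticks around the whole string.
_FENCE_RE = re.compile(r"^\s*`{3}[\w-]*[ \t]*\n(.*?)\n?[ \t]*`{3}\s*$", re.DOTALL)


def strip_markdown_fence(text: str) -> str:
    """Return the payload inside a Markdown code fence, or the text unchanged."""
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text.strip()


fence = "`" * 3  # built at runtime to keep this example readable
wrapped = f'{fence}json\n{{"brand": "Toyota", "model": "Supra"}}\n{fence}'
data = json.loads(strip_markdown_fence(wrapped))
print(data["model"])  # Supra
```

Because unfenced input passes through untouched, the helper can be applied to every response before json.loads.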
Use Low Temperature for Deterministic Outputs
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    temperature=0.1,  # low temperature reduces run-to-run variation
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "my-schema",
            "schema": schema,
        },
    },
)
Validate Responses
from pydantic import ValidationError

try:
    # MySchema is the Pydantic model whose schema was sent in response_format
    parsed = MySchema.model_validate_json(response.choices[0].message.content)
except ValidationError as e:
    print(f"Validation failed: {e}")
Enable Streaming for Large Responses
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "my-schema",
            "schema": schema,
        },
    },
    stream=True,
)

# Print each content fragment as it arrives
for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
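Individual chunks of a streamed JSON response are fragments, so they generally cannot be parsed on their own. A common approach is to accumulate the pieces and parse once the stream ends. The sketch below simulates a stream with stand-in objects instead of calling the API; collect_stream and fake_chunk are hypothetical helpers, and the chunk contents are made up for illustration.

```python
import json
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Concatenate the delta.content pieces of a chat-completion stream."""
    parts = []
    for chunk in chunks:
        # Skip chunks that carry no choices or no content.
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:
            parts.append(piece)
    return "".join(parts)


def fake_chunk(content):
    """Build a stand-in object with the same attribute path as a real chunk."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Simulated stream: no API call is made here.
stream = [
    fake_chunk('{"brand": "Maz'),
    fake_chunk('da", "model"'),
    fake_chunk(': "MX-5"}'),
    fake_chunk(None),
]
full_text = collect_stream(stream)
car = json.loads(full_text)  # parse only after the stream is complete
print(car["brand"], car["model"])  # Mazda MX-5
```

With a real request, the object returned by client.chat.completions.create(..., stream=True) would take the place of the simulated list.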