# Structured Outputs
Structured outputs ensure that model responses match specific formats like JSON schemas, regex patterns, or predefined choices. Tinfoil uses vLLM’s guided decoding to constrain outputs by filtering next-token predictions, guaranteeing valid formats without post-processing.
Tinfoil supports structured outputs through vLLM's guided decoding parameters: `guided_json`, `guided_choice`, `guided_regex`, and `guided_grammar`.
## Benefits
- Format Enforcement: Token-level filtering ensures outputs match your exact format
- Type Safety: Works with Pydantic (Python), Zod (TypeScript), and native types in Go
- No Post-Processing: Outputs are guaranteed valid
- Deterministic: Next-token prediction is constrained to produce only valid tokens
- Multiple Backends: Supports the `outlines`, `lm-format-enforcer`, and `xgrammar` backends
## Quick Start
Here are basic examples for each structured output type:
### Guided Choice
Restrict the output to one of a predefined list of choices:
```python
from tinfoil import TinfoilAI

client = TinfoilAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
    extra_body={"guided_choice": ["positive", "negative"]},
)
print(response.choices[0].message.content)
```
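Guided decoding guarantees the result is one of the listed choices, but a cheap client-side check (a sketch below; the helper name is illustrative) catches a misconfigured choice list early:

```python
# Hypothetical helper: verify a guided_choice result client-side.
ALLOWED = ["positive", "negative"]

def check_choice(content: str, allowed: list[str]) -> str:
    # Guided decoding should guarantee membership; this assertion
    # surfaces misconfiguration (e.g. a stale choice list) early.
    if content not in allowed:
        raise ValueError(f"unexpected label: {content!r}")
    return content

print(check_choice("positive", ALLOWED))
```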
### Guided Regex
Enforce regex patterns for formatted outputs:
```python
from tinfoil import TinfoilAI

client = TinfoilAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Generate an example email address for Alan Turing, who works in Enigma. End in .com and new line."}
    ],
    extra_body={
        "guided_regex": r"\w+@\w+\.com\n",
        "stop": ["\n"],
    },
)
print(response.choices[0].message.content)
```
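The same pattern can double as a client-side validator; this sketch checks an illustrative output with `re.fullmatch`, mirroring the token-level constraint:

```python
import re

# The same pattern passed as guided_regex above.
EMAIL_PATTERN = r"\w+@\w+\.com\n"

sample = "alan@enigma.com\n"  # illustrative model output
assert re.fullmatch(EMAIL_PATTERN, sample)
```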
### Guided JSON
Use Pydantic models for type-safe JSON generation:
```python
from enum import Enum

from pydantic import BaseModel
from tinfoil import TinfoilAI


class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"


class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType


client = TinfoilAI(api_key="your-api-key")
json_schema = CarDescription.model_json_schema()

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's"}
    ],
    extra_body={"guided_json": json_schema},
)
print(response.choices[0].message.content)
```
State in your prompt that the model should generate JSON and name the fields to fill; this can improve results significantly.
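If you prefer to avoid the Pydantic dependency, `guided_json` also accepts a plain JSON Schema dict. A minimal sketch, with an illustrative schema and a sample constrained output:

```python
import json

# Hand-written JSON Schema equivalent to the CarDescription model;
# field names and enum values here are illustrative.
car_schema = {
    "type": "object",
    "properties": {
        "brand": {"type": "string"},
        "model": {"type": "string"},
        "car_type": {"type": "string", "enum": ["sedan", "SUV", "Truck", "Coupe"]},
    },
    "required": ["brand", "model", "car_type"],
}

# A response constrained by this schema parses directly:
sample = '{"brand": "Toyota", "model": "Supra", "car_type": "Coupe"}'
car = json.loads(sample)
assert car["car_type"] in car_schema["properties"]["car_type"]["enum"]
```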
### Guided Grammar
Use EBNF grammars for complex structures like SQL:
```python
from tinfoil import TinfoilAI

client = TinfoilAI(api_key="your-api-key")

# Lark-style EBNF grammar for a simplified SELECT statement
simplified_sql_grammar = """
?start: select_statement
?select_statement: "SELECT " column_list " FROM " table_name
?column_list: column_name ("," column_name)*
?table_name: identifier
?column_name: identifier
?identifier: /[a-zA-Z_][a-zA-Z0-9_]*/
"""

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Generate an SQL query to show the 'username' and 'email' from the 'users' table."}
    ],
    extra_body={"guided_grammar": simplified_sql_grammar},
)
print(response.choices[0].message.content)
```
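A grammar constraint can also be mirrored client-side as a sanity check. This sketch uses a regex equivalent of the EBNF above against an illustrative result:

```python
import re

# Regex mirroring the simplified SQL grammar: SELECT <cols> FROM <table>,
# where identifiers match /[a-zA-Z_][a-zA-Z0-9_]*/.
SQL_SHAPE = (
    r"SELECT [a-zA-Z_][a-zA-Z0-9_]*(,[a-zA-Z_][a-zA-Z0-9_]*)*"
    r" FROM [a-zA-Z_][a-zA-Z0-9_]*"
)

sample = "SELECT username,email FROM users"  # illustrative model output
assert re.fullmatch(SQL_SHAPE, sample)
```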
## Advanced Features

### Backend Selection
Choose a specific guided decoding backend:
```python
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[...],
    extra_body={
        "guided_json": json_schema,
        "guided_decoding_backend": "xgrammar",
    },
)
```
Available backends: `outlines`, `lm-format-enforcer`, and `xgrammar`.
### Whitespace Pattern Override
Customize whitespace handling in JSON decoding:
```python
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[...],
    extra_body={
        "guided_json": json_schema,
        "guided_whitespace_pattern": r"[ \t\n]*",
    },
)
```
### Complex Nested Schemas
Build complex nested structures with Pydantic:
```python
import json

from pydantic import BaseModel
from tinfoil import TinfoilAI


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str


class Employee(BaseModel):
    name: str
    age: int
    email: str | None
    addresses: list[Address]


class Company(BaseModel):
    name: str
    founded_year: int
    employees: list[Employee]
    headquarters: Address


client = TinfoilAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Create a company profile for TechCorp with 2 employees"}
    ],
    extra_body={"guided_json": Company.model_json_schema()},
)

company_data = json.loads(response.choices[0].message.content)
print(f"Company: {company_data['name']}, Employees: {len(company_data['employees'])}")
```
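If you want stdlib types downstream, the parsed dict also maps onto plain dataclasses. A sketch with an illustrative sample payload matching the `Address` part of the schema:

```python
import json
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str
    state: str
    zip_code: str


# Illustrative constrained output for the nested schema above.
sample = json.loads(
    '{"name": "TechCorp", "founded_year": 1999, "employees": [],'
    ' "headquarters": {"street": "1 Main St", "city": "Springfield",'
    ' "state": "CA", "zip_code": "90001"}}'
)
hq = Address(**sample["headquarters"])
print(hq.city)
```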
## Best Practices

### Markdown-Wrapped Responses

Some models may wrap JSON responses in Markdown code fences. Strip the formatting before parsing the JSON.
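A minimal helper for this (the function name is illustrative):

```python
import json
import re


def strip_markdown_fences(text: str) -> str:
    """Remove a ```json ... ``` wrapper if the model added one."""
    match = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text.strip(), re.DOTALL)
    return match.group(1) if match else text.strip()


wrapped = '```json\n{"ok": true}\n```'
assert json.loads(strip_markdown_fences(wrapped)) == {"ok": True}
```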
### Use a Low Temperature for Deterministic Outputs
```python
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    temperature=0.1,
    messages=[...],
    extra_body={"guided_json": schema},
)
```
### Validate Responses
```python
from pydantic import ValidationError

try:
    parsed = MySchema.model_validate_json(response.choices[0].message.content)
except ValidationError as e:
    print(f"Validation failed: {e}")
```
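When parsing can fail, a simple retry loop around generation helps. In this sketch, `generate` is a hypothetical wrapper around the chat call, stubbed here for illustration:

```python
import json


def parse_with_retry(generate, max_attempts: int = 3):
    """Call generate() until its output parses as JSON, up to max_attempts.

    `generate` is a hypothetical zero-argument wrapper around the chat
    completion call that returns the response text.
    """
    last_error = None
    for _ in range(max_attempts):
        text = generate()
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            last_error = err
    raise last_error


# Stubbed generator: fails once, then returns valid JSON.
outputs = iter(["not json", '{"name": "TechCorp"}'])
result = parse_with_retry(lambda: next(outputs))
assert result == {"name": "TechCorp"}
```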
### Enable Streaming for Large Responses
```python
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[...],
    extra_body={"guided_json": schema},
    stream=True,
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
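With structured outputs, the streamed deltas concatenate into one complete JSON document; parse only after the stream ends. This sketch simulates chunks and parses the assembled buffer:

```python
import json

# Simulated content deltas from a streamed guided_json response.
chunks = ['{"name": ', '"TechCorp", ', '"founded_year": 1999}']

# Accumulate all deltas, then parse the full document once.
buffer = "".join(chunks)
company = json.loads(buffer)
assert company["founded_year"] == 1999
```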