
Structured Outputs

Structured outputs ensure that model responses match a specific format, such as a JSON schema, a regular expression, or a predefined set of choices. Tinfoil uses vLLM’s guided decoding, which filters next-token predictions at the token level, guaranteeing valid output without any post-processing.
Tinfoil exposes this through four vLLM guided decoding parameters: guided_json, guided_choice, guided_regex, and guided_grammar.

Benefits

  • Format Enforcement: Token-level filtering ensures outputs match your exact format
  • Type Safety: Works with Pydantic (Python), Zod (TypeScript), and native types in Go
  • No Post-Processing: Outputs are guaranteed valid
  • Deterministic: Next-token prediction is constrained to produce only valid tokens
  • Multiple Backends: Choose between the outlines, lm-format-enforcer, and xgrammar backends
For complete documentation, see the vLLM Structured Outputs Guide.
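As a rough intuition for how guided decoding constrains generation, the toy sketch below masks invalid continuations at the character level. This is only an illustration: real backends such as outlines and xgrammar operate on tokenizer tokens and compile schemas or regexes into automata.

```python
# Toy illustration of constrained decoding, at the character level.
# Real guided decoding masks tokenizer tokens, not characters.

def allowed_next_chars(choices, prefix):
    """Return the characters that may follow `prefix` while keeping
    at least one valid choice reachable."""
    return {c[len(prefix)] for c in choices
            if c.startswith(prefix) and len(c) > len(prefix)}

def constrained_decode(choices, pick_char):
    """Greedily build an output, restricting each step to allowed chars.
    `pick_char` stands in for the model's next-token distribution."""
    out = ""
    while out not in choices:
        out += pick_char(allowed_next_chars(choices, out))
    return out

# A "model" that prefers 'p' when allowed, else any allowed character:
result = constrained_decode(
    ["positive", "negative"],
    lambda allowed: "p" if "p" in allowed else sorted(allowed)[0],
)
print(result)  # positive
```

However the underlying model is inclined to answer, the constraint makes any output outside the valid set unreachable.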

Quick Start

Here are basic examples for each structured output type:

Guided Choice

Restrict output to a predefined list:
from tinfoil import TinfoilAI

client = TinfoilAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
    extra_body={"guided_choice": ["positive", "negative"]}
)

print(response.choices[0].message.content)

Guided Regex

Enforce regex patterns for formatted outputs:
from tinfoil import TinfoilAI

client = TinfoilAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Generate an example email address for Alan Turing, who works in Enigma. End in .com and new line."}
    ],
    extra_body={
        "guided_regex": r"\w+@\w+\.com\n",
        "stop": ["\n"]
    }
)

print(response.choices[0].message.content)
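Although the constraint is enforced server-side, the same pattern can double as a client-side sanity check. A sketch (the sample output string is hypothetical, and because stop sequences are not included in the returned content, the trailing newline is re-appended before matching):

```python
import re

# The same pattern that was passed as guided_regex:
pattern = r"\w+@\w+\.com\n"

# Hypothetical model output; the "\n" stop sequence is not returned.
content = "alan@enigma.com"

# Re-append the newline consumed by the stop sequence, then verify.
assert re.fullmatch(pattern, content + "\n") is not None
```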

Guided JSON

Use Pydantic models for type-safe JSON generation:
from pydantic import BaseModel
from enum import Enum
from tinfoil import TinfoilAI

class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"

class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType

client = TinfoilAI(api_key="your-api-key")

json_schema = CarDescription.model_json_schema()

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Generate a JSON with the brand, model and car_type of the most iconic car from the 90's"}
    ],
    extra_body={"guided_json": json_schema}
)

print(response.choices[0].message.content)
State in your prompt that JSON should be generated and which fields to fill; this can significantly improve results.

Guided Grammar

Use EBNF grammars for complex structures like SQL:
from tinfoil import TinfoilAI

client = TinfoilAI(api_key="your-api-key")

simplified_sql_grammar = """
    ?start: select_statement

    ?select_statement: "SELECT " column_list " FROM " table_name

    ?column_list: column_name ("," column_name)*

    ?table_name: identifier

    ?column_name: identifier

    ?identifier: /[a-zA-Z_][a-zA-Z0-9_]*/
"""

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Generate an SQL query to show the 'username' and 'email' from the 'users' table."}
    ],
    extra_body={"guided_grammar": simplified_sql_grammar}
)

print(response.choices[0].message.content)

Advanced Features

Backend Selection

Choose a specific guided decoding backend:
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[...],
    extra_body={
        "guided_json": json_schema,
        "guided_decoding_backend": "xgrammar"
    }
)
Available backends: outlines, lm-format-enforcer, xgrammar

Whitespace Pattern Override

Customize whitespace handling in JSON decoding:
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[...],
    extra_body={
        "guided_json": json_schema,
        "guided_whitespace_pattern": r"[ \t\n]*"
    }
)

Complex Nested Schemas

Build complex nested structures with Pydantic:
from pydantic import BaseModel
from tinfoil import TinfoilAI

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Employee(BaseModel):
    name: str
    age: int
    email: str | None
    addresses: list[Address]

class Company(BaseModel):
    name: str
    founded_year: int
    employees: list[Employee]
    headquarters: Address

client = TinfoilAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Create a company profile for TechCorp with 2 employees"}
    ],
    extra_body={"guided_json": Company.model_json_schema()}
)

import json
company_data = json.loads(response.choices[0].message.content)
print(f"Company: {company_data['name']}, Employees: {len(company_data['employees'])}")

Best Practices

Markdown-Wrapped Responses: Some models may wrap JSON responses in markdown code blocks (```json ... ```). Strip the formatting before parsing the JSON.
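A minimal helper for that case (an illustration; strip_markdown_fence is not part of the Tinfoil SDK):

```python
import re

def strip_markdown_fence(text: str) -> str:
    """Remove a surrounding ```json ... ``` (or bare ```) fence, if present."""
    match = re.fullmatch(r"\s*```(?:json)?\s*(.*?)\s*```\s*", text, re.DOTALL)
    return match.group(1) if match else text.strip()

raw = '```json\n{"brand": "Toyota"}\n```'
print(strip_markdown_fence(raw))  # {"brand": "Toyota"}
```

Run the helper on the raw content before handing it to json.loads or Pydantic validation.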
Use Low Temperature for Deterministic Outputs
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    temperature=0.1,
    messages=[...],
    extra_body={"guided_json": schema}
)
Validate Responses
from pydantic import ValidationError

try:
    # MySchema is the Pydantic model whose schema was passed as guided_json
    parsed = MySchema.model_validate_json(response.choices[0].message.content)
except ValidationError as e:
    print(f"Validation failed: {e}")
Enable Streaming for Large Responses
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[...],
    extra_body={"guided_json": schema},
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Additional Resources