Structured Outputs

Structured outputs ensure that model responses match specific formats such as JSON schemas, regex patterns, or predefined choices. Tinfoil supports this through vLLM’s guided decoding, which constrains generation at the token level: invalid next tokens are filtered out, so outputs are guaranteed to match the requested format without post-processing.
Use response_format with the json_schema type for JSON outputs, or extra_body with structured_outputs for choice and regex constraints.

Benefits

  • Format Enforcement: Token-level filtering ensures outputs match your exact format
  • Type Safety: Works with Pydantic (Python), Zod (TypeScript), and native types in Go
  • No Post-Processing: Outputs are guaranteed valid
  • Constrained Decoding: Next-token selection is restricted to tokens that keep the output format-valid
  • Multiple Backends: Supports xgrammar and guidance backends
For complete documentation, see the vLLM Structured Outputs Guide and the vLLM blog post on structured decoding.

Quick Start

Here are basic examples for each structured output type:

Choice

Restrict output to a predefined list:
from tinfoil import TinfoilAI

client = TinfoilAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Classify this sentiment: vLLM is wonderful!"}
    ],
    extra_body={"structured_outputs": {"choice": ["positive", "negative"]}}
)

print(response.choices[0].message.content)

Regex

Enforce regex patterns for formatted outputs:
from tinfoil import TinfoilAI

client = TinfoilAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Generate an example email address for Alan Turing, who works in Enigma. End in .com and new line."}
    ],
    extra_body={"structured_outputs": {"regex": r"\w+@\w+\.com\n"}},
    stop=["\n"]
)

print(response.choices[0].message.content)

JSON

Use response_format with json_schema type for reliable JSON generation:
from pydantic import BaseModel
from enum import Enum
from tinfoil import TinfoilAI

class CarType(str, Enum):
    sedan = "sedan"
    suv = "SUV"
    truck = "Truck"
    coupe = "Coupe"

class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: CarType

client = TinfoilAI(api_key="your-api-key")

json_schema = CarDescription.model_json_schema()

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Output a JSON object with the brand, model, and car_type of the most iconic car from the 90's."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "car-description",
            "schema": json_schema
        }
    }
)

print(response.choices[0].message.content)
Prompt explicitly for JSON. While structured output enforcement guarantees syntactically valid JSON, the model produces more reliable field values when your prompt explicitly requests JSON output and describes the expected fields. For example, use “Output a JSON object with…” rather than just “Generate…”
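
Since structured outputs constrain generation to the schema, the response can usually be parsed straight back into the Pydantic model used to generate it (a small sketch reusing the CarDescription class defined above; see Best Practices below for validation and markdown-stripping caveats):
car = CarDescription.model_validate_json(response.choices[0].message.content)
print(car.brand, car.model, car.car_type.value)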

Advanced Features

Whitespace Pattern Override

Customize whitespace handling in JSON decoding by combining response_format with extra_body:
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "my-schema",
            "schema": json_schema
        }
    },
    extra_body={
        "structured_outputs": {
            "whitespace_pattern": r"[ \t\n]*"
        }
    }
)

Complex Nested Schemas

Build complex nested structures with Pydantic:
import json

from pydantic import BaseModel
from tinfoil import TinfoilAI

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str

class Employee(BaseModel):
    name: str
    age: int
    email: str | None
    addresses: list[Address]

class Company(BaseModel):
    name: str
    founded_year: int
    employees: list[Employee]
    headquarters: Address

client = TinfoilAI(api_key="your-api-key")

response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[
        {"role": "user", "content": "Output a JSON object for a company profile. The company is named TechCorp and has 2 employees. Include the company name, founded_year, employees (each with name, age, email, and addresses), and headquarters address."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "company",
            "schema": Company.model_json_schema()
        }
    }
)

company_data = json.loads(response.choices[0].message.content)
print(f"Company: {company_data['name']}, Employees: {len(company_data['employees'])}")

Best Practices

Markdown-Wrapped Responses: Some models may wrap JSON responses in markdown code blocks (```json ... ```). Strip the formatting before parsing the JSON.
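A minimal sketch of such a pre-parse step (the helper name and fence-stripping logic are illustrative, not part of the Tinfoil SDK):
import json

def parse_json_content(content: str) -> dict:
    # Remove a surrounding markdown code fence (```json ... ```) if the model added one.
    text = content.strip()
    if text.startswith("```"):
        # Drop the opening ```json line, then the trailing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        if text.rstrip().endswith("```"):
            text = text.rstrip()[:-3]
    return json.loads(text)

data = parse_json_content(response.choices[0].message.content)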
Use Low Temperature for Deterministic Outputs
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    temperature=0.1,
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "my-schema",
            "schema": schema
        }
    }
)
Validate Responses
from pydantic import ValidationError

try:
    parsed = MySchema.model_validate_json(response.choices[0].message.content)
except ValidationError as e:
    print(f"Validation failed: {e}")
Enable Streaming for Large Responses
response = client.chat.completions.create(
    model="<MODEL_NAME>",
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "my-schema",
            "schema": schema
        }
    },
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
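
If you need the complete object rather than printing chunks incrementally, accumulate the streamed deltas and parse once the stream ends (a sketch; it assumes the same json_schema request with stream=True):
import json

# Collect the streamed content deltas, then parse the full JSON document once.
parts = []
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        parts.append(delta)

data = json.loads("".join(parts))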

Additional Resources