Structured Output

Definition

Structured output is the ability to constrain an LLM to produce output that conforms to a specific format or schema — most commonly JSON, but also XML, CSV, or custom formats. It enables reliable programmatic consumption of LLM responses without fragile string parsing.

The Problem with Free-Form Output

Without structured output, LLMs produce conversational text:

```
"Sure! The customer's name is John Smith, he's 32 years old,
and he's from New York. Let me know if you need anything else!"
```

Extracting data from this with code is brittle — the model might say "thirty two" or "NYC" next time.

With structured output:

```json
{"name": "John Smith", "age": 32, "city": "New York"}
```

Parseable by any JSON library, with the same fields and types on every call.
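The contrast between the two responses can be sketched in a few lines. This is a minimal illustration, not a recommended parser: the function name `extract_age_from_text` and the second free-form sentence are made up for the example.

```python
import json
import re

def extract_age_from_text(text):
    """Brittle approach: find a number directly before 'years old'."""
    match = re.search(r"(\d+) years old", text)
    return int(match.group(1)) if match else None

# Two phrasings a model might produce for the same fact (second one is hypothetical).
free_form_a = "Sure! The customer's name is John Smith, he's 32 years old."
free_form_b = "John Smith is thirty two and lives in NYC."
structured = '{"name": "John Smith", "age": 32, "city": "New York"}'

print(extract_age_from_text(free_form_a))  # 32
print(extract_age_from_text(free_form_b))  # None: the pattern no longer matches
print(json.loads(structured)["age"])       # 32
```

The regex works only while the model keeps the exact phrasing it was written for; the JSON lookup does not depend on phrasing at all.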

Methods for Getting Structured Output

1. Prompt-Based (Unreliable)

Just ask nicely:

```
"Return your response as a JSON object with fields: name, age, city."
```

Problem: the model still sometimes adds a preamble ("Here is the JSON: ..."), wraps the output in markdown code blocks, or deviates from the schema. This approach requires validation and retry logic.
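A validation and retry loop for prompt-based output might look like the sketch below. It assumes a hypothetical `call_model` function that takes a prompt string and returns the raw reply string; the helper names and the required-field set are illustrative, not part of any provider's API.

```python
import json

REQUIRED_FIELDS = {"name", "age", "city"}

def strip_wrappers(raw):
    """Keep only the text from the first '{' to the last '}'.

    This drops a preamble or a markdown code fence around the object.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    return raw[start:end + 1]

def get_customer(call_model, prompt, max_attempts=3):
    """Call the (hypothetical) model until a reply parses and has all fields."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            data = json.loads(strip_wrappers(raw))
            missing = REQUIRED_FIELDS - data.keys()
            if missing:
                raise ValueError(f"missing fields: {sorted(missing)}")
            return data
        except ValueError as err:  # json.JSONDecodeError is a ValueError
            last_error = err
    raise RuntimeError(f"no valid reply after {max_attempts} attempts: {last_error}")
```

Every retry costs another model call, which is one reason the API-level methods below are usually preferable.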

2. JSON Mode (API-Level Enforcement)

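Several providers offer a JSON mode that restricts the response to syntactically valid JSON (barring truncation at the token limit):

```python
# OpenAI JSON mode
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[...],  # must mention "JSON" somewhere, or the API rejects the request
)
```

Limitation: this guarantees valid JSON but NOT adherence to a specific schema.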
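To make the limitation concrete, the sketch below checks a syntactically valid reply against the expected fields and types. The `schema_errors` helper and the sample reply are hypothetical; in practice a library such as Pydantic or jsonschema would do this job.

```python
import json

EXPECTED_TYPES = {"name": str, "age": int, "city": str}

def schema_errors(raw):
    """Return a list of schema problems in a reply that is already valid JSON."""
    data = json.loads(raw)
    errors = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            actual = type(data[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors

# Valid JSON that JSON mode would accept, yet it does not match the schema.
reply = '{"full_name": "John Smith", "age": "32"}'
print(schema_errors(reply))
```

The reply parses without error, but the check reports a missing `name`, a string where an integer was expected, and a missing `city`.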

3. Structured Outputs / JSON Schema (Strongest)

Specify the exact schema the model must follow:

```python
# OpenAI structured outputs
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()  # reads OPENAI_API_KEY from the environment

class Customer(BaseModel):
    name: str
    age: int
    city: str

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[...],
    response_format=Customer,
)
customer = response.choices[0].message.parsed  # typed Customer instance
```

OpenAI implements this with constrained decoding (grammar sampling), so schema adherence is enforced at the token level.

4. Constrained Decoding (Outlines / LMQL)

At each sampling step, the decoder masks out every token that would violate the constraint, given the output generated so far:

  • The constraint is a grammar, a regex, or a JSON schema
  • The model cannot emit a token that would violate it
  • Libraries: Outlines, LMQL, Guidance, llama.cpp grammar mode
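
The masking idea can be shown with a toy decoder. Everything here is a stand-in: the five-token vocabulary, the hard-coded `FAKE_LOGITS` scores, and the "1 to 3 digits" constraint are invented for illustration and do not reflect how Outlines, LMQL, Guidance, or llama.cpp are implemented internally.

```python
import math

# Hypothetical per-step scores standing in for a real model's logits.
# Left alone, this "model" prefers to spell the number out in words.
FAKE_LOGITS = [
    {"thirty": 5.0, "two": 0.0, "3": 2.0, "2": 1.0, "<eos>": 0.0},
    {"thirty": 0.0, "two": 5.0, "3": 1.0, "2": 2.0, "<eos>": 0.0},
    {"thirty": 0.0, "two": 0.0, "3": 0.0, "2": 0.0, "<eos>": 5.0},
]

def allowed(text, token):
    """Toy constraint: the finished output must be 1 to 3 digits."""
    if token == "<eos>":
        return len(text) >= 1
    return token.isdigit() and len(text) + len(token) <= 3

def decode(constrained, max_steps=5):
    """Greedy decoding; when constrained, disallowed tokens score -inf."""
    text = ""
    for step in range(max_steps):
        logits = FAKE_LOGITS[min(step, len(FAKE_LOGITS) - 1)]
        scores = {
            tok: (score if not constrained or allowed(text, tok) else -math.inf)
            for tok, score in logits.items()
        }
        best = max(scores, key=scores.get)
        if best == "<eos>":
            break
        text += best
    return text

print(decode(constrained=False))  # thirtytwo
print(decode(constrained=True))   # 32
```

The scores never change between the two runs; only the mask does. Real libraries compile the grammar or schema into a state machine so the allowed-token set can be computed quickly at each step.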

5. Tool Use for Extraction

Frame the structured output task as a tool call:

```json
{
  "name": "save_customer",
  "description": "Save extracted customer information",
  "parameters": {"type": "object", "properties": {"name": ..., "age": ...}}
}
```

When the request requires this tool, the model must "call" it with structured arguments.
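
On the application side, handling such a call could look like the sketch below. The property types in the schema, the `dispatch` helper, and the assumption that arguments arrive as a JSON string are all illustrative; some providers deliver the arguments as an already-parsed object, so check the API you use.

```python
import json

# Hypothetical full tool definition; the property types are assumptions
# filled in for the example.
SAVE_CUSTOMER_TOOL = {
    "name": "save_customer",
    "description": "Save extracted customer information",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
        },
        "required": ["name", "age"],
    },
}

saved = []

def save_customer(name, age):
    """Stand-in for a real persistence layer."""
    saved.append({"name": name, "age": age})
    return "ok"

HANDLERS = {"save_customer": save_customer}

def dispatch(tool_name, arguments_json):
    """Route a tool call to its handler, parsing the JSON-encoded arguments."""
    handler = HANDLERS[tool_name]
    return handler(**json.loads(arguments_json))

# Simulated tool call, as a model might emit it.
print(dispatch("save_customer", '{"name": "John Smith", "age": 32}'))  # ok
```

The extracted record ends up as ordinary function arguments, so no text parsing is needed.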

Schema Types Supported

| Format | Use Case |
|--------|---------|
| JSON Schema | Most common, API data, databases |
| Pydantic models | Python type-safe parsing |
| TypeScript interfaces | JavaScript/TypeScript applications |
| XML Schema (XSD) | Enterprise/legacy systems |
| Regex patterns | Simple extractions |
| Enum constraints | Classifications |

Common Structured Output Use Cases

Data Extraction

```json
{
  "entities": [
    {"text": "Apple", "type": "company"},
    {"text": "Tim Cook", "type": "person"}
  ],
  "sentiment": "positive",
  "summary": "..."
}
```

Classification with Confidence

```json
{
  "category": "billing",
  "subcategory": "refund_request",
  "urgency": "high",
  "confidence": 0.92
}
```

Multi-Step Reasoning Output

```json
{
  "reasoning_steps": ["step 1...", "step 2...", "step 3..."],
  "final_answer": "42",
  "confidence": "high"
}
```

Tool Call Parameters

```json
{
  "tool_name": "search_web",
  "parameters": {"query": "...", "max_results": 5}
}
```

Schema Design Best Practices

1. Keep it flat when possible: deeply nested schemas are harder for models to fill in correctly
2. Use descriptive field names: models use field names as semantic hints
3. Add descriptions: JSON Schema description fields guide the model
4. Use enums for constrained values: this prevents creative interpretations
5. Mark optional fields: required vs. optional matters for completeness
6. Test with edge cases: null values, empty lists, ambiguous inputs
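
The practices above can be combined in one small schema. The sketch below is an illustration under assumptions: `TICKET_SCHEMA`, its field names, and its enum values are invented, and `problems` is a deliberately tiny checker covering only required fields, unexpected fields, and enums, not a full JSON Schema validator.

```python
# Flat schema, descriptive names, descriptions, enums, and an explicit
# split between required and optional fields.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "technical", "account"],
            "description": "Which team should handle the ticket",
        },
        "urgency": {
            "type": "string",
            "enum": ["low", "medium", "high"],
            "description": "How quickly the customer needs a reply",
        },
        "customer_note": {
            "type": ["string", "null"],
            "description": "Optional free-text detail; null if none was given",
        },
    },
    "required": ["category", "urgency"],
    "additionalProperties": False,
}

def problems(data, schema=TICKET_SCHEMA):
    """List missing required fields, unexpected fields, and enum violations."""
    found = []
    for field in schema["required"]:
        if field not in data:
            found.append(f"missing: {field}")
    for field, value in data.items():
        spec = schema["properties"].get(field)
        if spec is None:
            found.append(f"unexpected: {field}")
        elif "enum" in spec and value not in spec["enum"]:
            found.append(f"bad value for {field}: {value!r}")
    return found

# Edge cases worth testing: a null optional field and a missing required one.
print(problems({"category": "billing", "urgency": "high", "customer_note": None}))
print(problems({"urgency": "low"}))
```

Running edge cases like these against the schema before wiring it to a model helps separate schema mistakes from model mistakes.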

Provider Support (2024)

| Provider | JSON Mode | JSON Schema | Constrained Decoding |
|----------|-----------|-------------|----------------------|
| OpenAI | ✅ | ✅ (Structured Outputs) | Via tool use |
| Anthropic | Via prompt | Via tool use | Via tool use |
| AWS Bedrock | Model-dependent | Model-dependent | — |
| Ollama | ✅ (grammar) | ✅ (JSON schema) | ✅ (grammar files) |
| vLLM | ✅ (guided) | ✅ (guided JSON) | ✅ (Outlines) |

Reliability Hierarchy (Best → Worst)

```
Constrained decoding (grammar) > JSON Schema enforcement > JSON mode > Prompt-only
```

Related Concepts

  • Tool Use, Prompt Engineering, Inference, Validation, Pydantic, API, Agent
