Best Tools for Structured Output and JSON Mode (2026)

Six tools dominate structured output in 2026: Instructor for Pydantic extraction, Outlines for constrained decoding, Pydantic AI for type-safe agents, LangChain for unified APIs, Marvin for decorator-based extraction, and PromptQuorum for cross-model testing. Each solves a different workflow bottleneck.

Problems Each Tool Solves

Structured output requires solving three interdependent problems: schema definition, API enforcement, and validation. Different tools attack these problems differently. Instructor handles all three in Python with retries. Outlines eliminates the validation step via constrained decoding. Pydantic AI adds type safety for agents. LangChain wraps provider APIs. Marvin prioritizes developer speed. PromptQuorum validates consistency across all models.

Problem	Instructor	Outlines	Pydantic AI	LangChain	Marvin
Define schema	Pydantic models	JSON Schema / GBNF	Pydantic models	Tool definitions	Python decorators
Enforce on API call	Retry + validation	Token-level constraint	API + validation	Provider JSON mode	Prompt instructions
Validate response	Automatic	Guaranteed at generation	Type-checked	Manual	Automatic

Instructor: Pydantic Extraction

Instructor is one of the most widely adopted structured output libraries. It wraps LLM APIs (OpenAI GPT-4.5, Claude 4.7, Gemini, Ollama, vLLM) and returns validated Pydantic models instead of raw text. When validation fails, Instructor re-asks the model automatically, up to a configurable retry limit (`max_retries`), so most malformed responses are corrected without custom code. If every attempt fails, it raises an exception that the caller still has to handle.

Compatible with 20+ LLM providers (OpenAI, Anthropic, Google, local models via Ollama/vLLM)
Pydantic v2 schemas: type hints, validation rules, docstring descriptions embedded in schema
Automatic retries on validation failure, with the validation error fed back to the model; the retry limit is configurable, and a final failure still raises an exception
Python library, with a separate TypeScript port (instructor-js)
MIT-licensed open source, actively maintained
Pricing: Free (no additional cost beyond LLM API calls)

python

import instructor
from pydantic import BaseModel
from openai import OpenAI

class User(BaseModel):
    name: str
    age: int

client = instructor.from_openai(OpenAI())
user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: John is 25 years old"}]
)
# user.name == "John", user.age == 25
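
The validate-and-retry behaviour described above can be sketched with the standard library alone. This is not Instructor's implementation, only a minimal illustration of the pattern: `parse_user`, `extract_with_retries`, and the fake model replies are hypothetical names and data introduced here, and a dataclass stands in for the Pydantic model.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class User:
    name: str
    age: int


def parse_user(raw: str) -> User:
    """Parse raw model text into a User, raising ValueError on any mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str):
        raise ValueError("field 'name' must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("field 'age' must be an integer")
    return User(name=name, age=age)


def extract_with_retries(
    call_model: Callable[[str], str], prompt: str, max_retries: int = 2
) -> User:
    """Call the model, validate, and re-ask with the error message on failure."""
    current_prompt = prompt
    last_error: Optional[ValueError] = None
    for _ in range(max_retries + 1):
        raw = call_model(current_prompt)
        try:
            return parse_user(raw)
        except ValueError as exc:
            last_error = exc
            # Feed the validation error back so the model can correct itself.
            current_prompt = (
                f"{prompt}\n\nYour previous answer was rejected: {exc}. "
                "Return corrected JSON."
            )
    raise RuntimeError(
        f"validation failed after {max_retries + 1} attempts: {last_error}"
    )


# Hypothetical stand-in for an LLM: wrong type first, corrected on the second call.
_replies = iter(
    ['{"name": "John", "age": "twenty-five"}', '{"name": "John", "age": 25}']
)
user = extract_with_retries(lambda _prompt: next(_replies), "Extract: John is 25 years old")
```

Note that the loop is bounded: once the attempts are used up it raises, which is why retry-based libraries still need a failure path in calling code.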

Outlines: Constrained Decoding

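Outlines enforces schema compliance at token generation time via constrained decoding. Instead of generating tokens and then validating them, Outlines restricts the set of valid tokens at each step to those that match your schema. Output that runs to completion is therefore always structurally valid, which makes the approach well suited to local models where you control decoding. The guarantee covers structure only: field values can still be factually wrong.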

Works with llama.cpp, vLLM, transformers, NVIDIA NIM, and any HuggingFace model
Schemas defined as JSON Schema, Pydantic models, regular expressions, or context-free grammars (GBNF is llama.cpp's own grammar format)
Structural schema compliance by construction, so no retries are needed for malformed output
Faster than retry-based validation (fewer wasted tokens)
Free and open-source (Apache 2.0)
Best for local deployment and cost-sensitive workflows
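
To make token-level masking concrete, here is a toy sketch of constrained decoding over single-character "tokens". It is not how Outlines is implemented (real libraries compile the schema against the model's tokenizer vocabulary); `allowed_next_tokens`, `constrained_decode`, and `toy_model` are hypothetical names, and the scorer is a stand-in for a language model that would rather write something outside the schema.

```python
from typing import Callable, Dict, List

END = "<end>"


def allowed_next_tokens(prefix: str, choices: List[str]) -> List[str]:
    """Return the tokens that keep `prefix` on a path to one of `choices`."""
    allowed = {
        c[len(prefix)]
        for c in choices
        if c.startswith(prefix) and len(c) > len(prefix)
    }
    if prefix in choices:
        allowed.add(END)
    return sorted(allowed)


def constrained_decode(
    score: Callable[[str], Dict[str, float]], choices: List[str]
) -> str:
    """Greedy decoding where disallowed tokens are masked out at every step."""
    prefix = ""
    while True:
        allowed = allowed_next_tokens(prefix, choices)
        scores = score(prefix)
        # Masking: only allowed tokens compete; tokens the model did not score get -inf.
        best = max(allowed, key=lambda tok: scores.get(tok, float("-inf")))
        if best == END:
            return prefix
        prefix += best


def toy_model(prefix: str) -> Dict[str, float]:
    """Hypothetical scorer that would write 'purple' if left unconstrained."""
    target = "purple"
    scores = {ch: 0.1 for ch in "abcdefghijklmnopqrstuvwxyz"}
    scores[END] = 0.05
    if len(prefix) < len(target):
        scores[target[len(prefix)]] = 1.0
    scores["b"] = max(scores["b"], 0.5)  # second preference: the start of "blue"
    return scores


result = constrained_decode(toy_model, ["red", "green", "blue"])
```

Here `result` is "blue": the model's favourite first token ("p") is never eligible, so the output cannot leave the allowed set. The output is valid by construction, but nothing in the mask makes it the right answer.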

Pydantic AI: Type-Safe Agents

Pydantic AI is a newer framework from the Pydantic team (first released in late 2024) that combines Pydantic models with first-class support for multi-turn agent conversations. It adds type safety to agent loops while enforcing structured output on each turn. It is designed for Python async workflows.

Pydantic v2 type system — full IDE support and type checking
Built-in structured output on every agent step
Async-first design for high-throughput applications
Supports OpenAI GPT, Anthropic Claude, Google Gemini, and local models via Ollama
Tool calling baked in — define tools as Python functions with type hints
Free to use (no additional cost beyond LLM API calls)
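
The idea of defining tools as typed Python functions can be illustrated by deriving a JSON-Schema-style description from a function signature. The sketch below uses only the standard library and is not Pydantic AI's code; `tool_schema` and `get_weather` are hypothetical, and the type mapping assumes flat parameters of primitive types.

```python
import inspect
from typing import Any, Dict, get_type_hints

# Minimal Python-to-JSON-Schema type mapping (assumption: primitive parameters only).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func) -> Dict[str, Any]:
    """Build a JSON-Schema-style description of `func` from its signature."""
    hints = get_type_hints(func)
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES[hints[name]]}
        # Parameters without a default value must be supplied by the model.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def get_weather(city: str, days: int = 1) -> str:
    """Return a forecast for a city."""
    return f"{days}-day forecast for {city}"


schema = tool_schema(get_weather)
```

Because the schema comes from the same annotations a type checker reads, the tool description and the function cannot drift apart silently.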

LangChain: Unified APIs

LangChain 0.1+ added with_structured_output() to all major chat models. This unifies structured output across OpenAI, Anthropic, Google, and local models behind a single API. If your team already uses LangChain chains or agents, this is the easiest path to structured output.

Unified API: one .with_structured_output() method works across all providers
Automatically converts LangChain tool definitions to provider-specific schema formats
Integrates seamlessly with chains, agents, and runnable workflows
Supports Pydantic models, TypedDict, and OpenAI schema definitions
Part of the LangChain ecosystem (nothing to install beyond the provider integration you already use)
Best for teams already invested in LangChain
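
The "one method, many providers" idea can be pictured as a small adapter layer. The sketch below is not LangChain's implementation. `USER_SCHEMA`, `to_openai_tool`, `to_anthropic_tool`, and `structured_output_payload` are hypothetical names, and the two payload shapes follow the tool formats of the OpenAI Chat Completions and Anthropic Messages APIs as I understand them, so check the current provider documentation before relying on them.

```python
from typing import Any, Callable, Dict

# A provider-neutral schema: a name, a description, and a JSON Schema for the fields.
USER_SCHEMA: Dict[str, Any] = {
    "name": "User",
    "description": "A person mentioned in the text.",
    "json_schema": {
        "type": "object",
        "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
        "required": ["name", "age"],
    },
}


def to_openai_tool(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Shape the schema as an OpenAI-style function tool (assumed format)."""
    return {
        "type": "function",
        "function": {
            "name": schema["name"],
            "description": schema["description"],
            "parameters": schema["json_schema"],
        },
    }


def to_anthropic_tool(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Shape the schema as an Anthropic-style tool (assumed format)."""
    return {
        "name": schema["name"],
        "description": schema["description"],
        "input_schema": schema["json_schema"],
    }


CONVERTERS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "openai": to_openai_tool,
    "anthropic": to_anthropic_tool,
}


def structured_output_payload(provider: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Single entry point: pick the converter for the provider."""
    try:
        converter = CONVERTERS[provider]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider}") from None
    return converter(schema)


openai_payload = structured_output_payload("openai", USER_SCHEMA)
anthropic_payload = structured_output_payload("anthropic", USER_SCHEMA)
```

The calling code only ever sees one schema and one function; provider differences stay inside the converters.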

Marvin: Decorator-Based Extraction

Marvin uses Python decorators to turn function signatures into typed LLM calls. You define a function signature with type hints, decorate it with @marvin.fn, and Marvin handles prompt generation and structured output validation automatically. Fastest path from idea to working code.

Decorator syntax: @marvin.fn turns Python signatures into LLM prompts
Works with OpenAI, Anthropic, Google, and local models
Type hints become schema — minimal boilerplate
Built-in validation and error handling
Suitable for prototyping and small-to-medium workflows
Free to use (pricing TBD as of April 2026)
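
The decorator pattern itself is easy to sketch without Marvin. The example below is an illustration of the idea, not Marvin's internals: `llm_fn` and `fake_model` are hypothetical, the model call is a stub, and the return-type check assumes plain built-in types such as `int` or `str`.

```python
import functools
import inspect
import json
from typing import Callable, get_type_hints


def llm_fn(call_model: Callable[[str], str]):
    """Decorator factory: the decorated body never runs; the model answers instead."""

    def decorator(func):
        return_type = get_type_hints(func).get("return", str)
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            # The prompt is generated from the signature, docstring, and arguments.
            prompt = (
                f"Function: {func.__name__}{signature}\n"
                f"Purpose: {inspect.getdoc(func) or ''}\n"
                f"Arguments: {json.dumps(dict(bound.arguments))}\n"
                f"Reply with a JSON value of type {return_type.__name__}."
            )
            value = json.loads(call_model(prompt))
            # The return annotation doubles as the output schema.
            if not isinstance(value, return_type):
                raise TypeError(
                    f"expected {return_type.__name__}, got {type(value).__name__}"
                )
            return value

        return wrapper

    return decorator


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; always returns a fixed JSON integer."""
    return "5"


@llm_fn(fake_model)
def count_words(text: str) -> int:
    """Count the words in the text."""


word_count = count_words("structured output keeps parsers simple")
```

The trade-off is visible here: very little boilerplate, but the prompt is implicit, which is why this style suits prototyping more than complex validation.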

PromptQuorum: Cross-Model Testing

PromptQuorum is not a structured output library itself, but a testing platform for validating structured output consistency across models. Run the same prompt against GPT-4.5, Claude 4.7 Opus, Gemini 3.1 Pro, and 20+ other models simultaneously. Measure schema compliance, latency, and cost per model.

Multi-model dispatch in a single API call — test one prompt against 25+ models
Structured output compliance metrics — pass rate, latency, cost per model
Identify models that hallucinate on your schema — avoid deploying to unreliable models
Consensus mode — find agreements between independent model runs
Works with Instructor, Outlines, Pydantic AI, LangChain, or raw LLM APIs
Free tier available, enterprise pricing for high-volume testing
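
The two headline metrics, per-model pass rate and cross-model consensus, reduce to simple bookkeeping once raw outputs are collected. The sketch below does not use or represent the PromptQuorum API; `validate_user`, `compliance_report`, `consensus_value`, the model names, and the sample outputs are all hypothetical, and latency and cost tracking are left out.

```python
import json
from collections import Counter
from typing import Any, Dict, List, Optional


def validate_user(raw: str) -> Optional[dict]:
    """Return the parsed object if it has a string name and integer age, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if isinstance(data.get("name"), str) and type(data.get("age")) is int:
        return data
    return None


def compliance_report(outputs: Dict[str, List[str]]) -> Dict[str, float]:
    """Pass rate per model: the share of raw outputs that validate."""
    return {
        model: sum(validate_user(raw) is not None for raw in raws) / len(raws)
        for model, raws in outputs.items()
    }


def consensus_value(outputs: Dict[str, List[str]], field: str) -> Any:
    """Most common value of `field` across all valid outputs, or None if none validate."""
    values = []
    for raws in outputs.values():
        for raw in raws:
            parsed = validate_user(raw)
            if parsed is not None:
                values.append(parsed[field])
    if not values:
        return None
    return Counter(values).most_common(1)[0][0]


# Made-up outputs for three placeholder models answering the same prompt twice.
SAMPLE_OUTPUTS = {
    "model-a": ['{"name": "John", "age": 25}', '{"name": "John", "age": 25}'],
    "model-b": ['{"name": "John", "age": "25"}', '{"name": "John", "age": 25}'],
    "model-c": ["John is 25", '{"name": "John", "age": 26}'],
}

report = compliance_report(SAMPLE_OUTPUTS)
agreed_age = consensus_value(SAMPLE_OUTPUTS, "age")
```

With this sample data, "model-a" passes every time while the other two pass half the time, and the agreed age is 25. Note that "model-c" produces one structurally valid answer with a different value, which pass rate alone would not reveal.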

Side-by-Side Comparison

Tool	Best For	Schema Format	Language	Local Models	Pricing	Learning Curve
Instructor	Python APIs + retries	Pydantic models	Python/TypeScript	Yes (Ollama)	Free	Low
Outlines	Local model deployment	JSON Schema/GBNF	Python	Yes (native)	Free	Medium
Pydantic AI	Type-safe agents	Pydantic models	Python	Yes (Ollama)	Free	Low
LangChain	Chains + agents	Tool definitions	Python/JS	Yes	Free	Medium
Marvin	Rapid prototyping	Type hints	Python	Yes	Free	Very low
PromptQuorum	Multi-model testing	API-agnostic	API-first	Via OpenAI proxy	Free tier + enterprise	Low

Choosing the Right Tool

Start by answering three questions: (1) Do you use LangChain already? (2) Do you need local model support? (3) How much validation complexity do you have?

Use Instructor if: You're building Python APIs and need automatic retries on validation failure. Best general-purpose choice.
Use Outlines if: You deploy local models (llama.cpp, vLLM) and want guaranteed schema compliance at generation time.
Use Pydantic AI if: You're building multi-turn agent workflows with type safety across all steps.
Use LangChain if: You already use LangChain chains or agents — with_structured_output() is the simplest addition.
Use Marvin if: You want to prototype rapidly and don't need complex validation — decorators are the fastest path.
Use PromptQuorum if: You need to test structured output consistency across GPT, Claude, and Gemini before production.

Adding Structured Output Step-by-Step

1. Define your output schema: create a Pydantic model (Python), TypeScript interface, or JSON Schema describing the fields, types, and constraints you want the LLM to return.
2. Choose a library: Instructor for Python APIs, Outlines for local models, Pydantic AI for agents, LangChain if already in use, Marvin for speed.
3. Install and wrap your LLM call: `pip install instructor` (Python), then pass your schema to the API call. Instructor handles validation and retries.
4. Test with PromptQuorum: deploy to PromptQuorum and run your prompt against GPT, Claude, and Gemini. Measure schema compliance per model.
5. Refine schema based on failures: if a model fails validation, add examples to your prompt or adjust schema constraints. Iterate until all models pass.

Common Structured Output Mistakes

❌ Using JSON mode without validation

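Why it hurts: Plain JSON mode (for example OpenAI's `response_format={"type": "json_object"}`) ensures syntactically valid JSON at best; it does NOT guarantee that your schema is obeyed. Models can still invent field names, omit fields, or return the wrong types. Stricter schema-enforcing modes, such as OpenAI Structured Outputs with a strict JSON Schema, constrain the structure but still do not check that the values are correct.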

Fix: Always layer validation on top: use Instructor, Outlines, or Pydantic AI. Never trust JSON mode alone. Test with PromptQuorum to catch compliance failures.
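
The gap between "valid JSON" and "valid for your schema" is easy to demonstrate. In this minimal sketch, `EXPECTED_FIELDS`, `schema_errors`, and the sample reply are hypothetical; a real project would typically use Pydantic or a JSON Schema validator instead of hand-written checks.

```python
import json

# The schema the application expects: field name -> required Python type.
EXPECTED_FIELDS = {"name": str, "age": int}


def schema_errors(raw: str) -> list:
    """List every way `raw` deviates from EXPECTED_FIELDS (empty list means compliant)."""
    data = json.loads(raw)  # succeeds for any syntactically valid JSON
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = []
    for field, expected in EXPECTED_FIELDS.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif type(data[field]) is not expected:
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(data[field]).__name__}"
            )
    for field in data:
        if field not in EXPECTED_FIELDS:
            errors.append(f"unexpected field: {field}")
    return errors


# Perfectly valid JSON, yet wrong on every count that matters to the schema.
reply = '{"full_name": "John", "age": "25"}'
errors = schema_errors(reply)
```

The reply parses without complaint, but the check reports a missing `name`, a string where an integer was expected, and an invented `full_name` field.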

❌ Designing schemas that are too strict

Why it hurts: Overly constrained schemas (tiny enum lists, very specific regex patterns) cause LLMs to fail validation frequently. High retry counts waste tokens and money.

Fix: Use PromptQuorum to test schema strictness across models. Loosen constraints to achieve 95%+ compliance. Use optional fields instead of required ones when possible.
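
A quick way to see the effect of loosening a constraint is to score the same outputs against a strict and a relaxed check. The labels below are invented for illustration, not measured from any model, and `strict_ok`, `relaxed_ok`, and `pass_rate` are hypothetical helpers.

```python
from typing import Callable, Iterable

# Made-up sentiment labels of the kind different models might return.
SAMPLE_LABELS = ["positive", "Positive", "negative", "neutral", "NEGATIVE", "mixed"]


def strict_ok(label: str) -> bool:
    """Tiny enum, exact match only."""
    return label in {"positive", "negative"}


def relaxed_ok(label: str) -> bool:
    """Case-insensitive match against a slightly wider enum."""
    return label.strip().lower() in {"positive", "negative", "neutral"}


def pass_rate(check: Callable[[str], bool], labels: Iterable[str]) -> float:
    """Share of labels accepted by `check`."""
    labels = list(labels)
    return sum(check(label) for label in labels) / len(labels)


strict_rate = pass_rate(strict_ok, SAMPLE_LABELS)    # 2 of 6 accepted
relaxed_rate = pass_rate(relaxed_ok, SAMPLE_LABELS)  # 5 of 6 accepted
```

Normalizing case and admitting one extra value turns most failures into passes here, while a genuinely out-of-vocabulary label ("mixed") is still rejected.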

❌ Not testing local vs. API model differences

Why it hurts: Outlines on llama.cpp behaves differently than Instructor on GPT-4.5. Schema compliance rates differ per model. Building only for GPT, then deploying locally, causes production failures.

Fix: Test all intended model backends early. Use PromptQuorum to run the same prompt across local (vLLM) and hosted API (OpenAI, Anthropic, Google Gemini) models.

❌ Ignoring latency and token cost impact

Why it hurts: Structured output has a cost. Instructor retries on failure, and every retry spends more tokens. Outlines constrained decoding is slower than free generation. If you do not measure per-model cost and latency, these overheads go unnoticed.

Fix: Use PromptQuorum cost tracking. Compare latency across models. For budget-conscious workflows, prefer Outlines (no retries). For accuracy, accept Instructor's retry cost.
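
The retry cost can be estimated before running anything. As a rough model, assume each attempt passes validation independently with the same probability and costs the same number of tokens (in practice a retry prompt is usually longer, so this underestimates). The function names and the example figures below are hypothetical.

```python
def expected_attempts(pass_rate: float, max_attempts: int) -> float:
    """Expected number of calls when each attempt passes independently with `pass_rate`."""
    if not 0 < pass_rate <= 1:
        raise ValueError("pass_rate must be in (0, 1]")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    fail = 1 - pass_rate
    # Attempt k+1 only happens if the first k attempts all failed.
    return sum(fail ** k for k in range(max_attempts))


def expected_cost(
    tokens_per_call: int,
    price_per_1k_tokens: float,
    pass_rate: float,
    max_attempts: int,
) -> float:
    """Expected spend per request under the simplifying assumptions above."""
    calls = expected_attempts(pass_rate, max_attempts)
    return calls * tokens_per_call / 1000 * price_per_1k_tokens


# Illustrative numbers only: 80% pass rate, at most 3 attempts.
attempts = expected_attempts(0.8, 3)        # 1 + 0.2 + 0.04
cost = expected_cost(500, 0.01, 0.8, 3)
```

At an 80% pass rate the overhead is about 24% more calls; at 50% it grows to 75% with the same three-attempt cap, which is why pass rate is worth measuring per model.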

❌ Mixing validation methods (no consistency)

Why it hurts: Some requests use Instructor, others use raw JSON parsing. Some models validated, others not. This leads to inconsistent errors in production.

Fix: Standardize on one validation approach per codebase: all requests use Instructor, or all use Outlines. A single validation path makes production errors far easier to trace and debug.

What is structured output in LLMs?

Structured output constrains LLM responses to a specific schema: JSON format, defined fields, type constraints. Instead of free-text replies, structured output returns data your code can parse and validate directly, without ad hoc text scraping.

Which tool is best for Python developers?

Instructor is the most popular Python choice. It uses Pydantic models to define schemas, automatically handles retries and validation, and supports many LLM APIs (OpenAI, Anthropic, Google, Ollama). Pydantic AI is an alternative if you also want type-safe, multi-turn agent conversations.

Can I use structured output with local models like Llama?

Yes. Outlines specializes in constrained decoding for local models: it works with llama.cpp, vLLM, and the transformers library. It guarantees structural schema compliance at token generation time, although field values can still be wrong. Instructor also supports Ollama if you run it as an API.

What is the difference between Instructor and Marvin?

Instructor uses Pydantic models to define schemas and handles extraction with error recovery. Marvin uses Python decorators — you decorate a function signature and Marvin auto-generates the LLM prompt. Instructor is more explicit (better for complex validations), Marvin is more concise (better for rapid prototyping).

Does LangChain support structured output?

Yes. LangChain 0.1+ includes a with_structured_output() method on chat models such as ChatOpenAI, ChatAnthropic, and ChatGoogleGenerativeAI. It converts the schema you pass (a Pydantic model, TypedDict, or JSON Schema) into the provider's tool-calling or JSON-mode format. Use this if you already use LangChain agents and want to add schema enforcement without switching libraries.

How do I test if structured output is reliable?

Use PromptQuorum to run the same prompt across multiple models and measure schema compliance. Different models (GPT-4.5, Claude 4.7, Gemini 3.1) have different structured output reliability. Test before deploying to production. Unit test with Instructor/Pydantic validation locally.

What does "constrained decoding" mean?

Constrained decoding limits token generation to only the values your schema allows. Outlines does this by computing the set of valid next tokens at each step and masking out the rest. This guarantees structural compliance without post-generation validation or retries, which plain API-level JSON mode does not.

Can I use structured output without any library?

Technically, yes: you can prompt the model to return JSON and parse it yourself. But nothing then stops the model from returning malformed JSON or invented field names, so parsing fails unpredictably. The five libraries above address this by validating with retries (Instructor, Marvin), enforcing the schema at decode time (Outlines), or building on provider tool-calling and JSON-mode APIs and validating the result (LangChain, Pydantic AI).

Which tool has the best documentation?

LangChain and Pydantic AI have the most comprehensive docs due to their corporate backing. Instructor has excellent tutorials and examples despite being community-maintained. Outlines docs are technical but thorough. Marvin has quick-start guides.

Do I need all six tools or just one?

Start with one. Python developers should try Instructor or Pydantic AI. Local model teams should try Outlines. LangChain users should try LangChain's with_structured_output(). Use PromptQuorum to validate consistency across all models. Most teams use one tool + PromptQuorum for testing.

Sources

Instructor GitHub Repository — Official repository and docs for Instructor library
Outlines Documentation — Constrained decoding for guaranteed schema compliance
Pydantic AI — Type-safe agent framework with structured output
LangChain with_structured_output() — LangChain unified structured output API
Marvin Documentation — Decorator-based LLM extraction framework