Problems Each Tool Solves
Structured output requires solving three interdependent problems: schema definition, API enforcement, and validation. Each tool attacks them differently. Instructor handles all three in Python and retries on failure. Outlines makes a separate schema-validation step unnecessary by constraining decoding. Pydantic AI adds type safety for agents. LangChain wraps provider APIs. Marvin prioritizes developer speed. PromptQuorum is a testing platform that checks consistency across models.
| Problem | Instructor | Outlines | Pydantic AI | LangChain | Marvin |
|---|---|---|---|---|---|
| Define schema | Pydantic models | JSON Schema / GBNF | Pydantic models | Tool definitions | Python decorators |
| Enforce on API call | Retry + validation | Token-level constraint | API + validation | Provider JSON mode | Prompt injection |
| Validate response | Automatic | Guaranteed at generation | Type-checked | Manual | Automatic |
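To make the three problems concrete, here is a minimal, library-free sketch of the define / enforce / validate loop that retry-based tools automate. It is not Instructor's implementation: `validate_user`, `extract_user`, and the canned `replies` that stand in for a model are hypothetical names used only for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    """Problem 1: the schema, written as plain Python types."""
    name: str
    age: int


def validate_user(raw: str) -> User:
    """Problem 3: parse the raw reply and check field names and types."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != {"name", "age"}:
        raise ValueError(f"unexpected fields: {sorted(data)}")
    if not isinstance(data["name"], str):
        raise ValueError("name must be a string")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(data["age"], int) or isinstance(data["age"], bool):
        raise ValueError("age must be an integer")
    return User(name=data["name"], age=data["age"])


def extract_user(call_model, prompt: str, max_attempts: int = 3) -> User:
    """Problem 2: call the model, feeding validation errors back until a reply passes."""
    last_error = None
    for _ in range(max_attempts):
        hint = f"\nPrevious reply was rejected: {last_error}" if last_error else ""
        try:
            return validate_user(call_model(prompt + hint))
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid reply after {max_attempts} attempts: {last_error}")


# Hypothetical stand-in for an LLM: the first reply has a wrong type, the second is valid.
replies = iter(['{"name": "John", "age": "25"}', '{"name": "John", "age": 25}'])
user = extract_user(lambda prompt: next(replies), "Extract: John is 25 years old")
print(user)
```

The first canned reply is rejected because `age` is a string, and the second passes. The libraries below differ mainly in where this loop lives and whether it is needed at all.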
Instructor: Pydantic Extraction
Instructor is one of the most widely adopted structured output libraries. It wraps any LLM API (OpenAI GPT-4.5, Claude 4.7, Gemini, Ollama, vLLM) and returns validated Pydantic models instead of raw text. Instructor retries automatically when validation fails, so most malformed responses are handled without custom code. A request that still fails after the retry limit raises an exception that the caller must handle.
- Compatible with 20+ LLM providers (OpenAI, Anthropic, Google, local models via Ollama/vLLM)
- Pydantic v2 schemas: type hints, validation rules, docstring descriptions embedded in schema
- Automatic retries on validation failure (configurable via max_retries), so no hand-written retry loop is needed
- Available for Python, with a separate TypeScript port (instructor-js)
- MIT-licensed open source, actively maintained
- Pricing: Free (no additional cost beyond LLM API calls)

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


# Patch the OpenAI client so that create() accepts response_model
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    messages=[{"role": "user", "content": "Extract: John is 25 years old"}],
)
# Expected: user.name == "John", user.age == 25
```

Outlines: Constrained Decoding
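Outlines enforces schema compliance at token generation time via constrained decoding. Instead of generating tokens and then validating them, Outlines restricts the candidate tokens at each step to those that keep the output consistent with your schema. This guarantees structurally valid output, although the values inside that structure can still be factually wrong. The approach is well suited to local models, where the decoding loop is under your control.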
- Works with llama.cpp, vLLM, transformers, NVIDIA NIM, and any HuggingFace model
- JSON Schema or GBNF (GGML BNF) format schema definitions
- Guaranteed schema compliance: no post-generation schema validation or retries needed
- Faster than retry-based validation (fewer wasted tokens)
- Free and open-source (Apache 2.0)
- Best for local deployment and cost-sensitive workflows
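The idea of limiting valid tokens at each step can be sketched with a toy state machine. This is not Outlines' implementation or API: the vocabulary, the `ALLOWED` transition table, and the `toy_scores` function that stands in for model logits are all made up for illustration.

```python
import json

# Allowed next tokens for each decoding state of the target shape
# {"sentiment": "positive" | "negative"}. Reaching state 5 means the object is complete.
ALLOWED = {
    0: {"{": 1},
    1: {'"sentiment"': 2},
    2: {":": 3},
    3: {'"positive"': 4, '"negative"': 4},
    4: {"}": 5},
}


def toy_scores(prefix):
    """Hypothetical stand-in for model logits: it prefers chatty, off-schema tokens."""
    return {
        "Sure": 5.0, "!": 4.0, '"negative"': 3.0, '"positive"': 2.0,
        "{": 1.0, '"sentiment"': 1.0, ":": 1.0, "}": 1.0,
    }


def constrained_decode(score_fn):
    state, output = 0, []
    while state in ALLOWED:
        scores = score_fn(output)
        # Mask: only tokens the schema permits in this state remain candidates.
        candidates = {tok: scores[tok] for tok in ALLOWED[state]}
        token = max(candidates, key=candidates.get)
        output.append(token)
        state = ALLOWED[state][token]
    return "".join(output)


text = constrained_decode(toy_scores)
print(text)
print(json.loads(text))
```

Unconstrained greedy decoding would have started with `Sure`, while the masked version can only emit the target shape. Note that the chosen value is simply the allowed token the toy model scored highest: the structure is guaranteed, the correctness of the value is not.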
Pydantic AI: Type-Safe Agents
Pydantic AI is a new framework (2025) that combines Pydantic models with first-class support for multi-turn agent conversations. It adds full type safety to agent loops while enforcing structured output on each turn. Designed for Python async workflows.
- Pydantic v2 type system: full IDE support and type checking
- Built-in structured output on every agent step
- Async-first design for high-throughput applications
- Supports OpenAI GPT, Anthropic Claude, Google Gemini, and local models via Ollama
- Tool calling built in: define tools as Python functions with type hints
- Free to use (no additional cost beyond LLM API calls)
LangChain: Unified APIs
LangChain 0.1+ added with_structured_output() to all major chat models. This unifies structured output across OpenAI, Anthropic, Google, and local models behind a single API. If your team already uses LangChain chains or agents, this is the easiest path to structured output.
- Unified API: one .with_structured_output() method works across all providers
- Automatically converts LangChain tool definitions to provider-specific schema formats
- Integrates seamlessly with chains, agents, and runnable workflows
- Supports Pydantic models, TypedDict, and OpenAI schema definitions
- Part of LangChain ecosystem (no extra dependencies)
- Best for teams already invested in LangChain
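A unified API of this kind boils down to translating one schema into each provider's request format. The sketch below is not LangChain's code. The helper names are hypothetical, and the payload shapes follow the publicly documented tool-calling formats of OpenAI and Anthropic as understood at the time of writing, so check the current provider docs before relying on them.

```python
USER_SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}


def to_openai_tool(name, description, schema):
    """OpenAI Chat Completions style: the schema sits under function.parameters."""
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": schema},
    }


def to_anthropic_tool(name, description, schema):
    """Anthropic Messages style: the schema sits under input_schema."""
    return {"name": name, "description": description, "input_schema": schema}


# One entry point, many provider formats (hypothetical registry).
ADAPTERS = {"openai": to_openai_tool, "anthropic": to_anthropic_tool}


def structured_tool(provider, name, description, schema):
    return ADAPTERS[provider](name, description, schema)


openai_tool = structured_tool("openai", "User", "Extract a user", USER_SCHEMA)
anthropic_tool = structured_tool("anthropic", "User", "Extract a user", USER_SCHEMA)
print(openai_tool["function"]["parameters"] == anthropic_tool["input_schema"])
```

The same JSON Schema ends up in both payloads; only the envelope differs, which is the part a wrapper can hide.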
Marvin: Decorator-Based Extraction
Marvin uses Python decorators to turn function signatures into typed LLM calls. You define a function signature with type hints, decorate it with @marvin.fn, and Marvin handles prompt generation and structured output validation automatically. Fastest path from idea to working code.
- Decorator syntax: @marvin.fn turns Python signatures into LLM prompts
- Works with OpenAI, Anthropic, Google, and local models
- Type hints become the schema: minimal boilerplate
- Built-in validation and error handling
- Suitable for prototyping and small-to-medium workflows
- Free to use (pricing TBD as of April 2026)
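How type hints can become a schema is easy to illustrate with the standard library alone. This is a sketch of the general idea, not Marvin's internals: `signature_to_schema` and the `sentiment` function are hypothetical, and only a few primitive types are mapped.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python types to JSON Schema type names (assumption: primitives only).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def signature_to_schema(func):
    """Derive a JSON-Schema-like description from a function's type hints and docstring."""
    hints = get_type_hints(func)
    returns = hints.pop("return", str)
    params = inspect.signature(func).parameters
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": {n: {"type": JSON_TYPES[hints[n]]} for n in params},
            # Parameters without a default value are required.
            "required": [n for n, p in params.items() if p.default is inspect.Parameter.empty],
        },
        "returns": {"type": JSON_TYPES[returns]},
    }


def sentiment(text: str, scale: int = 5) -> float:
    """Rate the sentiment of the text from 0 to `scale`."""


schema = signature_to_schema(sentiment)
print(schema)
```

A decorator-based library would go on to build a prompt from the name, docstring, and parameter schema, and then validate the reply against the return type.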
PromptQuorum: Cross-Model Testing
PromptQuorum is not a structured output library itself, but a testing platform for validating structured output consistency across models. Run the same prompt against GPT-4.5, Claude 4.7 Opus, Gemini 3.1 Pro, and 20+ other models simultaneously. Measure schema compliance, latency, and cost per model.
- Multi-model dispatch in a single API call: test one prompt against 25+ models
- Structured output compliance metrics: pass rate, latency, cost per model
- Identify models that hallucinate on your schema, so you avoid deploying to unreliable models
- Consensus mode: find agreements between independent model runs
- Works with Instructor, Outlines, Pydantic AI, LangChain, or raw LLM APIs
- Free tier available, enterprise pricing for high-volume testing
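A per-model compliance pass rate is simple to compute once replies have been collected. The sketch below does not use PromptQuorum's API, which is not shown in this article. The model names and replies are made-up placeholders, and `complies` and `pass_rates` are hypothetical helpers.

```python
import json

REQUIRED = {"name": str, "age": int}


def complies(raw):
    """True if the reply is a JSON object with exactly the required fields and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and set(data) == set(REQUIRED)
        and all(
            isinstance(data[key], expected) and not isinstance(data[key], bool)
            for key, expected in REQUIRED.items()
        )
    )


def pass_rates(replies_by_model):
    """Fraction of compliant replies per model."""
    return {
        model: sum(complies(reply) for reply in replies) / len(replies)
        for model, replies in replies_by_model.items()
    }


# Made-up replies for illustration only.
replies = {
    "model-a": ['{"name": "John", "age": 25}', '{"name": "Ann", "age": 31}'],
    "model-b": ['{"name": "John", "age": "25"}', '{"name": "Ann", "age": 31}'],
    "model-c": ['Sure! {"name": "John"}', '{"full_name": "Ann", "age": 31}'],
}
rates = pass_rates(replies)
print(rates)
```

With these placeholder replies, model-a passes both checks, model-b fails once on a wrong type, and model-c fails on extra prose and on a renamed field.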
Side-by-Side Comparison
| Tool | Best For | Schema Format | Language | Local Models | Pricing | Learning Curve |
|---|---|---|---|---|---|---|
| Instructor | Python APIs + retries | Pydantic models | Python/TypeScript | Yes (Ollama) | Free | Low |
| Outlines | Local model deployment | JSON Schema/GBNF | Python | Yes (native) | Free | Medium |
| Pydantic AI | Type-safe agents | Pydantic models | Python | Yes (Ollama) | Free | Low |
| LangChain | Chains + agents | Tool definitions | Python/JS | Yes | Free | Medium |
| Marvin | Rapid prototyping | Type hints | Python | Yes | Free | Very low |
| PromptQuorum | Multi-model testing | API-agnostic | API-first | Via OpenAI proxy | Free tier + enterprise | Low |
Choosing the Right Tool
Start by answering three questions: (1) Do you use LangChain already? (2) Do you need local model support? (3) How much validation complexity do you have?
- Use Instructor if: You're building Python APIs and need automatic retries on validation failure. Best general-purpose choice.
- Use Outlines if: You deploy local models (llama.cpp, vLLM) and want guaranteed schema compliance at generation time.
- Use Pydantic AI if: You're building multi-turn agent workflows with type safety across all steps.
- Use LangChain if: You already use LangChain chains or agents. with_structured_output() is the simplest addition.
- Use Marvin if: You want to prototype rapidly and don't need complex validation. Decorators are the fastest path.
- Use PromptQuorum if: You need to test structured output consistency across GPT, Claude, and Gemini before production.
Adding Structured Output Step-by-Step
1. Define your output schema: Create a Pydantic model (Python), TypeScript interface, or JSON Schema describing the fields, types, and constraints you want the LLM to return.
2. Choose a library: Instructor for Python APIs, Outlines for local models, Pydantic AI for agents, LangChain if already in use, Marvin for speed.
3. Install and wrap your LLM call: `pip install instructor` (Python), then pass your schema to the API call. Instructor handles validation and retries.
4. Test with PromptQuorum: Deploy to PromptQuorum and run your prompt against GPT, Claude, and Gemini. Measure schema compliance per model.
5. Refine the schema based on failures: If a model fails validation, add examples to your prompt or adjust schema constraints. Iterate until all models pass.
Common Structured Output Mistakes
Using JSON mode without validation
Why it hurts: API JSON mode (OpenAI response_format, Anthropic JSON control) only steers the model toward well-formed JSON. On its own it does NOT guarantee that your schema is obeyed. Models can still return invented field names and wrong types.
Fix: Always layer validation on top: use Instructor, Outlines, or Pydantic AI. Never trust JSON mode alone. Test with PromptQuorum to catch compliance failures.
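The gap between valid JSON and schema-compliant JSON can be shown in a few lines. This is a hand-rolled check for illustration, and a real project would normally use Pydantic or a JSON Schema validator instead. `SCHEMA`, `schema_errors`, and the sample reply are hypothetical.

```python
import json

SCHEMA = {"name": str, "age": int}


def schema_errors(data):
    """Return human-readable problems with a parsed reply, or an empty list."""
    errors = []
    for field, expected in SCHEMA.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected) or isinstance(data[field], bool):
            actual = type(data[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    errors += [f"unexpected field: {field}" for field in data if field not in SCHEMA]
    return errors


reply = '{"full_name": "John", "age": "25"}'
data = json.loads(reply)  # succeeds: the reply is syntactically valid JSON
errors = schema_errors(data)
print(errors)
```

Parsing succeeds, yet the reply has a renamed field and a string where an integer was expected, which is exactly the class of failure JSON mode alone does not catch.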
Designing schemas that are too strict
Why it hurts: Overly constrained schemas (tiny enum lists, very specific regex patterns) cause LLMs to fail validation frequently. High retry counts waste tokens and money.
Fix: Use PromptQuorum to test schema strictness across models. Loosen constraints to achieve 95%+ compliance. Use optional fields instead of required ones when possible.
Not testing local vs. API model differences
Why it hurts: Outlines on llama.cpp behaves differently than Instructor on GPT-4.5. Schema compliance rates differ per model. Building only for GPT, then deploying locally, causes production failures.
Fix: Test all intended model backends early. Use PromptQuorum to run the same prompt across local models (served via vLLM) and hosted API models (OpenAI, Anthropic, Google Gemini).
Ignoring latency and token cost impact
Why it hurts: Structured output with retries costs more tokens. Instructor retries on failure, and each retry is another billed call. Outlines' constrained decoding can be slower per token than free generation. If you do not measure cost and latency per model, these overheads go unnoticed.
Fix: Use PromptQuorum cost tracking. Compare latency across models. For budget-conscious workflows, prefer Outlines (no retries). For accuracy, accept Instructor's retry cost.
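The retry overhead can be estimated with simple arithmetic. The sketch assumes every attempt passes validation independently with the same probability and costs the same, which understates real costs because retry prompts usually grow as errors are appended. Under that assumption, the expected number of calls with pass rate p and at most n attempts is 1 + (1 - p) + ... + (1 - p)^(n - 1). The function names and the example price are hypothetical.

```python
def expected_calls(pass_rate, max_attempts):
    """Expected number of model calls when retrying until a reply validates."""
    fail = 1.0 - pass_rate
    # Attempt k + 1 happens only if the first k attempts all failed.
    return sum(fail ** k for k in range(max_attempts))


def expected_cost(pass_rate, max_attempts, cost_per_call):
    """Expected spend per request, assuming a flat cost per call."""
    return expected_calls(pass_rate, max_attempts) * cost_per_call


loose = expected_calls(0.95, 3)   # about 1.0525 calls per request
strict = expected_calls(0.60, 3)  # about 1.56 calls per request
print(loose, strict)
print(expected_cost(0.60, 3, cost_per_call=0.002))  # hypothetical price per call
```

Under these assumptions, a schema that passes 60% of the time costs roughly half again as much per request as one that passes 95% of the time, before counting the extra latency.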
Mixing validation methods (no consistency)
Why it hurts: Some requests use Instructor, others use raw JSON parsing. Some models are validated, others are not. This leads to inconsistent errors in production.
Fix: Standardize on one validation approach per codebase. All requests use Instructor, or all use Outlines. With a single code path, failures show up in one place and are much easier to debug.
What is structured output in LLMs?
Structured output constrains LLM responses to a specific schema: JSON format, defined fields, type constraints. Instead of free-text replies, structured output returns data your code can parse and validate directly, with far less defensive error handling.
Which tool is best for Python developers?
Instructor is the most popular Python choice. It uses Pydantic models to define schemas, automatically handles retries and validation, and supports any LLM API (OpenAI, Anthropic, Google, Ollama). Pydantic AI is an alternative if you also want type-safe multi-turn agent conversations.
Can I use structured output with local models like Llama?
Yes. Outlines specializes in constrained decoding for local models and works with llama.cpp, vLLM, and the transformers library. It guarantees schema compliance at token generation time, although the values it produces can still be wrong. Instructor also supports Ollama if you run it as an API.
What is the difference between Instructor and Marvin?
Instructor uses Pydantic models to define schemas and handles extraction with error recovery. Marvin uses Python decorators: you decorate a function signature and Marvin auto-generates the LLM prompt. Instructor is more explicit, which suits complex validation. Marvin is more concise, which suits rapid prototyping.
Does LangChain support structured output?
Yes. LangChain 0.1+ includes a with_structured_output() method on ChatOpenAI, ChatAnthropic, ChatGoogleGenerativeAI, and other chat models. It automatically converts LangChain tools to structured output schemas. Use this if you already use LangChain agents and want to add schema enforcement without switching libraries.
How do I test if structured output is reliable?
Use PromptQuorum to run the same prompt across multiple models and measure schema compliance. Different models (GPT-4.5, Claude 4.7, Gemini 3.1) have different structured output reliability. Test before deploying to production. Unit test with Instructor/Pydantic validation locally.
What does "constrained decoding" mean?
Constrained decoding limits token generation to only valid values according to your schema. Outlines does this by computing the set of valid next tokens at each step. This guarantees schema compliance without post-generation validation or retries, making it faster and more reliable than API-level JSON mode.
Can I use structured output without any library?
Technically, yes: you can prompt the model to return JSON and then parse it yourself. But parsing or validation will fail whenever the model hallucinates fields or breaks the format. The five libraries solve this by validating with retries (Instructor, Marvin), enforcing at decode time (Outlines), or wrapping provider APIs (LangChain, Pydantic AI).
Which tool has the best documentation?
LangChain and Pydantic AI have the most comprehensive docs due to their corporate backing. Instructor has excellent tutorials and examples despite being community-maintained. Outlines docs are technical but thorough. Marvin has quick-start guides.
Do I need all six tools or just one?
Start with one. Python developers should try Instructor or Pydantic AI. Local model teams should try Outlines. LangChain users should try LangChain's with_structured_output(). Use PromptQuorum to validate consistency across all models. Most teams use one tool + PromptQuorum for testing.