
Best Tools for Structured Output and JSON Mode (2026)

10 min read · By Hans Kuepper · Founder of PromptQuorum, multi-model AI dispatch tool · PromptQuorum

Six tools dominate structured output in 2026: Instructor for Pydantic extraction, Outlines for constrained decoding, Pydantic AI for type-safe agents, LangChain for unified APIs, Marvin for decorator-based extraction, and PromptQuorum for cross-model testing. Each solves a different workflow bottleneck.

Choose based on where your models run: Instructor and Pydantic AI for API-first workflows with retries and type safety; Outlines for guaranteed schema compliance on local models; LangChain for teams already using chains or agents; Marvin for rapid decorator-based prototyping; PromptQuorum for consistency testing across GPT, Claude, and Gemini before production.

Key Takeaways

  • Instructor is the most popular Python choice: Pydantic schemas, automatic retries, support for many LLM providers
  • Outlines guarantees schema compliance on local models via constrained decoding: the structure is always valid, though the values can still be wrong
  • Pydantic AI adds type safety to multi-turn agent conversations with first-class structured output
  • LangChain's with_structured_output() unifies structured output across OpenAI, Anthropic, and Google APIs
  • Marvin uses decorators for rapid prototyping: turn Python function signatures into typed LLM calls
  • PromptQuorum tests structured output consistency across all models before production deployment

💡 TL;DR

Use Instructor for Python API extraction with retries. Use Outlines for guaranteed schema compliance on local models. Use Pydantic AI for type-safe multi-turn agents. Use LangChain if you're already in that ecosystem. Use Marvin for rapid prototyping. Use PromptQuorum to test structured output consistency across all models before production.

⚡ Quick Facts

  • Instructor supports 20+ LLM providers (OpenAI, Anthropic, Google, Ollama, vLLM)
  • Outlines guarantees schema compliance at token generation time (every output parses against the schema)
  • Pydantic AI is async-first (with sync helpers) and validates structured output across multi-turn conversations
  • LangChain's with_structured_output() wraps 6+ major provider APIs uniformly
  • Marvin decorator syntax: @marvin.fn on a typed signature → automatic LLM call binding
  • PromptQuorum tests the same prompt across 25+ models for consistency

Problems Each Tool Solves

Structured output requires solving three interdependent problems: schema definition, API enforcement, and validation. Different tools attack these problems differently. Instructor handles all three in Python with retries. Outlines eliminates the validation step via constrained decoding. Pydantic AI adds type safety for agents. LangChain wraps provider APIs. Marvin prioritizes developer speed. PromptQuorum validates consistency across all models.

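| Problem | Instructor | Outlines | Pydantic AI | LangChain | Marvin |
|---|---|---|---|---|---|
| Define schema | Pydantic models | JSON Schema / regex / grammar | Pydantic models | Pydantic / TypedDict / JSON Schema | Python decorators (type hints) |
| Enforce on API call | Retry + validation | Token-level constraint | API + validation | Provider tool calling / JSON mode | Schema in generated prompt |
| Validate response | Automatic | Guaranteed at generation | Type-checked | Pydantic parsing (if Pydantic schema) | Automatic |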

Instructor: Pydantic Extraction

Instructor is the most widely adopted structured output library. It wraps many LLM APIs (OpenAI GPT-4.5, Claude 4.7, Gemini, Ollama, vLLM) and returns validated Pydantic models instead of raw text. When validation fails, Instructor automatically re-asks the model with the error, which removes most hand-written error handling; if the retries run out, it raises an exception you still need to catch.

  • Compatible with 20+ LLM providers (OpenAI, Anthropic, Google, local models via Ollama/vLLM)
  • Pydantic v2 schemas: type hints, validation rules, docstring descriptions embedded in schema
  • Automatic re-ask on validation failure, controlled by max_retries (pass a tenacity Retrying object for backoff)
  • Python library, with a separate TypeScript port (instructor-js)
  • MIT-licensed open source, actively maintained
  • Pricing: Free (no additional cost beyond LLM API calls)
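```python
import instructor
from openai import OpenAI
from pydantic import BaseModel


# The Pydantic model is the schema: field names and types the reply must match.
class User(BaseModel):
    name: str
    age: int


# Patch the OpenAI client so create() accepts response_model and returns a User.
client = instructor.from_openai(OpenAI())
user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    max_retries=2,  # re-ask up to 2 more times if validation fails
    messages=[{"role": "user", "content": "Extract: John is 25 years old"}],
)
# user.name == "John", user.age == 25
```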
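Conceptually, the retry behavior is a validate-and-re-ask loop: parse the reply, validate it, and on failure send the validation error back to the model. The stdlib-only sketch below illustrates that idea; it is not Instructor's implementation. `fake_llm` is a hypothetical stand-in for a real API call, and a dataclass stands in for the Pydantic model.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int


def validate_user(raw: str) -> User:
    """Parse JSON and check field types; raise ValueError on any mismatch."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("name"), str):
        raise ValueError("field 'name' must be a string")
    age = data.get("age")
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("field 'age' must be an integer")
    return User(name=data["name"], age=age)


def fake_llm(messages: list) -> str:
    """Hypothetical model: returns a wrong type first, then a corrected answer."""
    if any("Validation error" in m["content"] for m in messages):
        return '{"name": "John", "age": 25}'
    return '{"name": "John", "age": "twenty-five"}'


def extract_with_retries(prompt: str, max_retries: int = 2) -> User:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_retries + 1):
        raw = fake_llm(messages)
        try:
            return validate_user(raw)
        except ValueError as err:
            # Re-ask: show the model its own output plus the validation error.
            messages.append({"role": "assistant", "content": raw})
            messages.append(
                {"role": "user", "content": f"Validation error: {err}. Return corrected JSON only."}
            )
    raise RuntimeError(f"validation failed after {max_retries + 1} attempts")


user = extract_with_retries("Extract: John is 25 years old")
print(user)  # User(name='John', age=25)
```

Each failed attempt adds two messages to the conversation, which is why retries cost extra tokens.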

Outlines: Constrained Decoding

Outlines enforces schema compliance at token generation time via constrained decoding. Instead of generating tokens and validating afterwards, Outlines restricts the valid tokens at each step to those that can still match your schema. This guarantees the output is structurally valid and always parses, which makes it well suited to local models. It does not guarantee correct values: a model can still fill a valid field with a wrong answer.

  • Works with llama.cpp, vLLM, transformers, NVIDIA NIM, and any HuggingFace model
  • Schemas defined as JSON Schema, Pydantic models, regular expressions, or context-free grammars
  • Guaranteed schema compliance: no post-generation schema validation or retries needed
  • Typically faster than retry-based validation (fewer wasted tokens)
  • Free and open-source (Apache 2.0)
  • Best for local deployment and cost-sensitive workflows
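To make "limits valid tokens at each step" concrete, here is a toy sketch of constrained decoding over a made-up vocabulary. The allowed outputs are a fixed set of strings (like a JSON enum); at every step, tokens that would leave the set of valid prefixes are masked out before the highest-scoring token is picked. The vocabulary and `fake_scores` are hypothetical; Outlines precompiles schemas into a finite-state index so this masking is cheap, but the principle is the same.

```python
ALLOWED = ['"positive"', '"negative"', '"neutral"']  # enum values as JSON strings
VOCAB = ['"', "pos", "neg", "neu", "itive", "ative", "tral", "maybe", "}"]


def fake_scores(prefix: str) -> dict:
    """Hypothetical model scores (ignoring prefix): it 'prefers' the invalid token 'maybe'."""
    scores = {tok: 0.1 for tok in VOCAB}
    scores["maybe"] = 0.9
    scores["neg"] = 0.5
    return scores


def allowed_next(prefix: str) -> list:
    """Tokens that keep prefix + token a prefix of at least one allowed output."""
    return [t for t in VOCAB if any(a.startswith(prefix + t) for a in ALLOWED)]


def constrained_decode() -> str:
    out = ""
    while out not in ALLOWED:
        candidates = allowed_next(out)
        scores = fake_scores(out)
        # Mask: pick the best token among those the constraint permits.
        out += max(candidates, key=lambda t: scores[t])
    return out


print(constrained_decode())  # "negative"
```

Unconstrained, this fake model would emit `maybe` at every step; with the mask it can only produce one of the three enum values.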

Pydantic AI: Type-Safe Agents

Pydantic AI is a relatively new agent framework from the Pydantic team (first released in late 2024) that combines Pydantic models with first-class support for multi-turn agent conversations. It adds type safety to agent loops while enforcing structured output on each turn, and it is designed around Python async workflows.

  • Pydantic v2 type system β€” full IDE support and type checking
  • Built-in structured output on every agent step
  • Async-first design for high-throughput applications
  • Supports OpenAI GPT, Anthropic Claude, Google Gemini, and local models via Ollama
  • Tool calling baked in β€” define tools as Python functions with type hints
  • Free to use (no additional cost beyond LLM API calls)
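Defining tools as typed Python functions works because a framework can read the signature and turn it into a JSON Schema the model sees. The sketch below shows that idea with only `inspect` and `typing`; `tool_schema` and `get_weather` are hypothetical names, and Pydantic AI's real schema generation (built on Pydantic) handles far more types than these scalars.

```python
import inspect
from typing import get_type_hints

# Minimal Python-type -> JSON Schema mapping (assumption: scalar parameters only).
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(fn) -> dict:
    """Build a JSON-Schema-style tool description from a typed function."""
    hints = get_type_hints(fn)
    params = inspect.signature(fn).parameters
    properties = {name: {"type": PY_TO_JSON[hints[name]]} for name in params}
    # Parameters without a default value are required.
    required = [n for n, p in params.items() if p.default is inspect.Parameter.empty]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(city: str, days: int = 1) -> str:
    """Return a short weather forecast for a city."""
    return f"Sunny in {city} for {days} day(s)"


print(tool_schema(get_weather))
```

The docstring becomes the tool description and the type hints become the argument schema, so the model's tool call can be validated against the same types the function declares.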

LangChain: Unified APIs

LangChain 0.1+ added with_structured_output() to all major chat models. This unifies structured output across OpenAI, Anthropic, Google, and local models behind a single API. If your team already uses LangChain chains or agents, this is the easiest path to structured output.

  • Unified API: one .with_structured_output() method works across all providers
  • Automatically converts your schema to each provider's tool-calling or JSON mode format
  • Integrates seamlessly with chains, agents, and runnable workflows
  • Supports Pydantic models, TypedDict, and OpenAI schema definitions
  • Part of the LangChain ecosystem (no extra library beyond the provider integration package, e.g. langchain-openai)
  • Best for teams already invested in LangChain
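The value of a unified method is that one schema gets translated into each provider's own request format. The simplified sketch below shows that translation step for the tool definitions OpenAI and Anthropic document (OpenAI nests the schema under `function.parameters`, Anthropic uses `input_schema`). It illustrates the idea only; it is not LangChain's internal code, and `to_provider_tool` is a hypothetical helper.

```python
import json

# One provider-neutral JSON Schema, as you might derive from a Pydantic model.
MOVIE_SCHEMA = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "year": {"type": "integer"}},
    "required": ["title", "year"],
}


def to_provider_tool(provider: str, name: str, description: str, schema: dict) -> dict:
    """Wrap the same JSON Schema in each provider's tool-definition shape."""
    if provider == "openai":
        return {
            "type": "function",
            "function": {"name": name, "description": description, "parameters": schema},
        }
    if provider == "anthropic":
        return {"name": name, "description": description, "input_schema": schema}
    raise ValueError(f"unsupported provider: {provider}")


for p in ("openai", "anthropic"):
    print(p, json.dumps(to_provider_tool(p, "Movie", "Movie details", MOVIE_SCHEMA)))
```

With LangChain you don't write this by hand: you call, for example, `ChatOpenAI(model="gpt-4o").with_structured_output(Movie)` with a Pydantic `Movie` model and get `Movie` instances back, and switching to another provider's chat class keeps the same call.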

Marvin: Decorator-Based Extraction

Marvin uses Python decorators to turn function signatures into typed LLM calls. You define a function signature with type hints, decorate it with @marvin.fn, and Marvin handles prompt generation and structured output validation automatically. Fastest path from idea to working code.

  • Decorator syntax: @marvin.fn turns Python signatures into LLM prompts
  • Works with OpenAI, Anthropic, Google, and local models
  • Type hints become schema β€” minimal boilerplate
  • Built-in validation and error handling
  • Suitable for prototyping and small-to-medium workflows
  • Free and open source (no additional cost beyond LLM API calls)
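The decorator pattern can be sketched in a few lines: read the function's name, docstring, arguments, and return annotation, build a prompt from them, call a model, and cast the reply to the declared return type. `llm_fn` and `fake_llm` below are hypothetical stand-ins, not Marvin's API; with Marvin itself you put `@marvin.fn` above a typed signature that typically has only a docstring.

```python
import functools
import inspect
from typing import get_type_hints


def fake_llm(prompt: str) -> str:
    """Hypothetical model: always answers with a bare number as text."""
    return "3"


def llm_fn(fn):
    """Turn a typed, docstring-only function into a (fake) LLM call."""
    sig = inspect.signature(fn)
    return_type = get_type_hints(fn)["return"]

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        prompt = (
            f"Function: {fn.__name__}{sig}\n"
            f"Description: {inspect.getdoc(fn)}\n"
            f"Arguments: {dict(bound.arguments)}\n"
            f"Reply with only a value of type {return_type.__name__}."
        )
        raw = fake_llm(prompt)
        # Cast the reply to the declared type; int("3") -> 3, non-numeric text raises ValueError.
        return return_type(raw)

    return wrapper


@llm_fn
def count_fruits(text: str) -> int:
    """Count how many distinct fruits are mentioned in the text."""


print(count_fruits("apples, pears and plums"))  # 3
```

The simple `return_type(raw)` cast only works for scalar types; richer return types are where a real library's validation earns its keep.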

PromptQuorum: Cross-Model Testing

PromptQuorum is not a structured output library itself, but a testing platform for validating structured output consistency across models. Run the same prompt against GPT-4.5, Claude 4.7 Opus, Gemini 3.1 Pro, and 20+ other models simultaneously. Measure schema compliance, latency, and cost per model.

  • Multi-model dispatch in a single API call β€” test one prompt against 25+ models
  • Structured output compliance metrics β€” pass rate, latency, cost per model
  • Identify models that hallucinate on your schema, so you avoid deploying to unreliable models
  • Consensus mode: find agreements between independent model runs
  • Works with Instructor, Outlines, Pydantic AI, LangChain, or raw LLM APIs
  • Free tier available, enterprise pricing for high-volume testing
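A per-model pass rate is straightforward to compute once you have the raw replies. The sketch below checks each reply for the required fields and types and reports the share that pass for each model. The model names and replies are hypothetical placeholders, not measured results, and the code is not PromptQuorum's API; it only shows the metric.

```python
import json

SCHEMA = {"name": str, "age": int}  # required field -> expected Python type


def complies(raw: str) -> bool:
    """True if raw is a JSON object with every required field of the right type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(
        isinstance(data.get(k), t) and not isinstance(data.get(k), bool)
        for k, t in SCHEMA.items()
    )


# Hypothetical replies collected from three models for the same prompt.
replies = {
    "model-a": ['{"name": "John", "age": 25}', '{"name": "Ana", "age": 31}'],
    "model-b": ['{"name": "John", "age": "25"}', '{"name": "Ana", "age": 31}'],
    "model-c": ['Sure! {"name": "John"}', '{"name": "Ana", "age": 31}'],
}

pass_rates = {m: sum(map(complies, rs)) / len(rs) for m, rs in replies.items()}
for model, rate in sorted(pass_rates.items()):
    print(f"{model}: {rate:.0%} schema compliance")
```

Note the two different failure modes: `model-b` returns valid JSON with a wrong type, while `model-c` wraps its answer in prose so it does not parse at all.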

Side-by-Side Comparison

| Tool | Best For | Schema Format | Language | Local Models | Pricing | Learning Curve |
|---|---|---|---|---|---|---|
| Instructor | Python APIs + retries | Pydantic models | Python/TypeScript | Yes (Ollama) | Free | Low |
| Outlines | Local model deployment | JSON Schema / regex / grammar | Python | Yes (native) | Free | Medium |
| Pydantic AI | Type-safe agents | Pydantic models | Python | Yes (Ollama) | Free | Low |
| LangChain | Chains + agents | Pydantic / TypedDict / JSON Schema | Python/JS | Yes | Free | Medium |
| Marvin | Rapid prototyping | Type hints | Python | Yes | Free | Very low |
| PromptQuorum | Multi-model testing | API-agnostic | API-first | Via OpenAI proxy | Free tier + enterprise | Low |

Choosing the Right Tool

Start by answering three questions: (1) Do you use LangChain already? (2) Do you need local model support? (3) How much validation complexity do you have?

  • Use Instructor if: You're building Python APIs and need automatic retries on validation failure. Best general-purpose choice.
  • Use Outlines if: You deploy local models (llama.cpp, vLLM) and want guaranteed schema compliance at generation time.
  • Use Pydantic AI if: You're building multi-turn agent workflows with type safety across all steps.
  • Use LangChain if: You already use LangChain chains or agents; with_structured_output() is the simplest addition.
  • Use Marvin if: You want to prototype rapidly and don't need complex validation; decorators are the fastest path.
  • Use PromptQuorum if: You need to test structured output consistency across GPT, Claude, and Gemini before production.
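The questions above can be read as an ordered checklist. The function below is one possible encoding, with the order (LangChain first, then local models, then agents) taken from the question order; when a team fits several cases, the ordering is a judgment call rather than a rule, and `pick_tool` is a hypothetical helper.

```python
def pick_tool(uses_langchain: bool, local_models: bool,
              multi_turn_agents: bool, quick_prototype: bool) -> str:
    """One possible encoding of the checklist (a heuristic, not a rule)."""
    if uses_langchain:
        return "LangChain"    # already in the ecosystem: add with_structured_output()
    if local_models:
        return "Outlines"     # constrained decoding on llama.cpp / vLLM
    if multi_turn_agents:
        return "Pydantic AI"  # typed output on every agent step
    if quick_prototype:
        return "Marvin"       # decorators, light validation needs
    return "Instructor"       # general-purpose default with retries


# Whatever the pick, still test the prompt across GPT, Claude, and Gemini before production.
print(pick_tool(uses_langchain=False, local_models=True,
                multi_turn_agents=False, quick_prototype=False))  # Outlines
```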

Adding Structured Output Step-by-Step

  1. Define your output schema: Create a Pydantic model (Python), TypeScript interface, or JSON Schema describing the fields, types, and constraints you want the LLM to return.
  2. Choose a library: Instructor for Python APIs, Outlines for local models, Pydantic AI for agents, LangChain if already in use, Marvin for speed.
  3. Install the library and wrap your LLM call: for Instructor, run `pip install instructor`, then pass your schema as `response_model` in the API call. Instructor handles validation and retries.
  4. Test with PromptQuorum: Run your prompt against GPT, Claude, and Gemini and measure schema compliance per model.
  5. Refine the schema based on failures: If a model fails validation, add examples to your prompt or adjust schema constraints. Iterate until all models pass.
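Step 5 goes faster when you know which field fails, not just that a reply failed. The sketch below uses a simplified schema (required field mapped to an expected Python type, standing in for a full JSON Schema) and counts failures per field across a batch of replies. The replies are hypothetical examples, not test results.

```python
import json
from collections import Counter

# Step 1, simplified: required field -> expected Python type.
SCHEMA = {"title": str, "year": int, "genre": str}


def field_errors(raw: str) -> list:
    """Return the names of fields that are missing or have the wrong type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["<invalid JSON>"]
    if not isinstance(data, dict):
        return ["<not an object>"]
    return [k for k, t in SCHEMA.items() if not isinstance(data.get(k), t)]


# Steps 4-5: hypothetical replies from a test run; count which fields fail most.
replies = [
    '{"title": "Alien", "year": 1979, "genre": "sci-fi"}',
    '{"title": "Heat", "year": "1995", "genre": "crime"}',
    '{"title": "Up", "year": "2009"}',
    "not json",
]
failures = Counter(err for r in replies for err in field_errors(r))
print(failures.most_common())
```

A field that fails repeatedly, like `year` arriving as a string here, points at the constraint or prompt wording to revisit before the next round.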

Common Structured Output Mistakes

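❌ Using JSON mode without validation

Why it hurts: Basic JSON mode (for example OpenAI's response_format={"type": "json_object"}) guarantees syntactically valid JSON, not your schema. Models still invent field names, drop required fields, and return the wrong types. Strict schema modes such as OpenAI Structured Outputs enforce the shape, but they cannot check semantic rules or whether the values are correct.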

Fix: Always layer validation on top: use Instructor, Outlines, or Pydantic AI. Never trust JSON mode alone. Test with PromptQuorum to catch compliance failures.
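The gap between "valid JSON" and "valid for your schema" is easy to demonstrate with the standard library alone: the made-up reply below parses without error, yet both fields the code depends on are wrong. `schema_errors` is a hypothetical minimal check, the kind of thing Instructor or Pydantic does for you.

```python
import json

# A plausible JSON-mode reply: syntactically valid, but with the wrong shape.
raw = '{"full_name": "John", "age": "25"}'

data = json.loads(raw)  # succeeds: the text is valid JSON
print("parsed OK:", data)


def schema_errors(obj: dict) -> list:
    """Check the fields the code actually depends on (expected: name: str, age: int)."""
    errors = []
    if not isinstance(obj.get("name"), str):
        errors.append("missing or non-string 'name'")
    age = obj.get("age")
    if not isinstance(age, int) or isinstance(age, bool):
        errors.append("missing or non-integer 'age'")
    return errors


print("schema errors:", schema_errors(data))
```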

❌ Designing schemas that are too strict

Why it hurts: Overly constrained schemas (tiny enum lists, very specific regex patterns) cause LLMs to fail validation frequently. High retry counts waste tokens and money.

Fix: Use PromptQuorum to test schema strictness across models. Loosen constraints to achieve 95%+ compliance. Use optional fields instead of required ones when possible.
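Loosening does not have to mean dropping a constraint. The sketch below compares an exact-match enum check with one that normalizes case and whitespace first, on hypothetical replies that are semantically right but formatted slightly differently. The genuinely out-of-schema value (`mixed`) is still rejected.

```python
ALLOWED = {"positive", "negative", "neutral"}

# Hypothetical model replies: mostly right answers with inconsistent formatting.
replies = ["positive", "Positive", " negative ", "NEUTRAL", "mixed"]


def strict_ok(value: str) -> bool:
    return value in ALLOWED


def lenient_ok(value: str) -> bool:
    # Normalize case and surrounding whitespace before checking the enum.
    return value.strip().lower() in ALLOWED


strict_rate = sum(map(strict_ok, replies)) / len(replies)
lenient_rate = sum(map(lenient_ok, replies)) / len(replies)
print(f"strict: {strict_rate:.0%}, lenient: {lenient_rate:.0%}")  # strict: 20%, lenient: 80%
```

In a Pydantic schema, the same normalization would typically live in a `field_validator` with `mode="before"`, so the enum stays strict while harmless formatting differences stop triggering retries.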

❌ Not testing local vs. API model differences

Why it hurts: Outlines on llama.cpp behaves differently than Instructor on GPT-4.5. Schema compliance rates differ per model. Building only for GPT, then deploying locally, causes production failures.

Fix: Test all intended model backends early. Use PromptQuorum to run the same prompt across local models (vLLM, llama.cpp) and API models (OpenAI, Anthropic, Google Gemini).

❌ Ignoring latency and token cost impact

Why it hurts: Structured output with retries costs more tokens, because Instructor re-sends the conversation on every failed validation. Outlines avoids retries, but constrained decoding adds per-token overhead compared with free generation. Without per-model measurements, these costs go unnoticed.

Fix: Use PromptQuorum cost tracking. Compare latency across models. For budget-conscious workflows, prefer Outlines (no retries). For accuracy, accept Instructor's retry cost.
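The retry overhead is easy to estimate. Assuming each attempt fails validation independently with probability p and the library makes at most r retries, attempt k+1 only happens if the first k all failed, so the expected number of calls is 1 + p + p² + ... + p^r. The token count, price, and failure rate below are hypothetical inputs, not benchmarks, and the cost function simplifies by assuming every attempt uses the same number of tokens (real re-asks grow the prompt).

```python
def expected_attempts(p_fail: float, max_retries: int) -> float:
    """Expected number of calls: sum of p_fail**k for k = 0..max_retries."""
    return sum(p_fail**k for k in range(max_retries + 1))


def expected_cost(tokens_per_call: int, price_per_1k: float,
                  p_fail: float, max_retries: int) -> float:
    # Simplification: every attempt costs the same number of tokens.
    return expected_attempts(p_fail, max_retries) * tokens_per_call / 1000 * price_per_1k


# Hypothetical: 800 tokens per call, $0.01 per 1K tokens, 20% failures, 2 retries.
print(round(expected_attempts(0.2, 2), 3))  # 1.24
print(round(expected_cost(800, 0.01, 0.2, 2), 5))
```

At a 20% failure rate, retries add about 24% to the bill in this model; measuring the real failure rate per model is what turns the estimate into a number you can act on.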

❌ Mixing validation methods (no consistency)

Why it hurts: Some requests use Instructor, others use raw JSON parsing. Some models validated, others not. This leads to inconsistent errors in production.

Fix: Standardize on one validation approach per codebase: either all requests go through Instructor, or all go through Outlines. A single code path makes failures consistent and much easier to debug.

What is structured output in LLMs?

Structured output constrains LLM responses to a specific schema: JSON format, defined fields, type constraints. Instead of free-text replies, you get data your code can parse and validate directly, with far less defensive string handling.

Which tool is best for Python developers?

Instructor is the most popular Python choice. It uses Pydantic models to define schemas, handles retries and validation automatically, and supports many LLM providers (OpenAI, Anthropic, Google, Ollama). Pydantic AI is an alternative if you also want type-safe multi-turn agent conversations.

Can I use structured output with local models like Llama?

Yes. Outlines specializes in constrained decoding for local models and works with llama.cpp, vLLM, and the transformers library. It guarantees schema-compliant structure at token generation time, although the model can still put wrong values into valid fields. Instructor also supports Ollama through Ollama's OpenAI-compatible API.

What is the difference between Instructor and Marvin?

Instructor uses Pydantic models to define schemas and handles extraction with error recovery. Marvin uses Python decorators β€” you decorate a function signature and Marvin auto-generates the LLM prompt. Instructor is more explicit (better for complex validations), Marvin is more concise (better for rapid prototyping).

Does LangChain support structured output?

Yes. LangChain 0.1+ includes the with_structured_output() method on chat model classes such as ChatOpenAI, ChatAnthropic, and ChatGoogleGenerativeAI. It converts your schema (Pydantic model, TypedDict, or JSON Schema) into the provider's tool-calling or JSON mode format. Use this if you already use LangChain agents and want schema enforcement without switching libraries.

How do I test if structured output is reliable?

Use PromptQuorum to run the same prompt across multiple models and measure schema compliance. Different models (GPT-4.5, Claude 4.7, Gemini 3.1) have different structured output reliability. Test before deploying to production. Unit test with Instructor/Pydantic validation locally.

What does "constrained decoding" mean?

Constrained decoding limits token generation to only valid values according to your schema. Outlines does this by computing the set of valid next tokens at each step. This guarantees schema compliance without post-generation validation or retries, making it faster and more reliable than API-level JSON mode.

Can I use structured output without any library?

Technically, yes: you can prompt the model to return JSON and parse it yourself. But nothing stops the model from returning malformed JSON or the wrong fields, so you end up writing your own validation and retry logic. The five libraries solve this by validating with retries (Instructor, Marvin), enforcing the schema at decode time (Outlines), or wrapping provider APIs (LangChain, Pydantic AI); PromptQuorum then tests the result across models.

Which tool has the best documentation?

LangChain and Pydantic AI have the most comprehensive docs due to their corporate backing. Instructor has excellent tutorials and examples despite being community-maintained. Outlines docs are technical but thorough. Marvin has quick-start guides.

Do I need all six tools or just one?

Start with one. Python developers should try Instructor or Pydantic AI. Local model teams should try Outlines. LangChain users should try LangChain's with_structured_output(). Use PromptQuorum to validate consistency across all models. Most teams use one tool + PromptQuorum for testing.
