PromptQuorumPromptQuorum
Startseite/Prompt Engineering/How to Design Prompts for Reliable Structured Data
Workflows & Automation

How to Design Prompts for Reliable Structured Data

·11 min read·Von Hans Kuepper · Gründer von PromptQuorum, Multi-Model-AI-Dispatch-Tool · PromptQuorum

Structured data extraction succeeds or fails at the prompt level—schema clarity, examples, and constraints drive parsing success. As of April 2026, proven patterns include schema declaration, in-context examples, and field-level instructions.

Wichtigste Erkenntnisse

  • Always show schema + 2–3 examples before asking for extraction
  • Add field constraints in prompt: "phone must be +XX-XXX-XXX-XXXX format"
  • Use XML or JSON templates in the prompt; models fill slots better than free-form generation
  • Mark optional fields explicitly: "If field unknown, use null, not empty string"
  • Test on edge cases (typos, missing data, non-English input) before production

Rule 1: Declare Schema First

Define the exact output schema in the system prompt before asking the model to extract.

  • Schema format: JSON schema, Pydantic model text, or XML tag definitions
  • Placement: System prompt or first user message, before the extraction task
  • Be explicit about types: string (max 100 chars), enum (one of: A, B, C), number (0–100), boolean, or null
  • Example: {"name": "string (required)", "phone": "string (optional, E.164 format)", "age": "integer (18–120)"}

Rule 2: Provide 2–3 Examples

In-context examples reduce parsing errors by 50–70%.

  • Format: "Example input | Expected output"
  • Vary inputs: Normal case, edge case (missing field), invalid case (wrong format)
  • Include type conversions: "John (age text) → 25 (integer)"
  • Explicitly show how to handle missing data: "Phone unknown → null"

Rule 3: Add Field-Level Constraints

Every field needs explicit rules: format, length, allowed values, and null handling.

  • Enum fields: "status must be one of: pending, active, completed, failed"
  • Format fields: "email must be valid RFC 5322 format" or "phone must be +XX-XXX-XXX-XXXX"
  • Length fields: "description max 500 characters" or "title exactly 50–80 characters"
  • Null handling: "If unknown, return null, not empty string or N/A"

Rule 4: Use Template Injection

Pre-fill a template with placeholders; model fills slots instead of generating free-form.

  • XML template: "<name>_____</name><phone>_____</phone>" → model fills blanks
  • JSON template: `{"name": "____", "phone": "____"}` easier to parse than unstructured text
  • Constraint in template: Show expected format: `<phone>+1-555-123-4567</phone>`
  • Reduces hallucination: Model focuses on extraction, not format invention

Rule 5: Test Edge Cases in Prompt

Include edge cases in examples; prompt explicitly about non-English, typos, and missing data.

  • Typos: "What if name is misspelled? Extract as given."
  • Missing data: "If field not found, use null, not guessed values."
  • Non-English: "If text is German/French, still extract to English schema."
  • Conflicting fields: "If two names given, take the first; note discrepancy in error field."

Rule 6: Add Validation Instructions

Tell the model how to validate its own output before returning.

  • Self-check: "Before responding, verify: (1) All required fields present, (2) No null in required fields, (3) Types match schema"
  • Error field optional: "Add error field if extraction uncertain; include confidence 0–100"
  • Retry pattern: "If validation fails, re-read input and retry"
  • Fallback: "If unable to extract, return all nulls, not guesses"

Common Mistakes

  • No schema in prompt—model invents fields or uses inconsistent types
  • Examples too simple—only happy path; no edge cases or missing data handling
  • Ambiguous field names—"contact" could be name, email, or phone; use exact names
  • Null/empty string confusion—models default to empty string; must explicitly say "use null"
  • Type coercion ignored—"2025-04-05" vs "April 5, 2025"; must show canonical format

Sources

  • OpenAI structured outputs guide, April 2026
  • Anthropic prompt best practices for JSON extraction
  • Pydantic documentation: Field constraints and validation

Wenden Sie diese Techniken gleichzeitig mit 25+ KI-Modellen in PromptQuorum an.

PromptQuorum kostenlos testen →

← Zurück zu Prompt Engineering

| PromptQuorum