What Makes Prompts Brittle?
- Vague instructions (model guesses intent)
- No examples (model invents format)
- Untested edge cases (failures only surface on real data)
- Overly tight constraints (break on minor input variations)
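To make the failure modes concrete, here is an illustrative contrast between a vague prompt and a tightened one (the summarization task and wording are invented for illustration):

```python
# Brittle: the model must guess length, tone, and output format.
BRITTLE = "Summarize this."

# More robust: explicit format, explicit constraints, and a safeguard
# for invalid input.
ROBUST = (
    "Summarize the text below in exactly 3 bullet points, "
    "each under 15 words. If the text is empty or not in English, "
    "reply with exactly: CANNOT_SUMMARIZE.\n\n"
    "Text: {text}"
)
```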
How to Make Prompts Robust
- Add examples: 3–5 good examples of input→output
- Specify format explicitly: "Output JSON with keys: X, Y, Z"
- Test edge cases: Typos, missing data, extreme values
- Add safeguards: "If X is invalid, return error message"
- Use structured output: Constrain with schemas or validation
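A minimal sketch combining several of the tactics above: a few-shot prompt with an explicit JSON format, a safeguard instruction, and schema-style validation of the reply. The prompt text and key names are assumptions for illustration.

```python
import json

# Illustrative prompt: explicit format, a worked example, and a safeguard.
PROMPT_TEMPLATE = (
    "Extract contact info from the text below.\n"
    "Output JSON with keys: name, email, phone.\n"
    "If a field is missing, use null.\n"
    'If there is no contact info at all, output {"error": "no contact info"}.\n\n'
    'Example input: "Reach Ada Lovelace at ada@example.com"\n'
    'Example output: {"name": "Ada Lovelace", "email": "ada@example.com", "phone": null}\n\n'
    "Text: "
)

REQUIRED_KEYS = {"name", "email", "phone"}

def validate_output(raw: str) -> dict:
    """Constrain the model's reply: reject malformed JSON or missing keys."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    if "error" in data:
        return data  # the safeguard path fired
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

In production this role is often filled by a schema library or the provider's structured-output mode; the hand-rolled check above just shows the shape of the idea.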
Monitor Brittleness in Production
Track failure rates. Flag edge cases. Log failed inputs to inform prompt updates.
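A minimal in-process sketch of that monitoring loop, assuming a hypothetical `PromptMonitor` class; real deployments would ship these records to a metrics or logging backend instead:

```python
import json
import logging
import time

log = logging.getLogger("prompt-monitor")

class PromptMonitor:
    """Track failure rate and log failing inputs for prompt revision."""

    def __init__(self):
        self.calls = 0
        self.failures = 0

    def record(self, prompt_version: str, input_text: str, ok: bool, error: str = ""):
        self.calls += 1
        if not ok:
            self.failures += 1
            # Log the failing input so it can seed the next prompt revision.
            log.warning(json.dumps({
                "ts": time.time(),
                "prompt_version": prompt_version,
                "input": input_text[:500],  # truncate to keep logs bounded
                "error": error,
            }))

    @property
    def failure_rate(self) -> float:
        return self.failures / self.calls if self.calls else 0.0
```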
Error Handling Strategies
- Fallback to simpler prompt
- Retry with different model
- Return structured error (not the raw LLM error)
- Alert human for review
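The strategies above can be sketched as one wrapper: retry with a simpler prompt, then a different model, and finally return a structured error flagged for human review. `call_llm` and the prompt/model names are stand-ins for a real provider client, not a specific API.

```python
# Hypothetical prompt templates (only {text} is substituted).
DETAILED_PROMPT = "Extract contact info as JSON with keys name, email, phone.\nText: {text}"
SIMPLE_PROMPT = "List any names and emails found in: {text}"

def call_llm(prompt: str, model: str) -> str:
    """Stand-in for a real provider client call."""
    raise NotImplementedError("wire up a real LLM client here")

def robust_call(text: str, parse) -> dict:
    attempts = [
        ("model-primary", DETAILED_PROMPT),   # normal path
        ("model-primary", SIMPLE_PROMPT),     # fallback: simpler prompt
        ("model-fallback", SIMPLE_PROMPT),    # fallback: different model
    ]
    last_error = None
    for model, template in attempts:
        try:
            return parse(call_llm(template.format(text=text), model))
        except Exception as exc:
            last_error = str(exc)
    # All attempts failed: return a structured error and flag for review,
    # rather than surfacing the raw LLM exception.
    return {"error": "llm_call_failed", "detail": last_error, "needs_review": True}
```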
Common Mistakes
- Testing only happy path
- Not monitoring production
- Too-strict constraints (prevents valid inputs)
- Failing silently (no error logs)
- Not versioning prompts when brittleness is discovered
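On the last point, a tiny illustrative prompt registry makes a brittleness fix a new version rather than a silent in-place edit (task names and prompt text are invented for illustration):

```python
# Each fix becomes a new version; old versions stay available for
# comparison and rollback.
PROMPTS = {
    "extract_contact": {
        "v1": "Extract the contact info from: {text}",
        # v2 added after v1 proved brittle on texts mentioning multiple people:
        "v2": (
            "Extract contact info for the FIRST person mentioned in: {text}\n"
            "Output JSON with keys: name, email, phone."
        ),
    }
}

def get_prompt(task: str, version: str = "latest") -> str:
    versions = PROMPTS[task]
    if version == "latest":
        # Lexicographic max works for v1..v9; a real registry would
        # parse version numbers properly.
        version = max(versions)
    return versions[version]
```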