Key Takeaways
- Move from single prompt to pipeline: Input → Prompt 1 → Validate → Prompt 2 → Validate → Output
- Validation is non-negotiable: check each intermediate output before passing it to the next step
- Error handling: Retry with different prompt, escalate to human, or default to template—plan each path
- Orchestration: Use workflow tools (LangChain, LlamaIndex, custom Python) to manage state and retries
- Monitor metrics: Track success rate, latency, cost per workflow instance; tune as you go
Single Prompt vs. Workflow
A single prompt answers one question; a workflow solves a multi-step problem.
- Single prompt: "Summarize this article" → fast, brittle, fails on edge cases
- Workflow: Extract facts → Validate facts → Summarize → Check summary length → Approve/reject
- Single prompt best for: Real-time chat, one-off analysis, prototyping
- Workflow best for: Content generation, data extraction, decision support, customer-facing operations
Design a Prompt Pipeline
Map the logical steps before writing prompts.
- Input: What do we receive? (document, email, structured data)
- Steps: What needs to happen? (Extract → Validate → Enrich → Format)
- Output: What should be returned? (JSON schema, document, human review required?)
- Constraints: Time limit? Cost budget per instance? Accuracy threshold?
- Failures: What happens if Step 2 fails? Retry? Escalate? Default?
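The mapping above can be sketched as a minimal pipeline skeleton. This is an illustrative example, not a framework: `extract` stands in for an LLM extraction prompt (here replaced by simple string matching so the sketch is runnable), and `StepResult` is a hypothetical container for passing validated data between steps.

```python
from dataclasses import dataclass, field

@dataclass
class StepResult:
    ok: bool
    data: dict = field(default_factory=dict)
    error: str = ""

def extract(doc: str) -> StepResult:
    # Stand-in for an LLM extraction prompt: pull a "Decision:" line from the input.
    for line in doc.splitlines():
        if line.lower().startswith("decision:"):
            return StepResult(ok=True, data={"decision": line.split(":", 1)[1].strip()})
    return StepResult(ok=False, error="no decision found")

def validate(result: StepResult) -> StepResult:
    # Gate between steps: refuse to pass empty or missing fields downstream.
    if result.ok and result.data.get("decision"):
        return result
    return StepResult(ok=False, error=result.error or "validation failed")

def run_pipeline(doc: str) -> StepResult:
    # Input -> Extract -> Validate -> Output; later steps only ever see validated data.
    return validate(extract(doc))
```

The point of the shape: every step returns the same result type, so validation and error paths can be inserted between any two steps without rewriting them.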
Validation at Every Step
Never pass unvalidated output to the next step; this is where workflows fail.
- Schema validation: "Extracted data must match schema; if not, retry"
- Semantic checks: "Summary must mention decision; if not, prompt differently"
- Threshold checks: "Confidence score >0.8; if lower, mark for human review"
- Format checks: "JSON must parse; if not, fallback to template extraction"
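The four check types above can be combined into one gate function. A minimal sketch, assuming a hypothetical step that must return JSON with `summary` and `confidence` fields; the verdicts `"retry"`, `"human_review"`, and `"pass"` map to the error-handling paths described below.

```python
import json

REQUIRED_FIELDS = {"summary", "confidence"}  # hypothetical schema for one step

def validate_step_output(raw: str, threshold: float = 0.8) -> tuple[str, dict]:
    """Return (verdict, payload), where verdict is 'pass', 'retry', or 'human_review'."""
    # Format check: the output must parse as JSON at all.
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return "retry", {}
    # Schema check: every required field must be present.
    if not REQUIRED_FIELDS.issubset(payload):
        return "retry", payload
    # Threshold check: low-confidence results go to a human, not downstream.
    if payload["confidence"] < threshold:
        return "human_review", payload
    return "pass", payload
```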
Error Handling Strategy
Define what happens when a step fails; don't let the workflow hang.
- Retry with variant: "If summary too long, re-prompt with stricter max_tokens"
- Escalate to human: "If confidence <0.5, send to Slack #review with context"
- Fallback template: "If extraction fails 3 times, use regex + manual template"
- Circuit breaker: "If 10 failures in a row, pause workflow; page on-call engineer"
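The retry, fallback, and circuit-breaker paths above can be sketched in one wrapper. This is an assumption-laden illustration: `step` is any callable that receives the attempt number (so it can vary its prompt per retry), and the failure counter is a plain dict standing in for shared state a real orchestrator would manage.

```python
class CircuitOpen(Exception):
    """Raised when consecutive failures exceed the breaker limit; page on-call."""

def run_with_retries(step, max_retries=3, fallback=None,
                     failure_counter=None, breaker_limit=10):
    """Retry a step, fall back after exhausting retries, track consecutive failures."""
    failures = failure_counter if failure_counter is not None else {"consecutive": 0}
    for attempt in range(max_retries):
        try:
            result = step(attempt)          # step can use a stricter prompt per attempt
            failures["consecutive"] = 0     # any success resets the breaker
            return result
        except Exception:
            failures["consecutive"] += 1
            if failures["consecutive"] >= breaker_limit:
                # Circuit breaker: stop hammering a failing dependency.
                raise CircuitOpen("too many consecutive failures")
    # Fallback path: e.g. regex extraction + manual template instead of the LLM.
    return fallback
```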
Orchestration Tools
Use a workflow framework to manage retries, state, and error paths—don't code it from scratch.
- LangChain: Python library, chainable prompts, built-in error handling, integrations to 50+ tools
- LlamaIndex: Optimized for RAG workflows, automatic caching, observability
- Temporal: More heavyweight; for complex workflows with human approval, long-running tasks
- Custom Python: If pipeline is simple (2–3 steps); use for rapid iteration
- No-code: Zapier + OpenAI API for simple workflows; limited error handling
Optimize Cost Across Steps
Different steps need different models; use cheap models first, expensive ones as fallback.
- Fast filtering (e.g. Llama): route input to the correct domain (support vs. sales) at a fraction of the per-token cost
- Complex reasoning (e.g. GPT-4o): invoked only when the cheap model is uncertain — roughly an order of magnitude more expensive per token, but paid only when needed
- Pattern: Llama → confidence check → if <0.8, re-run with GPT-4o
- Caching: If same input seen before, return cached output (saves 100% of cost for that step)
Monitor Workflow Metrics
Track success rate, latency, and cost; use data to optimize the workflow.
- End-to-end success: % of workflows reaching desired output without human intervention
- Step success: % success per step; identify bottleneck
- Latency: P50, P95, P99 in milliseconds; where is time spent?
- Cost: $ per workflow instance; trend over time as you optimize
- Quality: % of human-reviewed outputs marked correct; feedback loop to prompt training
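The metrics above can be tracked with a minimal in-memory recorder. A sketch only: a real deployment would emit these to an observability backend rather than compute them in-process, and the percentile method uses a simple nearest-rank approximation.

```python
class WorkflowMetrics:
    """Minimal tracker for per-instance success, latency, and cost."""

    def __init__(self):
        self.runs = []  # (success: bool, latency_ms: float, cost_usd: float)

    def record(self, success, latency_ms, cost_usd):
        self.runs.append((success, latency_ms, cost_usd))

    def success_rate(self):
        # Share of workflow instances that finished without intervention.
        return sum(s for s, _, _ in self.runs) / len(self.runs)

    def latency_percentile(self, p):
        # Nearest-rank percentile, p in [0, 100]: 50 for P50, 95 for P95, etc.
        latencies = sorted(l for _, l, _ in self.runs)
        idx = min(len(latencies) - 1, int(len(latencies) * p / 100))
        return latencies[idx]

    def avg_cost(self):
        # Average $ per workflow instance; watch the trend as you optimize.
        return sum(c for _, _, c in self.runs) / len(self.runs)
```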
Common Mistakes
- No validation between steps—bad data cascades; downstream steps fail silently
- Same model for all steps—overkill for simple filtering, insufficient for complex reasoning
- No error handling—workflow hangs or returns garbage on failure
- No monitoring—don't know if workflow is degrading; find out from angry users
- Over-engineering—3-step workflow doesn't need Temporal; simple Python loop is fine
Sources
- Langchain documentation: Chains and composition
- LlamaIndex docs: Workflow integration
- OpenAI cookbook: Prompt engineering for production workflows