Key Takeaways
- Store prompts in Git alongside code; version together, blame together
- Automate prompt testing: Generated code must pass linting, type-checking, and unit tests
- Use code generation as a tool, not a replacement: Review generated code; add tests; commit your changes
- Prompts for debugging and docs are interactive; prompts for code generation must be validated
- Version control prompt configs (temperature, max_tokens, model) with code; track performance over time
Code Generation vs. Interactive Use
Code generation workflows are production-grade; interactive use is exploratory.
- Code generation: Prompt must produce working code; validated by test suite; requires review
- Interactive: Developer asks "how do I X?" in chat; response is guide, not production artifact
- Workflow difference: Code gen goes through PR → review → merge; interactive is ephemeral
- Tool difference: Code gen uses SDK/API; interactive uses ChatGPT web
Store Prompts in Git
Prompts are code; version, blame, and review them alongside the application code they generate.
- Directory: `prompts/code-gen/` subdirs by use case (e.g., `prompts/code-gen/react-component/`, `prompts/code-gen/sql-migration/`)
- Format: YAML or `.prompt` text file + metadata JSON (model, temperature, version)
- Example: `prompts/code-gen/unit-test/v2.yaml` contains the prompt + `{ model: "gpt-4o", temperature: 0, maxTokens: 2000 }`
- Git history: `git log prompts/code-gen/react-component/` shows who changed what and when
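A minimal sketch of loading a versioned prompt plus its metadata, assuming a layout where each version directory holds the prompt text in `prompt.txt` and the settings in `metadata.json` (both filenames are illustrative, not a standard):

```python
import json
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class PromptConfig:
    prompt: str
    model: str
    temperature: float
    max_tokens: int
    version: str

def load_prompt(prompt_dir: Path) -> PromptConfig:
    """Load one versioned prompt: text from prompt.txt, config from metadata.json."""
    prompt = (prompt_dir / "prompt.txt").read_text()
    meta = json.loads((prompt_dir / "metadata.json").read_text())
    return PromptConfig(prompt=prompt, **meta)
```

Because both files live in Git, `git log` on the directory gives the full change history of prompt text and settings together.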
Code Generation Workflow
Generate → Lint → Test → Review → Merge; every step automated except review.
- Step 1 — Generate: Call API with prompt + input schema; capture output and request metadata (tokens, cost, latency)
- Step 2 — Lint: Run eslint, prettier, or equivalent; reject if code doesn't parse
- Step 3 — Test: Run unit tests; must pass suite; fail rate >5% → review prompt
- Step 4 — Review: Human code review (same as normal code); check for bugs, performance, style
- Step 5 — Merge: Commit includes prompt version, model version, and test results
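The Generate → Lint → Test gate can be sketched as one function. Here `generate` and `run_tests` are stand-ins for your model call and test runner, and the lint step is reduced to a parse check; a real pipeline would shell out to eslint/pytest or equivalents:

```python
import ast
from typing import Callable

def generation_gate(generate: Callable[[str], str], prompt: str,
                    run_tests: Callable[[str], bool]) -> dict:
    """Generate -> lint (parse) -> test; returns a gate report for the PR."""
    code = generate(prompt)      # Step 1: call the model (stubbed here)
    try:
        ast.parse(code)          # Step 2: lint gate — reject unparseable output
    except SyntaxError as e:
        return {"code": code, "parsed": False, "tests_passed": False, "error": str(e)}
    passed = run_tests(code)     # Step 3: test gate — suite must pass before review
    return {"code": code, "parsed": True, "tests_passed": passed, "error": None}
```

Steps 4 and 5 (human review and merge with prompt/model versions recorded) happen outside this function, in the normal PR flow.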
CI/CD Integration
Add "generate and validate code" as a CI step; fail the build if the generated code doesn't pass its tests.
- Trigger: On a PR that modifies prompts, run code generation; a bot commits the generated code back to the PR
- Test gate: PR fails if generated code doesn't pass existing test suite
- Cost tracking: Log tokens used, API cost; comment on PR with "Generated 500 tokens, cost $0.03"
- Regression detection: Compare generation quality to baseline on main; alert if regression
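The cost-tracking comment can be derived from token counts alone. The per-million-token prices below are placeholders, not any provider's actual rates; look up your model's current rate card:

```python
def cost_comment(prompt_tokens: int, completion_tokens: int,
                 price_in: float, price_out: float) -> str:
    """Format the PR cost comment from token counts and per-1M-token prices."""
    total = prompt_tokens + completion_tokens
    cost = prompt_tokens / 1e6 * price_in + completion_tokens / 1e6 * price_out
    return f"Generated {total} tokens, cost ${cost:.2f}"
```

A CI step would call this after generation and post the string as a PR comment, giving reviewers a running view of what the prompt costs per change.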
Test Prompt Quality
Code generation quality is measurable: test coverage, pass rate, latency.
- Metric 1 — Pass rate: % of generated code that passes unit tests on first try
- Metric 2 — Test coverage: Does generated code meet coverage threshold (e.g., 80%)?
- Metric 3 — Latency: How long does generation take? (SLA: <5 seconds)
- Metric 4 — Cost: $ per generated function; budget threshold (e.g., <$0.10 per function)
- Alert: If the pass rate drops below 80%, auto-revert the prompt to the previous version
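A sketch of the pass-rate alert: given a batch of pass/fail results for generated snippets, compute the rate and flag whether the prompt should be reverted (the 80% default matches the alert threshold above):

```python
def should_revert(results: list[bool], threshold: float = 0.80) -> tuple[float, bool]:
    """Pass rate over a batch of generations; revert the prompt if below threshold."""
    if not results:
        return 0.0, False  # no data yet: don't revert on an empty batch
    rate = sum(results) / len(results)
    return rate, rate < threshold
```

The same shape works for the coverage and cost metrics: compute the value per batch, compare against the budgeted threshold, and alert or revert on breach.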
Debugging and Documentation Workflows
These are interactive; no CI automation needed, but use prompts consistently.
- Debugging: Store "debug checklist prompts" (e.g., "If getting TypeError, ask: What changed?") in prompt library
- Docs generation: Use prompts to draft API documentation from code comments; human reviews before merge
- Pattern: Dev writes function → Prompt generates initial docs → Dev reviews + edits → Commit
- Test: Generated docs build without warnings; examples in docs compile without error
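A simple gate for the "examples compile" check, assuming the generated docs are Markdown with Python-fenced code blocks; the fence regex is a rough sketch, not a full Markdown parser:

```python
import ast
import re

# Matches ```python ... ``` fences; good enough for simple docs, not nested cases
FENCE = re.compile(r"```python\n(.*?)```", re.DOTALL)

def doc_examples_compile(doc: str) -> bool:
    """Gate for generated docs: every Python-fenced example must at least parse."""
    for block in FENCE.findall(doc):
        try:
            ast.parse(block)
        except SyntaxError:
            return False
    return True
```

Run this in the docs build alongside the warnings check; docs with a broken example fail before a human ever reviews them.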
Common Mistakes
- No version control for prompts—can't reproduce good results; can't roll back to stable version
- Generated code without tests—ships untested code to production
- No test automation—developers manually test generated code, inconsistently
- Prompts not reviewed—no one checks whether prompt itself is quality
- Cost not tracked—bill balloons because AI is used wastefully
Sources
- GitHub Copilot best practices guide
- OpenAI code generation documentation
- Anthropic Claude for code guide