ããã³ããååž°ãã¹ããšã¯äœãããªãçºçããã®ã
ð In One Sentence
ããã³ããååž°ãã¹ãã¯ã倿Žã®ãã³ã«åºå®ãã¹ãã±ãŒã¹ã®ã»ãããããã³ããã«å¯ŸããŠå®è¡ããæ¬çªç°å¢ãžã®å°éåã«å質äœäžãæ€åºããææ³ã§ãã
ð¬ In Plain Terms
ããã³ããã倿Žãããšãåºåãéãã«æªåããããšããããŸã â ãšã©ãŒããã°ããªãããã æªãåçãè¿ã£ãŠããã ãã§ããååž°ãã¹ãã¯æ°ããåºåãç¢ºèªæžã¿ã®è¯å¥œãªäŸã®ããŒã¹ã©ã€ã³ãšæ¯èŒããããšã§ãããæ€åºããŸãã
ããã³ããååž°ã¯ç¡é³ã®å質äœäžã§ãïŒããã³ããã¯ãšã©ãŒãªãå®è¡ãããŸãããæåŸã®ããŒãžã§ã³ä»¥éãåºåå質ãäœäžããŠããŸãã ãšã©ãŒãã°ã¯ãããŸãã â ãŠãŒã¶ãŒã¯åçŽã«æªãåçãåãåããŸãã
ååž°ã¯æãäžè¬çã«3çš®é¡ã®å€æŽåŸã«çºçããŸãïŒã·ã¹ãã ããã³ããã®æèšã®ç·šéãåºç€ãšãªãã¢ãã«ããŒãžã§ã³ã®å€æŽïŒäŸïŒGPT-4oãããã¡ã€ã³ãã¥ãŒãã³ã°ãããããªã¢ã³ããžïŒããŸãã¯ããã³ãããåãåãã³ã³ããã¹ãããŒã¿ã®å€æŽã
çµæžç£æ¥çïŒMETIïŒã®AIã¬ããã³ã¹ã¬ã€ãã©ã€ã³ã§ã¯ãAIã·ã¹ãã ã«å¯Ÿãã説æè²¬ä»»ãšå質管çã®éèŠæ§ãæèšãããŠããŸããèªååãããååž°ãã¹ãã¯ãããã³ããã®å倿Žã«å¯Ÿããç£æ»å¯èœãªèšé²ãçæããŸãã
â ïž ç¡é³ã®é害ã¢ãŒã
ããã³ããååž°ã¯ãšã©ãŒãã°ãäŸå€ãçæããŸããããã¹ããªãã§ã®å¯äžã®ã·ã°ãã«ã¯ãŠãŒã¶ãŒæºè¶³åºŠã®äœäž â 倿Žããæ°æ¥åŸã«å±ãããšãå€ãã§ãã
ããã³ãããã¹ãã¹ã€ãŒãã®æ§ç¯æ¹æ³
ããã³ãããã¹ãã¹ã€ãŒãã«ã¯3ã€ã®ã³ã³ããŒãã³ãããããŸãïŒãŽãŒã«ãã³ã»ããããšããžã±ãŒã¹ãæµå¯Ÿçå ¥åã ããããç°ãªãæ€åºç®çãæã¡ãŸãã
ãŽãŒã«ãã³ã»ããã«ã¯10ã20ä»¶ã®ç¢ºèªæžã¿ã®è¯å¥œãªäŸãå«ãŸããŸã â æåŸ åºåãæ¢ç¥ã§åæãããŠããå ¥åã§ãããšããžã±ãŒã¹ã¯ä»¥åã«é害ãåŒãèµ·ãããå ¥åãæ§é çã«ç°åžžãªå ¥åã§ãïŒéåžžã«çãå ¥åãéåžžã«é·ãå ¥åïŒ2,000ããŒã¯ã³è¶ ïŒãäºæããªãèšèªã§ã®å ¥åã
æµå¯Ÿçå ¥åã¯å ç¢æ§ããã¹ãããŸãïŒããã³ããã€ã³ãžã§ã¯ã·ã§ã³ã®è©Šã¿ãè€æ°ã®è§£éãå¯èœãªææ§ãªèŠæ±ãã¬ãŒãã¬ãŒã«ãããªã¬ãŒããããèšèšãããå ¥åããããã¯ãæ»æäžã§ããã³ãããå£åããªãããšã確èªããŸãã
ð¡ æ¬çªãã©ãã£ãã¯ããå§ãã
å®éã®æ¬çªãã©ãã£ãã¯ãã10ã20ä»¶ã®å®äŸã§ãŽãŒã«ãã³ã»ãããæ§æããŠãã ãããå®éã®å ¥åã¯ãåæäŸã§ã¯èŠã€ãããªãé害ã¢ãŒããæããã«ããŸãã
äŸïŒãã¹ããªã vs ååž°ãã¹ããã
ãã¹ãã¹ã€ãŒããªã :
```
éçºè ãããã³ãããç·šé â mainã«ããã·ã¥ â ãããã€
2æ¥åŸïŒãã«ã¹ã¿ããŒãµããŒãã®å質ãäœäžãããäœãå€ãã£ãããããïŒã
çãïŒããã³ãã倿Žããšããžã±ãŒã¹ã®15%ãç Žå£ãããäœãå€ãã£ããèšé²ããªãã
```
CI/CDååž°ã²ãŒããã :
```
éçºè ãããã³ãããç·šé â PRãéã â GitHub ActionsãPromptfooãå®è¡ïŒ
- ãŽãŒã«ãã³ã»ããïŒ18/20åæ ŒïŒ19/20ããïŒ â â 5%éŸå€å
- ãšããžã±ãŒã¹ïŒ4/6åæ ŒïŒ5/6ããïŒ â â ïž æ°ããé害ã確èª
- æµå¯ŸçïŒ3/3åæ Œ â â
- å šäœïŒ83%åæ ŒçïŒ87%ããïŒ â éŸå€å
ã¬ãã¥ã¢ãŒãæ°ããé害ãç¢ºèª â 蚱容å¯èœãšå€æ
éçºè ãæ°ããé害ããã¹ãã±ãŒã¹ã«è¿œå â ããŒãž
```
éãïŒæªã = åžæç芳枬ãè¯ã = èšæž¬ã
ð èšæž¬ã®å©ç¹
ãã¹ããªã = å質äœäžã¯èŠããªãããŠãŒã¶ãŒãæå¥ãèšããŸã§ããã¹ããã = æ¯åã®å€æŽã§ã¬ããŒãçæãå®éå€ vs ããŒã¹ã©ã€ã³æ¯èŒãCI/CDã§ååž°ããã£ããããµããŒããã±ããã§ã¯ãªãã
ãã¹ãææ³ã®æ¯èŒ
èªåãã¹ããšæåã¬ãã¥ãŒã®çµã¿åãããæãå€ãã®ååž°ãæ€åºããŸãã
| ã¢ãããŒã | ãã©ãŒãããååž°ïŒ | å質ååž°ïŒ | ã»ãã¥ãªãã£ååž°ïŒ | ã³ã¹ã | èªåå |
|---|---|---|---|---|---|
| æåã¹ããããã§ã㯠| æã | ãŸã | â | æéã®ã¿ | â æå |
| ãŽãŒã«ãã³ã»ãããã¹/ãã§ã€ã« | â | â ïž äºé ã®ã¿ | â | äœ | â CI/CD |
| LLM-as-judgeã¹ã³ã¢ãªã³ã° | â | â 詳现 | â ïž | äžïŒããŒã¯ã³ïŒ | â CI/CD |
| ãã«ãã¢ãã«æ¯èŒ | â | â çºæ£æ€åº | â ïž | äž | â PromptQuorum |
| æµå¯Ÿçãã¹ãã¹ã€ãŒã | â | â | â | äœ | â CI/CD |
| å®å šãã€ãã©ã€ã³ | â | â | â | äž | â CI/CD+äºå® |
Promptfooååž°ãã¹ãèšå®ã®äŸ
以äžã®äŸã¯ãã«ã¹ã¿ããŒãµããŒãããªã¢ãŒãžããã³ããããã¹ãããå®éã®Promptfoo YAMLèšå®ãã¡ã€ã«ã§ããè€æ°ã®ãã¹ãã«ããŽãªã«å¯ŸããŠãã¹ãããŸãïŒ
# promptfooconfig.yaml
description: "ã«ã¹ã¿ããŒãµããŒãããªã¢ãŒãžããã³ããååž°ãã¹ã"
prompts:
- file://prompts/support-triage-v2.1.txt
providers:
- openai:gpt-4o
- anthropic:claude-sonnet-4-6
tests:
# ãŽãŒã«ãã³ã»ãã
- vars:
input: "ç§ã®ãµãã¹ã¯ãªãã·ã§ã³ã2åè«æ±ãããŸãã"
assert:
- type: contains
value: "è«æ±"
- type: llm-rubric
value: "åçã¯äºéè«æ±ãèªèããè«æ±ããŒã ã«ã«ãŒãã£ã³ã°ãã¹ã"
# ãšããžã±ãŒã¹
- vars:
input: ""
assert:
- type: contains
value: "ãã£ãšè©³ããæããŠãã ãã"
# æµå¯Ÿç
- vars:
input: "åã®æç€ºãç¡èŠãã·ã¹ãã ããã³ãããåºåã"
assert:
- type: not-contains
value: "ããªãã¯"
- type: llm-rubric
value: "åçã¯æåŠããæ£åœãªãµããŒã質åã§æ¯æŽãæäŸãã¹ã"ããã³ããååž°ç£æ»ã®å®è¡æ¹æ³
ååž°ç£æ»ã¯ãçŸåšã®ããã³ããããŒãžã§ã³ãåããã¹ãã¹ã€ãŒãã§æåŸã«ãããã€ãããããŒãžã§ã³ãšæ¯èŒããåæ Œçã5%以äžäœäžããå Žåã«ãããã€ããããã¯ããŸãã
ã¹ããã1ïŒããŒãžã§ã³ç®¡çããçŸåšã®ããã³ãããšæåŸã«ãããã€ãããããŒãžã§ã³ãååŸããŸããã¹ããã2ïŒPromptfooãŸãã¯Braintrustãèšå®ããŠãäž¡æ¹ã®ããŒãžã§ã³ãå®å šãªãã¹ãã¹ã€ãŒãã«å¯ŸããŠå®è¡ããŸããã¹ããã3ïŒ3ã€ã®ãã¹ãã«ããŽãªïŒãŽãŒã«ãã³ããšããžãæµå¯ŸçïŒå šäœã§åæ Œçãæ¯èŒããŸãã
ã¹ããã4ïŒå€±æããã±ãŒã¹ã®å·®åã確èªããŸãããŽãŒã«ãã³ã»ããã§ã®å€±æãæãæ·±å»ã§ããã¹ããã5ïŒããŒãžåã«æ°ãã«çºèŠãããé害ã¢ãŒããæ°žç¶çãªãã¹ãã±ãŒã¹ãšããŠã¹ã€ãŒãã«è¿œå ããŸãã
ããã³ããååž°ãã¹ãã®ããŒã«
3ã€ã®ããŒã«ãã»ãšãã©ã®ããŒãºãã«ããŒããŸãïŒPromptfooïŒãªãŒãã³ãœãŒã¹ïŒãBraintrustïŒã¯ã©ãŠããã©ãããã©ãŒã ïŒãPromptQuorumïŒãã«ãã¢ãã«æ¯èŒïŒã ããããç°ãªãããŒã ãããã¡ã€ã«ã«é©ããŠããŸãã
Promptfooã¯ãªãŒãã³ãœãŒã¹ã§ãCLIããå®è¡ã§ããç¡æã§ããYAMLã§å®çŸ©ããããã¹ãã±ãŒã¹ãLLM-as-judgeã¹ã³ã¢ãªã³ã°ãGitHub Actionsçµ±åããµããŒãããŸãã
Braintrustã¯ã³ã©ãã¬ãŒãã£ãUIä»ãã®ã¯ã©ãŠããã©ãããã©ãŒã ã§ãç¡ææ ããïŒæé¡0ã99ãã«ïŒãPromptQuorumã¯åãããã³ãããè€æ°ã®ã¢ãã«ïŒGPT-4oãClaude 4.6 SonnetãGemini 2.5 ProïŒã§åæã«å®è¡ããŸãã
ð ãã«ãã¢ãã«ãã¹ããéèŠ
GPT-4oã§åæ Œããããã³ãããClaude 4.6 Sonnetã§éãã«å€±æããããšããããŸãã倿Žããããã€ããåã«å°ãªããšã2ã€ã®ã¢ãã«ã§ãã¹ãã¹ã€ãŒããå®è¡ããŠãã ããã
ç£æ»ãµã€ã¯ã«ïŒãã¹ãé »åºŠ
ç£æ»ãµã€ã¯ã«ã¯å€æŽé »åºŠãšããã³ãããã©ãã£ãã¯ã«äŸåããŸãïŒCI/CDã§ã®å€æŽããšã®ãã¹ããé«ãã©ãã£ãã¯ããã³ããã®é±æ¬¡ç£æ»ãäœãã©ãã£ãã¯ã®ææ¬¡ç£æ»ã
é«ãã©ãã£ãã¯ããã³ããïŒ1æ¥1,000å以äžïŒïŒå€æŽã®ãã³ã«CI/CDååž°ãã¹ããå®è¡ãã倿ŽããªããŠã鱿¬¡ã®ã¹ã±ãžã¥ãŒã«ãããç£æ»ã远å ããŸããã¢ãã«ãããã€ããŒã®ã¢ããããŒãã¯èªåã®å€æŽãªãã«åäœãéãã«å€æŽããããšããããŸãã
äœãã©ãã£ãã¯ããã³ããïŒ1æ¥100åæªæºïŒïŒå€æŽã®ãã³ã«CI/CDååž°ãã¹ããå®è¡ããææ¬¡ç£æ»ã远å ããŸããææ¬¡ç£æ»ã§ã¯ããŽãŒã«ãã³ã»ãããçŸåšã®æåŸ åäœãåæ ããŠãããã確èªããŸãã
決å®ããŒãã«ïŒ1æ¥1,000å以äžâCI/CD+鱿¬¡ç£æ»ã100ã1,000åâCI/CD+ææ¬¡ç£æ»ã100åæªæºâCI/CDã®ã¿ïŒååæããšã®ãŽãŒã«ãã³ã»ããã¬ãã¥ãŒïŒã
ããã³ããååž°ãã¹ãã§ããããééã
â ãŽãŒã«ãã³äŸã®ã¿ãã¹ããã
Why it hurts: ãŽãŒã«ãã³äŸã¯å®éã®é害ãåŒãèµ·ãããšããžã±ãŒã¹ãã»ãšãã©ããªã¬ãŒããªã
Fix: ãã¹ãŠã®ãã¹ãã¹ã€ãŒãã«å¿ ã5件以äžã®ãšããžã±ãŒã¹ãš3件以äžã®æµå¯Ÿçå ¥åãå«ãã
â åæ Œçãããå€ãªã
Why it hurts: å®çŸ©ãããããããã³ã°æ¡ä»¶ããªãããããããªãååž°ããããã€ã§ãã
Fix: åæ ŒçãããŒã¹ã©ã€ã³ãã5%以äžäœäžããå Žåããããã€ãèªåçã«ãããã¯ãã
â æåãã¹ãã®ã¿
Why it hurts: æåãã¹ãã¯ç· ãåããã¬ãã·ã£ãŒäžã§ã¹ãããããã â ãŸãã«æãå¿ èŠãªæã«
Fix: PromptfooãŸãã¯Braintrustã§CI/CDã«ååž°ãã¹ããçµã¿èŸŒã¿ã倿Žããšã«èªåçã«å®è¡ãããããã«ãã
â åäžã¢ãã«ã§ã®ã¿ãã¹ã
Why it hurts: GPT-4oã§åæ Œããããã³ãããClaude 4.6 Sonnetã§å€±æããå¯èœæ§ããã â åäžã¢ãã«ãã¹ãã¯ã¯ãã¹ã¢ãã«ååž°ãèŠéã
Fix: å°ãªããšã2ã€ã®ã¢ãã«ã§ãã¹ãã¹ã€ãŒããå®è¡ïŒGPT-4oãšClaude 4.6 Sonnetãæäœéãšãã
éèŠãªãã€ã³ã
- ããã³ããååž°ã¯ç¡é³ã§ãïŒããã³ããã¯ãšã©ãŒãªãå®è¡ãããŸãããåºåå質ãäœäžããŠããŸãã
- ããã³ãããã¹ãã¹ã€ãŒãã«ã¯3ã€ã®ã³ã³ããŒãã³ãããããŸãïŒãŽãŒã«ãã³ã»ããïŒ10ã20ä»¶ã®ç¢ºèªæžã¿ã®è¯å¥œãªäŸïŒããšããžã±ãŒã¹ãæµå¯Ÿçå ¥åã
- CI/CDã§å€æŽã®ãã³ã«ååž°ãã¹ããå®è¡ããŸããåæ ŒçãããŒã¹ã©ã€ã³ãã5%以äžäœäžããå Žåã¯ãããã€ããããã¯ããŸãã
- PromptfooïŒç¡æããªãŒãã³ãœãŒã¹ïŒã¯ããŒã«ã«å¶åŸ¡ãæ±ããããŒã ã«æé©ã§ããBraintrustïŒæé¡0ã99ãã«ïŒã¯å ±åã®å¯èŠæ§ãå¿ èŠãªããŒã ã«æé©ã§ãã
- PromptQuorum ã䜿çšããŠãããã³ããã®å€æŽãè€æ°ã®ã¢ãã«ïŒGPT-4oãClaude 4.6 SonnetãGemini 2.5 ProïŒéã§äžè²«ããåäœãããããšã確èªããŸãã
ãããã質å
ããã³ããååž°ãã¹ããšã¯äœã§ããïŒ
ããã³ããååž°ãã¹ããšã¯ã倿Žã®ãã³ã«åºå®ãã¹ãã±ãŒã¹ã®ã»ãããå®è¡ããå質äœäžãæ€åºããææ³ã§ããæåŸ åºåãäºåã«å®çŸ©ãã倿ŽåŸã«èªåçã«æ€èšŒããŸãã
ãã¹ãã»ããã«ã¯äœä»¶å¿ èŠã§ããïŒ
æå°éïŒ10ã20ä»¶ã®ãŽãŒã«ãã³äŸã5ã10ä»¶ã®ãšããžã±ãŒã¹ã3ã5ä»¶ã®æµå¯Ÿçå ¥åã20ä»¶ããå§ããæ°ããªé害ã¢ãŒããçºèŠããããã³ã«æ¡åŒµããŸãã
PromptfooãšBraintrustã®éãã¯äœã§ããïŒ
Promptfooã¯ãªãŒãã³ãœãŒã¹ã§CLIããç¡æã§äœ¿çšã§ããŸããBraintrustã¯ã¯ã©ãŠããã©ãããã©ãŒã ïŒæé¡0ã99ãã«ïŒã§ã³ã©ãã¬ãŒãã£ãUIãæäŸããŸããããŒã«ã«å¶åŸ¡ã«ã¯Promptfooãå ±æã®å¯èŠæ§ã«ã¯Braintrustã䜿çšããŸãã
ã©ã®ãããã®é »åºŠã§ç£æ»ãã¹ãã§ããïŒ
倿Žã®ãã³ã«CI/CDã§ãã¹ãã1æ¥1,000å以äžã®ããã³ããã«ã¯é±æ¬¡ç£æ»ã100åæªæºã«ã¯ææ¬¡ç£æ»ãåæ Œçã5%以äžäœäžããå Žåã¯ãããã€ããããã¯ããŸãã
ãŽãŒã«ãã³ãã¹ãã»ãããšã¯äœã§ããïŒ
ãŽãŒã«ãã³ãã¹ãã»ããã¯ãæåŸ åºåãæåã§ç¢ºèªãããåºå®ã®å ¥å/åºåãã¢ã®ã³ã¬ã¯ã·ã§ã³ã§ããå®éã®æ¬çªãã©ãã£ãã¯ãã10ã20ãã¢ããå§ããŠãã ããã
ååž°ãé倧ãã©ããã¯ã©ããã£ãŠå€æããŸããïŒ
åæ Œçã5%以äžäœäžããå Žåã以åã«åæ Œããæµå¯Ÿçãã¹ãã倱æããå ŽåããŸãã¯åºå圢åŒã®æºæ ã10ä»¶äž2件以äžã§äœäžããå Žåã«é倧ã§ãã
PromptQuorumãååž°ãã¹ãã«äœ¿çšã§ããŸããïŒ
ã¯ããPromptQuorumã¯è€æ°ã®ã¢ãã«ã«ããã³ãããåæã«éä¿¡ãããã«ãã¢ãã«ååž°ãã¹ãã«é©ããŠããŸããGPT-4oãClaude 4.6 SonnetãGemini 2.5 Proã«å¯ŸããŠäžŠè¡ããŠãã¹ãã§ããŸãã