ããã³ããå質ãšã¯?
ããã®ã»ã¯ã·ã§ã³ã§ã¯âŠãããã³ããå質ãšã¯ãæ§ã ãªå ¥åãæ¡ä»¶ãã¢ãã«ç°å¢äžã§ãããã³ãããæå³ããåºåã確å®ã«çæããèœåã§ããããã¯åãªããåããã§ã¯ãªããäºæž¬å¯èœã§ã枬å®å¯èœã§ãåçŸå¯èœãªçµæãããããããšã§ãã ã»ãšãã©ã®ããŒã ã¯ããã³ããã 2ïœ3 åã®äŸã§ãã¹ãããŠãããã§è¯ãããããšå€å®ããŸããããã¯å€±æãã¿ãŒã³ã® 90% ãèŠèœãšããŠãããæ¬çªç°å¢ã§äºæããªãåäœãå質äœäžããããããŸãã ããã³ããå質ãã¬ãŒã ã¯ãŒã¯ã¯ããã®ãªã¹ã¯ãå®éçã«æž¬å®ããçããšã®æ¹åã远跡ããè€æ°ã¢ãã«éã§ã®äºææ§ãæ€èšŒããããã®æ§é ãæäŸããŸãã
ããã³ããå質ã®3ã€ã®èŠçŽ ã¯?
ããã®ã»ã¯ã·ã§ã³ã§ã¯âŠãããã³ããå質ã«ã¯ 3 ã€ã®æž¬å®å¯èœãªåŽé¢ããããŸã: 粟床 â ããã³ããåºåãæå³ããçµæãšäžèŽããå²åã§ããäŸãã°ãã顧客ã®åé¡ãåé¡ãããããã³ãã㯠95% ã®ç²ŸåºŠã§æ£ããåé¡ããå¿ èŠããããŸãã äžè²«æ§ â åãå ¥åã«å¯ŸããŠãããã³ãããåãç¯å²ã®åºåãè¿ãä¿¡é Œæ§ã§ããäŸãã°ããµããŒããšãŒãžã§ã³ãããã³ããã¯åãã«ã¹ã¿ããŒãµããŒã質åã«å¯ŸããŠãããŒã³ãé·ããæ§é ãé¡äŒŒããåçãæäŸããŸãã æç€ºéµå®ç â ããã³ããã§æå®ããããã¹ãŠã®å¶çŽãšåœ¢åŒèŠä»¶ãéµå®ããåºåã®å²åã§ããäŸãã°ããJSON 圢åŒãæå€§ 500 æåãå¿ ã key ãå«ãããããã³ãã㯠100% ãããã®ã«ãŒã«ãå®ãå¿ èŠããããŸãã 3 ã€ã®åŽé¢ãã¹ãŠã枬å®ããããšã§ãããã³ããã®å šäœçãªä¿¡é Œæ§ã®å®å šãªå³ãåŸãããŸãã
æå確èªã倱æããçç±
ããã®ã»ã¯ã·ã§ã³ã§ã¯âŠãå€ãã®ããŒã ãæåã¹ããããã§ãã¯ïŒã 5 åã®å ¥åã§è©ŠããŠã¿ããïŒã«äŸåããŠãããããã«ã¯éå€§ãªæ¬ é¥ããããŸã: 代衚æ§ã®äžè¶³ â æåã§éžãã 5 åã®äŸã¯ç¢ºèªãã€ã¢ã¹ã®åœ±é¿ãåãããšããžã±ãŒã¹ã察æçã·ããªãªãã»ãŒçµ¶å¯Ÿã«å«ã¿ãŸããã ã¹ã±ãŒã«æ§ããªã â 1000 ãªã¯ãšã¹ã/æ¥ãåŠçããæ¬çªã·ã¹ãã ã§ã5 åã®äŸã§ãã¹ãããããšã¯ãé£è¡æ©ãé£ã°ãåã«ã¿ã€ã€ã 5 åã ãæ€æ»ãããããªãã®ã§ãã åçŸæ§ããªã â ãããã¯ããèŠã ãããšãã䞻芳çãªå€å®ã¯ããšã³ãžãã¢éã§ç°ãªããåãããã³ããçã§ãæéãšãšãã«å€ãããŸãã é ãããã¿ãŒã³ãèŠèœãšã â 倱æã¯éåžžãæåŸ ããŠããªãã³ãŒããŒã±ãŒã¹ã§çºçããŸããæåãã¹ãã§ã¯ããããçºèŠããããšã¯ãã£ãã«ãããŸããã æ§é åããããã¹ãã»ããã¯ããããã¹ãŠã®åé¡ã解決ããŸãã
ããã³ãããã¹ãã»ããã®æ§ç¯æ¹æ³
ããã®ã»ã¯ã·ã§ã³ã§ã¯âŠãæå¹ãªãã¹ãã»ãã㯠20 ã±ãŒã¹ïŒæå°éïŒã§æ§æãããŸã: 10 æ£åžžç³» â ããã³ãããæåãããšæåŸ ããã·ããªãªã§ããäŸãã°ãã顧客åé¡åé¡ãããã³ããã®å Žåãå®éã®ãµããŒããªã¯ãšã¹ãã 10 åå«ããŸãã 5 ãšããžã±ãŒã¹ â æ£åžžã ãäºæããªãã·ããªãªã§ããéåžžã«é·ãå ¥åãæ°å€å¢çå€ãç¹æ®æåãè€æ°èšèªã®æ··åšãå«ããŸãã 5 察æçå ¥å â ããã³ããã倱æããããäºæããªãåäœãããæå³çãªè©Šã¿ã§ããççŸããæç€ºãæå®³ãªè³ªåãããã³ããã€ã³ãžã§ã¯ã·ã§ã³æ»æãã·ãã¥ã¬ãŒãããŸãã ãã¹ãã»ããã®æ§ç¯: 1. å®éã®ããŒã¿ããéå§ â ãŠãŒã¶ãŒãã£ãŒãããã¯ããµããŒããã±ããããã°ãã 50ïœ100 åã®å®äŸãåéããŸã 2. 倱æãç¹å® â ã©ã®ã±ãŒã¹ã§ããã³ããã倱æãŸãã¯äœã¹ã³ã¢ãåŸãããèšé²ããŸã 3. ãã¿ãŒã³ãåæ â 倱æã«å ±éãããã¿ãŒã³ãèŠã€ãããã¹ãã»ããã«ãããã远å ããŸã 4. 宿çã«æŽæ° â æ1åãæ°ãã倱æã±ãŒã¹ã远å ãããã¹ãã»ãããé²åãããŸã ããã®ã»ã¯ã·ã§ã³ã®éèŠãªãã€ã³ãããã¹ãã»ããã¯éçã§ã¯ãªããããã³ãããåŠçããå®éã®ããŒã¿ãšäžŠè¡ããŠæé·ããå¿ èŠããããŸãã
ããã³ããåºåã®ã¹ã³ã¢ãªã³ã°æ¹æ³
ããã®ã»ã¯ã·ã§ã³ã§ã¯âŠãã¹ã³ã¢ãªã³ã°æ¹æ³ã¯ 2 ã€ã®äž»ãªã¢ãããŒãããããŸã: ãã€ã㪠Pass/Fail â æãåçŽã§æãé©åãªæ¹æ³ã§ããåºåãåºæºãæºãããŠãããïŒPassïŒãããªããïŒFailïŒãå€å®ããŸããäŸ: - "顧客åé¡åé¡" ããã³ãã: åé¡ãæ£ç¢ºãªã Passãéã£ãã Fail - "ã¡ãŒã«çæ" ããã³ãã: åºåã JSON 圢åŒã§ããã¹ãŠã®å¿ é ãã£ãŒã«ããå«ããªã Pass ãã€ããªæ¹åŒã®å©ç¹: - 誰ãè©äŸ¡ããŠãã¹ã³ã¢ãåãïŒå®¢èгçïŒ - éèšããããïŒåèšãã¹æ° / ãã¹ãç·æ°ïŒ - ãã¹ãèªååã«æé© Likert ã¹ã±ãŒã«ïŒ1ïœ5 ã¬ãŒãã£ã³ã°ïŒ â æ§é ååºåãããåµé çãªã¿ã¹ã¯ïŒèšäºäœæããã¶ã€ã³èª¬æïŒã«äœ¿çšããŸãã5=å®ç§ã4=ããããªç·šéã§OKã3=å€§å¹ ãªç·šéãå¿ èŠã2=䜿çšäžå¯ã1=å®å šã«ééãã æ³šæ: Likert ã¹ã±ãŒã«ã¯äž»èгçã§ãLLM-as-Judge ã䜿ãå Žåã人éã®è©äŸ¡è éã§äžè²«æ§ããããŸãããå¯èœãªéããã€ããªã䜿çšããŠãã ããã LLM-as-Judge ã¹ã³ã¢ãªã³ã° â LLMïŒClaude ãªã©ïŒã«åºåãè©äŸ¡ãããŸããäŸ: ``` ããã³ãã: 以äžã®é¡§å®¢åé¡ãæ£ç¢ºãã©ãããè©äŸ¡ããŠãã ãããåºæºã¯ criteriaãPass ãŸã㯠Fail ã§çããŠãã ããã å ¥å: "ç§ã®è«æ±æžãééã£ãŠããŸã" ããã³ããã®åºå: "Billing Issue" ``` LLM-as-Judge ã®å©ç¹ãšéç: - â æ°çŸã±ãŒã¹ãç§åäœã§åŠç - â ãã€ããªã¹ã³ã¢ã§èªååå¯èœ - â ïž LLM èªäœã®ãã€ã¢ã¹ãå°å ¥ããå¯èœæ§ - â ïž æ1åã人éãè€æ°ã±ãŒã¹ã§ã¯ãã¹ãã§ã㯠ã¹ã³ã¢ãªã³ã°åºæºã®å®è£ : ``` Case #1 Input: "payment failed" Expected: Billing Issue Prompt output: Billing Issue Score: PASS Justification: åé¡ãå®ç§ã«äžèŽ Case #2 Input: "how do i reset password" Expected: Account Access Prompt output: Technical Issue Score: FAIL Justification: ããå ·äœçãªã«ããŽãªãŒãéžã¶ã¹ã ```
ããã³ããå質ã¯ã¢ãã«éã§ç°ãªãã?
ããã®ã»ã¯ã·ã§ã³ã§ã¯âŠãã¯ããåãããã³ããã§ããã¢ãã«éã§ã¹ã³ã¢ãå€§å¹ ã«ç°ãªããŸãã å®äŸ: "顧客ãµããŒãè¿çããŸãšãã" ããã³ãã - Claude Opus 4.7: 92% ãã¹ç - GPT-4o: 78% ãã¹ç - Llama 3.2 70B: 65% ãã¹ç ãªãç°ãªãã: - èšç·ŽããŒã¿ãç°ãªã â åã¢ãã«ã¯ç°ãªãããŒã¿ã»ããã§èšç·ŽãããŠãããç¬èªã®ãã€ã¢ã¹ãšåŒ·åºŠãæã€ - ããŒã¯ã³åãç°ãªã â èšèªåŠçæ¹æ³ãç°ãªããåãããã³ããæãç°ãªãæ¹æ³ã§è§£æããã - ã¢ã©ã€ã¡ã³ãæ¹æ³ãç°ãªã â å®å šæ§ãšã¬ã€ãã³ã¹ã®æ¹æ³ãç°ãªããããã³ãããžã®å¿çæ¹æ³ã«åœ±é¿ãã å®åçãªåœ±é¿: 1. ã¢ãã«åºæã®ãã¹ãã»ãã â æ¬çªã§è€æ°ã¢ãã«ã䜿ãå Žåãåã¢ãã«çšã«å¥ã ã®ãã¹ãã»ããããŸãã¯ã¢ãã«éã§å ±æããæå°ã³ã¢ã»ãããäœæããŸã 2. ã¢ãã«åºæã®éŸå€ â Claude ã« 90% ãã¹çãæåŸ ãããªããLlama ã«ã¯ 75% ã§ã蚱容å¯èœãããããŸãã 3. ä¿¡é Œæ§ã©ã³ãã³ã° â ã¢ãã«ã®ã¹ã³ã¢ã«åºã¥ããŠãæ¬çªç°å¢ã§ã®äœ¿çšé »åºŠãã©ã³ã¯ä»ãããŸãïŒé«ã¹ã³ã¢ = ããå€ã䜿çšïŒ 4. 段éçãªå°å ¥ â æ°ã¢ãã«ã¯å°èŠæš¡ã§ãã¹ãããã¹ã³ã¢ãååã«é«ãŸããŸã§æ¬çªå±éãé å»¶ãããŸã ããã®ã»ã¯ã·ã§ã³ã®éèŠãªãã€ã³ããåãããã³ããããã¹ãŠã®ã¢ãã«ã§åãããã«ããã©ãŒãã³ã¹ãããšã¯æåŸ ããªãã§ãã ãããåã¢ãã«ã®ã¹ã³ã¢ã枬å®ããå°å ¥æŠç¥ã調æŽããŠãã ããã
ããã³ããå質è©äŸ¡ã®å§ãæ¹
ããã®ã»ã¯ã·ã§ã³ã§ã¯âŠãå®è£ ã®ã¹ããããã€ã¹ãããã¬ã€ã: Week 1: ãã¬ãŒã ã¯ãŒã¯ãå®çŸ© - ããŒã å ã§ 15 åã®ã¹ã¯ã©ããããŒãã£ã³ã°ãéããŸã - 粟床ãäžè²«æ§ãæç€ºéµå®çã® 3 åŽé¢ãå®çŸ©ããŸã - ãã€ã㪠Pass/Fail ã¹ã³ã¢ãªã³ã°ãéžæããŸãïŒæå㯠Likert ã¹ã±ãŒã«ãé¿ããïŒ - äŸ: "顧客åé¡ããã³ãã" â ç²ŸåºŠãšæç€ºéµå®çã«çŠç¹ãåœãŠãŸã Week 2: ãã¹ãã»ãããæ§ç¯ - å®éã®ãŠãŒã¶ãŒããŒã¿ãã 50ïœ100 ã±ãŒã¹ãåéããŸãïŒãµããŒããã±ããããã°ïŒ - 20 ã±ãŒã¹ïŒ10 æ£åžžç³»ã5 ãšããžã5 察æçïŒãéžæããŸã - Google Sheets ã§èšé²ããŸã: - Column A: å ¥å - Column B: æåŸ ãããåºå - Column C: å®éã®ããã³ããåºå - Column D: Pass/Fail - Column E: çç± Week 3: ãã¹ããå®è¡ - ããã³ããã«å¯Ÿã㊠20 ã±ãŒã¹ãå®è¡ããŸã - åçµæãèšé²ããã¹ã³ã¢ãèšç®ããŸãïŒåèš Pass / 20ïŒ - 倱æãã¿ãŒã³ãåæããŸã Week 4: çµæãæ¹åããŠå埩 - ãã¹ãã«åºã¥ããŠããã³ãããæ¹åããŸã - æ¹åçã§åããã¹ãã»ãããåå®è¡ããŸã - ã¹ã³ã¢ã®æ¹åã远跡ããŸã é·æçãªä¿å®ïŒæ¯æïŒ - æ¬çªç°å¢ã®å€±æã±ãŒã¹ãæ°ããå ¥åãšã㊠5ïœ10 å远å - ãã¹ãã»ããã 30 ã±ãŒã¹ã«æ¡åŒµ - è€æ°ã¢ãã«ã§ãã¹ããå®è¡ - ã¹ã³ã¢æšç§»ã°ã©ããäœæ ããŒã«: - Google Sheets ïŒã·ã³ãã«ãããŒã å ±æå¯èœïŒ - Notion ïŒããæŽçãããã€ã³ã¿ãŒãã§ãŒã¹ïŒ - Humanloop ïŒå°éçãªè©äŸ¡ãã©ãããã©ãŒã ïŒ - Python ã¹ã¯ãªãã ïŒAPI çµç±ã§èªåå®è¡ïŒ
ããããããã³ããè©äŸ¡ã®èª€ã
â åºå®çãªãã¹ãã»ãã
Why it hurts: "äœæãããçµãã" ãšãããã¹ãã»ãããå®éã®ãŠãŒã¶ãŒããŒã¿ã¯é²åããŠããããã¹ãã»ãããé²åããå¿ èŠããããŸãã
Fix: æ¯æãæ¬çªç°å¢ã®å€±æã±ãŒã¹ 5ïœ10 åããã¹ãã»ããã«è¿œå ããŸããããã«ãããããã³ãããå®éã®ã·ããªãªã«å¯Ÿå¿ãç¶ããããšãä¿èšŒãããŸãã
â LLM ã¹ã³ã¢ãç¡æ¡ä»¶ã«ä¿¡é Œ
Why it hurts: LLM-as-Judge ã¯äŸ¿å©ã§ãããç¬èªã®ãã€ã¢ã¹ãå°å ¥ããŸããäŸãã°ãç¹å®ã®ã¹ã¿ã€ã«ã奜ããããããŸããã
Fix: æ1åãè€æ°ã®å®ã±ãŒã¹ïŒ5ïœ10 åïŒã人éãæ€èšŒããLLM ã¹ã³ã¢ãšæ¯èŒããŸããä¹é¢ãããã°ãLLM ã¹ã³ã¢ãªã³ã°åºæºã調æŽããŸãã
â ãã¹ãã»ãããå°ãããã
Why it hurts: 3ïœ5 ã±ãŒã¹ã§ãã¹ãããããšã¯ãçµ±èšçã«ç¡æå³ã§ããæ¬çªã§ã®ããã©ãŒãã³ã¹ãäºæž¬ããŸããã
Fix: æå°é 20 ã±ãŒã¹ïŒ10 æ£åžžç³»ã5 ãšããžã5 察æçïŒããå§ããŸããæ¬çªç°å¢ã§ã¯ 100ïœ500 ã±ãŒã¹ãç®æããŸãã
â è€æ°ã®ã¡ããªã¯ã¹ã§æ°ãæ£ãã
Why it hurts: 粟床ãé å»¶ãããŒã¯ã³äœ¿çšéãäžè²«æ§âŠâŠãã¹ãŠã远跡ãããšãä¿¡å·ã倱ãããŸãã
Fix: è€æ°ã®ã¡ããªã¯ã¹ãèšé²ããŸãããåäžã®ãå šäœçãªãã¹çããå ±åããŸãã詳现ãªã¡ããªã¯ã¹ã¯ãããã°çšã§ãã
â ã¢ãã«éã®ã¹ã³ã¢ãçŽæ¥æ¯èŒ
Why it hurts: Claude ã 95% ã§ããLlama ã 75% ãªã "倱æ" ãšå€å®ãããã¢ãã«ã®åŒ·åºŠãç°ãªããŸãã
Fix: ã¢ãã«ããšã«æåŸ å€ãèšå®ããŸããClaude ã«ã¯ 90% 以äžãLlama ã«ã¯ 75% 以äžãªã©ã§ãã
ããã³ããè©äŸ¡ã«åœ±é¿ããå°åèŠå¶
ããã®ã»ã¯ã·ã§ã³ã§ã¯âŠãããã³ããè©äŸ¡ãã¬ãŒã ã¯ãŒã¯ã¯ãããŒã«ã«ããŒã¿èŠå¶ã«ãã£ãŠå¶éãããå ŽåããããŸããäž»ãªå°åã説æããŸãã æ¥æ¬ïŒMETI ã¬ã€ãã©ã€ã³ïŒ METIïŒçµæžç£æ¥çïŒã® AI ã¬ããã³ã¹ã¬ã€ãã©ã€ã³ 2024 ã§ã¯ãæ¥æ¬äŒæ¥ã¯ AI ã·ã¹ãã ã®éææ§ãšèª¬æå¯èœæ§ã確ä¿ããå¿ èŠããããŸããããã¯: - ããã³ããè©äŸ¡çµæãææžåãã6ã¶æããšã«æ€èšŒ - LLM è©äŸ¡ã«ã¯äººéã«ããç£æ»ãã°ã远å - ããã³ããçã®å±¥æŽã远跡å¯èœã«ä¿ã€ æ¥æ¬ã§ã®å®è£ : Google Sheets ã«è©äŸ¡ãã°ãèšé²ããMETI ç£æ»æã«æç€ºã§ããããã«ããŸãã æ±ã¢ãžã¢ã»ã¢ãžã¢å€ªå¹³æŽ éåœãã·ã³ã¬ããŒã«ããªãŒã¹ãã©ãªã¢ãªã©ã®åœã : - ããŒã¿åŠçã®ç£æ»èšŒè·¡ã®ä¿æãèŠæ± - LLM ã¹ã³ã¢ãªã³ã°åºæºã®å®æã¬ãã¥ãŒïŒæäœ 6ã¶æããšïŒ - ãŠãŒã¶ãŒããŒã¿ãå«ãæ¬çªãã¹ãã»ããã®æå·å æ±ã¢ãžã¢å€ªå¹³æŽã§ã®å®è£ : ã¯ã©ãŠãã¹ãã¬ãŒãžã§è©äŸ¡ããŒã¿ãæå·åä¿åããã¢ã¯ã»ã¹ãã°ãèšé²ããŸãã ã°ããŒãã« å€ãã®åœã§ã¯ç¹å®ã®èŠå¶ããªããããæ¥çæšæºã«åŸããŸã: - AI éææ§ã¬ããŒãã幎1åçºè¡ïŒã©ã®ããã«è©äŸ¡ããããçµæã®äœ¿ç𿹿³ïŒ - ããã³ããè©äŸ¡ãã§ãã¯ãªã¹ããåŸæ¥å¡åãã«å ¬é - 誀åé¡ã倱æã®å ±åã¡ã«ããºã ãæäŸ ããã®ã»ã¯ã·ã§ã³ã®éèŠãªãã€ã³ããèŠå¶ç°å¢ã¯æ¥éã«å€åããŠããŸããå°åããšã®ã¬ã€ãã©ã€ã³ã宿çã«ç¢ºèªããè©äŸ¡ãã¬ãŒã ã¯ãŒã¯ã調æŽããŠãã ããã
é¢é£èšäº
- ããã³ããã©ã€ãã©ãªã®æ§ç¯æ¹æ³ â ããŒã éã§ãã¹ãæžã¿ããã³ãããå ±æããæ¹æ³ãè©äŸ¡ãã¬ãŒã ã¯ãŒã¯ãšãã¹ãã»ãããç管çããŸãã
- LLM ã®å¹»èŠãæžããæ¹æ³ â å¹»èŠã¯è©äŸ¡ãã¬ãŒã ã¯ãŒã¯ã®äžè¬çãªå€±æã«ããŽãªãŒã§ãããã®ã¬ã€ãã¯å¹»èŠãæ€åºããŠè»œæžããæ¹æ³ã説æããŸãã
- ããã³ããæé©åãã¬ãŒã ã¯ãŒã¯ â è©äŸ¡ãã¬ãŒã ã¯ãŒã¯ã䜿çšããŠããã³ãããæ®µéçã«æ¹åããæ¹æ³ã
ãããã質å
ããã³ããå質ãšãã¹ãå質ã®éãã¯?
ããã³ããå質ã¯åºåã®ç²ŸåºŠã»äžè²«æ§ã枬ããŸãããã¹ãå質ã¯ãã¹ãã»ããèªäœã®æå¹æ§ïŒã«ãã¬ããžã代衚æ§ïŒã§ããè¯ãããã³ããã¯æªããã¹ãã§ãé«ã¹ã³ã¢ãåŸãããæªãããã³ããã¯è¯ããã¹ãã§äœã¹ã³ã¢ãåŸãããŸãã
LLM-as-Judge ãåžžã«æ£ç¢ºãªè©äŸ¡ãæäŸããã?
ããããLLM-as-Judge ã¯äžè²«æ§ããããŸããããã€ã¢ã¹ãå°å ¥ããå¯èœæ§ããããŸããååž°ãã¹ãïŒPass/Fail ã®çµ±èšçããªãã远跡ïŒã䜿çšããæ1åã¯äººéãè€æ°ã®ãµã³ãã«ãæ€èšŒã㊠LLM ã®è©äŸ¡ãšæ¯èŒããããšããå§ãããŸãã
ãã¹ãã»ããã®ãµã€ãºã¯ã©ã®ããããããã?
æå°é㯠20 ã±ãŒã¹ (10 æ£åžžç³», 5 ãšããž, 5 察æç) ã§ããæ¬çªç°å¢ã§ã¯ 100ïœ500 ã±ãŒã¹ãäžè¬çã§ãããã倧ããã»ããã¯ããå€ãã®å€±æã¢ãŒã ããã£ããããŸãããã¡ã³ããã³ã¹ã³ã¹ããå¢å ããŸãã
ã¹ã³ã¢ãæ°ã¢ãã«éã§å€§å¹ ã«ç°ãªãã®ã¯ãªãã?
åã¢ãã«ã®åºç€èšç·ŽããŒã¿ãã¢ã©ã€ã¡ã³ãæ¹æ³ãããŒã¯ã³åãç°ãªããããåãããã³ããã«å¯ŸããŠç°ãªãå¿çãããŸããããã¯ã¢ãã«åºæã®ãã¹ãã»ããããŸãã¯ã¢ãã«åºæã®ã¹ã³ã¢ãªã³ã°åºæºãå¿ èŠã ãããšãæå³ããŸãã
è©äŸ¡ãã¬ãŒã ã¯ãŒã¯ãã©ã®ãããã®é »åºŠã§æŽæ°ããã?
åææ®µéã§ã¯æ¯é±ã¬ãã¥ãŒããŸããå®å®ååŸã¯æ1åã®å®æã¬ãã¥ãŒããå§ãããŸããæ°ãããŠãŒã¹ã±ãŒã¹ããŠãŒã¶ãŒãã£ãŒãããã¯ããŸãã¯ã¢ãã«ã®æŽæ°ã§å€æŽãå¿ èŠã«ãªã£ãå Žåã¯è¿œå ã§ã¬ãã¥ãŒããŠãã ããã
è€æ°ã®è©äŸ¡ææšãçµã¿åãããã, åäžã®ã¡ããªã¯ã¹ã䜿çšããã?
è€æ°ã®ææšïŒç²ŸåºŠãäžè²«æ§ãé å»¶ïŒã远跡ããŸãããåäžã®ã¡ããªã¯ã¹ïŒäŸ: å šäœçãªãã¹çïŒãå ±åããŸããè€æ°ã®ææšã¯ãããã°ã«åœ¹ç«ã¡ãŸãããåäžã¡ããªã¯ã¹ã¯ã¹ããŒã¯ãã«ããŒã®æææ±ºå®ãæç¢ºã«ããŸãã
ç°ãªãããã³ããçãå¹ççã«æ¯èŒããã«ã¯ã©ãããã?
åããã¹ãã»ããããã¹ãŠã®çã§å®è¡ããçããšã®ãã¹çã䞊è¡è¿œè·¡ããŸããA/B ãã¹ãã¯åäžã®æ¹åãæ€èšŒãããšãã«æå¹ã§ããå®å šãªãã¹ãã»ããã¯çããšã®å šäœçãªããã©ãŒãã³ã¹ãæç¢ºã«ç€ºããŸãã
ããã³ããè©äŸ¡ã®çµæãæŽçã»ä¿åããã«ã¯?
Google SheetsãNotionããŸãã¯å°çšã®è©äŸ¡ããŒã«ïŒHumanloop ãªã©ïŒã䜿çšããŠããã¹ãã±ãŒã¹ãã¹ã³ã¢ãã¿ã€ã ã¹ã¿ã³ããã¢ãã«çãèšé²ããŸããGit ã§çµæãç管çããããã³ãã倿Žã®åœ±é¿ããã¬ãŒã¹ã§ããããã«ããŸãã
è©äŸ¡ãã¬ãŒã ã¯ãŒã¯ãè€æ°ã®ããŒã ã§å ±æããã«ã¯?
ãã¹ãã»ãããã¹ã³ã¢ãªã³ã°åºæºãçµæãããŒã Wiki ãŸã㯠Git ãªããžããªã«ä¿åããŸããããã«ããäžè²«æ§ãä¿èšŒãããæ°ããããŒã ã¡ã³ããŒãããã«æ¡çšã§ããŸããæ1åã®åæããŒãã£ã³ã°ã§ãã¹ããã©ã¯ãã£ã¹ãå ±æããŠãã ããã
ããã³ããè©äŸ¡ã«ã©ã®ãããã®æéããããã?
20ã±ãŒã¹ã®ãã¹ãã»ããå®è¡ã«ã¯çŽ 30 åïŒLLM API åŒã³åºããå«ãïŒããããŸããè€æ°ã¢ãã«ãè€æ°çã§ã¯ 1ïœ2 æéã®äººéã®æéãå¿ èŠã§ããèªååïŒPython ã¹ã¯ãªãããAPIïŒã§æéã 80% åæžã§ããŸãã
åèè³æ
- METI AI ã¬ããã³ã¹åçã¬ã€ãã©ã€ã³ïŒæ¥æ¬çµæžç£æ¥çïŒ â æ¥æ¬äŒæ¥ã®ããã® AI ã·ã¹ãã ã®éææ§ãšèª¬æå¯èœæ§ã¬ã€ãã©ã€ã³
- Prompt Evaluation Best PracticesïŒAnthropicïŒ â å€§èŠæš¡èšèªã¢ãã«ã®è©äŸ¡ãšæé©åã«é¢ããããã¥ã¡ã³ã
- LLM Evaluation HandbookïŒHugging FaceïŒ â ãªãŒãã³ãœãŒã¹ LLM ã®è©äŸ¡ãã¬ãŒã ã¯ãŒã¯ãšåºæº
- Test-Driven Development for LLM PromptsïŒGitHubïŒ â ããã³ããè©äŸ¡ã®ãã¹ããã©ã¯ãã£ã¹ãšäŸ
- Prompt Engineering GuideïŒOpenAIïŒ â OpenAI ã®ããã³ãããšã³ãžãã¢ãªã³ã°ãšè©äŸ¡ã¬ã€ãã©ã€ã³