LoRA Fine-Tuning for Local LLMs in 2026: An Unsloth Tutorial with Llama 3.3 on 8 GB VRAM

13 min read · By Hans Kuepper · Founder of PromptQuorum, multi-model AI dispatch tool

Fine-tuning with LoRA (Low-Rank Adaptation) adapts a pre-trained model to a specific domain. Instead of retraining the whole model, it adds small adapter layers (about 0.4% of the total weights). With Unsloth, fine-tuning Llama 3.3 8B is feasible on consumer hardware with 8 GB VRAM in 1-2 hours (4x faster than standard training). As of April 2026, LoRA and QLoRA (4-bit quantized LoRA) have production-level support in Ollama, LM Studio, and vLLM.

Key Takeaways

  • LoRA adds small trainable layers to a pre-trained model. Only a small fraction of the weights is trainable (about 0.4-5%, depending on rank and which layers are adapted), which sharply reduces VRAM and training time.
  • Fine-tuning requirements: 500-1,000 high-quality examples, 8-16 GB VRAM, 1-4 hours of training.
  • Best tools: Unsloth (fastest), Hugging Face TRL, Axolotl (most flexible).
  • LoRA rank (r): lower values (r=8) are smaller and faster; higher values (r=64) are more expressive. Default: r=16-32.
  • As of April 2026, LoRA has broad, production-level support across inference engines.

How Does LoRA Work?

LoRA adds small "adapter" matrices alongside the original model weights. During training only the adapters are updated; the original weights stay frozen.

Example: a 13B model has 13 billion weights. LoRA adds only about 50 million trainable parameters (roughly 0.4% of the original). Training can be up to 100x faster.

At inference, the adapter's output (the input passed through two small low-rank matrices) is added to the main model's output. The slowdown is minor (about 5%).

Result: a domain-specific model can be trained with as little as 8 GB VRAM (instead of 26 GB).

Figure: LoRA adds small trainable adapter matrices alongside the frozen base-model weights. Only 0.4% of a 13B Llama model's parameters are updated during training, cutting VRAM and training time by up to 100x.
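
To make the adapter idea concrete, here is a minimal pure-Python sketch of a single LoRA-adapted linear layer. It assumes the usual formulation from the LoRA paper, y = Wx + (alpha / r) * B(Ax), with B initialized to zero. The helper names (`matvec`, `lora_forward`, `lora_param_count`), the tiny matrices, and the "trained" values are illustrative and not taken from any library.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B train."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters added to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


# Frozen 2x3 base weight with a rank-1 adapter: A is r x d_in, B is d_out x r.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]
B_zero = [[0.0], [0.0]]      # zero init: the adapter contributes nothing at first
B_trained = [[1.0], [-1.0]]  # hypothetical values after some training
x = [1.0, 2.0, 3.0]

y_start = lora_forward(W, A, B_zero, x, alpha=2, r=1)
y_tuned = lora_forward(W, A, B_trained, x, alpha=2, r=1)

full = 4096 * 4096
added = lora_param_count(4096, 4096, r=16)
print(y_start, y_tuned, f"{added / full:.2%}")
```

For a 4096 x 4096 projection at r=16, the adapter adds 131,072 parameters, under 1% of the roughly 16.8 million in the frozen matrix. The exact share for a whole model depends on the rank and on how many layers receive adapters.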

What Is QLoRA (4-Bit Quantized LoRA)?

QLoRA combines LoRA with 4-bit quantization: the base model is loaded in 4-bit, and only the adapters are trained in 16-bit. This cuts VRAM requirements roughly in half.

As of April 2026, QLoRA is the default approach on consumer hardware. In the Unsloth setup code later in this article, the `load_in_4bit=True` flag enables QLoRA automatically. The roughly 2% quality gap versus full LoRA is negligible for most domain-adaptation tasks.

When to use LoRA (16-bit) instead of QLoRA (4-bit):

  • Tasks that need maximum precision (medical, legal contract analysis)
  • When 16 GB or more of VRAM is available
  • Fine-tuning small models of 3B parameters or fewer (QLoRA's savings are minimal at that size)

| Method | 7B Model VRAM | 13B Model VRAM | Quality vs Full |
|---|---|---|---|
| Full fine-tuning | 28 GB | 52 GB | 100% (baseline) |
| LoRA (16-bit base) | 16 GB | 30 GB | ~97% |
| QLoRA (4-bit base) | 8 GB | 14 GB | ~95% |
VRAM requirements by fine-tuning method and model size. Full fine-tuning needs 28 GB or more for a 7B model; QLoRA brings that down to 8 GB. For enterprise users, QLoRA makes 70B fine-tuning possible on dual RTX 4090s (2×24 GB; the job needs about 40 GB).
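
A rough way to sanity-check these figures is to compute the memory taken by the weights alone: parameters × bits ÷ 8. The sketch below does only that. The function name is illustrative, and real training also needs room for activations, gradients, optimizer state, and framework overhead, which this estimate deliberately ignores.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
    fp32 = weight_memory_gb(n_params, 32)
    fp16 = weight_memory_gb(n_params, 16)
    int4 = weight_memory_gb(n_params, 4)
    print(f"{name}: 32-bit {fp32:.1f} GB, 16-bit {fp16:.1f} GB, 4-bit {int4:.1f} GB")
```

Under these assumptions a 7B model's weights take 14 GB at 16-bit and 3.5 GB at 4-bit, so the 16 GB and 8 GB figures quoted for LoRA and QLoRA leave a few gigabytes for everything else.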

Should You Fine-Tune or Use RAG?

Before investing in LoRA fine-tuning, check whether better prompting can solve the problem: prompt engineering is faster, reversible, and model-agnostic. For the full decision framework, see Prompt Engineering vs Fine-Tuning: How to Decide.

Fine-tuning is one way to keep a coding workflow productive offline. For the broader offline setup (models, IDE, package caches, documentation mirrors), see Local Coding LLMs Without Internet.

Decision matrix:

| Criteria | Fine-Tuning | RAG |
|---|---|---|
| Document change frequency | Once a year or less | Once a week or more |
| Knowledge requirement | Model needs deep understanding | Retrieval is sufficient |
| Training data availability | Needs 500+ high-quality examples | Any documents can be used |
| Cost (long-term) | One-time ($50-200) | Ongoing embedding costs |
| Latency | Fast (no retrieval) | Slower (retrieval + LLM) |
| Best for | Code, creative writing, domain style | Knowledge bases, Q&A |
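
The matrix can be turned into a quick checklist. The sketch below is one possible encoding, not a rule from any library: the thresholds come from the table (500+ examples, documents changing at most yearly versus at least weekly), while the function name, the simple vote count, and the handling of in-between cases are assumptions made for illustration.

```python
def recommend(doc_changes_per_year, n_examples, needs_deep_understanding):
    """Return 'fine-tune', 'rag', or 'either' from three of the table's criteria."""
    if n_examples < 500:
        return "rag"  # too little training data for fine-tuning
    votes_ft = 0
    votes_rag = 0
    if doc_changes_per_year <= 1:      # changes once a year or less
        votes_ft += 1
    elif doc_changes_per_year >= 52:   # changes weekly or more often
        votes_rag += 1
    if needs_deep_understanding:
        votes_ft += 1
    else:
        votes_rag += 1
    if votes_ft > votes_rag:
        return "fine-tune"
    if votes_rag > votes_ft:
        return "rag"
    return "either"


print(recommend(doc_changes_per_year=1, n_examples=1200, needs_deep_understanding=True))
print(recommend(doc_changes_per_year=100, n_examples=1200, needs_deep_understanding=False))
print(recommend(doc_changes_per_year=12, n_examples=300, needs_deep_understanding=True))
```

Cost and latency are left out on purpose; they depend on the deployment and are better weighed by hand.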

How Do You Prepare Training Data?

High-quality training data determines whether fine-tuning succeeds. Bad data = bad model.

Minimum: 500 examples. Each example = input + expected output.

Optimal: 1,000-5,000 examples. More data generally improves accuracy.

Format: JSON or JSONL. In JSONL, each line is one training example.

```json
[
  {"instruction": "Translate to French", "input": "Hello world", "output": "Bonjour le monde"},
  {"instruction": "Summarize", "input": "Long text...", "output": "Summary..."},
  {"instruction": "Code review", "input": "Python code...", "output": "Review comments..."}
]
```

Alternatively, use a single-field "text" format with the chat markers already applied:

```json
[
  {"text": "<|user|>Translate to French\nHello<|assistant|>Bonjour"},
  {"text": "<|user|>Summarize\nText<|assistant|>Summary"}
]
```
Training data preparation workflow: collect 500+ domain-specific instruction/output pairs, save them as JSONL (one per line), and load them into SFTTrainer. Quality matters more than quantity: 100 high-quality examples beat 1,000 low-quality ones.
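
As a sketch of that workflow, the snippet below converts instruction/input/output records into the single-field "text" format and writes them as JSONL, one object per line. The `<|user|>` and `<|assistant|>` markers mirror the example above; they are placeholders, and in practice the base model's own chat template should be used. The function names and the file name are illustrative.

```python
import json


def to_text(record):
    """Flatten one instruction/input/output record into a single prompt string."""
    prompt = record["instruction"]
    if record.get("input"):
        prompt += "\n" + record["input"]
    return "<|user|>" + prompt + "<|assistant|>" + record["output"]


def write_jsonl(records, path):
    """Write one {"text": ...} JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps({"text": to_text(record)}, ensure_ascii=False) + "\n")


records = [
    {"instruction": "Translate to French", "input": "Hello world", "output": "Bonjour le monde"},
    {"instruction": "Summarize", "input": "Long text...", "output": "Summary..."},
]
write_jsonl(records, "training.jsonl")
```

The resulting file can be passed to `load_dataset("json", data_files="training.jsonl")`, and the "text" field matches the `dataset_text_field="text"` setting used in the trainer configuration.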

Fine-Tuning Setup with Unsloth

Unsloth is the fastest of the LoRA frameworks listed above (about 4x faster than standard training):

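```python
# Install first, from a shell rather than inside Python:
#   pip install "unsloth[colab-new]" xformers bitsandbytes

from unsloth import FastLanguageModel  # import unsloth before trl/transformers
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the base model in 4-bit (this is what makes it QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3.1-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach the LoRA adapters; rank, alpha, and dropout are set here,
# not in from_pretrained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
)

# Load training data (each JSONL record needs a "text" field)
dataset = load_dataset("json", data_files="training.jsonl")

# Configure the trainer. These argument names match older TRL releases;
# newer ones move dataset_text_field and max_seq_length into SFTConfig.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset["train"],
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=4,
        num_train_epochs=3,
        learning_rate=2e-4,
        output_dir="output",
    ),
)

# Train
trainer.train()
```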

Key Hyperparameters for LoRA Fine-Tuning

| Hyperparameter | Recommended Value | Typical Range | Effect |
|---|---|---|---|
| learning_rate | 2e-4 | 1e-5 to 1e-3 | Lower is more stable but converges more slowly |
| lora_r (rank) | 16 | 4 to 64 | Higher is more expressive but slower |
| lora_alpha | 32 | 8 to 256 | Higher strengthens the LoRA effect |
| num_train_epochs | 3 | 1 to 10 | More epochs increase the risk of overfitting |
| batch_size | 4 | 1 to 32 | Larger trains faster but needs more VRAM |
| warmup_steps | 100 | 0 to 1000 | Gradually raising the learning rate stabilizes training |
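
Two of these settings are easier to reason about with a little arithmetic. In the standard LoRA formulation the adapter update is multiplied by lora_alpha / r, and a linear warmup ramps the learning rate from zero to its target over warmup_steps. The helpers below are an illustrative sketch with hypothetical names; real schedulers usually also decay the rate after warmup, which is omitted here.

```python
def lora_scale(lora_alpha, r):
    """Multiplier applied to the adapter update in standard LoRA."""
    return lora_alpha / r


def warmup_lr(step, base_lr, warmup_steps):
    """Linear warmup: ramp from 0 to base_lr, then hold (decay omitted)."""
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr
    return base_lr * step / warmup_steps


print(lora_scale(32, 16))
for step in (0, 50, 100, 500):
    print(step, warmup_lr(step, base_lr=2e-4, warmup_steps=100))
```

With the recommended values (lora_alpha=32, r=16) the scale is 2.0. Raising r to 32 without changing lora_alpha would halve it, which is one reason the two are often adjusted together.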

How Do You Evaluate a Fine-Tuned Model?

Training loss: should decrease as the epochs progress. If it stays flat, the learning rate may be too low.

Validation loss: should also decrease, though it normally stays above the training loss. If it starts rising, the model is overfitting.

Manual testing: run the fine-tuned model on test examples and compare its outputs with the expected results.

Benchmark tasks: measure improvement with standard benchmarks (MMLU, HumanEval).
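
The validation-loss check can be automated. The sketch below takes per-epoch validation losses and reports the best epoch, plus whether at least `patience` epochs have passed since then without a new minimum, a common early-stopping heuristic. The function name, the patience default, and the sample numbers are illustrative assumptions.

```python
def check_overfitting(val_losses, patience=2):
    """Return (best_epoch, overfitting) from a list of validation losses.

    best_epoch is the 0-based index of the lowest validation loss;
    overfitting is True if at least `patience` epochs have passed since
    then without a new minimum.
    """
    best_epoch = min(range(len(val_losses)), key=lambda i: val_losses[i])
    epochs_since_best = len(val_losses) - 1 - best_epoch
    return best_epoch, epochs_since_best >= patience


val_losses = [1.90, 1.42, 1.31, 1.36, 1.48]  # made-up numbers
best, overfit = check_overfitting(val_losses)
print(f"best epoch: {best}, overfitting: {overfit}")
```

When the flag turns True, the checkpoint saved at the best epoch is usually the one to keep.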

What Are the Most Common Fine-Tuning Mistakes?

  • Too few training examples. Fewer than 200 examples often leads to overfitting. Collect at least 500.
  • Training for too many epochs. The model memorizes the data instead of learning generalizable patterns. Stop at 3-5 epochs at most.
  • Not validating on unseen data. Always split the data into training and validation sets (80/20). Validate often to catch overfitting.
  • Using the same data for fine-tuning and evaluation. If you evaluate on the training data, the reported accuracy is meaningless.
  • Not saving checkpoints. Training can take hours. Save every epoch so you can recover from a crash.
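
A held-out split takes only a few lines. This sketch shuffles with a fixed seed and cuts the list 80/20 as recommended above; the function name and the seed are illustrative. If the data is already loaded with the Hugging Face `datasets` library, its `Dataset.train_test_split(test_size=0.2)` method serves the same purpose.

```python
import random


def split_train_val(examples, val_fraction=0.2, seed=42):
    """Shuffle a copy of the examples and split into (train, validation)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


examples = [{"text": f"example {i}"} for i in range(500)]
train, val = split_train_val(examples)
print(len(train), len(val))
```

Fixing the seed keeps the validation set identical across runs, so loss curves from different hyperparameter settings stay comparable.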

Frequently Asked Questions About LoRA Fine-Tuning

How much training data do I need?

At least 500 examples; 1,000-5,000 is optimal. Quality matters more than quantity: 100 high-quality examples beat 1,000 low-quality ones.

Can I fine-tune on a laptop?

Yes. Use 4-bit quantization with LoRA. A 7B model needs 8 GB VRAM. Depending on dataset size, training takes roughly 1-2 hours on CPU (slow) or 10-15 minutes on a GPU.

How do I merge a LoRA adapter into the base model?

Use Unsloth or Hugging Face PEFT: `model.merge_and_unload()`. The result is a single model that is ready for inference (about 3-4 GB for a 7B model once it is quantized to 4-bit).

Can I combine multiple LoRA adapters?

To a limited extent. You can stack adapters so that they apply in sequence, or use an adapter-composition technique such as weighted merging of adapters.

Is a fine-tuned model better than RAG?

For most tasks, yes. A fine-tuned model understands domain concepts deeply. RAG is the better fit when the documents are large and change frequently.

What is the difference between LoRA and QLoRA?

LoRA loads the base model in 16-bit and trains small adapter layers. QLoRA loads the base model in 4-bit and trains the adapters in 16-bit. QLoRA uses about half the VRAM: 8 GB for a 7B model versus 16 GB with LoRA. The quality difference is about 2%, which is negligible for most tasks. In Unsloth, enable QLoRA with `load_in_4bit=True`.

How do I use a LoRA fine-tuned model in Ollama?

After training, merge the adapter into the base model with `model.merge_and_unload()`. Convert the merged model to GGUF with llama.cpp's conversion script (`convert_hf_to_gguf.py` in current releases; older guides call it `convert.py`). Create an Ollama Modelfile that points to the GGUF file: `FROM ./my-finetuned-model.gguf`. Then run `ollama create my-model -f Modelfile` followed by `ollama run my-model`. The fine-tuned model runs like any other Ollama model.
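
The Ollama steps in this answer can be scripted. The sketch below writes the one-line Modelfile and builds the two `ollama` commands quoted above; it only prints them, so nothing runs unless you pass each command to `subprocess.run` yourself. The file and model names are the placeholders from the answer, and the helper name is illustrative.

```python
from pathlib import Path


def prepare_ollama_model(gguf_path, model_name, modelfile_path="Modelfile"):
    """Write a minimal Modelfile and return the commands to create and run it."""
    Path(modelfile_path).write_text(f"FROM {gguf_path}\n", encoding="utf-8")
    return [
        ["ollama", "create", model_name, "-f", modelfile_path],
        ["ollama", "run", model_name],
    ]


commands = prepare_ollama_model("./my-finetuned-model.gguf", "my-model")
for cmd in commands:
    print(" ".join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```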

Can I fine-tune Llama 3.3 70B on consumer hardware with LoRA?

Yes, with QLoRA. Llama 3.3 70B in 4-bit needs about 40 GB VRAM, which fits dual RTX 4090s (2×24 GB) or an A100 80GB. Training time: 4-8 hours for 1,000 examples. For most users, fine-tuning a 7B or 13B model is more practical and delivers 90% or more of the quality gain of a 70B model on domain tasks.

Sources

  • Hu, E. et al. (2021). "LoRA: Low-Rank Adaptation of Large Language Models." https://arxiv.org/abs/2106.09685. The original LoRA paper, showing quality comparable to full fine-tuning with 0.4% trainable parameters.
  • Dettmers, T. et al. (2023). "QLoRA: Efficient Finetuning of Quantized LLMs." https://arxiv.org/abs/2305.14314. The QLoRA paper: a 4-bit quantized base model plus 16-bit LoRA adapters halves VRAM requirements.
  • Unsloth. (2026). "Unsloth: 4× Faster LoRA Training." https://github.com/unslothai/unsloth. The fastest LoRA framework; supports Llama 3.x, Qwen3, and Mistral with a 4x training speedup.
  • Hugging Face. (2025). "TRL: Transformer Reinforcement Learning." https://github.com/huggingface/trl. SFTTrainer for supervised fine-tuning, with LoRA adapter support.

Fine-tuning works best on a solid foundation. Before investing time in LoRA, make sure your base prompts are optimized: the Prompt Engineering Guide covers 80 techniques for improving the output quality of untuned models.

A Note on Third-Party Facts

This article references third-party AI models, benchmarks, prices, and licenses. The AI landscape changes rapidly. Benchmark scores, license terms, model names, and API prices can shift between the time of writing and the time you read this. Before making deployment or compliance decisions based on this article, verify current figures on each provider's official source: Hugging Face model cards for licenses and benchmarks, provider websites for API pricing, and EUR-Lex for current GDPR and EU AI Act text. This article reflects publicly available information as of May 2026.
