GPU Buying Guide

How Much VRAM Do You Need for a Local LLM? 7B to 70B Chart (2026)

7 min read · By Hans Kuepper · Founder of PromptQuorum, a multi-model AI dispatch tool

A 7B model needs 8 GB of VRAM, 13B to 22B models need 12 to 16 GB, and a 70B model needs at least 24 GB, and then only with part of the model offloaded to system RAM (its Q4 weights alone are about 35 GB). As of April 2026, these figures assume Q4 (4-bit) quantization.

Full-precision (FP32) models need about eight times the VRAM of Q4 and are rarely practical on consumer GPUs. The formula for weight memory is: model size (in billions of parameters) × 4 bytes (FP32) × quantization factor, where the factor is 0.25 for Q8 and 0.125 for Q4.

Key Takeaways

  • 7B models: 8 GB minimum (Q4), 10 GB recommended (Q5), 14 GB for Q8.
  • 13B models: 10 GB minimum (Q4), 12 to 14 GB recommended (Q5), 16 GB for Q8.
  • 70B models: about 35 GB of weights at Q4, so 24 GB is a minimum that depends on partial offload to system RAM. Plan for 48 GB or more to keep Q4 fully on the GPU, and more again for Q5, Q8, or multi-user setups.
  • Quantization (Q4, Q5, Q8) cuts VRAM by 75 to 87.5% compared with full precision (FP32).
  • Always reserve an extra 1 to 2 GB for overhead (KV cache, runtime buffers, OS and driver). Optimizer state only matters when training.
  • Batch size ≠ VRAM per inference. A single inference runs at batch size 1 and uses the same VRAM whatever the server's maximum batch size; larger batches exist to serve parallel requests.
  • More VRAM does not make single-prompt inference faster. It only helps multi-user or multi-request setups.

VRAM Rule of Thumb: Quick Reference

Formula too complicated? Use these simple rules:

  • 3B models (Phi, StableLM): 4 GB VRAM minimum
  • 7B models (Llama, Mistral, Qwen): 8 GB VRAM (Q4), 10 GB (Q5)
  • 13B models (Llama 3.3, Mistral): 12 GB VRAM minimum (Q4)
  • 22B models (Qwen3, Gemma): 16 GB VRAM (Q4)
  • 70B models (Llama 3.3, Qwen 3.6): about 35 GB of weights at Q4, so 24 to 32 GB cards need partial offload to system RAM
  • MoE models: VRAM scales with the weights that must stay in memory. Example: Qwen 3.6 35B-A3B (3B active) has an active set of only about 2 GB, but Llama 4 Scout (17B active / 109B total) still needs about 55 GB at Q4 because every expert stays resident.
bash
# Quick VRAM formula for model weights (memorize this)
VRAM (GB) ≈ Model Size (B) ÷ 2  # at Q4: 0.5 bytes per parameter

# Examples (weights only; add 1-2 GB of overhead):
7B ÷ 2 = 3.5 GB of weights, so an 8 GB GPU leaves headroom
70B ÷ 2 = 35 GB of weights, so plan for 2 × 24 GB (48 GB total)

# For other quantizations:
Q8 (8-bit): Model Size × 1
Q5 (5-bit): Model Size × 0.625
FP32 (full): Model Size × 4

What Is the VRAM Formula for LLMs?

VRAM (GB) = model size (in billions of parameters) × 4 bytes × quantization factor

  • Model size: parameter count (7B, 13B, 70B, and so on)
  • 4 bytes: FP32 precision (1 byte = 8 bits)
  • Quantization factor: 1.0 (FP32), 0.25 (Q8), 0.125 (Q4)

Example: Llama 3 70B, FP32, no quantization:

70 billion × 4 bytes = 280 GB. Impractical.

Llama 3 70B, Q4 (4-bit) quantization:

70 billion × 4 bytes × 0.125 = 35 GB of weights. A single 24 GB GPU can run it only with part of the model offloaded to system RAM or at a lower-bit quantization.
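
The weight-size arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a definitive calculator: it assumes bytes per parameter equal bits ÷ 8, treats 1 GB as 10^9 bytes, and ignores KV cache and runtime overhead. `weights_gb` is a hypothetical helper name.

```python
def weights_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB taken as 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    return params_billions * bytes_per_param


# Walk the 70B example through each precision level.
for name, bits in [("FP32", 32), ("Q8", 8), ("Q5", 5), ("Q4", 4)]:
    print(f"70B at {name}: {weights_gb(70, bits):.1f} GB of weights")
```

Real quantized files are usually somewhat larger than this estimate because formats store scaling factors alongside the weights, so treat the result as a lower bound.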

MoE models (sparse): the active parameters do the compute, but every expert must stay loaded in VRAM. Example: Llama 4 Scout activates 17B of its 109B total parameters per token. Even at Q4 it needs about 55 GB of VRAM to hold all experts, and it fits a 24 GB GPU only with aggressive 1.78-bit quantization (about 20 tok/s). Compute is cheap; memory is the constraint.

How Much VRAM Does Each Model Size Need?

Weights only. Dense rows use 4, 1, 0.625, and 0.5 bytes per parameter for FP32, Q8, Q5, and Q4.

Model size | FP32 (no quantization) | Q8 (8-bit) | Q5 (5-bit) | Q4 (4-bit) | Recommended GPU
--- | --- | --- | --- | --- | ---
3B (Phi, StableLM) | 12 GB | 3 GB | 1.9 GB | 1.5 GB | RTX 2060 6 GB or RTX 5070 12 GB
7B (Llama 3.3, Mistral) | 28 GB | 7 GB | 4.4 GB | 3.5 GB | RTX 3060 12 GB or RTX 5070 12 GB
13B (Llama 3.3, Mistral) | 52 GB | 13 GB | 8.1 GB | 6.5 GB | RTX 3090 24 GB or RTX 5080 16 GB
22B (Qwen, Gemma) | 88 GB | 22 GB | 13.8 GB | 11 GB | RTX 4090 24 GB (Q4) or RTX 5090 32 GB
70B (Llama 3, Qwen) | 280 GB | 70 GB | 43.8 GB | 35 GB | 2× RTX 4090 (24 GB each), or 1× H100 80 GB
Qwen 3.6 35B-A3B (3B active, MoE)* | 12 GB | 3 GB | 2 GB | 2 GB | RTX 2060 6 GB or RTX 5070 12 GB
DeepSeek V4-Flash (13B active / 284B total, MoE)* | 52 GB | 13 GB | 8 GB | 7 GB | RTX 3060 12 GB or RTX 5070 12 GB
Llama 4 Scout (17B active / 109B total, MoE)† | 436 GB | 109 GB | 68 GB | 55 GB | 2× RTX 4090 (48 GB); fits 24 GB only at 1.78-bit (about 20 tok/s)
gpt-oss:20b (3.6B active / 21B total, MoE)* | 84 GB | 21 GB | 13 GB | 12 GB | RTX 5070 12 GB or a 16 GB GPU
Kimi K2.6 (32B active / 1T total, MoE)* | 128 GB | 32 GB | 20 GB | 16 GB | 2× RTX 4090 or RTX 5090 32 GB (Q4 only)

* MoE models: these figures are computed from active parameters only, not total model size, so they hold only when the runtime offloads inactive experts to system RAM. With every expert resident, size the VRAM from total parameters. † Llama 4 Scout keeps all 109B parameters resident, so it needs about 55 GB at Q4 even though only 17B are active per token.

MoE Models Need Less Compute for Their Size, Not Less VRAM

Mixture-of-Experts (MoE) models spread their parameters across many "expert" subnetworks and activate only a few of them for each token. The smaller active set cuts compute and speeds up inference, but in most MoE models every expert must still be loaded in VRAM, so memory use follows total parameters, not active parameters.

Dense model rule: VRAM = total_params × bytes_per_param

MoE model rule (compute): active_params sets tokens per second, but VRAM still scales with the total resident weights.

Example: Llama 4 Scout activates only 17B of its 109B total parameters per token. It is fast for its size, yet at Q4 it still needs about 55 GB of VRAM to hold every expert, which rules out a single 24 GB GPU unless you use aggressive 1.78-bit quantization (about 20 tok/s on an RTX 4090).

Some runtimes can stream or offload inactive experts to system RAM, which lowers VRAM use at the cost of speed. Key takeaway: do not assume an MoE model fits in VRAM sized for its active parameters. Check the actual on-disk size at your chosen quantization level.
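
To make the resident-versus-active distinction concrete, here is a small Python sketch using the Llama 4 Scout figures quoted above (109B total, 17B active, Q4). It assumes 0.5 bytes per parameter at Q4 and ignores overhead; `moe_footprint_gb` is a hypothetical helper, not part of any runtime.

```python
def moe_footprint_gb(total_b: float, active_b: float, bits_per_param: float) -> dict:
    """Return resident and per-token active weight sizes in GB."""
    bytes_per_param = bits_per_param / 8
    return {
        "resident_gb": total_b * bytes_per_param,  # what must sit in memory
        "active_gb": active_b * bytes_per_param,   # what each token reads
    }


scout = moe_footprint_gb(total_b=109, active_b=17, bits_per_param=4)
print(f"Resident: {scout['resident_gb']:.1f} GB, active per token: {scout['active_gb']:.1f} GB")
```

The resident figure is the one to compare against your GPU's VRAM; the active figure only hints at speed.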

How Does Quantization Reduce VRAM Requirements?

Quantization reduces the number of bits used to represent each model parameter.

  • FP32 (32-bit floating point): full precision. 1 parameter = 4 bytes. No loss. Slowest.
  • Q8 (8-bit): 1 parameter = 1 byte. Negligible accuracy loss. 75% VRAM savings.
  • Q5 (5-bit): 1 parameter = 0.625 bytes. About 1% accuracy loss. 84% VRAM savings.
  • Q4 (4-bit): 1 parameter = 0.5 bytes. About 2% accuracy loss. 87.5% VRAM savings.

For most users Q4 is the sweet spot: accuracy loss that is hard to notice and a VRAM footprint about 87% smaller than FP32.

As of April 2026, Q4 is the standard. Q5 and Q8 are also available if you have spare VRAM and want a slight quality improvement.
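
The savings percentages follow directly from the bit widths. A minimal sketch, assuming savings are measured against 32-bit weights and that a "Qn" format stores exactly n bits per parameter (real formats add a little metadata); `vram_savings_vs_fp32` is a hypothetical name.

```python
def vram_savings_vs_fp32(bits_per_param: float) -> float:
    """Fraction of weight memory saved relative to 32-bit floats."""
    return 1 - bits_per_param / 32


for name, bits in [("Q8", 8), ("Q5", 5), ("Q4", 4)]:
    saved = vram_savings_vs_fp32(bits)
    print(f"{name}: {bits / 8} bytes per parameter, {saved:.1%} saved")
```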

VRAM determines which model sizes you can run, but prompt design determines output quality. Techniques such as chain-of-thought and few-shot prompting can narrow the quality gap between small and large models. To get more out of the models your hardware supports, see the Prompt Engineering Toolkit. If you have 12 to 16 GB of VRAM and want a concrete coding task to apply that toolkit to, Replacing GitHub Copilot with a Local LLM walks through the Continue.dev + Ollama + Qwen3-Coder stack for exactly that VRAM tier.

Batch Size and Multi-User Inference

Batch size affects throughput (tokens per second), not the latency of a single inference.

A single user asking "What is 2+2?" occupies one slot and uses the same VRAM whether the server's batch size is 1 or 32.

Batch size = 32 means 32 prompts are processed in parallel. The weights are loaded once and shared, but the KV cache and activations grow roughly 32 times, in exchange for producing 32 responses faster.

Single user (typical local LLM use): batch size = 1. VRAM is the model size plus 1 to 2 GB of overhead.

Multi-user server: allocate the model weights once, plus batch size × the per-request KV cache. A 70B model at Q4 with batch = 4 needs about 35 GB of weights plus four KV caches of roughly 2 GB each, about 43 GB before system overhead, not four times the model size.
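
A minimal sketch of how VRAM grows with batch size, assuming the weights are loaded once and shared while each concurrent request gets its own KV cache. The inputs are illustrative assumptions: 35 GB of weights (70B at 0.5 bytes per parameter), 2 GB of KV cache per request, and 2 GB of system overhead. `serving_vram_gb` is a hypothetical helper.

```python
def serving_vram_gb(weights_gb: float, kv_per_request_gb: float,
                    batch_size: int, system_gb: float = 2.0) -> float:
    """Weights are counted once; only the KV cache scales with batch size."""
    return weights_gb + batch_size * kv_per_request_gb + system_gb


for batch in (1, 4, 32):
    total = serving_vram_gb(weights_gb=35.0, kv_per_request_gb=2.0, batch_size=batch)
    print(f"batch={batch}: about {total:.0f} GB")
```

The per-request cache depends heavily on context length, so measure it on your own workload before sizing a server.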

Do You Need More VRAM Than the Model Size?

Yes. On top of the model weights, add:

  • KV cache (the key-value cache that holds context): about 5 to 10% extra VRAM.
  • Optimizer state (when fine-tuning): 2 to 4 times the model size (training only, irrelevant for inference).
  • System overhead (OS, driver, Ollama or LM Studio runtime): about 1 to 2 GB.

Rule: 70B model at Q4 (35 GB) + KV cache (2 GB) + system (2 GB) = about 39 GB to allocate.

Always buy a GPU with at least 1 to 2 GB of headroom above the theoretical minimum.
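
The 5 to 10% KV cache figure is a rough rule; the actual size depends on the architecture and the context length. Below is a minimal sketch of the usual estimate (one K and one V tensor per layer). The configuration values are illustrative assumptions, not figures from this article: 32 layers, 8 KV heads, head dimension 128, an 8,192-token context, and a 16-bit cache. `kv_cache_bytes` is a hypothetical helper.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # 2 = K and V
    return per_token * context_len


size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
print(f"KV cache: {size / 2**30:.2f} GiB for one 8,192-token sequence")
```

Doubling the context length doubles this figure, which is why long conversations can push a model that "fits" into out-of-memory errors.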

Common Misconceptions About VRAM

  • "More VRAM means faster inference." Wrong. VRAM capacity does not affect speed. Memory bandwidth (GB/s) does, and it is fixed for each GPU.
  • "Batch size = sequential token limit." Wrong. Batch size = parallel requests. A single inference uses batch = 1 regardless of how much VRAM you have.
  • "A 70B model needs 24 GB." Wrong. It depends on quantization: Q4 weights are about 35 GB and Q8 weights about 70 GB, so a 24 GB card needs partial offload or a lower-bit quantization.

VRAM Calculator

Select a model size and quantization to estimate VRAM requirements.

Example result for the calculator's default selection:

  • Base model: 6.50 GB
  • Context overhead: 1.50 GB
  • Batch overhead: 0.00 GB
  • System overhead: 1.00 GB
  • Total minimum: 9.00 GB
  • Recommended (with 25% safety margin): 11.25 GB

Look for a GPU with at least 11.25 GB of VRAM.

Compatible GPUs

GPU | VRAM | Headroom | Status
--- | --- | --- | ---
RTX 3060 | 12 GB | 0.8 GB | Tight
RTX 4070 | 12 GB | 0.8 GB | Tight
RTX 4070 Ti | 12 GB | 0.8 GB | Tight
RTX 4080 | 16 GB | 4.8 GB | Fits
RTX 4090 | 24 GB | 12.8 GB | Fits
Mac mini M5 | 16 GB | 4.8 GB | Fits
Mac mini M4 | 16 GB | 4.8 GB | Fits
MacBook Pro | 24 GB | 12.8 GB | Fits
M3 Max | 36 GB | 24.8 GB | Fits
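
The calculator's arithmetic can be reproduced with a short Python sketch. It sums the four components, applies the 25% safety margin, and compares the result with each GPU's capacity. The 1 GB cutoff between "Tight" and "Fits" is an assumption inferred from the labels shown, and both function names are hypothetical.

```python
def recommend_vram(base_gb, context_gb, batch_gb, system_gb, margin=0.25):
    """Return (total minimum, recommended with safety margin) in GB."""
    total = base_gb + context_gb + batch_gb + system_gb
    return total, total * (1 + margin)


def fit_label(gpu_vram_gb, recommended_gb, tight_below_gb=1.0):
    """Classify a GPU by headroom; the 1 GB 'Tight' cutoff is an assumption."""
    headroom = gpu_vram_gb - recommended_gb
    if headroom < 0:
        return headroom, "Does not fit"
    return headroom, ("Tight" if headroom < tight_below_gb else "Fits")


total, recommended = recommend_vram(6.5, 1.5, 0.0, 1.0)
print(f"Total minimum: {total:.2f} GB, recommended: {recommended:.2f} GB")
for name, vram in [("RTX 3060", 12), ("RTX 4080", 16), ("RTX 4090", 24)]:
    headroom, label = fit_label(vram, recommended)
    print(f"{name}: {headroom:.2f} GB headroom, {label}")
```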

Pro Tips:

  • Always use the "with safety margin" figure when buying a GPU.
  • Q4 keeps 90-95% of the quality at roughly a quarter of the FP16 size. Q5 is better if you have room.
  • Context overhead grows with conversation length. Budget 1-3 GB for typical usage.
  • Batch size matters for multi-user APIs. Single-user chat can ignore batch overhead.

FAQ

Can I run Mistral Small on a 6 GB GPU?

Barely, at Q4 with overhead trimmed to the limit. In practice, no: expect out-of-memory (OOM) errors on 6 GB. Buy at least 8 GB.

How much VRAM do I need to fine-tune a 7B model?

LoRA: 12 to 16 GB. Full fine-tuning: 28 GB or more. Fine-tuning needs room for optimizer state (2 to 4 times the model's VRAM) on top of what inference uses.

Is 12 GB enough for Llama 3 13B?

At Q4, yes, though without much room to spare. At Q5 it is borderline, and at Q8 it does not fit. 12 GB is tight; 16 GB is comfortable.

Does a 70B model need 24 GB?

It needs more than that to run entirely on the GPU. At Q4 the weights alone are about 35 GB, so a 24 GB card works only with partial offload to system RAM. Q5 takes about 44 GB and Q8 about 70 GB.

Does increasing the batch size reduce the VRAM used by a single inference?

No. A single inference always uses batch = 1 VRAM. Batch size only helps throughput in multi-user scenarios.

What is the best quantization for accuracy?

Q8 has nearly undetectable loss. Q5 loses about 1%. Q4 loses about 2%. For most cases Q4 is the sweet spot.

Can I offload part of the model from VRAM to CPU RAM?

Yes, through layer splitting: the runtime keeps as many layers as fit on the GPU and runs the rest on the CPU from system RAM. llama.cpp and Ollama both support this. Performance drops by 30 to 50%, but it works. Less than 8 GB of VRAM? See the fastest models for your exact hardware tier: real tok/s benchmarks for CPU-only, 4 GB, 6 GB, and 8 GB VRAM.
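
A rough way to decide how many layers to keep on the GPU (the value you would pass to llama.cpp's `--n-gpu-layers` option) is sketched below. It assumes all layers are the same size and ignores embeddings and the output head, so treat the result as a starting point to tune. The 80-layer count for a 70B model and the 2 GB reserve are illustrative assumptions, and `gpu_layers` is a hypothetical helper.

```python
def gpu_layers(n_layers: int, model_gb: float, vram_gb: float,
               reserve_gb: float = 2.0) -> int:
    """Estimate how many equally sized layers fit in VRAM after a reserve."""
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


# A 70B model at Q4 (about 35 GB, assumed 80 layers) on a 24 GB GPU.
print(gpu_layers(n_layers=80, model_gb=35.0, vram_gb=24.0))
```

If the model loads but generation runs out of memory at long contexts, lower the layer count or raise the reserve to leave more room for the KV cache.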

You know your VRAM budget. Now pick the right GPU.

Best Budget GPUs for Local LLMs →

A Note on Third-Party Facts

This article references third-party AI models, benchmarks, prices, and licenses. The AI landscape changes rapidly. Benchmark scores, license terms, model names, and API prices can shift between the time of writing and the time you read this. Before making deployment or compliance decisions based on this article, verify current figures on each provider's official source: Hugging Face model cards for licenses and benchmarks, provider websites for API pricing, and EUR-Lex for current GDPR and EU AI Act text. This article reflects publicly available information as of May 2026.
