Getting Started

Fixing Local LLM Errors in 2026: The 10 Most Common Problems in Ollama, LM Studio, and vLLM

9 min read · By Hans Kuepper · Founder of PromptQuorum, multi-model AI dispatch tool · PromptQuorum


The most common local LLM errors are out-of-memory crashes, an undetected GPU, extremely slow CPU inference, "connection refused" on API calls, and garbled output. As of April 2026, all 10 errors below have known fixes, and most take only one or two terminal commands. This guide covers Ollama (port 11434), LM Studio (port 1234), and vLLM, with the exact command for each error.

Key Takeaways

  • Out of memory: switch to a smaller quantization (Q4_K_M → Q3_K_S) or use a smaller model.
  • NVIDIA GPU not detected: update the driver to 525+ on Linux or 452+ on Windows, then verify with `nvidia-smi`.
  • Extremely slow inference: the model is running CPU-only. Enable GPU offloading in Ollama with the `num_gpu` parameter (the number of layers sent to the GPU).
  • Connection refused: Ollama is not running. Start it with `ollama serve` or restart the service.
  • Garbled output: wrong prompt template. Use the Instruct variant of the model, not the base variant.
The 10 most common local LLM errors with symptoms and fixes: a quick reference for Ollama, LM Studio, and vLLM setups (April 2026).

Error 1: "Out of memory" / OOM crash

An out-of-memory error means the model needs more RAM than is available; it is not a hardware fault. It is the most common error for first-time users. For background on how quantization reduces RAM requirements, see LLM Quantization Explained.

  • Check available RAM: run `free -h` on Linux (macOS has no `free` command; use the macOS commands below), or on Windows open Task Manager → Performance → Memory.
  • Switch to a smaller quantization: replace `Q8_0` or `Q5_K_M` with `Q4_K_M`. In Ollama, the size and quantization go after the colon, e.g. `ollama run llama3.2:3b-instruct-q4_K_M`.
  • Close background applications before loading the model: browsers and other apps consume RAM and reduce the memory left for the model.
  • Switch to a smaller model: if an 8B model fails on 8 GB of RAM, try `llama3.2:3b` (it needs only about 2.5 GB).
๋ชจ๋ธ ํฌ๊ธฐ๋ณ„ ๋กœ์ปฌ LLM RAM ์š”๊ตฌ ์‚ฌํ•ญ: llama3.2 1Bโ€“3B๋Š” 8GB์— ์ ํ•ฉํ•˜๊ณ , 7Bโ€“8B ๋ชจ๋ธ์€ 16GB๊ฐ€ ํ•„์š”ํ•˜๋ฉฐ, 70B ๋ชจ๋ธ์€ Q4_K_M ์–‘์žํ™”์—์„œ 64GB๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
๋ชจ๋ธ ํฌ๊ธฐ๋ณ„ ๋กœ์ปฌ LLM RAM ์š”๊ตฌ ์‚ฌํ•ญ: llama3.2 1Bโ€“3B๋Š” 8GB์— ์ ํ•ฉํ•˜๊ณ , 7Bโ€“8B ๋ชจ๋ธ์€ 16GB๊ฐ€ ํ•„์š”ํ•˜๋ฉฐ, 70B ๋ชจ๋ธ์€ Q4_K_M ์–‘์žํ™”์—์„œ 64GB๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

Check available RAM on Linux / macOS

bash
# Linux
free -h

# macOS
vm_stat | grep "Pages free"

# More readable on macOS
top -l 1 | grep "PhysMem"
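`vm_stat` reports page counts rather than bytes, so its raw numbers are easy to misread. A minimal sketch that converts them to gigabytes, plus a Linux equivalent that reads `MemAvailable` from `/proc/meminfo`. Counting inactive pages as available on macOS is an approximation, and both function names are hypothetical.

```bash
# macOS: read `vm_stat` output on stdin and print free + inactive memory in GB.
# vm_stat prints page counts; the page size appears in its first line.
vmstat_available_gb() {
  awk '
    /page size of/ { for (i = 1; i <= NF; i++) if ($i == "of") size = $(i + 1) }
    /^Pages free:/     { gsub(/\./, "", $3); pages += $3 }
    /^Pages inactive:/ { gsub(/\./, "", $3); pages += $3 }
    END { printf "%.1f\n", pages * size / 1073741824 }
  '
}

# Linux: print MemAvailable (reported in kB) in GB; the file path is overridable.
meminfo_available_gb() {
  awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}
```

Usage: `vm_stat | vmstat_available_gb` on macOS, or `meminfo_available_gb` on Linux.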

Error 2: GPU not used (running CPU-only)

When the GPU is not used, an LLM runs 5–10× slower than expected, so check the driver installation before anything else. First confirm that the system recognizes the GPU:

bash
# NVIDIA: should print the GPU name and driver version
nvidia-smi

# AMD on Linux
rocm-smi

# macOS: check Metal support
system_profiler SPDisplaysDataType | grep "Metal"
CPU-only vs GPU active: Ollama on CPU delivers 2–8 tok/s, while GPU mode delivers 30–120 tok/s. Check which mode you are in with `ollama ps` or `nvidia-smi`.

How do I enable the GPU in Ollama?

  • NVIDIA on Linux: install NVIDIA driver 525+ and CUDA Toolkit 11.3+. Ollama detects CUDA automatically when it restarts.
  • NVIDIA on Windows: make sure the driver version is 452.39 or later. The Ollama Windows installer sets up CUDA support automatically.
  • AMD on Linux: install ROCm 5.7+. If detection fails, set `HSA_OVERRIDE_GFX_VERSION=10.3.0` for RX 6000 series cards (`11.0.0` for RX 7000 series).
  • Apple Silicon: Ollama uses Metal by default, so no extra setup is needed. After loading a model, confirm with `ollama ps`: the PROCESSOR column should read `100% GPU`.
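To check the driver minimums above from a script, compare the dotted version string that `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints against the required minimum. A minimal sketch; `version_ge` is a hypothetical helper that compares the fields numerically.

```bash
# Exit 0 if dotted version $1 >= $2 (e.g. "535.104.05" vs "525"), else 1.
# Fields are compared numerically from left to right; missing fields count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}
```

Usage on Linux: `version_ge "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" 525 || echo "update the NVIDIA driver to 525+"`.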

Error 3: Very slow inference (under 5 tokens per second)

Under 5 tokens per second means the model is running CPU-only, or it is too large for the available VRAM. On a GPU a 7B model generates 30–80 tok/s; the same model on a CPU generates 3–10 tok/s.
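To put a number on your own speed, Ollama's non-streaming `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (in nanoseconds), and `ollama run --verbose` prints an eval rate directly. The sketch below turns the two API fields into tokens per second and applies the 5 tok/s threshold from this section; both helper names are hypothetical.

```bash
# tokens/s = eval_count / (eval_duration in ns / 1e9)
tokens_per_sec() {
  awk -v n="$1" -v ns="$2" 'BEGIN {
    if (ns <= 0) { print "n/a"; exit 1 }
    printf "%.1f\n", n / (ns / 1e9)
  }'
}

# This article's threshold: under 5 tok/s suggests CPU-only or a model too big for VRAM.
diagnose_speed() {
  awk -v r="$1" 'BEGIN {
    if (r < 5) print "very slow: likely CPU-only or model too large for VRAM"
    else print "ok"
  }'
}
```

Feed it the two numbers from `curl -s http://localhost:11434/api/generate -d '{"model":"llama3.2:3b","prompt":"Hello","stream":false}'` (for example extracted with `jq '.eval_count, .eval_duration'`, assuming `jq` is installed), then pass the result to `diagnose_speed`.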

  • Check whether the GPU is active: run `ollama ps` while the model is loaded. The PROCESSOR column shows how the model is split between GPU and CPU (for example `100% GPU` or `48%/52% CPU/GPU`).
  • Use a smaller model: a 13B model produces 3–6 tok/s on a CPU. Switching to 7B roughly doubles the speed; switching to 3B roughly quadruples it.
  • Offload more layers to the GPU in Ollama: set the `num_gpu` parameter to 999 to send every layer to the GPU (Ollama limits this to what fits in VRAM).
  • Use a faster quantization: Q4_K_M is the fastest quantization that keeps acceptable quality. Q8_0 has higher quality but is about 30% slower.

Setting GPU layers in Ollama

bash
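# Option 1: inside an interactive `ollama run` session
/set parameter num_gpu 999

# Option 2: in a Modelfile
FROM llama3.1:8b
PARAMETER num_gpu 999

# Build and run the custom model ("llama3.1-gpu" is an example name)
ollama create llama3.1-gpu -f Modelfile
ollama run llama3.1-gpu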
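The `num_gpu` setting can also be passed per request through the `options` object of Ollama's `/api/generate` endpoint, without creating a custom model. A minimal sketch that builds such a request body; `generate_payload` is a hypothetical helper, and because it escapes only backslashes and double quotes, it assumes a single-line prompt.

```bash
# Print a JSON body for Ollama's /api/generate with num_gpu set in "options".
# Assumption: the prompt contains no newlines or other control characters.
generate_payload() {
  model=$1
  prompt=$2
  num_gpu=${3:-999}
  # Escape backslashes first, then double quotes, so the prompt stays a valid JSON string.
  esc=$(printf '%s' "$prompt" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"model":"%s","prompt":"%s","stream":false,"options":{"num_gpu":%s}}\n' \
    "$model" "$esc" "$num_gpu"
}
```

Usage: `generate_payload llama3.1:8b "Why is the sky blue?" | curl -s http://localhost:11434/api/generate -d @-`.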

Error 4: "Connection refused" on API calls

"Connection refused" means Ollama is not running: the API at `localhost:11434` responds only while the service is active. Start the server before making API calls.

bash
# Start Ollama manually
ollama serve

# Linux: restart the systemd service
sudo systemctl restart ollama

# Check that it is running
curl http://localhost:11434
# Expected output: Ollama is running
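Scripts that call the API right after launching the server can hit "connection refused" simply because `ollama serve` has not finished starting. A minimal bash sketch that waits until something accepts TCP connections on the port; it relies on bash's `/dev/tcp` redirection (not available in plain `sh`), and the function names are hypothetical. An open port only shows that something is listening, so the `curl` check above remains the real health test.

```bash
#!/usr/bin/env bash
# Exit 0 if something accepts TCP connections on host:port (bash-only /dev/tcp).
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Poll once per second until the port opens or the timeout (seconds) expires.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} waited=0
  until port_open "$host" "$port"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "nothing listening on $host:$port after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Usage: `ollama serve & wait_for_port 127.0.0.1 11434 30 && curl http://localhost:11434`.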

Error 5: "Model not found"

"Model not found" means the model name in your command does not match any downloaded model. Ollama model names are case-sensitive and include a tag (the part after the colon).

bash
# List all downloaded models
ollama list

# Pull the model if it is missing
ollama pull llama3.2

# Check the exact model name: the tag matters
# "llama3.2" (short for "llama3.2:latest") and "llama3.2:3b" are different tags
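Ollama adds the `:latest` tag when you omit one, which is why `llama3.2` and `llama3.2:3b` can appear as separate entries. This sketch checks `ollama list` output on stdin for an exact name, applying that default; `model_installed` is a hypothetical helper and assumes the NAME column comes first, as it does in current releases.

```bash
# Exit 0 if `ollama list` output on stdin contains the exact model name, else 1.
# A name without a tag is treated as "<name>:latest", Ollama's default tag.
model_installed() {
  want=$1
  case "$want" in
    *:*) ;;
    *) want="$want:latest" ;;
  esac
  awk -v w="$want" 'NR > 1 && $1 == w { found = 1 } END { exit !found }'
}
```

Usage: `ollama list | model_installed llama3.2:3b || ollama pull llama3.2:3b`.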

Error 6: Corrupted model file

Corrupted model files usually come from interrupted downloads; fix them by deleting the model and pulling it again. Ollama does not always detect partial downloads automatically.

bash
# Remove the corrupted model
ollama rm llama3.2

# Pull it again
ollama pull llama3.2

# For LM Studio: delete the model files manually
# Default location: ~/.cache/lm-studio/models/
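Because Ollama stores model layers as content-addressed blobs, a partial download can be detected by re-hashing a file and comparing the result with the digest in its name. A minimal sketch, assuming the blobs live in `~/.ollama/models/blobs` and are named `sha256-<hex>` (older releases used `sha256:<hex>`, and the Linux systemd install typically keeps models under `/usr/share/ollama/.ollama/models`); `verify_blob` and `sha256_of` are hypothetical names.

```bash
# Print the SHA-256 of a file using whichever tool exists (Linux or macOS).
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# Compare a blob's content hash with the digest encoded in its file name.
verify_blob() {
  name=$(basename "$1")
  expected=${name#sha256[-:]}
  actual=$(sha256_of "$1")
  if [ "$expected" = "$actual" ]; then
    echo "ok $name"
  else
    echo "CORRUPT $name"
    return 1
  fi
}
```

Usage: `for f in ~/.ollama/models/blobs/sha256-*; do verify_blob "$f"; done`. Hashing multi-gigabyte files takes a while; a `CORRUPT` line means the model should be removed and pulled again.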

Error 6b: "Failed to resolve model" in LM Studio

"Failed to resolve model lmstudio-community/..." means LM Studio cannot find the model in its registry. This happens when you downloaded the model from `lmstudio-community` on Hugging Face but the registry reference has since changed: LM Studio is using a cached registry entry that no longer matches the available model files.

  • Open LM Studio → My Models tab → click the three-dot menu on the failing model → choose "Delete model" (this removes only the registry entry and keeps the files).
  • Search for the same model in the model browser and download it again; LM Studio registers it again.
  • Alternative: quit LM Studio → go to `~/.cache/lm-studio/models/` → delete the specific model folder → download it again.
bash
# Manually delete one model from the LM Studio cache (macOS/Linux)
rm -rf ~/.cache/lm-studio/models/lmstudio-community/<model-name>

Error 7: CUDA / ROCm initialization errors

CUDA and ROCm errors point to a driver/library version mismatch; update the driver to the required minimum version.

  • "CUDA driver version is insufficient for CUDA runtime version": update the NVIDIA driver. llama.cpp's minimum requirement is CUDA 11.3 / driver 450.80.
  • "no kernel image is available for execution on the device": the GPU architecture is not supported by the build. GTX 900 series (Maxwell) and older cards may not be supported by recent CUDA builds.
  • AMD ROCm "HSA_STATUS_ERROR_INVALID_ISA": set `HSA_OVERRIDE_GFX_VERSION=10.3.0` (RX 6000) or `11.0.0` (RX 7000) before starting Ollama.
  • Check the CUDA version: run `nvcc --version` (installed toolkit) or `nvidia-smi | grep CUDA` (highest CUDA version the driver supports).
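The override value can be chosen from the card name. A minimal sketch that covers only the two series named in this section; `gfx_override_for` is a hypothetical helper, and other cards need their own value (or none).

```bash
# Print the HSA_OVERRIDE_GFX_VERSION value for an AMD Radeon model string:
# RX 6000 series -> 10.3.0, RX 7000 series -> 11.0.0. Exit 1 for anything else.
gfx_override_for() {
  case "$1" in
    *"RX 6"[0-9][0-9][0-9]*) echo "10.3.0" ;;
    *"RX 7"[0-9][0-9][0-9]*) echo "11.0.0" ;;
    *) return 1 ;;
  esac
}
```

Usage: `v=$(gfx_override_for "RX 6700 XT") && HSA_OVERRIDE_GFX_VERSION=$v ollama serve`.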

Error 8: Garbled, repetitive, or nonsensical output

Garbled output almost always means you are using a base model instead of the Instruct/chat variant. A base model produces raw text completions rather than answers to questions.

Base models (in Ollama, tags containing `text`, such as `llama3.1:8b-text-q4_K_M`) are not fine-tuned for conversation; prompted with a question, they produce raw completions that read like rambling. Always use the Instruct variant, for example `llama3.1:8b-instruct-q4_K_M`. For a GUI-based way to switch model variants, see How to Install LM Studio.

In Ollama, the default tag for most models already points to the Instruct variant (plain `llama3.1:8b` is the instruct model). If you downloaded a file manually from Hugging Face, check that its file name contains "Instruct" or "chat".
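A name-based check can catch a base model before you load it. This is only a heuristic sketch: it treats names containing "instruct" or "chat" as chat-tuned (the rule given above) and, as an additional assumption, Ollama tags containing `-text` as base models; anything else needs a look at the model card. `model_variant` is a hypothetical name.

```bash
# Classify a model file name or Ollama tag as instruct, base, or unknown (by name only).
model_variant() {
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    *instruct*|*chat*) echo "instruct" ;;
    *-text*) echo "base" ;;
    *) echo "unknown" ;;
  esac
}
```

Usage: `model_variant Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf` prints `instruct`.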

Error 9: "Address already in use": port conflict

"Address already in use" means another process is holding port 11434 (Ollama) or 1234 (LM Studio). Often that process is an Ollama instance already running in the background, for example as a service. Find the conflicting process and stop it.

bash
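# Find the process using port 11434 (Ollama)
lsof -i :11434

# Stop it by PID (try plain `kill <PID>` first)
kill -9 <PID>

# Or move Ollama to another port; clients must use the same OLLAMA_HOST
# (0.0.0.0 instead of 127.0.0.1 would also expose the API to your network)
export OLLAMA_HOST=127.0.0.1:11435
ollama serve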
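To tell whether the process holding the port is a leftover Ollama instance or something else, filter the `lsof` output. A minimal sketch that reads the output of `lsof -nP -iTCP:11434 -sTCP:LISTEN` on stdin and assumes `lsof`'s usual COMMAND and PID columns; `port_owner_report` is a hypothetical name.

```bash
# Read `lsof` output on stdin and report each distinct process holding the port.
# lsof can list one PID several times (one row per socket), hence `sort -u`.
port_owner_report() {
  awk 'NR > 1 { print $1, $2 }' | sort -u | while read -r cmd pid; do
    case "$cmd" in
      ollama*) echo "$pid: an Ollama instance is already listening; reuse it or stop it" ;;
      *) echo "$pid: $cmd holds the port" ;;
    esac
  done
}
```

Usage: `lsof -nP -iTCP:11434 -sTCP:LISTEN | port_owner_report`.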

Error 10: Model stops generating mid-response

A response that stops midway means the model hit the context length limit or `num_predict` is set too low. In many setups the default `num_predict` is 128 tokens, which is only one or two sentences.

  • Increase num_predict: this parameter sets the maximum number of tokens to generate. In Ollama, add `PARAMETER num_predict 2048` to the Modelfile.
  • Check the context window: in a very long conversation the model can hit its context limit. Start a new session or use a model with a larger context window (Llama 3.2 3B supports 128K). In Ollama, also raise `num_ctx`, because its default context is much smaller than the model's maximum.
  • Check stop tokens: some Modelfiles include stop sequences that end generation early. Inspect the system prompt and template for unexpected stop patterns, for example with `ollama show <model> --modelfile`.
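Both parameter fixes above can live in one Modelfile. A minimal sketch that writes one; the values 2048 and 8192 are examples rather than recommendations, a larger `num_ctx` uses more memory for the KV cache, and `write_modelfile` plus the model name `llama3.2-long` in the usage line are hypothetical.

```bash
# Print a Modelfile that raises the output limit (num_predict) and context window (num_ctx).
write_modelfile() {
  base=$1
  predict=${2:-2048}
  ctx=${3:-8192}
  printf 'FROM %s\nPARAMETER num_predict %s\nPARAMETER num_ctx %s\n' \
    "$base" "$predict" "$ctx"
}
```

Usage: `write_modelfile llama3.2:3b 2048 8192 > Modelfile && ollama create llama3.2-long -f Modelfile`.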


Where to get more help

For laptop-specific hardware issues (thermal throttling, battery drain), see Running Local LLMs on a Laptop. For security and privacy settings, see the Local LLM Security and Privacy Checklist. The Ollama GitHub issues page (github.com/ollama/ollama/issues) and the r/LocalLLaMA subreddit are the most active community resources for model-specific bugs.

Common mistakes when troubleshooting local LLMs

  • Mistaking an OOM error for a hardware fault: the error means the RAM is too small for the model, not that the hardware is broken. Fix: use Q4_K_M quantization or a smaller model.
  • Not checking system load: inference slows down sharply when other applications are using the CPU/GPU. Close browsers, video players, and background processes before benchmarking.
  • Ignoring driver version incompatibility: each CUDA release requires a minimum NVIDIA driver version. Check the `nvidia-smi` output; for CUDA 11.x the driver version must be ≥450.80.
  • Using the wrong model name in Ollama: `llama3.2` and `llama3.2:3b` are different Ollama tags. Run `ollama list` to see the exact names of downloaded models.
  • Not restarting Ollama after a driver update: Ollama detects GPUs at startup. After updating the NVIDIA or ROCm driver, fully restart Ollama (`ollama serve`, or `sudo systemctl restart ollama` for the Linux service) so it detects the GPU again.
The 5-step local LLM debug process: check RAM → check GPU → check server → check model → check output quality. Stop at the first step that fails.
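The "stop at the first failing step" rule maps directly onto a small runner. A minimal sketch: `run_checks` runs each named shell function in order and reports the first one that fails. The step functions are placeholders you define yourself, and every name here is hypothetical.

```bash
# Run each named check function in order; stop and report at the first failure.
run_checks() {
  for step in "$@"; do
    if ! "$step" >/dev/null 2>&1; then
      echo "FAILED at: $step"
      return 1
    fi
    echo "passed: $step"
  done
  echo "all checks passed"
}
```

For example, define `check_server() { curl -fsS http://localhost:11434 >/dev/null; }` alongside your own `check_ram`, `check_gpu`, `check_model`, and `check_output`, then run `run_checks check_ram check_gpu check_server check_model check_output`.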


A Note on Third-Party Facts

This article references third-party AI models, benchmarks, prices, and licenses. The AI landscape changes rapidly. Benchmark scores, license terms, model names, and API prices can shift between the time of writing and the time you read this. Before making deployment or compliance decisions based on this article, verify current figures on each provider's official source: Hugging Face model cards for licenses and benchmarks, provider websites for API pricing, and EUR-Lex for current GDPR and EU AI Act text. This article reflects publicly available information as of May 2026.
