Skip to main content
PromptQuorumPromptQuorum
ํ™ˆ/๊ณ ๊ธ‰ ๋กœ์ปฌ LLM/Qwen ๋กœ์ปฌ ๋ฐฐํฌ ์™„์ „ ๊ฐ€์ด๋“œ 2026: ํ”„๋กœ๋•์…˜ ์„œ๋ฒ„ ๊ตฌ์ถ•
Overview & Reference

Qwen ๋กœ์ปฌ ๋ฐฐํฌ ์™„์ „ ๊ฐ€์ด๋“œ 2026: ํ”„๋กœ๋•์…˜ ์„œ๋ฒ„ ๊ตฌ์ถ•

ยท16๋ถ„ ๋ถ„๋Ÿ‰ยทHans Kuepper ์ € ยท PromptQuorum ์ฐฝ๋ฆฝ์ž, ๋ฉ€ํ‹ฐ ๋ชจ๋ธ AI ๋””์ŠคํŒจ์น˜ ๋„๊ตฌ ยท PromptQuorum

Qwen 7B ๋ฐ 14B๋Š” Ollama ๋˜๋Š” vLLM๊ณผ Docker Compose API ์„œ๋ฒ„๋ฅผ ํ†ตํ•ด ์†Œ๋น„์ž์šฉ GPU์—์„œ ์•ˆ์ •์ ์œผ๋กœ ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค. Qwen 32B๋Š” RTX 4090 24 GB๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. Qwen 72B๋Š” ๋“€์–ผ GPU, 128 GB ์ด์ƒ RAM์˜ CPU ์ถ”๋ก , ๋˜๋Š” ํด๋ผ์šฐ๋“œ ๋Œ€์•ˆ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค โ€” self-hosting ๋น„์šฉ์€ ํ•˜๋“œ์›จ์–ด ๊ฐ๊ฐ€์ƒ๊ฐ ๊ธฐ์ค€ ํ•˜๋ฃจ $0.05~$0.12์ด๋ฉฐ, RunPod๋Š” ์‹œ๊ฐ„๋‹น $0.50~$1.20์ž…๋‹ˆ๋‹ค.

์ด ํŽ˜์ด์ง€์—๋Š” ํƒ€์‚ฌ ์ œํ’ˆ์— ๋Œ€ํ•œ ์ฐธ์กฐ ๋งํฌ๊ฐ€ ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. PromptQuorum์€ ์–ด๋–ค ์ œํœด ํ”„๋กœ๊ทธ๋žจ์—๋„ ๋“ฑ๋ก๋˜์–ด ์žˆ์ง€ ์•Š์Šต๋‹ˆ๋‹ค โ€” ์ด๋Š” ์ˆ˜์ˆ˜๋ฃŒ๊ฐ€ ๋ฐœ์ƒํ•˜์ง€ ์•Š๋Š” ์ผ๋ฐ˜ ๋งํฌ์ž…๋‹ˆ๋‹ค. ๋งํฌ ํด๋ฆญ ๋ฐ ์ดํ›„ ๋‹จ๊ณ„๋Š” ์ „์ ์œผ๋กœ ๊ท€ํ•˜์˜ ์ฑ…์ž„์ž…๋‹ˆ๋‹ค. ์ด ๋งํฌ๋Š” PromptQuorum์˜ ์–ด๋– ํ•œ ๋ณด์ฆ์ด๋‚˜ ๊ฒ€์ฆ์„ ๋‚˜ํƒ€๋‚ด์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

ํ•ต์‹ฌ ์š”์ 

  • Qwen3 7B์™€ 14B๋Š” ์†Œ๋น„์ž์šฉ GPU ๋ชฉํ‘œ โ€” VRAM ๊ฐ๊ฐ 8 GB, 16 GB, Docker์—์„œ Ollama๋กœ ์‹คํ–‰ ๊ฐ€๋Šฅ
  • Qwen3 32B๋Š” RTX 4090 24 GB๊ฐ€ ํ•„์š”ํ•˜๋ฉฐ, ๋Œ€๋ถ€๋ถ„์˜ ํŒ€์—์„œ ๋‹จ์ผ ์นด๋“œ ํ”„๋กœ๋•์…˜ ๋ฐฐํฌ ์ตœ๋Œ€ ๊ทœ๋ชจ์ž…๋‹ˆ๋‹ค
  • Qwen3 72B๋Š” RTX 4090 ๋‘ ์žฅ, ๋Œ€์šฉ๋Ÿ‰ RAM(128 GB ์ด์ƒ DDR5)์˜ CPU ๋นŒ๋“œ, ๋˜๋Š” ํด๋ผ์šฐ๋“œ ๋Œ€์—ฌ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค โ€” self-hosting ๋น„์šฉ์€ ๊ฐ๊ฐ€์ƒ๊ฐ ๊ธฐ์ค€ ํ•˜๋ฃจ ์•ฝ $0.05~0.12
  • Ollama + Open WebUI + Nginx๋กœ ๊ตฌ์„ฑ๋œ Docker Compose ์Šคํƒ์€ 10๋ถ„ ์ด๋‚ด์— OpenAI ํ˜ธํ™˜ API๋ฅผ ๋…ธ์ถœํ•ฉ๋‹ˆ๋‹ค
  • Qwen ์ƒ์‹œ ๊ฐ€๋™ ์„œ๋ฒ„: Minisforum UM890 Pro ($429, Qwen3 7B CPU ์‹คํ–‰) ๋˜๋Š” AOOSTAR GEM12 Pro OCuLink + RTX 4060 Ti 16 GB (์ด ์•ฝ $800)
  • ํด๋ผ์šฐ๋“œ ๋Œ€์•ˆ: RunPod A40 48 GB ($0.44/์‹œ๊ฐ„)์œผ๋กœ Qwen3 72B ์ฒ˜๋ฆฌ ๊ฐ€๋Šฅ โ€” RTX 4090 ๋‘ ์žฅ ๊ตฌ๋งค๋ณด๋‹ค ๋น„์ •๊ธฐ ์‚ฌ์šฉ ์‹œ ์ €๋ ด
  • ์ด ๊ฐ€์ด๋“œ๋Š” ํ”„๋กœ๋•์…˜ ๋ฐฐํฌ๋ฅผ ๋‹ค๋ฃจ๋ฉฐ, Ollama ๊ธฐ์ดˆ ์„ค์ •์€ Qwen ์ž…๋ฌธ ๊ฐ€์ด๋“œ๋ฅผ ์ฐธ์กฐํ•˜์‹ญ์‹œ์˜ค

๐Ÿ“ ํ•œ ๋ฌธ์žฅ์œผ๋กœ

Docker Compose ์Šคํƒ์œผ๋กœ Qwen์„ ํ”„๋กœ๋•์…˜์— ๋ฐฐํฌํ•˜๋ฉด Ollama๊ฐ€ ์ถ”๋ก  ๋ฐฑ์—”๋“œ๋กœ ๋™์ž‘ํ•˜๋ฉฐ OpenAI ํ˜ธํ™˜ API ์—”๋“œํฌ์ธํŠธ๊ฐ€ ๋…ธ์ถœ๋ฉ๋‹ˆ๋‹ค.

๐Ÿ’ฌ ์‰ฝ๊ฒŒ ๋งํ•˜๋ฉด

๋งค๋ฒˆ ์ˆ˜๋™์œผ๋กœ Qwen์„ ์‹คํ–‰ํ•˜๋Š” ๋Œ€์‹ , Docker๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ํ•ญ์ƒ ์ผœ์ ธ ์žˆ๊ณ  ์š”์ฒญ์„ ๋ฐ›์„ ์ˆ˜ ์žˆ๋Š” ์˜๊ตฌ ์„œ๋ฒ„๋ฅผ ๊ตฌ์„ฑํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค โ€” ChatGPT API๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ๊ณผ ๋™์ผํ•˜์ง€๋งŒ ์ž์‹ ์˜ ํ•˜๋“œ์›จ์–ด์—์„œ ํ† ํฐ ๋น„์šฉ ์—†์ด ์šด์˜๋ฉ๋‹ˆ๋‹ค.

Qwen ๋ชจ๋ธ๋ณ„ ํ•˜๋“œ์›จ์–ด ์„ฑ๋Šฅ โ€” 2026๋…„ 5์›”

GPU ๋ธŒ๋žœ๋“œ๊ฐ€ ์•„๋‹ˆ๋ผ ๋ชจ๋ธ ํฌ๊ธฐ์— ๋งž๋Š” ํ•˜๋“œ์›จ์–ด๋ฅผ ์„ ํƒํ•˜์‹ญ์‹œ์˜ค. VRAM์ด ์ฃผ์š” ์ œ์•ฝ์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ์ด ๋งž์ง€ ์•Š์œผ๋ฉด GPU ์†๋„๋กœ ์‹คํ–‰๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค. ์•„๋ž˜ ํ‘œ๋Š” Ollama ๋ฐฐํฌ์— ์ตœ์  ํ’ˆ์งˆ-ํฌ๊ธฐ ๋น„์œจ์ธ Q4_K_M ์–‘์žํ™”๋กœ ์ธก์ •ํ•œ ์ถ”๋ก  ์†๋„๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค.

๋ชจ๋ธVRAM (Q4_K_M)์ตœ์†Œ GPU์†๋„ (tok/s)CPU ๋Œ€์ฒดํ”„๋กœ๋•์…˜ ์ค€๋น„
Qwen3 7B5.2 GBRTX 3060 12 GB22โ€“28 tok/s๊ฐ€๋Šฅ (RAM 32 GB, ์•ฝ 4 tok/s)๊ฐ€๋Šฅ โ€” ๋‹จ์ผ GPU
Qwen3 14B9.4 GBRTX 4060 Ti 16 GB15โ€“20 tok/s๊ฐ€๋Šฅ (RAM 64 GB, ์•ฝ 2.5 tok/s)๊ฐ€๋Šฅ โ€” ๋‹จ์ผ GPU
Qwen3 32B20.1 GBRTX 4090 24 GB10โ€“14 tok/s์ œํ•œ์  (RAM 128 GB, ์•ฝ 1.2 tok/s)๊ฐ€๋Šฅ โ€” ๋‹จ์ผ GPU
Qwen3-Coder 32B19.8 GBRTX 4090 24 GB10โ€“13 tok/s์ œํ•œ์  (RAM 128 GB)๊ฐ€๋Šฅ โ€” ๋‹จ์ผ GPU
Qwen3 72B43.5 GBRTX 4090 ๋‘ ์žฅ (ํ•ฉ๊ณ„ 48 GB)5โ€“8 tok/s๋А๋ฆผ (RAM 128 GB, ์•ฝ 0.6 tok/s)Multi-GPU ๋˜๋Š” ํด๋ผ์šฐ๋“œ๋งŒ ๊ฐ€๋Šฅ

PCIe Gen 4 ์‹œ์Šคํ…œ ์ธก์ • ๊ธฐ์ค€. NVLink๋Š” ์ง€์› ์นด๋“œ์˜ ๋“€์–ผ GPU ๊ตฌ์„ฑ์—์„œ ์„ฑ๋Šฅ์„ ์•ฝ 15% ํ–ฅ์ƒ์‹œํ‚ต๋‹ˆ๋‹ค. RunPod A100 80 GB ๋‹จ์ผ ์นด๋“œ์—์„œ Qwen3 72B Q4_K_M: 18โ€“22 tok/s.

Docker API ์„œ๋ฒ„ ์„ค์ • โ€” Ollama + Open WebUI + Nginx

๊ฐ€์žฅ ๋น ๋ฅธ Qwen ํ”„๋กœ๋•์…˜ ์Šคํƒ์€ ์„ธ ๊ฐ€์ง€ ์ปจํ…Œ์ด๋„ˆ๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค: Ollama(์ถ”๋ก ), Open WebUI(UI), Nginx(๋ฆฌ๋ฒ„์Šค ํ”„๋ก์‹œ + ์ธ์ฆ). ์ด ์„ค์ •์€ 10๋ถ„ ์ด๋‚ด์— ์™„๋ฃŒ๋˜๋ฉฐ http://your-server:11434/v1์— ์˜๊ตฌ์ ์ธ OpenAI ํ˜ธํ™˜ API๋ฅผ ๋…ธ์ถœํ•ฉ๋‹ˆ๋‹ค.

  1. 1
    Docker ๋ฐ Docker Compose๋ฅผ ์„ค์น˜ํ•ฉ๋‹ˆ๋‹ค
    Why it matters: ์ปจํ…Œ์ด๋„ˆ๋Š” Qwen์„ ์šด์˜ ์ฒด์ œ์™€ ๊ฒฉ๋ฆฌํ•ฉ๋‹ˆ๋‹ค โ€” Python ํ™˜๊ฒฝ ์ถฉ๋Œ ์—†์Œ, ์—…๋ฐ์ดํŠธ ์šฉ์ด.
  2. 2
    Ollama + Open WebUI ์„œ๋น„์Šค๊ฐ€ ํฌํ•จ๋œ docker-compose.yml์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค
    Why it matters: Compose ํŒŒ์ผ์€ GPU ํŒจ์Šค์Šค๋ฃจ, ํฌํŠธ ๋งคํ•‘, ์žฌ์‹œ์ž‘ ์ •์ฑ…์„ ํ•œ ๊ณณ์—์„œ ๊ด€๋ฆฌํ•ฉ๋‹ˆ๋‹ค.
  3. 3
    Ollama ์ปจํ…Œ์ด๋„ˆ ํ™˜๊ฒฝ์—์„œ OLLAMA_HOST=0.0.0.0์„ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค
    Why it matters: ์ด ์„ค์ • ์—†์ด๋Š” Ollama๊ฐ€ localhost์—์„œ๋งŒ ์ˆ˜์‹ ํ•˜๋ฉฐ ๋‹ค๋ฅธ ์ปจํ…Œ์ด๋„ˆ๋‚˜ ํ˜ธ์ŠคํŠธ์˜ API ์š”์ฒญ์„ ๋ฐ›์ง€ ์•Š์Šต๋‹ˆ๋‹ค.
  4. 4
    Qwen ๋ชจ๋ธ์„ ๋‹ค์šด๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค: docker exec ollama ollama pull qwen3:7b
    Why it matters: ๋ชจ๋ธ์€ Docker ๋ณผ๋ฅจ์— ์ €์žฅ๋˜์–ด ์ปจํ…Œ์ด๋„ˆ ์žฌ์‹œ์ž‘ ์‹œ์—๋„ ์œ ์ง€๋ฉ๋‹ˆ๋‹ค.
  5. 5
    ๊ณต๊ฐœ ๋ฐฐํฌ๋ฅผ ์œ„ํ•ด ๊ธฐ๋ณธ ์ธ์ฆ์ด ํฌํ•จ๋œ Nginx๋ฅผ API ๊ฒŒ์ดํŠธ์›จ์ด๋กœ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค
    Why it matters: ์ธ์ฆ ์—†์ด Ollama๋ฅผ ์ธํ„ฐ๋„ท์— ์ง์ ‘ ๋…ธ์ถœํ•˜๋ฉด ๋ˆ„๊ตฌ๋‚˜ ๊ท€ํ•˜์˜ GPU์—์„œ ์ถ”๋ก ์„ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  6. 6
    ์ปจํ…Œ์ด๋„ˆ ์žฌ์‹œ์ž‘ ์ •์ฑ…์„ unless-stopped๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค
    Why it matters: ์ด๋ฅผ ํ†ตํ•ด Qwen ์„œ๋ฒ„๊ฐ€ ์‹œ์Šคํ…œ ์žฌ์‹œ์ž‘ ํ›„์—๋„ ์œ ์ง€๋ฉ๋‹ˆ๋‹ค โ€” ์ƒ์‹œ ๊ฐ€๋™ mini PC ๋ฐฐํฌ์— ํ•„์ˆ˜์ ์ž…๋‹ˆ๋‹ค.
yaml
version: "3.8"
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    environment:
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_KEEP_ALIVE=-1
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - open_webui_data:/app/backend/data
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:

Qwen3 72B๋ฅผ ์œ„ํ•œ Multi-GPU ๊ตฌ์„ฑ

Q4_K_M์˜ Qwen3 72B๋Š” VRAM 43.5 GB๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค โ€” RTX 4090(24 GB) ํ•œ ์žฅ์œผ๋กœ๋Š” ๋ถ€์กฑํ•ฉ๋‹ˆ๋‹ค. RTX 4090 ๋‘ ์žฅ(ํ•ฉ๊ณ„ 48 GB) ๋˜๋Š” ์ „๋ฌธ๊ฐ€์šฉ ์นด๋“œ(A100 80 GB, H100 80 GB)๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. Ollama๋Š” Multi-GPU ๋ถ„์‚ฐ์„ ๋„ค์ดํ‹ฐ๋ธŒ๋กœ ์ฒ˜๋ฆฌํ•˜๋ฉฐ ์ฝ”๋“œ ๋ณ€๊ฒฝ์ด ํ•„์š” ์—†์Šต๋‹ˆ๋‹ค.

  • Ollama๋Š” ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ๋ชจ๋“  GPU์— ์ž๋™์œผ๋กœ ๋ชจ๋ธ์„ ๋ถ„์‚ฐํ•ฉ๋‹ˆ๋‹ค โ€” compose ํ™˜๊ฒฝ์—์„œ CUDA_VISIBLE_DEVICES=0,1๋กœ ํŠน์ • ์นด๋“œ๋ฅผ ์ง€์ •ํ•˜์‹ญ์‹œ์˜ค
  • RTX 4090 ๋‘ ์žฅ์˜ ๊ฒฝ์šฐ, ๋‘ ์นด๋“œ ๋ชจ๋‘ ๋™์ผํ•œ PCIe ๋Œ€์—ญํญ ๋ ˆ๋ฒจ์— ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค โ€” PCIe Gen 4 x8 ์Šฌ๋กฏ ๋‘ ๊ฐœ๊ฐ€ ์žˆ๋Š” B650 ๋˜๋Š” Z790 ๋ฉ”์ธ๋ณด๋“œ๊ฐ€ ์ตœ์†Œ ์š”๊ตฌ์‚ฌํ•ญ์ž…๋‹ˆ๋‹ค
  • RTX 4090 ๋‘ ์žฅ ๊ฐ„์˜ NVLink๋Š” ์†Œ๋น„์ž ์นด๋“œ์—์„œ NVIDIA ๊ณต์‹ ์ง€์›์ด ์—†์ง€๋งŒ, Founders Edition RTX 4090 ์Œ์—์„œ ์„œ๋“œํŒŒํ‹ฐ NVLink ๋ธŒ๋ฆฌ์ง€๋ฅผ ํ†ตํ•ด ์ž‘๋™ํ•ฉ๋‹ˆ๋‹ค โ€” ์•ฝ 15% ์„ฑ๋Šฅ ํ–ฅ์ƒ
  • vLLM์€ ํ…์„œ ๋ณ‘๋ ฌ์„ฑ์„ ์‚ฌ์šฉํ•˜๋Š” ๋Œ€์ฒด ์ถ”๋ก  ์—”์ง„์œผ๋กœ, Multi-GPU ํ™œ์šฉ ํšจ์œจ์ด ๋” ๋†’์Šต๋‹ˆ๋‹ค โ€” ๋™์‹œ ์š”์ฒญ 100๊ฐœ ์ด์ƒ์˜ ์ง€์† 70B ์ถ”๋ก  ๋ถ€ํ•˜์—์„œ๋Š” Ollama ๋Œ€์‹  vLLM์„ ์‚ฌ์šฉํ•˜์‹ญ์‹œ์˜ค
  • Qwen3 72B๋ฅผ ๋น„์ •๊ธฐ์ ์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ, RunPod A40 48 GB($0.44/์‹œ๊ฐ„)๊ฐ€ RTX 4090 ๋‘ ์žฅ ๋นŒ๋“œ($3,800+)๋ณด๋‹ค ์ €๋ ดํ•ฉ๋‹ˆ๋‹ค
bash
# vLLM multi-GPU alternative (better for high-traffic 72B)
docker run --gpus all   -p 8000:8000   -e VLLM_WORKER_MULTIPROC_METHOD=spawn   vllm/vllm-openai:latest   --model Qwen/Qwen3-72B-Instruct   --tensor-parallel-size 2   --max-model-len 32768   --quantization awq

ํ”„๋กœ๋•์…˜ API ์„ค์ •

Ollama์˜ API๋Š” /v1์—์„œ OpenAI์™€ ํ˜ธํ™˜๋ฉ๋‹ˆ๋‹ค โ€” ChatGPT API๋ฅผ ํ˜ธ์ถœํ•˜๋Š” ๋ชจ๋“  ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์€ ๊ธฐ๋ณธ URL๋งŒ ๋ณ€๊ฒฝํ•˜๋ฉด ๋กœ์ปฌ Qwen ๋ฐฐํฌ์—์„œ ๋ฐ”๋กœ ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค. ํ”„๋กœ๋•์…˜ ๋™์ž‘์— ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” ์ฃผ์š” ํ™˜๊ฒฝ ๋ณ€์ˆ˜:

  • OLLAMA_KEEP_ALIVE=-1 โ€” ๋น„ํ™œ์„ฑ ํ›„ ๋ชจ๋ธ์ด ์–ธ๋กœ๋“œ๋˜์ง€ ์•Š๋„๋ก ํ•ฉ๋‹ˆ๋‹ค (๊ธฐ๋ณธ๊ฐ’์€ 5๋ถ„์œผ๋กœ, ์„œ๋ฒ„ ๋ฐฐํฌ์—์„œ๋Š” ์น˜๋ช…์ )
  • OLLAMA_NUM_PARALLEL=4 โ€” ์ตœ๋Œ€ 4๊ฐœ์˜ ๋™์‹œ ์ถ”๋ก  ์š”์ฒญ์„ ํ—ˆ์šฉํ•ฉ๋‹ˆ๋‹ค. VRAM ์—ฌ์œ ๊ฐ€ ์žˆ๋‹ค๋ฉด ๋Š˜๋ฆฌ์‹ญ์‹œ์˜ค
  • OLLAMA_MAX_LOADED_MODELS=1 โ€” ์†Œํ˜• GPU ๋นŒ๋“œ์—์„œ ์Šค๋ž˜์‹ฑ ๋ฐฉ์ง€๋ฅผ ์œ„ํ•ด VRAM์— ๋ชจ๋ธ ํ•˜๋‚˜๋งŒ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค
  • OLLAMA_FLASH_ATTENTION=1 โ€” NVIDIA Ampere/Ada GPU(RTX 3060 ์ด์ƒ)์—์„œ flash attention์„ ํ™œ์„ฑํ™”ํ•˜์—ฌ 20โ€“30% ์†๋„ ํ–ฅ์ƒ
  • OLLAMA_GPU_OVERHEAD=512 โ€” OS ๋ฐ ๋“œ๋ผ์ด๋ฒ„ ์˜ค๋ฒ„ํ—ค๋“œ๋ฅผ ์œ„ํ•ด VRAM 512 MB๋ฅผ ์˜ˆ์•ฝํ•ฉ๋‹ˆ๋‹ค. ์ •ํ™•ํžˆ 8 GB ๋˜๋Š” 16 GB ์นด๋“œ์—์„œ OOM ํฌ๋ž˜์‹œ๋ฅผ ์ค„์—ฌ์ค๋‹ˆ๋‹ค

โš ๏ธWarning: OLLAMA_KEEP_ALIVE=0์ด๊ฑฐ๋‚˜ ์„ค์ •ํ•˜์ง€ ์•Š์œผ๋ฉด ๊ฐ ์š”์ฒญ ํ›„ ๋ชจ๋ธ์ด ์–ธ๋กœ๋“œ๋ฉ๋‹ˆ๋‹ค. ์ผ์‹œ ์ค‘์ง€ ํ›„ ์ฒซ ๋ฒˆ์งธ ์š”์ฒญ์€ ๋ชจ๋ธ ์žฌ๋กœ๋”ฉ์— 10โ€“30์ดˆ๊ฐ€ ์†Œ์š”๋ฉ๋‹ˆ๋‹ค. API ์„œ๋ฒ„ ๋ฐฐํฌ์—์„œ๋Š” ํ•ญ์ƒ OLLAMA_KEEP_ALIVE=-1์„ ์„ค์ •ํ•˜์‹ญ์‹œ์˜ค.

๋น„์šฉ ๋น„๊ต: self-hosted vs Alibaba Cloud vs RunPod

ํ•˜๋ฃจ 4์‹œ๊ฐ„ ์ด์ƒ์˜ ์ง€์†์ ์ธ ์ถ”๋ก  ๋ถ€ํ•˜์—์„œ๋Š” self-hosting์ด ํด๋ผ์šฐ๋“œ๋ณด๋‹ค ์œ ๋ฆฌํ•ฉ๋‹ˆ๋‹ค. ํ•˜๋ฃจ 4์‹œ๊ฐ„ ๋ฏธ๋งŒ์—์„œ๋Š” ํ•˜๋“œ์›จ์–ด ๊ฐ๊ฐ€์ƒ๊ฐ ํ›„ ํด๋ผ์šฐ๋“œ GPU ๋Œ€์—ฌ๊ฐ€ ๋” ์ €๋ ดํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜ ํ‘œ๋Š” self-hosted ๋นŒ๋“œ์— 3๋…„ ํ•˜๋“œ์›จ์–ด ๊ฐ๊ฐ€์ƒ๊ฐ์„ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค.

์˜ต์…˜Qwen3 7B ํ•˜๋ฃจ ๋น„์šฉQwen3 72B ํ•˜๋ฃจ ๋น„์šฉ์ดˆ๊ธฐ ๋น„์šฉ์ตœ์  ์šฉ๋„
Self-hosted: mini PC RTX 3060 12 GB$0.03 (์ „๊ธฐ๋ฃŒ๋งŒ)ํ•ด๋‹น ์—†์Œ (์šฉ๋Ÿ‰ ๋ถ€์กฑ)์™„์ „ํ•œ ๋นŒ๋“œ $600โ€“900์ƒ์‹œ 7B ์ถ”๋ก , ๊ฐ€์ •/์‚ฌ๋ฌด์‹ค ์„œ๋ฒ„
Self-hosted: ์›Œํฌ์Šคํ…Œ์ด์…˜ RTX 4090$0.05ํ•ด๋‹น ์—†์Œ (๋‹จ์ผ GPU)์™„์ „ํ•œ ๋นŒ๋“œ $2,500โ€“4,000์ตœ๋Œ€ 32B ์ถ”๋ก , ์›Œํฌ์Šคํ…Œ์ด์…˜ ์ „์šฉ ์‚ฌ์šฉ
Self-hosted: RTX 4090 ๋‘ ์žฅ$0.08$0.12์™„์ „ํ•œ ๋นŒ๋“œ $5,000โ€“7,00072B ์ƒ์‹œ ๊ฐ€๋™, ์›Œํฌ์Šคํ…Œ์ด์…˜ ๋ณ‘ํ–‰ ์‚ฌ์šฉ
RunPod A40 48 GB ($0.44/์‹œ๊ฐ„)$0.44 (1์‹œ๊ฐ„)$0.44 (1์‹œ๊ฐ„)์ดˆ๊ธฐ ๋น„์šฉ $0, ์‹œ๊ฐ„์ œ ์ง€๋ถˆ๋น„์ •๊ธฐ 72B ์‚ฌ์šฉ, ํ…Œ์ŠคํŠธ, ํ•˜๋“œ์›จ์–ด ํˆฌ์ž ์—†์Œ
Alibaba Cloud PAI (GPU A10)$0.50โ€“0.80/์‹œ๊ฐ„$1.20โ€“2.00/์‹œ๊ฐ„ (A100)์ดˆ๊ธฐ ๋น„์šฉ $0 + ์‹ ๊ทœ ๊ณ„์ • ํฌ๋ ˆ๋”ง $50Qwen ์ตœ์ ํ™” ์ถ”๋ก  ํ™˜๊ฒฝ, Alibaba Cloud ์ƒํƒœ๊ณ„
Vast.ai RTX 4090 ์ŠคํŒŸ ($0.20โ€“0.35/์‹œ๊ฐ„)$0.20โ€“0.35/์‹œ๊ฐ„ํ•ด๋‹น ์—†์Œ์ดˆ๊ธฐ ๋น„์šฉ $0์ €๋ ดํ•œ ๋น„์ •๊ธฐ ์‚ฌ์šฉ, ์ค‘๋‹จ ์œ„ํ—˜ ํ—ˆ์šฉ ๊ฐ€๋Šฅ

Qwen ์ƒ์‹œ ๊ฐ€๋™ ์„œ๋ฒ„ ํ•˜๋“œ์›จ์–ด ์ถ”์ฒœ

API ์„œ๋ฒ„๋กœ Qwen3 7B๋ฅผ 24/7 ์‹คํ–‰ํ•˜๋Š” mini PC๋Š” ์ „๊ธฐ๋ฃŒ๊ฐ€ ์›” $0.50โ€“1.50 โ€” ์–ด๋–ค ํด๋ผ์šฐ๋“œ ๋Œ€์•ˆ๋ณด๋‹ค ํ›จ์”ฌ ์ €๋ ดํ•ฉ๋‹ˆ๋‹ค. ๋‘ ๊ฐ€์ง€ mini PC ๋นŒ๋“œ๊ฐ€ ๋Œ€๋ถ€๋ถ„์˜ Qwen ์ƒ์‹œ ๊ฐ€๋™ ์‚ฌ์šฉ ์‚ฌ๋ก€๋ฅผ ์ปค๋ฒ„ํ•ฉ๋‹ˆ๋‹ค:

  • ์ €๋ ดํ•œ ์˜ต์…˜ (Qwen3 7B CPU ์ถ”๋ก ): Minisforum UM890 Pro โ€” AMD Ryzen 9 8945HS, 32 GB DDR5, 512 GB NVMe. ์‹ ํ’ˆ ์•ฝ $429. Qwen3 7B๋Š” Ollama CPU ๋ฐฑ์—”๋“œ๋กœ 3โ€“5 tok/s ์‹คํ–‰. ๊ฐœ์ธ ์–ด์‹œ์Šคํ„ดํŠธ ๋ฐ ๋ฌธ์„œ ์š”์•ฝ์— ์ ํ•ฉ. ์œ ํœด ์‹œ 12W, ๋ถ€ํ•˜ ์‹œ 45W. ๋งค์šฐ ์กฐ์šฉํ•จ. ๋ฏธ๊ตญ/EU ์ฐฝ๊ณ ์—์„œ ๋ฐฐ์†ก ๊ฐ€๋Šฅ.
  • ์ถ”์ฒœ ์˜ต์…˜ (GPU Qwen3 14B): AOOSTAR GEM12 Pro OCuLink โ€” OCuLink ํฌํŠธ๋ฅผ ํ†ตํ•ด ์™ธ๋ถ€ GPU ์ง€์›. eGPU ์ธํด๋กœ์ €์˜ RTX 4060 Ti 16 GB์™€ ๊ฒฐํ•ฉ (GPU ์•ฝ $340 + ์ธํด๋กœ์ € $100). ์ด ์•ฝ $800. Qwen3 14B๋ฅผ 16โ€“18 tok/s๋กœ ์‹คํ–‰. ์ธํ„ฐ๋ž™ํ‹ฐ๋ธŒ ์‚ฌ์šฉ ์‹œ CPU ๋Œ€์ฒด๋ณด๋‹ค ํ˜„์ €ํžˆ ์šฐ์ˆ˜.
  • ๊ณ ๊ธ‰ ์‚ฌ์šฉ์ž (Qwen3 32B): RTX 4090์ด ์žฅ์ฐฉ๋œ ์ปดํŒฉํŠธ ATX ๋ฐ์Šคํฌํ†ฑ PC โ€” ์˜ˆ์‹œ: Fractal Node 804 ์ผ€์ด์Šค ($90), RTX 4090 (ํ˜„์žฌ ๊ฐ€๊ฒฉ ์•ฝ $1,900), Ryzen 9 7950X (์•ฝ $600), DDR5 64 GB (์•ฝ $180). ์ด ์•ฝ $2,800. Qwen3 32B๋ฅผ ๋ฌด๊ธฐํ•œ 10โ€“14 tok/s๋กœ ์‹คํ–‰.
Minisforum UM890 Pro ๊ตฌ๋งค (Qwen3 7B CPU ์„œ๋ฒ„) โ†’์ œํ’ˆ ๋งํฌ ยท ๊ณต๊ฐœ๋จAOOSTAR GEM12 Pro OCuLink ๊ตฌ๋งค (eGPU ์ง€์›) โ†’์ œํ’ˆ ๋งํฌ ยท ๊ณต๊ฐœ๋จ

ํŒ์ •: ๋ชจ๋ธ ํฌ๊ธฐ๋ณ„ ๋ฐฐํฌ ๋ฐฉ๋ฒ• ์„ ํƒ

ํ•˜๋“œ์›จ์–ด์˜ ์ธ์ƒ์ ์ธ ์‚ฌ์–‘์ด ์•„๋‹ˆ๋ผ ๋ชจ๋ธ ํฌ๊ธฐ์™€ ํ•˜๋ฃจ ์‚ฌ์šฉ ์‹œ๊ฐ„์— ๋”ฐ๋ผ Qwen ๋ฐฐํฌ ๋ฐฉ์‹์„ ์„ ํƒํ•˜์‹ญ์‹œ์˜ค.

Qwen ๋ฐฐํฌ ๊ฒฐ์ •

Use a local LLM if:

  • โ€ขQwen3 7B ๋˜๋Š” 14B๋ฅผ ํ•˜๋ฃจ 4์‹œ๊ฐ„ ์ด์ƒ ์‚ฌ์šฉ โ†’ mini PC ๋˜๋Š” GPU ๊ตฌ๋งค ๊ถŒ์žฅ; ํด๋ผ์šฐ๋“œ๊ฐ€ ๋” ๋น„์Œˆ
  • โ€ข์ธํ„ฐ๋ž™ํ‹ฐ๋ธŒ ์ฝ”๋“œ ๋˜๋Š” ๋ฌธ์„œ ์›Œํฌํ”Œ๋กœ์—์„œ ์ง€์—ฐ ์‹œ๊ฐ„ 80ms ๋ฏธ๋งŒ ํ•„์š”
  • โ€ข๋„คํŠธ์›Œํฌ ์™ธ๋ถ€๋กœ ๋‚˜๊ฐ€์„œ๋Š” ์•ˆ ๋˜๋Š” ๊ฐœ์ธ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ
  • โ€ข์ด๋ฏธ 12 GB ์ด์ƒ VRAM์˜ ๋ฐ์Šคํฌํ†ฑ GPU๋ฅผ ์œ ํœด ์ƒํƒœ๋กœ ๋ณด์œ 

Use a cloud model if:

  • โ€ขQwen3 72B ๋น„์ •๊ธฐ ์‚ฌ์šฉ (ํ•˜๋ฃจ 4์‹œ๊ฐ„ ๋ฏธ๋งŒ) โ€” RunPod A40 48 GB $0.44/์‹œ๊ฐ„์ด ๋“€์–ผ GPU ๋นŒ๋“œ๋ณด๋‹ค ํ›จ์”ฌ ์ €๋ ด
  • โ€ขํ•˜๋“œ์›จ์–ด ๊ตฌ๋งค ์ „ Qwen3 72B๋ฅผ ํ…Œ์ŠคํŠธํ•ด์•ผ ํ•˜๋Š” ๊ฒฝ์šฐ
  • โ€ข์‚ฌ์šฉ ํŒจํ„ด์ด ๋ถˆ๊ทœ์น™ํ•˜๊ณ  ์˜ˆ์ธก ๋ถˆ๊ฐ€๋Šฅ โ€” ํด๋ผ์šฐ๋“œ๋Š” ๋ฏธ์‚ฌ์šฉ ์‹œ ๋น„์šฉ์ด 0์œผ๋กœ ์ค„์–ด๋“ฆ
  • โ€ข๋ฏธ๊ตญ/EU ์™ธ ์ง€์—ญ์— ์žˆ๊ณ  ๋ฐฐ์†ก๋น„ ๋˜๋Š” ์ˆ˜์ž… ๊ด€์„ธ๋กœ ํ•˜๋“œ์›จ์–ด ๋น„์šฉ์ด ์ฆ๊ฐ€ํ•˜๋Š” ๊ฒฝ์šฐ

Quick decision:

  • โ†’๋งค์ผ Qwen3 7B: Minisforum UM890 Pro ($429)
  • โ†’๋งค์ผ Qwen3 14B: AOOSTAR + RTX 4060 Ti (์•ฝ $800)
  • โ†’๋งค์ผ Qwen3 32B: ์ปดํŒฉํŠธ ATX + RTX 4090 (์•ฝ $2,800)
  • โ†’๋น„์ •๊ธฐ Qwen3 72B: RunPod A40 48 GB ($0.44/์‹œ๊ฐ„)

๊ด€๋ จ ๊ฐ€์ด๋“œ

  • Ollama ๊ธฐ์ดˆ Qwen ์„ค์ • (์ž…๋ฌธ): /ko/power-local-llm/run-qwen-locally-guide-2026
  • ๋กœ์ปฌ LLM์šฉ GPU ๊ตฌ๋งค ๊ฐ€์ด๋“œ: /ko/power-local-llm/best-gpu-buying-guide-local-llm-2026
  • ๋ชจ๋ธ ํŒŒ์ผ์šฉ NAS ์Šคํ† ๋ฆฌ์ง€: /ko/power-local-llm/best-nas-storage-local-ai-models-2026
  • ํด๋ผ์šฐ๋“œ GPU ๋น„๊ต (์„œ๊ตฌ ๊ณต๊ธ‰์—…์ฒด): /ko/power-local-llm/cloud-gpu-rental-guide-2026

์ž์ฃผ ๋ฌป๋Š” ์งˆ๋ฌธ

RTX 4090 ํ•œ ์žฅ์œผ๋กœ Qwen3 72B๋ฅผ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๊นŒ?

์•„๋‹ˆ์š”. Q4_K_M ์–‘์žํ™”์˜ Qwen3 72B๋Š” VRAM 43.5 GB๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. RTX 4090์€ 24 GB์ž…๋‹ˆ๋‹ค. RTX 4090 ๋‘ ์žฅ(ํ•ฉ๊ณ„ 48 GB), A100 80 GB, ๋˜๋Š” ํด๋ผ์šฐ๋“œ GPU ๋Œ€์—ฌ๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. RTX 4090 ํ•œ ์žฅ์œผ๋กœ๋Š” Q4_K_M์˜ Qwen3 32B(20.1 GB)๋ฅผ ์—ฌ์œ  ์žˆ๊ฒŒ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

ํ”„๋กœ๋•์…˜ Qwen ๋ฐฐํฌ์—์„œ Ollama์™€ vLLM์˜ ์ฐจ์ด์ ์€ ๋ฌด์—‡์ž…๋‹ˆ๊นŒ?

Ollama๋Š” ์„ค์ •์ด ์‰ฝ๊ณ  Multi-GPU ๋ถ„์‚ฐ์„ ์ž๋™์œผ๋กœ ์ฒ˜๋ฆฌํ•ฉ๋‹ˆ๋‹ค โ€” ๊ฐœ์ธ ์„œ๋ฒ„ ๋ฐ ๋™์‹œ ์‚ฌ์šฉ์ž 20๋ช… ๋ฏธ๋งŒ์˜ ํŒ€์— ์ตœ์ ์ž…๋‹ˆ๋‹ค. vLLM์€ ํ…์„œ ๋ณ‘๋ ฌ์„ฑ๊ณผ ์—ฐ์† ๋ฐฐ์นญ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋™์‹œ ๋ถ€ํ•˜์—์„œ 2โ€“4๋ฐฐ ํšจ์œจ์  โ€” ์‹œ๊ฐ„๋‹น 100๊ฐœ ์ด์ƒ์˜ ์š”์ฒญ์ด๋‚˜ ๋‹ค์ˆ˜ ์‚ฌ์šฉ์ž๋ฅผ ์œ„ํ•œ ํ”„๋กœ๋•์…˜ API์— ์ตœ์ ์ž…๋‹ˆ๋‹ค.

Ollama๋Š” Qwen์˜ Multi-GPU ์ถ”๋ก ์„ ๋„ค์ดํ‹ฐ๋ธŒ๋กœ ์ง€์›ํ•ฉ๋‹ˆ๊นŒ?

์˜ˆ, Ollama 0.3.0(2025)๋ถ€ํ„ฐ ์ง€์›ํ•ฉ๋‹ˆ๋‹ค. CUDA_VISIBLE_DEVICES=0,1๋กœ ์‚ฌ์šฉํ•  GPU๋ฅผ ์ง€์ •ํ•˜์‹ญ์‹œ์˜ค. Ollama๊ฐ€ ์ž๋™์œผ๋กœ ๋ชจ๋ธ์„ ๋ถ„์‚ฐํ•ฉ๋‹ˆ๋‹ค. RTX 4090 ๋‘ ์žฅ์˜ Qwen3 72B์—์„œ 5โ€“8 tok/s๋ฅผ ์˜ˆ์ƒํ•˜์‹ญ์‹œ์˜ค โ€” ์†Œ๋น„์ž ๊ตฌ์„ฑ์—์„œ๋Š” NVLink ๋Œ€์‹  PCIe๋ฅผ ํ†ตํ•ด ๋ชจ๋ธ์ด ๋ถ„์‚ฐ๋˜๋ฏ€๋กœ A100 80 GB ๋‹จ์ผ ์นด๋“œ๋ณด๋‹ค ๋А๋ฆฝ๋‹ˆ๋‹ค.

Qwen ์ถ”๋ก ์—์„œ Alibaba Cloud๊ฐ€ RunPod๋ณด๋‹ค ์ €๋ ดํ•ฉ๋‹ˆ๊นŒ?

Alibaba Cloud PAI๋Š” GPU ๋“ฑ๊ธ‰ ๋ฐ ์ง€์—ญ์— ๋”ฐ๋ผ ์‹œ๊ฐ„๋‹น $0.50โ€“2.00์ž…๋‹ˆ๋‹ค. RunPod A40 48 GB๋Š” ์‹œ๊ฐ„๋‹น $0.44์ž…๋‹ˆ๋‹ค. Alibaba Cloud๋Š” ์ผ๋ฐ˜ Ollama๋ณด๋‹ค 20โ€“30% ๋น ๋ฅผ ์ˆ˜ ์žˆ๋Š” Qwen ์‚ฌ์ „ ๊ตฌ์„ฑ ์ถ”๋ก  ํ™˜๊ฒฝ์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค โ€” ์ด๋ฏธ Alibaba Cloud ์ƒํƒœ๊ณ„๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  ์žˆ๋‹ค๋ฉด ์‹œ๋„ํ•ด๋ณผ ๋งŒํ•ฉ๋‹ˆ๋‹ค. ์ˆœ์ˆ˜ ๋น„์šฉ ์ธก๋ฉด์—์„œ๋Š” RunPod ์ŠคํŒŸ ์ธ์Šคํ„ด์Šค๊ฐ€ ๋” ์ €๋ ดํ•ฉ๋‹ˆ๋‹ค.

์ƒ์‹œ ๊ฐ€๋™ Qwen ์„œ๋ฒ„๋Š” ์ „๊ธฐ๋ฅผ ์–ผ๋งˆ๋‚˜ ์‚ฌ์šฉํ•ฉ๋‹ˆ๊นŒ?

CPU๋กœ Qwen3 7B๋ฅผ ์‹คํ–‰ํ•˜๋Š” Minisforum UM890 Pro๋Š” ์œ ํœด ์‹œ 12W, ๋ถ€ํ•˜ ์‹œ 45W๋ฅผ ์†Œ๋น„ํ•ฉ๋‹ˆ๋‹ค. ๋ฏธ๊ตญ ํ‰๊ท  ์ „๊ธฐ์š”๊ธˆ($0.16/kWh)์œผ๋กœ 24/7 ์šด์˜ ๋น„์šฉ์€ ์›” ์•ฝ $0.70โ€“1.80์ž…๋‹ˆ๋‹ค. RTX 4060 Ti 16 GB๋Š” ๋ถ€ํ•˜ ์‹œ 165W โ€” ์—ฌ๊ธฐ์— mini PC ์œ ํœด ์†Œ๋น„๋Ÿ‰(์•ฝ 25W)์„ ํ•ฉ์น˜๋ฉด ์ด ์•ฝ 190W๋กœ, 24/7 ์ตœ๋Œ€ ๋ถ€ํ•˜ ๊ธฐ์ค€ ์›” ์•ฝ $7โ€“8์ž…๋‹ˆ๋‹ค.

Self-hosted Qwen API๋ฅผ ChatGPT ํ˜ธํ™˜ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜๊ณผ ํ•จ๊ป˜ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๊นŒ?

์˜ˆ. Ollama๋Š” http://your-server:11434/v1์—์„œ OpenAI ํ˜ธํ™˜ API๋ฅผ ๋…ธ์ถœํ•ฉ๋‹ˆ๋‹ค. ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์—์„œ OPENAI_API_BASE=http://your-server:11434/v1 ๋ฐ OPENAI_API_KEY=any-value๋ฅผ ์„ค์ •ํ•˜์‹ญ์‹œ์˜ค. OpenAI Chat Completions API๋ฅผ ํ˜ธ์ถœํ•˜๋Š” ๋ชจ๋“  ๋„๊ตฌ โ€” Continue.dev, Cursor(๋กœ์ปฌ ๋ชจ๋“œ), LangChain, AutoGen โ€” ๋Š” ์ˆ˜์ • ์—†์ด ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค.

์—…๋ฐ์ดํŠธ ๊ธฐ๋ก

  • 2026-05-26: ์ตœ์ดˆ ๊ฒŒ์‹œ. 2026๋…„ 5์›” ํ•˜๋“œ์›จ์–ด ๋ฒค์น˜๋งˆํฌ ๋ฐ์ดํ„ฐ. Newegg, Amazon ๋ฐ GPU ์‹œ์žฅ ์ถ”์ ๊ธฐ์—์„œ ๊ฐ€๊ฒฉ ๊ฒ€์ฆ.
  • ๋‹ค์Œ ๊ฒ€ํ†  ์˜ˆ์ •: 2026-11-26

โ† ๊ณ ๊ธ‰ ๋กœ์ปฌ LLM์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

Qwen 2026 ํ”„๋กœ๋•์…˜: Docker, API ์„œ๋ฒ„, Multi-GPU ์„ค์ • | PromptQuorum