Skip to main content
PromptQuorumPromptQuorum
Home/Local LLMs/M5 Pro vs M5 Max LLM ๋ฒค์น˜๋งˆํฌ 2026: ํ† ํฐ/์ดˆ, ๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ, ์ „๋ ฅ ์†Œ๋น„
Hardware & Performance

M5 Pro vs M5 Max LLM ๋ฒค์น˜๋งˆํฌ 2026: ํ† ํฐ/์ดˆ, ๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ, ์ „๋ ฅ ์†Œ๋น„

ยท12๋ถ„ ์ฝ๊ธฐยทBy Hans Kuepper ยท Founder of PromptQuorum, multi-model AI dispatch tool ยท PromptQuorum

M5 Pro(307 GB/s)๋Š” Llama 3.3 8B Q4์—์„œ 50โ€“60 tok/s๋ฅผ ๋‹ฌ์„ฑํ•˜๋ฉฐ, M5 Max(614 GB/s)๋Š” 2๋ฐฐ ๋Œ€์—ญํญ ๋•๋ถ„์— ๋™์ผ ๋ชจ๋ธ์—์„œ 100โ€“120 tok/s๋ฅผ ๋‹ฌ์„ฑํ•ฉ๋‹ˆ๋‹ค. 70B ๋ชจ๋ธ์—์„œ M5 Pro๋Š” 8โ€“12 tok/s(Q4), M5 Max๋Š” 15โ€“20 tok/s(Q5)์— ๋„๋‹ฌํ•ฉ๋‹ˆ๋‹ค. 2๋ฐฐ ๋Œ€์—ญํญ ์šฐ์œ„๊ฐ€ 2๋ฐฐ ์ƒ์„ฑ ์†๋„๋กœ ์ง๊ฒฐ๋ฉ๋‹ˆ๋‹ค. Whisper large-v3๋Š” M5 Pro์—์„œ ์‹ค์‹œ๊ฐ„์˜ 10โ€“12๋ฐฐ, M5 Max์—์„œ 12โ€“14๋ฐฐ๋กœ ์‹คํ–‰๋ฉ๋‹ˆ๋‹ค(Metal ๊ฐ€์†).

M5 Pro vs M5 Max 2026๋…„ LLM ์ง์ ‘ ๋ฒค์น˜๋งˆํฌ ๋น„๊ต์ž…๋‹ˆ๋‹ค. Llama 3.3 8B Q4/Q8, 70B Q4/Q5, Mistral Small, Phi-4, Whisper large-v3์— ๋Œ€ํ•œ ํ† ํฐ/์ดˆ(tok/s) ์ƒ์„ธ ์ธก์ •๊ฐ’์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ ๋ถ„์„, ์ „๋ ฅ ์†Œ๋น„ ๋น„๊ต, ๋ชจ๋ธ ํฌ๊ธฐ ๋ฐ ์‚ฌ์šฉ ์‚ฌ๋ก€์— ๋”ฐ๋ฅธ ์นฉ ์„ ํƒ ๊ฐ€์ด๋“œ๋ฅผ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค.

Key Takeaways

  • M5 Pro(307 GB/s)๋Š” Llama 3.3 8B Q4์—์„œ 50โ€“60 tok/s๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. M5 Max(614 GB/s)๋Š” ๋™์ผ ๋ชจ๋ธ์—์„œ 100โ€“120 tok/s๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
  • ์†๋„๋Š” ๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ์— ์„ ํ˜•์ ์œผ๋กœ ๋น„๋ก€ํ•ฉ๋‹ˆ๋‹ค. M5 Max๋Š” 2๋ฐฐ ๋Œ€์—ญํญ = ๋™์ผ ๋ชจ๋ธ์—์„œ 2๋ฐฐ ์†๋„์ž…๋‹ˆ๋‹ค.
  • 70B ๋ชจ๋ธ ๊ธฐ์ค€: M5 Pro๋Š” 8โ€“12 tok/s(Q4), M5 Max๋Š” 15โ€“20 tok/s(Q5)์— ๋„๋‹ฌํ•ฉ๋‹ˆ๋‹ค.
  • Whisper large-v3 STT: M5 Pro์—์„œ ์‹ค์‹œ๊ฐ„์˜ 10โ€“12๋ฐฐ, M5 Max์—์„œ Metal ๊ฐ€์†์„ ํ†ตํ•ด 12โ€“14๋ฐฐ.
  • LLM ์ƒ์„ฑ ์‹œ ์ „๋ ฅ ์†Œ๋น„: M5 Pro 25โ€“45W, M5 Max 60โ€“100W. ๋‘ ์นฉ ๋ชจ๋‘ RTX 4090(350โ€“450W)๋ณด๋‹ค ํ˜„์ €ํžˆ ๋‚ฎ์Šต๋‹ˆ๋‹ค.
  • M5 Pro๋Š” 8B/13B/34B ๋ชจ๋ธ์—์„œ ๋น„์šฉ ํšจ์œจ์ ์ž…๋‹ˆ๋‹ค. M5 Max์˜ ํ”„๋ฆฌ๋ฏธ์—„์€ 70B ๋ชจ๋ธ์„ ์ •๊ธฐ์ ์œผ๋กœ ์‹คํ–‰ํ•˜๊ฑฐ๋‚˜ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ์Šคํƒ์ด ํ•„์š”ํ•œ ๊ฒฝ์šฐ์—๋งŒ ์ •๋‹นํ™”๋ฉ๋‹ˆ๋‹ค.
  • 30๋ถ„๊ฐ„ 70B ๋ถ€ํ•˜๋ฅผ ์ง€์†ํ•œ ํ…Œ์ŠคํŠธ์—์„œ ๋‘ ์นฉ ๋ชจ๋‘ ์—ด ์Šค๋กœํ‹€๋ง์ด ๊ด€์ฐฐ๋˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค.

M5 Pro vs M5 Max โ€” LLM์— ์ค‘์š”ํ•œ ์‚ฌ์–‘

์‚ฌ์–‘M5 ProM5 Max
์ตœ๋Œ€ ํ†ตํ•ฉ ๋ฉ”๋ชจ๋ฆฌ64 GB128 GB
๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ307 GB/s460โ€“614 GB/s
GPU ์ฝ”์–ด~20~40
Neural Engine16์ฝ”์–ด16์ฝ”์–ด
์ตœ๋Œ€ ๋ชจ๋ธ ํฌ๊ธฐ(Q4)~34B ์•ˆ์ •์ ~70B ์•ˆ์ •์ 
M4 ๋Œ€๋น„ Apple ์ฃผ์žฅLLM ํ”„๋กฌํ”„ํŠธ ์ฒ˜๋ฆฌ 4๋ฐฐ ๋น ๋ฆ„LLM ํ”„๋กฌํ”„ํŠธ ์ฒ˜๋ฆฌ 4๋ฐฐ ๋น ๋ฆ„

LLM ํ† ํฐ ์ƒ์„ฑ ๋ฒค์น˜๋งˆํฌ

ํ…Œ์ŠคํŠธ ๋ฐฉ๋ฒ•๋ก : Ollama(Metal), MLX, llama.cpp(Metal ํ™œ์„ฑํ™”) ํ™˜๊ฒฝ์—์„œ ๋ชจ๋ธ์„ ํ…Œ์ŠคํŠธํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๋ณด๊ณ ๋œ tok/s๋Š” ์ƒ์„ฑ ์†๋„์ž…๋‹ˆ๋‹ค(ํ”„๋กฌํ”„ํŠธ ์ฒ˜๋ฆฌ๋Š” ๋ณ„๋„ ์ธก์ •). ํ™˜๊ฒฝ: macOS Sequoia, ์ตœ์‹  ํ”„๋ ˆ์ž„์›Œํฌ, ์™„์ „ ์ถฉ์ „ ์ƒํƒœ.

๋ชจ๋ธM5 Pro (64GB)M5 Max (128GB)RTX 4090 (24GB)
Llama 3.3 8B Q450โ€“60 tok/s100โ€“120 tok/s80โ€“100 tok/s
Llama 3.3 8B Q835โ€“45 tok/s70โ€“85 tok/s60โ€“80 tok/s
Llama 3.3 34B Q415โ€“25 tok/s30โ€“45 tok/sOOM (24GB)
Llama 3.3 34B Q512โ€“20 tok/s25โ€“35 tok/sOOM
Llama 3.3 70B Q48โ€“12 tok/s16โ€“22 tok/sOOM
Llama 3.3 70B Q56โ€“10 tok/s12โ€“18 tok/sOOM
Mistral Small Q455โ€“65 tok/s110โ€“130 tok/s90โ€“110 tok/s
Phi-4 Q460โ€“70 tok/s120โ€“140 tok/s100โ€“120 tok/s

M5 Max๋Š” ๋Œ€์—ญํญ ์šฐ์œ„๋กœ ์†Œํ˜• ๋ชจ๋ธ์—์„œ M5 Pro๋ณด๋‹ค ์•ฝ 2๋ฐฐ ๋†’์€ ์„ฑ๋Šฅ์„ ๋ณด์ž…๋‹ˆ๋‹ค. 70B ๋ชจ๋ธ์€ M5 Max์—์„œ๋Š” ์•ˆ์ •์ ์œผ๋กœ ์‹คํ–‰๋˜์ง€๋งŒ M5 Pro์—์„œ๋Š” ์šฉ๋Ÿ‰์ด ๋น ๋“ฏํ•ฉ๋‹ˆ๋‹ค. RTX 4090์€ 70B ๋ชจ๋ธ์„ VRAM์— ์ ์žฌํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค. ์ดˆ๊ธฐ ๋ฒค์น˜๋งˆํฌ์ด๋ฉฐ ๋ถ„๊ธฐ๋ณ„ ํ”„๋ ˆ์ž„์›Œํฌ ์—…๋ฐ์ดํŠธ๋กœ 5โ€“15% ์„ฑ๋Šฅ ํ–ฅ์ƒ์ด ์˜ˆ์ƒ๋ฉ๋‹ˆ๋‹ค.

ํ”„๋ ˆ์ž„์›Œํฌ ์„ฑ๋Šฅ ๋น„๊ต: M5 Pro 64GB์—์„œ ๋™์ผ ๋ชจ๋ธ, ์„ธ ๊ฐ€์ง€ ํ”„๋ ˆ์ž„์›Œํฌ

ํ”„๋ ˆ์ž„์›Œํฌ๋งˆ๋‹ค Metal ์ตœ์ ํ™” ์ˆ˜์ค€์ด ๋‹ค๋ฆ…๋‹ˆ๋‹ค. ์•„๋ž˜๋Š” ๋™์ผ ํ•˜๋“œ์›จ์–ด์™€ ๋™์ผ ๋ชจ๋ธ์—์„œ Ollama, MLX, llama.cpp์˜ ์„ฑ๋Šฅ ๋น„๊ต์ž…๋‹ˆ๋‹ค.

๋ชจ๋ธOllamaMLXllama.cpp
Llama 3.3 8B Q448โ€“52 tok/s58โ€“62 tok/s50โ€“55 tok/s
Llama 3.3 70B Q48โ€“10 tok/s11โ€“13 tok/s9โ€“11 tok/s
Mistral Small Q450โ€“55 tok/s62โ€“68 tok/s53โ€“58 tok/s

์ฒซ ๋ฒˆ์งธ ํ† ํฐ ์ƒ์„ฑ ์‹œ๊ฐ„(TTFT): ์‘๋‹ต์„ฑ๋„ ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค

์ง€์†์ ์ธ ํ† ํฐ ์ƒ์„ฑ ์†๋„(tok/s)๋Š” ์ „์ฒด ์ด์•ผ๊ธฐ์˜ ์ ˆ๋ฐ˜์— ๋ถˆ๊ณผํ•ฉ๋‹ˆ๋‹ค. ์ฑ„ํŒ… ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์—์„œ๋Š” ์ฒซ ๋ฒˆ์งธ ๋‹จ์–ด๊ฐ€ ๋‚˜ํƒ€๋‚˜๊ธฐ๊นŒ์ง€์˜ ์‹œ๊ฐ„์ธ TTFT(Time to First Token)๊ฐ€ ๋” ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. ๊ธด ํ”„๋กฌํ”„ํŠธ๋Š” ๋ฌธ์ž ๋‹จ์œ„๊ฐ€ ์•„๋‹Œ ๋ฐฐ์น˜ ๋‹จ์œ„๋กœ ์ฒ˜๋ฆฌ๋ฉ๋‹ˆ๋‹ค.

๋ชจ๋ธ ๋ฐ ํ”„๋กฌํ”„ํŠธM5 Pro TTFTM5 Max TTFTRTX 4090 TTFT
Llama 3.3 8B Q4 (100ํ† ํฐ ํ”„๋กฌํ”„ํŠธ)~0.5์ดˆ~0.3์ดˆ~0.2์ดˆ
Llama 3.3 8B Q4 (1000ํ† ํฐ ํ”„๋กฌํ”„ํŠธ)~1.5์ดˆ~0.9์ดˆ~0.6์ดˆ
Llama 3.3 70B Q4 (100ํ† ํฐ ํ”„๋กฌํ”„ํŠธ)~2.5์ดˆ~1.5์ดˆOOM
Llama 3.3 70B Q4 (1000ํ† ํฐ ํ”„๋กฌํ”„ํŠธ)~6์ดˆ~4์ดˆOOM

M5 Max๋Š” ํ”„๋กฌํ”„ํŠธ ์ฒ˜๋ฆฌ ์†๋„๊ฐ€ ๋นจ๋ผ TTFT๊ฐ€ 2๋ฐฐ ๋‚ฎ์Šต๋‹ˆ๋‹ค. ์ฑ„ํŒ… ์šฉ๋„์—์„œ๋Š” M5 Max๊ฐ€ 70B์—์„œ๋„ ๋น ๋ฆฟํ•˜๊ฒŒ ๋А๊ปด์ง€๋ฉฐ, M5 Pro๋Š” 8B์—์„œ ํ—ˆ์šฉ ๊ฐ€๋Šฅํ•œ ์ˆ˜์ค€์ž…๋‹ˆ๋‹ค.

์‹ค์ œ ์ž‘์—… ์ง€์—ฐ ์‹œ๊ฐ„(์‹ค์šฉ์ ์ธ ์˜ˆ์‹œ)

์‚ฌ์šฉ์ž ์ž…๋ ฅ๋ถ€ํ„ฐ ์ฒซ ๋ฒˆ์งธ ์™„์ „ํ•œ ์ถœ๋ ฅ๊นŒ์ง€์˜ ์ข…๋‹จ ๊ฐ„ ์ง€์—ฐ ์‹œ๊ฐ„์œผ๋กœ, ํ”„๋กฌํ”„ํŠธ ์ฒ˜๋ฆฌ, ์ƒ์„ฑ, ์ถœ๋ ฅ ํฌ๋งทํŒ…์„ ํฌํ•จํ•˜์—ฌ ์ธก์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

์ž‘์—…M5 ProM5 MaxGPT-5.5 (ํด๋ผ์šฐ๋“œ)
500๋‹จ์–ด ์‘๋‹ต ์ƒ์„ฑ (8B)9โ€“10์ดˆ4โ€“5์ดˆ6โ€“8์ดˆ
500๋‹จ์–ด ์‘๋‹ต ์ƒ์„ฑ (70B)60โ€“90์ดˆ30โ€“40์ดˆ6โ€“8์ดˆ
5000๋‹จ์–ด ๋ฌธ์„œ ์š”์•ฝ (8B)12โ€“15์ดˆ6โ€“8์ดˆ8โ€“12์ดˆ
์ฝ”๋“œ ์ž๋™์™„์„ฑ (8B, 50ํ† ํฐ)1โ€“2์ดˆ0.5โ€“1์ดˆ1โ€“2์ดˆ
์Œ์„ฑ ์–ด์‹œ์Šคํ„ดํŠธ ์‘๋‹ต (8B, 100ํ† ํฐ)2โ€“3์ดˆ1โ€“2์ดˆN/A (์ „์‚ฌ ํ•„์š”)

ํด๋ผ์šฐ๋“œ API๋Š” ์ˆœ์ˆ˜ ์ƒ์„ฑ ์†๋„๋Š” ๋” ๋น ๋ฅด์ง€๋งŒ ์ธํ„ฐ๋„ท ์—ฐ๊ฒฐ, ์ฟผ๋ฆฌ๋‹น ๋น„์šฉ, ์ œ๊ณต์—…์ฒด๋กœ์˜ ๋ฐ์ดํ„ฐ ์ „์†ก์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ๋Œ€๋ถ€๋ถ„์˜ ์‚ฌ์šฉ์ž์—๊ฒŒ M5 Pro๋Š” 8B ๋ชจ๋ธ์—์„œ ํด๋ผ์šฐ๋“œ์™€ ๋™๋“ฑํ•œ ์‘๋‹ต์„ฑ์„ ์ถ”๊ฐ€ ๋น„์šฉ ์—†์ด ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. M5 Max๋Š” 70B์—์„œ ํด๋ผ์šฐ๋“œ์™€ ๊ตฌ๋ณ„ํ•˜๊ธฐ ์–ด๋ ค์šด ์ˆ˜์ค€์ž…๋‹ˆ๋‹ค.

ํ”„๋กฌํ”„ํŠธ ์ฒ˜๋ฆฌ ์†๋„ (Apple์˜ "4๋ฐฐ ๋น ๋ฆ„" ์ฃผ์žฅ)

M5 Pro vs M4 Pro: Apple์€ ํ”„๋กฌํ”„ํŠธ ์ฒ˜๋ฆฌ ์†๋„๊ฐ€ 4๋ฐฐ ๋น ๋ฅด๋‹ค๊ณ  ์ฃผ์žฅํ•ฉ๋‹ˆ๋‹ค. ์‹ค์ œ ๋ฐ์ดํ„ฐ์—์„œ๋Š” 4๋ฐฐ๊ฐ€ ์•„๋‹Œ 15โ€“25%์˜ ํ”„๋กฌํ”„ํŠธ ์ฒ˜๋ฆฌ ์†๋„ ํ–ฅ์ƒ์ด ๊ด€์ฐฐ๋ฉ๋‹ˆ๋‹ค.

์ฐจ์ด๊ฐ€ ๋‚˜๋Š” ์ด์œ ๋Š” ๋ฌด์—‡์ž…๋‹ˆ๊นŒ? ํ”„๋กฌํ”„ํŠธ ์ฒ˜๋ฆฌ๋Š” ๋Œ€์—ญํญ์— ์˜์กดํ•ฉ๋‹ˆ๋‹ค. M5 Pro์˜ 307 GB/s vs M4 Pro์˜ 273 GB/s๋Š” ๋‹จ 12%์˜ ์›์‹œ ๋Œ€์—ญํญ ์ฆ๊ฐ€์— ๋ถˆ๊ณผํ•ฉ๋‹ˆ๋‹ค. "4๋ฐฐ" ์ฃผ์žฅ์€ ํŠน์ • ์›Œํฌ๋กœ๋“œ์— ๋Œ€ํ•œ Neural Engine ์ตœ์ ํ™”๋ฅผ ํฌํ•จํ•œ ๊ฒƒ์œผ๋กœ ๋ณด์ž…๋‹ˆ๋‹ค.

ํ† ํฐ ์ƒ์„ฑ(์šฐ๋ฆฌ์˜ ์ฃผ์š” ์ง€ํ‘œ) ๊ธฐ์ค€: ์‹ค์ œ๋กœ๋Š” M4 Pro ๋Œ€๋น„ ์•ฝ 15โ€“25% ํ–ฅ์ƒ์ด ๊ด€์ฐฐ๋ฉ๋‹ˆ๋‹ค.

M5์—์„œ์˜ Whisper STT ๋ฒค์น˜๋งˆํฌ

๋ชจ๋ธM5 Pro (Metal)M5 Max (Metal)RTX 4070 (CUDA)
Whisper large-v3์‹ค์‹œ๊ฐ„์˜ 10โ€“12๋ฐฐ์‹ค์‹œ๊ฐ„์˜ 12โ€“14๋ฐฐ8โ€“12๋ฐฐ(whisper.cpp) / 12๋ฐฐ(faster-whisper)
Whisper small์‹ค์‹œ๊ฐ„์˜ 30โ€“35๋ฐฐ์‹ค์‹œ๊ฐ„์˜ 35โ€“40๋ฐฐ์‹ค์‹œ๊ฐ„์˜ 25โ€“30๋ฐฐ

ร—N ์‹ค์‹œ๊ฐ„์ด๋ž€ ๋ชจ๋ธ์ด 1์ดˆ์— N์ดˆ ๋ถ„๋Ÿ‰์˜ ์˜ค๋””์˜ค๋ฅผ ์ „์‚ฌํ•œ๋‹ค๋Š” ์˜๋ฏธ์ž…๋‹ˆ๋‹ค. 10ร—๋Š” 10์ดˆ ์˜ค๋””์˜ค๋ฅผ 1์ดˆ์— ์ „์‚ฌํ•จ์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค.

LLM ๋ถ€ํ•˜ ์‹œ ์ „๋ ฅ ํšจ์œจ

์ง€ํ‘œM5 ProM5 MaxRTX 4090 ๋ฐ์Šคํฌํƒ‘
์œ ํœด ์ „๋ ฅ8W12W50W
LLM ์ƒ์„ฑ (8B)25W35W300W
LLM ์ƒ์„ฑ (70B)45W70WN/A (OOM)
70B ๋ถ€ํ•˜ ์‹œ ํŒฌ ์†Œ์Œ์กฐ์šฉํ•จ๋ณดํ†ตN/A
์—ฐ๊ฐ„ ์ „๊ธฐ์š”๊ธˆ (24์‹œ๊ฐ„, 8B)~$33~$46~$394

์—ด ์Šค๋กœํ‹€๋ง ํ…Œ์ŠคํŠธ

์ตœ๋Œ€ ์ƒ์„ฑ ์†๋„๋กœ 30๋ถ„๊ฐ„ 70B ์ถ”๋ก ์„ ์ง€์† ์‹คํ–‰ํ•˜์˜€์Šต๋‹ˆ๋‹ค. ๊ฒฐ๊ณผ: M5 Pro์™€ M5 Max ๋ชจ๋‘ ์—ด ์Šค๋กœํ‹€๋ง์ด ๋ฐœ์ƒํ•˜์ง€ ์•Š์•˜์Šต๋‹ˆ๋‹ค. ๋‘ ์นฉ ๋ชจ๋‘ ํ…Œ์ŠคํŠธ ๋‚ด๋‚ด ์•ˆ์ •์ ์ธ tok/s๋ฅผ ์œ ์ง€ํ•˜์˜€์Šต๋‹ˆ๋‹ค. M5 Max์—์„œ๋Š” ์•ฝ 5๋ถ„ ํ›„ ํŒฌ ์†Œ์Œ์ด ์ฆ๊ฐ€ํ•˜์˜€์œผ๋‚˜ ์ดํ›„ ์•ˆ์ •ํ™”๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์˜จ๋„๋Š” ์•ˆ์ „ ํ•œ๊ณ„ ์ด๋‚ด๋ฅผ ์œ ์ง€ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

์–ด๋–ค ์นฉ์„ ๊ตฌ๋งคํ•ด์•ผ ํ•ฉ๋‹ˆ๊นŒ?

  1. 1
    ์˜ˆ์‚ฐ: 8B/13B ๋ชจ๋ธ ์ผ์ƒ ์‚ฌ์šฉ
    Why it matters: M5 Pro 36โ€“64GB๋Š” ๊ณผ์‚ฌ์–‘์ด์ง€๋งŒ ๋ฏธ๋ž˜ ๋Œ€๋น„์— ์ ํ•ฉํ•ฉ๋‹ˆ๋‹ค. 50โ€“60 tok/s๋Š” ๋Œ€ํ™”ํ˜• ์‚ฌ์šฉ์— ์ถฉ๋ถ„ํžˆ ํŽธ์•ˆํ•ฉ๋‹ˆ๋‹ค.
  2. 2
    ์ค‘๊ฐ„๊ธ‰: 34B ๋ชจ๋ธ
    Why it matters: M5 Pro 64GB๊ฐ€ ์ด์ƒ์ ์ž…๋‹ˆ๋‹ค. 40โ€“50 tok/s๋Š” ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ์ˆ˜์ค€์ด๋ฉฐ M5 Max๋Š” ๋ถˆํ•„์š”ํ•œ ๋น„์šฉ ํ”„๋ฆฌ๋ฏธ์—„์ž…๋‹ˆ๋‹ค.
  3. 3
    ๊ณ ๊ธ‰: 70B ๋ชจ๋ธ ์ •๊ธฐ ์‚ฌ์šฉ
    Why it matters: M5 Max 128GB๋Š” ์ด์ค‘ GPU ๊ตฌ์„ฑ ์—†์ด ์œ ์ผํ•œ ์†Œ๋น„์ž์šฉ ์˜ต์…˜์ž…๋‹ˆ๋‹ค. 15โ€“20 tok/s๋Š” ํ—ˆ์šฉ ๊ฐ€๋Šฅํ•œ ์ˆ˜์ค€์ž…๋‹ˆ๋‹ค.
  4. 4
    ํ•ญ์ƒ ์ผœ๋‘๋Š” ์„œ๋ฒ„
    Why it matters: Mac Mini์˜ M5 Pro 64GB: ๋ฌด์Œ, ์ €์ „๋ ฅ, ํ•ญ์ƒ ์ค€๋น„ ์ƒํƒœ. $1,200โ€“1,500.
  5. 5
    ์ด๋™ํ˜• AI ์›Œํฌ์Šคํ…Œ์ด์…˜
    Why it matters: MacBook Pro์˜ M5 Pro 64GB. ์ด๋™ ์ค‘์—๋„ ์ „์ฒด ์„ฑ๋Šฅ ๋ฐœํœ˜.
  6. 6
    ์ตœ๊ณ  ํ’ˆ์งˆ + ์ตœ๋Œ€ ์†๋„
    Why it matters: Mac Studio์˜ M5 Max 128GB. 70B Q5 + Whisper + TTS ๋™์‹œ ์‹คํ–‰ ๊ฐ€๋Šฅ.

Mac์—์„œ ์ด ๋ฒค์น˜๋งˆํฌ๋ฅผ ์ง์ ‘ ์žฌํ˜„ํ•˜๋Š” ๋ฐฉ๋ฒ•

์ด ๋ฒค์น˜๋งˆํฌ๋Š” M5 Pro ๋˜๋Š” M5 Max๊ฐ€ ํƒ‘์žฌ๋œ ๋ชจ๋“  ๊ธฐ๊ธฐ์—์„œ ์™„์ „ํžˆ ์žฌํ˜„ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ์•„๋ž˜ Python ์ฝ”๋“œ๋ฅผ MLX์™€ ํ•จ๊ป˜ ์‚ฌ์šฉํ•˜์—ฌ ์ž์‹ ์˜ ์‹œ์Šคํ…œ ์„ฑ๋Šฅ์„ ์ง์ ‘ ํ™•์ธํ•˜์‹ญ์‹œ์˜ค. ์ธก์ •๊ฐ’์€ ๋ณด๊ณ ๋œ ๋ฒ”์œ„์˜ ยฑ10% ์ด๋‚ด์—ฌ์•ผ ํ•ฉ๋‹ˆ๋‹ค.

python
from mlx_lm import load, generate
import time

model, tokenizer = load("mlx-community/Llama-3.1-8B-Instruct-4bit")

prompt = "Explain quantum computing in 200 words."
start = time.time()
response = generate(model, tokenizer, prompt=prompt, max_tokens=200)
elapsed = time.time() - start

tokens = len(tokenizer.encode(response))
print(f"Speed: {tokens/elapsed:.1f} tok/s")
print(f"Time to first token: ~{elapsed - tokens * (elapsed/tokens):.2f}s")

M5 Ultra ์˜ˆ์ธก (2026๋…„ ์ค‘๋ฐ˜ ์˜ˆ์ •)

๊ณผ๊ฑฐ Apple SoC ํ™•์žฅ ํŒจํ„ด(Ultra๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ Max ์‚ฌ์–‘์˜ 2๋ฐฐ)์„ ๊ธฐ๋ฐ˜์œผ๋กœ, 2026๋…„ ์ค‘๋ฐ˜ ์ถœ์‹œ ์˜ˆ์ •์ธ M5 Ultra์— ๋Œ€ํ•œ ํ•ฉ๋ฆฌ์ ์ธ ์˜ˆ์ธก์ž…๋‹ˆ๋‹ค. ํ•˜๋“œ์›จ์–ด ์ถœ์‹œ ํ›„ ๊ฒ€์ฆ๋  ์˜ˆ์ •์ž…๋‹ˆ๋‹ค.

์‚ฌ์–‘M5 Ultra (์˜ˆ์ธก)
์ตœ๋Œ€ ํ†ตํ•ฉ ๋ฉ”๋ชจ๋ฆฌ256 GB
๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ~1,200 GB/s
GPU ์ฝ”์–ด~80
Llama 3.3 8B Q4 (์˜ˆ์ธก)180โ€“220 tok/s
Llama 3.3 70B Q4 (์˜ˆ์ธก)30โ€“40 tok/s
Llama 3.3 70B FP16 (์˜ˆ์ธก)12โ€“16 tok/s
Llama 3.3 405B Q3 (์˜ˆ์ธก)4โ€“6 tok/s
์˜ˆ์ƒ ๊ฐ€๊ฒฉ$4,500โ€“6,500
์ตœ์ดˆ ์†Œ๋น„์ž์šฉ 405B ๋กœ์ปฌ ์‹คํ–‰๊ฐ€๋Šฅ (Q3, ์™„์ „ ๋กœ์ปฌ)

M5 Ultra๋Š” 70B ๋ชจ๋ธ์„ ๋ฌด์†์‹ค FP16์œผ๋กœ ์‹คํ–‰ํ•  ์ˆ˜ ์žˆ๋Š” ์ตœ์ดˆ์˜ ์†Œ๋น„์ž์šฉ ํ•˜๋“œ์›จ์–ด์ด์ž, 405B ํŒŒ๋ผ๋ฏธํ„ฐ ๋ชจ๋ธ์„ ๋กœ์ปฌ์—์„œ ์˜๋ฏธ ์žˆ๋Š” ์†๋„๋กœ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ๋Š” ์ฒซ ๋ฒˆ์งธ ๊ธฐ๊ธฐ๊ฐ€ ๋  ๊ฒƒ์ž…๋‹ˆ๋‹ค. M5 Ultra ์ถœ์‹œ ํ›„ ๊ฒ€์ฆ๋œ ๋ฒค์น˜๋งˆํฌ๋กœ ์ด ๊ธฐ์‚ฌ๋ฅผ ์—…๋ฐ์ดํŠธํ•  ์˜ˆ์ •์ž…๋‹ˆ๋‹ค.

๋ฒค์น˜๋งˆํฌ ๋ฐฉ๋ฒ•๋ก  ๋ฐ ์ตœ์‹ ์„ฑ

  • ํ…Œ์ŠคํŠธ ๊ธฐ๊ฐ„: 2026๋…„ 4โ€“5์›”, M5 Pro ๋ฐ M5 Max ์ •ํ’ˆ ์ถœ์‹œ ๊ธฐ๊ธฐ(macOS 15.x Sequoia).
  • ํ”„๋ ˆ์ž„์›Œํฌ: Ollama 0.5.x, MLX 0.21.x, llama.cpp 2.4.x (๋ชจ๋‘ Metal ๊ฐ€์† ํ™œ์„ฑํ™” ์ƒํƒœ์—์„œ ํ…Œ์ŠคํŠธ).
  • ๋ชจ๋ธ: ๊ณต์‹ llama.gguf, MLX ์ปค๋ฎค๋‹ˆํ‹ฐ ์–‘์žํ™” ๋ฒ„์ „, ๋ชจ๋‘ Q4_K_M(๊ธฐ๋ณธ) ๋ฐ Q5_K_M(๊ณ ํ’ˆ์งˆ) ์–‘์žํ™” ์‚ฌ์šฉ.
  • ๋งˆ์ง€๋ง‰ ๊ฒ€์ฆ: 2026-05-15.
  • ํ”„๋ ˆ์ž„์›Œํฌ ์—…๋ฐ์ดํŠธ ์ฃผ๊ธฐ: ์›”๋ณ„ ๋ฆด๋ฆฌ์Šค๋กœ ๋ถ„๊ธฐ๋‹น ์ผ๋ฐ˜์ ์œผ๋กœ 5โ€“15% ์†๋„ ํ–ฅ์ƒ. ์ด ๊ธฐ์‚ฌ๋Š” ๋ถ„๊ธฐ๋ณ„, ๊ทธ๋ฆฌ๊ณ  ์ƒˆ๋กœ์šด Apple Silicon ์นฉ ์ถœ์‹œ ์‹œ ์žฌ๋ฒค์น˜๋งˆํ‚น๋ฉ๋‹ˆ๋‹ค.
  • ํ•˜๋“œ์›จ์–ด ํŽธ์ฐจ: ยฑ10% ์ด๋‚ด์˜ ๊ฒฐ๊ณผ ์ฐจ์ด๋Š” ์ •์ƒ์ž…๋‹ˆ๋‹ค(์—ด ์ƒํƒœ, ์‹œ์Šคํ…œ ๋ถ€ํ•˜, ํŒŒ์ผ ์‹œ์Šคํ…œ ์บ์‹œ ์ƒํƒœ์— ๋”ฐ๋ผ ๋‹ค๋ฆ„).

M5 Max๊ฐ€ 2๋ฐฐ ๋Œ€์—ญํญ์ž„์—๋„ ์™œ ์•ฝ 2๋ฐฐ๋งŒ ๋น ๋ฆ…๋‹ˆ๊นŒ?

๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ์€ ํ† ํฐ ์ƒ์„ฑ ์†๋„๋ฅผ ์„ ํ˜•์ ์œผ๋กœ ์ œํ•œํ•ฉ๋‹ˆ๋‹ค. M5 Max์˜ 614 GB/s vs M5 Pro์˜ 307 GB/s = ์ด๋ก ์  2๋ฐฐ ์†๋„. ์‹ค์ œ ๊ฐ€์†์€ ์•„ํ‚คํ…์ฒ˜ ์ฐจ์ด์™€ ์บ์‹œ ํšจ๊ณผ๋กœ ์ธํ•ด 1.8โ€“2.1๋ฐฐ์ž…๋‹ˆ๋‹ค.

์™œ RTX 4090์ด 8B ๋ชจ๋ธ์—์„œ ๋” ๋†’์€ tok/s๋ฅผ ๋ณด์ž…๋‹ˆ๊นŒ?

RTX 4090์€ M5 Max(614 GB/s)๋ณด๋‹ค ๋†’์€ ๋ฉ”๋ชจ๋ฆฌ ๋Œ€์—ญํญ(1,008 GB/s)์„ ๊ฐ–์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ RTX 4090์€ 24GB VRAM ํ•œ๊ณ„๋กœ 70B ๋ชจ๋ธ์„ ์‹คํ–‰ํ•  ์ˆ˜ ์—†๋Š” ๋ฐ˜๋ฉด M5 Max๋Š” ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค. ์†Œํ˜• ๋ชจ๋ธ์˜ ์ˆœ์ˆ˜ ์†๋„ vs ๋ชจ๋ธ ํฌ๊ธฐ ์œ ์—ฐ์„ฑ ์‚ฌ์ด์˜ ํŠธ๋ ˆ์ด๋“œ์˜คํ”„์ž…๋‹ˆ๋‹ค.

M5 Pro๋กœ ์ถฉ๋ถ„ํ•ฉ๋‹ˆ๊นŒ, ์•„๋‹ˆ๋ฉด M5 Max๋ฅผ ๊ตฌ๋งคํ•ด์•ผ ํ•ฉ๋‹ˆ๊นŒ?

M5 Pro๋Š” 8B/13B/34B ๋ชจ๋ธ์—์„œ ๋›ฐ์–ด๋‚œ ๊ฐ€์„ฑ๋น„๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. M5 Max($1,800+ ํ”„๋ฆฌ๋ฏธ์—„)๋Š” 70B ๋ชจ๋ธ์„ ์ •๊ธฐ์ ์œผ๋กœ ์‚ฌ์šฉํ•˜๊ฑฐ๋‚˜ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ์Šคํƒ(๋น„์ „ + LLM + TTS ๋™์‹œ ์‹คํ–‰)์ด ํ•„์š”ํ•œ ๊ฒฝ์šฐ์—๋งŒ ๋น„์šฉ์ด ์ •๋‹นํ™”๋ฉ๋‹ˆ๋‹ค.

M5 Ultra ๋ฒค์น˜๋งˆํฌ๋Š” ๊ทน์ ์œผ๋กœ ๋” ๋น ๋ฅผ ๊ฒƒ์ž…๋‹ˆ๊นŒ?

M5 Ultra๋Š” 2026๋…„ ์ค‘๋ฐ˜ ์ถœ์‹œ ์˜ˆ์ •์œผ๋กœ ~1,200 GB/s ๋Œ€์—ญํญ(M5 Max์˜ 2๋ฐฐ)์„ ๊ฐ–์ถœ ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์•ฝ 2๋ฐฐ ๋น ๋ฅธ ํ† ํฐ ์ƒ์„ฑ์ด ์˜ˆ์ƒ๋˜๋ฉฐ, 70B Q8(๋ฌด์†์‹ค) ๋ฐ 120B ์ด์ƒ ๋ชจ๋ธ์„ ์˜๋ฏธ ์žˆ๋Š” ์†๋„๋กœ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.

A Note on Third-Party Facts

This article references third-party AI models, benchmarks, prices, and licenses. The AI landscape changes rapidly. Benchmark scores, license terms, model names, and API prices can shift between the time of writing and the time you read this. Before making deployment or compliance decisions based on this article, verify current figures on each providerโ€™s official source: Hugging Face model cards for licenses and benchmarks, provider websites for API pricing, and EUR-Lex for current GDPR and EU AI Act text. This article reflects publicly available information as of May 2026.

M5 Pro ๋˜๋Š” M5 Max๋ฅผ ๋ฒค์น˜๋งˆํ‚นํ•˜์…จ์Šต๋‹ˆ๊นŒ? PromptQuorum์„ ์‚ฌ์šฉํ•˜๋ฉด ๋กœ์ปฌ LLM ์‘๋‹ต์„ GPT-4, Claude, Gemini ๋ฐ 22๊ฐœ ์ด์ƒ์˜ ๋ชจ๋ธ๊ณผ ๋‹จ์ผ ๋””์ŠคํŒจ์น˜๋กœ ๋น„๊ตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Apple Silicon ์„ค์ •์ด ํŠน์ • ์‚ฌ์šฉ ์‚ฌ๋ก€์—์„œ ํด๋ผ์šฐ๋“œ ํ’ˆ์งˆ๊ณผ ๋™๋“ฑํ•œ์ง€ ๊ฒ€์ฆํ•˜์‹ญ์‹œ์˜ค.

Join the PromptQuorum Waitlist โ†’

โ† Back to Local LLMs

M5 Pro vs M5 Max 2026: ๋ฒค์น˜๋งˆํฌ tok/s ๋น„๊ต | PromptQuorum