
Whisper on Apple Silicon 2026: Metal Benchmarks, Core ML Setup, M1–M5 Speed Guide

14 min read · By Hans Kuepper · Founder of PromptQuorum, multi-model AI dispatch tool

Whisper large-v3 on an M5 Pro: 10–12× real-time speed. The Metal GPU is enabled automatically. Large-v3-turbo balances speed and accuracy at 14–18×. Free and fully offline.

Whisper speech recognition on Apple Silicon: Metal and Core ML benchmarks from M1 to M5 Max, with a setup guide, model selection, and real-time transcription.

전체 벀치마크 ν‘œ: Apple Silicon(M1–M5)μ—μ„œμ˜ Whisper μ„±λŠ₯

Chip | Tiny | Base | Small | Medium | Large-v3
— | 32× | 20× | 12× | 5× | —
— | 38× | 24× | 16× | 7× | —
— | 45× | 30× | 22× | 10× | —
— | 55× | 38× | 28× | 14× | —
— | 36× | 23× | 14× | 6× | —
— | 42× | 28× | 20× | 9× | —
— | 50× | 35× | 26× | 12× | —
— | 60× | 42× | 32× | 17× | —
— | 40× | 26× | 16× | 7× | —
— | 46× | 32× | 22× | 10× | —
— | 55× | 40× | 30× | 14× | —
— | 44× | 30× | 18× | 8× | —
— | 50× | 36× | 26× | 12× | —
— | 60× | 44× | 34× | 16× | —
— | 48× | 34× | 22× | 10× | —
— | 55× | 40× | 30× | 14× | —
— | 65× | 48× | 38× | 18× | —

N× real-time means N seconds of audio are transcribed in 1 second. Figures are whisper.cpp benchmarks with Metal acceleration. Every chip from the M1 Pro up can run large-v3 faster than real time.
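To make the real-time factor concrete, here is a minimal sketch that converts a factor from the table into wall-clock processing time. The function name `transcription_seconds` is illustrative, and the calculation assumes throughput stays constant over the whole file, which real runs only approximate.

```python
def transcription_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock time to transcribe audio at a given real-time factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    # N x real-time means N seconds of audio per second of compute.
    return audio_seconds / realtime_factor


if __name__ == "__main__":
    one_hour = 60 * 60
    minutes = transcription_seconds(one_hour, 10) / 60
    print(f"1 h of audio at 10x real-time: about {minutes:.0f} min")
```

The same division works in reverse: dividing audio length by measured processing time gives the real-time factor of your own machine.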

Whisper model sizes: which should you choose?

Model | Parameters | Disk size | RAM usage | English WER | Best for
— | — | — | — | — | —
— | — | — | — | — | —
— | — | — | — | — | —
— | — | — | — | — | —
— | — | — | — | — | —
— | — | — | — | — | —
— | — | — | — | — | —

WER (word error rate) is measured on the English LibriSpeech test set. Large-v3-turbo and distil-large-v3 offer the best balance for real-time processing on most Macs: close to large-v3 quality at 4–6× the speed.
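For readers who want to check WER on their own recordings, the following is a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. It only lowercases and splits on whitespace, which is an assumption; published benchmarks usually apply heavier text normalization, so numbers from this sketch will not match them exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

One substituted word in a six-word reference gives a WER of 1/6, so a 2.5% WER corresponds to roughly one error every 40 words.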

Metal vs Core ML vs Apple Neural Engine: which backend should you choose?

Apple Silicon offers three acceleration paths for Whisper. Each has trade-offs.

Metal (via whisper.cpp), recommended: uses Apple's Metal GPU framework, works on every M-series chip, runs large-v3 at 10–12× real-time on an M5 Pro, and is set up with make WHISPER_METAL=1. Best for: most users, the simplest setup, proven performance.

Core ML (via the Apple Core ML format), advanced: uses Apple's machine learning framework, can offload some operations to the Neural Engine (ANE), is 15–20% faster on some workloads, and requires model conversion (10–15 minutes of setup). Best for: advanced users who want maximum speed.

Apple Neural Engine (ANE), limited use: the dedicated AI accelerator on every M-series chip. It cannot be accessed directly (only through Core ML), Whisper cannot fully use it because of an architecture mismatch, and it is most effective with small models (tiny, base). Best for: Whisper tiny/base on battery-powered laptops.

Decision criteria:

  • Initial setup → Metal (whisper.cpp)
  • Maximum large-v3 speed → Metal (whisper.cpp)
  • Battery-powered laptop, base model → Core ML with ANE
  • Production server → Metal (proven, stable)
  • Real-time transcription → Metal in streaming mode
  • Cloud deployment on Mac instances → Metal (containerizable)

In summary:

  • Metal (whisper.cpp): faster, broad compatibility, simplest setup
  • Core ML: Neural Engine optimization, 15–20% speedup on some workloads (conversion required)
  • Apple Neural Engine: limited benefit for large models, best for tiny/base on laptops

Setup: whisper.cpp with Metal acceleration

  1. Install dependencies
    xcode-select --install    # Xcode command-line tools
    brew install ffmpeg       # audio conversion
  2. Clone and build whisper.cpp with Metal
    git clone https://github.com/ggerganov/whisper.cpp
    cd whisper.cpp
    make WHISPER_METAL=1
    ./main -h | grep -i metal
    Note: recent whisper.cpp releases build with CMake, enable Metal by default on Apple Silicon, and name the binary whisper-cli. If make or ./main is not available in your checkout, follow the build steps in the repository README.
  3. Download models
    bash ./models/download-ggml-model.sh small             # 466 MB, real-time
    bash ./models/download-ggml-model.sh large-v3          # 3 GB, best quality
    bash ./models/download-ggml-model.sh large-v3-turbo    # 1.6 GB, balanced
  4. Transcribe an audio file
    ./main -m models/ggml-large-v3.bin -f /path/to/audio.wav
    ./main -m models/ggml-large-v3.bin -f audio.wav -oj       # JSON output
    ./main -m models/ggml-large-v3.bin -f audio.wav -l en     # set the language
  5. Convert non-WAV audio first
    ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
    ./main -m models/ggml-large-v3.bin -f output.wav
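The convert-then-transcribe steps above can be scripted. The sketch below only builds the command lines, using the same ffmpeg and ./main arguments shown in the steps; the helper names (`ffmpeg_to_wav_cmd`, `whisper_cmd`, `plan`) are hypothetical, and it assumes that any existing .wav input is already 16 kHz mono and that you run it from the whisper.cpp directory.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def ffmpeg_to_wav_cmd(src: str, dst: str) -> List[str]:
    """ffmpeg call that resamples to 16 kHz mono 16-bit PCM."""
    return ["ffmpeg", "-i", src, "-ar", "16000", "-ac", "1",
            "-c:a", "pcm_s16le", dst]


def whisper_cmd(wav: str, model: str = "models/ggml-large-v3.bin",
                language: Optional[str] = None,
                json_output: bool = False) -> List[str]:
    """whisper.cpp call using the flags from the setup steps."""
    cmd = ["./main", "-m", model, "-f", wav]
    if language:
        cmd += ["-l", language]
    if json_output:
        cmd.append("-oj")
    return cmd


def plan(audio_path: str) -> List[List[str]]:
    """Return the commands needed to transcribe one file, in order."""
    src = Path(audio_path)
    if src.suffix.lower() == ".wav":
        return [whisper_cmd(str(src))]
    wav = src.with_suffix(".wav")
    return [ffmpeg_to_wav_cmd(str(src), str(wav)), whisper_cmd(str(wav))]


if __name__ == "__main__":
    for command in plan("input.mp3"):
        print(shlex.join(command))
```

To execute the plan rather than print it, pass each list to `subprocess.run(command, check=True)`; keeping the commands as lists avoids shell quoting problems with file names that contain spaces.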

Real-time streaming transcription (live microphone)

Transcribe from the microphone in real time, for voice assistants, meeting transcription, and accessibility tools.

Option 1: whisper.cpp stream mode

./stream -m models/ggml-small.bin --step 500 --length 5000

# --step 500: process audio every 500 ms

# --length 5000: keep the most recent 5 seconds as context

Option 2: Python with faster-whisper (see the code block below)

Latency on an M5 Pro: about 200 ms with the small model, 400–600 ms with large-v3-turbo, and 800 ms–1.2 s with large-v3.

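python
import queue

import numpy as np
import sounddevice as sd
from faster_whisper import WhisperModel

# faster-whisper runs on CTranslate2, which has no Metal backend, so it uses the CPU on a Mac.
model = WhisperModel("large-v3-turbo", device="cpu", compute_type="int8")
chunk_duration = 3      # seconds of audio per transcription pass
sample_rate = 16000     # Whisper expects 16 kHz mono audio
audio_queue = queue.Queue()

def callback(indata, frames, time, status):
    # Runs on the audio thread: hand the block off and return quickly.
    # Transcribing here would stall capture and drop samples.
    if status:
        print(status)
    audio_queue.put(indata.copy())

with sd.InputStream(callback=callback, channels=1,
                    samplerate=sample_rate, dtype="float32"):
    print("Listening... (Ctrl+C to stop)")
    buffer, buffered_frames = [], 0
    try:
        while True:
            block = audio_queue.get()
            buffer.append(block)
            # Count real frames; the block size is not fixed at 1024.
            buffered_frames += len(block)
            if buffered_frames >= chunk_duration * sample_rate:
                audio = np.concatenate(buffer).flatten()
                segments, _ = model.transcribe(audio, beam_size=5)
                for segment in segments:
                    print(segment.text)
                buffer, buffered_frames = [], 0
    except KeyboardInterrupt:
        print("Stopped.")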

Voice assistant pipeline: Whisper + Ollama + Piper TTS

The complete code for a voice assistant that runs entirely locally on Apple Silicon.

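python
import subprocess

import requests
import sounddevice as sd
from faster_whisper import WhisperModel

WHISPER_MODEL = "large-v3-turbo"
OLLAMA_URL = "http://localhost:11434/api/chat"
LLM_MODEL = "llama3.1:8b"
PIPER_VOICE = "en_US-amy-medium.onnx"
SAMPLE_RATE = 16000  # Whisper expects 16 kHz mono audio

# faster-whisper runs on CTranslate2, which has no Metal backend, so it uses the CPU here.
whisper = WhisperModel(WHISPER_MODEL, device="cpu", compute_type="int8")

def record_audio(duration=5):
    """Record `duration` seconds from the default microphone as mono float32."""
    print("Listening...")
    audio = sd.rec(int(duration * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE,
                   channels=1,
                   dtype="float32")
    sd.wait()  # block until the recording is complete
    return audio.flatten()

def transcribe(audio):
    segments, _ = whisper.transcribe(audio, beam_size=5)
    return " ".join(seg.text.strip() for seg in segments)

def llm_respond(user_text):
    response = requests.post(OLLAMA_URL, json={
        "model": LLM_MODEL,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False
    }, timeout=120)
    response.raise_for_status()  # fail loudly if Ollama is not running
    return response.json()["message"]["content"]

def speak(text, wav_path="reply.wav"):
    # Piper reads text on stdin and writes a WAV file; it does not play audio itself.
    subprocess.run(
        ["piper", "--model", PIPER_VOICE, "--output_file", wav_path],
        input=text.encode(),
        check=True
    )
    # afplay ships with macOS and plays the synthesized file.
    subprocess.run(["afplay", wav_path], check=True)

try:
    while True:
        audio = record_audio(duration=5)
        user_text = transcribe(audio)
        if not user_text.strip():
            continue  # nothing was said; listen again
        print(f"You: {user_text}")
        response = llm_respond(user_text)
        print(f"AI: {response}")
        speak(response)
except KeyboardInterrupt:
    print("Stopped.")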

Best Whisper settings by Mac model

Mac configuration | Recommended model | Real-time factor | Use case
— | — | — | —
— | — | — | —
— | — | — | —
— | — | — | —
— | — | — | —
— | — | — | —
— | — | — | —

For real-time voice assistants: use small or large-v3-turbo for the lowest latency. For meeting and podcast transcription: use large-v3 for the best accuracy (1–2 seconds of latency is acceptable).

Local Whisper vs cloud speech recognition services

Metric | Whisper local (M5 Pro) | Google Speech-to-Text | OpenAI Whisper API | AssemblyAI
— | — | — | — | —
— | — | — | — | —
— | — | — | — | —
— | — | — | — | —
— | — | — | — | —
— | — | — | — | —
— | — | — | — | —

Monthly cost at 8 hours per day: local Whisper $0, Google $345, OpenAI $86, AssemblyAI $156. For privacy-sensitive work (medical, legal, journalism), local Whisper is the only option. For high-volume transcription (more than $100 per month in cloud fees), a local Mac pays for itself within 12 months.
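The cost comparison is simple arithmetic, sketched below. The monthly totals come from the paragraph above; the 30-day month and the $1,200 hardware price in the example are assumptions for illustration, not figures from this article, and the sketch ignores electricity and maintenance.

```python
HOURS_PER_DAY = 8
DAYS_PER_MONTH = 30  # assumption: the article does not state the month length

# Monthly cloud cost in USD at 8 hours of audio per day (from the text above).
MONTHLY_COST_USD = {"Google": 345, "OpenAI": 86, "AssemblyAI": 156}


def implied_hourly_rate(monthly_cost_usd: float) -> float:
    """Price per audio hour implied by a monthly bill."""
    return monthly_cost_usd / (HOURS_PER_DAY * DAYS_PER_MONTH)


def payback_months(hardware_cost_usd: float, monthly_cloud_cost_usd: float) -> float:
    """Months of avoided cloud fees needed to cover a hardware purchase."""
    if monthly_cloud_cost_usd <= 0:
        raise ValueError("monthly_cloud_cost_usd must be positive")
    return hardware_cost_usd / monthly_cloud_cost_usd


if __name__ == "__main__":
    for provider, cost in MONTHLY_COST_USD.items():
        print(f"{provider}: ${implied_hourly_rate(cost):.2f} per audio hour")
    # Hypothetical $1,200 Mac against a $100/month cloud bill.
    print(f"Payback: {payback_months(1200, 100):.0f} months")
```

With these assumptions, a $100 monthly cloud bill covers a $1,200 machine in 12 months, and larger bills shorten the payback period proportionally.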

Is Whisper faster than cloud APIs?

Running locally on an M5 Pro: 10× real-time (about 100 ms latency). Cloud APIs: 100–500 ms of added network latency. Local is faster and free.

Can Whisper handle multiple speakers?

Partly. Whisper transcribes audio with several speakers and timestamps each segment, but it does not label who is speaking. To attribute segments to speakers, add post-processing or a speaker diarization tool.
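One common way to combine the two outputs is to give each transcript segment the speaker whose diarization turn overlaps it most. The sketch below assumes you already have both lists as plain start/end times in seconds; the `Span` type, the `assign_speakers` name, and the sample data are all hypothetical and not tied to any particular diarization library.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Span:
    start: float  # seconds
    end: float    # seconds
    label: str    # transcript text for segments, speaker name for turns


def overlap(a: Span, b: Span) -> float:
    """Length in seconds of the time shared by two spans (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def assign_speakers(segments: List[Span], turns: List[Span]) -> List[Tuple[str, str]]:
    """Pair each transcript segment with the most-overlapping speaker turn."""
    result = []
    for seg in segments:
        best = max(turns, key=lambda turn: overlap(seg, turn), default=None)
        if best is not None and overlap(seg, best) > 0:
            result.append((best.label, seg.label))
        else:
            result.append(("unknown", seg.label))
    return result


if __name__ == "__main__":
    segments = [Span(0.0, 2.0, "Hello"), Span(2.1, 4.0, "Hi there")]
    turns = [Span(0.0, 2.05, "Speaker A"), Span(2.05, 5.0, "Speaker B")]
    for speaker, text in assign_speakers(segments, turns):
        print(f"{speaker}: {text}")
```

Segments that straddle a speaker change are assigned to a single speaker here; splitting them at word-level timestamps would be more accurate but needs more code.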

What about language support?

Whisper supports 99 languages, with automatic language detection. Accuracy varies by language: about 2.5% WER for English and 5–15% WER for other languages.

Which Whisper model has the best speed-to-quality ratio?

Large-v3-turbo or distil-large-v3. Both reach about 95% of large-v3 accuracy at 4–6× the speed. They are recommended for most real-time use cases.

Can Whisper handle heavily accented English or non-native speakers?

Yes, but WER rises. Native English speakers: about 2.5%. Strong accents or non-native speakers: 5–12%. Large-v3 handles accents better than the smaller models.

Is Whisper suitable for transcribing podcasts and music?

Podcasts: yes, it is excellent for spoken content. Music with lyrics: not suitable, because Whisper was trained on speech. Use a specialized model for music.

How accurate is Whisper on technical terminology?

It varies. Common technical terms are handled well. Highly specialized terms may be mistranscribed. To improve accuracy, pass the expected vocabulary with the --prompt flag.
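As a rough illustration of the vocabulary prompt, the sketch below turns a list of expected terms into one prompt string and builds a whisper.cpp command that passes it with --prompt. The helper names and the `max_chars` cap are assumptions: Whisper only conditions on a limited amount of prompt text, so the cap is a crude safeguard rather than a documented limit.

```python
from typing import Iterable, List


def build_vocab_prompt(terms: Iterable[str], max_chars: int = 800) -> str:
    """Join expected terms into one prompt, dropping blanks and duplicates."""
    seen = set()
    kept: List[str] = []
    length = 0
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue
        added = len(term) + (2 if kept else 0)  # 2 accounts for the ", " separator
        if length + added > max_chars:
            break  # stop before exceeding the (assumed) budget
        seen.add(key)
        kept.append(term)
        length += added
    return ", ".join(kept)


def whisper_cpp_args(wav_path: str, prompt: str,
                     model: str = "models/ggml-large-v3.bin") -> List[str]:
    """whisper.cpp command line with the vocabulary passed via --prompt."""
    return ["./main", "-m", model, "-f", wav_path, "--prompt", prompt]


if __name__ == "__main__":
    prompt = build_vocab_prompt(["Kubernetes", "gRPC", "kubernetes", "etcd"])
    print(whisper_cpp_args("meeting.wav", prompt))
```

With faster-whisper, the same string can be passed as the `initial_prompt` argument of `transcribe`. Put the terms most likely to be misheard first, since anything beyond the budget is dropped.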

Can I run multiple Whisper instances on one Mac?

Yes, limited by memory. M5 Pro with 36 GB: two large-v3 instances at the same time. M5 Max with 128 GB: 4–6 instances, or one instance alongside an LLM and TTS.

A Note on Third-Party Facts

This article references third-party AI models, benchmarks, prices, and licenses. The AI landscape changes rapidly. Benchmark scores, license terms, model names, and API prices can shift between the time of writing and the time you read this. Before making deployment or compliance decisions based on this article, verify current figures on each provider's official source: Hugging Face model cards for licenses and benchmarks, provider websites for API pricing, and EUR-Lex for current GDPR and EU AI Act text. This article reflects publicly available information as of May 2026.
