Whisper + Home Assistant 2026: Local STT Guide

Local Whisper gives Home Assistant private speech-to-text with no cloud service. You choose a Whisper model size that balances accuracy, speed, and your hardware, then connect it to Assist over the Wyoming protocol. This guide covers why local STT matters, the Whisper model sizes, Wyoming setup, hardware needs, and how to tune accuracy.

Key Takeaways

Whisper is an open speech-to-text model that runs locally — no audio leaves your hardware
Use the Whisper (faster-whisper) add-on; it connects to Assist over Wyoming
Model sizes range tiny → base → small → medium → large; bigger models are more accurate but slower
On CPU-only hardware, prefer tiny/base/small; a GPU makes medium/large practical
Whisper is multilingual, so non-English commands transcribe without a cloud service
Tune accuracy with a better microphone and the right model before going larger

Why Use Local Speech-to-Text?

Local speech-to-text keeps your voice recordings on your own hardware, so no audio is uploaded to a third party. It also works offline and has no per-request cost.

Privacy: cloud assistants transmit and may retain recordings; local Whisper does not — see smart home privacy risks.
Offline: transcription works during internet outages.
No fees: there is no usage charge for local transcription.

Which Whisper Model Size Should You Use?

Pick the smallest Whisper model that gives acceptable accuracy on your hardware — tiny/base/small for CPU-only, medium/large when you have a GPU. Larger models improve accuracy on accents and noisy audio at the cost of speed.

Use small as the default on a mini PC CPU; move to medium/large only if accuracy is lacking.
Use tiny/base on a Raspberry Pi to keep latency usable.

Model	Relative accuracy	Relative speed	Best for
tiny	Lowest	Fastest	Low-power CPU, short commands
base	Low	Very fast	Raspberry Pi, simple phrases
small	Good	Fast	Mini PC CPU, everyday use
medium	High	Moderate	GPU or strong CPU
large	Highest	Slowest	GPU, accents/noisy rooms
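The sizing advice above can be written down as a small lookup. This helps if you script add-on changes or want the rules in one place. What follows is a hypothetical Python sketch. The hardware labels `pi`, `mini_pc`, and `gpu`, and the helpers `start_model` and `step_up`, are illustrative names and not add-on options. The start and ceiling pairs are one reading of this guide's table and bullets.

```python
# Hypothetical sizing helper; labels and names are illustrative, not add-on settings.
MODEL_ORDER = ["tiny", "base", "small", "medium", "large"]

# hardware class -> (starting model, largest practical model), per this guide's advice
SIZING = {
    "pi": ("tiny", "base"),          # Raspberry Pi: keep latency usable
    "mini_pc": ("small", "medium"),  # CPU-only mini PC: small by default, medium is slower
    "gpu": ("medium", "large"),      # GPU/NPU: medium and large become practical
}


def start_model(hardware: str) -> str:
    """Return the suggested first model to try on this hardware class."""
    return SIZING[hardware][0]


def step_up(hardware: str, current: str) -> str:
    """Return the next larger model, never exceeding the hardware's ceiling."""
    ceiling = MODEL_ORDER.index(SIZING[hardware][1])
    i = MODEL_ORDER.index(current)
    if i >= ceiling:
        return current  # already at (or past) the practical limit: stay put
    return MODEL_ORDER[i + 1]
```

For example, `start_model("mini_pc")` gives `"small"`. Calling `step_up("mini_pc", "small")` gives `"medium"`, and stepping again stays at `"medium"`.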

Wyoming Setup

The Whisper add-on exposes a Wyoming endpoint that Assist uses for speech-to-text. Setup is install → pick model → select in pipeline.

1. Install the Whisper (faster-whisper) add-on from the add-on store.
2. Set the model size in the add-on configuration and start it.
3. Home Assistant discovers the add-on as a Wyoming Protocol integration; confirm it under Settings → Devices & services.
4. In Settings → Voice assistants, set Whisper (faster-whisper) as the speech-to-text engine for your Assist pipeline.
5. Test transcription from the Assist pipeline debug view before adding voice hardware.
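Behind the scenes, Assist and the add-on exchange Wyoming events over TCP. The sketch below is a simplified, stdlib-only Python client based on an understanding of that framing: a JSON header line, optional JSON data bytes, and an optional binary payload. It also relies on the speech-to-text event names `transcribe`, `audio-start`, `audio-chunk`, `audio-stop`, and `transcript`. Treat all of this as an assumption rather than a reference implementation. The official `wyoming` Python package is the better choice for real use and may add further header fields, such as a protocol version. Port 10300 and 16 kHz, 16-bit mono PCM are assumed defaults. The helper names are hypothetical.

```python
import json
import socket

# Simplified Wyoming framing (assumption): a JSON header line, then optional
# JSON "data" bytes, then an optional binary payload (e.g. raw PCM audio).


def encode_event(event_type, data=None, payload=b""):
    """Serialize one event into bytes ready to send over TCP."""
    header = {"type": event_type}
    data_bytes = json.dumps(data).encode("utf-8") if data else b""
    if data_bytes:
        header["data_length"] = len(data_bytes)
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + data_bytes + payload


def read_event(stream):
    """Read one event from a binary stream; return (type, data, payload) or None at EOF."""
    line = stream.readline()
    if not line:
        return None
    header = json.loads(line)
    data = dict(header.get("data") or {})  # some senders may inline small data in the header
    if header.get("data_length"):
        data.update(json.loads(stream.read(header["data_length"])))
    payload = stream.read(header["payload_length"]) if header.get("payload_length") else b""
    return header["type"], data, payload


def transcribe(host, pcm, port=10300, language="en", rate=16000, width=2, channels=1):
    """Send raw PCM (assumed 16 kHz, 16-bit, mono) and return the transcript text."""
    fmt = {"rate": rate, "width": width, "channels": channels}
    with socket.create_connection((host, port), timeout=30) as sock:
        sock.sendall(encode_event("transcribe", {"language": language}))
        sock.sendall(encode_event("audio-start", fmt))
        for start in range(0, len(pcm), 4096):  # send audio in 4096-byte chunks
            sock.sendall(encode_event("audio-chunk", fmt, pcm[start:start + 4096]))
        sock.sendall(encode_event("audio-stop"))
        stream = sock.makefile("rb")
        while (event := read_event(stream)) is not None:
            event_type, data, _ = event
            if event_type == "transcript":
                return data.get("text", "")
    raise ConnectionError("server closed the connection without a transcript")
```

The server only transcribes once `audio-stop` arrives, so the client streams every chunk first and then waits for a single `transcript` event.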

Hardware Needs

Whisper runs on CPU for small models and benefits from a GPU for medium/large models. Match model size to the box that hosts it.

Raspberry Pi: stick to tiny/base for acceptable latency.
Mini PC (CPU): small works well; medium is possible but slower — see best hardware for a local smart home.
With a GPU/NPU: medium and large become practical for high accuracy.
You can run Whisper on a separate, more powerful machine via Wyoming if your hub is a Pi.
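When Whisper runs on a separate machine, confirm that the machine answers before you add it to Home Assistant. Below is a minimal Python sketch that makes that check. It assumes the Wyoming `describe` request and `info` response events, an `asr` → `models` layout in the info data, and the faster-whisper server's usual port 10300. The helper names are hypothetical, and the host is whatever address your Whisper machine uses.

```python
import json
import socket


def probe_wyoming(host, port=10300, timeout=5.0):
    """Ask a Wyoming server to describe itself; return the info data, or None."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(json.dumps({"type": "describe"}).encode("utf-8") + b"\n")
        stream = sock.makefile("rb")
        line = stream.readline()
        if not line:
            return None  # server closed the connection without answering
        header = json.loads(line)
        data = dict(header.get("data") or {})
        if header.get("data_length"):
            data.update(json.loads(stream.read(header["data_length"])))
        return data if header.get("type") == "info" else None


def list_asr_models(info):
    """Collect speech-to-text model names from an info payload (assumed layout)."""
    if not info:
        return []
    return [
        model.get("name")
        for program in info.get("asr", [])
        for model in program.get("models", [])
    ]
```

For example, `list_asr_models(probe_wyoming("192.168.1.50"))` would list the models reported by a server at that hypothetical address. A connection error means the host, port, or firewall needs attention before Home Assistant can reach it.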

Tuning Accuracy

Start with a good microphone and the right model before reaching for the largest Whisper model. For home commands, audio quality often matters more than model size.

Use a quality microphone or voice-satellite hardware close to the speaker.
Reduce background noise where the microphone sits.
Set the correct language in the add-on to avoid mis-transcription.
Step up one model size at a time and re-test rather than jumping to large.
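Re-testing is easier to judge with a number than by ear. One common measure is word error rate (WER): the word-level edit distance between what you said and what Whisper transcribed, divided by the number of words you said. The Python sketch below computes WER and picks the smallest tested model that clears a threshold. The 0.1 threshold, the helper names, and the idea of collecting (reference, transcript) pairs from the Assist debug view are assumptions for illustration.

```python
import re

MODEL_ORDER = ["tiny", "base", "small", "medium", "large"]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Lowercase and drop punctuation so "Lights." and "lights" count as the same word.
    ref = re.findall(r"[\w']+", reference.lower())
    hyp = re.findall(r"[\w']+", hypothesis.lower())
    prev = list(range(len(hyp) + 1))  # distances for the previous row of the DP table
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def pick_model(results, max_wer=0.1):
    """Return the smallest model whose average WER is <= max_wer, or None.

    results maps a model name to (spoken reference, Whisper transcript) pairs
    collected while re-testing that size; sizes you have not tested are skipped.
    """
    for model in MODEL_ORDER:
        pairs = results.get(model)
        if not pairs:
            continue
        avg = sum(word_error_rate(ref, hyp) for ref, hyp in pairs) / len(pairs)
        if avg <= max_wer:
            return model
    return None
```

For example, "turn on the chicken lights" against the spoken "turn on the kitchen lights" is one wrong word out of five, a WER of 0.2. Because the loop walks from smallest to largest, it follows the same "step up one size at a time" advice.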

Frequently Asked Questions

Which Whisper model should I use for Home Assistant?

Use small as the default on a mini PC CPU, tiny or base on a Raspberry Pi, and medium or large only if you have a GPU and need higher accuracy on accents or noisy rooms. Step up one size at a time and re-test.

Do I need a GPU for local Whisper?

No. Small and smaller models run on CPU; a GPU mainly makes medium and large models fast enough for real-time use. You can also offload Whisper to a more powerful machine over the Wyoming protocol.

How accurate is local Whisper offline?

Accuracy is strong with the right model and a good microphone; larger models handle accents and noise better. For clear home commands, the small model on a mini PC is usually accurate enough, and it runs fully offline.

Is local Whisper multilingual?

Yes. Whisper supports many languages, so non-English commands transcribe locally without any cloud service. Set the language in the add-on configuration for best results.

Related Reading