Key Takeaways
- Whisper is an open speech-to-text model that runs locally, so no audio leaves your hardware
- Use the Whisper (faster-whisper) add-on; it connects to Assist over Wyoming
- Model sizes run tiny → base → small → medium → large; bigger models are more accurate but slower
- On CPU-only hardware, prefer tiny/base/small; a GPU makes medium/large practical
- Whisper is multilingual, so non-English commands transcribe without a cloud service
- Tune accuracy with a better microphone and the right model before going larger
Why Use Local Speech-to-Text?
Local speech-to-text keeps your voice recordings on your own hardware, so no audio is uploaded to a third party. It also works offline and has no per-request cost.
- Privacy: cloud assistants transmit and may retain recordings; local Whisper does not (see smart home privacy risks).
- Offline: transcription works during internet outages.
- No fees: there is no usage charge for local transcription.
Which Whisper Model Size Should You Use?
Pick the smallest Whisper model that gives acceptable accuracy on your hardware: tiny/base/small for CPU-only, medium/large when you have a GPU. Larger models improve accuracy on accents and noisy audio at the cost of speed.
- Use small as the default on a mini PC CPU; move to medium/large only if accuracy is lacking.
- Use tiny/base on a Raspberry Pi to keep latency usable.
| Model | Relative accuracy | Relative speed | Best for |
|---|---|---|---|
| tiny | Lowest | Fastest | Low-power CPU, short commands |
| base | Low | Very fast | Raspberry Pi, simple phrases |
| small | Good | Fast | Mini PC CPU, everyday use |
| medium | High | Moderate | GPU or strong CPU |
| large | Highest | Slowest | GPU, accents/noisy rooms |
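The table can be turned into a small lookup. The sketch below is illustrative only: the hardware labels (`raspberry_pi`, `mini_pc_cpu`, `gpu`) and both function names are hypothetical and are not part of the add-on or Home Assistant. It encodes the starting points suggested above and the rule of moving up one size at a time.

```python
from typing import Optional

# Model sizes in order from smallest to largest, as listed in the table.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]

# Hypothetical hardware labels mapped to the starting size the table suggests.
RECOMMENDED_START = {
    "raspberry_pi": "base",
    "mini_pc_cpu": "small",
    "gpu": "medium",
}


def starting_model(hardware: str) -> str:
    """Return the suggested starting model size for a hardware class."""
    if hardware not in RECOMMENDED_START:
        raise ValueError(f"unknown hardware class: {hardware!r}")
    return RECOMMENDED_START[hardware]


def next_size_up(model: str) -> Optional[str]:
    """Return the next larger model size, or None if already at the largest."""
    index = MODEL_SIZES.index(model)  # raises ValueError for unknown sizes
    if index + 1 < len(MODEL_SIZES):
        return MODEL_SIZES[index + 1]
    return None
```

For example, `starting_model("mini_pc_cpu")` returns `"small"`, and `next_size_up("small")` returns `"medium"` if accuracy turns out to be lacking.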
Wyoming Setup
The Whisper add-on exposes a Wyoming endpoint that Assist uses for speech-to-text. Setup is install → pick model → select in pipeline.
1. Install the Whisper (faster-whisper) add-on from the add-on store.
2. Set the model size in the add-on configuration and start it.
3. The add-on registers as a Wyoming speech-to-text service automatically.
4. In Settings > Voice assistants, set Whisper as the STT engine for your Assist pipeline.
5. Test transcription from the Assist debug tools before adding voice hardware.
Hardware Needs
Whisper runs on CPU for small models and benefits from a GPU for medium/large models. Match model size to the box that hosts it.
- Raspberry Pi: stick to tiny/base for acceptable latency.
- Mini PC (CPU): small works well; medium is possible but slower (see best hardware for a local smart home).
- With a GPU/NPU: medium and large become practical for high accuracy.
- You can run Whisper on a separate, more powerful machine via Wyoming if your hub is a Pi.
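If Whisper runs on a separate machine, a quick check that its Wyoming port accepts connections can rule out network problems before you touch the pipeline. This is a minimal sketch using only the Python standard library. The function name is hypothetical, and the default port 10300 is an assumption (it is commonly used for Whisper over Wyoming), so confirm the port in your own configuration. It only tests that the TCP port is open and does not speak the Wyoming protocol.

```python
import socket


def wyoming_port_open(host: str, port: int = 10300, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    The default port is an assumption; use the port your Whisper server
    is actually configured to listen on.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling `wyoming_port_open("192.168.1.50")` (a placeholder address) from the hub returns `True` only if something is listening on that port. A `False` result points to a firewall, a wrong address, or a stopped service.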
Tuning Accuracy
Start with a good microphone and the right model before reaching for the largest Whisper model. Audio quality often matters more than model size for home commands.
- Use a quality microphone or voice-satellite hardware close to the speaker.
- Reduce background noise where the microphone sits.
- Set the correct language in the add-on to avoid mis-transcription.
- Step up one model size at a time and re-test rather than jumping to large.
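To make "re-test" concrete, one option is to speak a fixed set of commands, record what each model size transcribes, and compare word error rate (WER). The sketch below is one way to compute it with a word-level edit distance. The function names are hypothetical, and the comparison is naive: it lowercases and splits on whitespace, without stripping punctuation.

```python
from typing import Iterable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average WER over (spoken command, transcription) pairs."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    if not scores:
        raise ValueError("need at least one pair")
    return sum(scores) / len(scores)
```

Run the same commands through the current model and the next size up. If the mean WER barely changes, the larger model is probably not worth the extra latency.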
FAQ
Which Whisper model should I use for Home Assistant?
Use small as the default on a mini PC CPU, tiny or base on a Raspberry Pi, and medium or large only if you have a GPU and need higher accuracy on accents or noisy rooms. Step up one size at a time and re-test.
Do I need a GPU for local Whisper?
Not for small and below; those run on CPU. A GPU mainly makes medium and large models fast enough for real-time use. You can also offload Whisper to a more powerful machine over the Wyoming protocol.
How accurate is local Whisper offline?
Accuracy is strong with the right model and a good microphone; larger models handle accents and noise better. For clear home commands, the small model on a mini PC is usually accurate enough, and it runs fully offline.
Is local Whisper multilingual?
Yes. Whisper supports many languages, so non-English commands transcribe locally without any cloud service. Set the language in the add-on configuration for best results.