Local AI & LLMs in the Smart Home

Local Speech-to-Text for Smart Homes: Whisper + HA (2026)

8 min read · By Hans Kuepper · Founder of PromptQuorum, multi-model AI dispatch tool · PromptQuorum

Local Whisper gives Home Assistant private speech-to-text: install the Whisper add-on, pick a model size that fits your hardware, and connect it to Assist over the Wyoming protocol. Smaller models are faster; larger models are more accurate. Nothing is sent to a cloud service.

This guide covers why local speech-to-text (STT) matters, the Whisper model sizes, Wyoming setup, hardware needs, and how to tune accuracy.

Key Takeaways

  • Whisper is an open speech-to-text model that runs locally; no audio leaves your hardware
  • Use the Whisper (faster-whisper) add-on; it connects to Assist over Wyoming
  • Model sizes range tiny → base → small → medium → large; bigger is more accurate but slower
  • On CPU-only hardware, prefer tiny/base/small; a GPU makes medium/large practical
  • Whisper is multilingual, so non-English commands transcribe without a cloud service
  • Tune accuracy with a better microphone and the right model before going larger

Why Use Local Speech-to-Text?

Local speech-to-text keeps your voice recordings on your own hardware, so no audio is uploaded to a third party. It also works offline and has no per-request cost.

  • Privacy: cloud assistants transmit and may retain recordings; local Whisper does not (see smart home privacy risks).
  • Offline: transcription works during internet outages.
  • No fees: there is no usage charge for local transcription.

Which Whisper Model Size Should You Use?

Pick the smallest Whisper model that gives acceptable accuracy on your hardware: tiny/base/small for CPU-only, medium/large when you have a GPU. Larger models improve accuracy on accents and noisy audio at the cost of speed.

  • Use small as the default on a mini PC CPU; move to medium/large only if accuracy is lacking.
  • Use tiny/base on a Raspberry Pi to keep latency usable.
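
| Model  | Relative accuracy | Relative speed | Best for                      |
|--------|-------------------|----------------|-------------------------------|
| tiny   | Lowest            | Fastest        | Low-power CPU, short commands |
| base   | Low               | Very fast      | Raspberry Pi, simple phrases  |
| small  | Good              | Fast           | Mini PC CPU, everyday use     |
| medium | High              | Moderate       | GPU or strong CPU             |
| large  | Highest           | Slowest        | GPU, accents/noisy rooms      |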
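
The rules of thumb above can be written down as a small helper. This is only a sketch: the `Host` class, its `kind` values, and both function names are hypothetical and exist purely to illustrate the selection logic. They are not part of Home Assistant or the Whisper add-on.

```python
from dataclasses import dataclass
from typing import Optional

# Sizes in ascending accuracy and descending speed, as listed above.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


@dataclass
class Host:
    """Hypothetical description of the machine that runs Whisper."""
    kind: str              # e.g. "raspberry_pi" or "mini_pc" (illustrative labels)
    has_gpu: bool = False


def starting_model(host: Host) -> str:
    """Return a starting model size based on the article's rules of thumb."""
    if host.has_gpu:
        return "medium"    # a GPU makes medium/large practical
    if host.kind == "raspberry_pi":
        return "base"      # tiny/base keep latency usable on a Pi
    return "small"         # default for a mini PC CPU


def next_size_up(model: str) -> Optional[str]:
    """Step up exactly one size, or return None if already at the largest."""
    i = MODEL_SIZES.index(model)
    return MODEL_SIZES[i + 1] if i + 1 < len(MODEL_SIZES) else None
```

For example, `starting_model(Host("mini_pc"))` suggests `small`, and `next_size_up("small")` gives `medium` if accuracy turns out to be lacking.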

Wyoming Setup

The Whisper add-on exposes a Wyoming endpoint that Assist uses for speech-to-text. Setup has three stages: install the add-on, pick a model, and select it in the pipeline.

  1. Install the Whisper (faster-whisper) add-on from the add-on store.
  2. Set the model size in the add-on configuration and start it.
  3. The add-on registers as a Wyoming speech-to-text service automatically.
  4. In Settings → Voice assistants, set Whisper as the STT engine for your Assist pipeline.
  5. Test transcription from the Assist debug tools before adding voice hardware.
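
If Assist cannot see the service, it can help to check the Wyoming endpoint directly. The sketch below rests on several assumptions you should verify against your own setup: that the service listens on TCP port 10300, that Wyoming events are newline-terminated JSON headers optionally followed by `data_length` bytes of extra JSON and `payload_length` bytes of binary payload, and that a `describe` event is answered with a description of the service. The function names are hypothetical, and only the Python standard library is used.

```python
import json
import socket
from typing import BinaryIO

# Assumption: 10300 is the port commonly used by the Whisper Wyoming service.
# Check the add-on's network settings for the real value.
DEFAULT_PORT = 10300


def encode_event(event_type: str) -> bytes:
    """Encode an event without data as a single JSON header line."""
    return (json.dumps({"type": event_type}) + "\n").encode("utf-8")


def read_event(stream: BinaryIO) -> dict:
    """Read one event: header line, then optional extra data and payload."""
    line = stream.readline()
    if not line:
        raise ConnectionError("connection closed before an event arrived")
    header = json.loads(line.decode("utf-8"))
    data = dict(header.get("data") or {})
    data_length = header.get("data_length") or 0
    if data_length:
        # Extra JSON data follows the header line (assumed wire format).
        data.update(json.loads(stream.read(data_length).decode("utf-8")))
    payload_length = header.get("payload_length") or 0
    payload = stream.read(payload_length) if payload_length else b""
    return {"type": header["type"], "data": data, "payload": payload}


def probe(host: str, port: int = DEFAULT_PORT, timeout: float = 5.0) -> dict:
    """Send a 'describe' event and return the first event sent back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_event("describe"))
        with sock.makefile("rb") as stream:
            return read_event(stream)
```

Calling `probe("homeassistant.local")` (a placeholder hostname) should either return an event dictionary or raise a connection error, which separates network problems from pipeline configuration problems.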

Hardware Needs

Whisper runs on CPU for small models and benefits from a GPU for medium/large models. Match model size to the box that hosts it.

  • Raspberry Pi: stick to tiny/base for acceptable latency.
  • Mini PC (CPU): small works well; medium is possible but slower (see best hardware for a local smart home).
  • With a GPU/NPU: medium and large become practical for high accuracy.
  • You can run Whisper on a separate, more powerful machine via Wyoming if your hub is a Pi.
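
One way to check whether a model size suits a given box is to time a transcription and divide by the clip length. This ratio is often called the real-time factor; it is a metric chosen for this sketch, not something the add-on reports. The `transcribe` argument is a hypothetical callable that wraps whatever speech-to-text call you are testing.

```python
import time
from typing import Callable


def real_time_factor(
    transcribe: Callable[[], str],
    audio_seconds: float,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return processing time divided by audio duration for one call.

    Values below 1.0 mean transcription finished faster than the clip
    would take to play back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = clock()
    transcribe()  # run the speech-to-text call being measured
    return (clock() - start) / audio_seconds
```

Run it a few times with a short command recorded on your actual microphone. The first call may include model loading time, so later runs are usually more representative.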

Tuning Accuracy

Start with a good microphone and the right model size before reaching for the largest Whisper model. Audio quality often matters more than model size for home commands.

  • Use a quality microphone or voice-satellite hardware close to the speaker.
  • Reduce background noise where the microphone sits.
  • Set the correct language in the add-on to avoid mis-transcription.
  • Step up one model size at a time and re-test rather than jumping to large.
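
To make "re-test" concrete, you could score each model against a handful of commands you actually use. The sketch below computes word error rate (WER), a common speech-to-text metric chosen here for illustration: the word-level edit distance between the expected phrase and the transcript, divided by the number of expected words. The function name and the lowercase, whitespace-split normalization are assumptions of this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from transcript
                curr[j - 1] + 1,     # extra word in transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

If "turn on the kitchen light" comes back as "turn on the chicken light", one of five words is wrong, so the WER is 0.2. Comparing the average across your test phrases before and after a model change shows whether the larger model is worth its extra latency.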

FAQ

Which Whisper model should I use for Home Assistant?

Use small as the default on a mini PC CPU, tiny or base on a Raspberry Pi, and medium or large only if you have a GPU and need higher accuracy on accents or noisy rooms. Step up one size at a time and re-test.

Do I need a GPU for local Whisper?

Not for small and below; those run on CPU. A GPU mainly makes medium and large models fast enough for real-time use. You can also offload Whisper to a more powerful machine over the Wyoming protocol.

How accurate is local Whisper offline?

Accuracy is strong with the right model and a good microphone; larger models handle accents and noise better. For clear home commands, the small model on a mini PC is usually accurate enough, and it runs fully offline.

Is local Whisper multilingual?

Yes. Whisper supports many languages, so non-English commands transcribe locally without any cloud service. Set the language in the add-on configuration for best results.
