How to Add a Custom Jarvis Voice to Home Assistant with Piper TTS
Set up Piper TTS with a custom Jarvis voice model in Home Assistant. Covers model installation, configuration, multi-room speaker setup, and 9 ready-to-use voice automations.
Cloud TTS services cost money every month and depend on someone else's servers. Piper TTS runs locally, is free, and with the right voice model, comes close to the better cloud options in quality. This guide sets up a custom Jarvis-style voice on Home Assistant with no cloud dependency.
This is running in production on my system with 7 distributed speakers across the house, triggered by ELK M1 alarm events, garage doors, and presence detection.
What is Piper TTS?
Piper is a local text-to-speech engine built for Home Assistant. It runs on the same hardware as your HA instance (a Pi 4/5 handles it fine) and supports custom ONNX voice models. There are dozens of built-in voices, but you can also load community models — including voices that sound like Jarvis from the MCU.
Key advantages over cloud TTS: no monthly cost, no internet requirement, sub-second local latency, and no rate limits.
Step 1: Install the Piper Add-on
If you are running Home Assistant OS or Supervised:
1. Go to Settings > Add-ons > Add-on Store
2. Search for "Piper"
3. Install the Piper add-on
4. Start it
Piper runs as a local service on port 10200. Home Assistant discovers it automatically.
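If you are running HA Container or Core, you can run Piper as a standalone Docker container, then add the Wyoming Protocol integration manually under Settings > Devices & Services (host of the container, port 10200). The upstream image reads voice files from `/data` and expects a `--voice` argument:
docker run -d --name piper \
  -p 10200:10200 \
  -v /path/to/models:/data \
  rhasspy/wyoming-piper \
  --voice jarvis-high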
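The same container can also be described in Docker Compose, which is easier to keep under version control. This is a minimal sketch, assuming the image reads voice files from `/data` and accepts a `--voice` argument; the host path and the restart policy are placeholders to adjust.

```yaml
# docker-compose.yml (sketch; host path is a placeholder)
services:
  piper:
    image: rhasspy/wyoming-piper
    container_name: piper
    # Voice name is assumed to match the model filename without the .onnx extension
    command: --voice jarvis-high
    ports:
      - "10200:10200"   # Wyoming protocol port used by Home Assistant
    volumes:
      - /path/to/models:/data   # directory holding the .onnx and .onnx.json files
    restart: unless-stopped
```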
Step 2: Download the Jarvis Voice Model
The custom Jarvis voice model is a community-trained ONNX model. The one I use is from the `jgkawell/jarvis` repository on Hugging Face.
Download two files: the `.onnx` model file and its matching `.onnx.json` config file.
Place both files in your Piper models directory (`/share/piper/` for the add-on).
After copying the files, restart the Piper add-on. The new voice will be available in the Piper configuration.
Step 3: Configure the TTS Entity
Go to Settings > Devices & Services. You should see Piper listed. Click Configure and select the Jarvis voice (`jarvis-high` in the examples in this guide).
The TTS entity will be `tts.piper`. Test it immediately:
service: tts.speak
target:
  entity_id: tts.piper
data:
  media_player_entity_id: media_player.b_room
  message: "Good evening, sir. All systems operational."
  options:
    voice: jarvis-high
If you hear the voice through your speaker, you are good. If not, check that your media player entity supports TTS playback (HomePods, Chromecast, and Sonos all work; some TV media players do not).
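Once the test call works, it can help to wrap it in a script so every automation does not repeat the entity, speaker, and voice. The sketch below assumes the same `tts.piper` entity and `media_player.b_room` speaker as the test call; the script name `jarvis_say` and its `message` field are hypothetical names chosen for illustration.

```yaml
script:
  jarvis_say:              # hypothetical script name
    alias: "Jarvis announcement"
    fields:
      message:
        description: "Text for Piper to speak"
        example: "Good evening, sir."
    sequence:
      - service: tts.speak
        target:
          entity_id: tts.piper
        data:
          media_player_entity_id: media_player.b_room
          message: "{{ message }}"   # filled in by the caller
          options:
            voice: jarvis-high
```

An automation action then only needs `service: script.jarvis_say` with a `message` under `data`, and changing the voice or speaker later means editing one place.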
Step 4: Multi-Room Audio with Snapcast (Optional)
If you want synchronized voice announcements across multiple rooms, you need a multi-room audio system. The simplest approach for Home Assistant is Snapcast.
Architecture: Piper TTS -> Snapcast server -> one or more Snapcast clients.
For a wired speaker system (like ELK SP12F speakers distributed through the house), you can use a single amplifier with distributed audio. My system uses 7 speakers wired back to a central amp location.
The Piper TTS output routes to the Snapcast server, which pushes audio to all clients simultaneously. Latency is under 20ms.
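For speakers that are not on Snapcast, `tts.speak` also accepts a list of media players. Each player receives the message independently, so playback is not tightly synchronized the way a Snapcast group is. In this sketch `media_player.kitchen` is a hypothetical second speaker, and the message text is only an example.

```yaml
service: tts.speak
target:
  entity_id: tts.piper
data:
  # A list sends the same announcement to each player separately
  media_player_entity_id:
    - media_player.b_room
    - media_player.kitchen   # hypothetical entity
  message: "Garage bay one is now open."
  options:
    voice: jarvis-high
```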
Step 5: Voice Automations
Here are five of the 9 automations running in production. All live in a single Home Assistant package file.
Welcome Home
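automation:
  - alias: "Welcome Home"
    trigger:
      - platform: state
        entity_id: person.baily_aldea
        to: "home"
    condition:
      # Only run when the system is still armed away
      - condition: state
        entity_id: alarm_control_panel.elkm1_area_1
        state: "armed_away"
    action:
      - service: tts.speak
        target:
          entity_id: tts.piper
        data:
          media_player_entity_id: media_player.b_room
          message: "Welcome home, sir. Disarming the security system."
          options:
            voice: jarvis-high
      - service: alarm_control_panel.alarm_disarm
        target:
          entity_id: alarm_control_panel.elkm1_area_1
        data:
          code: "1234"  # placeholder; use your own code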
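The disarm step above carries the alarm code in plain text. One way to keep it out of the package file is Home Assistant's `!secret` tag. This is a sketch of just the disarm action; the key name `elk_disarm_code` is a hypothetical name, and the value shown is the same placeholder as above.

```yaml
# In secrets.yaml (key name is hypothetical):
#   elk_disarm_code: "1234"

# In the automation's action list, reference the secret instead of the literal:
- service: alarm_control_panel.alarm_disarm
  target:
    entity_id: alarm_control_panel.elkm1_area_1
  data:
    code: !secret elk_disarm_code
```

This keeps the package safe to share or commit, as long as `secrets.yaml` itself stays private.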
System Armed
automation:
  - alias: "System Armed"
    trigger:
      - platform: state
        entity_id: alarm_control_panel.elkm1_area_1
        to: "armed_away"
    action:
      - service: tts.speak
        target:
          entity_id: tts.piper
        data:
          media_player_entity_id: media_player.b_room
          message: "Security system armed. All perimeter zones secured."
          options:
            voice: jarvis-high
Alarm Triggered
automation:
  - alias: "Alarm Triggered"
    trigger:
      - platform: state
        entity_id: alarm_control_panel.elkm1_area_1
        to: "triggered"
    action:
      - service: tts.speak
        target:
          entity_id: tts.piper
        data:
          media_player_entity_id: media_player.b_room
          message: "Warning. Security breach detected. Alarm has been triggered."
          options:
            voice: jarvis-high
Garage Door
automation:
  - alias: "Garage Door"
    trigger:
      - platform: state
        entity_id: cover.garage_bay_1
        to: "open"
    action:
      - service: tts.speak
        target:
          entity_id: tts.piper
        data:
          media_player_entity_id: media_player.b_room
          message: "Garage bay one is now open."
          options:
            voice: jarvis-high
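The full set also covers the door closing. One way to handle both directions in a single automation is to trigger on a list of states and template the message from the new state. This is a sketch rather than the version in my package; it assumes the cover reports the standard `open` and `closed` states.

```yaml
automation:
  - alias: "Garage Door Open/Close"
    trigger:
      - platform: state
        entity_id: cover.garage_bay_1
        to:
          - "open"
          - "closed"
    action:
      - service: tts.speak
        target:
          entity_id: tts.piper
        data:
          media_player_entity_id: media_player.b_room
          # trigger.to_state.state is "open" or "closed" here
          message: "Garage bay one is now {{ trigger.to_state.state }}."
          options:
            voice: jarvis-high
```

Listing only `open` and `closed` skips the intermediate `opening` and `closing` states, so each door movement produces one announcement.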
Good Night
automation:
  - alias: "Good Night"
    trigger:
      - platform: state
        entity_id: alarm_control_panel.elkm1_area_1
        to: "armed_home"
    condition:
      # Time window spans midnight: 21:00 through 05:00
      - condition: time
        after: "21:00:00"
        before: "05:00:00"
    action:
      - service: tts.speak
        target:
          entity_id: tts.piper
        data:
          media_player_entity_id: media_player.b_room
          message: "Good night, sir. System armed in stay mode. All perimeter zones active."
          options:
            voice: jarvis-high
The full set includes 9 automations: welcome home, system armed (away and stay), alarm triggered, person arrived, garage open/close, good night, and disarm confirmation.
Comparison: Piper vs Cloud TTS
| Feature | Piper TTS | ElevenLabs | Google Cloud TTS |
|---------|-----------|------------|-----------------|
| Monthly cost | $0 | $5-22 | Pay per character |
| Latency | < 1s local | 1-3s network | 1-2s network |
| Internet required | No | Yes | Yes |
| Custom voices | ONNX models | Voice cloning | WaveNet presets |
| Quality (subjective) | Good-Great | Excellent | Great |
| Rate limits | None | 10k-100k chars/mo | Varies |
Piper is not quite ElevenLabs quality, but it is close enough that the latency and cost advantages make it the clear winner for home automation. You do not need movie-quality voice synthesis to announce "garage door open."
Troubleshooting
**No audio output**: Check that the target media player supports TTS. Some media players (especially TV-based ones) reject TTS audio. Use a dedicated speaker entity.
**Voice sounds robotic**: Make sure you are using the `-high` quality model, not the medium or low variants. The high model is larger but significantly better.
**Long pauses before speech**: On a Pi 4, the first TTS call after a restart takes 2-3 seconds while the model loads into memory. Subsequent calls are under a second. On a Pi 5, even the first call is fast.
**Model not found**: Verify the `.onnx` and `.onnx.json` files are both in `/share/piper/` and the filenames match exactly. Restart the Piper add-on after adding new models.
Get the Full Voice Pack
The [Jarvis Voice Pack](https://beslain.gumroad.com/l/ha-jarvis-voice-pack) includes the pre-configured voice model, all 9 automations as a ready-to-install HA package, multi-room Snapcast setup guide, and the message templates I use in production.