I Built a Headless Speech-to-Text Tool in Rust — Meet Voicr

Most speech-to-text tools assume you want a GUI. A tray icon. A settings panel. I didn't. I wanted something that runs in the background, takes commands from a socket, and types out whatever I just said. No window. No Electron. Just a binary.

That's Voicr.

What It Actually Does

Voicr is a terminal-first speech-to-text (STT) tool written entirely in Rust. You run it as a daemon, send it commands over a Unix socket, and it transcribes your mic input. Or you skip the daemon entirely and do one-shot transcriptions:

voicr transcribe --auto-stop

That's it. Records until silence, prints the text, exits.
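One common way to implement "record until silence" is to measure the energy of each audio frame and stop once enough quiet frames arrive in a row after speech has been heard. The sketch below illustrates that idea only. It is not Voicr's actual code (the real audio path goes through transcribe-rs), and `SilenceDetector`, the threshold, and the frame count are all assumptions made for the example.

```rust
/// Root-mean-square level of one frame of samples in the range [-1.0, 1.0].
fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Hypothetical helper: counts consecutive quiet frames and reports
/// when a recording could be stopped.
struct SilenceDetector {
    threshold: f32,       // RMS level below which a frame counts as quiet
    frames_needed: usize, // quiet frames in a row required to stop
    quiet_run: usize,
    heard_speech: bool,
}

impl SilenceDetector {
    fn new(threshold: f32, frames_needed: usize) -> Self {
        SilenceDetector {
            threshold,
            frames_needed,
            quiet_run: 0,
            heard_speech: false,
        }
    }

    /// Feed one frame. Returns true once speech has been heard and
    /// `frames_needed` quiet frames have then arrived back to back.
    fn push(&mut self, frame: &[f32]) -> bool {
        if rms(frame) >= self.threshold {
            self.heard_speech = true;
            self.quiet_run = 0;
        } else {
            self.quiet_run += 1;
        }
        self.heard_speech && self.quiet_run >= self.frames_needed
    }
}
```

Requiring speech first means silence before you start talking does not end the recording early. A real implementation might use a trained voice-activity detector instead of a fixed energy threshold.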

The daemon mode is where it gets useful. Start it once:

voicr daemon &

Then toggle recording from anywhere:

voicr send toggle   # start
voicr send toggle   # stop + transcribe

I have this bound to Super+Space in my hotkey daemon. Press once to start, press again to get the text. That's the whole workflow.

Why Rust

Partly because I wanted the challenge. Mostly because I needed a single binary I could drop on any Linux machine without worrying about runtimes or virtual environments.

The audio handling and ML inference are built on top of transcribe-rs. I wrote the daemon protocol, socket layer, history storage (SQLite), and the CLI around it.

Model Options

Voicr supports several models depending on what you need:

Moonshine (31MB to 192MB) for fast, low-latency transcription
Parakeet V3 (478MB) as the recommended starting point: good accuracy, fast enough
SenseVoice (160MB) if you need multilingual support
GigaAM v3 (225MB) for better accuracy at a reasonable size
Whisper variants if you want the familiar option (requires Vulkan at build time)

You set your model once and forget it:

voicr model download parakeet-tdt-0.6b-v3
voicr config set model.selected parakeet-tdt-0.6b-v3

If you're on a machine with limited RAM, there's an unload_timeout setting that drops the model from memory after a few minutes of inactivity.
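The idle-unload idea behind unload_timeout can be sketched in a few lines of Rust. This is a hypothetical illustration, not Voicr's implementation: `IdleSlot`, `get_or_load`, and `unload_if_idle` are made-up names, and the caller passes in the current time so the logic is easy to test.

```rust
use std::time::{Duration, Instant};

/// Holds a lazily loaded model and drops it after a period of inactivity.
/// `M` stands in for whatever type owns the model weights.
struct IdleSlot<M> {
    model: Option<M>,
    last_used: Instant,
    unload_timeout: Duration,
}

impl<M> IdleSlot<M> {
    fn new(unload_timeout: Duration, now: Instant) -> Self {
        IdleSlot {
            model: None,
            last_used: now,
            unload_timeout,
        }
    }

    /// Return the model, calling `load` first if it is not resident.
    /// Every access refreshes the idle timer.
    fn get_or_load(&mut self, now: Instant, load: impl FnOnce() -> M) -> &mut M {
        self.last_used = now;
        self.model.get_or_insert_with(load)
    }

    /// Drop the model if it has been idle for at least `unload_timeout`.
    /// Returns true when something was actually unloaded.
    fn unload_if_idle(&mut self, now: Instant) -> bool {
        let idle = now.saturating_duration_since(self.last_used);
        if self.model.is_some() && idle >= self.unload_timeout {
            self.model = None; // dropping the value releases its memory
            true
        } else {
            false
        }
    }

    fn is_loaded(&self) -> bool {
        self.model.is_some()
    }
}
```

A daemon built this way would call `unload_if_idle` from a periodic timer and `get_or_load` on each transcription request, so the first request after a long pause pays the load cost again.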

The Part I Use Every Day

The hotkey integration. With sxhkd on Linux:

super + space
    echo '{"cmd":"toggle"}' | nc -U /tmp/voicr.sock

Pair that with xdotool to auto-type the transcription wherever your cursor is:

nc -U /tmp/voicr.sock | while IFS= read -r line; do
  text=$(echo "$line" | jq -r 'select(.type=="transcription") | .text')
  [ -n "$text" ] && xdotool type --clearmodifiers "$text"
done

Press key, speak, text appears in whatever app is focused. I use this for quick notes, terminal commands I don't want to type, and the occasional long dictation.
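If you'd rather talk to the socket from Rust than from nc, a minimal client could look like the sketch below. It uses only the standard library and assumes what the shell snippets above suggest: one JSON object per line in each direction. The function names (`command_json`, `send_command`, `for_each_line`) are hypothetical and not part of Voicr, and the raw lines are passed through unparsed, so a real client would hand them to a JSON library to pick out the transcription text.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::path::Path;

/// Build the one-line JSON command, e.g. {"cmd":"toggle"}.
/// Assumes `cmd` is a plain word that needs no JSON escaping.
fn command_json(cmd: &str) -> String {
    format!("{{\"cmd\":\"{}\"}}", cmd)
}

/// Connect to the daemon socket and send a single newline-terminated command.
fn send_command(socket: &Path, cmd: &str) -> io::Result<()> {
    let mut stream = UnixStream::connect(socket)?;
    writeln!(stream, "{}", command_json(cmd))?;
    stream.flush()
}

/// Connect and hand every line the daemon emits to `on_line`
/// until the daemon closes the connection.
fn for_each_line(socket: &Path, mut on_line: impl FnMut(&str)) -> io::Result<()> {
    let stream = UnixStream::connect(socket)?;
    for line in BufReader::new(stream).lines() {
        on_line(&line?);
    }
    Ok(())
}
```

With the daemon running, `send_command(Path::new("/tmp/voicr.sock"), "toggle")` would do the same job as the sxhkd one-liner, and `for_each_line` plays the role of the `nc -U ... | while read` loop.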

Running It as a Service

For persistent use, the systemd user unit is straightforward. Save it as ~/.config/systemd/user/voicr.service:

[Service]
ExecStart=/usr/local/bin/voicr daemon
Restart=on-failure

[Install]
WantedBy=default.target

systemctl --user enable --now voicr

It loads the model at startup and stays ready.

Install

curl -L https://github.com/habitual69/voicr/releases/latest/download/voicr-linux-x86_64 -o voicr
chmod +x voicr && sudo mv voicr /usr/local/bin/

voicr model download parakeet-tdt-0.6b-v3
voicr transcribe

macOS and Windows binaries are also on the releases page. Daemon mode is Linux/macOS only (Unix socket).

Full source at github.com/habitual69/voicr.



Arbind Singh
