Arbind Singh · March 27, 2026 · 3 min read

I Built a Headless Speech-to-Text Tool in Rust — Meet Voicr

Most speech-to-text tools assume you want a GUI. A tray icon. A settings panel. I didn't. I wanted something that runs in the background, takes commands from a socket, and types out whatever I just said. No window. No Electron. Just a binary.

That's Voicr.

What It Actually Does

Voicr is a terminal-first STT tool written entirely in Rust. You run it as a daemon, send it commands over a Unix socket, and it transcribes your mic input. Or you skip the daemon entirely and do one-shot transcriptions:

voicr transcribe --auto-stop

That's it. Records until silence, prints the text, exits.

The daemon mode is where it gets useful. Start it once:

voicr daemon &

Then toggle recording from anywhere:

voicr send toggle   # start
voicr send toggle   # stop + transcribe

I have this bound to Super+Space in my hotkey daemon. Press once to start, press again to get the text. That's the whole workflow.

Why Rust

Partly because I wanted the challenge. Mostly because I needed a single binary I could drop on any Linux machine without worrying about runtimes or virtual environments.

The audio handling and ML inference come from transcribe-rs. I wrote the daemon protocol, the socket layer, the history storage (SQLite), and the CLI around it.

Model Options

Voicr supports several models depending on what you need:

  • Moonshine (31MB to 192MB) for fast, low-latency transcription
  • Parakeet V3 (478MB) as the recommended starting point: good accuracy and fast enough
  • SenseVoice (160MB) if you need multilingual support
  • GigaAM v3 (225MB) for better accuracy at a reasonable size
  • Whisper variants if you want the familiar option (requires Vulkan at build time)

You set your model once and forget it:

voicr model download parakeet-tdt-0.6b-v3
voicr config set model.selected parakeet-tdt-0.6b-v3
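
Because the model name has to match in both commands, a tiny wrapper can keep them in sync. This is only a sketch: `voicr_use_model` is a made-up helper name, and it uses nothing beyond the two commands shown above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of voicr): download a model and select it
# in one step, so the two commands cannot drift apart on the model name.
voicr_use_model() {
    model="$1"
    if [ -z "$model" ]; then
        echo "usage: voicr_use_model <model-name>" >&2
        return 2
    fi
    # Only select the model if the download succeeded.
    voicr model download "$model" && voicr config set model.selected "$model"
}
```

Sourced from a shell profile, `voicr_use_model parakeet-tdt-0.6b-v3` would then do the same as the two lines above.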

If you're on a machine with limited RAM, there's an unload_timeout setting that drops the model from memory after a few minutes of inactivity.

The Part I Use Every Day

The hotkey integration. With sxhkd on Linux:

super + space
    echo '{"cmd":"toggle"}' | nc -U /tmp/voicr.sock
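
When the daemon is not running, an error from that one-liner tends to go unnoticed, since a hotkey daemon has no terminal to show it in. A small wrapper can check for the socket first. Treat this as a sketch: the names `voicr_cmd_json`, `voicr_toggle`, and `VOICR_SOCK` are invented for the example, and only the socket path and the toggle payload come from the binding above.

```shell
#!/bin/sh
# Hypothetical helper names; only /tmp/voicr.sock and the toggle payload
# are taken from the sxhkd binding.
VOICR_SOCK="${VOICR_SOCK:-/tmp/voicr.sock}"

# Build the one-line JSON payload for a command name.
voicr_cmd_json() {
    printf '{"cmd":"%s"}\n' "$1"
}

# Send a toggle, but only if the daemon's socket actually exists.
voicr_toggle() {
    if [ ! -S "$VOICR_SOCK" ]; then
        echo "voicr: no socket at $VOICR_SOCK (is the daemon running?)" >&2
        return 1
    fi
    voicr_cmd_json toggle | nc -U "$VOICR_SOCK"
}
```

Saved as a script with a final `voicr_toggle` line, this could stand in for the `echo ... | nc` command in the binding.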

Pair that with xdotool to auto-type the transcription wherever your cursor is:

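# Read daemon events line by line and type each transcription at the cursor.
nc -U /tmp/voicr.sock | while IFS= read -r line; do
  # printf, not echo: some shells' echo mangles backslashes in the JSON
  text=$(printf '%s\n' "$line" | jq -r 'select(.type=="transcription") | .text')
  [ -n "$text" ] && xdotool type --clearmodifiers "$text"
done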

Press key, speak, text appears in whatever app is focused. I use this for quick notes, terminal commands I don't want to type, and the occasional long dictation.
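
Before handing keystrokes to xdotool, it can help to see what the filter lets through. The sketch below reuses the jq filter from the loop above but prints instead of typing. It assumes the daemon emits one JSON object per line, as that loop implies, and `voicr_print_transcriptions` is a made-up name.

```shell
#!/bin/sh
# Dry run of the jq filter from the xdotool loop: print, do not type.
# Assumes one JSON event per line on stdin; the function name is hypothetical.
voicr_print_transcriptions() {
    # Keep only events whose type is "transcription" and emit their raw text.
    jq -r 'select(.type=="transcription") | .text'
}
```

Run `nc -U /tmp/voicr.sock | voicr_print_transcriptions` against the live daemon, or feed it canned lines. The "status" event in the call below is an invented sample, not a documented voicr message:

    printf '%s\n' '{"type":"status"}' '{"type":"transcription","text":"hello world"}' | voicr_print_transcriptions

That call prints only hello world.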

Running It as a Service

For persistent use, the systemd unit is straightforward:

[Service]
ExecStart=/usr/local/bin/voicr daemon
Restart=on-failure

Save that in a user unit file, then enable and start it:

systemctl --user enable --now voicr

It loads the model at startup and stays ready.
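
The [Service] excerpt above does not say where the file lives, and `systemctl enable` also needs an [Install] section to know what to hook the unit into. Here is one way to write a complete user unit. The [Unit] and [Install] sections, the default directory, and the name `write_voicr_unit` are my assumptions, not copied from the voicr repository.

```shell
#!/bin/sh
# Sketch: write a complete systemd user unit for the daemon.
# The [Unit]/[Install] sections and the default path are assumptions;
# only the [Service] lines come from the post.
write_voicr_unit() {
    unit_dir="${1:-$HOME/.config/systemd/user}"
    mkdir -p "$unit_dir" || return 1
    cat > "$unit_dir/voicr.service" <<'EOF'
[Unit]
Description=Voicr speech-to-text daemon

[Service]
ExecStart=/usr/local/bin/voicr daemon
Restart=on-failure

[Install]
WantedBy=default.target
EOF
}
```

After calling `write_voicr_unit`, run `systemctl --user daemon-reload` so systemd sees the new file, then use the enable command as before.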

Install

curl -L https://github.com/habitual69/voicr/releases/latest/download/voicr-linux-x86_64 -o voicr
chmod +x voicr && sudo mv voicr /usr/local/bin/

voicr model download parakeet-tdt-0.6b-v3
voicr transcribe

macOS and Windows binaries are also on the releases page. Daemon mode is Linux/macOS only, because it relies on a Unix socket.

Full source at github.com/habitual69/voicr.
