Flow

Hold a hotkey, speak, release. The text lands in whatever app you're in. Entirely offline, $0/mo.

Role

Creator · Lead engineer

Year

2026

Status

Live

github.com

Local-first · macOS

whisper.cpp · Metal · Groq

Hold · record → Whisper · transcribe → LLM · format

At a glance

Tauri 2 (Rust) · React 19 · whisper.cpp (Metal) · Groq · Ollama · SQLite

Cost: $0
Engines: Local + cloud
Runs offline: Yes
Latency: < 1s warm

01 — Overview

Good dictation lives behind a subscription. Wispr Flow is $15/mo, and every word you speak goes through someone else's cloud. Flow replicates that experience entirely on the machine: hold a global hotkey, speak, release, and formatted text lands in whatever app has focus.

It's built by composing open-source inference offline. whisper.cpp on Metal handles transcription, a local Ollama pass cleans the text, and the result is injected without any per-app integration. The whole thing is free and works with no network.

“A $15/mo cloud product, replicated entirely offline by composing open-source inference on the machine.”

02 — Architecture

Steps 01–03 run while the hotkey is held: capture audio and transcribe it. The moment the key is released, steps 04–06 fire: an LLM formatting pass, clipboard injection into the focused app, then restoring the prior clipboard.

while holding

01 · Capture: global hotkey held → cpal audio capture

02 · Transcribe: whisper.cpp (Metal, cached) or Groq cloud

03 · Release: hotkey released → pipeline kicks off

release: format + inject

04 · Format pass: Ollama local or Groq (strip filler · punctuate · expand)

05 · Inject: clipboard set → synthesized ⌘V into focused app

06 · Restore: previous clipboard restored · history → SQLite

Holding the hotkey captures and transcribes. Releasing it triggers the format pass and the injection back into your app.
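The hand-off between transcription and the format pass can be sketched in a few lines of Rust. This is a minimal illustration using only the standard library, not Flow's actual code: `Transcriber`, `Formatter`, `run_pipeline`, `FixedTranscript` and `NaiveFormatter` are all hypothetical names, and the stub formatter only imitates what the LLM pass does.

```rust
/// Hypothetical stand-in for the transcription engine (whisper.cpp or Groq).
trait Transcriber {
    fn transcribe(&self, samples: &[f32]) -> String;
}

/// Hypothetical stand-in for the LLM format pass (Ollama or Groq).
trait Formatter {
    fn format(&self, raw: &str) -> String;
}

/// What the release phase hands to injection and history.
#[derive(Debug, PartialEq)]
struct Dictation {
    raw: String,
    formatted: String,
}

/// Steps 02 and 04: transcribe the captured samples, then format the text.
/// Capture (01) and injection/restore (05, 06) are left to the caller.
fn run_pipeline(samples: &[f32], stt: &dyn Transcriber, llm: &dyn Formatter) -> Option<Dictation> {
    if samples.is_empty() {
        return None; // key tapped without speaking: nothing to inject
    }
    let raw = stt.transcribe(samples);
    let formatted = llm.format(&raw);
    Some(Dictation { raw, formatted })
}

/// Stub engine that always returns a fixed transcript.
struct FixedTranscript(&'static str);

impl Transcriber for FixedTranscript {
    fn transcribe(&self, _samples: &[f32]) -> String {
        self.0.to_string()
    }
}

/// Stub formatter: drops "um"/"uh", capitalizes the first letter, adds a period.
struct NaiveFormatter;

impl Formatter for NaiveFormatter {
    fn format(&self, raw: &str) -> String {
        let mut text = raw
            .split_whitespace()
            .filter(|w| !matches!(w.to_lowercase().as_str(), "um" | "uh"))
            .collect::<Vec<_>>()
            .join(" ");
        // Capitalize the first character if it is a single-byte (ASCII) one.
        let upper = text.get(0..1).map(|c| c.to_uppercase());
        if let Some(upper) = upper {
            text.replace_range(0..1, &upper);
        }
        if !text.is_empty() && !text.ends_with('.') {
            text.push('.');
        }
        text
    }
}
```

Keeping both engines behind traits is one way a local and a cloud backend could share the same pipeline code; whether Flow is structured this way is an assumption.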

03 — Key features

/01
Clipboard injection, not the Accessibility API
Text lands via clipboard plus a synthesized ⌘V, and the previous clipboard is restored after. It's the only approach that works across native apps, Electron, web views and terminals without wiring per-app accessibility hooks.
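The save, set, paste, restore sequence is easy to get subtly wrong, so here is a minimal sketch of the ordering. The `Desktop` trait and `FakeDesktop` are hypothetical; a real build would back them with the system clipboard and a synthesized key event, which this example does not attempt.

```rust
/// Hypothetical abstraction over the system clipboard and key synthesis.
trait Desktop {
    fn clipboard_get(&mut self) -> Option<String>;
    fn clipboard_set(&mut self, text: &str);
    fn clipboard_clear(&mut self);
    /// Would synthesize ⌘V on macOS.
    fn send_paste(&mut self);
}

/// Save the clipboard, paste `text` through it, then put the old value back.
fn inject<D: Desktop>(desktop: &mut D, text: &str) {
    let previous = desktop.clipboard_get();
    desktop.clipboard_set(text);
    desktop.send_paste();
    match previous {
        Some(old) => desktop.clipboard_set(&old),
        None => desktop.clipboard_clear(),
    }
}

/// In-memory fake used to exercise the sequence without a real desktop.
#[derive(Default)]
struct FakeDesktop {
    clipboard: Option<String>,
    pasted: Vec<String>,
}

impl Desktop for FakeDesktop {
    fn clipboard_get(&mut self) -> Option<String> {
        self.clipboard.clone()
    }
    fn clipboard_set(&mut self, text: &str) {
        self.clipboard = Some(text.to_string());
    }
    fn clipboard_clear(&mut self) {
        self.clipboard = None;
    }
    fn send_paste(&mut self) {
        // The focused app receives whatever is on the clipboard right now.
        if let Some(text) = &self.clipboard {
            self.pasted.push(text.clone());
        }
    }
}
```

On a real desktop the paste is handled asynchronously by the target app, so the restore step probably needs a short delay, and this sketch only preserves plain text rather than images or rich content.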
/02
Warm Whisper, sub-second after the first run
A WhisperCache holds the Metal-initialized model between dictations. Only the first dictation after launch pays the model-load cost; every call after that returns in under a second on M-series Macs.
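The caching idea fits in a small sketch. Only the `WhisperCache` name comes from the project; `Model`, `get_or_load`, `load_count` and the reload-on-path-change behavior are assumptions made for illustration, and the expensive whisper.cpp load is replaced by a plain struct construction.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a Metal-initialized whisper.cpp context.
struct Model {
    path: String,
}

/// Keeps one loaded model alive between dictations.
struct WhisperCache {
    slot: Mutex<Option<Arc<Model>>>,
    loads: AtomicU32, // counts cold loads, for illustration only
}

impl WhisperCache {
    fn new() -> Self {
        Self { slot: Mutex::new(None), loads: AtomicU32::new(0) }
    }

    /// Return the cached model, loading it only if the slot is empty
    /// or a different model path was requested.
    fn get_or_load(&self, path: &str) -> Arc<Model> {
        let mut slot = self.slot.lock().unwrap();
        if let Some(model) = &*slot {
            if model.path == path {
                return Arc::clone(model); // warm path: no load cost
            }
        }
        // Cold path: this is where the real model load would happen.
        self.loads.fetch_add(1, Ordering::SeqCst);
        let model = Arc::new(Model { path: path.to_string() });
        *slot = Some(Arc::clone(&model));
        model
    }

    fn load_count(&self) -> u32 {
        self.loads.load(Ordering::SeqCst)
    }
}
```

Handing out an `Arc` means a dictation in flight keeps its model alive even if the user switches models in the meantime.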
/03
Hot path never touches SQLite
Dictionary, snippets and style presets are denormalized into an in-memory PipelineConfig that rebuilds on each DB mutation. The dictation path reads the snapshot, so a write from the management UI never slows a dictation.
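One way to get that snapshot behavior is to swap an `Arc` behind a lock. In this sketch only the `PipelineConfig` name is from the project; its fields, `ConfigStore` and `apply_dictionary` are hypothetical, and the rebuild from SQLite is represented by passing in an already-built value.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Denormalized, read-only view of the settings tables (field names assumed).
#[derive(Debug, Default)]
struct PipelineConfig {
    dictionary: HashMap<String, String>,
    snippets: HashMap<String, String>,
    style_preset: String,
}

/// Holds the current snapshot; writers replace it wholesale.
struct ConfigStore {
    current: RwLock<Arc<PipelineConfig>>,
}

impl ConfigStore {
    fn new(initial: PipelineConfig) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Hot path: clone a pointer, never touch the database.
    fn snapshot(&self) -> Arc<PipelineConfig> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// Called after a DB mutation with a freshly rebuilt config.
    fn replace(&self, rebuilt: PipelineConfig) {
        *self.current.write().unwrap() = Arc::new(rebuilt);
    }
}

/// Apply dictionary replacements word by word against a snapshot.
fn apply_dictionary(cfg: &PipelineConfig, text: &str) -> String {
    text.split_whitespace()
        .map(|w| cfg.dictionary.get(w).map(String::as_str).unwrap_or(w))
        .collect::<Vec<_>>()
        .join(" ")
}
```

A dictation that already holds a snapshot keeps using it to the end, so a settings change mid-dictation takes effect on the next one rather than halfway through.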
/04
Prompt regression suite from source
A test suite extracts the system prompt straight out of the Rust source and asserts against it, so a casual edit to the formatting prompt can't silently drift the model's behavior.
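The extraction step might look like the sketch below. It assumes the prompt lives in a raw-string constant; the constant name `SYSTEM_PROMPT`, the sample prompt text and the `extract_prompt` helper are all hypothetical. A real suite would probably read the file with `include_str!` rather than an inline sample.

```rust
/// Stand-in for the Rust source file that defines the prompt (contents invented).
const SAMPLE_SOURCE: &str = r##"
pub const SYSTEM_PROMPT: &str = r#"Remove filler words. Add punctuation. Never answer questions in the text."#;
"##;

/// Find `const <name>: &str = r#"..."#;` in Rust source text and return the body.
/// Only the single-hash raw-string form is handled in this sketch.
fn extract_prompt<'a>(source: &'a str, const_name: &str) -> Option<&'a str> {
    let decl = format!("const {}: &str = r#\"", const_name);
    let start = source.find(&decl)? + decl.len();
    let len = source[start..].find("\"#")?;
    Some(&source[start..start + len])
}
```

Assertions then run against the extracted text, for example checking that a rule such as "Never answer questions" is still present, so deleting it from the source fails the suite instead of quietly changing the output.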

04 — Technical decisions

05 — What I'd do differently

/01
Move API keys into the macOS Keychain. Cloud engine keys currently sit in plain config; Keychain-backed storage is the right home for a credential on a desktop app, and it's a small migration if done early.
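As a rough idea of what the migration could look like without adding a dependency, the sketch below shells out to the macOS `security` command-line tool. The service name `flow-dictation` and the helper functions are hypothetical, and a production build would more likely call the Keychain APIs through a crate, since passing a secret as a command-line argument exposes it to other local processes.

```rust
use std::process::Command;

/// Hypothetical service name under which the keys would be filed.
const SERVICE: &str = "flow-dictation";

/// Arguments for `security` that add (or update, via -U) a generic password.
fn store_args(account: &str, secret: &str) -> Vec<String> {
    ["add-generic-password", "-U", "-s", SERVICE, "-a", account, "-w", secret]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Arguments for `security` that print only the stored password (-w).
fn read_args(account: &str) -> Vec<String> {
    ["find-generic-password", "-s", SERVICE, "-a", account, "-w"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Store a key in the login keychain; false if the tool is missing or fails.
fn store_key(account: &str, secret: &str) -> bool {
    Command::new("security")
        .args(store_args(account, secret))
        .status()
        .map(|s| s.success())
        .unwrap_or(false)
}

/// Read a key back; None if it is absent or the tool is unavailable.
fn read_key(account: &str) -> Option<String> {
    let out = Command::new("security").args(read_args(account)).output().ok()?;
    if !out.status.success() {
        return None;
    }
    Some(String::from_utf8(out.stdout).ok()?.trim_end().to_string())
}
```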
/02
Switch to streaming transcription instead of record-then-transcribe. Today the audio is captured fully, then transcribed. Streaming the audio into Whisper as it's spoken would cut the perceived latency on longer dictations to nearly zero.
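The buffering half of that change can be sketched without any audio library. `StreamingBuffer` and its methods are hypothetical, and the window size is arbitrary; real streaming with Whisper would likely also need overlapping windows or carried-over context so that words are not cut at chunk boundaries.

```rust
/// Buffers incoming audio and hands off fixed-size windows as soon as they fill.
struct StreamingBuffer {
    window: usize,
    pending: Vec<f32>,
}

impl StreamingBuffer {
    fn new(window: usize) -> Self {
        assert!(window > 0, "window must hold at least one sample");
        Self { window, pending: Vec::new() }
    }

    /// Feed samples from a capture callback; returns every window that became complete.
    fn push(&mut self, samples: &[f32]) -> Vec<Vec<f32>> {
        self.pending.extend_from_slice(samples);
        let mut ready = Vec::new();
        while self.pending.len() >= self.window {
            // Keep the tail as the new pending buffer, emit the full head window.
            let rest = self.pending.split_off(self.window);
            ready.push(std::mem::replace(&mut self.pending, rest));
        }
        ready
    }

    /// On key release: the leftover tail is the only audio still untranscribed.
    fn finish(&mut self) -> Vec<f32> {
        std::mem::take(&mut self.pending)
    }
}
```

With this shape, each completed window can be transcribed while the user is still speaking, and the work left at release shrinks to one partial window regardless of how long the dictation was.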