jgvilchezc.dev

Flow

Hold a hotkey, speak, release. The text lands in whatever app you're in. Entirely offline, $0/mo.

Role: Creator · Lead engineer
Year: 2026
Status: Live
Local-first · macOS
Hold to record → Whisper transcribes → LLM formats
José Gabriel Vilchez Carrasquero
At a glance
Cost: $0
Engines: Local + cloud
Runs offline: Yes
Latency: < 1s warm
01 Overview

Good dictation lives behind a subscription. Wispr Flow is $15/mo, and every word you speak goes through someone else's cloud. Flow replicates that experience entirely on the machine: hold a global hotkey, speak, release, and formatted text lands in whatever app has focus.

It's built by composing open-source inference tools that run locally. whisper.cpp on Metal handles transcription, a local Ollama pass cleans up the text, and the result is injected without any per-app integration. The whole thing is free and works with no network.

A $15/mo cloud product, replicated entirely offline by composing open-source inference on the machine.

02 Architecture

Steps 01–03 run while the hotkey is held: capture audio and transcribe it. The moment the key is released, steps 04–06 fire: an LLM formatting pass, clipboard injection into the focused app, then restoring the prior clipboard.

While holding
  01 Capture: global hotkey held → cpal audio capture
  02 Transcribe: whisper.cpp (Metal, cached) or Groq cloud
  03 Release: hotkey released → pipeline kicks off
On release (format + inject)
  04 Format pass: Ollama local or Groq: strip filler · punctuate · expand
  05 Inject: clipboard set → synthesized ⌘V into focused app
  06 Restore: previous clipboard restored · history → SQLite
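
The release phase (steps 04–06) can be sketched as below. This is a minimal illustration under stated assumptions, not Flow's actual code: the `Clipboard` trait, `FakeClipboard`, `format_pass`, and `on_release` are hypothetical names, the format pass is a toy stand-in for the LLM call, and `paste` stands in for the synthesized ⌘V.

```rust
// Hypothetical sketch of steps 04-06; none of these names come from Flow's source.

/// Minimal clipboard abstraction so the flow can be exercised without macOS.
trait Clipboard {
    fn get(&self) -> Option<String>;
    fn set(&mut self, text: &str);
    /// Stand-in for synthesizing a paste keystroke into the focused app.
    fn paste(&mut self);
}

/// In-memory clipboard that records everything that was "pasted".
#[derive(Default)]
struct FakeClipboard {
    contents: Option<String>,
    pasted: Vec<String>,
}

impl Clipboard for FakeClipboard {
    fn get(&self) -> Option<String> {
        self.contents.clone()
    }
    fn set(&mut self, text: &str) {
        self.contents = Some(text.to_string());
    }
    fn paste(&mut self) {
        if let Some(text) = &self.contents {
            self.pasted.push(text.clone());
        }
    }
}

/// Step 04 stand-in: drop two filler words, capitalize, add a final period.
/// The real pass is an LLM call (Ollama or Groq); this is only a placeholder.
fn format_pass(raw: &str) -> String {
    let words: Vec<&str> = raw
        .split_whitespace()
        .filter(|w| !matches!(w.to_lowercase().as_str(), "um" | "uh"))
        .collect();
    let mut text = words.join(" ");
    let first = text.chars().next();
    if let Some(first) = first {
        let upper: String = first.to_uppercase().collect();
        text.replace_range(..first.len_utf8(), &upper);
    }
    if !text.is_empty() && !text.ends_with(|c: char| matches!(c, '.' | '!' | '?')) {
        text.push('.');
    }
    text
}

/// Steps 04-06: format, inject through the clipboard, then restore it.
fn on_release<C: Clipboard>(clipboard: &mut C, transcript: &str) -> String {
    let formatted = format_pass(transcript); // 04 Format pass
    let previous = clipboard.get(); // remember what the user had copied
    clipboard.set(&formatted); // 05 Inject: set the clipboard...
    clipboard.paste(); // ...then paste into the focused app
    if let Some(prev) = previous {
        clipboard.set(&prev); // 06 Restore the prior clipboard
    }
    formatted
}
```

Calling `on_release(&mut clipboard, "um hello world")` on a `FakeClipboard` that already holds some text pastes the formatted string and leaves the original clipboard contents in place. Keeping the clipboard behind a trait is one way to test the ordering without touching the system pasteboard.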


03 Key features

  • /01

    Clipboard injection, not the Accessibility API

    Text lands via clipboard plus a synthesized ⌘V, and the previous clipboard is restored after. It's the only approach that works across native apps, Electron, web views and terminals without wiring per-app accessibility hooks.

  • /02

    Warm Whisper, sub-second after the first run

    A WhisperCache holds the Metal-initialized model between dictations. Only the first dictation after launch pays the model-load cost; every call after that returns in under a second on M-series Macs.

  • /03

    Hot path never touches SQLite

    Dictionary, snippets and style presets are denormalized into an in-memory PipelineConfig that rebuilds on each DB mutation. The dictation path reads the snapshot, so a write from the management UI never slows a dictation.

  • /04

    Prompt regression suite from source

    A test suite extracts the system prompt straight out of the Rust source and asserts against it, so a casual edit to the formatting prompt can't silently drift the model's behavior.
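
Features /02 and /03 both keep expensive work off the dictation path: load once and reuse, or rebuild elsewhere and swap. A minimal sketch of the two patterns follows, using only the standard library. The struct layouts, method names, and the `base.en` model name in the usage note are assumptions for illustration; the write-up names `WhisperCache` and `PipelineConfig` but does not show their internals.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

// Hypothetical stand-ins: Flow's real WhisperCache and PipelineConfig are not shown.

/// Placeholder for a Metal-initialized Whisper model.
struct Model {
    name: String,
}

struct CacheState {
    model: Option<Arc<Model>>,
    loads: u32, // counts cold loads, for illustration only
}

/// Loads the model on first use and hands out the same instance afterwards.
struct WhisperCache {
    state: Mutex<CacheState>,
}

impl WhisperCache {
    fn new() -> Self {
        Self { state: Mutex::new(CacheState { model: None, loads: 0 }) }
    }

    fn get_or_load(&self, name: &str) -> Arc<Model> {
        let mut state = self.state.lock().unwrap();
        if let Some(model) = &state.model {
            if model.name == name {
                return Arc::clone(model); // warm path: no load
            }
        }
        // Cold path: the expensive model load would happen here.
        state.loads += 1;
        let model = Arc::new(Model { name: name.to_string() });
        state.model = Some(Arc::clone(&model));
        model
    }

    fn load_count(&self) -> u32 {
        self.state.lock().unwrap().loads
    }
}

/// Denormalized settings read by the dictation path. Snippets and style
/// presets would sit alongside the dictionary in a fuller version.
#[derive(Clone, Default)]
struct PipelineConfig {
    dictionary: HashMap<String, String>,
}

struct ConfigStore {
    current: RwLock<Arc<PipelineConfig>>,
}

impl ConfigStore {
    fn new() -> Self {
        Self { current: RwLock::new(Arc::new(PipelineConfig::default())) }
    }

    /// Hot path: clone the Arc and release the lock immediately.
    fn snapshot(&self) -> Arc<PipelineConfig> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    /// After a DB mutation: the new config is built elsewhere, then swapped in.
    fn rebuild(&self, next: PipelineConfig) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}

/// Replace whole words that have a dictionary entry; reads only the snapshot.
fn apply_dictionary(cfg: &PipelineConfig, text: &str) -> String {
    text.split_whitespace()
        .map(|w| cfg.dictionary.get(w).map(String::as_str).unwrap_or(w))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Two calls to `get_or_load("base.en")` return the same `Arc` and increment the load counter once. A snapshot taken before `rebuild` keeps the old dictionary, so an in-flight dictation sees a consistent config while later dictations pick up the new one.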

04 Technical decisions

05 What I'd do differently

  • /01

    Move API keys into the macOS Keychain. Cloud engine keys currently sit in plain config; Keychain-backed storage is the right home for a credential in a desktop app, and the migration is small if done early.

  • /02

    Switch to streaming transcription instead of record-then-transcribe. Today the audio is captured fully, then transcribed. Streaming the audio into Whisper as it's spoken would cut the felt latency on longer dictations to close to zero.
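
The streaming idea in /02 could take roughly the shape below: a capture side sends audio chunks over a channel while a worker transcribes them as they arrive, so only the final chunk is left to process on release. This is a speculative sketch, not a description of Flow. `transcribe_chunk`, `stream_transcribe`, and `simulate_dictation` are hypothetical, and the "transcription" is a placeholder that only reports chunk sizes.

```rust
use std::sync::mpsc;
use std::thread;

/// Placeholder for transcribing one audio chunk; a real version would call Whisper.
fn transcribe_chunk(samples: &[f32]) -> String {
    format!("[{} samples]", samples.len())
}

/// Consume chunks as they arrive and return the joined transcript.
/// The loop ends when every sender has been dropped (the key was released).
fn stream_transcribe(chunks: mpsc::Receiver<Vec<f32>>) -> String {
    let mut parts = Vec::new();
    for chunk in chunks {
        parts.push(transcribe_chunk(&chunk));
    }
    parts.join(" ")
}

/// Simulate one dictation: the calling thread plays the role of audio capture,
/// sending fixed-size chunks while a worker thread transcribes concurrently.
/// `chunk_size` must be greater than zero.
fn simulate_dictation(total_samples: usize, chunk_size: usize) -> String {
    let (tx, rx) = mpsc::channel();
    let worker = thread::spawn(move || stream_transcribe(rx));

    let audio = vec![0.0f32; total_samples];
    for chunk in audio.chunks(chunk_size) {
        tx.send(chunk.to_vec()).expect("transcriber hung up");
    }
    drop(tx); // hotkey released: no more audio will arrive

    worker.join().expect("transcriber panicked")
}
```

Fixed-size chunks can split a word across a boundary, so a real implementation would likely need overlapping windows or cuts at pauses; that trade-off is not addressed here.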

Flow concept key visual: a glowing coral voice waveform on the left resolving into clean, structured lines of formatted text on the right — the metaphor for raw speech becoming written text.
Flow concept: a glowing coral listening orb pulsing over a blurred macOS desktop — Flow injects formatted text into whatever app is focused, anywhere on the system.
Flow's Home screen: a dictation feed where each raw utterance is rewritten into clean, formatted text, with word-count, words-per-minute and day-streak stats above.
Flow's Insights screen: a words-per-minute gauge, totals for words dictated and words corrected, current and longest streaks, and a per-app usage breakdown.
Flow's Settings screen: the speech-to-text engine toggle between local whisper.cpp (offline) and Groq cloud, downloadable Whisper models, and the smart-formatting LLM selector.