Where my ideas actually happen
I keep a running list of where I've had ideas I cared about in the last year. Almost none of them happened at my desk.
Walking the dog. Three minutes into a shower. Driving back from dropping my kid off. Standing in the kitchen waiting for water to boil. One especially good one came to me on a treadmill, which I resent.
The pattern is obvious once you write it down. My desk is where I execute. Somewhere else is where I think.
What I tried
For a while I tried to be disciplined about it. Pull out the phone, unlock it, find Notes, tap into a new note, type with my thumbs while walking. By the third sentence I'd lost the back half of the idea — the part that made it interesting, the connective tissue between the thing I started saying and the thing I was reaching toward.
Voice memos were better. But then I had eighty-seven of them named "New Recording 23" and I never went back.
I tried Siri. I tried Otter. I tried dictating into Notes. Every one of them had at least one of these problems: it sent my voice somewhere, it was slow, it stopped listening after a sentence, it didn't work offline, or it produced a transcript I had to clean up by hand before it was useful to anyone, including me.
The numbers I keep coming back to
People speak at around 150 words per minute. Most people type closer to 40 or 50. That's a 3x to 4x throughput gap on the input side, and it widens further if you account for the fact that you can speak while walking and you cannot, in any meaningful sense, type while walking.
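To make that gap concrete, here is the arithmetic as a small Python sketch. The speaking and typing rates are the ones quoted above; the 100-word idea length is a number I made up for illustration, not a measurement.

```python
SPEAKING_WPM = 150          # rough speaking rate quoted above
TYPING_WPM_RANGE = (40, 50) # rough typing rates quoted above
IDEA_WORDS = 100            # hypothetical length of one captured idea


def seconds_to_capture(words: int, words_per_minute: float) -> float:
    """Seconds needed to get `words` words down at a given rate."""
    return words * 60 / words_per_minute


def throughput_ratio(fast_wpm: float, slow_wpm: float) -> float:
    """How many times faster the fast input method is."""
    return fast_wpm / slow_wpm


if __name__ == "__main__":
    slow_typing, fast_typing = TYPING_WPM_RANGE
    print("speaking:", seconds_to_capture(IDEA_WORDS, SPEAKING_WPM), "seconds")
    print("typing (fast):", seconds_to_capture(IDEA_WORDS, fast_typing), "seconds")
    print("typing (slow):", seconds_to_capture(IDEA_WORDS, slow_typing), "seconds")
    print("gap:", throughput_ratio(SPEAKING_WPM, fast_typing), "to",
          throughput_ratio(SPEAKING_WPM, slow_typing), "x")
```

Under those assumptions the same idea takes 40 seconds to say and two to two and a half minutes to type, and that is before counting the time spent unlocking the phone and finding the app.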
The interesting thing isn't the speed. It's that the slower interface has eaten the entire computing stack. Every productivity tool I use assumes I am sitting in a chair with my hands on a keyboard. The moments I'm most likely to have a thought worth keeping are the moments I am specifically not doing that.
Why now
Two things are true now that didn't use to be.
Transcription got good enough to run on the device in your pocket. Not "good enough for a demo" — good enough that I trust it with my actual thinking. No upload, no latency, no API key, no monthly bill, no surprise outage.
And small language models got good enough to take a messy transcript — the "uhh" and the false starts and the "wait, scratch that" — and pull out the actual structure. A 1.5B model running locally can turn a rambling four-minute voice memo into a clean note with the action items pulled out. I wrote about the bash dictation version of this — the same shape works for prose.
Neither of those was true five years ago. Both are true now, on hardware most people already own.
What this actually buys you
A capture loop where the latency between thought and saved-thought is roughly zero. You speak. It transcribes. Local model cleans it up and routes it. Nothing leaves the device unless you ask it to.
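The shape of that loop is simple enough to sketch. What follows is a minimal Python outline, not my actual setup: every name in it (`Note`, `capture`, `fake_transcribe`, `rule_based_clean`, `route_by_actions`) is hypothetical, the transcriber is a stub that returns a fixed string, and the cleanup step is a crude regex pass standing in for the local language model. The point is only that the three stages are plain functions you can swap out, and that nothing in the loop touches the network.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Note:
    text: str
    actions: List[str] = field(default_factory=list)


def capture(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    clean: Callable[[str], Note],
    route: Callable[[Note], str],
) -> Tuple[Note, str]:
    """Run one pass of the loop: speech -> transcript -> note -> destination."""
    raw = transcribe(audio)
    note = clean(raw)
    return note, route(note)


# --- Stand-ins below. A real setup would plug in on-device models here. ---

def fake_transcribe(audio: bytes) -> str:
    # Stub: pretend the on-device transcriber produced this text.
    return "uhh so the idea is voice first capture. todo email Sam about the demo."


# Matches standalone filler words plus any trailing comma or whitespace.
FILLERS = re.compile(r"\b(?:uhh+|umm+|uh|um)\b[,\s]*", re.IGNORECASE)


def rule_based_clean(raw: str) -> Note:
    # Crude substitute for the language model: drop fillers, then treat
    # any sentence starting with "todo " as an action item.
    text = FILLERS.sub("", raw).strip()
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    actions = [s[5:].rstrip(".") for s in sentences if s.lower().startswith("todo ")]
    body = " ".join(s for s in sentences if not s.lower().startswith("todo "))
    return Note(text=body, actions=actions)


def route_by_actions(note: Note) -> str:
    # Notes with action items go to a task list, everything else to an inbox.
    return "tasks" if note.actions else "inbox"


if __name__ == "__main__":
    note, destination = capture(b"", fake_transcribe, rule_based_clean, route_by_actions)
    print(destination, "|", note.text, "|", note.actions)
```

Passing the stages in as arguments is a deliberate choice in this sketch: it keeps the loop itself trivial and makes it easy to test with stubs before wiring in real models.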
That's it. That's the whole pitch. Voice isn't replacing the keyboard for editing a spreadsheet or writing code. Voice is for the part of the day when there isn't a keyboard, which, if I'm honest, is when most of the good stuff happens.
The thing I didn't expect going in: once capture stopped being friction, I started having more ideas, not just catching more of them. Apparently some of those ideas weren't showing up before because some part of me knew I wouldn't be able to hold onto them.