Lifecycle
Understanding the journey from voice to action. Every recording flows through distinct phases, each with natural extension points where you can plug in custom logic.
Overview
Talkie processes voice in two main flows: Dictation (real-time, handled by TalkieAgent) and Memos (deliberate recordings, handled by the main app). Both share similar phases but differ in timing and intent.
Dictation
Press hotkey, speak, release. Text appears where your cursor is. Fast, in-flow, ephemeral.
~500ms to paste
Memo
Deliberate recording that becomes a searchable note. Triggers workflows for processing, summarizing, extracting.
Permanent, indexed
Dictation Lifecycle
The dictation flow is optimized for speed. From hotkey press to text appearing, every millisecond counts. Here's the complete journey:
Capture
~50ms setup
Audio flows from microphone to temporary file. Context is captured to know where you were when you started speaking.
Inspect or modify the capture context. A hook here could, for example, auto-route based on which app is active.
When: After context captured, before audio starts
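As an illustration only, a capture hook could look like the sketch below. Talkie does not expose this hook today, so the `CaptureContext` shape, its field names, and the app names are all assumptions made for the example. The three destinations mirror the ones described in the Routing phase.

```typescript
// Hypothetical shape: Talkie does not expose this type or hook today.
interface CaptureContext {
  appName: string; // app that was active when capture began (assumed field)
  destination: "paste" | "clipboard" | "scratchpad";
  enabled: boolean; // false means "skip this dictation" (assumed field)
}

// Hypothetical hook: runs after context is captured, before audio starts.
function onCaptureStart(ctx: CaptureContext): CaptureContext {
  // Disable dictation in a context where pasted text would be unwelcome.
  if (ctx.appName === "Keychain Access") {
    return { ...ctx, enabled: false };
  }
  // Auto-route: send dictation started in a terminal to the scratchpad.
  if (ctx.appName === "Terminal") {
    return { ...ctx, destination: "scratchpad" };
  }
  return ctx;
}

const routed = onCaptureStart({
  appName: "Terminal",
  destination: "paste",
  enabled: true,
});
console.log(routed.destination); // "scratchpad"
```

Returning a new context instead of mutating the input keeps the hook easy to test and lets the caller compare the before and after values.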
Transcription
~300-800ms
Audio is sent to TalkieEngine for local Whisper transcription. Before that happens, the audio file is saved permanently, so your recording is never lost.
Transform or validate the transcript: apply custom corrections, filter content, or route differently based on what was said.
When: After transcription, before routing
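A custom-corrections hook at this point might be sketched as follows. The hook signature and the correction table are hypothetical. The sketch assumes the hook receives the transcript as a plain string and returns the corrected string.

```typescript
// Hypothetical correction table: misheard phrase -> intended text.
const corrections: Array<[string, string]> = [
  ["talky", "Talkie"],
  ["get hub", "GitHub"],
];

// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical hook: runs after transcription, before routing.
function onTranscriptionComplete(transcript: string): string {
  let result = transcript;
  for (const [wrong, right] of corrections) {
    // Match whole words only, ignoring case.
    const pattern = new RegExp(`\\b${escapeRegExp(wrong)}\\b`, "gi");
    result = result.replace(pattern, right);
  }
  return result.trim();
}

console.log(onTranscriptionComplete("open get hub in talky "));
// "open GitHub in Talkie"
```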
Routing
~50ms
The transcript reaches its destination: pasted into the active app, copied to the clipboard, or routed to the scratchpad for editing.
Intercept before delivery. A hook here could trigger different behavior based on keywords, app context, or custom rules.
When: After routing decision, before text delivery
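One way to picture keyword interception is the sketch below, which treats an utterance that begins with a wake phrase as a command instead of text to deliver. The `RouteDecision` and `RouteResult` types and the wake-phrase handling are assumptions for illustration, not part of Talkie.

```typescript
type Destination = "paste" | "clipboard" | "scratchpad";

// Hypothetical shape of the routing decision handed to the hook.
interface RouteDecision {
  destination: Destination;
  text: string;
}

// Hypothetical result: deliver the text as planned, or treat it as a command.
type RouteResult =
  | { kind: "deliver"; decision: RouteDecision }
  | { kind: "command"; command: string };

const WAKE_PHRASE = "hey talkie";

// Hypothetical hook: runs after the routing decision, before text delivery.
function beforeRoute(decision: RouteDecision): RouteResult {
  const normalized = decision.text.trim().toLowerCase();
  if (normalized.indexOf(WAKE_PHRASE) === 0) {
    // Drop the wake phrase plus any punctuation or spaces that follow it.
    const command = normalized
      .slice(WAKE_PHRASE.length)
      .replace(/^[\s,.:]+/, "");
    return { kind: "command", command };
  }
  return { kind: "deliver", decision };
}

const intercepted = beforeRoute({
  destination: "paste",
  text: "Hey Talkie, open scratchpad",
});
console.log(intercepted.kind); // "command"
```

The discriminated union makes the caller handle both outcomes explicitly, so a command can never be pasted by accident.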
Storage
~10ms
The dictation is saved to the local database with full metadata. Available for search, review, and later processing.
React to completed dictations. A hook here could trigger follow-up actions, sync to external services, or update statistics.
When: After database write, before state reset
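The "update statistics" case could be sketched like this. The `StoredDictation` record and its fields are hypothetical stand-ins for whatever metadata the database actually stores.

```typescript
// Hypothetical record shape; the real stored metadata may differ.
interface StoredDictation {
  id: string;
  text: string;
  appName: string;
}

interface DictationStats {
  count: number;
  totalWords: number;
}

const stats: DictationStats = { count: 0, totalWords: 0 };

// Count whitespace-separated words; an empty or blank string has zero.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Hypothetical hook: runs after the database write, before state reset.
function onDictationStored(record: StoredDictation): void {
  stats.count += 1;
  stats.totalWords += countWords(record.text);
}

onDictationStored({
  id: "example-1",
  text: "send the report by Friday",
  appName: "Mail",
});
console.log(stats); // { count: 1, totalWords: 5 }
```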
Memo Lifecycle
Memos are deliberate recordings that persist and get processed. Unlike dictation, memos trigger workflows that can summarize, extract tasks, or integrate with other systems.
Creation
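When you create a memo (via the main Talkie app), the recording follows a similar path but ends differently: instead of being delivered as text, it becomes a permanent, searchable note that can trigger workflows.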
Workflow Execution
Workflows are multi-step pipelines that process memo content. Each step can transform, extract, or route the content.
Step types include LLM prompts (transform via AI), file actions (save to disk), clipboard (copy result), and webhooks (POST to external URL).
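The four step types can be modeled as a small pipeline, sketched below. This is not Talkie's workflow engine: the `WorkflowStep` shapes, the `StepHandlers` interface, and the file name are assumptions. Side effects are injected and synchronous so the example runs without a model, disk, or network. A real LLM call or webhook would be asynchronous.

```typescript
// Hypothetical step shapes mirroring the four step types named above.
type WorkflowStep =
  | { type: "llm"; prompt: string }
  | { type: "file"; path: string }
  | { type: "clipboard" }
  | { type: "webhook"; url: string };

// Injected side effects (assumed interface, synchronous for simplicity).
interface StepHandlers {
  runLLM(prompt: string, input: string): string;
  saveFile(path: string, content: string): void;
  copyToClipboard(content: string): void;
  postWebhook(url: string, content: string): void;
}

// Run steps in order. Only LLM steps transform the content; the
// other step types deliver the current content somewhere.
function runWorkflow(
  steps: WorkflowStep[],
  memo: string,
  handlers: StepHandlers
): string {
  let content = memo;
  for (const step of steps) {
    switch (step.type) {
      case "llm":
        content = handlers.runLLM(step.prompt, content);
        break;
      case "file":
        handlers.saveFile(step.path, content);
        break;
      case "clipboard":
        handlers.copyToClipboard(content);
        break;
      case "webhook":
        handlers.postWebhook(step.url, content);
        break;
    }
  }
  return content;
}

// Fake handlers that record what would have happened.
const log: string[] = [];
const fakeHandlers: StepHandlers = {
  runLLM: (prompt, input) => prompt + ": " + input.toUpperCase(),
  saveFile: (path, content) => { log.push("file:" + path + ":" + content); },
  copyToClipboard: (content) => { log.push("clipboard:" + content); },
  postWebhook: (url, content) => { log.push("webhook:" + url + ":" + content); },
};

const result = runWorkflow(
  [
    { type: "llm", prompt: "Summarize" },
    { type: "file", path: "summary.txt" },
    { type: "clipboard" },
  ],
  "buy milk",
  fakeHandlers
);
console.log(result); // "Summarize: BUY MILK"
```

Because each step sees the output of the previous LLM step, the order of steps matters: a file action placed before the prompt would save the raw memo instead of the summary.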
Extension Points
These are the natural seams in the lifecycle where custom logic could be injected. They represent moments where the flow pauses, a decision is made, or data transforms.
| Hook | Phase | Use Cases |
|---|---|---|
| onCaptureStart | Capture | Auto-route based on app, disable in certain contexts |
| onTranscriptionComplete | Transcription | Custom corrections, keyword detection, content filtering |
| beforeRoute | Routing | Intercept commands ("hey talkie"), transform output |
| onDictationStored | Storage | Sync to external service, trigger notifications |
| onMemoCreated | Memo | Auto-categorize, trigger custom workflows |
| beforeWorkflowStep | Workflow | Inject data, modify prompts, skip steps conditionally |
These extension points aren't implemented yet; this section documents where they could exist. The lifecycle naturally pauses at these moments, making them ideal for hooks.
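If these hooks were implemented, one plausible design is a small registry keyed by the hook names in the table. The sketch below is purely illustrative. The `HookRegistry` class is hypothetical, and payloads are simplified to strings, whereas a real design would likely give each hook its own payload type.

```typescript
// Hook names taken from the table above.
type HookName =
  | "onCaptureStart"
  | "onTranscriptionComplete"
  | "beforeRoute"
  | "onDictationStored"
  | "onMemoCreated"
  | "beforeWorkflowStep";

// Simplifying assumption: every hook maps a string payload to a string.
type Handler = (payload: string) => string;

// Hypothetical registry; not part of Talkie.
class HookRegistry {
  private handlers: { [name: string]: Handler[] } = {};

  register(name: HookName, handler: Handler): void {
    const list = this.handlers[name] || [];
    list.push(handler);
    this.handlers[name] = list;
  }

  // Run handlers in registration order, feeding each result into the next.
  // With no handlers registered, the payload passes through unchanged.
  run(name: HookName, payload: string): string {
    const list = this.handlers[name] || [];
    return list.reduce((value, handler) => handler(value), payload);
  }
}

const registry = new HookRegistry();
registry.register("onTranscriptionComplete", (t) => t.trim());
registry.register("onTranscriptionComplete", (t) => t + ".");

console.log(registry.run("onTranscriptionComplete", "  hello ")); // "hello."
console.log(registry.run("beforeRoute", "untouched")); // "untouched"
```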