
Lifecycle

Understanding the journey from voice to action. Every recording flows through distinct phases, each with natural extension points where custom logic could plug in.

Overview

Talkie processes voice in two main flows: Dictation (real-time, handled by TalkieAgent) and Memos (deliberate recordings, handled by the main app). Both share similar phases but differ in timing and intent.

Dictation

Press hotkey, speak, release. Text appears where your cursor is. Fast, in-flow, ephemeral.

~500ms to paste

Memo

Deliberate recording that becomes a searchable note. Triggers workflows for processing, summarizing, extracting.

Permanent, indexed

Dictation Lifecycle

The dictation flow is optimized for speed. From hotkey press to text appearing, every millisecond counts. Here's the complete journey:

Capture

~50ms setup

Audio flows from microphone to temporary file. Context is captured to know where you were when you started speaking.

1. Hotkey detected: Carbon event handler fires immediately
2. Context captured (hook point): which app, window, and selected text
3. Audio capture starts: TalkieAgent begins recording via AudioCapture
4. State broadcast: XPC notifies the main app; UI updates
Hook: onCaptureStart

Inspect or modify the capture context. For example, a hook could auto-route based on which app is active.

When: After the context is captured, before audio capture starts
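onCaptureStart does not exist yet, so the following is only a minimal Swift sketch of what such a hook might look like. It assumes the hook receives the captured context and returns a decision. CaptureContext, CaptureDecision, the OnCaptureStart signature, and the bundle IDs in the sets are all hypothetical.

```swift
// Hypothetical sketch: onCaptureStart is not implemented in Talkie.
// CaptureContext, CaptureDecision, and the bundle IDs below are illustrative assumptions.

struct CaptureContext {
    let appBundleID: String      // which app was frontmost at hotkey press
    let windowTitle: String?     // which window
    let selectedText: String?    // any selected text
}

enum CaptureDecision: Equatable {
    case proceed                 // record and route normally
    case routeToScratchpad       // record, but send the transcript to the scratchpad
    case skip                    // do not start audio capture at all
}

typealias OnCaptureStart = (CaptureContext) -> CaptureDecision

// Example hook: auto-route based on the active app, or disable capture in certain contexts.
let onCaptureStart: OnCaptureStart = { context in
    let disabledApps: Set<String> = ["com.example.PasswordManager"]  // hypothetical
    let scratchpadApps: Set<String> = ["com.apple.Terminal"]         // hypothetical
    if disabledApps.contains(context.appBundleID) { return .skip }
    if scratchpadApps.contains(context.appBundleID) { return .routeToScratchpad }
    return .proceed
}
```

Because the hook runs inside the ~50ms setup window, it should stay a cheap lookup like this rather than doing I/O.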

Transcription

~300-800ms

Audio is sent to TalkieEngine for local Whisper transcription. The audio file is saved to permanent storage first, so your recording is never lost.

5. Hotkey released: stop capture, transition to transcribing
6. Audio saved: copied to permanent storage before processing
7. Transcription request (hook point): sent to TalkieEngine via XPC
8. Text returned (hook point): the Whisper model returns the transcript
Hook: onTranscriptionComplete

Transform or validate the transcript: apply custom corrections, filter content, or route differently based on what was said.

When: After transcription, before routing
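Below is a minimal sketch of an onTranscriptionComplete hook. It assumes, hypothetically, that the hook receives the transcript string and returns either a replacement or nil to discard it. makeCorrectionHook, the corrections table, and the nil convention are illustrative and are not an existing Talkie API.

```swift
// Hypothetical sketch: onTranscriptionComplete is not implemented in Talkie.
// Returning nil means "discard this transcript"; that convention is an assumption.

typealias OnTranscriptionComplete = (String) -> String?

/// Builds a hook that filters blocked words and applies whole-word corrections.
func makeCorrectionHook(corrections: [String: String],
                        blocked: Set<String>) -> OnTranscriptionComplete {
    return { transcript in
        let words = transcript.split(separator: " ").map(String.init)
        // Content filtering: drop the whole transcript if any word is blocked.
        if words.contains(where: { blocked.contains($0.lowercased()) }) {
            return nil
        }
        // Custom corrections: swap commonly misheard words for the intended ones.
        let corrected = words.map { corrections[$0.lowercased()] ?? $0 }
        return corrected.joined(separator: " ")
    }
}
```

Splitting on spaces keeps the sketch short, but it misses words with attached punctuation (such as "talky,"). A real correction pass would need proper tokenization.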

Routing

~50ms

The transcript reaches its destination: it is pasted into the active app, copied to the clipboard, or routed to the scratchpad for editing.

9. Routing decision (hook point): paste, clipboard, or scratchpad
10. Text delivered: keyboard simulation or clipboard write
11. Sound feedback: audible confirmation that delivery succeeded
Hook: beforeRoute

Intercept the transcript before delivery, for example to trigger different behavior based on keywords, app context, or custom rules.

When: After the routing decision, before text delivery
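Here is one way a beforeRoute hook could intercept a spoken command such as "hey talkie". RouteTarget mirrors the three destinations named above. RouteAction, the beforeRoute signature, and the prefix-matching rule are hypothetical.

```swift
// Hypothetical sketch: beforeRoute is not implemented in Talkie.
// RouteAction and the "hey talkie" prefix rule are illustrative assumptions.

enum RouteTarget: Equatable { case paste, clipboard, scratchpad }

enum RouteAction: Equatable {
    case deliver(String, to: RouteTarget)  // continue with (possibly changed) text and target
    case command(String)                   // swallow the text and treat it as a command
}

func beforeRoute(_ text: String, proposed: RouteTarget) -> RouteAction {
    let trigger = "hey talkie"
    guard text.lowercased().hasPrefix(trigger) else {
        return .deliver(text, to: proposed)  // normal path: no interception
    }
    // Assumes the matched prefix has the same character count in the original text
    // (true for ASCII), then strips any following comma or spaces.
    let rest = text.dropFirst(trigger.count).drop(while: { $0 == "," || $0 == " " })
    return .command(String(rest))
}
```

Returning an explicit action, rather than mutating shared state, keeps the routing decision (step 9) and delivery (step 10) cleanly separated.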

Storage

~10ms

The dictation is saved to the local database with full metadata. Available for search, review, and later processing.

12. Record created (hook point): LiveDictation saved to GRDB
13. Context enrichment: asynchronous enhancement with bridge mapping
14. XPC notification: main app notified of the new dictation
15. State reset: ready for the next dictation
Hook: onDictationStored

React to completed dictations, for example to trigger follow-up actions, sync to external services, or update statistics.

When: After the database write, before state reset
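onDictationStored would run after the database write but before state reset, so a hook there should be cheap. The sketch below keeps running statistics in memory. It is built on a hypothetical StoredDictation type, which is a simplified stand-in and not the real LiveDictation record. DictationStats and the hook signature are also assumptions.

```swift
// Hypothetical sketch: onDictationStored is not implemented in Talkie.
// StoredDictation is a simplified stand-in for the real LiveDictation record.

struct StoredDictation {
    let text: String
    let appBundleID: String
}

typealias OnDictationStored = (StoredDictation) -> Void

/// In-memory statistics a hook could update after each stored dictation.
final class DictationStats {
    private(set) var count = 0
    private(set) var wordCount = 0
    private(set) var perApp: [String: Int] = [:]

    func record(_ dictation: StoredDictation) {
        count += 1
        wordCount += dictation.text.split(separator: " ").count
        perApp[dictation.appBundleID, default: 0] += 1
    }
}

let stats = DictationStats()
let onDictationStored: OnDictationStored = { stats.record($0) }
```

Slower reactions, such as syncing to an external service, would be better dispatched asynchronously so they do not delay the state reset.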

Memo Lifecycle

Memos are deliberate recordings that persist and get processed. Unlike dictation, memos trigger workflows that can summarize, extract tasks, or integrate with other systems.

Creation

When you create a memo (via the main Talkie app), the recording follows a similar path but ends differently:

1. Record: audio captured via AVAudioRecorder
2. Transcribe: sent to TalkieEngine via EngineClient
3. Save as memo: stored in GRDB with an audio file reference
4. Trigger workflows: auto-run workflows execute in order
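The "trigger workflows" step could be as simple as selecting the auto-run workflows and ordering them. This sketch assumes a hypothetical WorkflowSummary with an autoRun flag and an integer order key. How Talkie actually stores workflow order is not described here.

```swift
// Hypothetical sketch: WorkflowSummary, autoRun, and order are illustrative assumptions.

struct WorkflowSummary {
    let name: String
    let autoRun: Bool
    let order: Int
}

/// Returns the names of the workflows to run for a new memo, in execution order.
func workflowsToTrigger(_ all: [WorkflowSummary]) -> [String] {
    all.filter(\.autoRun)
        .sorted { $0.order < $1.order }
        .map(\.name)
}
```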

Workflow Execution

Workflows are multi-step pipelines that process memo content. Each step can transform, extract, or route the content.

Workflow Execution Flow
1. Load the workflow definition (JSON)
2. Build context from the memo (transcript, title, date)
3. Execute steps sequentially (hook point)
4. Each step's output becomes available to the next step
5. Save a workflow run record with the results

Step types include LLM prompts (transform via AI), file actions (save to disk), clipboard (copy result), and webhooks (POST to external URL).
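The execution flow above can be sketched in Swift. Several parts are assumptions made for illustration: the JSON schema, the step type strings, and the {{placeholder}} template syntax. The perform callback is also an assumption; it stands in for LLM calls, file writes, clipboard copies, and webhook POSTs. The sketch only shows how each step's output feeds the next.

```swift
import Foundation

// Hypothetical sketch: the JSON schema, step "type" strings, and {{placeholder}}
// template syntax are assumptions, not Talkie's actual format.

struct WorkflowDefinition: Decodable {
    struct Step: Decodable {
        let type: String      // e.g. "llm", "file", "clipboard", "webhook"
        let template: String  // may reference {{transcript}}, {{title}}, {{date}}, {{previous}}
    }
    let name: String
    let steps: [Step]
}

/// A simplified run record: one output per executed step, in order.
struct WorkflowRun {
    let workflowName: String
    let outputs: [String]
}

/// Replaces each {{key}} placeholder with its value from the context.
func render(_ template: String, with context: [String: String]) -> String {
    var result = template
    for (key, value) in context {
        result = result.replacingOccurrences(of: "{{\(key)}}", with: value)
    }
    return result
}

/// Executes steps sequentially. Each step's output is stored under "previous"
/// so the next step's template can consume it. `perform` stands in for real side effects.
func execute(_ workflow: WorkflowDefinition,
             context initial: [String: String],
             perform: (WorkflowDefinition.Step, String) -> String) -> WorkflowRun {
    var context = initial
    var outputs: [String] = []
    for step in workflow.steps {
        let input = render(step.template, with: context)
        let output = perform(step, input)
        outputs.append(output)
        context["previous"] = output
    }
    return WorkflowRun(workflowName: workflow.name, outputs: outputs)
}

// Example: a two-step workflow whose second step copies the first step's result.
let json = """
{"name": "Summarize", "steps": [
  {"type": "llm", "template": "Summarize: {{transcript}}"},
  {"type": "clipboard", "template": "{{previous}}"}
]}
"""
let workflow = try! JSONDecoder().decode(WorkflowDefinition.self, from: Data(json.utf8))
let memoContext = ["transcript": "buy milk", "title": "Errands", "date": "2024-05-01"]
let run = execute(workflow, context: memoContext) { step, input in
    // Stub: a fake "LLM" that wraps its input; other step types pass input through.
    step.type == "llm" ? "SUMMARY(\(input))" : input
}
```

Passing side effects in as a closure keeps the sequencing logic testable without a network or model. A real executor would also need error handling and would persist the run record.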

Extension Points

These are the natural seams in the lifecycle where custom logic could be injected: moments where the flow pauses, a decision is made, or data is transformed.

Hook | Phase | Use cases
onCaptureStart | Capture | Auto-route based on app, disable in certain contexts
onTranscriptionComplete | Transcription | Custom corrections, keyword detection, content filtering
beforeRoute | Routing | Intercept commands ("hey talkie"), transform output
onDictationStored | Storage | Sync to external service, trigger notifications
onMemoCreated | Memo | Auto-categorize, trigger custom workflows
beforeWorkflowStep | Workflow | Inject data, modify prompts, skip steps conditionally

These extension points aren't implemented yet; this section documents where they could exist. The lifecycle naturally pauses at these moments, making them ideal places for hooks.
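None of these hooks exist yet, so the following is only a hypothetical sketch of how a registry could dispatch them. Handlers run in registration order, each one's output feeds the next, and returning nil halts the flow. HookRegistry, HookPoint, and the single String payload are assumptions; real hooks would likely carry typed context such as the app, window, or memo.

```swift
// Hypothetical sketch: none of these hooks exist in Talkie yet. HookPoint mirrors
// the table above; the String payload and the chaining rules are assumptions.

enum HookPoint {
    case onCaptureStart, onTranscriptionComplete, beforeRoute
    case onDictationStored, onMemoCreated, beforeWorkflowStep
}

final class HookRegistry {
    /// A handler receives the current value and returns a replacement,
    /// or nil to stop the flow at this point (for example, skip a workflow step).
    typealias Handler = (String) -> String?

    private var handlers: [HookPoint: [Handler]] = [:]

    func register(_ point: HookPoint, _ handler: @escaping Handler) {
        handlers[point, default: []].append(handler)
    }

    /// Runs handlers in registration order, feeding each output into the next.
    /// With no handlers registered, the value passes through unchanged.
    func run(_ point: HookPoint, _ value: String) -> String? {
        var current = value
        for handler in handlers[point] ?? [] {
            guard let next = handler(current) else { return nil }
            current = next
        }
        return current
    }
}
```

Chaining handlers in a fixed order makes the result deterministic when several hooks touch the same phase. A pass-through default means an empty registry leaves the lifecycle unchanged.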