# Yakety Architecture

## Overview

Yakety is a real-time voice transcription application built with a cross-platform C/C++ core and platform-specific UI layers. The system follows a layered architecture with clear separation between business logic, platform abstraction, and native implementations. The core design prioritizes low-latency audio processing, efficient memory management, and responsive user interaction through a unified hotkey system.

The application operates in two modes: a console CLI for development and testing, and a GUI tray application for production use. Both modes share the same core transcription pipeline but differ in their initialization and user interaction patterns. The system integrates Whisper.cpp, a C/C++ port of OpenAI's Whisper model, for speech recognition, providing local processing without cloud dependencies.

## Component Map

### Core Business Logic (`src/`)

- **Main Application**: `main.c` (lines 329-390) - Entry point and initialization flow
- **Audio Processing**: `audio.c`, `audio.h` - Real-time audio capture and buffering
- **Transcription Engine**: `transcription.cpp`, `transcription.h` - Whisper.cpp integration
- **Model Management**: `models.c`, `models.h` - Model loading and fallback logic
- **Input Handling**: `keylogger.h` - Cross-platform hotkey detection
- **Menu System**: `menu.c`, `menu.h` - Tray/menubar interface

### Platform Abstraction Layer (`src/`)

- **Application Framework**: `app.h` - Cross-platform app lifecycle management
- **Preferences**: `preferences.c`, `preferences.h` - Configuration persistence
- **Utilities**: `utils.h` - Platform-agnostic helper functions
- **Dialog System**: `dialog.h` - Native dialog abstractions

### macOS Implementation (`src/mac/`)

- **App Backend**: `app.m` - NSApplication integration and event loop
- **UI Dialogs**: `dialogs/*.swift` - SwiftUI-based native dialogs
- **System Integration**: `menu.m`, `clipboard.m`, `overlay.m` - Cocoa services
- **Input Capture**: `keylogger.c` - Carbon event monitoring
- **Threading**: `dispatch.m`, `dispatch.h` - GCD-based async execution

### Windows Implementation (`src/windows/`)

- **App Backend**: `app.c` - Win32 application and message loop
- **UI Components**: `dialog.c`, `overlay.c` - Win32 GUI elements
- **System Services**: `menu.c`, `clipboard.c` - Windows shell integration
- **Input Capture**: `keylogger.c` - Low-level keyboard hooks

### Build System

- **CMake Configuration**: `CMakeLists.txt` (lines 1-535) - Cross-platform build
- **Whisper Integration**: `cmake/BuildWhisper.cmake` - Whisper.cpp compilation
- **Platform Setup**: `cmake/PlatformSetup.cmake` - Platform-specific configuration

## Key Files

### Core Headers and Data Structures

**`src/app.h`** - Application lifecycle management

- `APP_ENTRY_POINT` macro (lines 7-43): Platform-specific main() generation
- `app_main()` function (line 46): Unified entry point for CLI and GUI modes
- `AppReadyCallback` typedef (line 48): Deferred initialization pattern

**`src/keylogger.h`** - Input event handling

- `KeyCombination` struct (lines 17-20): Multi-key hotkey support
- `KeyCallback` typedef (line 8): Event handler function signature
- `KeyInfo` struct (lines 11-14): Platform-agnostic key representation

**`src/transcription.h`** - Speech processing interface

- `transcription_process()` (line 15): Main audio-to-text pipeline
- `transcription_init()` (line 8): Whisper model initialization
- Thread-safe C/C++ boundary with extern "C" wrapper

**`src/models.h`** - Model management

- `models_load()` (line 7): Unified model loading with fallback logic
- `models_get_current_path()` (line 10): Active model path resolution

**`src/model_definitions.h`** - Model catalog and metadata

- `ModelInfo` struct (lines 6-12): Model metadata for UI and downloads
- `DOWNLOADABLE_MODELS[]` array (lines 15-30): Available models with URLs
- `SUPPORTED_LANGUAGES[]` array (lines 49-65): Language configuration

### Implementation Files

**`src/main.c`** - Application bootstrap and flow control

- `on_app_ready()` (lines 254-283): Deferred initialization sequence
- `setup_keylogger()` (lines 120-148): Permission handling and hotkey setup
- `process_recorded_audio()` (lines 169-215): Complete transcription pipeline
- `AppState` struct (lines 28-31): Recording state management

**`src/transcription.cpp`** - Whisper.cpp integration

- `whisper_context *ctx` (line 17): Global Whisper model instance
- `utils_mutex_t *ctx_mutex` (line 18): Thread safety for model access
- `null_log_callback()` (lines 29-34): Whisper log suppression

**`src/preferences.c`** - Configuration persistence

- Cross-platform config file handling with JSON-like key-value storage
- `KeyCombination` serialization for hotkey preferences
- Platform-specific config directory resolution

## Data Flow

### Initialization Sequence

1. **Entry Point**: `main()` → `app_main()` (main.c:329)
2. **Core Setup**: Logging, preferences, signal handlers (main.c:340-360)
3. **Platform Init**: `app_init()` calls platform-specific initialization
4. **Deferred Loading**: `on_app_ready()` callback triggered after event loop starts
5. **Model Loading**: `models_load()` → `transcription_init()` (models.c:24-42)
6. **UI Setup**: Menu creation, keylogger initialization with permissions
7. **Ready State**: Application monitoring for hotkey events

### Transcription Pipeline

1. **Input Trigger**: Hotkey press detected by platform keylogger (keylogger.c)
2. **Recording Start**: `on_key_press()` → `audio_recorder_start()` (main.c:217-231)
3. **Audio Capture**: Platform-specific audio recording via miniaudio
4. **Recording Stop**: `on_key_release()` → `process_recorded_audio()` (main.c:233-250)
5. **Audio Processing**: `audio_recorder_get_samples()` retrieves float buffer
6. **Speech Recognition**: `transcription_process()` → Whisper inference (transcription.cpp:15)
7. **Text Output**: `clipboard_copy()` → `clipboard_paste()` for immediate insertion
8. **UI Feedback**: Overlay shows "Recording" and "Transcribing" states

### Cross-Platform Abstraction

- **Main Thread Dispatch**: macOS uses `dispatch_async(dispatch_get_main_queue())`, Windows uses `PostMessage()`
- **Event Loop Integration**: macOS `NSRunLoop`, Windows `GetMessage()/DispatchMessage()`
- **Permission Handling**: macOS Accessibility API, Windows UAC/Admin privileges
- **Resource Management**: macOS app bundles with `Info.plist`, Windows resource files (.rc)

### Threading Model

- **Main Thread**: UI, event handling, clipboard operations
- **Audio Thread**: Real-time audio capture (managed by miniaudio)
- **Background Thread**: Whisper inference (CPU/GPU intensive)
- **Synchronization**: Mutex protection for Whisper context, atomic operations for state flags

The architecture emphasizes minimal latency for the complete transcription cycle while maintaining thread safety and platform compatibility across macOS and Windows environments.
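
The synchronization described under Threading Model (a mutex around the Whisper context, atomic operations for state flags) can be illustrated with a minimal sketch. It assumes POSIX threads and C11 atomics rather than Yakety's own `utils_mutex_t` wrapper, whose API is not shown in this document. `FakeContext`, `start_recording()`, `stop_recording()`, and `transcribe_locked()` are hypothetical names used only for illustration; they are not functions from the codebase.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared Whisper context (assumption). */
typedef struct {
    int runs; /* number of inference calls made so far */
} FakeContext;

static FakeContext g_ctx = {0};
static pthread_mutex_t g_ctx_mutex = PTHREAD_MUTEX_INITIALIZER;

/* State flag shared across threads; static storage starts out as false. */
static atomic_bool g_recording;

/* Hotkey pressed: returns true only if this call started a new recording. */
static bool start_recording(void) {
    bool expected = false;
    /* Compare-and-exchange ignores repeated key-down events. */
    return atomic_compare_exchange_strong(&g_recording, &expected, true);
}

/* Hotkey released: returns true only if a recording was in progress. */
static bool stop_recording(void) {
    return atomic_exchange(&g_recording, false);
}

/* Runs one placeholder "inference" while holding the context mutex.
 * Returns the total number of runs so far, or -1 on invalid input. */
static int transcribe_locked(const float *samples, size_t count) {
    if (samples == NULL || count == 0) {
        return -1;
    }
    int rc = pthread_mutex_lock(&g_ctx_mutex);
    assert(rc == 0); /* a default, statically initialized mutex should lock */
    (void)rc;
    g_ctx.runs++;    /* placeholder for the real Whisper call */
    int runs = g_ctx.runs;
    pthread_mutex_unlock(&g_ctx_mutex);
    return runs;
}
```

In this sketch a key-press handler would call `start_recording()`, and the key-release handler would call `stop_recording()` and, only if it returns true, pass the captured samples to a background thread that calls `transcribe_locked()`. Because the flag is flipped with compare-and-exchange, key-repeat events cannot restart a recording that is already in progress.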