@badlogic
Last active July 27, 2025 19:31

Revisions

  1. badlogic revised this gist Jun 14, 2025. No changes.
  2. badlogic revised this gist Jun 14, 2025. 1 changed file with 42 additions and 19 deletions.
    61 changes: 42 additions & 19 deletions 01-update-docs.md
    @@ -32,7 +32,7 @@ Each generated file MUST start with:
    ## Process

    You will:
   -1. **Analyze the codebase systematically** across 6 key areas (merging development+patterns)
   +1. **Analyze the codebase systematically** across 7 key areas (merging development+patterns)
    2. **Create or update docs** in `docs/*.md` with concrete file references
    3. **Synthesize final documentation** into a minimal, LLM-friendly README.md
    4. **Eliminate all duplication** across files
    @@ -52,7 +52,7 @@ Issue the following Task calls in parallel:
    **Project Overview** (`docs/project-overview.md`):
    STRUCTURE:
    - Overview: What the project is, core purpose, key value proposition (2-3 paragraphs)
   -- Key Files: Main entry points (src/main.c, src/app.h, CMakeLists.txt)
   +- Key Files: Main entry points and core configuration files
    - Technology Stack: Core technologies with specific file examples
    - Platform Support: Requirements with platform-specific file locations

    @@ -65,33 +65,42 @@ STRUCTURE:

    **Build System** (`docs/build-system.md`):
    STRUCTURE:
   -- Overview: CMake system with file references (CMakeLists.txt, CMakePresets.json)
   +- Overview: Build system with file references to main build configuration
    - Build Workflows: Common tasks with specific commands and config files
    - Platform Setup: Platform-specific requirements with file paths
    - Reference: Build targets, presets, and troubleshooting with file locations

    **Testing** (`docs/testing.md`):
    STRUCTURE:
   -- Overview: Testing approach with test file locations (src/tests/*)
   +- Overview: Testing approach with test file locations
    - Test Types: Different test categories with specific file examples
    - Running Tests: Commands with file paths and expected outputs
   -- Reference: Test file organization, CMake test targets
   +- Reference: Test file organization and build system test targets

    **Development** (`docs/development.md`):
    STRUCTURE:
   -- Overview: Development environment, code style, patterns (merge with old patterns.md)
   +- Overview: Development environment, code style, patterns (merge with old patterns.md if exists)
    - Code Style: Conventions with specific file examples (show actual code from codebase)
   -- Common Patterns: Implementation patterns with file references (singleton pattern in src/audio.h, platform abstraction in src/mac/, src/windows/)
   +- Common Patterns: Implementation patterns with file references and examples from the codebase
    - Workflows: Development tasks with concrete file locations and examples
    - Reference: File organization, naming conventions, common issues with specific files

    **Deployment** (`docs/deployment.md`):
    STRUCTURE:
    - Overview: Packaging and distribution with script references
   -- Package Types: Different packages with CMake targets and output locations
   +- Package Types: Different packages with build targets and output locations
    - Platform Deployment: Platform-specific packaging with file paths
    - Reference: Deployment scripts, output locations, server configurations

   +**Files Catalog** (`docs/files.md`):
   +STRUCTURE:
   +- Overview: Comprehensive file catalog with descriptions and relationships (2-3 paragraphs)
   +- Core Source Files: Main application logic with purpose descriptions
   +- Platform Implementation: Platform-specific code with interface mappings
   +- Build System: Build configuration and helper modules
   +- Configuration: Assets, scripts, configs - Supporting files and their roles
   +- Reference: File organization patterns, naming conventions, dependency relationships
   +
    ## Critical Requirements

    ### LLM-OPTIMIZED FORMAT
    @@ -110,22 +119,22 @@ STRUCTURE:
    ### FILE REFERENCE FORMAT
    Always include specific file references:
    ```
   -**Audio System** - Core implementation in src/audio.h (lines 15-45), platform backends in src/mac/audio.m and src/windows/audio.c
   +**Core System** - Core implementation in src/core.h (lines 15-45), platform backends in src/platform/
   -**Build Configuration** - Main CMakeLists.txt (lines 67-89), presets in CMakePresets.json
   +**Build Configuration** - Main build file (lines 67-89), configuration files
   -**Model Management** - Interface in src/models.h, implementation in src/models.c (model_download function at line 134)
   +**Module Management** - Interface in src/module.h, implementation in src/module.c (key_function at line 134)
    ```

    ### PRACTICAL EXAMPLES
    Use actual code from the codebase:
    ```c
   -// From src/audio.h:23-27
   +// From src/example.h:23-27
    typedef struct {
   -bool recording;
   -float *buffer;
   -int sample_rate;
   -} AudioState;
   +bool active;
   +void *data;
   +int count;
   +} ExampleState;
    ```

    ## Final Steps
    @@ -134,7 +143,7 @@ After all tasks complete:

    1. **Read all `docs/*.md` files** and create README.md with:
    - Project description (2-3 sentences max)
   -- Key entry points (src/main.c, CMakeLists.txt, etc.)
   +- Key entry points and core configuration files
    - Quick build commands
    - Documentation links with brief descriptions of what LLMs will find useful
    - Keep it under 50 lines total
    @@ -164,12 +173,26 @@ Each agent must:

    **Success criteria**: Each file should be a practical reference that helps LLMs quickly understand the codebase and find the right files for specific tasks.

   -**Special note for development.md**: Merge content from both old development.md and patterns.md into a single comprehensive development guide with implementation patterns.
   +**Special note for development.md**: Merge content from both old development.md and patterns.md (if they exist) into a single comprehensive development guide with implementation patterns.

    The coordinating agent must:
    1. Wait for all agents to complete
    2. Read all generated files
    3. Remove any duplication found
    4. Create a minimal, LLM-optimized README.md with key file references
    5. **Update README.md timestamp** with current UTC time
   -6. Delete docs/patterns.md since it's merged into development.md
   +6. Delete docs/patterns.md if it exists since it's merged into development.md
   +
   +## Files Agent Instructions
   +
   +The Files agent should create a minimal, token-efficient file catalog:
   +
   +1. **Discover files**: Use Glob and LS to find all source files, configs, and build files
   +2. **Group by function**: Organize files into logical categories (core, platform, build, tests, config)
   +3. **Brief descriptions**: One line per significant file describing its primary purpose
   +4. **Key entry points**: Highlight main files, build configs, and important headers
   +5. **Dependencies**: Note major relationships between file groups
   +
   +**Format**: Concise lists with file paths and single-sentence descriptions. Focus on helping LLMs quickly locate functionality, not comprehensive documentation.
   +
   +**Success criteria**: LLMs can quickly find "where is the main entry point", "which files handle X", "what are the key headers" without reading detailed descriptions.
  3. badlogic created this gist Jun 14, 2025.
    175 changes: 175 additions & 0 deletions 01-update-docs.md
    @@ -0,0 +1,175 @@
    # Update Documentation

    You will generate LLM-optimized documentation with concrete file references and flexible formatting.

    ## Your Task

    Create documentation that allows humans and LLMs to:
    - **Understand project purpose** - what the project does and why
    - **Get architecture overview** - how the system is organized
    - **Build on all platforms** - build instructions with file references
    - **Add features/subsystems** - following established patterns with examples
    - **Debug applications** - troubleshoot issues with specific file locations
    - **Test and add tests** - run existing tests and create new ones
    - **Deploy and distribute** - package and deploy the software

    ## Required Documentation Structure

    Each document MUST include:
    1. **Timestamp Header** - Hidden comment with last update timestamp
    2. **Brief Overview** (2-3 paragraphs max)
    3. **Key Files & Examples** - Concrete file references for each major topic
    4. **Common Workflows** - Practical guidance with file locations
    5. **Reference Information** - Quick lookup tables with file paths

    ## Timestamp Format

    Each generated file MUST start with:
    ```
    <!-- Generated: YYYY-MM-DD HH:MM:SS UTC -->
    ```
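A minimal C sketch of how a tool could produce this header line. The helper name `format_generated_header` is hypothetical and not part of the gist; it only assumes the standard `gmtime` and `strftime` functions.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper (not part of the gist): formats the timestamp header
 * for `when` into `out`. Returns the number of characters written, excluding
 * the terminating NUL, or 0 if the time cannot be converted or the buffer is
 * too small. gmtime() uses shared static storage, so this sketch is not
 * thread-safe. */
size_t format_generated_header(time_t when, char *out, size_t out_size) {
    assert(out != NULL);
    const struct tm *utc = gmtime(&when); /* broken-down time in UTC */
    if (utc == NULL) {
        return 0;
    }
    return strftime(out, out_size,
                    "<!-- Generated: %Y-%m-%d %H:%M:%S UTC -->", utc);
}
```

A caller would typically pass `time(NULL)` and a buffer of at least 44 bytes, since the header is 43 characters plus the terminator.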

    ## Process

    You will:
    1. **Analyze the codebase systematically** across 6 key areas (merging development+patterns)
    2. **Create or update docs** in `docs/*.md` with concrete file references
    3. **Synthesize final documentation** into a minimal, LLM-friendly README.md
    4. **Eliminate all duplication** across files

    ## Analysis Methodology

    For each area, agents should:
    1. **Examine key files**: Look for build configs, test files, deployment scripts, main source files
    2. **Extract file references**: Note specific files, line numbers, and examples
    3. **Identify patterns**: Find repeated structures, naming conventions, common workflows
    4. **Make content LLM-friendly**: Token-efficient, reference-heavy, practical examples

    ## Specific File Requirements

    Issue the following Task calls in parallel:

    **Project Overview** (`docs/project-overview.md`):
    STRUCTURE:
    - Overview: What the project is, core purpose, key value proposition (2-3 paragraphs)
    - Key Files: Main entry points (src/main.c, src/app.h, CMakeLists.txt)
    - Technology Stack: Core technologies with specific file examples
    - Platform Support: Requirements with platform-specific file locations

    **Architecture** (`docs/architecture.md`):
    STRUCTURE:
    - Overview: High-level system organization (2-3 paragraphs)
    - Component Map: Major components with their source file locations
    - Key Files: Core headers and implementations with brief descriptions
    - Data Flow: How information flows with specific function/file references

    **Build System** (`docs/build-system.md`):
    STRUCTURE:
    - Overview: CMake system with file references (CMakeLists.txt, CMakePresets.json)
    - Build Workflows: Common tasks with specific commands and config files
    - Platform Setup: Platform-specific requirements with file paths
    - Reference: Build targets, presets, and troubleshooting with file locations

    **Testing** (`docs/testing.md`):
    STRUCTURE:
    - Overview: Testing approach with test file locations (src/tests/*)
    - Test Types: Different test categories with specific file examples
    - Running Tests: Commands with file paths and expected outputs
    - Reference: Test file organization, CMake test targets

    **Development** (`docs/development.md`):
    STRUCTURE:
    - Overview: Development environment, code style, patterns (merge with old patterns.md)
    - Code Style: Conventions with specific file examples (show actual code from codebase)
    - Common Patterns: Implementation patterns with file references (singleton pattern in src/audio.h, platform abstraction in src/mac/, src/windows/)
    - Workflows: Development tasks with concrete file locations and examples
    - Reference: File organization, naming conventions, common issues with specific files

    **Deployment** (`docs/deployment.md`):
    STRUCTURE:
    - Overview: Packaging and distribution with script references
    - Package Types: Different packages with CMake targets and output locations
    - Platform Deployment: Platform-specific packaging with file paths
    - Reference: Deployment scripts, output locations, server configurations

    ## Critical Requirements

    ### LLM-OPTIMIZED FORMAT
    - **Token efficient**: Avoid redundant explanations, focus on essential information
    - **Concrete file references**: Always include specific file paths, line numbers when helpful
    - **Flexible formatting**: Use subsections, code blocks, examples instead of rigid step-by-step
    - **Pattern examples**: Show actual code from the codebase, not generic examples

    ### NO DUPLICATION
    - Each piece of information appears in EXACTLY ONE file
    - Build information only in build-system.md
    - Code style and patterns only in development.md
    - Deployment information only in deployment.md
    - Cross-references using: "See [docs/filename.md](docs/filename.md)"

    ### FILE REFERENCE FORMAT
    Always include specific file references:
    ```
    **Audio System** - Core implementation in src/audio.h (lines 15-45), platform backends in src/mac/audio.m and src/windows/audio.c
    **Build Configuration** - Main CMakeLists.txt (lines 67-89), presets in CMakePresets.json
    **Model Management** - Interface in src/models.h, implementation in src/models.c (model_download function at line 134)
    ```

    ### PRACTICAL EXAMPLES
    Use actual code from the codebase:
    ```c
    // From src/audio.h:23-27
    typedef struct {
    bool recording;
    float *buffer;
    int sample_rate;
    } AudioState;
    ```
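The `path:first-last` notation used in the comment above can be checked mechanically, which may help with the file reference check in the final steps. The following is a sketch only: `SketchFileRef` and `sketch_parse_file_ref` are hypothetical names, and the parser assumes paths contain no colon (so Windows drive letters are out of scope).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one parsed reference. */
typedef struct {
    char path[256];
    int first_line;
    int last_line;
} SketchFileRef;

/* Parses "path:first-last" or "path:line" into `out`.
 * Returns 1 on success, 0 if the text is malformed or the range is invalid. */
int sketch_parse_file_ref(const char *text, SketchFileRef *out) {
    assert(text != NULL && out != NULL);
    int fields = sscanf(text, "%255[^:]:%d-%d",
                        out->path, &out->first_line, &out->last_line);
    if (fields < 2) {
        return 0; /* no path, or no line number after the colon */
    }
    if (fields == 2) {
        out->last_line = out->first_line; /* single-line reference */
    }
    return out->first_line >= 1 && out->last_line >= out->first_line;
}
```

With this sketch, `"src/audio.h:23-27"` yields the path `src/audio.h` with lines 23 to 27, while a bare path or a reversed range is rejected.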

    ## Final Steps

    After all tasks complete:

    1. **Read all `docs/*.md` files** and create README.md with:
    - Project description (2-3 sentences max)
    - Key entry points (src/main.c, CMakeLists.txt, etc.)
    - Quick build commands
    - Documentation links with brief descriptions of what LLMs will find useful
    - Keep it under 50 lines total

    2. **Duplication check**: Scan all files and remove any duplicated information

    3. **File reference check**: Ensure all file paths are accurate and helpful

    ## Agent Instructions

    Each agent must:
    1. **Read existing file** if it exists to understand current content
    2. **Analyze relevant codebase files** systematically
    3. **Extract specific file references** throughout analysis:
    - Note important headers, source files, configuration files
    - Include line numbers for key functions/sections when helpful
    - Reference actual code examples from the codebase
    4. **Create LLM-friendly content**:
    - Token-efficient writing (no redundant explanations)
    - Concrete file paths and examples throughout
    - Flexible formatting (subsections, code blocks, practical guidance)
    - Focus on what LLMs need to understand and work with the code
    5. **Include practical workflows** with specific file references
    6. **Create reference sections** with file locations and line numbers
    7. **Update timestamp** at the top with current UTC time
    8. **Read generated file** and revise for accuracy and completeness

    **Success criteria**: Each file should be a practical reference that helps LLMs quickly understand the codebase and find the right files for specific tasks.

    **Special note for development.md**: Merge content from both old development.md and patterns.md into a single comprehensive development guide with implementation patterns.

    The coordinating agent must:
    1. Wait for all agents to complete
    2. Read all generated files
    3. Remove any duplication found
    4. Create a minimal, LLM-optimized README.md with key file references
    5. **Update README.md timestamp** with current UTC time
    6. Delete docs/patterns.md since it's merged into development.md
    49 changes: 49 additions & 0 deletions 02-README.md
    @@ -0,0 +1,49 @@
    <!-- Generated: 2025-06-14 22:35:12 UTC -->

    # Yakety

    Real-time speech-to-text application with hotkey recording and local Whisper transcription. Records audio while holding a keyboard shortcut, transcribes using on-device AI, and pastes text directly into the active application.

    ## Key Entry Points

    - **src/main.c** - Application entry point and transcription pipeline
    - **CMakeLists.txt** - Build system with whisper.cpp integration
    - **src/audio.c** - Audio recording and processing core
    - **src/transcription.cpp** - Whisper model integration

    ## Quick Start

    ```bash
    # Build release version
    cmake --preset release
    cmake --build --preset release

    # Run CLI version
    ./build/bin/yakety-cli

    # Run GUI version (macOS/Windows)
    ./build/bin/Yakety.app # macOS
    ./build/bin/Yakety.exe # Windows
    ```

    ## Documentation

    - **[Project Overview](docs/project-overview.md)** - Core purpose, technology stack, platform requirements
    - **[Architecture](docs/architecture.md)** - System organization, component map, data flow patterns
    - **[Build System](docs/build-system.md)** - CMake configuration, build presets, platform setup
    - **[Development](docs/development.md)** - Code style, patterns, implementation guidelines
    - **[Testing](docs/testing.md)** - GUI test suite, dialog validation, test execution
    - **[Deployment](docs/deployment.md)** - Packaging, distribution, remote deployment

    ## Platform Support

    - **macOS 14.0+** (Apple Silicon) - Cocoa/SwiftUI interface
    - **Windows 10+** - Win32 native interface
    - **Linux** - Experimental CLI support

    ## Technology Stack

    - **Audio**: miniaudio (cross-platform capture)
    - **Speech Recognition**: whisper.cpp (local AI inference)
    - **Build System**: CMake 3.20+ with Ninja/Visual Studio
    - **GUI**: Platform-native (Cocoa/SwiftUI on macOS, Win32 on Windows)
    46 changes: 46 additions & 0 deletions 03-project-overview.md
    @@ -0,0 +1,46 @@
    <!-- Generated: 2025-06-14 15:30:00 UTC -->

    # Project Overview

    ## Overview

    Yakety is a real-time speech-to-text application that provides instant transcription through keyboard shortcuts. It records audio while a hotkey is held down, transcribes the speech using OpenAI's Whisper model, and automatically pastes the transcribed text into the active application. The application is designed for efficient voice-to-text input across desktop workflows.

    The project targets both CLI and GUI usage patterns, supporting macOS and Windows with platform-specific implementations. It integrates whisper.cpp for on-device transcription, eliminating the need for cloud services while maintaining privacy. The application features a system tray interface for GUI mode and comprehensive keyboard monitoring for seamless user interaction.

    ## Key Files

    - **src/main.c**: Primary application entry point containing initialization sequence, audio processing pipeline, and keyboard event handling (lines 254-388)
    - **src/app.h**: Cross-platform application framework with platform-specific entry point macros (lines 6-43) and async execution utilities
    - **CMakeLists.txt**: Build system configuration managing whisper.cpp integration (lines 28-32), platform-specific compilation (lines 48-85), and distribution packaging (lines 358-535)
    - **src/transcription.cpp**: Whisper model integration and audio processing core (lines 49-100)

    ## Technology Stack

    - **Audio Processing**: miniaudio library for cross-platform audio capture in src/audio.c with 16kHz mono configuration (lines 9-11)
    - **Speech Recognition**: whisper.cpp integration for local transcription processing in src/transcription.cpp (lines 14-15)
    - **Platform Abstraction**: C-style C++ implementation with platform-specific modules in src/mac/ and src/windows/
    - **Build System**: CMake with custom modules in cmake/ directory, supporting Ninja and Visual Studio generators
    - **GUI Framework**:
    - macOS: Objective-C/Swift UI in src/mac/dialogs/ with SwiftUI dialogs
    - Windows: Win32 API in src/windows/ with native dialog implementations
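To make the 16 kHz mono figure concrete, here is a rough sizing sketch. It assumes 32-bit float samples, which the gist does not state in this file, and all names (`SKETCH_SAMPLE_RATE_HZ`, `sketch_samples_for_ms`, `sketch_bytes_for_ms`) are hypothetical rather than taken from src/audio.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed capture format: 16 kHz, one channel, float samples. */
enum { SKETCH_SAMPLE_RATE_HZ = 16000, SKETCH_CHANNELS = 1 };

/* Number of samples captured in `duration_ms` milliseconds. */
size_t sketch_samples_for_ms(size_t duration_ms) {
    const size_t per_second = (size_t)SKETCH_SAMPLE_RATE_HZ * SKETCH_CHANNELS;
    assert(duration_ms <= SIZE_MAX / per_second); /* guard against overflow */
    return per_second * duration_ms / 1000;
}

/* Buffer size in bytes needed to hold that many float samples. */
size_t sketch_bytes_for_ms(size_t duration_ms) {
    return sketch_samples_for_ms(duration_ms) * sizeof(float);
}
```

Under these assumptions, one second of audio is 16,000 samples, so a recording buffer grows by `16000 * sizeof(float)` bytes per second of held hotkey.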

    ## Platform Support

    **macOS Requirements**:
    - Minimum macOS 14.0 (Apple Silicon only, set in CMakeLists.txt line 22)
    - Accessibility permissions for keyboard monitoring (handled in src/main.c lines 78-117)
    - Metal acceleration support via ggml-metal library integration
    - System tray menubar interface in src/mac/menu.m

    **Windows Requirements**:
    - Windows 10+ with Visual Studio 2022 build tools
    - Optional Vulkan support for GPU acceleration
    - WSL development environment supported via scripts in wsl/ directory
    - System tray interface in src/windows/menu.c

    **Cross-Platform Components**:
    - Keyboard monitoring: src/mac/keylogger.c and src/windows/keylogger.c
    - Audio recording: src/audio.c with platform-specific audio device handling
    - Preferences storage: src/preferences.c with platform-specific configuration paths
    - Model management: src/models.c with bundled and downloadable Whisper models defined in src/model_definitions.h
    124 changes: 124 additions & 0 deletions 04-architecture.md
    @@ -0,0 +1,124 @@
    # Yakety Architecture

    <!-- Generated: 2025-06-14 20:31:01 UTC -->

    ## Overview

    Yakety is a real-time voice transcription application built with a cross-platform C/C++ core and platform-specific UI layers. The system follows a layered architecture with clear separation between business logic, platform abstraction, and native implementations. The core design prioritizes low-latency audio processing, efficient memory management, and responsive user interaction through a unified hotkey system.

    The application operates in two modes: a console CLI for development and testing, and a GUI tray application for production use. Both modes share the same core transcription pipeline but differ in initialization and user interaction. The system integrates whisper.cpp, a C/C++ implementation of OpenAI's Whisper model, for speech recognition, so all processing is local with no cloud dependency.

    ## Component Map

    ### Core Business Logic (`src/`)
    - **Main Application**: `main.c` (lines 329-390) - Entry point and initialization flow
    - **Audio Processing**: `audio.c`, `audio.h` - Real-time audio capture and buffering
    - **Transcription Engine**: `transcription.cpp`, `transcription.h` - Whisper.cpp integration
    - **Model Management**: `models.c`, `models.h` - Model loading and fallback logic
    - **Input Handling**: `keylogger.h` - Cross-platform hotkey detection
    - **Menu System**: `menu.c`, `menu.h` - Tray/menubar interface

    ### Platform Abstraction Layer (`src/`)
    - **Application Framework**: `app.h` - Cross-platform app lifecycle management
    - **Preferences**: `preferences.c`, `preferences.h` - Configuration persistence
    - **Utilities**: `utils.h` - Platform-agnostic helper functions
    - **Dialog System**: `dialog.h` - Native dialog abstractions

    ### macOS Implementation (`src/mac/`)
    - **App Backend**: `app.m` - NSApplication integration and event loop
    - **UI Dialogs**: `dialogs/*.swift` - SwiftUI-based native dialogs
    - **System Integration**: `menu.m`, `clipboard.m`, `overlay.m` - Cocoa services
    - **Input Capture**: `keylogger.c` - Carbon event monitoring
    - **Threading**: `dispatch.m`, `dispatch.h` - GCD-based async execution

    ### Windows Implementation (`src/windows/`)
    - **App Backend**: `app.c` - Win32 application and message loop
    - **UI Components**: `dialog.c`, `overlay.c` - Win32 GUI elements
    - **System Services**: `menu.c`, `clipboard.c` - Windows shell integration
    - **Input Capture**: `keylogger.c` - Low-level keyboard hooks

    ### Build System
    - **CMake Configuration**: `CMakeLists.txt` (lines 1-535) - Cross-platform build
    - **Whisper Integration**: `cmake/BuildWhisper.cmake` - Whisper.cpp compilation
    - **Platform Setup**: `cmake/PlatformSetup.cmake` - Platform-specific configuration

    ## Key Files

    ### Core Headers and Data Structures

    **`src/app.h`** - Application lifecycle management
    - `APP_ENTRY_POINT` macro (lines 7-43): Platform-specific main() generation
    - `app_main()` function (line 46): Unified entry point for CLI and GUI modes
    - `AppReadyCallback` typedef (line 48): Deferred initialization pattern

    **`src/keylogger.h`** - Input event handling
    - `KeyCombination` struct (lines 17-20): Multi-key hotkey support
    - `KeyCallback` typedef (line 8): Event handler function signature
    - `KeyInfo` struct (lines 11-14): Platform-agnostic key representation

    **`src/transcription.h`** - Speech processing interface
    - `transcription_process()` (line 15): Main audio-to-text pipeline
    - `transcription_init()` (line 8): Whisper model initialization
    - Thread-safe C/C++ boundary with extern "C" wrapper

    **`src/models.h`** - Model management
    - `models_load()` (line 7): Unified model loading with fallback logic
    - `models_get_current_path()` (line 10): Active model path resolution

    **`src/model_definitions.h`** - Model catalog and metadata
    - `ModelInfo` struct (lines 6-12): Model metadata for UI and downloads
    - `DOWNLOADABLE_MODELS[]` array (lines 15-30): Available models with URLs
    - `SUPPORTED_LANGUAGES[]` array (lines 49-65): Language configuration

    ### Implementation Files

    **`src/main.c`** - Application bootstrap and flow control
    - `on_app_ready()` (lines 254-283): Deferred initialization sequence
    - `setup_keylogger()` (lines 120-148): Permission handling and hotkey setup
    - `process_recorded_audio()` (lines 169-215): Complete transcription pipeline
    - `AppState` struct (lines 28-31): Recording state management

    **`src/transcription.cpp`** - Whisper.cpp integration
    - `whisper_context *ctx` (line 17): Global Whisper model instance
    - `utils_mutex_t *ctx_mutex` (line 18): Thread safety for model access
    - `null_log_callback()` (lines 29-34): Whisper log suppression

    **`src/preferences.c`** - Configuration persistence
    - Cross-platform config file handling with JSON-like key-value storage
    - `KeyCombination` serialization for hotkey preferences
    - Platform-specific config directory resolution

    ## Data Flow

    ### Initialization Sequence
    1. **Entry Point**: `main()` → `app_main()` (main.c:329)
    2. **Core Setup**: Logging, preferences, signal handlers (main.c:340-360)
    3. **Platform Init**: `app_init()` calls platform-specific initialization
    4. **Deferred Loading**: `on_app_ready()` callback triggered after event loop starts
    5. **Model Loading**: `models_load()` → `transcription_init()` (models.c:24-42)
    6. **UI Setup**: Menu creation, keylogger initialization with permissions
    7. **Ready State**: Application monitoring for hotkey events

    ### Transcription Pipeline
    1. **Input Trigger**: Hotkey press detected by platform keylogger (keylogger.c)
    2. **Recording Start**: `on_key_press()` → `audio_recorder_start()` (main.c:217-231)
    3. **Audio Capture**: Platform-specific audio recording via miniaudio
    4. **Recording Stop**: `on_key_release()` → `process_recorded_audio()` (main.c:233-250)
    5. **Audio Processing**: `audio_recorder_get_samples()` retrieves float buffer
    6. **Speech Recognition**: `transcription_process()` → Whisper inference (transcription.cpp:15)
    7. **Text Output**: `clipboard_copy()` → `clipboard_paste()` for immediate insertion
    8. **UI Feedback**: Overlay shows "Recording" and "Transcribing" states

    ### Cross-Platform Abstraction
    - **Main Thread Dispatch**: macOS uses `dispatch_async(dispatch_get_main_queue())`, Windows uses `PostMessage()`
    - **Event Loop Integration**: macOS `NSRunLoop`, Windows `GetMessage()/DispatchMessage()`
    - **Permission Handling**: macOS Accessibility API, Windows UAC/Admin privileges
    - **Resource Management**: macOS app bundles with `Info.plist`, Windows resource files (.rc)

    ### Threading Model
    - **Main Thread**: UI, event handling, clipboard operations
    - **Audio Thread**: Real-time audio capture (managed by miniaudio)
    - **Background Thread**: Whisper inference (CPU/GPU intensive)
    - **Synchronization**: Mutex protection for Whisper context, atomic operations for state flags

    The architecture emphasizes minimal latency for the complete transcription cycle while maintaining thread safety and platform compatibility across macOS and Windows environments.
    194 changes: 194 additions & 0 deletions 05-build-system.md
    @@ -0,0 +1,194 @@
    <!-- Generated: 2025-06-14 21:30:00 UTC -->

    # Build System Documentation

    ## Overview

    Yakety uses CMake 3.20+ as its primary build system with multi-language support (C, C++, Swift). The build system is configured through:

    - **Main CMake file**: `CMakeLists.txt` - Primary build configuration
    - **Build presets**: `CMakePresets.json` - Platform-specific build configurations
    - **Helper modules**: `cmake/` directory with modular build logic
    - `cmake/BuildWhisper.cmake` - Whisper.cpp dependency management
    - `cmake/PlatformSetup.cmake` - Platform-specific libraries and frameworks
    - `cmake/GenerateIcons.cmake` - Asset generation from SVG sources

    The system automatically handles whisper.cpp dependency building, model downloading, icon generation, and platform-specific configurations.

    ## Build Workflows

    ### Quick Start Commands

    **Development builds:**
    ```bash
    # Release build (recommended for development)
    cmake --preset release
    cmake --build --preset release

    # Debug build
    cmake --preset debug
    cmake --build --preset debug
    ```

    **Windows debugging with Visual Studio:**
    ```bash
    # Only on Windows - enables Visual Studio debugging
    cmake --preset vs-debug
    cmake --build --preset vs-debug
    ```

    **Distribution packaging:**
    ```bash
    # Build and package for current platform
    cmake --build --preset release
    cmake --build build --target package

    # Platform-specific packages
    cmake --build build --target package-macos # macOS only
    cmake --build build --target package-windows # Windows only

    # Upload to server (requires SSH access)
    cmake --build build --target upload
    ```

    ### Build Targets

    The build system generates these executables in `build/bin/`:

    - **yakety-cli** - Command-line interface
    - **yakety-app** - GUI application (platform-specific bundle)
    - **recorder** - Audio recording utility
    - **transcribe** - Standalone transcription tool
    - **test-*** - Platform-specific test executables (macOS only)

    ### Asset Generation

    Icons are automatically generated from `assets/yakety.svg`:
    ```bash
    # Manual icon regeneration (automatic during build)
    cmake --build build --target generate_icons
    ```

    Requires: `rsvg-convert` (librsvg) and `magick` (ImageMagick)

    ## Platform Setup

    ### macOS Requirements

    - **Xcode Command Line Tools**: Required for Swift compiler
    - **macOS 14.0+**: Minimum deployment target
    - **Apple Silicon (ARM64)**: Target architecture
    - **System frameworks**: Automatically linked
    - CoreFoundation, AppKit, AudioToolbox, AVFoundation
    - Metal frameworks for GPU acceleration

    **Dependencies:**
    ```bash
    # Install build tools via Homebrew
    brew install librsvg imagemagick ninja cmake
    ```

    ### Windows Requirements

    - **Visual Studio 2022**: For MSVC compiler and debugging
    - **CMake 3.20+**: Build system
    - **Ninja**: Fast builds (included in VS2022)
    - **Vulkan SDK**: Optional GPU acceleration

    **Environment setup:**
    - Set VULKAN_SDK environment variable for GPU support
    - Use `winvs.bat` script for proper Visual Studio environment

    ### Windows/WSL Remote Development

    For development from macOS to Windows via SSH:

    **Setup scripts:**
    - `wsl/start-wsl-ssh.bat` - Run as Administrator on Windows
    - `wsl/setup-wsl-ssh.sh` - Configure SSH in WSL

    **Sync and build workflow:**
    ```bash
    # 1. Sync source files (excludes build directories)
    rsync -av --exclude='build/' --exclude='build-debug/' --exclude='whisper.cpp/' \
    . [email protected]:/mnt/c/workspaces/yakety/

    # 2. Configure build
    ssh [email protected] "cd /mnt/c/workspaces/yakety && \
    /mnt/c/Windows/System32/cmd.exe /c 'cd c:\\workspaces\\yakety && \
    c:\\workspaces\\winvs.bat && cmake --preset release'"

    # 3. Build
    ssh [email protected] "cd /mnt/c/workspaces/yakety && \
    /mnt/c/Windows/System32/cmd.exe /c 'cd c:\\workspaces\\yakety && \
    c:\\workspaces\\winvs.bat && cmake --build --preset release'"

    # 4. Run CLI
    ssh [email protected] "cd /mnt/c/workspaces/yakety && \
    build/bin/yakety-cli.exe"
    ```

    ### Linux Requirements (Experimental)

    - **GCC/Clang**: C/C++ compiler
    - **ALSA/PulseAudio**: Audio system libraries
    - **CMake 3.20+**, **Ninja**: Build tools

    ## Reference

    ### CMake Presets

    From `CMakePresets.json`:

    **Configure presets:**
    - `release` - Ninja generator, Release build, `build/` directory
    - `debug` - Ninja generator, Debug build, `build-debug/` directory
    - `vs-debug` - Visual Studio 2022, Windows-only debugging

    **Build presets:**
    - `release` - Build release configuration
    - `debug` - Build debug configuration
    - `vs-debug` - Build Windows VS debug configuration

    ### Whisper.cpp Integration

    Automatic dependency management via `cmake/BuildWhisper.cmake`:

    - **Auto-clone**: Downloads whisper.cpp from GitHub if missing
    - **Platform optimization**:
      - macOS: Metal GPU acceleration, ARM64 architecture
      - Windows: Native CPU optimization, optional Vulkan GPU
    - **Model download**: Automatically downloads ggml-base-q8_0.bin (110MB)
    - **Static linking**: All whisper libraries statically linked

    ### Code Signing (macOS)

    Automatic ad-hoc signing via `cmake/PlatformSetup.cmake`:
    ```bash
    # Manual signing
    ./sign-app.sh # Signs and removes quarantine
    ```

    ### Troubleshooting

    **Whisper.cpp build failures:**
    - Verify internet connection for auto-download
    - Check disk space (whisper.cpp ~500MB + model ~110MB)
    - On Windows: Ensure Visual Studio environment is loaded

    **Swift compilation warnings:**
    - Incremental compilation disabled via CMAKE_Swift_FLAGS
    - Normal for mixed C/Swift projects

    **Icon generation failures:**
    - Install librsvg: `brew install librsvg` (macOS) or `apt install librsvg2-bin` (Linux)
    - Install ImageMagick: `brew install imagemagick` (macOS)

    **Windows Vulkan not detected:**
    - Install Vulkan SDK and set VULKAN_SDK environment variable
    - Restart command prompt after installation

    **Linking errors:**
    - Clean build directories: `rm -rf build build-debug`
    - Rebuild whisper.cpp: `rm -rf whisper.cpp/build`
    - On Windows: Match Debug/Release configuration with whisper.cpp
    269 changes: 269 additions & 0 deletions 06-development.md
    @@ -0,0 +1,269 @@
    # Development Guide

    <!-- Generated: 2025-06-14 20:30:55 UTC -->

    ## Overview

    Yakety is a voice transcription tool using a **C-style C++** architecture with platform abstraction. The codebase follows C conventions with minimal C++ usage (only for whisper.cpp integration). Core features:

    - **Cross-platform**: macOS (Objective-C/Swift) and Windows (Win32 API)
    - **Singleton patterns**: Audio recorder, preferences, models
    - **Platform abstraction**: Clean separation between core logic and platform code
    - **Minimal dependencies**: Uses system APIs directly

    ## Code Style

    ### C-Style C++ Conventions

    **File Extensions:**
    - `.c` - Pure C code
    - `.cpp` - C++ code (only when whisper.cpp features needed)
    - `.m` - Objective-C (macOS platform layer)
    - `.swift` - SwiftUI dialogs (macOS only)

    **Naming Conventions:**
    ```c
    // Functions: module_action_object
    bool audio_recorder_init(void); // src/audio.c:82
    int keylogger_set_combination(combo); // Keylogger API
    void preferences_set_string(key, value); // src/preferences.h:24

    // Types: CamelCase with descriptive names
    typedef struct {
        ma_device device;
        float *buffer;
        bool is_recording; // Atomic access required
    } AudioRecorder; // src/audio.c:14-32

    // Constants: UPPER_CASE
    #define WHISPER_SAMPLE_RATE 16000 // src/audio.c:10
    #define MIN_RECORDING_DURATION 0.1 // src/main.c:26
    ```
    **C-Style Casting:**
    ```c
    AudioRecorder *recorder = (AudioRecorder *) pDevice->pUserData; // src/audio.c:39
    const float *input = (const float *) pInput; // src/audio.c:45
    ```

    **Header Guards:**
    ```c
    #ifndef AUDIO_H
    #define AUDIO_H
    // ... content ...
    #endif // AUDIO_H
    ```

    ## Common Patterns

    ### Singleton Pattern

    **Audio Recorder** (`src/audio.h`, `src/audio.c`):
    ```c
    // Global singleton instance
    static AudioRecorder *g_recorder = NULL;

    bool audio_recorder_init(void) {
        if (g_recorder) {
            return false; // Already initialized
        }
        g_recorder = (AudioRecorder *) calloc(1, sizeof(AudioRecorder));
        // ... initialization ...
        return true;
    }

    void audio_recorder_cleanup(void) {
        if (!g_recorder) return;
        // ... cleanup ...
        free(g_recorder);
        g_recorder = NULL;
    }
    ```
    ### Platform Abstraction
    **Directory Structure:**
    ```
    src/
    ├── audio.h/c      # Cross-platform core logic
    ├── utils.h        # Platform abstraction interface
    ├── mac/           # macOS implementations
    │   ├── app.m      # NSApplication handling
    │   ├── utils.m    # Platform-specific utilities
    │   └── dialogs/   # SwiftUI dialog implementations
    └── windows/       # Windows implementations
        ├── app.c      # Win32 application handling
        └── utils.c    # Platform-specific utilities
    ```
    **Interface Pattern** (`src/utils.h`):
    ```c
    // Cross-platform interface
    void utils_open_accessibility_settings(void);
    bool utils_set_launch_at_login(bool enabled);
    double utils_get_time(void);
    // Platform implementations differ:
    // - src/mac/utils.m: Uses NSWorkspace, CFAbsoluteTimeGetCurrent
    // - src/windows/utils.c: Uses ShellExecute, GetTickCount64
    ```
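
    As a rough illustration of how one interface function can sit on top of different system calls, the sketch below folds two platforms into a single file with `#ifdef`. That layout is an assumption made for brevity: the project keeps its implementations in separate files under `src/mac/` and `src/windows/`. The name `sketch_get_time` and the use of C11 `timespec_get` on non-Windows systems are hypothetical stand-ins, not the project's code.

    ```c
    #ifdef _WIN32
    #include <windows.h>
    #else
    #include <time.h>
    #endif

    // Hypothetical stand-in for utils_get_time(): returns a time in seconds.
    double sketch_get_time(void) {
    #ifdef _WIN32
        // GetTickCount64 returns milliseconds since system start.
        return (double) GetTickCount64() / 1000.0;
    #else
        // C11 wall-clock time; the macOS layer is documented as using
        // CFAbsoluteTimeGetCurrent instead.
        struct timespec ts;
        timespec_get(&ts, TIME_UTC);
        return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
    #endif
    }
    ```

    Callers would only ever compute differences between two results, so the differing epochs on each platform do not matter.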

    **App Initialization Pattern** (`src/app.h`):
    ```c
    typedef void (*AppReadyCallback)(void);
    int app_init(const char *name, const char *version, bool is_console, AppReadyCallback on_ready);

    // Platform-specific implementations:
    // - src/mac/app.m: Uses NSApplication, NSApplicationDelegate
    // - src/windows/app.c: Uses CreateWindow, message pump
    ```
    ### Thread Safety
    **Atomic Operations** (`src/audio.c:41`, `src/utils.h:43-46`):
    ```c
    // Thread-safe boolean access
    bool utils_atomic_read_bool(bool *ptr);
    void utils_atomic_write_bool(bool *ptr, bool value);

    // Usage in audio callback (audio thread → main thread)
    if (!utils_atomic_read_bool(&recorder->is_recording)) {
        return;
    }
    ```
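
    The declarations above do not show how the helpers are implemented. One plausible way to back them, assuming a GCC or Clang toolchain, is with the compilers' `__atomic` builtins, which operate on a plain `bool *`. The `sketch_` names are hypothetical, and the real implementations in the platform layers may use different primitives.

    ```c
    #include <stdbool.h>

    // Hypothetical implementation: sequentially consistent load of a shared flag.
    bool sketch_atomic_read_bool(bool *ptr) {
        return __atomic_load_n(ptr, __ATOMIC_SEQ_CST);
    }

    // Hypothetical implementation: sequentially consistent store to a shared flag.
    void sketch_atomic_write_bool(bool *ptr, bool value) {
        __atomic_store_n(ptr, value, __ATOMIC_SEQ_CST);
    }
    ```

    Sequential consistency is the most conservative ordering. A flag that is only polled could likely use acquire/release instead, but the stricter choice keeps the sketch easy to reason about.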

    ### Error Handling

    **Return Code Pattern:**
    ```c
    // Success: 0, Failure: -1 or non-zero
    int audio_recorder_start(void); // src/audio.h:17
    int keylogger_init(callbacks, userdata); // Returns 0 on success

    // Boolean for simple operations
    bool audio_recorder_init(void); // src/audio.h:10
    bool preferences_init(void); // src/preferences.h:9
    ```
    **Error Propagation:**
    ```c
    if (ma_device_start(&recorder->device) != MA_SUCCESS) {
        // Reset shared state before reporting failure to the caller
        utils_atomic_write_bool(&recorder->is_recording, false);
        return -1;
    }
    ```

    ### SwiftUI Dialog Pattern

    **Modal Dialog Implementation** (`src/mac/dialogs/dialog_utils.swift:18-23`):
    ```swift
    func runModalDialog<T: View, StateType: ModalDialogState>(
        content: T,
        state: StateType,
        windowSize: NSSize = NSSize(width: 400, height: 200),
        windowTitle: String = ""
    ) -> StateType.ResultType
    ```

    **Dialog State Protocol:**
    ```swift
    protocol ModalDialogState: ObservableObject {
        associatedtype ResultType
        var isCompleted: Bool { get set }
        var result: ResultType { get }
        func reset()
    }
    ```

    ## Workflows

    ### Adding a New Feature

    1. **Core Logic** - Implement in `src/` using C-style conventions
    2. **Platform Interface** - Add declarations to appropriate header (e.g., `src/utils.h`)
    3. **Platform Implementation** - Implement in `src/mac/` and `src/windows/`
    4. **Integration** - Wire up in `src/main.c` app lifecycle

    ### Platform-Specific Dialog

    1. **macOS**: Create SwiftUI view in `src/mac/dialogs/`
    2. **Windows**: Implement Win32 dialog in `src/windows/dialog.c`
    3. **Interface**: Add C function declaration in `src/dialog.h`

    ### Audio Processing

    Audio pipeline follows whisper.cpp requirements:
    ```c
    #define WHISPER_SAMPLE_RATE 16000 // Fixed 16kHz
    #define WHISPER_CHANNELS 1 // Mono only
    ```
    Recording flow (`src/main.c:168-215`):
    ```
    Key Press → audio_recorder_start() → data_callback() fills buffer
    Key Release → audio_recorder_stop() → get_samples() → transcription_process()
    ```
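
    The constants `WHISPER_SAMPLE_RATE` and `MIN_RECORDING_DURATION` suggest that very short recordings are filtered out by duration before transcription. The sketch below shows how such a check could be computed from a mono sample count. It is a guess at the logic, not an excerpt from `src/main.c`, and both helper names are hypothetical.

    ```c
    #include <stdbool.h>
    #include <stddef.h>

    #define WHISPER_SAMPLE_RATE 16000   // Samples per second (mono)
    #define MIN_RECORDING_DURATION 0.1  // Seconds

    // Hypothetical helper: converts a mono sample count to seconds.
    double sketch_duration_seconds(size_t sample_count) {
        return (double) sample_count / (double) WHISPER_SAMPLE_RATE;
    }

    // Hypothetical helper: true if the recording is long enough to transcribe.
    bool sketch_is_long_enough(size_t sample_count) {
        return sketch_duration_seconds(sample_count) >= MIN_RECORDING_DURATION;
    }
    ```

    With these values, 800 samples correspond to 0.05 seconds and would be skipped, while 16000 samples correspond to one full second.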

    ## Reference

    ### File Organization

    **Core Modules:**
    - `src/main.c` - App entry point and lifecycle (lines 329-391)
    - `src/audio.c/h` - Audio recording singleton
    - `src/preferences.c/h` - Configuration management
    - `src/models.c/h` - Whisper model loading
    - `src/transcription.cpp/h` - Whisper.cpp integration (C++)

    **Platform Abstraction:**
    - `src/utils.h` - Cross-platform interface definitions
    - `src/app.h` - Application framework interface
    - `src/mac/` - macOS implementations (Objective-C/Swift)
    - `src/windows/` - Windows implementations (Win32 C)

    **Build System:**
    - `CMakeLists.txt` - Main build configuration
    - `cmake/PlatformSetup.cmake` - Platform-specific setup
    - `cmake/BuildWhisper.cmake` - Whisper.cpp integration

    ### Key Constants

    ```c
    #define WHISPER_SAMPLE_RATE 16000 // Audio format for transcription
    #define WHISPER_CHANNELS 1 // Mono audio
    #define MIN_RECORDING_DURATION 0.1 // Minimum recording length
    #define PERMISSION_RETRY_DELAY_MS 500 // macOS permission retry delay
    ```
    ### Build Presets
    **Development:**
    ```bash
    cmake --preset debug # Debug build with Ninja
    cmake --preset release # Release build with Ninja
    ```

    **Windows Debugging:**
    ```bash
    cmake --preset vs-debug # Visual Studio generator for debugging
    ```

    ### Common Issues

    **macOS Accessibility Permissions:**
    - Handle in `src/main.c:78-117` with dialog prompts
    - Retry mechanism for permission granting

    **Thread Safety:**
    - Audio callback runs on separate thread
    - Use `utils_atomic_*` for shared state access
    - Main app state in `src/main.c:28-33`

    **Model Loading:**
    - Single unified function: `models_load()` in `src/models.c`
    - Handles download dialogs and fallback logic
    - Path management through preferences system

    **Memory Management:**
    - Consistent use of `malloc/free` for C compatibility
    - Audio buffer auto-resizing in `src/audio.c:54-66`
    - Caller owns returned buffers (e.g., `audio_recorder_get_samples`)
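
    The buffer resizing and ownership rules above can be sketched as follows. This is a minimal illustration under assumptions: the `SketchBuffer` type, its field names, the doubling growth policy, and the 1024-sample starting capacity are all hypothetical and not taken from `src/audio.c`.

    ```c
    #include <stdbool.h>
    #include <stdlib.h>
    #include <string.h>

    // Hypothetical growable sample buffer; field names are illustrative.
    typedef struct {
        float *data;
        size_t count;    // Samples currently stored
        size_t capacity; // Samples the allocation can hold
    } SketchBuffer;

    // Appends samples, doubling capacity as needed. Returns false if out of memory.
    bool sketch_buffer_append(SketchBuffer *buf, const float *input, size_t n) {
        if (buf->count + n > buf->capacity) {
            size_t new_capacity = buf->capacity ? buf->capacity : 1024;
            while (new_capacity < buf->count + n) {
                new_capacity *= 2;
            }
            float *grown = (float *) realloc(buf->data, new_capacity * sizeof(float));
            if (!grown) {
                return false; // The old allocation is still valid
            }
            buf->data = grown;
            buf->capacity = new_capacity;
        }
        memcpy(buf->data + buf->count, input, n * sizeof(float));
        buf->count += n;
        return true;
    }

    // Returns a malloc'd copy that the caller must free(); NULL if empty or out of memory.
    float *sketch_buffer_get_samples(const SketchBuffer *buf, size_t *out_count) {
        *out_count = 0;
        if (buf->count == 0) {
            return NULL;
        }
        float *copy = (float *) malloc(buf->count * sizeof(float));
        if (!copy) {
            return NULL;
        }
        memcpy(copy, buf->data, buf->count * sizeof(float));
        *out_count = buf->count;
        return copy;
    }
    ```

    Returning a copy rather than the internal pointer means the recorder can keep resizing its own buffer without invalidating memory the caller still holds.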
    95 changes: 95 additions & 0 deletions 07-testing.md
    @@ -0,0 +1,95 @@
    <!-- Generated: 2025-06-14 22:27:15 UTC -->

    # Testing Documentation

    ## Overview

    Yakety relies on manual, interactive tests to validate GUI components and dialogs. All test programs live in `/src/tests/`, and each one exercises a single platform-specific dialog implementation through the app's GUI framework integration.

    **Test File Locations:** `/src/tests/test_*.c`

    ## Test Types

    ### Dialog Integration Tests
    Manual GUI tests that validate platform-specific dialog implementations:

    - **`/src/tests/test_model_dialog.c`** - Models & Language selection dialog
    - **`/src/tests/test_keycombination_dialog.c`** - Hotkey capture dialog with keylogger integration
    - **`/src/tests/test_download_dialog.c`** - Model download progress dialog

    ### Test Structure
    All tests follow this pattern:
    - Initialize platform app framework (`app_init`)
    - Set up test-specific dependencies (keylogger, callbacks)
    - Execute dialog function with test parameters
    - Validate results and clean up resources
    - Exit with status code

    ## Running Tests

    ### Prerequisites
    ```bash
    # Configure and build project first
    cmake --preset release # or debug
    cmake --build --preset release
    ```

    ### Test Execution Commands

    **Model Dialog Test:**
    ```bash
    ./build/bin/test-model-dialog
    ```
    Expected: GUI dialog opens for model/language selection, prints selected values or cancellation.

    **Key Combination Dialog Test:**
    ```bash
    ./build/bin/test-keycombination-dialog
    ```
    Expected: GUI dialog captures key combinations, prints key codes and modifier flags.

    **Download Dialog Test:**
    ```bash
    ./build/bin/test-download-dialog
    ```
    Expected: Downloads test file (1KB from httpbin.org), shows progress dialog, cleans up temp file.

    ### Platform Availability
    Tests are only built and available on **macOS** (requires Cocoa/SwiftUI frameworks).
    Windows tests would require separate implementations using platform-specific dialog APIs.

    ## Reference

    ### Test File Organization
    ```
    src/tests/
    ├── test_model_dialog.c # Model selection dialog test
    ├── test_keycombination_dialog.c # Hotkey capture dialog test
    └── test_download_dialog.c # Download progress dialog test
    ```

    ### CMake Test Targets
    Test executables are defined in `/CMakeLists.txt` lines 502-532:

    ```cmake
    # Test programs (macOS only)
    add_executable(test-model-dialog src/tests/test_model_dialog.c)
    add_executable(test-keycombination-dialog src/tests/test_keycombination_dialog.c)
    add_executable(test-download-dialog src/tests/test_download_dialog.c)
    ```

    ### Test Dependencies
    - **Platform library:** Core app and dialog functions
    - **Cocoa framework:** macOS GUI integration (-framework Cocoa)
    - **SwiftUI framework:** Modern dialog implementations (-framework SwiftUI)
    - **Keylogger:** Required for hotkey capture testing

    ### Test Output Location
    All test executables built to: `/build/bin/test-*`

    ### Creating New Tests
    1. Add test source file in `/src/tests/test_<feature>.c`
    2. Follow existing test pattern with `app_init`, test logic, `app_cleanup`
    3. Add CMake target in main `/CMakeLists.txt` (macOS section)
    4. Link required platform libraries and frameworks
    5. Ensure proper cleanup and exit codes for automation
    142 changes: 142 additions & 0 deletions 08-deployment.md
    @@ -0,0 +1,142 @@
    <!-- Generated: 2025-06-14 20:30:41 UTC -->

    # Deployment Documentation

    ## Overview

    Yakety provides multiple packaging and distribution options for cross-platform deployment. The build system includes automated packaging targets for creating distributable archives, DMG installers, and deployment to remote servers.

    ## Package Types

    ### CLI Distribution Packages
    - **Target**: `package-cli-macos` | `package-cli-windows`
    - **Output**: `yakety-cli-{platform}.zip`
    - **Location**: `${CMAKE_BINARY_DIR}/`
    - **Contents**: CLI tools (yakety-cli, recorder, transcribe), models, assets

    ### App Distribution Packages
    - **Target**: `package-app-macos` | `package-app-windows`
    - **Output**: macOS: `Yakety-macos.dmg` + `Yakety-macos.zip` | Windows: `Yakety-windows.zip`
    - **Location**: `${CMAKE_BINARY_DIR}/`
    - **Contents**: Application bundles with embedded resources

    ### Universal Package Target
    - **Target**: `package`
    - **Behavior**: Platform-conditional (`package-macos` on Darwin, `package-windows` on Windows)

    ## Platform Deployment

    ### macOS
    ```bash
    # Build release binaries
    cmake --preset release
    cmake --build --preset release

    # Create CLI distribution
    cmake --build build --target package-cli-macos
    # Output: build/yakety-cli-macos.zip

    # Create app distribution with DMG
    cmake --build build --target package-app-macos
    # Output: build/Yakety-macos.dmg, build/Yakety-macos.zip

    # Create all packages
    cmake --build build --target package-macos
    ```

    **DMG Creation Process**:
    1. Copies app bundle to temp directory
    2. Creates Applications symlink for drag-install
    3. Generates DMG with `hdiutil create`
    4. Creates compressed ZIP of DMG

    **Code Signing**: Automatic ad-hoc signing with `codesign --force --deep --sign -`

    ### Windows
    ```bash
    # Build release binaries
    cmake --preset release
    cmake --build --preset release

    # Create CLI distribution
    cmake --build build --target package-cli-windows
    # Output: build/yakety-cli-windows.zip

    # Create app distribution
    cmake --build build --target package-app-windows
    # Output: build/Yakety-windows.zip

    # Create all packages
    cmake --build build --target package-windows
    ```

    **Windows-specific**:
    - CLI executable: `yakety-cli.exe`
    - GUI executable: `Yakety.exe` (WIN32 app without console)
    - Vulkan acceleration support (if VULKAN_SDK available)

    ### Remote Deployment
    ```bash
    # Upload packages to server
    cmake --build build --target upload
    ```

    **Upload Destinations**:
    - **Target Server**: `[email protected]`
    - **Path**: `/home/badlogic/mariozechner.at/html/uploads/`
    - **Method**: Windows: SCP via batch script | Unix: rsync

    ### Website Deployment

    **Frontend-only Deploy**:
    ```bash
    cd website
    ./publish.sh
    ```

    **Full Deploy with Server Restart**:
    ```bash
    cd website
    ./publish.sh server
    ```

    **Docker Control**:
    ```bash
    cd website/docker
    ./control.sh start # Production mode
    ./control.sh startdev # Development mode
    ./control.sh stop # Stop services
    ./control.sh logs # View logs
    ./control.sh restart # Restart services
    ```

    ## Reference

    ### Build Outputs
    - **Binary Directory**: `${CMAKE_BINARY_DIR}/bin/`
    - **CLI Tools**: `yakety-cli`, `recorder`, `transcribe`
    - **GUI Apps**: `Yakety.app` (macOS bundle) | `Yakety.exe` (Windows)
    - **Models**: `bin/models/ggml-base-q8_0.bin`
    - **Assets**: `bin/menubar.png`

    ### Distribution Archives
    - **macOS CLI**: `yakety-cli-macos.zip`
    - **macOS App**: `Yakety-macos.dmg`, `Yakety-macos.zip`
    - **Windows CLI**: `yakety-cli-windows.zip`
    - **Windows App**: `Yakety-windows.zip`

    ### Website Configuration
    - **Production Domain**: `yakety.ai`, `www.yakety.ai`
    - **SSL**: Let's Encrypt via nginx-proxy
    - **Server Stack**: Docker (Nginx + Node.js)
    - **Deployment**: rsync to `slayer.marioslab.io`

    ### Build Presets
    - **Release**: `cmake --preset release` (Ninja, optimized)
    - **Debug**: `cmake --preset debug` (Ninja, debugging symbols)
    - **VS Debug**: `cmake --preset vs-debug` (Visual Studio, Windows only)

    ### WSL/Remote Development
    - **Target**: Windows machine at `192.168.1.21`
    - **Sync Command**: `rsync -av --exclude='build/' --exclude='whisper.cpp/' . [email protected]:/mnt/c/workspaces/yakety/`
    - **Build via SSH**: Uses `cmd.exe` with `winvs.bat` environment setup