@arockwell
Created July 4, 2025 05:54
PostgreSQL Knowledge Base for Millions of Markdown Files

A blazing-fast markdown document management system built with PostgreSQL and Fish shell. Handles everything from personal notes to millions of documents with smart tab completion and beautiful mdcat rendering.

Features

  • 🚀 Scales to millions - Optimized indexes, parallel queries, batch operations
  • 🔍 Full-text + Fuzzy search - PostgreSQL FTS with trigram support
  • 📁 Project organization - Group documents by project with multi-level tab completion
  • ⚡ Sub-second queries - Even with millions of documents
  • 🎨 Beautiful rendering - mdcat integration for terminal viewing

Quick Start

# Source the functions
source claude-functions.fish

# Save a document (auto-detects project from git)
claude-save README.md

# Save with explicit project
claude-save api-notes.md "API Design" "myproject"

# View with smart tab completion
claude-view <TAB>                    # Shows recent + prefixes
claude-view project:<TAB>            # Browse by project
claude-view project:myapp/<TAB>      # Documents in project

# Search across everything
claude-find "docker compose"
claude-find "api" --project myapp --limit 50

# Fuzzy search for typos
claude-find-fuzzy "configuraiton"    # Finds "configuration"
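For orientation, the searches above presumably map to PostgreSQL queries along these lines. This is a sketch against the `claude.knowledge` schema shown under Installation; the actual ranking weights, result limits, and query functions used by `claude-find` are assumptions.

```sql
-- Hypothetical query behind claude-find: full-text search, ranked.
SELECT id, title, ts_rank(search_vector, query) AS rank
FROM claude.knowledge,
     websearch_to_tsquery('english', 'docker compose') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

-- Hypothetical query behind claude-find-fuzzy: trigram similarity on titles.
SELECT id, title, similarity(title, 'configuraiton') AS sim
FROM claude.knowledge
WHERE title % 'configuraiton'   -- the % operator can use the trigram index
ORDER BY sim DESC
LIMIT 20;
```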

Installation

  1. Create the database schema:
CREATE SCHEMA claude;

CREATE TABLE claude.knowledge (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    project TEXT,
    tags TEXT[],
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    accessed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    access_count INTEGER DEFAULT 0,
    search_vector tsvector
);

-- Create indexes for performance
CREATE INDEX idx_claude_project ON claude.knowledge(project) WHERE project IS NOT NULL;
CREATE INDEX idx_claude_accessed ON claude.knowledge(accessed_at DESC);
CREATE INDEX idx_claude_search ON claude.knowledge USING GIN(search_vector);

-- Enable fuzzy search
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_claude_title_trgm ON claude.knowledge USING gin(title gin_trgm_ops);
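Note that nothing in the schema above populates `search_vector`. One way to keep it current is a trigger (a sketch; the Fish functions could equally set the column in `claude-save` instead):

```sql
-- Keep search_vector in sync with title and content on every write.
CREATE FUNCTION claude.knowledge_tsv_update() RETURNS trigger AS $$
BEGIN
    NEW.search_vector :=
        setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(NEW.content, '')), 'B');
    RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_knowledge_tsv
    BEFORE INSERT OR UPDATE OF title, content ON claude.knowledge
    FOR EACH ROW EXECUTE FUNCTION claude.knowledge_tsv_update();
```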
  2. Source the Fish functions:
source claude-functions.fish
source claude-completions.fish

Multi-Level Tab Completion

The completion system is context-aware and supports multiple levels:

# Level 1: Prefixes and recent docs
claude-view <TAB>
api-design.md                    Recent document (modified today)
project:                         Browse by project →
tag:                             Browse by tag →
date:                            Browse by date →

# Level 2: Within a context
claude-view project:my<TAB>
project:myapp                    myapp project (12 documents)
project:mysite                   mysite project (8 documents)

# Level 3: Specific documents
claude-view project:myapp/<TAB>
project:myapp/api-design.md      API endpoint planning document
project:myapp/docker-setup.md    Docker configuration
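A level-2 completion like `project:my<TAB>` presumably boils down to a grouped prefix query; a hypothetical sketch:

```sql
-- Candidate completions for "project:my", with the document counts
-- shown in the completion descriptions.
SELECT project, COUNT(*) AS doc_count
FROM claude.knowledge
WHERE project LIKE 'my%'
GROUP BY project
ORDER BY doc_count DESC;
```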

Scaling to Millions

Current optimizations (handles 100K+ docs):

  • 12 specialized indexes including trigrams
  • Parallel query execution
  • Batch import operations
  • Smart search function with ranking

When you hit millions:

-- 1. Partition by date
CREATE TABLE claude.knowledge (...) PARTITION BY RANGE (created_at);

-- 2. Materialized views for tab completion
CREATE MATERIALIZED VIEW claude.project_summary AS
SELECT project, COUNT(*) AS doc_count, MAX(updated_at) AS last_updated
FROM claude.knowledge GROUP BY project;

-- 3. Separate search index
CREATE TABLE claude.search_index (
    id BIGINT PRIMARY KEY,
    title TEXT,
    search_vector tsvector,
    preview TEXT
);
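A materialized view goes stale until refreshed. To refresh it without blocking the tab-completion reads, `REFRESH ... CONCURRENTLY` can be used, which requires a unique index on the view (sketch; index name is an assumption):

```sql
-- GROUP BY project makes project unique per row, so this index is valid.
CREATE UNIQUE INDEX idx_project_summary ON claude.project_summary (project);

-- Refresh without locking out readers (e.g. from a cron job).
REFRESH MATERIALIZED VIEW CONCURRENTLY claude.project_summary;
```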

Bulk Import

# Import entire directory
claude-import-dir ~/Documents/notes "personal"

# Import with progress for large sets
claude-import-dir ~/massive-docs "knowledge-base"
πŸ“ Found 50000 markdown files
πŸš€ Starting batch import...
........ (dots show progress)
βœ… Imported 50000 documents
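The batch import is fast because it avoids one round-trip per file. One plausible shape for the SQL side (a sketch; the staging-table name, CSV path, and exact load path are assumptions, not the gist's actual implementation):

```sql
-- Stage all rows in one COPY, then insert in a single statement.
CREATE TEMP TABLE staging (title TEXT, content TEXT, project TEXT);
COPY staging FROM '/tmp/import.csv' WITH (FORMAT csv);

INSERT INTO claude.knowledge (title, content, project)
SELECT title, content, project FROM staging;
```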

Performance Monitoring

# Check performance stats
claude-perf
⚡ Performance Diagnostics
=========================
metric          | value   | age
total_documents | 50000   | 00:01:23
Index Usage:
index_name              | scans  | tuples_read
idx_claude_search       | 1523   | 4821933
idx_claude_project_title| 823    | 2847221

# Run maintenance
claude-vacuum
🧹 Running maintenance...
✅ Maintenance complete
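The index-usage table above can be produced directly from PostgreSQL's statistics views; a sketch of the kind of query `claude-perf` might run:

```sql
-- Per-index usage counters for the claude schema, busiest first.
SELECT indexrelname AS index_name,
       idx_scan     AS scans,
       idx_tup_read AS tuples_read
FROM pg_stat_user_indexes
WHERE schemaname = 'claude'
ORDER BY idx_scan DESC;
```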

Architecture Decisions

  1. PostgreSQL over SQLite - Better concurrent access, full-text search, scales to millions
  2. Fish over Bash - Better completions, cleaner syntax, amazing interactive features
  3. Direct SQL over ORM - Maximum performance, full control over queries
  4. Trigram indexes - Handles typos and fuzzy matching efficiently

Advanced Usage

# Export entire project
claude-export-project myapp ~/Desktop/myapp-docs/

# Search with context
claude-find "authentication" --project api --limit 100

# View by ID (faster for scripts)
claude-view id:42

# Recent documents with custom limit
claude-recent 50

Why This Design Scales

  1. Smart Indexes - PostgreSQL chooses the best index automatically
  2. Parallel Queries - Multi-core search execution
  3. Batch Operations - Import thousands of files efficiently
  4. Incremental Updates - Only reindex what changes
  5. Partitioning Ready - Easy to add when needed
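On point 5: once the table is range-partitioned by `created_at` as shown earlier, adding capacity is a single DDL statement per range. A sketch (partition name and yearly granularity are assumptions):

```sql
-- Attach a new yearly partition to the range-partitioned table.
CREATE TABLE claude.knowledge_2025 PARTITION OF claude.knowledge
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
```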

Testing Scale

# Generate test data
for i in (seq 1 10000)
    # fish's echo does not expand \n inside double quotes; use printf instead
    printf '# Document %d\n\nThis is test content for document %d\n' $i $i > /tmp/doc-$i.md
end

# Bulk import
claude-import-dir /tmp "test-docs"

# Test search performance
time claude-find "document 5423"

# Test fuzzy search
time claude-find-fuzzy "documnet"  # Typo intentional

# Check stats
claude-stats

Built with ❤️ for people who have too many markdown files and want sub-second search on all of them.
