Gemini CLI Agent Architecture

This document provides a comprehensive overview of the Gemini CLI agent architecture, covering the main execution loop, concurrency handling, multi-turn conversations, and key architectural patterns.

Overview

The Gemini CLI is built as an event-driven, streaming AI agent with sophisticated tool execution capabilities. It supports both interactive (TTY with React UI) and non-interactive execution modes, with robust state management and error handling throughout.

Main Execution Loop

Entry Points

Primary Entry Point: packages/cli/index.ts

  • A simple entry point that imports and calls main() from gemini.tsx
  • A global error handler catches unhandled promise rejections

Main Control Logic: packages/cli/src/gemini.tsx

async function main() {
  // 1. Load settings and configuration
  const settings = await Settings.load();
  
  // 2. Initialize file and git services
  const fileService = createNodeFileService();
  const gitService = createGitService();
  
  // 3. Check for sandbox/memory requirements
  await handleSandboxMode(settings);
  
  // 4. Determine execution mode:
  if (process.stdout.isTTY) {
    // Interactive mode: render React UI
    render(<App settings={settings} />);
  } else {
    // Non-interactive mode: headless execution
    await runNonInteractive(settings);
  }
}

Interactive vs Non-Interactive Modes

Interactive Mode: packages/cli/src/ui/App.tsx

  • Full React/Ink UI with streaming responses
  • Real-time tool execution and user interaction
  • Rich formatting and visual feedback

Non-Interactive Mode: packages/cli/src/nonInteractiveCli.ts

  • Headless execution for scripting and automation
  • Single request-response cycle
  • JSON output formatting
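A minimal sketch of what this headless path could look like, assuming the GeminiClient and event types described below (the exact wiring is illustrative, not the real implementation):

// Hypothetical sketch of the headless path: take the prompt from argv,
// stream a single turn to completion, and emit JSON on stdout.
async function runNonInteractive(settings: Settings): Promise<void> {
  const prompt = process.argv.slice(2).join(' ');
  const client = new GeminiClient(settings);   // constructor shape assumed
  const abortController = new AbortController();

  let output = '';
  for await (const event of client.sendMessageStream(prompt, abortController.signal)) {
    if (event.type === ServerGeminiEventType.Content) {
      output += event.value;                   // accumulate streamed text
    }
  }
  process.stdout.write(JSON.stringify({ response: output }) + '\n');
}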

Core Architecture Components

1. GeminiClient - Central Orchestrator

Location: packages/core/src/core/client.ts

export class GeminiClient {
  private chat: GeminiChat;
  private toolRegistry: ToolRegistry;
  
  // Main entry point for message processing
  async *sendMessageStream(
    message: PartListUnion,
    signal: AbortSignal,
  ): AsyncGenerator<GeminiEvent> {
    // Creates and runs a Turn object
    const turn = new Turn(this.chat, this.toolRegistry, this.config);
    yield* turn.run(message, signal);
  }
}

2. Turn - Single Interaction Unit

Location: packages/core/src/core/turn.ts

Each user interaction creates a Turn object that manages the complete request-response-tool cycle:

export class Turn {
  readonly pendingToolCalls: ToolCallRequestInfo[];
  
  async *run(req: PartListUnion, signal: AbortSignal): AsyncGenerator<ServerGeminiStreamEvent> {
    // 1. Send request to Gemini API
    // 2. Process streaming response
    // 3. Extract and queue tool calls
    // 4. Execute tools concurrently
    // 5. Send tool results back to Gemini
    // 6. Continue until completion
  }
}
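The body of run() is elided above. A hedged sketch of what the first half of that loop might look like, using the event types from this document (the chat streaming call and chunk accessors are assumptions):

async *run(req: PartListUnion, signal: AbortSignal): AsyncGenerator<ServerGeminiStreamEvent> {
  // 1-2. Send the request and surface streamed text as Content events.
  const responseStream = await this.chat.sendMessageStream(req, { signal }); // signature assumed
  for await (const chunk of responseStream) {
    if (chunk.text) {
      yield { type: ServerGeminiEventType.Content, value: chunk.text };
    }
    // 3. Extract tool calls and queue them for the scheduler.
    for (const call of chunk.functionCalls ?? []) {
      this.pendingToolCalls.push({ callId: call.id, name: call.name, args: call.args });
      yield { type: ServerGeminiEventType.ToolCallRequest, value: call };
    }
  }
  // Steps 4-6 (execution and continuation) are driven by the scheduler and client.
}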

3. GeminiChat - Conversation Management

Location: packages/core/src/core/geminiChat.ts

export class GeminiChat {
  private history: Content[] = [];
  
  async sendMessage(contents: Content[], params?: ChatParams): Promise<GenerateContentResponse> {
    // Builds request with system prompt + history + user message
    const requestContents = this.buildRequestContents(contents);
    
    // Execute with retry logic and fallback handling
    const response = await retryWithBackoff(apiCall, {
      shouldRetry: (error: Error) => this.shouldRetryError(error),
      onPersistent429: async (authType?: string) => 
        await this.handleFlashFallback(authType),
    });
    
    return response;
  }
}

Concurrency and Parallelism

Streaming Architecture

The agent uses AsyncGenerator/AsyncIterable patterns for efficient streaming:

// Main streaming loop in useGeminiStream hook
const processGeminiStreamEvents = useCallback(async (
  stream: AsyncIterable<GeminiEvent>,
  userMessageTimestamp: number,
  signal: AbortSignal,
): Promise<StreamProcessingStatus> => {
  for await (const event of stream) {
    switch (event.type) {
      case ServerGeminiEventType.Content:
        // Handle streaming text content
        break;
      case ServerGeminiEventType.ToolCallRequest:
        // Queue tool calls for concurrent execution
        break;
      case ServerGeminiEventType.Thought:
        // Handle model "thinking" content (Gemini 2.0+)
        break;
    }
  }
});

Tool Execution Concurrency

Core Tool Scheduler: packages/core/src/core/coreToolScheduler.ts

Tools execute concurrently but are coordinated through state management:

export class CoreToolScheduler {
  private toolCalls: Map<string, ToolCall> = new Map();
  
  private attemptExecutionOfScheduledCalls(signal: AbortSignal): void {
    // Map has no .filter, so materialize the values first
    const callsToExecute = [...this.toolCalls.values()]
      .filter((call) => call.status === 'scheduled');
    
    // Execute all scheduled tools concurrently
    callsToExecute.forEach((scheduledCall) => {
      const callId = scheduledCall.request.callId;
      scheduledCall.tool.execute(...)
        .then((toolResult: ToolResult) => {
          // Handle successful completion
          this.setStatusInternal(callId, 'success', toolResult);
        })
        .catch((executionError: Error) => {
          // Handle execution errors
          this.setStatusInternal(callId, 'error', 
            createErrorResponse(scheduledCall.request, executionError));
        });
    });
  }
}

Tool Call State Machine

Tools progress through well-defined states:

type ToolCall = 
  | ValidatingToolCall    // Parameter validation in progress
  | ScheduledToolCall     // Ready for execution
  | ExecutingToolCall     // Currently running
  | WaitingToolCall       // Awaiting user approval
  | SuccessfulToolCall    // Completed successfully
  | ErroredToolCall       // Failed with error
  | CancelledToolCall;    // User cancelled
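The legal moves between these states can be made explicit. The transition table below is an illustrative sketch, not the scheduler's actual code:

type ToolCallStatus =
  | 'validating' | 'scheduled' | 'executing'
  | 'awaiting_approval' | 'success' | 'error' | 'cancelled';

// Hypothetical transition map: each state lists the states it may move to.
const ALLOWED_TRANSITIONS: Record<ToolCallStatus, ToolCallStatus[]> = {
  validating: ['scheduled', 'awaiting_approval', 'error', 'cancelled'],
  awaiting_approval: ['scheduled', 'cancelled'],
  scheduled: ['executing', 'cancelled'],
  executing: ['success', 'error', 'cancelled'],
  success: [],      // terminal
  error: [],        // terminal
  cancelled: [],    // terminal
};

function assertTransition(from: ToolCallStatus, to: ToolCallStatus): void {
  if (!ALLOWED_TRANSITIONS[from].includes(to)) {
    throw new Error(`Illegal tool state transition: ${from} -> ${to}`);
  }
}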

AbortController Integration

Global cancellation support throughout the execution pipeline:

// UI level abort management
const abortControllerRef = useRef<AbortController>(new AbortController());

// Propagated through entire chain
const handleSubmit = useCallback(async (message: string) => {
  const stream = geminiClient.sendMessageStream(message, abortControllerRef.current.signal);
  await processGeminiStreamEvents(stream, timestamp, abortControllerRef.current.signal);
});
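Cancellation only works end to end if each tool honors the signal it receives. A sketch of that pattern for a subprocess-backed tool (the command and result shape are placeholders, not the real shell tool):

import { spawn } from 'node:child_process';

// Hypothetical long-running tool body that cooperates with cancellation.
async function execute(
  request: ToolCallRequest,
  signal: AbortSignal,
): Promise<ToolResult> {
  if (signal.aborted) throw new Error('Cancelled before start');

  return new Promise<ToolResult>((resolve, reject) => {
    const child = spawn('long-running-command', { shell: true });  // placeholder command
    const onAbort = () => {
      child.kill('SIGTERM');                    // tear down the subprocess
      reject(new Error('Cancelled by user'));
    };
    signal.addEventListener('abort', onAbort, { once: true });

    let output = '';
    child.stdout.on('data', (chunk) => (output += chunk));
    child.on('close', () => {
      signal.removeEventListener('abort', onAbort);
      resolve({ content: output } as ToolResult);
    });
  });
}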

Multi-turn Conversation Flow

Conversation History Management

History Storage:

export class GeminiChat {
  private history: Content[] = [];
  
  getHistory(curated: boolean = false): Content[] {
    const history = curated 
      ? extractCuratedHistory(this.history)  // Remove intermediate tool calls
      : this.history;                        // Full conversation
    return structuredClone(history);         // Deep copy for safety
  }
}
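extractCuratedHistory is not shown here. One plausible reading, sketched below, is that it drops any turn containing function-call plumbing so only user and model text survive:

// Hypothetical: keep only plain text turns, dropping the functionCall /
// functionResponse parts that matter to the API but not to a curated view.
function extractCuratedHistory(history: Content[]): Content[] {
  return history.filter((content) =>
    (content.parts ?? []).every(
      (part) => !('functionCall' in part) && !('functionResponse' in part),
    ),
  );
}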

Context Compression:

async tryCompressChat(force: boolean = false): Promise<ChatCompressionInfo | null> {
  // Automatically triggered when approaching token limits
  // Summarizes conversation history while preserving important context
  const summaryPrompt = `Please provide a concise summary of this conversation...`;
  
  // Replace history with compressed version
  this.history = [summaryContent, ...recentHistory];
}
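The trigger is token-based. A rough, self-contained sketch of the threshold check, using a characters/4 token estimate and an illustrative 80% cutoff (neither number is the CLI's real threshold):

// Sketch: estimate tokens and compress when the estimate crosses a
// fraction of an assumed context window. All numbers are illustrative.
function shouldCompress(history: Content[], contextWindowTokens = 1_000_000): boolean {
  const chars = history
    .flatMap((c) => c.parts ?? [])
    .map((p) => ('text' in p && p.text ? p.text.length : 0))
    .reduce((a, b) => a + b, 0);
  const approxTokens = chars / 4;   // crude chars-per-token heuristic
  return approxTokens > 0.8 * contextWindowTokens;
}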

Request Building Pipeline

Each turn builds the complete request context:

private buildRequestContents(userContents: Content[]): Content[] {
  const systemPrompt = this.getSystemPrompt();
  const conversationHistory = this.getHistory(false);
  
  return [
    systemPrompt,      // Agent instructions and context
    ...conversationHistory,  // Previous conversation
    ...userContents    // Current user message
  ];
}

Auto-continuation Logic

The agent automatically continues when more tool calls are requested:

// In Turn.run() - automatic continuation logic
let hasMoreToolCalls = this.pendingToolCalls.length > 0;
while (hasMoreToolCalls) {
  // Execute pending tools
  const toolResults = await this.executeToolBatch(signal);
  
  // Send results back to Gemini for continuation
  const continuationResponse = await this.chat.sendMessage(toolResults);
  
  // Check if more tools are requested
  hasMoreToolCalls = this.extractToolCalls(continuationResponse).length > 0;
}

Tool Execution Architecture

Tool Discovery and Registration

Tool Registry: packages/core/src/tools/tool-registry.ts

export class ToolRegistry {
  private tools: Map<string, Tool> = new Map();
  
  async discoverTools(): Promise<void> {
    // 1. Register built-in tools
    this.registerBuiltinTools();
    
    // 2. Discover external tools via commands
    await this.discoverToolsViaCommands();
    
    // 3. Initialize MCP (Model Context Protocol) servers
    await this.initializeMcpServers();
    
    // 4. Register extension tools
    await this.registerExtensionTools();
  }
}

Tool Interface and Implementation

All tools implement a common interface:

export interface Tool {
  name: string;
  description: string | ToolDescription;
  
  execute(
    request: ToolCallRequest,
    signal: AbortSignal,
    onStreamOutput?: (chunk: string) => void,
  ): Promise<ToolResult>;
}
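Given that interface, a custom tool is just an object with a name, a description, and an execute method. A hypothetical example (the request's args field and the registry's registration method are assumptions):

// Hypothetical tool conforming to the interface above: echoes its input.
const echoTool: Tool = {
  name: 'echo',
  description: 'Returns the provided text unchanged.',
  async execute(request, signal, onStreamOutput) {
    // Assumes the request carries the model-supplied call arguments.
    const args = (request as unknown as { args?: { text?: string } }).args;
    const text = args?.text ?? '';
    onStreamOutput?.(text);          // live-stream output to the UI
    return { content: text } as ToolResult;
  },
};

// Registration is then one call on the registry (method name assumed):
// toolRegistry.registerTool(echoTool);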

Built-in Tools

Core tools are implemented in packages/core/src/tools/:

  • edit.ts - File editing with LLM-based error correction
  • shell.ts - Shell command execution with safety checks
  • web-fetch.ts - Web content fetching with fallback handling
  • web-search.ts - Google Search integration
  • memoryTool.ts - Long-term memory management

Tool Execution Pipeline

Complete Tool Lifecycle:

  1. Validation Phase:

    // Parameter validation and sanitization
    const validationResult = await validateToolParameters(request);
  2. Approval Phase:

    // Check if user confirmation required
    if (tool.requiresConfirmation(request)) {
      await waitForUserApproval(request);
    }
  3. Execution Phase:

    // Execute with live output streaming
    const result = await tool.execute(request, signal, (chunk) => {
      // Stream output to UI in real-time
      onStreamOutput(chunk);
    });
  4. Result Processing:

    // Convert tool result to Gemini function response format
    const functionResponse = convertToFunctionResponse(
      toolName, callId, result.content
    );
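convertToFunctionResponse packages the tool result as a functionResponse part for the next model request. A sketch of the likely shape, following the public Gemini content format:

// Sketch: wrap a tool result in the functionResponse part shape the
// Gemini API expects when tool results are sent back to the model.
function convertToFunctionResponse(toolName: string, callId: string, content: string) {
  return {
    functionResponse: {
      id: callId,            // ties the result back to the originating call
      name: toolName,
      response: { output: content },
    },
  };
}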

Error Handling and Recovery

Multi-level Error Handling

1. API Level Error Handling:

// In GeminiChat - comprehensive retry logic
const response = await retryWithBackoff(apiCall, {
  shouldRetry: (error: Error) => {
    if (error.message.includes('429')) return true;  // Rate limiting
    if (error.message.match(/5\d{2}/)) return true;  // Server errors
    return false;
  },
  onPersistent429: async (authType?: string) => 
    await this.handleFlashFallback(authType),  // Auto-fallback to Flash model
});
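retryWithBackoff itself is a generic helper. A condensed sketch of the pattern the call sites above imply (attempt counts and delays are illustrative):

// Sketch of exponential backoff with jitter; options mirror the call sites above.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  opts: {
    shouldRetry: (error: Error) => boolean;
    onPersistent429?: (authType?: string) => Promise<string | null>;
    maxAttempts?: number;
  },
): Promise<T> {
  const maxAttempts = opts.maxAttempts ?? 5;
  let delayMs = 1000;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const err = error as Error;
      if (attempt >= maxAttempts || !opts.shouldRetry(err)) {
        // Persistent rate limiting gets one last chance via model fallback.
        if (err.message.includes('429') && opts.onPersistent429) {
          const fallback = await opts.onPersistent429();
          if (fallback) { attempt = 0; continue; }  // retry once on the fallback model
        }
        throw err;
      }
      await new Promise((r) => setTimeout(r, delayMs + Math.random() * 250));
      delayMs *= 2;    // exponential growth between attempts
    }
  }
}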

2. Tool Level Error Handling:

// Each tool execution wrapped in comprehensive error handling
.catch((executionError: Error) => {
  // Create structured error response
  const errorResponse = {
    success: false,
    error: {
      message: executionError.message,
      type: executionError.constructor.name,
      context: this.captureErrorContext(executionError)
    }
  };
  
  this.setStatusInternal(callId, 'error', errorResponse);
});

3. Edit Tool Error Correction:

The edit tool has sophisticated LLM-based error correction:

// packages/core/src/utils/editCorrector.ts
export class EditCorrector {
  // Automatically corrects string matching issues
  async correctOldStringMismatch(
    problematicSnippet: string,
    fileContent: string,
  ): Promise<string> {
    // Uses LLM to fix escaping and formatting issues
    const correctionPrompt = `Context: A process needs to find an exact literal match...`;
    return await this.llmCorrection(correctionPrompt);
  }
}

Flash Fallback System

Automatic model degradation for OAuth users:

private async handleFlashFallback(authType?: string): Promise<string | null> {
  // Only for OAuth users experiencing persistent 429 errors
  if (authType !== AuthType.LOGIN_WITH_GOOGLE_PERSONAL) return null;
  
  const currentModel = this.config.getModel();
  const fallbackModel = DEFAULT_GEMINI_FLASH_MODEL;
  
  if (currentModel === fallbackModel) return null;  // Already using Flash
  
  // Get user confirmation for model switch
  const accepted = await this.config.flashFallbackHandler?.(currentModel, fallbackModel);
  if (accepted) {
    this.config.setModel(fallbackModel);
    return fallbackModel;
  }
  
  return null;
}

State Management and Persistence

Configuration Hierarchy

Settings cascade through multiple levels:

// Hierarchical configuration loading
const settings = Settings.load([
  userConfigPath,     // ~/.gemini/settings.json
  projectConfigPath,  // ./GEMINI.md or .gemini/settings.json
  sessionOverrides    // Runtime configuration
]);
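A sketch of how such a cascade can be merged, with later sources overriding earlier ones (illustrative; the real loader does more validation than this):

import * as fs from 'node:fs';

// Sketch: shallow-merge settings files in precedence order, skipping
// paths that do not exist; later entries win.
function loadCascadedSettings(paths: string[]): Record<string, unknown> {
  return paths.reduce<Record<string, unknown>>((merged, p) => {
    if (!fs.existsSync(p)) return merged;
    const layer = JSON.parse(fs.readFileSync(p, 'utf8'));
    return { ...merged, ...layer };
  }, {});
}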

Memory and Context Management

Hierarchical Memory Loading:

const { memoryContent, fileCount } = await loadHierarchicalGeminiMemory(
  process.cwd(),
  config.getDebugMode(),
  config.getFileService(),
  config.getExtensionContextFilePaths(),
);
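This implies a walk from the working directory up to the filesystem root, collecting GEMINI.md files along the way. A hedged sketch of that walk (the ordering, most general context first, is an assumption):

import * as fs from 'node:fs';
import * as path from 'node:path';

// Sketch: gather GEMINI.md content from cwd up to the root, highest
// directory first so project-local files can refine broader context.
function loadHierarchicalMemory(startDir: string): { memoryContent: string; fileCount: number } {
  const files: string[] = [];
  for (let dir = path.resolve(startDir); ; dir = path.dirname(dir)) {
    const candidate = path.join(dir, 'GEMINI.md');
    if (fs.existsSync(candidate)) files.unshift(candidate);
    if (dir === path.dirname(dir)) break;   // reached the filesystem root
  }
  const memoryContent = files.map((f) => fs.readFileSync(f, 'utf8')).join('\n\n');
  return { memoryContent, fileCount: files.length };
}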

Project-specific Context:

  • GEMINI.md files provide project context
  • Git integration for change tracking
  • File snapshots for restoration capabilities

Session State Components

  1. Chat History: Maintained with automatic compression
  2. Tool Registry: Dynamic tool discovery and registration
  3. User Memory: Persistent across sessions via GEMINI.md
  4. Authentication State: OAuth tokens and API keys
  5. Model Configuration: Current model and fallback settings

UI Architecture (Interactive Mode)

React/Ink Integration

Main UI Component: packages/cli/src/ui/App.tsx

export function App({ settings }: { settings: Settings }) {
  const [historyItems, setHistoryItems] = useState<HistoryItem[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);
  
  // Core streaming hook manages all agent interaction
  const {
    handleSubmit,
    handleInterrupt,
    toolCalls,
    scheduleToolCalls,
  } = useGeminiStream({
    onStreamingChange: setIsStreaming,
    onHistoryUpdate: (items) => setHistoryItems(prev => [...prev, ...items]),
  });
  
  return (
    <Box flexDirection="column">
      <ChatHistory items={historyItems} />
      <ToolCallsDisplay toolCalls={toolCalls} />
      <InputPrompt onSubmit={handleSubmit} disabled={isStreaming} />
    </Box>
  );
}

Real-time Tool Execution Display

Tool Status Management:

const useReactToolScheduler = (onCompletedBatch) => {
  const [toolCalls, setToolCalls] = useState<ReactToolCall[]>([]);
  
  // Real-time updates as tools execute
  const updateToolStatus = useCallback((id: string, status: ToolCallStatus) => {
    setToolCalls(prev => prev.map(tool => 
      tool.id === id ? { ...tool, status } : tool
    ));
  }, []);
  
  return { toolCalls, scheduleToolCalls, markToolsAsSubmitted };
};

Message Streaming and Splitting

Performance Optimization:

const handleContentEvent = useCallback((eventValue, currentBuffer, timestamp) => {
  // Split large messages at safe points for better rendering performance
  const newBuffer = currentBuffer + eventValue;
  const splitPoint = findLastSafeSplitPoint(newBuffer);
  
  if (splitPoint === newBuffer.length) {
    // Update existing pending message
    setPendingHistoryItem({ type: 'gemini', text: newBuffer });
  } else {
    // Split for performance: static + streaming
    const staticPart = newBuffer.slice(0, splitPoint);
    const streamingPart = newBuffer.slice(splitPoint);
    
    addItem({ type: 'gemini', text: staticPart }, timestamp);
    setPendingHistoryItem({ type: 'gemini_content', text: streamingPart });
  }
});
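findLastSafeSplitPoint is not shown. A plausible sketch, assuming it returns the position after the last paragraph break outside a fenced code block, and buffer.length when no safe split exists:

// Sketch: the last "\n\n" outside a ``` fence is a safe place to freeze
// text as a static history item; buffer.length means "no safe split yet".
function findLastSafeSplitPoint(buffer: string): number {
  let inFence = false;
  let lastSafe = buffer.length;
  for (let i = 0; i < buffer.length; i++) {
    if (buffer.startsWith('```', i)) {
      inFence = !inFence;
      i += 2;                        // skip the rest of the fence marker
      continue;
    }
    if (!inFence && buffer.startsWith('\n\n', i)) lastSafe = i + 2;
  }
  return lastSafe;
}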

Key Architectural Patterns

1. Event-Driven Architecture

  • Streaming Events: All communication via async generators
  • Reactive UI: Components respond to event stream changes
  • Loose Coupling: Clean separation between core logic and presentation

2. State Machine Pattern

  • Tool States: Well-defined state transitions for tool execution
  • Conversation States: Clear streaming states (idle, processing, waiting)
  • Error States: Structured error handling with recovery paths

3. Plugin Architecture

  • Tool Interface: Common interface for all tool implementations
  • MCP Integration: External tool servers via Model Context Protocol
  • Extension System: Additional functionality via configuration

4. Async Generator Streaming

  • Backpressure Handling: Natural flow control via generators
  • Cancellation Support: AbortController throughout pipeline
  • Memory Efficiency: Process data as it arrives, not in batches

5. Configuration Hierarchy

  • Cascading Settings: User → Project → Session configuration
  • Hot Reloading: Runtime configuration updates
  • Environment Adaptation: Auth fallbacks based on environment

6. Robust Error Recovery

  • Automatic Retries: Exponential backoff for transient errors
  • Model Fallbacks: Automatic degradation to faster models
  • LLM-based Correction: Edit tools use LLM to fix parameter issues

Performance Considerations

Memory Management

  • Streaming Processing: Process responses as they arrive
  • History Compression: Automatic summarization when approaching limits
  • Tool Result Caching: Avoid re-execution of expensive operations

Concurrency Optimization

  • Parallel Tool Execution: Multiple tools run simultaneously when safe
  • Non-blocking UI: Streaming keeps interface responsive
  • Background Processing: Long-running tools don't block interaction

Network Efficiency

  • Request Batching: Multiple tool results sent in single request
  • Connection Reuse: Persistent connections to Gemini API
  • Retry Logic: Intelligent backoff prevents excessive API calls

This architecture provides a robust, scalable foundation for an AI agent with sophisticated tool execution, streaming responses, and rich user interaction capabilities.
