Gemini CLI Agent Architecture

This document provides a comprehensive overview of the Gemini CLI agent architecture, covering the main execution loop, concurrency handling, multi-turn conversations, and key architectural patterns.

Overview

The Gemini CLI is built as an event-driven, streaming AI agent with sophisticated tool execution capabilities. It supports both interactive (TTY with React UI) and non-interactive execution modes, with robust state management and error handling throughout.

Main Execution Loop

Entry Points

Primary Entry Point: packages/cli/index.ts

  • A simple entry point that imports and calls main() from gemini.tsx
  • A global error handler catches unhandled promise rejections

Main Control Logic: packages/cli/src/gemini.tsx

async function main() {
  // 1. Load settings and configuration
  const settings = await Settings.load();
  
  // 2. Initialize file and git services
  const fileService = createNodeFileService();
  const gitService = createGitService();
  
  // 3. Check for sandbox/memory requirements
  await handleSandboxMode(settings);
  
  // 4. Determine execution mode:
  if (process.stdout.isTTY) {
    // Interactive mode: render React UI
    render(<App settings={settings} />);
  } else {
    // Non-interactive mode: headless execution
    await runNonInteractive(settings);
  }
}

Interactive vs Non-Interactive Modes

Interactive Mode: packages/cli/src/ui/App.tsx

  • Full React/Ink UI with streaming responses
  • Real-time tool execution and user interaction
  • Rich formatting and visual feedback

Non-Interactive Mode: packages/cli/src/nonInteractiveCli.ts

  • Headless execution for scripting and automation
  • Single request-response cycle
  • JSON output formatting
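A minimal sketch of what this headless path could look like, assuming the GeminiClient and event types described below (the exact wiring is illustrative, not the real implementation):

// Hypothetical sketch of the headless path: take the prompt from argv,
// stream a single turn to completion, and emit JSON on stdout.
async function runNonInteractive(settings: Settings): Promise<void> {
  const prompt = process.argv.slice(2).join(' ');
  const client = new GeminiClient(settings);   // constructor shape assumed
  const abortController = new AbortController();

  let output = '';
  for await (const event of client.sendMessageStream(prompt, abortController.signal)) {
    if (event.type === ServerGeminiEventType.Content) {
      output += event.value;                   // accumulate streamed text
    }
  }
  process.stdout.write(JSON.stringify({ response: output }) + '\n');
}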

Core Architecture Components

1. GeminiClient - Central Orchestrator

Location: packages/core/src/core/client.ts

export class GeminiClient {
  private chat: GeminiChat;
  private toolRegistry: ToolRegistry;
  
  // Main entry point for message processing
  async *sendMessageStream(
    message: PartListUnion,
    signal: AbortSignal,
  ): AsyncGenerator<GeminiEvent> {
    // Creates and runs a Turn object
    const turn = new Turn(this.chat, this.toolRegistry, this.config);
    yield* turn.run(message, signal);
  }
}

2. Turn - Single Interaction Unit

Location: packages/core/src/core/turn.ts

Each user interaction creates a Turn object that manages the complete request-response-tool cycle:

export class Turn {
  readonly pendingToolCalls: ToolCallRequestInfo[];
  
  async *run(req: PartListUnion, signal: AbortSignal): AsyncGenerator<ServerGeminiStreamEvent> {
    // 1. Send request to Gemini API
    // 2. Process streaming response
    // 3. Extract and queue tool calls
    // 4. Execute tools concurrently
    // 5. Send tool results back to Gemini
    // 6. Continue until completion
  }
}
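The body of run() is elided above. A hedged sketch of what the first half of that loop might look like, using the event types from this document (the chat streaming call and chunk accessors are assumptions):

async *run(req: PartListUnion, signal: AbortSignal): AsyncGenerator<ServerGeminiStreamEvent> {
  // 1-2. Send the request and surface streamed text as Content events.
  const responseStream = await this.chat.sendMessageStream(req, { signal }); // signature assumed
  for await (const chunk of responseStream) {
    if (chunk.text) {
      yield { type: ServerGeminiEventType.Content, value: chunk.text };
    }
    // 3. Extract tool calls and queue them for the scheduler.
    for (const call of chunk.functionCalls ?? []) {
      this.pendingToolCalls.push({ callId: call.id, name: call.name, args: call.args });
      yield { type: ServerGeminiEventType.ToolCallRequest, value: call };
    }
  }
  // Steps 4-6 (execution and continuation) are driven by the scheduler and client.
}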

3. GeminiChat - Conversation Management

Location: packages/core/src/core/geminiChat.ts

export class GeminiChat {
  private history: Content[] = [];
  
  async sendMessage(contents: Content[], params?: ChatParams): Promise<GenerateContentResponse> {
    // Builds request with system prompt + history + user message
    const requestContents = this.buildRequestContents(contents);
    
    // Execute with retry logic and fallback handling
    const response = await retryWithBackoff(apiCall, {
      shouldRetry: (error: Error) => this.shouldRetryError(error),
      onPersistent429: async (authType?: string) => 
        await this.handleFlashFallback(authType),
    });
    
    return response;
  }
}

Concurrency and Parallelism

Streaming Architecture

The agent uses AsyncGenerator/AsyncIterable patterns for efficient streaming:

// Main streaming loop in useGeminiStream hook
const processGeminiStreamEvents = useCallback(async (
  stream: AsyncIterable<GeminiEvent>,
  userMessageTimestamp: number,
  signal: AbortSignal,
): Promise<StreamProcessingStatus> => {
  for await (const event of stream) {
    switch (event.type) {
      case ServerGeminiEventType.Content:
        // Handle streaming text content
        break;
      case ServerGeminiEventType.ToolCallRequest:
        // Queue tool calls for concurrent execution
        break;
      case ServerGeminiEventType.Thought:
        // Handle model "thinking" content (Gemini 2.0+)
        break;
    }
  }
});

Tool Execution Concurrency

Core Tool Scheduler: packages/core/src/core/coreToolScheduler.ts

Tools execute concurrently but are coordinated through state management:

export class CoreToolScheduler {
  private toolCalls: Map<string, ToolCall> = new Map();
  
  private attemptExecutionOfScheduledCalls(signal: AbortSignal): void {
    // Map has no .filter, so materialize the values first
    const callsToExecute = [...this.toolCalls.values()]
      .filter((call) => call.status === 'scheduled');
    
    // Execute all scheduled tools concurrently
    callsToExecute.forEach((scheduledCall) => {
      const callId = scheduledCall.request.callId;
      scheduledCall.tool.execute(...)
        .then((toolResult: ToolResult) => {
          // Handle successful completion
          this.setStatusInternal(callId, 'success', toolResult);
        })
        .catch((executionError: Error) => {
          // Handle execution errors
          this.setStatusInternal(callId, 'error', 
            createErrorResponse(scheduledCall.request, executionError));
        });
    });
  }
}

Tool Call State Machine

Tools progress through well-defined states:

type ToolCall = 
  | ValidatingToolCall    // Parameter validation in progress
  | ScheduledToolCall     // Ready for execution
  | ExecutingToolCall     // Currently running
  | WaitingToolCall       // Awaiting user approval
  | SuccessfulToolCall    // Completed successfully
  | ErroredToolCall       // Failed with error
  | CancelledToolCall;    // User cancelled
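The legal moves between these states can be made explicit. The transition table below is an illustrative sketch, not the scheduler's actual code:

type ToolCallStatus =
  | 'validating' | 'scheduled' | 'executing'
  | 'awaiting_approval' | 'success' | 'error' | 'cancelled';

// Hypothetical transition map: each state lists the states it may move to.
const ALLOWED_TRANSITIONS: Record<ToolCallStatus, ToolCallStatus[]> = {
  validating: ['scheduled', 'awaiting_approval', 'error', 'cancelled'],
  awaiting_approval: ['scheduled', 'cancelled'],
  scheduled: ['executing', 'cancelled'],
  executing: ['success', 'error', 'cancelled'],
  success: [],      // terminal
  error: [],        // terminal
  cancelled: [],    // terminal
};

function assertTransition(from: ToolCallStatus, to: ToolCallStatus): void {
  if (!ALLOWED_TRANSITIONS[from].includes(to)) {
    throw new Error(`Illegal tool state transition: ${from} -> ${to}`);
  }
}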

AbortController Integration

Global cancellation support throughout the execution pipeline:

// UI level abort management
const abortControllerRef = useRef<AbortController>(new AbortController());

// Propagated through entire chain
const handleSubmit = useCallback(async (message: string) => {
  const stream = geminiClient.sendMessageStream(message, abortControllerRef.current.signal);
  await processGeminiStreamEvents(stream, timestamp, abortControllerRef.current.signal);
});
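Cancellation only works end to end if each tool honors the signal it receives. A sketch of that pattern for a subprocess-backed tool (the command and result shape are placeholders, not the real shell tool):

import { spawn } from 'node:child_process';

// Hypothetical long-running tool body that cooperates with cancellation.
async function execute(
  request: ToolCallRequest,
  signal: AbortSignal,
): Promise<ToolResult> {
  if (signal.aborted) throw new Error('Cancelled before start');

  return new Promise<ToolResult>((resolve, reject) => {
    const child = spawn('long-running-command', { shell: true });  // placeholder command
    const onAbort = () => {
      child.kill('SIGTERM');                    // tear down the subprocess
      reject(new Error('Cancelled by user'));
    };
    signal.addEventListener('abort', onAbort, { once: true });

    let output = '';
    child.stdout.on('data', (chunk) => (output += chunk));
    child.on('close', () => {
      signal.removeEventListener('abort', onAbort);
      resolve({ content: output } as ToolResult);
    });
  });
}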

Multi-turn Conversation Flow

Conversation History Management

History Storage:

export class GeminiChat {
  private history: Content[] = [];
  
  getHistory(curated: boolean = false): Content[] {
    const history = curated 
      ? extractCuratedHistory(this.history)  // Remove intermediate tool calls
      : this.history;                        // Full conversation
    return structuredClone(history);         // Deep copy for safety
  }
}
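extractCuratedHistory is not shown here. One plausible reading, sketched below, is that it drops any turn containing function-call plumbing so only user and model text survive:

// Hypothetical: keep only plain text turns, dropping the functionCall /
// functionResponse parts that matter to the API but not to a curated view.
function extractCuratedHistory(history: Content[]): Content[] {
  return history.filter((content) =>
    (content.parts ?? []).every(
      (part) => !('functionCall' in part) && !('functionResponse' in part),
    ),
  );
}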

Context Compression:

async tryCompressChat(force: boolean = false): Promise<ChatCompressionInfo | null> {
  // Automatically triggered when approaching token limits
  // Summarizes conversation history while preserving important context
  const summaryPrompt = `Please provide a concise summary of this conversation...`;
  
  // Replace history with compressed version
  this.history = [summaryContent, ...recentHistory];
}
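The trigger is token-based. A rough, self-contained sketch of the threshold check, using a characters/4 token estimate and an illustrative 80% cutoff (neither number is the CLI's real threshold):

// Sketch: estimate tokens and compress when the estimate crosses a
// fraction of an assumed context window. All numbers are illustrative.
function shouldCompress(history: Content[], contextWindowTokens = 1_000_000): boolean {
  const chars = history
    .flatMap((c) => c.parts ?? [])
    .map((p) => ('text' in p && p.text ? p.text.length : 0))
    .reduce((a, b) => a + b, 0);
  const approxTokens = chars / 4;   // crude chars-per-token heuristic
  return approxTokens > 0.8 * contextWindowTokens;
}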

Request Building Pipeline

Each turn builds the complete request context:

private buildRequestContents(userContents: Content[]): Content[] {
  const systemPrompt = this.getSystemPrompt();
  const conversationHistory = this.getHistory(false);
  
  return [
    systemPrompt,      // Agent instructions and context
    ...conversationHistory,  // Previous conversation
    ...userContents    // Current user message
  ];
}

Auto-continuation Logic

The agent automatically continues when more tool calls are requested:

// In Turn.run() - automatic continuation logic
let hasMoreToolCalls = this.pendingToolCalls.length > 0;
while (hasMoreToolCalls) {
  // Execute pending tools
  const toolResults = await this.executeToolBatch(signal);
  
  // Send results back to Gemini for continuation
  const continuationResponse = await this.chat.sendMessage(toolResults);
  
  // Check if more tools are requested
  hasMoreToolCalls = this.extractToolCalls(continuationResponse).length > 0;
}

Tool Execution Architecture

Tool Discovery and Registration

Tool Registry: packages/core/src/tools/tool-registry.ts

export class ToolRegistry {
  private tools: Map<string, Tool> = new Map();
  
  async discoverTools(): Promise<void> {
    // 1. Register built-in tools
    this.registerBuiltinTools();
    
    // 2. Discover external tools via commands
    await this.discoverToolsViaCommands();
    
    // 3. Initialize MCP (Model Context Protocol) servers
    await this.initializeMcpServers();
    
    // 4. Register extension tools
    await this.registerExtensionTools();
  }
}

Tool Interface and Implementation

All tools implement a common interface:

export interface Tool {
  name: string;
  description: string | ToolDescription;
  
  execute(
    request: ToolCallRequest,
    signal: AbortSignal,
    onStreamOutput?: (chunk: string) => void,
  ): Promise<ToolResult>;
}
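Given that interface, a custom tool is just an object with a name, a description, and an execute method. A hypothetical example (the request's args field and the registry's registration method are assumptions):

// Hypothetical tool conforming to the interface above: echoes its input.
const echoTool: Tool = {
  name: 'echo',
  description: 'Returns the provided text unchanged.',
  async execute(request, signal, onStreamOutput) {
    // Assumes the request carries the model-supplied call arguments.
    const args = (request as unknown as { args?: { text?: string } }).args;
    const text = args?.text ?? '';
    onStreamOutput?.(text);          // live-stream output to the UI
    return { content: text } as ToolResult;
  },
};

// Registration is then one call on the registry (method name assumed):
// toolRegistry.registerTool(echoTool);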

Built-in Tools

Core tools are implemented in packages/core/src/tools/:

  • edit.ts - File editing with LLM-based error correction
  • shell.ts - Shell command execution with safety checks
  • web-fetch.ts - Web content fetching with fallback handling
  • web-search.ts - Google Search integration
  • memoryTool.ts - Long-term memory management

Tool Execution Pipeline

Complete Tool Lifecycle:

  1. Validation Phase:

    // Parameter validation and sanitization
    const validationResult = await validateToolParameters(request);
  2. Approval Phase:

    // Check if user confirmation required
    if (tool.requiresConfirmation(request)) {
      await waitForUserApproval(request);
    }
  3. Execution Phase:

    // Execute with live output streaming
    const result = await tool.execute(request, signal, (chunk) => {
      // Stream output to UI in real-time
      onStreamOutput(chunk);
    });
  4. Result Processing:

    // Convert tool result to Gemini function response format
    const functionResponse = convertToFunctionResponse(
      toolName, callId, result.content
    );
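convertToFunctionResponse packages the tool result as a functionResponse part for the next model request. A sketch of the likely shape, following the public Gemini content format:

// Sketch: wrap a tool result in the functionResponse part shape the
// Gemini API expects when tool results are sent back to the model.
function convertToFunctionResponse(toolName: string, callId: string, content: string) {
  return {
    functionResponse: {
      id: callId,            // ties the result back to the originating call
      name: toolName,
      response: { output: content },
    },
  };
}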

Error Handling and Recovery

Multi-level Error Handling

1. API Level Error Handling:

// In GeminiChat - comprehensive retry logic
const response = await retryWithBackoff(apiCall, {
  shouldRetry: (error: Error) => {
    if (error.message.includes('429')) return true;  // Rate limiting
    if (error.message.match(/5\d{2}/)) return true;  // Server errors
    return false;
  },
  onPersistent429: async (authType?: string) => 
    await this.handleFlashFallback(authType),  // Auto-fallback to Flash model
});
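retryWithBackoff itself is a generic helper. A condensed sketch of the pattern the call sites above imply (attempt counts and delays are illustrative):

// Sketch of exponential backoff with jitter; options mirror the call sites above.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  opts: {
    shouldRetry: (error: Error) => boolean;
    onPersistent429?: (authType?: string) => Promise<string | null>;
    maxAttempts?: number;
  },
): Promise<T> {
  const maxAttempts = opts.maxAttempts ?? 5;
  let delayMs = 1000;
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const err = error as Error;
      if (attempt >= maxAttempts || !opts.shouldRetry(err)) {
        // Persistent rate limiting gets one last chance via model fallback.
        if (err.message.includes('429') && opts.onPersistent429) {
          const fallback = await opts.onPersistent429();
          if (fallback) { attempt = 0; continue; }  // retry once on the fallback model
        }
        throw err;
      }
      await new Promise((r) => setTimeout(r, delayMs + Math.random() * 250));
      delayMs *= 2;    // exponential growth between attempts
    }
  }
}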

2. Tool Level Error Handling:

// Each tool execution wrapped in comprehensive error handling
.catch((executionError: Error) => {
  // Create structured error response
  const errorResponse = {
    success: false,
    error: {
      message: executionError.message,
      type: executionError.constructor.name,
      context: this.captureErrorContext(executionError)
    }
  };
  
  this.setStatusInternal(callId, 'error', errorResponse);
});

3. Edit Tool Error Correction:

The edit tool has sophisticated LLM-based error correction:

// packages/core/src/utils/editCorrector.ts
export class EditCorrector {
  // Automatically corrects string matching issues
  async correctOldStringMismatch(
    problematicSnippet: string,
    fileContent: string,
  ): Promise<string> {
    // Uses LLM to fix escaping and formatting issues
    const correctionPrompt = `Context: A process needs to find an exact literal match...`;
    return await this.llmCorrection(correctionPrompt);
  }
}

Flash Fallback System

Automatic model degradation for OAuth users:

private async handleFlashFallback(authType?: string): Promise<string | null> {
  // Only for OAuth users experiencing persistent 429 errors
  if (authType !== AuthType.LOGIN_WITH_GOOGLE_PERSONAL) return null;
  
  const currentModel = this.config.getModel();
  const fallbackModel = DEFAULT_GEMINI_FLASH_MODEL;
  
  if (currentModel === fallbackModel) return null;  // Already using Flash
  
  // Get user confirmation for model switch
  const accepted = await this.config.flashFallbackHandler?.(currentModel, fallbackModel);
  if (accepted) {
    this.config.setModel(fallbackModel);
    return fallbackModel;
  }
  
  return null;
}

State Management and Persistence

Configuration Hierarchy

Settings cascade through multiple levels:

// Hierarchical configuration loading
const settings = Settings.load([
  userConfigPath,     // ~/.gemini/settings.json
  projectConfigPath,  // ./GEMINI.md or .gemini/settings.json
  sessionOverrides    // Runtime configuration
]);
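A sketch of how such a cascade can be merged, with later sources overriding earlier ones (illustrative; the real loader does more validation than this):

import * as fs from 'node:fs';

// Sketch: shallow-merge settings files in precedence order, skipping
// paths that do not exist; later entries win.
function loadCascadedSettings(paths: string[]): Record<string, unknown> {
  return paths.reduce<Record<string, unknown>>((merged, p) => {
    if (!fs.existsSync(p)) return merged;
    const layer = JSON.parse(fs.readFileSync(p, 'utf8'));
    return { ...merged, ...layer };
  }, {});
}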

Memory and Context Management

Hierarchical Memory Loading:

const { memoryContent, fileCount } = await loadHierarchicalGeminiMemory(
  process.cwd(),
  config.getDebugMode(),
  config.getFileService(),
  config.getExtensionContextFilePaths(),
);
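This implies a walk from the working directory up to the filesystem root, collecting GEMINI.md files along the way. A hedged sketch of that walk (the ordering, most general context first, is an assumption):

import * as fs from 'node:fs';
import * as path from 'node:path';

// Sketch: gather GEMINI.md content from cwd up to the root, highest
// directory first so project-local files can refine broader context.
function loadHierarchicalMemory(startDir: string): { memoryContent: string; fileCount: number } {
  const files: string[] = [];
  for (let dir = path.resolve(startDir); ; dir = path.dirname(dir)) {
    const candidate = path.join(dir, 'GEMINI.md');
    if (fs.existsSync(candidate)) files.unshift(candidate);
    if (dir === path.dirname(dir)) break;   // reached the filesystem root
  }
  const memoryContent = files.map((f) => fs.readFileSync(f, 'utf8')).join('\n\n');
  return { memoryContent, fileCount: files.length };
}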

Project-specific Context:

  • GEMINI.md files provide project context
  • Git integration for change tracking
  • File snapshots for restoration capabilities

Session State Components

  1. Chat History: Maintained with automatic compression
  2. Tool Registry: Dynamic tool discovery and registration
  3. User Memory: Persistent across sessions via GEMINI.md
  4. Authentication State: OAuth tokens and API keys
  5. Model Configuration: Current model and fallback settings

UI Architecture (Interactive Mode)

React/Ink Integration

Main UI Component: packages/cli/src/ui/App.tsx

export function App({ settings }: { settings: Settings }) {
  const [historyItems, setHistoryItems] = useState<HistoryItem[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);
  
  // Core streaming hook manages all agent interaction
  const {
    handleSubmit,
    handleInterrupt,
    toolCalls,
    scheduleToolCalls,
  } = useGeminiStream({
    onStreamingChange: setIsStreaming,
    onHistoryUpdate: (items) => setHistoryItems(prev => [...prev, ...items]),
  });
  
  return (
    <Box flexDirection="column">
      <ChatHistory items={historyItems} />
      <ToolCallsDisplay toolCalls={toolCalls} />
      <InputPrompt onSubmit={handleSubmit} disabled={isStreaming} />
    </Box>
  );
}

Real-time Tool Execution Display

Tool Status Management:

const useReactToolScheduler = (onCompletedBatch) => {
  const [toolCalls, setToolCalls] = useState<ReactToolCall[]>([]);
  
  // Real-time updates as tools execute
  const updateToolStatus = useCallback((id: string, status: ToolCallStatus) => {
    setToolCalls(prev => prev.map(tool => 
      tool.id === id ? { ...tool, status } : tool
    ));
  }, []);
  
  return { toolCalls, scheduleToolCalls, markToolsAsSubmitted };
};

Message Streaming and Splitting

Performance Optimization:

const handleContentEvent = useCallback((eventValue, currentBuffer, timestamp) => {
  // Split large messages at safe points for better rendering performance
  const newBuffer = currentBuffer + eventValue;
  const splitPoint = findLastSafeSplitPoint(newBuffer);
  
  if (splitPoint === newBuffer.length) {
    // Update existing pending message
    setPendingHistoryItem({ type: 'gemini', text: newBuffer });
  } else {
    // Split for performance: static + streaming
    const staticPart = newBuffer.slice(0, splitPoint);
    const streamingPart = newBuffer.slice(splitPoint);
    
    addItem({ type: 'gemini', text: staticPart }, timestamp);
    setPendingHistoryItem({ type: 'gemini_content', text: streamingPart });
  }
});
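findLastSafeSplitPoint is not shown. A plausible sketch, assuming it returns the position after the last paragraph break outside a fenced code block, and buffer.length when no safe split exists:

// Sketch: the last "\n\n" outside a ``` fence is a safe place to freeze
// text as a static history item; buffer.length means "no safe split yet".
function findLastSafeSplitPoint(buffer: string): number {
  let inFence = false;
  let lastSafe = buffer.length;
  for (let i = 0; i < buffer.length; i++) {
    if (buffer.startsWith('```', i)) {
      inFence = !inFence;
      i += 2;                        // skip the rest of the fence marker
      continue;
    }
    if (!inFence && buffer.startsWith('\n\n', i)) lastSafe = i + 2;
  }
  return lastSafe;
}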

Key Architectural Patterns

1. Event-Driven Architecture

  • Streaming Events: All communication via async generators
  • Reactive UI: Components respond to event stream changes
  • Loose Coupling: Clean separation between core logic and presentation

2. State Machine Pattern

  • Tool States: Well-defined state transitions for tool execution
  • Conversation States: Clear streaming states (idle, processing, waiting)
  • Error States: Structured error handling with recovery paths

3. Plugin Architecture

  • Tool Interface: Common interface for all tool implementations
  • MCP Integration: External tool servers via Model Context Protocol
  • Extension System: Additional functionality via configuration

4. Async Generator Streaming

  • Backpressure Handling: Natural flow control via generators
  • Cancellation Support: AbortController throughout pipeline
  • Memory Efficiency: Process data as it arrives, not in batches

5. Configuration Hierarchy

  • Cascading Settings: User → Project → Session configuration
  • Hot Reloading: Runtime configuration updates
  • Environment Adaptation: Auth fallbacks based on environment

6. Robust Error Recovery

  • Automatic Retries: Exponential backoff for transient errors
  • Model Fallbacks: Automatic degradation to faster models
  • LLM-based Correction: Edit tools use LLM to fix parameter issues

Performance Considerations

Memory Management

  • Streaming Processing: Process responses as they arrive
  • History Compression: Automatic summarization when approaching limits
  • Tool Result Caching: Avoid re-execution of expensive operations

Concurrency Optimization

  • Parallel Tool Execution: Multiple tools run simultaneously when safe
  • Non-blocking UI: Streaming keeps interface responsive
  • Background Processing: Long-running tools don't block interaction

Network Efficiency

  • Request Batching: Multiple tool results sent in single request
  • Connection Reuse: Persistent connections to Gemini API
  • Retry Logic: Intelligent backoff prevents excessive API calls

This architecture provides a robust, scalable foundation for an AI agent with sophisticated tool execution, streaming responses, and rich user interaction capabilities.
