Add architecture diagrams explaining system components and WebSocket flow (#12542)

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Saurya <saurya@openhands.dev>
Co-authored-by: Ray Myers <ray.myers@gmail.com>
2026-03-22 05:37:20 +08:00 · 2026-03-17 08:52:40 -07:00
parent d58e12ad74
commit b68c75252d
9 changed files with 510 additions and 2 deletions
--- a/enterprise/doc/architecture/README.md
+++ b/enterprise/doc/architecture/README.md
@@ -0,0 +1,13 @@
 # Enterprise Architecture Documentation
 Architecture diagrams specific to the OpenHands SaaS/Enterprise deployment.
 ## Documentation
 - [Authentication Flow](./authentication.md) - Keycloak-based authentication for SaaS deployment
 - [External Integrations](./external-integrations.md) - GitHub, Slack, Jira, and other service integrations
 ## Related Documentation
 For core OpenHands architecture (applicable to all deployments), see:
 - [Core Architecture Documentation](../../../openhands/architecture/README.md)
--- a/enterprise/doc/architecture/authentication.md
+++ b/enterprise/doc/architecture/authentication.md
@@ -0,0 +1,58 @@
 # Authentication Flow (SaaS Deployment)
 OpenHands uses Keycloak for identity management in the SaaS deployment. The authentication flow involves multiple services:
 ```mermaid
 sequenceDiagram
    autonumber
    participant User as User (Browser)
    participant App as App Server
    participant KC as Keycloak
    participant IdP as Identity Provider<br/>(GitHub, Google, etc.)
    participant DB as User Database
    Note over User,DB: OAuth 2.0 / OIDC Authentication Flow
    User->>App: Access OpenHands
    App->>User: Redirect to Keycloak
    User->>KC: Login request
    KC->>User: Show login options
    User->>KC: Select provider (e.g., GitHub)
    KC->>IdP: OAuth redirect
    User->>IdP: Authenticate
    IdP-->>KC: OAuth callback + tokens
    Note over KC: Create/update user session
    KC-->>User: Redirect with auth code
    User->>App: Auth code
    App->>KC: Exchange code for tokens
    KC-->>App: Access token + Refresh token
    Note over App: Create signed JWT cookie
    App->>DB: Store/update user record
    App-->>User: Set keycloak_auth cookie
    Note over User,DB: Subsequent Requests
    User->>App: Request with cookie
    Note over App: Verify JWT signature
    App->>KC: Validate token (if needed)
    KC-->>App: Token valid
    Note over App: Extract user context
    App-->>User: Authorized response
 ```
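The "Exchange code for tokens" step in the diagram is a standard OAuth 2.0 authorization-code grant. The sketch below only builds the request, it does not send it. The function name and parameters are hypothetical and are not taken from `saas_user_auth.py`; the token endpoint path is the one Keycloak exposes per realm (older Keycloak versions add an `/auth` prefix, which would go into `base_url`).

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url, realm, client_id, client_secret, code, redirect_uri):
    """Build (but do not send) the authorization-code exchange request.

    All parameter names are illustrative assumptions for this sketch.
    """
    token_url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    # The grant is sent as a form-encoded POST body.
    form = urlencode(
        {
            "grant_type": "authorization_code",
            "code": code,
            "redirect_uri": redirect_uri,
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the exchange; on success the JSON response carries the access and refresh tokens shown in the diagram.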
 ### Authentication Components
 | Component | Purpose | Location |
 |-----------|---------|----------|
 | **Keycloak** | Identity broker in front of external IdPs, SSO, token management | External service |
 | **UserAuth** | Abstract auth interface | `openhands/server/user_auth/user_auth.py` |
 | **SaasUserAuth** | Keycloak implementation | `enterprise/server/auth/saas_user_auth.py` |
 | **JWT Service** | Token signing/verification | `openhands/app_server/services/jwt_service.py` |
 | **Auth Routes** | Login/logout endpoints | `enterprise/server/routes/auth.py` |
 ### Token Flow
 1. **Keycloak Access Token**: Short-lived token for API access
 2. **Keycloak Refresh Token**: Long-lived token to obtain new access tokens
 3. **Signed JWT Cookie**: App Server's session cookie (`keycloak_auth`) containing encrypted Keycloak tokens
 4. **Provider Tokens**: OAuth tokens for GitHub, GitLab, etc. (stored separately for git operations)
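To make "Verify JWT signature" concrete, here is a minimal HS256 sign/verify sketch using only the standard library. It is an illustration, not the code in `jwt_service.py`: the function names are hypothetical, the algorithm choice is an assumption, and the encryption of the embedded Keycloak tokens is not shown.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use unpadded URL-safe base64.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(payload: dict, secret: bytes) -> str:
    """Return header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_jwt(token: str, secret: bytes) -> dict:
    """Check the signature and expiry, then return the payload."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("utf-8"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("exp", float("inf")) < time.time():
        raise ValueError("token expired")
    return payload
```

Because the signature check needs only the shared secret, the App Server can authorize most requests without a round trip to Keycloak, which matches the "Validate token (if needed)" step in the diagram.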
--- a/enterprise/doc/architecture/external-integrations.md
+++ b/enterprise/doc/architecture/external-integrations.md
@@ -0,0 +1,88 @@
 # External Integrations
 OpenHands integrates with external services (GitHub, Slack, Jira, etc.) through webhook-based event handling:
 ```mermaid
 sequenceDiagram
    autonumber
    participant Ext as External Service<br/>(GitHub/Slack/Jira)
    participant App as App Server
    participant IntRouter as Integration Router
    participant Manager as Integration Manager
    participant Conv as Conversation Service
    participant Sandbox as Sandbox
    Note over Ext,Sandbox: Webhook Event Flow (e.g., GitHub Issue Created)
    Ext->>App: POST /api/integration/{service}/events
    App->>IntRouter: Route to service handler
    Note over IntRouter: Verify signature (HMAC)
    IntRouter->>Manager: Parse event payload
    Note over Manager: Extract context (repo, issue, user)
    Note over Manager: Map external user → OpenHands user
    Manager->>Conv: Create conversation (with issue context)
    Conv->>Sandbox: Provision sandbox
    Sandbox-->>Conv: Ready
    Manager->>Sandbox: Start agent with task
    Note over Ext,Sandbox: Agent Works on Task...
    Sandbox-->>Manager: Task complete
    Manager->>Ext: POST result<br/>(PR, comment, etc.)
    Note over Ext,Sandbox: Callback Flow (Agent → External Service)
    Sandbox->>App: Webhook callback<br/>/api/v1/webhooks
    App->>Manager: Process callback
    Manager->>Ext: Update external service
 ```
 ### Supported Integrations
 | Integration | Trigger Events | Agent Actions |
 |-------------|----------------|---------------|
 | **GitHub** | Issue created, PR opened, @mention | Create PR, comment, push commits |
 | **GitLab** | Issue created, MR opened | Create MR, comment, push commits |
 | **Slack** | @mention in channel | Reply in thread, create tasks |
 | **Jira** | Issue created/updated | Update ticket, add comments |
 | **Linear** | Issue created | Update status, add comments |
 ### Integration Components
 | Component | Purpose | Location |
 |-----------|---------|----------|
 | **Integration Routes** | Webhook endpoints per service | `enterprise/server/routes/integration/` |
 | **Integration Managers** | Business logic per service | `enterprise/integrations/{service}/` |
 | **Token Manager** | Store/retrieve OAuth tokens | `enterprise/server/auth/token_manager.py` |
 | **Callback Processor** | Handle agent → service updates | `enterprise/integrations/{service}/*_callback_processor.py` |
 ### Integration Authentication
 ```
 External Service (e.g., GitHub)
        │
        ▼
 ┌─────────────────────────────────┐
 │ GitHub App Installation         │
 │ - Webhook secret for signature  │
 │ - App private key for API calls │
 └─────────────────────────────────┘
        │
        ▼
 ┌─────────────────────────────────┐
 │ User Account Linking            │
 │ - Keycloak user ID              │
 │ - GitHub user ID                │
 │ - Stored OAuth tokens           │
 └─────────────────────────────────┘
        │
        ▼
 ┌─────────────────────────────────┐
 │ Agent Execution                 │
 │ - Uses linked tokens for API    │
 │ - Can push, create PRs, comment │
 └─────────────────────────────────┘
 ```
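The webhook secret in the first box is what makes the "Verify signature (HMAC)" step possible. As a sketch, GitHub sends an `X-Hub-Signature-256` header of the form `sha256=<hex digest>`, computed over the raw request body. The function name below is hypothetical, and other services (Slack, Jira, and so on) use different header names and signing schemes.

```python
import hashlib
import hmac


def verify_github_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Return True if the header matches HMAC-SHA256(secret, raw body)."""
    if not signature_header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Compare as bytes, in constant time, so timing does not reveal the digest.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))
```

The check must run on the raw bytes as received; re-serializing a parsed JSON payload can change whitespace or key order and invalidate the digest.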
--- a/openhands/README.md
+++ b/openhands/README.md
@@ -1,8 +1,14 @@
 # OpenHands Architecture
 This directory contains the core components of OpenHands.
-For an overview of the system architecture, see the [architecture documentation](https://docs.openhands.dev/usage/architecture/backend) (v0 backend architecture).
+## Documentation
 - **[Architecture Documentation](./architecture/README.md)** with diagrams covering:
  - System Architecture Overview
  - Conversation Startup & WebSocket Flow
  - Agent Execution & LLM Flow
 - **[External Architecture Docs](https://docs.openhands.dev/usage/architecture/backend)** - Official documentation (v0 backend architecture)
 ## Classes
--- a/openhands/architecture/README.md
+++ b/openhands/architecture/README.md
@@ -0,0 +1,10 @@
 # OpenHands Architecture
 Architecture diagrams and explanations for the OpenHands system.
 ## Documentation Sections
 - [System Architecture Overview](./system-architecture.md) - Multi-tier architecture and component responsibilities
 - [Conversation Startup & WebSocket Flow](./conversation-startup.md) - Runtime provisioning and real-time communication
 - [Agent Execution & LLM Flow](./agent-execution.md) - LLM integration and action execution loop
 - [Observability](./observability.md) - Logging, metrics, and monitoring
--- a/openhands/architecture/agent-execution.md
+++ b/openhands/architecture/agent-execution.md
@@ -0,0 +1,92 @@
 # Agent Execution & LLM Flow
 When the agent executes inside the sandbox, it makes LLM calls through LiteLLM:
 ```mermaid
 sequenceDiagram
    autonumber
    participant User as User (Browser)
    participant AS as Agent Server
    participant Agent as Agent<br/>(CodeAct)
    participant LLM as LLM Class
    participant Lite as LiteLLM
    participant Proxy as LLM Proxy<br/>(llm-proxy.app.all-hands.dev)
    participant Provider as LLM Provider<br/>(OpenAI, Anthropic, etc.)
    participant AES as Action Execution Server
    Note over User,AES: Agent Loop - LLM Call Flow
    User->>AS: WebSocket: User message
    AS->>Agent: Process message
    Note over Agent: Build prompt from state
    Agent->>LLM: completion(messages, tools)
    Note over LLM: Apply config (model, temp, etc.)
    alt Using OpenHands Provider
        LLM->>Lite: litellm_proxy/{model}
        Lite->>Proxy: POST /chat/completions
        Note over Proxy: Auth, rate limit, routing
        Proxy->>Provider: Forward request
        Provider-->>Proxy: Response
        Proxy-->>Lite: Response
    else Using Direct Provider
        LLM->>Lite: {provider}/{model}
        Lite->>Provider: Direct API call
        Provider-->>Lite: Response
    end
    Lite-->>LLM: ModelResponse
    Note over LLM: Track metrics (cost, tokens)
    LLM-->>Agent: Parsed response
    Note over Agent: Parse action from response
    AS->>User: WebSocket: Action event
    Note over User,AES: Action Execution
    AS->>AES: HTTP: Execute action
    Note over AES: Run command/edit file
    AES-->>AS: Observation
    AS->>User: WebSocket: Observation event
    Note over Agent: Update state
    Note over Agent: Loop continues...
 ```
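The loop in the diagram can be sketched as a small function. Everything here is a hypothetical stand-in: `complete` plays the role of the LLM call plus response parsing, `execute` plays the role of the HTTP call to the Action Execution Server, and the `"finish"` sentinel is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    command: str  # "finish" ends the loop in this sketch


@dataclass
class State:
    # (kind, content) tuples: user messages, actions, observations
    history: list = field(default_factory=list)


def run_agent_loop(
    user_message: str,
    complete: Callable[[list], Action],
    execute: Callable[[Action], str],
    max_steps: int = 10,
) -> State:
    state = State(history=[("user", user_message)])
    for _ in range(max_steps):
        # Build the prompt from state and ask the model for the next action.
        action = complete(state.history)
        state.history.append(("action", action.command))
        if action.command == "finish":
            break
        # Run the action in the sandbox and record what came back.
        observation = execute(action)
        state.history.append(("observation", observation))
    return state
```

The `max_steps` bound is a design choice for the sketch: it keeps a misbehaving model from looping forever.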
 ### LLM Components
 | Component | Purpose | Location |
 |-----------|---------|----------|
 | **LLM Class** | Wrapper with retries, metrics, config | `openhands/llm/llm.py` |
 | **LiteLLM** | Universal LLM API adapter | External library |
 | **LLM Proxy** | OpenHands managed proxy for billing/routing | `llm-proxy.app.all-hands.dev` |
 | **LLM Registry** | Manages multiple LLM instances | `openhands/llm/llm_registry.py` |
 ### Model Routing
 ```
 User selects model
        │
        ▼
 ┌───────────────────┐
 │ Model prefix?     │
 └───────────────────┘
        │
        ├── openhands/claude-3-5  ──► Rewrite to litellm_proxy/claude-3-5
        │                              Base URL: llm-proxy.app.all-hands.dev
        │
        ├── anthropic/claude-3-5  ──► Direct to Anthropic API
        │                              (User's API key)
        │
        ├── openai/gpt-4          ──► Direct to OpenAI API
        │                              (User's API key)
        │
        └── azure/gpt-4           ──► Direct to Azure OpenAI
                                       (User's API key + endpoint)
 ```
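The prefix rule above amounts to a single string rewrite. This is a minimal sketch, not the logic in `openhands/llm/llm.py`: the function name is hypothetical, and the `https://` scheme on the proxy URL is an assumption (the diagram gives only the host).

```python
OPENHANDS_PREFIX = "openhands/"
# Host taken from the diagram; the scheme is assumed.
PROXY_BASE_URL = "https://llm-proxy.app.all-hands.dev"


def route_model(model: str, user_base_url=None):
    """Return (model name to hand to LiteLLM, base URL or None)."""
    if model.startswith(OPENHANDS_PREFIX):
        # Managed path: rewrite the prefix and point at the proxy.
        return "litellm_proxy/" + model[len(OPENHANDS_PREFIX):], PROXY_BASE_URL
    # Direct path: LiteLLM picks the provider from the existing prefix;
    # a base URL is needed only for custom endpoints such as Azure.
    return model, user_base_url
```

For example, `route_model("openhands/claude-3-5")` yields the `litellm_proxy/claude-3-5` name with the proxy URL, while `route_model("anthropic/claude-3-5")` passes through unchanged.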
 ### LLM Proxy
 When using `openhands/` prefixed models, requests are routed through a managed proxy.
 See the [OpenHands documentation](https://docs.openhands.dev/) for details on supported models.
--- a/openhands/architecture/conversation-startup.md
+++ b/openhands/architecture/conversation-startup.md
@@ -0,0 +1,68 @@
 # Conversation Startup & WebSocket Flow
 When a user starts a conversation, this sequence occurs:
 ```mermaid
 sequenceDiagram
    autonumber
    participant User as User (Browser)
    participant App as App Server
    participant SS as Sandbox Service
    participant RAPI as Runtime API
    participant Pool as Warm Pool
    participant Sandbox as Sandbox (Container)
    participant AS as Agent Server
    participant AES as Action Execution Server
    Note over User,AES: Phase 1: Conversation Creation
    User->>App: POST /api/conversations
    Note over App: Authenticate user
    App->>SS: Create sandbox
    Note over SS,Pool: Phase 2: Runtime Provisioning
    SS->>RAPI: POST /start (image, env, config)
    RAPI->>Pool: Check for warm runtime
    alt Warm runtime available
        Pool-->>RAPI: Return warm runtime
        Note over RAPI: Assign to session
    else No warm runtime
        RAPI->>Sandbox: Create new container
        Sandbox->>AS: Start Agent Server
        Sandbox->>AES: Start Action Execution Server
        AES-->>AS: Ready
    end
    RAPI-->>SS: Runtime URL + session API key
    SS-->>App: Sandbox info
    App-->>User: Conversation ID + Sandbox URL
    Note over User,AES: Phase 3: Direct WebSocket Connection
    User->>AS: WebSocket: /sockets/events/{id}
    AS-->>User: Connection accepted
    AS->>User: Replay historical events
    Note over User,AES: Phase 4: User Sends Message
    User->>AS: WebSocket: SendMessageRequest
    Note over AS: Agent processes message
    Note over AS: LLM call → generate action
    Note over User,AES: Phase 5: Action Execution Loop
    loop Agent Loop
        AS->>AES: HTTP: Execute action
        Note over AES: Run in sandbox
        AES-->>AS: Observation result
        AS->>User: WebSocket: Event update
        Note over AS: Update state, next action
    end
    Note over User,AES: Phase 6: Task Complete
    AS->>User: WebSocket: AgentStateChanged (FINISHED)
 ```
 ### Key Points
 1. **Initial Setup via App Server**: The App Server handles authentication and coordinates with the Sandbox Service
 2. **Runtime API Provisioning**: The Sandbox Service calls the Runtime API, which checks for warm runtimes before creating new containers
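 3. **Warm Pool Optimization**: Pre-warmed runtimes reduce startup latency because the Runtime API can assign an already-running runtime instead of creating a container and starting its servers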
 4. **Direct WebSocket to Sandbox**: Once created, the user's browser connects **directly** to the Agent Server inside the sandbox
 5. **App Server Not in Hot Path**: After connection, all real-time communication bypasses the App Server entirely
 6. **Agent Server Orchestrates**: The Agent Server manages the AI loop, calling the Action Execution Server for actual command execution
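The warm-pool branch in Phase 2 can be sketched as "take an idle runtime if one exists, otherwise create one". The class and field names below are hypothetical and do not describe the Runtime API's actual implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class RuntimeHandle:
    runtime_id: str
    url: str
    session_api_key: str
    was_warm: bool  # True if the runtime was started before it was requested


class WarmPool:
    def __init__(self, create_runtime: Callable[[bool], RuntimeHandle]):
        self._create = create_runtime
        self._idle = deque()

    def prewarm(self, count: int) -> None:
        """Start runtimes ahead of demand and park them in the pool."""
        for _ in range(count):
            self._idle.append(self._create(True))

    def acquire(self) -> RuntimeHandle:
        """Hand out a warm runtime if available, else pay the cold-start cost."""
        if self._idle:
            return self._idle.popleft()
        return self._create(False)
```

A real pool would also refill itself after each `acquire` and discard runtimes that have gone stale; both are omitted here to keep the branch itself visible.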
--- a/openhands/architecture/observability.md
+++ b/openhands/architecture/observability.md
@@ -0,0 +1,85 @@
 # Observability
 OpenHands provides structured logging and metrics collection for monitoring and debugging.
 > **SDK Documentation**: For detailed guidance on observability and metrics in agent development, see:
 > - [SDK Observability Guide](https://docs.openhands.dev/sdk/guides/observability)
 > - [SDK Metrics Guide](https://docs.openhands.dev/sdk/guides/metrics)
 ```mermaid
 flowchart LR
    subgraph Sources["Sources"]
        Agent["Agent Server"]
        App["App Server"]
        Frontend["Frontend"]
    end
    subgraph Collection["Collection"]
        JSONLog["JSON Logs<br/>(stdout)"]
        Metrics["Metrics<br/>(Internal)"]
    end
    subgraph External["External (Optional)"]
        LogAgg["Log Aggregator"]
        Analytics["Analytics Service"]
    end
    Agent --> JSONLog
    App --> JSONLog
    App --> Metrics
    JSONLog --> LogAgg
    Frontend --> Analytics
 ```
 ### Structured Logging
 OpenHands uses Python's standard logging library with structured JSON output support.
 | Component | Format | Destination | Purpose |
 |-----------|--------|-------------|---------|
 | **Application Logs** | JSON (when `LOG_JSON=1`) | stdout | Debugging, error tracking |
 | **Access Logs** | JSON (Uvicorn) | stdout | Request tracing |
 | **LLM Debug Logs** | Plain text | File (optional) | LLM call debugging |
 ### JSON Log Format
 When `LOG_JSON=1` is set, logs are emitted as single-line JSON for ingestion by log aggregators:
 ```json
 {
  "message": "Conversation started",
  "severity": "INFO",
  "conversation_id": "abc-123",
  "user_id": "user-456",
  "timestamp": "2024-01-15T10:30:00Z"
 }
 ```
 Additional context can be added using Python's logger `extra=` parameter (see [Python logging docs](https://docs.python.org/3/library/logging.html)).
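As an illustration of how `extra=` fields end up as top-level JSON keys, here is a minimal formatter built on the standard library. It is a sketch, not the OpenHands logger: `JsonFormatter` and `make_json_logger` are hypothetical names, and the timestamp is rendered with a `+00:00` offset rather than the `Z` suffix in the sample.

```python
import json
import logging
import sys
from datetime import datetime, timezone

# Attribute names present on every LogRecord; anything else arrived via `extra=`.
_STANDARD_ATTRS = set(vars(logging.LogRecord("", 0, "", 0, "", (), None))) | {
    "message",
    "asctime",
}


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "message": record.getMessage(),
            "severity": record.levelname,
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
        }
        # Merge caller-supplied `extra=` fields into the top-level object.
        for key, value in vars(record).items():
            if key not in _STANDARD_ATTRS:
                entry[key] = value
        return json.dumps(entry, default=str)


def make_json_logger(name: str, stream=sys.stdout) -> logging.Logger:
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output limited to this handler
    return logger
```

Calling `make_json_logger("example").info("Conversation started", extra={"conversation_id": "abc-123"})` writes a single JSON line to stdout with `conversation_id` alongside `message` and `severity`.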
 ### Metrics
 | Metric | Tracked By | Storage | Purpose |
 |--------|------------|---------|---------|
 | **LLM Cost** | `Metrics` class | Conversation stats file | Billing, budget limits |
 | **Token Usage** | `Metrics` class | Conversation stats file | Usage analytics |
 | **Response Latency** | `Metrics` class | Conversation stats file | Performance monitoring |
 ### Conversation Stats Persistence
 Per-conversation metrics are persisted for analytics:
 ```python
 # Location: openhands/server/services/conversation_stats.py
 # Field outline (not a full class definition):
 # ConversationStats:
 #   - service_to_metrics: Dict[str, Metrics]
 #   - accumulated_cost: float
 #   - token_usage: TokenUsage
 # Stored at: {file_store}/conversation_stats/{conversation_id}.pkl
 ```
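A save/load round trip for the path layout above might look like the following. This is a simplified sketch under stated assumptions: `StatsSnapshot` is a hypothetical stand-in for `ConversationStats`, a local directory stands in for the file store, and the snapshot is pickled as a plain dict rather than as a class instance.

```python
import pickle
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class StatsSnapshot:
    """Hypothetical, flattened stand-in for the real ConversationStats."""

    accumulated_cost: float = 0.0
    prompt_tokens: int = 0
    completion_tokens: int = 0

    def record_call(self, cost: float, prompt_tokens: int, completion_tokens: int) -> None:
        self.accumulated_cost += cost
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens


def stats_path(file_store_root, conversation_id: str) -> Path:
    return Path(file_store_root) / "conversation_stats" / f"{conversation_id}.pkl"


def save_stats(file_store_root, conversation_id: str, stats: StatsSnapshot) -> None:
    path = stats_path(file_store_root, conversation_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(asdict(stats)))


def load_stats(file_store_root, conversation_id: str) -> StatsSnapshot:
    path = stats_path(file_store_root, conversation_id)
    if not path.exists():
        return StatsSnapshot()  # no stats recorded yet
    # Only unpickle files this service wrote itself; pickle can run arbitrary code.
    return StatsSnapshot(**pickle.loads(path.read_bytes()))
```

Loading before each update and saving afterwards lets the totals survive a sandbox restart.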
 ### Integration with External Services
 Structured JSON logging allows integration with any log aggregation service (e.g., ELK Stack, Loki, Splunk). Configure your log collector to ingest from container stdout/stderr.
--- a/openhands/architecture/system-architecture.md
+++ b/openhands/architecture/system-architecture.md
@@ -0,0 +1,88 @@
 # System Architecture Overview
 OpenHands supports multiple deployment configurations. This document describes the core components and how they interact.
 ## Local/Docker Deployment
 The simplest deployment runs everything locally or in Docker containers:
 ```mermaid
 flowchart TB
    subgraph Server["OpenHands Server"]
        API["REST API<br/>(FastAPI)"]
        ConvMgr["Conversation<br/>Manager"]
        Runtime["Runtime<br/>Manager"]
    end
    subgraph Sandbox["Sandbox (Docker Container)"]
        AES["Action Execution<br/>Server"]
        Browser["Browser<br/>Environment"]
        FS["File System"]
    end
    User["User"] -->|"HTTP/WebSocket"| API
    API --> ConvMgr
    ConvMgr --> Runtime
    Runtime -->|"Provision"| Sandbox
    Server -->|"Execute actions"| AES
    AES --> Browser
    AES --> FS
 ```
 ### Core Components
 | Component | Purpose | Location |
 |-----------|---------|----------|
 | **Server** | REST API, conversation management, runtime orchestration | `openhands/server/` |
 | **Runtime** | Abstract interface for sandbox execution | `openhands/runtime/` |
 | **Action Execution Server** | Execute bash, file ops, browser actions | Inside sandbox |
 | **EventStream** | Central event bus for all communication | `openhands/events/` |
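To illustrate what "central event bus" means here, the sketch below shows an append-only event log with subscribers. It is not the API of `openhands/events/`; the class, method, and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    id: int
    source: str  # e.g. "user", "agent", "environment"
    kind: str  # e.g. "message", "action", "observation"
    content: str


class SimpleEventStream:
    def __init__(self):
        self._events = []
        self._subscribers = {}

    def subscribe(self, name: str, callback: Callable[[Event], None]) -> None:
        self._subscribers[name] = callback

    def add_event(self, source: str, kind: str, content: str) -> Event:
        # Events get sequential ids so consumers can resume from a known point.
        event = Event(len(self._events), source, kind, content)
        self._events.append(event)
        for callback in list(self._subscribers.values()):
            callback(event)
        return event

    def replay(self, start_id: int = 0) -> list:
        """Return stored events from start_id onward, e.g. for a reconnecting client."""
        return self._events[start_id:]
```

Keeping every event in order is what lets a newly connected client catch up on history before receiving live updates.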
 ## Scalable Deployment
 For production deployments, OpenHands can be configured with a separate Runtime API service:
 ```mermaid
 flowchart TB
    subgraph AppServer["App Server"]
        API["REST API"]
        ConvMgr["Conversation<br/>Manager"]
    end
    subgraph RuntimeAPI["Runtime API (Optional)"]
        RuntimeMgr["Runtime<br/>Manager"]
        WarmPool["Warm Pool"]
    end
    subgraph Sandbox["Sandbox"]
        AS["Agent Server"]
        AES["Action Execution<br/>Server"]
    end
    User["User"] -->|"HTTP"| API
    API --> ConvMgr
    ConvMgr -->|"Provision"| RuntimeMgr
    RuntimeMgr --> WarmPool
    RuntimeMgr --> Sandbox
    User -.->|"WebSocket"| AS
    AS -->|"HTTP"| AES
 ```
 This configuration enables:
 - **Warm pool**: Pre-provisioned runtimes for faster startup
 - **Direct WebSocket**: Users connect directly to their sandbox, bypassing the App Server
 - **Horizontal scaling**: App Server and Runtime API can scale independently
 ### Runtime Options
 OpenHands supports multiple runtime implementations:
 | Runtime | Use Case |
 |---------|----------|
 | **DockerRuntime** | Local development, single-machine deployments |
 | **RemoteRuntime** | Connect to externally managed sandboxes |
 | **ModalRuntime** | Serverless execution via Modal |
 See the [Runtime documentation](https://docs.openhands.dev/usage/architecture/runtime) for details.