Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
OpenHands Storage Module
The storage module provides different storage options for file operations in OpenHands, used for storing events, settings and other metadata. This module implements a common interface (FileStore) that allows for interchangeable storage backends.
Usage:
store = ...
# Write, read, list, and delete operations
store.write("example.txt", "Hello, world!")
content = store.read("example.txt")
files = store.list("/")
store.delete("example.txt")
Available Storage Options
1. Local File Storage (local)
Local file storage saves files to the local filesystem.
Environment Variables:
- None specific to this storage option
- Files are stored at the path specified by
file_store_pathin the configuration
2. In-Memory Storage (memory)
In-memory storage keeps files in memory, which is useful for testing or temporary storage.
Environment Variables:
- None
3. Amazon S3 Storage (s3)
S3 storage uses Amazon S3 or compatible services for file storage.
Environment Variables:
- The bucket name is specified by
file_store_pathin the configuration with a fallback to theAWS_S3_BUCKETenvironment variable. AWS_ACCESS_KEY_ID: Your AWS access keyAWS_SECRET_ACCESS_KEY: Your AWS secret keyAWS_S3_ENDPOINT: Optional custom endpoint for S3-compatible services (Allows overriding the default)AWS_S3_SECURE: Whether to use HTTPS (default: "true")
4. Google Cloud Storage (google_cloud)
Google Cloud Storage uses Google Cloud Storage buckets for file storage.
Environment Variables:
- The bucket name is specified by
file_store_pathin the configuration with a fallback to theGOOGLE_CLOUD_BUCKET_NAMEenvironment variable. GOOGLE_APPLICATION_CREDENTIALS: Path to Google Cloud credentials JSON file
Webhook Protocol
The webhook protocol allows for integration with external systems by sending HTTP requests when files are written or deleted.
Overview
The WebHookFileStore wraps another FileStore implementation and sends HTTP requests to a specified URL whenever files are written or deleted. This enables real-time notifications and synchronization with external systems.
Configuration Options:
file_store_web_hook_url: The base URL for webhook requestsfile_store_web_hook_headers: HTTP headers to include in webhook requestsfile_store_web_hook_batch: Whether to use batched webhook requests (default: false)
Protocol Details
Standard Webhook Protocol (Non-Batched)
-
File Write Operation:
- When a file is written, a POST request is sent to
{base_url}{path} - The request body contains the file contents
- The operation is retried up to 3 times with a 1-second delay between attempts
- When a file is written, a POST request is sent to
-
File Delete Operation:
- When a file is deleted, a DELETE request is sent to
{base_url}{path} - The operation is retried up to 3 times with a 1-second delay between attempts
- When a file is deleted, a DELETE request is sent to
Batched Webhook Protocol
The BatchedWebHookFileStore extends the webhook functionality by batching multiple file operations into a single request, which can significantly improve performance when many files are being modified in a short period of time.
-
Batch Request:
- A single POST request is sent to
{base_url}with a JSON array in the body - Each item in the array contains:
method: "POST" for write operations, "DELETE" for delete operationspath: The file pathcontent: The file contents (for write operations only)encoding: "base64" if binary content was base64-encoded (optional)
- A single POST request is sent to
-
Batch Triggering:
- Batches are sent when one of the following conditions is met:
- A timeout period has elapsed (defaults to 5 seconds, configurable via constructor parameter)
- The total size of batched content exceeds a size limit (defaults to 1MB, configurable via constructor parameter)
- The
flush()method is explicitly called
- Batches are sent when one of the following conditions is met:
-
Error Handling:
- The batch request is retried up to 3 times with a 1-second delay between attempts
Configuration
To configure the storage module in OpenHands, use the following configuration options:
[core]
# File store type: "local", "memory", "s3", "google_cloud"
file_store = "local"
# Path for local file store
file_store_path = "/tmp/file_store"
# Optional webhook URL
file_store_web_hook_url = "https://example.com/api/files"
# Optional webhook headers (JSON string)
file_store_web_hook_headers = '{"Authorization": "Bearer token"}'
# Optional batched webhook mode (default: false)
file_store_web_hook_batch = true
Batched Webhook Configuration: The batched webhook behavior uses predefined constants with the following default values:
- Batch timeout: 5 seconds
- Batch size limit: 1MB (1048576 bytes)
These values can be customized by passing batch_timeout_seconds and batch_size_limit_bytes parameters to the BatchedWebHookFileStore constructor.