mirror of
https://github.com/OpenHands/OpenHands.git
synced 2025-12-26 05:48:36 +08:00
docs: Update CodeAct agent documentation (#5418)
Co-authored-by: openhands <openhands@all-hands.dev>
parent 786cde39fd
commit 83b94786a3
@@ -1,28 +1,75 @@
# CodeAct Agent Framework

This folder implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
This folder is an implementation of OpenHands's main agent, the CodeAct Agent. It is based on CodeAct ([paper](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)), the idea of consolidating LLM agents' **act**ions into a unified **code** action space for both *simplicity* and *performance*.

The conceptual idea is illustrated below. At each turn, the agent can:
## Overview

The CodeAct agent operates through a function calling interface. At each turn, the agent can:

1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
2. **CodeAct**: Choose to perform the task by executing code
    - Execute any valid Linux `bash` command
    - Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through a `bash` command; see the plugin system below for more details.
2. **CodeAct**: Execute actions through a set of well-defined tools:
    - Execute Linux `bash` commands with `execute_bash`
    - Run Python code in an [IPython](https://ipython.org/) environment with `execute_ipython_cell`
    - Interact with web browsers using `browser` and `web_read`
    - Edit files using `str_replace_editor` or `edit_file`

![image](https://github.com/OpenHands/OpenHands/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)

## Built-in Tools

The agent provides several built-in tools:

### 1. `execute_bash`
- Execute any valid Linux bash command
- Handles long-running commands by running them in the background with output redirection
- Supports interactive processes with STDIN input and process interruption
- Handles command timeouts with automatic retry in background mode
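
The timeout behaviour described above could look roughly like the sketch below, which uses only Python's standard `subprocess` module. It is an illustration of the idea (run in the foreground, then retry in the background with output redirected to a log file), not the OpenHands implementation; `run_with_fallback` and its return shape are hypothetical.

```python
import subprocess
import tempfile


def run_with_fallback(command: str, timeout: float = 5.0) -> dict:
    """Run `command`; if it exceeds `timeout`, start it again in the background."""
    try:
        done = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
        return {
            "mode": "foreground",
            "exit_code": done.returncode,
            "output": done.stdout,
        }
    except subprocess.TimeoutExpired:
        # Retry in the background, redirecting stdout and stderr to a log file.
        log = tempfile.NamedTemporaryFile(delete=False, suffix=".log")
        proc = subprocess.Popen(
            command, shell=True, stdout=log, stderr=subprocess.STDOUT
        )
        log.close()  # the child process keeps its own handle to the file
        return {"mode": "background", "process": proc, "log": log.name}


result = run_with_fallback("echo hello")
print(result["mode"], result["output"].strip())
```

Returning the `Popen` handle lets the caller poll, wait for, or terminate the background process later.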

### 2. `execute_ipython_cell`
- Run Python code in an IPython environment
- Supports magic commands like `%pip`
- Variables are scoped to the IPython environment
- Requires defining variables and importing packages before use
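
The scoping rules above can be illustrated with a toy session that keeps one persistent namespace across cells. This sketch uses plain `exec` rather than IPython, and `CellRunner` is a hypothetical stand-in, so it does not support magic commands such as `%pip`.

```python
class CellRunner:
    """Toy stand-in for an interpreter session with one persistent namespace."""

    def __init__(self) -> None:
        self.namespace: dict = {}

    def run(self, source: str) -> None:
        # Names defined by one cell stay visible to later cells.
        exec(source, self.namespace)


session = CellRunner()
session.run("import math\nradius = 2")
session.run("area = math.pi * radius ** 2")
print(round(session.namespace["area"], 2))

# A fresh session has no imports or variables: they must be defined first.
fresh = CellRunner()
try:
    fresh.run("print(radius)")
except NameError as err:
    print("NameError:", err)
```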

### 3. `web_read` and `browser`
- `web_read`: Read and convert webpage content to markdown
- `browser`: Interact with webpages through Python code
- Supports common browser actions like navigation, clicking, form filling, scrolling
- Handles file uploads and drag-and-drop operations

### 4. `str_replace_editor`
- View, create and edit files through string replacement
- Persistent state across command calls
- File viewing with line numbers
- String replacement with exact matching
- Undo functionality for edits
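
Exact-match replacement with undo can be sketched in a few lines. The class below works on an in-memory string and is purely illustrative: the method names are hypothetical, and the rule that the old string must match exactly once is an assumption made here to keep edits unambiguous.

```python
class StringReplaceEditor:
    """In-memory sketch: exact-match replacement with an undo stack."""

    def __init__(self, text: str) -> None:
        self.text = text
        self._history: list[str] = []

    def view(self) -> str:
        """Return the text with 1-based line numbers."""
        lines = self.text.splitlines()
        return "\n".join(f"{i:>4}  {line}" for i, line in enumerate(lines, start=1))

    def str_replace(self, old: str, new: str) -> None:
        """Replace `old` with `new`, requiring exactly one exact match."""
        count = self.text.count(old)
        if count != 1:
            raise ValueError(f"expected exactly one match for {old!r}, found {count}")
        self._history.append(self.text)
        self.text = self.text.replace(old, new)

    def undo(self) -> None:
        """Restore the text as it was before the last successful edit."""
        if not self._history:
            raise ValueError("nothing to undo")
        self.text = self._history.pop()


editor = StringReplaceEditor("x = 1\ny = 2\n")
editor.str_replace("y = 2", "y = 3")
print(editor.view())
editor.undo()
print(editor.view())
```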

### 5. `edit_file` (LLM-based)
- Edit files using LLM-based content generation
- Support for partial file edits with line ranges
- Handles large files by editing specific sections
- Append mode for adding content to files
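
Editing only a line range is what keeps large files manageable: just the selected section is handed to the model, and the rest of the file is left untouched. A minimal sketch follows, with a stand-in function in place of a real LLM call; `edit_lines`, `fake_llm`, and the `end=-1` convention are assumptions for illustration.

```python
from typing import Callable


def edit_lines(text: str, start: int, end: int, rewrite: Callable[[str], str]) -> str:
    """Rewrite lines start..end (1-indexed, inclusive); end=-1 means end of file."""
    lines = text.splitlines(keepends=True)
    stop = len(lines) if end == -1 else end
    if not 1 <= start <= stop <= len(lines):
        raise ValueError("line range out of bounds")
    section = "".join(lines[start - 1:stop])
    # Only the selected section is passed to the rewriting function.
    return "".join(lines[:start - 1]) + rewrite(section) + "".join(lines[stop:])


def fake_llm(section: str) -> str:
    """Stand-in for a model call; it simply upper-cases the section."""
    return section.upper()


print(edit_lines("a\nb\nc\n", 2, 2, fake_llm))
```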

## Configuration

Tools can be enabled/disabled through configuration parameters:
- `codeact_enable_browsing`: Enable browser interaction tools
- `codeact_enable_jupyter`: Enable IPython code execution
- `codeact_enable_llm_editor`: Enable LLM-based file editing (falls back to string replacement editor if disabled)
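
One way to picture how these flags shape the tool list is the sketch below. The three flag names come from the list above; the `AgentConfig` dataclass, its default values, the `select_tools` helper, and the assumption that `execute_bash` is always present are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentConfig:
    # Default values are assumptions for this sketch.
    codeact_enable_browsing: bool = True
    codeact_enable_jupyter: bool = True
    codeact_enable_llm_editor: bool = False


def select_tools(config: AgentConfig) -> list[str]:
    """Return the names of the tools enabled by `config`."""
    tools = ["execute_bash"]  # assumed to be always available
    if config.codeact_enable_browsing:
        tools += ["web_read", "browser"]
    if config.codeact_enable_jupyter:
        tools.append("execute_ipython_cell")
    # Fall back to the string replacement editor when the LLM editor is off.
    tools.append(
        "edit_file" if config.codeact_enable_llm_editor else "str_replace_editor"
    )
    return tools


print(select_tools(AgentConfig()))
print(select_tools(AgentConfig(codeact_enable_browsing=False, codeact_enable_llm_editor=True)))
```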

## Micro-agents

The agent includes specialized micro-agents for specific tasks:

1. **npm**: Handles npm package installation with non-interactive shell workarounds
2. **github**: Manages GitHub operations with API token support and PR creation guidelines
3. **flarglebargle**: Easter egg response handler

## Adding New Tools

The CodeAct agent uses a function calling interface to define tools that the agent can use. Tools are defined in `function_calling.py` using the `ChatCompletionToolParam` class from `litellm`. Each tool consists of:

1. A description string that explains what the tool does and how to use it
2. A tool definition using `ChatCompletionToolParam` that specifies:
    - The tool's name
    - The tool's parameters and their types
    - Required vs optional parameters

Here's an example of how a tool is defined:
The CodeAct agent uses a function calling interface based on `litellm`'s `ChatCompletionToolParam`. To add a new tool:

1. Define the tool in `function_calling.py`:
```python
MyTool = ChatCompletionToolParam(
    type='function',
@@ -47,20 +94,20 @@ MyTool = ChatCompletionToolParam(
)
```

To add a new tool:
2. Add the tool to `get_tools()` in `function_calling.py`
3. Implement the corresponding action handler in the agent class

1. Define your tool in `function_calling.py` following the pattern above
2. Add your tool to the `get_tools()` function in `function_calling.py`
3. Implement the corresponding action handler in the agent to process the tool's invocation
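
The three steps above can be sketched end to end as follows. To stay runnable without `litellm`, the tool is written as a plain dict in the OpenAI function-calling shape, standing in for `ChatCompletionToolParam`. The `word_count` tool, the `HANDLERS` table, and the `dispatch` helper are hypothetical, and the real `get_tools()` may take arguments not shown here.

```python
import json

# Step 1: define the tool. A plain dict stands in for ChatCompletionToolParam;
# the tool name and its "text" parameter are made up for illustration.
WordCountTool = {
    "type": "function",
    "function": {
        "name": "word_count",
        "description": "Count the words in a piece of text.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "Text to count."}
            },
            "required": ["text"],
        },
    },
}


def get_tools() -> list[dict]:
    """Step 2: expose the tool alongside any others."""
    return [WordCountTool]


def handle_word_count(text: str) -> int:
    """Step 3: the handler that runs when the tool is invoked."""
    return len(text.split())


HANDLERS = {"word_count": handle_word_count}


def dispatch(name: str, arguments: str):
    """Route an invocation (tool name plus JSON arguments) to its handler."""
    if name not in HANDLERS:
        raise ValueError(f"unknown tool: {name}")
    return HANDLERS[name](**json.loads(arguments))


print(dispatch("word_count", '{"text": "unified code action space"}'))
```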
## Implementation Details

The agent currently supports several built-in tools:
- `execute_bash`: Execute bash commands
- `execute_ipython_cell`: Run Python code in IPython
- `browser`: Interact with a web browser
- `str_replace_editor`: Edit files using string replacement
- `edit_file`: Edit files using LLM-based editing
The agent is implemented in two main files:

Tools can be enabled/disabled through configuration parameters:
- `codeact_enable_browsing`: Enable browser interaction
- `codeact_enable_jupyter`: Enable IPython code execution
- `codeact_enable_llm_editor`: Enable LLM-based file editing (if disabled, uses string replacement editor instead)
1. `codeact_agent.py`: Core agent implementation with:
    - Message history management
    - Tool execution handling
    - State management
    - Action/observation processing

2. `function_calling.py`: Tool definitions and function calling interface with:
    - Tool parameter specifications
    - Tool descriptions and examples
    - Function calling response parsing
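
Response parsing typically means decoding the JSON arguments of a tool call and checking them against the tool's parameter specification. The sketch below shows that idea under stated assumptions: `parse_arguments` and the `example_tool` schema are hypothetical and do not reproduce the code in `function_calling.py`.

```python
import json


def parse_arguments(tool: dict, raw_arguments: str) -> dict:
    """Decode a tool call's JSON arguments and check required parameters."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as err:
        raise ValueError(f"arguments are not valid JSON: {err}") from err
    if not isinstance(args, dict):
        raise ValueError("arguments must decode to a JSON object")
    required = tool["function"]["parameters"].get("required", [])
    missing = [name for name in required if name not in args]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return args


# Hypothetical tool schema in the OpenAI function-calling shape.
example_tool = {
    "type": "function",
    "function": {
        "name": "example_tool",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}

print(parse_arguments(example_tool, '{"path": "README.md"}'))
```

Raising a `ValueError` with a readable message gives the agent something it can report back to the model as an error observation.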