mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-25 21:36:52 +08:00

Replace All-Hands-AI references with OpenHands (#11287 )

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <engel.nyst@gmail.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

2025-10-26 01:52:45 +02:00

4.3 KiB


CodeAct Agent Framework

This folder contains the implementation of OpenHands's main agent, the CodeAct Agent. It is based on CodeAct (described in the CodeAct paper and an accompanying tweet), the idea of consolidating LLM agents' actions into a unified code action space for both simplicity and performance.

Overview

The CodeAct agent operates through a function calling interface. At each turn, the agent can:

- Converse: Communicate with humans in natural language to ask for clarification, confirmation, etc.
- CodeAct: Execute actions through a set of well-defined tools:
  - Execute Linux bash commands with `execute_bash`
  - Run Python code in an IPython environment with `execute_ipython_cell`
  - Interact with web browsers using `browser` and `fetch`
  - Edit files using `str_replace_editor` or `edit_file`
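
The turn structure above can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not the OpenHands implementation: the `ToolCall` dataclass, the `next_step` function, and the tuples it returns are hypothetical names; only the tool name `execute_bash` comes from the list above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCall:
    """Hypothetical stand-in for one function call returned by the LLM."""
    name: str
    arguments: str  # JSON-encoded arguments, as function calling APIs return them


def next_step(text: Optional[str], tool_call: Optional[ToolCall]) -> tuple:
    """Map one model turn to either a conversation message or a tool action."""
    if tool_call is None:
        # Converse: no tool was requested, so the text goes to the human.
        return ('message', text or '')
    # CodeAct: decode the arguments and hand them to the named tool.
    return ('tool', tool_call.name, json.loads(tool_call.arguments))


print(next_step('Which branch should I use?', None))
print(next_step(None, ToolCall('execute_bash', '{"command": "ls"}')))
```

Keeping both outcomes behind one function means the caller only has to branch on the first element of the result.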

Built-in Tools

The agent provides several built-in tools:

1. `execute_bash`

- Execute any valid Linux bash command
- Handles long-running commands by running them in the background with output redirection
- Supports interactive processes with STDIN input and process interruption
- Handles command timeouts with automatic retry in background mode
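
The timeout behaviour described above can be approximated with the standard library. The sketch below is an assumption about the general technique, not the code behind `execute_bash`: `run_with_timeout` and the dictionary it returns are hypothetical, and it only reports a timeout rather than retrying in the background.

```python
import subprocess
import sys


def run_with_timeout(argv: list[str], timeout_s: float) -> dict:
    """Run a command and report a timeout instead of blocking forever."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        return {'timed_out': False, 'exit_code': done.returncode, 'output': done.stdout}
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout. A real agent might
        # instead relaunch the command in the background with its output
        # redirected to a file, as the list above describes.
        return {'timed_out': True, 'exit_code': None, 'output': ''}


quick = run_with_timeout([sys.executable, '-c', "print('done')"], timeout_s=30)
slow = run_with_timeout([sys.executable, '-c', 'import time; time.sleep(5)'], timeout_s=0.5)
print(quick['output'].strip(), slow['timed_out'])
```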

2. `execute_ipython_cell`

- Run Python code in an IPython environment
- Supports magic commands like `%pip`
- Variables are scoped to the IPython environment
- Requires defining variables and importing packages before use
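
The scoping rules above can be illustrated with a persistent namespace. This is a simplified sketch that uses plain `exec` rather than IPython; `CellRunner` is a hypothetical name and does not support magic commands.

```python
class CellRunner:
    """Hypothetical stand-in for a persistent IPython-style session."""

    def __init__(self) -> None:
        # One namespace shared by every cell, so definitions persist.
        self.namespace: dict = {}

    def run(self, code: str) -> str:
        try:
            exec(code, self.namespace)
            return 'ok'
        except NameError as err:
            # Raised when a cell uses a name that no earlier cell defined.
            return f'NameError: {err}'


session = CellRunner()
print(session.run('total = x + 1'))  # fails: x has not been defined yet
print(session.run('x = 41'))         # defines x in the shared namespace
print(session.run('total = x + 1'))  # succeeds now that x exists
print(session.namespace['total'])
```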

3. `web_read` and `browser`

- `web_read`: Read and convert webpage content to markdown
- `browser`: Interact with webpages through Python code
- Supports common browser actions like navigation, clicking, form filling, scrolling
- Handles file uploads and drag-and-drop operations
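
To give a feel for what converting webpage content to markdown involves, here is a deliberately tiny converter built on the standard library. It is an illustrative sketch only: `MarkdownExtractor` and `html_to_markdown` are hypothetical names, it covers only headings, paragraphs, and links, and it is not how `web_read` is implemented.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Tiny HTML-to-markdown converter for headings, paragraphs, and links."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in ('h1', 'h2', 'h3'):
            # h1 -> '# ', h2 -> '## ', h3 -> '### '
            self.parts.append('#' * int(tag[1]) + ' ')
        elif tag == 'a':
            self.href = dict(attrs).get('href')
            self.parts.append('[')

    def handle_endtag(self, tag):
        if tag in ('h1', 'h2', 'h3', 'p'):
            self.parts.append('\n\n')
        elif tag == 'a':
            self.parts.append(f']({self.href})')
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts).strip()


print(html_to_markdown('<h1>Docs</h1><p>See <a href="https://example.com">the site</a>.</p>'))
```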

4. `str_replace_editor`

- View, create and edit files through string replacement
- Persistent state across command calls
- File viewing with line numbers
- String replacement with exact matching
- Undo functionality for edits
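
The features listed above (numbered viewing, exact-match replacement, undo) can be sketched in memory as follows. `StringReplaceEditor` is a hypothetical class, and the rule that the old string must match exactly once is an assumption made for this sketch rather than something stated above.

```python
class StringReplaceEditor:
    """Hypothetical in-memory editor: exact-match replacement plus undo."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.history: list[str] = []  # earlier versions, newest last

    def view(self) -> str:
        """Return the content with 1-indexed line numbers."""
        return '\n'.join(
            f'{n:>4}  {line}' for n, line in enumerate(self.text.splitlines(), 1)
        )

    def str_replace(self, old: str, new: str) -> None:
        count = self.text.count(old)
        if count != 1:
            # Assumed rule: refuse ambiguous or missing matches.
            raise ValueError(f'expected exactly one match for {old!r}, found {count}')
        self.history.append(self.text)
        self.text = self.text.replace(old, new)

    def undo(self) -> None:
        if not self.history:
            raise ValueError('nothing to undo')
        self.text = self.history.pop()


editor = StringReplaceEditor('a = 1\nb = 2\n')
editor.str_replace('b = 2', 'b = 3')
print(editor.view())
editor.undo()
print(editor.view())
```

Storing whole snapshots keeps undo trivial; a production editor would more likely store diffs or per-file histories.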

5. `edit_file` (LLM-based)

- Edit files using LLM-based content generation
- Support for partial file edits with line ranges
- Handles large files by editing specific sections
- Append mode for adding content to files
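
The line-range and append behaviour above can be sketched with two helpers. Both `replace_lines` and `append_text` are hypothetical names, and the 1-indexed inclusive range is an assumption; in the real tool the replacement text would come from an LLM, which is out of scope here.

```python
def replace_lines(content: str, start: int, end: int, new_text: str) -> str:
    """Replace lines start..end (1-indexed, inclusive) with new_text."""
    lines = content.splitlines(keepends=True)
    if not (1 <= start <= end <= len(lines)):
        raise ValueError(f'invalid range {start}-{end} for {len(lines)} lines')
    if new_text and not new_text.endswith('\n'):
        new_text += '\n'  # keep the following line on its own line
    return ''.join(lines[:start - 1]) + new_text + ''.join(lines[end:])


def append_text(content: str, new_text: str) -> str:
    """Append new_text, adding a newline separator when the file lacks one."""
    if content and not content.endswith('\n'):
        content += '\n'
    return content + new_text


source = 'a\nb\nc\n'
print(repr(replace_lines(source, 2, 2, 'B')))
print(repr(append_text(source, 'd\n')))
```

Editing only the requested range means the rest of a large file never has to be regenerated.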

Configuration

Tools can be enabled/disabled through configuration parameters:

- `enable_browsing`: Enable browser interaction tools
- `enable_jupyter`: Enable IPython code execution
- `enable_llm_editor`: Enable LLM-based file editing (falls back to string replacement editor if disabled)
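
One way to picture how these flags shape the tool list is the sketch below. The flag names come from the list above; the `AgentFlags` dataclass, its default values, and the `select_tools` function are assumptions made for illustration and may not match the real configuration classes.

```python
from dataclasses import dataclass


@dataclass
class AgentFlags:
    """Hypothetical container for the three flags; defaults are assumed."""
    enable_browsing: bool = True
    enable_jupyter: bool = True
    enable_llm_editor: bool = False


def select_tools(flags: AgentFlags) -> list[str]:
    """Return the tool names that the given flags enable."""
    tools = ['execute_bash']
    if flags.enable_jupyter:
        tools.append('execute_ipython_cell')
    if flags.enable_browsing:
        tools += ['web_read', 'browser']
    # Fall back to the string replacement editor when the LLM editor is off.
    tools.append('edit_file' if flags.enable_llm_editor else 'str_replace_editor')
    return tools


print(select_tools(AgentFlags()))
print(select_tools(AgentFlags(enable_browsing=False, enable_llm_editor=True)))
```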

Micro-agents

The agent includes specialized micro-agents for specific tasks:

- npm: Handles npm package installation with non-interactive shell workarounds
- github: Manages GitHub operations with API token support and PR creation guidelines
- flarglebargle: Easter egg response handler

Adding New Tools

The CodeAct agent uses a function calling interface based on litellm's ChatCompletionToolParam. To add a new tool:

Define the tool in function_calling.py:

from litellm import ChatCompletionToolParam, ChatCompletionToolParamFunctionChunk

MyTool = ChatCompletionToolParam(
    type='function',
    function=ChatCompletionToolParamFunctionChunk(
        name='my_tool',
        description='Description of what the tool does and how to use it',
        parameters={
            'type': 'object',
            'properties': {
                'param1': {
                    'type': 'string',
                    'description': 'Description of parameter 1',
                },
                'param2': {
                    'type': 'integer',
                    'description': 'Description of parameter 2',
                },
            },
            'required': ['param1'],  # List required parameters here
        },
    ),
)

Add the tool to get_tools() in function_calling.py
Implement the corresponding action handler in the agent class
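
The last two steps can be sketched as a handler registry. This is a hedged illustration rather than the agent's actual code: `MY_TOOL_SCHEMA` is a plain-dict copy of the schema above so the example runs without litellm, and `handle_my_tool`, `HANDLERS`, and `dispatch` are hypothetical names.

```python
import json

# Plain-dict version of the my_tool schema, so the sketch needs no litellm.
MY_TOOL_SCHEMA = {
    'name': 'my_tool',
    'parameters': {
        'type': 'object',
        'properties': {
            'param1': {'type': 'string'},
            'param2': {'type': 'integer'},
        },
        'required': ['param1'],
    },
}


def handle_my_tool(param1: str, param2: int = 0) -> str:
    """Hypothetical action handler for my_tool."""
    return f'{param1} x{param2}'


# Registry mapping tool names to handlers (an assumed design, not the real one).
HANDLERS = {'my_tool': handle_my_tool}


def dispatch(name: str, raw_arguments: str) -> str:
    """Validate a tool call against the schema, then run its handler."""
    if name not in HANDLERS:
        raise ValueError(f'unknown tool: {name}')
    args = json.loads(raw_arguments)
    missing = [p for p in MY_TOOL_SCHEMA['parameters']['required'] if p not in args]
    if missing:
        raise ValueError(f'missing required parameters: {missing}')
    return HANDLERS[name](**args)


print(dispatch('my_tool', '{"param1": "a", "param2": 2}'))
```

Checking required parameters before calling the handler turns a malformed model response into a clear error instead of a `TypeError` deep inside the handler.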

Implementation Details

The agent is implemented in two main files:

- `codeact_agent.py`: Core agent implementation with:
  - Message history management
  - Tool execution handling
  - State management
  - Action/observation processing
- `function_calling.py`: Tool definitions and function calling interface with:
  - Tool parameter specifications
  - Tool descriptions and examples
  - Function calling response parsing
