CodeAct Agent Framework
This folder implements the CodeAct idea (paper, tweet), which consolidates LLM agents’ actions into a unified code action space for both simplicity and performance (see the paper for more details).
The conceptual idea is illustrated below. At each turn, the agent can:
- Converse: Communicate with humans in natural language to ask for clarification, confirmation, etc.
- CodeAct: Choose to perform the task by executing code
  - Execute any valid Linux bash command
  - Execute any valid Python code with an interactive Python interpreter. This is simulated through a bash command; see the plugin system below for more details.
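The choices available at each turn can be pictured as a small set of action types. The sketch below is illustrative only: the class names `Converse`, `RunBash`, and `RunPython` and the `describe` helper are hypothetical and are not taken from this repository.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Converse:
    """Hypothetical action: talk to the human in natural language."""
    message: str


@dataclass
class RunBash:
    """Hypothetical action: execute a Linux bash command."""
    command: str


@dataclass
class RunPython:
    """Hypothetical action: execute Python code in an interactive interpreter."""
    code: str


# At each turn the agent emits exactly one of these.
Action = Union[Converse, RunBash, RunPython]


def describe(action: Action) -> str:
    """Return a short label for the action, for logging or debugging."""
    if isinstance(action, Converse):
        return f'say: {action.message}'
    if isinstance(action, RunBash):
        return f'bash: {action.command}'
    return f'python: {action.code}'
```

Keeping the action space this small is the point of the CodeAct idea: anything that is not conversation is expressed as code to run.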
Adding New Tools
The CodeAct agent uses a function calling interface to define tools that the agent can use. Tools are defined in function_calling.py using the ChatCompletionToolParam class from litellm. Each tool consists of:
- A description string that explains what the tool does and how to use it
- A tool definition using ChatCompletionToolParam that specifies:
  - The tool's name
  - The tool's parameters and their types
  - Required vs optional parameters
Here's an example of how a tool is defined:
from litellm import ChatCompletionToolParam, ChatCompletionToolParamFunctionChunk

MyTool = ChatCompletionToolParam(
    type='function',
    function=ChatCompletionToolParamFunctionChunk(
        name='my_tool',
        description='Description of what the tool does and how to use it',
        parameters={
            'type': 'object',
            'properties': {
                'param1': {
                    'type': 'string',
                    'description': 'Description of parameter 1',
                },
                'param2': {
                    'type': 'integer',
                    'description': 'Description of parameter 2',
                },
            },
            'required': ['param1'],  # List required parameters here
        },
    ),
)
To add a new tool:
- Define your tool in function_calling.py following the pattern above
- Add your tool to the get_tools() function in function_calling.py
- Implement the corresponding action handler in the agent to process the tool's invocation
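The three steps above can be sketched end to end as follows. This is a minimal, self-contained illustration and not the code in function_calling.py: the tool is written as a plain dict with the same shape as the example definition, and `MyToolAction`, `handle_tool_call`, and the simplified `get_tools` body are hypothetical names and logic introduced for the sketch.

```python
import json
from dataclasses import dataclass

# Step 1: the tool definition, written as a plain dict for illustration.
MY_TOOL = {
    'type': 'function',
    'function': {
        'name': 'my_tool',
        'description': 'Description of what the tool does and how to use it',
        'parameters': {
            'type': 'object',
            'properties': {
                'param1': {'type': 'string', 'description': 'Description of parameter 1'},
                'param2': {'type': 'integer', 'description': 'Description of parameter 2'},
            },
            'required': ['param1'],
        },
    },
}


def get_tools() -> list[dict]:
    """Step 2: return every tool offered to the model (simplified stand-in)."""
    return [MY_TOOL]


@dataclass
class MyToolAction:
    """Hypothetical action created when the model invokes my_tool."""
    param1: str
    param2: int = 0


def handle_tool_call(name: str, raw_arguments: str) -> MyToolAction:
    """Step 3: turn a tool call (name plus JSON argument string) into an action."""
    if name != 'my_tool':
        raise ValueError(f'Unknown tool: {name}')
    args = json.loads(raw_arguments)
    required = MY_TOOL['function']['parameters']['required']
    missing = [p for p in required if p not in args]
    if missing:
        raise ValueError(f'Missing required parameters: {missing}')
    return MyToolAction(param1=args['param1'], param2=args.get('param2', 0))
```

For example, `handle_tool_call('my_tool', '{"param1": "hello"}')` yields a `MyToolAction` with `param1='hello'` and the default `param2=0`, while a call that omits `param1` raises `ValueError`.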
The agent currently supports several built-in tools:
- execute_bash: Execute bash commands
- execute_ipython_cell: Run Python code in IPython
- browser: Interact with a web browser
- str_replace_editor: Edit files using string replacement
- edit_file: Edit files using LLM-based editing
Tools can be enabled/disabled through configuration parameters:
- codeact_enable_browsing: Enable browser interaction
- codeact_enable_jupyter: Enable IPython code execution
- codeact_enable_llm_editor: Enable LLM-based file editing (if disabled, uses string replacement editor instead)
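One way to picture how these flags select the active tool set is the sketch below. It is an assumption-laden illustration, not the agent's actual logic: the `ToolConfig` class, its default values, and `enabled_tool_names` are hypothetical, and it assumes execute_bash is always available.

```python
from dataclasses import dataclass


@dataclass
class ToolConfig:
    """Hypothetical container for the three flags; defaults are assumed."""
    codeact_enable_browsing: bool = True
    codeact_enable_jupyter: bool = True
    codeact_enable_llm_editor: bool = False


def enabled_tool_names(config: ToolConfig) -> list[str]:
    """Return the names of the tools that the given flags would enable."""
    names = ['execute_bash']  # assumed to be always on in this sketch
    if config.codeact_enable_jupyter:
        names.append('execute_ipython_cell')
    if config.codeact_enable_browsing:
        names.append('browser')
    # Exactly one file editor is active: LLM-based or string replacement.
    if config.codeact_enable_llm_editor:
        names.append('edit_file')
    else:
        names.append('str_replace_editor')
    return names
```

Note that the editor flag switches between two tools rather than simply turning one off, which matches the description of codeact_enable_llm_editor above.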