This folder implements the CodeAct idea (paper, tweet), which consolidates LLM agents’ actions into a unified code action space for both simplicity and performance (see the paper for more details).

The conceptual idea is illustrated below. At each turn, the agent can:

1. Converse: Communicate with humans in natural language to ask for clarification, confirmation, etc.
2. CodeAct: Choose to perform the task by executing code:
   - Execute any valid Linux bash command
   - Execute any valid Python code with an interactive Python interpreter. This is simulated through a bash command; see the plugin system below for more details.
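
As a rough illustration of the unified action space described above, the sketch below models the three choices as plain data classes and shows how both code actions can reduce to a single bash command string. All class and function names here (`Converse`, `RunBash`, `RunPython`, `to_bash`) are hypothetical, and the `python3 -c` wrapping is purely an assumption for illustration; it is not the plugin mechanism this agent actually uses.

```python
import shlex
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Converse:
    """Hypothetical action: talk to the human in natural language."""
    message: str


@dataclass
class RunBash:
    """Hypothetical action: execute a bash command."""
    command: str


@dataclass
class RunPython:
    """Hypothetical action: execute a snippet of Python code."""
    code: str


Action = Union[Converse, RunBash, RunPython]


def to_bash(action: Action) -> Optional[str]:
    """Return the bash command line for a code action, or None for Converse.

    Assumption: Python code is passed to `python3 -c`, quoted so the shell
    treats the whole snippet as one argument.
    """
    if isinstance(action, RunBash):
        return action.command
    if isinstance(action, RunPython):
        return 'python3 -c ' + shlex.quote(action.code)
    return None
```

The point of the sketch is only that a single executor (here, bash) can serve every code action, while conversation stays outside the execution path.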

Adding New Tools

The CodeAct agent uses a function calling interface to define tools that the agent can use. Tools are defined in function_calling.py using the ChatCompletionToolParam class from litellm. Each tool consists of:

1. A description string that explains what the tool does and how to use it
2. A tool definition using ChatCompletionToolParam that specifies:
   - The tool's name
   - The tool's parameters and their types
   - Required vs optional parameters

Here's an example of how a tool is defined:

from litellm import ChatCompletionToolParam, ChatCompletionToolParamFunctionChunk

MyTool = ChatCompletionToolParam(
    type='function',
    function=ChatCompletionToolParamFunctionChunk(
        name='my_tool',
        description='Description of what the tool does and how to use it',
        parameters={
            'type': 'object',
            'properties': {
                'param1': {
                    'type': 'string',
                    'description': 'Description of parameter 1',
                },
                'param2': {
                    'type': 'integer',
                    'description': 'Description of parameter 2',
                },
            },
            'required': ['param1'],  # List required parameters here
        },
    ),
)

To add a new tool:

1. Define your tool in function_calling.py following the pattern above
2. Add your tool to the get_tools() function in function_calling.py
3. Implement the corresponding action handler in the agent to process the tool's invocation
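
The three steps above can be sketched end to end as follows. This is a minimal, self-contained illustration and not the agent's actual code: the tool is written as a plain dict with the same shape as the ChatCompletionToolParam example so that it runs without litellm, and `MyToolAction`, `handle_tool_call`, and the body of `get_tools()` are hypothetical stand-ins for whatever the real agent defines.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Step 1: define the tool (plain-dict stand-in for ChatCompletionToolParam).
MY_TOOL: Dict[str, Any] = {
    'type': 'function',
    'function': {
        'name': 'my_tool',
        'description': 'Description of what the tool does and how to use it',
        'parameters': {
            'type': 'object',
            'properties': {
                'param1': {'type': 'string', 'description': 'Description of parameter 1'},
                'param2': {'type': 'integer', 'description': 'Description of parameter 2'},
            },
            'required': ['param1'],
        },
    },
}


def get_tools() -> List[Dict[str, Any]]:
    """Step 2: expose the tool to the agent (other tools omitted here)."""
    return [MY_TOOL]


@dataclass
class MyToolAction:
    """Hypothetical action object produced when the model calls my_tool."""
    param1: str
    param2: Optional[int] = None


def handle_tool_call(name: str, arguments_json: str) -> MyToolAction:
    """Step 3: turn a tool invocation (name + JSON arguments) into an action."""
    if name != MY_TOOL['function']['name']:
        raise ValueError(f'Unknown tool: {name}')
    args = json.loads(arguments_json)
    required = MY_TOOL['function']['parameters']['required']
    missing = [p for p in required if p not in args]
    if missing:
        raise ValueError(f'Missing required parameters: {missing}')
    return MyToolAction(param1=args['param1'], param2=args.get('param2'))
```

For example, `handle_tool_call('my_tool', '{"param1": "hello"}')` yields a `MyToolAction` with `param2` left as `None`, while omitting `param1` raises a `ValueError`.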

The agent currently supports several built-in tools:

- execute_bash: Execute bash commands
- execute_ipython_cell: Run Python code in IPython
- browser: Interact with a web browser
- str_replace_editor: Edit files using string replacement
- edit_file: Edit files using LLM-based editing

Tools can be enabled/disabled through configuration parameters:

- codeact_enable_browsing: Enable browser interaction
- codeact_enable_jupyter: Enable IPython code execution
- codeact_enable_llm_editor: Enable LLM-based file editing (if disabled, uses string replacement editor instead)
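
One way to picture how these flags could select the active tool set is sketched below. The `ToolConfig` class, its default values, and `enabled_tool_names` are all hypothetical; the sketch also assumes that execute_bash is always available, which this section does not state. Only the flag names and tool names come from the lists above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolConfig:
    """Hypothetical config holder; the defaults are assumptions."""
    codeact_enable_browsing: bool = True
    codeact_enable_jupyter: bool = True
    codeact_enable_llm_editor: bool = False


def enabled_tool_names(config: ToolConfig) -> List[str]:
    """Return the names of the tools that the given flags switch on."""
    names = ['execute_bash']  # assumed to be always enabled
    if config.codeact_enable_jupyter:
        names.append('execute_ipython_cell')
    if config.codeact_enable_browsing:
        names.append('browser')
    # Exactly one editor is active: LLM-based if enabled, else string replacement.
    if config.codeact_enable_llm_editor:
        names.append('edit_file')
    else:
        names.append('str_replace_editor')
    return names
```

The editor flag behaves as a switch between two tools instead of a simple on/off toggle, which is why it is handled with an if/else in the sketch.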
