mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-26 05:48:36 +08:00

[Arch] Remove support for Background Commands (#2803)

* deprecating docker exec box

* remove doc exec from workflow and docs

* remove background commands

* Update tests/unit/test_sandbox.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* replace for-loop with assignment

* fix integration tests

* fix integration tests for shell script

* fix integration tests

* increase max iter to fix some monologue agent issue

* fix integration test again

* fix integration tests (seems related to run_user issue)

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

2024-07-06 03:38:05 +08:00


🧠 Agents and Capabilities

CodeAct Agent

Description

This agent implements the CodeAct idea (paper, tweet) that consolidates LLM agents’ actions into a unified code action space for both simplicity and performance (see paper for more details).

The conceptual idea is illustrated below. At each turn, the agent can:

* Converse: Communicate with humans in natural language to ask for clarification, confirmation, etc.
* CodeAct: Choose to perform the task by executing code:
  * Execute any valid Linux bash command.
  * Execute any valid Python code with an interactive Python interpreter. This is simulated through a bash command; see the Plugin System section below for more details.
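The per-turn choice described above can be sketched as a small dispatch over action types. This is only an illustration under stated assumptions: the classes `Talk`, `RunBash`, and `RunPython` and the function `describe_turn` are hypothetical names invented for this example and are not OpenDevin's actual classes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Talk:
    """Hypothetical 'Converse' action: a natural-language message to the user."""
    message: str


@dataclass
class RunBash:
    """Hypothetical 'CodeAct' action: a Linux bash command to execute."""
    command: str


@dataclass
class RunPython:
    """Hypothetical 'CodeAct' action: Python code for an interactive interpreter."""
    code: str


TurnAction = Union[Talk, RunBash, RunPython]


def describe_turn(action: TurnAction) -> str:
    """Return a label for what the agent does this turn (nothing is executed)."""
    if isinstance(action, Talk):
        return f"converse: {action.message}"
    if isinstance(action, RunBash):
        return f"bash: {action.command}"
    if isinstance(action, RunPython):
        return f"python: {action.code}"
    raise TypeError(f"unsupported action: {action!r}")
```

Because both code-executing options share one action space, a real agent loop would only need to distinguish "talk" from "run code" before handing the code to the sandbox.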

Plugin System

To make the CodeAct agent more powerful while restricting it to the bash action space, the agent leverages OpenDevin's plugin system:

* Jupyter plugin: IPython execution via bash command.
* SWE-agent tool plugin: powerful bash command-line tools for software development tasks, introduced by swe-agent.
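To make the idea of "Python through a bash command" concrete, here is a minimal sketch. It is not the Jupyter plugin itself, whose internals are not shown in this document: the helpers `python_as_bash_command` and `run_via_bash` are hypothetical, and unlike an interactive interpreter this version keeps no state between calls.

```python
import shlex
import subprocess
import sys


def python_as_bash_command(code: str, interpreter: str = sys.executable) -> str:
    """Build a bash command line that runs `code` with a Python interpreter.

    shlex.quote keeps quotes and newlines in `code` from being
    interpreted by the shell.
    """
    return f"{shlex.quote(interpreter)} -c {shlex.quote(code)}"


def run_via_bash(code: str) -> str:
    """Execute the generated command line with bash and return its stdout."""
    command = python_as_bash_command(code)
    result = subprocess.run(
        ["bash", "-c", command], capture_output=True, text=True, check=True
    )
    return result.stdout
```

The point of the sketch is the routing: the agent only ever emits a bash command, and the Python code travels inside it as a quoted argument.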

Demo

https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac

Example of CodeActAgent with gpt-4-turbo-2024-04-09 performing a data science task (linear regression)

Actions

Action, CmdRunAction, IPythonRunCellAction, AgentEchoAction, AgentFinishAction, AgentTalkAction

Observations

CmdOutputObservation, IPythonRunCellObservation, AgentMessageObservation, UserMessageObservation

Methods

Method	Description
`__init__`	Initializes an agent with `llm` and a list of messages `list[Mapping[str, str]]`
`step`	Performs one step using the CodeAct Agent. This includes gathering information on previous steps and prompting the model to produce a command to execute.
`search_memory`	Not yet implemented

Monologue Agent

Description

The Monologue Agent uses long-term and short-term memory to complete tasks. Long-term memory is stored as a LongTermMemory object, which the model searches for relevant examples from the past. Short-term memory is stored as a Monologue object, which the model can condense as necessary.
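The condensing behaviour of the short-term memory can be sketched as follows. This is an assumption-laden illustration, not the Monologue class: `ShortTermMemory`, the `max_chars` threshold, and the injected `summarize` callable are hypothetical. In the real agent the summary would come from the model; here any function from a list of events to a string will do.

```python
from typing import Callable


class ShortTermMemory:
    """Hypothetical stand-in for the Monologue object: a list of event strings."""

    def __init__(self, max_chars: int, summarize: Callable[[list[str]], str]):
        self.max_chars = max_chars
        self.summarize = summarize
        self.events: list[str] = []

    def total_chars(self) -> int:
        """Total size of the stored events, used as a crude length measure."""
        return sum(len(event) for event in self.events)

    def add_event(self, event: str) -> None:
        """Append an event and condense if the memory has grown too long."""
        self.events.append(event)
        if self.total_chars() > self.max_chars:
            self.condense()

    def condense(self) -> None:
        """Replace all stored events with a single summary event."""
        self.events = [self.summarize(self.events)]
```

A character count is used here only because it is easy to reason about; a token count would be the more natural limit when the memory is fed back into a model prompt.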

Actions

Action, NullAction, CmdRunAction, FileWriteAction, FileReadAction, AgentRecallAction, BrowseURLAction, GithubPushAction, AgentThinkAction

Observations

Observation, NullObservation, CmdOutputObservation, FileReadObservation, AgentRecallObservation, BrowserOutputObservation

Methods

Method	Description
`__init__`	Initializes the agent with a long-term memory and an internal monologue
`_add_event`	Appends events to the agent's monologue and automatically condenses it into a summary if the monologue is too long
`_initialize`	Utilizes the `INITIAL_THOUGHTS` list to give the agent a context for its capabilities and how to navigate the `/workspace`
`step`	Modifies the current state by adding the most recent actions and observations, then prompts the model to think about the next action to take.
`search_memory`	Uses `VectorIndexRetriever` to find related memories within the long-term memory.
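The lookup performed by `search_memory` can be illustrated with a toy retriever. Note the swap: this sketch ranks memories by plain word overlap and does not use `VectorIndexRetriever` or any embedding model. The function name `search_memory_sketch` and its parameters are hypothetical.

```python
def search_memory_sketch(memories: list[str], query: str, top_k: int = 2) -> list[str]:
    """Rank stored memories by word overlap with the query (a toy similarity)."""
    query_words = set(query.lower().split())
    scored = []
    for text in memories:
        overlap = len(query_words & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, text))
    # Highest score first; the sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

A vector-based retriever follows the same shape (score every memory against the query, return the best few) but compares embeddings instead of raw words, so it can match related text that shares no vocabulary.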

Planner Agent

Description

The Planner Agent uses a special prompting strategy to create long-term plans for solving problems. At every step, the agent is given its previous action-observation pairs, the current task, and a hint based on the last action taken.
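How those three inputs might be combined into a single prompt is sketched below. The layout of the prompt and the function `build_plan_prompt` are assumptions made for illustration; the agent's actual prompt template is not reproduced in this document.

```python
def build_plan_prompt(task: str, history: list[tuple[str, str]], hint: str) -> str:
    """Assemble a prompt from the task, past (action, observation) pairs, and a hint."""
    lines = [f"Task: {task}", "History:"]
    for action, observation in history:
        lines.append(f"- action: {action} -> observation: {observation}")
    lines.append(f"Hint: {hint}")
    return "\n".join(lines)
```

For example, `build_plan_prompt("fix bug", [("run tests", "1 failed")], "inspect the failing test")` yields a four-line prompt that starts with the task and ends with the hint.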

Actions

NullAction, CmdRunAction, BrowseURLAction, GithubPushAction, FileReadAction, FileWriteAction, AgentRecallAction, AgentThinkAction, AgentFinishAction, AgentSummarizeAction, AddTaskAction, ModifyTaskAction

Observations

Observation, NullObservation, CmdOutputObservation, FileReadObservation, AgentRecallObservation, BrowserOutputObservation

Methods

Method	Description
`__init__`	Initializes an agent with `llm`
`step`	Checks whether the current step is completed and returns `AgentFinishAction` if so. Otherwise, creates a plan prompt and sends it to the model for inference, adding the result as the next action.
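The control flow of the planner's `step` can be sketched in a few lines. Everything here is hypothetical scaffolding: `PlanState`, `planner_step`, and the string `"finish"` stand in for the agent's real state object and `AgentFinishAction`, and the model is replaced by an injected callable.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PlanState:
    """Hypothetical agent state: a completion flag and the actions taken so far."""
    completed: bool = False
    actions: list[str] = field(default_factory=list)


def planner_step(
    state: PlanState,
    make_prompt: Callable[[PlanState], str],
    ask_model: Callable[[str], str],
) -> str:
    """Return 'finish' if the plan is done, else ask the model for the next action."""
    if state.completed:
        return "finish"
    action = ask_model(make_prompt(state))
    state.actions.append(action)  # record the result as the next action
    return action
```

Passing the prompt builder and the model as callables keeps the sketch testable without a live model.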
`search_memory`	Not yet implemented
