mirror of https://github.com/OpenHands/OpenHands.git synced 2026-03-22 13:47:19 +08:00

Files

Xingyao Wang a5195b0e65 chore: clean up sandbox and ssh related configs (#3301 )

* clean up sandbox and ssh related stuff

* remove ssh hostname

* remove ssh hostname

* remove ssh password

* update config

* fix typo that breaks the test

2024-08-08 22:15:40 +00:00

browsing_agent

Feat: Add Vision Input Support for LLM with Vision Capabilities (#2848 )

2024-08-04 02:26:22 +08:00

codeact_agent

Feat: Add Vision Input Support for LLM with Vision Capabilities (#2848 )

2024-08-04 02:26:22 +08:00

codeact_swe_agent

Feat: Add Vision Input Support for LLM with Vision Capabilities (#2848 )

2024-08-04 02:26:22 +08:00

delegator_agent

Feat: Add Vision Input Support for LLM with Vision Capabilities (#2848 )

2024-08-04 02:26:22 +08:00

dummy_agent

(arch) Switch default runtime to EventStream Runtime (#3271 )

2024-08-08 10:11:49 +08:00

micro

chore: clean up sandbox and ssh related configs (#3301 )

2024-08-08 22:15:40 +00:00

planner_agent

Feat: Add Vision Input Support for LLM with Vision Capabilities (#2848 )

2024-08-04 02:26:22 +08:00

__init__.py

Remove monologue agent (#3036 )

2024-07-19 19:25:05 +00:00

README.md

Change doc title of agent hub (#3100 )

2024-07-25 01:28:40 +00:00

README.md

Agent Hub

In this folder, there may exist multiple implementations of Agent that will be used by the framework.

For example, agenthub/codeact_agent, etc. Contributors from different backgrounds and interests can choose to contribute to any (or all!) of these directions.

Constructing an Agent

The abstraction for an agent can be found here.

Agents are run inside of a loop. At each iteration, agent.step() is called with a State input, and the agent must output an Action.

Every agent also has a self.llm which it can use to interact with the LLM configured by the user. See the LiteLLM docs for self.llm.completion.

State

The state contains:

A history of actions taken by the agent, as well as any observations (e.g. file content, command output) from those actions
A list of actions/observations that have happened since the most recent step
A root_task, which contains a plan of action
- The agent can add and modify subtasks through the AddTaskAction and ModifyTaskAction

Actions

Here is a list of available Actions, which can be returned by agent.step():

CmdRunAction - Runs a command inside a sandboxed terminal
IPythonRunCellAction - Execute a block of Python code interactively (in Jupyter notebook) and receives CmdOutputObservation. Requires setting up jupyter plugin as a requirement.
FileReadAction - Reads the content of a file
FileWriteAction - Writes new content to a file
BrowseURLAction - Gets the content of a URL
AddTaskAction - Adds a subtask to the plan
ModifyTaskAction - Changes the state of a subtask.
AgentFinishAction - Stops the control loop, allowing the user/delegator agent to enter a new task
AgentRejectAction - Stops the control loop, allowing the user/delegator agent to enter a new task
AgentFinishAction - Stops the control loop, allowing the user to enter a new task
MessageAction - Represents a message from an agent or the user

You can use action.to_dict() and action_from_dict to serialize and deserialize actions.

Observations

There are also several types of Observations. These are typically available in the step following the corresponding Action. But they may also appear as a result of asynchronous events (e.g. a message from the user).

Here is a list of available Observations:

You can use observation.to_dict() and observation_from_dict to serialize and deserialize observations.

Interface

Every agent must implement the following methods:

`step`

def step(self, state: "State") -> "Action"

step moves the agent forward one step towards its goal. This probably means sending a prompt to the LLM, then parsing the response into an Action.