CodeAct Agent Framework

This folder implements the CodeAct idea (paper, tweet), which consolidates LLM agents' actions into a unified code action space for both simplicity and performance (see the paper for more details).

The conceptual idea is illustrated below. At each turn, the agent can:

  1. Converse: Communicate with humans in natural language to ask for clarification, confirmation, etc.
  2. CodeAct: Choose to perform the task by executing code:
    • Execute any valid Linux bash command.
    • Execute any valid Python code with an interactive Python interpreter. This is simulated through a bash command; see the plugin system below for more details.
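
The two choices above can be pictured as a small dispatch over the model's reply. The following is a minimal sketch, not OpenDevin's actual implementation: the `<bash>` and `<python>` tag names, the `Action` dataclass, and `parse_action` are hypothetical names chosen purely for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical tag names (assumption); the real agent's markup may differ.
_CODE_BLOCK = re.compile(r"<(bash|python)>(.*?)</\1>", re.DOTALL)


@dataclass
class Action:
    kind: str     # "converse", "bash", or "python"
    content: str  # message text, or the code to execute


def parse_action(reply: str) -> Action:
    """Map one model reply onto the unified action space."""
    match = _CODE_BLOCK.search(reply)
    if match is None:
        # No code block: the agent is conversing with the user.
        return Action("converse", reply.strip())
    # The first code block wins: the agent acts by executing code.
    return Action(match.group(1), match.group(2).strip())
```

Under these assumptions, a reply with no code block is treated as conversation, while a reply such as `Let me look.\n<bash>ls -la</bash>` becomes a bash action carrying `ls -la`.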

image

Plugin System

Although the CodeAct agent only has access to a bash action space, it leverages OpenDevin's plugin system to extend what it can do.
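
To illustrate how Python execution can fit into a bash-only action space, here is a rough sketch that wraps Python code in a single bash command. It is a simplification under stated assumptions, not the plugin system itself: `run_python_via_bash` is a hypothetical helper, and each call starts a fresh interpreter, so unlike an interactive interpreter it keeps no state between calls.

```python
import shlex
import subprocess
import sys


def run_python_via_bash(code: str, timeout: float = 30.0) -> str:
    """Run Python source through bash and return its stdout (hypothetical helper)."""
    # Quote the interpreter path and the code so bash passes both through verbatim.
    command = f"{shlex.quote(sys.executable)} -c {shlex.quote(code)}"
    result = subprocess.run(
        ["bash", "-c", command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        # Surface the interpreter's error message to the caller.
        raise RuntimeError(result.stderr.strip())
    return result.stdout
```

Quoting with `shlex.quote` matters here because the code may itself contain quotes or shell metacharacters; without it, bash would interpret them before Python ever saw the source.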

Demo

https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac

Example of CodeActAgent with gpt-4-turbo-2024-04-09 performing a data science task (linear regression)

Work in Progress & Next Steps

[ ] Support web browsing
[ ] Complete the workflow for the CodeAct agent to submit GitHub PRs