85 Commits

Author SHA1 Message Date
Robert Brennan
01ae22ef57
Rename OpenDevin to OpenHands (#3472)
* Replace OpenDevin with OpenHands

* Update CONTRIBUTING.md

* Update README.md

* Update README.md

* update poetry lock; move opendevin folder to openhands

* fix env var

* revert image references in docs

* revert permissions

* revert permissions

---------

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-08-20 00:44:54 +08:00
tobitege
d1b9787751
remove obsolete prompt.py file (codeact_agent) (#3450) 2024-08-19 09:18:36 +08:00
Xingyao Wang
537fb7d985
feat: convert agent prompts into structured Jinja2 templates (#3360)
* commit jinja draft

* remove extra file

* update system prompt

* remove github message

* update prompts

* add prompt manager and its tests

* use prompt manager for codeact and bump version

* fix integration tests

* fix lint

* simplify test case

* update system

* fix integration tests

* update credit path for aider

* Update CREDITS.md

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-08-18 16:38:46 +00:00
Xingyao Wang
a2ea17909d
chore: remove deprecated RuntimeTool (#3443) 2024-08-18 09:45:45 +08:00
Engel Nyst
92b1a2da5c
Refactor agent to accept agent config (#3430)
* refactor agents to receive their agent config

* add unit test

* fix test

* fix tests
2024-08-17 18:11:30 +02:00
Graham Neubig
7d331acffa
Handle error observations in codeact (#3383)
* Handle error observations in codeact

* Remove comments
2024-08-14 13:47:31 +00:00
Kaushik Deka
415843476c
Feat: Add Vision Input Support for LLM with Vision Capabilities (#2848)
* add image feature

* fix-linting

* check model support for images

* add comment

* Add image support to other models

* Add images to chat

* fix linting

* fix test issues

* refactor variable names and import

* fix tests

* fix chat message tests

* fix linting

* add pydantic class message

* use message

* remove redundant comments

* remove redundant comments

* change Message class

* remove unintended change

* fix integration tests using regenerate.sh

* rename image_bas64 to images_url, fix tests

* rename Message.py to message, change reminder append logic, add unit tests

* remove comment, fix error to merge

* codeact_swe_agent

* fix f string

* update eventstream integration tests

* add missing if check in codeact_swe_agent

* update integration tests

* Update frontend/src/components/chat/ChatInput.tsx

* Update frontend/src/components/chat/ChatInput.tsx

* Update frontend/src/components/chat/ChatInput.tsx

* Update frontend/src/components/chat/ChatInput.tsx

* Update frontend/src/components/chat/ChatMessage.tsx

---------

Co-authored-by: tobitege <tobitege@gmx.de>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2024-08-04 02:26:22 +08:00
Xingyao Wang
2e60d25eae
[Agent, LLM] Make sure codeact agent produce message in u/a/u/a order (#3193)
* make sure codeact agent produce message in u/a/u/a order

* integration tests

* sync message changes to codeact swe

* fix integration tests

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-08-02 00:17:53 +08:00
Engel Nyst
3328669b89
fix Finish action to sent its 'thoughts' in the prompt (#3149) 2024-07-27 17:37:44 +00:00
linshaoxin-maker
800e25eac1
Modify codeAct paper link (#3076) 2024-07-23 20:25:54 +00:00
Xingyao Wang
6b16a5da0b
[Eval,Arch] Update GPTQ eval and add headless_mode for Controller (#2994)
* update and polish gptq eval

* fix typo

* Update evaluation/gpqa/README.md

Co-authored-by: Graham Neubig <neubig@gmail.com>

* Update evaluation/gpqa/run_infer.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* add headless mode to all appropriate agent controller call

* delegate set to error when in headless mode

* try to deduplicate a bit

* make headless_mode default to True and only change it to false for AgentSession

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-07-20 03:35:48 +00:00
Graham Neubig
c897791024
Refactor LLM config (#2953)
* Add max_message_chars to LLM

* Refactor LLM config

* Fix tests

* Made some functions class functions

* Fix regression

* Fixed comments
2024-07-17 09:16:04 -04:00
Anush Kumar V
8f76587e5c
docs: updated docstrings using ruff's autofix feature (#2923)
* Updated documentation using ruff's autofix feature

* Updated pyproject.toml to include docstring validations

* Updated documentation using ruff's autofix feature

* Updated pyproject.toml to include docstring validations

* Updated docstrings using ruff's autfix feature

* Deleted opendevin/runtime/utils/soource.py, Keeping in sync with main

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-07-16 01:35:33 +00:00
Raj Maheshwari
64be2cb466
[Fix] Minor bug in parse_response of CodeActResponseParser (#2912) 2024-07-13 14:36:27 +00:00
Xingyao Wang
e45ddeb2a2
arch: deprecating recall action and search_memory (#2900)
* deprecating recall action

* fix integration tests

* fix integration tests

* remove search memory
2024-07-12 19:23:21 +00:00
Xingyao Wang
1b54800a29
[Agent] Improve edits by adding back edit_file_by_line (#2722)
* add replace-based block edit & preliminary test case fix

* further fix the insert behavior

* make edit only work on first occurence

* bump codeact version since we now use new edit agentskills

* update prompt for new agentskills

* update integration tests

* make run_infer.sh executable

* remove code block for edit_file

* update integration test for prompt changes

* default to not use hint for eval

* fix insert emptyfile bug

* throw value error when `to_replace` is empty

* make `_edit_or_insert_file` return string so we can try to fix some linter errors (best attempt)

* add todo

* update integration test

* fix sandbox test for this PR

* fix inserting with additional newline

* rename to edit_file_by_replace

* add back `edit_file_by_line`

* update prompt for new editing tool

* fix integration tests

* bump codeact version since there are more changes

* add back append file

* fix current line for append

* fix append unit tests

* change the location where we show edited line no to agent and fix tests

* update integration tests

* fix global window size affect by open_file bug

* fix global window size affect by open_file bug

* increase window size to 300

* add file beginning and ending marker to avoid looping

* expand the editor window to better display edit error for model

* refractor to breakdown edit to internal functions

* reduce window to 200

* move window to 100

* refractor to cleanup some logic into _calculate_window_bounds

* fix integration tests

* fix sandbox test on new prompt

* update demonstration with new changes

* fix integration

* initialize llm inside process_instance to circumvent "AttributeError: Can't pickle local object"

* update kwargs

* retry for internal server error

* fix max iteration

* override max iter from config

* fix integration tests

* remove edit file by line

* fix integration tests

* add instruction to avoid hanging

* Revert "add instruction to avoid hanging"

This reverts commit 06fd2c59387c1c2348bc95cb487af1eb913c6ddd.

* handle content policy violation error

* fix integration tests

* fix typo in prompt - the window is 100

* update all integration tests

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2024-07-11 15:30:20 +00:00
Boxuan Li
c68478f470
Customize LLM config per agent (#2756)
Currently, OpenDevin uses a global singleton LLM config and a global singleton agent config. This PR allows customers to configure an LLM config for each agent. A hypothetically useful scenario is to use a cheaper LLM for repo exploration / code search, and a more powerful LLM to actually do the problem solving (CodeActAgent).

Partially solves #2075 (web GUI improvement is not the goal of this PR)
2024-07-09 22:05:54 -07:00
Engel Nyst
d37b2973b2
Refactoring: event stream based agent history (#2709)
* add to event stream sync

* remove async from tests

* small logging spam fix

* remove swe agent

* arch refactoring: use history from the event stream

* refactor agents

* monologue agent

* ruff

* planner agent

* micro-agents

* refactor history in evaluations

* evals history refactoring

* adapt evals and tests

* unit testing stuck

* testing micro agents, event stream

* fix planner agent

* fix tests

* fix stuck after rename

* fix test

* small clean up

* fix merge

* fix merge issue

* fix integration tests

* Update agenthub/dummy_agent/agent.py

* fix tests

* rename more clearly; add todo; clean up
2024-07-07 21:04:23 +00:00
Xingyao Wang
a47713ecb0
[Arch] Remove supports for Background Commands (#2803)
* depracting docker exec box

* remove doc exec from workflow and docs

* remove background commands

* Update tests/unit/test_sandbox.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* replace for-loop with assignment

* fix integration tests

* fix integration tests for shell script

* fix integration tests

* increase max iter to fix some monologue agent issue

* fix integration test again

* fix integration tests (seems related to run_user issue)

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-07-06 03:38:05 +08:00
sven
1b10e2b9d5
Make CodeAct finish task (#2673)
* Added feature to CodeAct agent to finish action instead of waiting for user input.

* Minor change

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>

* updated integration tests with claude-sonnet-3.5

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* updated tests to remove typo in prompt

* resolve merge conflicts II

* revert unintended change of regenerate script

* re-regenerating prompts to resolve merge conflicts

---------

Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-07-04 11:21:46 -07:00
Leo
c2f557edde
refactor: multiple code improvements (#2771) 2024-07-04 18:51:22 +08:00
Xingyao Wang
41ddba84bd
[Agent] (Potentially) improve Editing using diff (#2685)
* add replace-based block edit & preliminary test case fix

* further fix the insert behavior

* make edit only work on first occurence

* bump codeact version since we now use new edit agentskills

* update prompt for new agentskills

* update integration tests

* make run_infer.sh executable

* remove code block for edit_file

* update integration test for prompt changes

* default to not use hint for eval

* fix insert emptyfile bug

* throw value error when `to_replace` is empty

* make `_edit_or_insert_file` return string so we can try to fix some linter errors (best attempt)

* add todo

* update integration test

* fix sandbox test for this PR
2024-07-02 11:50:15 +09:00
Engel Nyst
e24c52d060
Small refactoring of obs truncation (#2701)
* refactor truncate_content a bit to be usable by all agents

* adjust doc
2024-06-30 12:12:08 +02:00
Boxuan Li
e45b311c35
Remove MAX_CHARS traffic control (#2694)
* Remove MAX_CHARS limiting

* More cleanup
2024-06-29 12:59:41 -07:00
Boxuan Li
7766a3283e
CodeActAgent: Fix delegate history (#2672) 2024-06-28 16:37:23 +09:00
Engel Nyst
80fe13f4be
rename our completion as a drop-in replacement of litellm completion (#2509) 2024-06-19 05:25:25 +02:00
tobitege
823298e0d0
fix: Agentskills enhancements (#2384)
* avoid repeat logging of unneeded messages

* refactored append/edit_file (tests next)

* agentskills and unit test fixes

* testing

* more changes and test prompts

* smaller changes

* final test fixes

* remove dead code from test_agent.py

* reverting unneeded changes

* updated tests, more tweaks to skills

* refactor (#2442)

* chores: fix DelegatorAgent description (#2446)

* change

* change comments

* fix

* stopped container to prevent port issues. (#2447)

* chore: remove useless browsing code in CodeActSWEAgent (#2438)

* remove useless

* fix integration test

* Regenerate test_ipython_module artifacts for CodeActSWEAgent

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Merge remote-tracking branch 'upstream/main' into agent-fileops

* unneeded tweak

* * fix edit_file to not introduce extra newline
* updated docstrings with more details for LLM
* fix legacy typo in prompts causing ]] instead of ]
* several mock files regenerated

* Regen'ed CodeActSWEAgent integration tests

* fix _print_window signature; explicit exception type in _is_valid_path

* splitlines with named param

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-06-16 15:06:46 -04:00
tobitege
9605106e72
feat: append_file incl. all tests [agentskills] (#2346)
* new skill: append_file incl. all tests

* more tests needed caring

* file_name for append_file/edit_file; updated tests
2024-06-10 17:18:40 +00:00
Yufan Song
f7491bd2fa
Refactor response to action in agent step (#2350)
* refactor action parser

* Fix typos

* fix typo

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-06-10 10:17:30 +00:00
Temo
e925cefeef
Refactored prompt.py to reduce token usage (#1996)
* Refactored prompt.py to reduce token usage

* Reverted some destructive changes

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* Apply suggestions from code review

* Apply suggestions from code review

* Update agenthub/codeact_agent/prompt.py

* fix integration test

* make lint

* feat: support ToolQA benchmark (#2263)

* Add files via upload

* Update README.md

* Update run_infer.py

* Update utils.py

* make lint

* Update evaluation/toolqa/run_infer.py

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* feat: revert hiden special paths change in file action (#2328)

* revert change in file action

* remove useless code

* make lint

* Support gpqa benchmark evaluation (#2080)

* feat: add gpqa benchmark evaluation

* add metrics

* reset configs in final block

* make lint

---------

Co-authored-by: yufansong <yufan@risingwave-labs.com>

* fix(frontend): prevent API key from resetting after modal change (#2329)

* remove bottom chatbox fade

* Modal wider; fix lint error

* settings: attempt to not clear api key for same provider

* prevent api key from resetting after changing the model

* revert other changes and fix post test tear down error

---------

Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>

* fix: codeact bug [If running a command that never returns, it gets stuck #1895] (#2034)

* fix: codeact bug https://github.com/OpenDevin/OpenDevin/issues/1895

* fix: add CmdRunAction timeout hint.

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* regenerate integration test

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>

* Feat: Support Gorilla APIBench  (#2081)

* removed unused files from gorilla

* Update run_infer.py, removed unused imports

* Update utils.py

* Update ast_eval_hf.py

* Update ast_eval_tf.py

* Update ast_eval_th.py

* Create README.md

* Update run_infer.py

* make lint

* Update run_infer.py

* fix lint

---------

Co-authored-by: yufansong <yufan@risingwave-labs.com>

* remote useless (#2332)

* fix integration test

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* fix integration test

---------

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Frank Xu <frankxu2004@gmail.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: yueqis <141804823+yueqis@users.noreply.github.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Jaskirat Singh <1.jaskiratsingh@gmail.com>
Co-authored-by: tobitege <tobitege@gmx.de>
Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>
Co-authored-by: Aaron Xia <zhhuaxia@gmail.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-06-09 10:19:05 -07:00
Aaron Xia
b5a17efc45
fix: codeact bug [If running a command that never returns, it gets stuck #1895] (#2034)
* fix: codeact bug https://github.com/OpenDevin/OpenDevin/issues/1895

* fix: add CmdRunAction timeout hint.

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* regenerate integration test

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
2024-06-08 16:40:23 +00:00
Boxuan Li
45ce09d70e
CodeActAgent: Delegate to BrowsingAgent for browsing tasks (#2103) 2024-06-07 00:53:47 -07:00
Aaron Xia
42c6b506b5
Lazy launching BrowseEnv / making BrowseEnv optional (#2155)
* feat: lazy launching browser; browser optional for diffrent agents.

* style: lint

* fix: integration test fail due to browser not started.

* fix: run by cli and integration test failed.

* fix: lint

* fix: lint

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-31 16:40:42 -04:00
Xingyao Wang
01ef90205d
Add CodeActSWEAgent to remove browsing & github + improvements on agentskills (#2105)
* update swe_bench prompt;
use minimal prompt for codeact;

* upgrade agentskills and update testcases

* update infer prompt

* fix cwd

* add icl for swebench

* also log in_context_example to run infer

* remove extra print

* change prompt to abs path

* update error message to include current file info

* change cwd for jupyter if needed

* update edit error message

* update prompt

* improve git get patch

* update hint string

* default to 50 turns

* revert changes from codeact agent and create new CodeActSWEAgent

* revert changes to codeact

* revert instructions for run infer

* revert instructions for run infer

* update README

* update max iter

* add codeact swe agent

* fix issue for CodeActSWEAgent

* allow specifying max iter in cmdline script

* stop printing

* Update agenthub/codeact_swe_agent/README.md

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>

* Fix prompt regression in jupyter plugin

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-29 21:19:00 -07:00
மனோஜ்குமார் பழனிச்சாமி
d4ccd48af8
Persistent docker session (#1998)
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-29 13:22:34 +00:00
Xingyao Wang
602ffcdffb
Implement agentskills for OpenDevin to helpfully improve edit AND including more useful tools/skills (#1941)
* add draft for skills

* Implement and test agentskills functions: open_file, goto_line, scroll_down, scroll_up, create_file, search_dir, search_file, find_file

* Remove new_sample.txt file

* add some work from opendevin w/ fixes

* Add unit tests for agentskills module

* fix some issues and updated tests

* add more tests for open

* tweak and handle goto_line

* add tests for some edge cases

* add tests for scrolling

* add tests for edit

* add tests for search_dir

* update tests to use pytest

* use pytest --forked to avoid file op unit tests to interfere with each other via global var

* update doc based on swe agent tool

* update and add tests for find_file and search_file

* move agent_skills to plugins

* add agentskills as plugin and docs

* add agentskill to ssh box and fix sandbox integration

* remove extra returns in doc

* add agentskills to initial tool for jupyter

* support re-init jupyter kernel (for agentskills) after restart

* fix print window's issue with indentation and add testcases

* add prompt for codeact with the newest edit primitives

* modify the way line number is presented (remove leading space)

* change prompt to the newest display format

* support tracking of costs via metrics

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update opendevin/runtime/plugins/agent_skills/README.md

* implement and add tests for py linting

* remove extra text arg for incompatible subprocess ver

* remove sample.txt

* update test_edits integration tests

* fix all integration

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update opendevin/runtime/plugins/agent_skills/README.md

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update opendevin/runtime/plugins/agent_skills/agentskills.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* correctly setup plugins for swebench eval

* bump swe-bench version and add logging

* correctly setup plugins for swebench eval

* bump swe-bench version and add logging

* Revert "correctly setup plugins for swebench eval"

This reverts commit 2bd10556739e2af602ea85371b976390f7c48077.

* bump version

* remove _AGENT_SKILLS_DOCS

* move flake8 to test dep

* update poetry.lock

* remove extra arg

* reduce max iter for eval

* update poetry

* fix integration tests

---------

Co-authored-by: OpenDevin <opendevin@opendevin.ai>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-23 16:04:09 +00:00
Yufan Song
d18e6c85a0
feat: add metrics related to cost for better observability (#1944)
* add metrics for total_cost

* make lint

* refact codeact

* change metrics into llm

* add costs list, add into state

* refactor log completion

* refactor and test others

* make lint

* Update opendevin/core/metrics.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* Update opendevin/llm/llm.py

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>

* refactor

* add code

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-05-22 08:53:31 +00:00
Robert Brennan
0ecba83e53
Move message history out of CodeAct (#1847)
* stop keeping history state in codeact

* regenerate tests

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* revert tests

* regen tests

* refactor codeact a bit

* regenerate without using LLM

* simplify logic

* change to heredoc

* fix heredoc

* fix end_of_edit docs

* regen tests

* regenerate

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-18 18:39:27 +00:00
Gant
f950e3b48e
make CodeAct paper link correct (#1870) 2024-05-18 03:54:10 +00:00
மனோஜ்குமார் பழனிச்சாமி
b0b44ed467
Auto restarted Jupyter kernel (#1808)
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-18 08:40:31 +05:30
Xingyao Wang
2406b901df
feat(SWE-Bench environment) integrate SWE-Bench sandbox (#1468)
* add draft dockerfile for build all

* add rsync for build

* add all-in-one docker

* update prepare scripts

* Update swe_env_box.py

* Add swe_entry.sh (buggy now)

* Parse the test command in swe_entry.sh

* Update README for instance eval in sandbox

* revert specialized config

* replace run_as_devin as an init arg

* set container & run_as_root via args

* update swe entry script

* update env

* remove mounting

* allow error after swe_entry

* update swe_env_box

* move file

* update gitignore

* get swe_env_box a working demo

* support faking user response & provide sandox ahead of time;
also return state for controller

* tweak main to support adding controller kwargs

* add module

* initialize plugin for provided sandbox

* add pip cache to plugin & fix jupyter kernel waiting

* better print Observation output

* add run infer scripts

* update readme

* add utility for getting diff patch

* use get_diff_patch in infer

* update readme

* support cost tracking for codeact

* add swe agent edit hack

* disable color in git diff

* fix git diff cmd

* fix state return

* support limit eval

* increase t imeout and export pip cache

* add eval limit config

* return state when hit turn limit

* save log to file; allow agent to give up

* run eval with max 50 turns

* add outputs to gitignore

* save swe_instance & instruction

* add uuid to swebench

* add streamlit dep

* fix save series

* fix the issue where session id might be duplicated

* allow setting temperature for llm (use 0 for eval)

* Get report from agent running log

* support evaluating task success right after inference.

* remove extra log

* comment out prompt for baseline

* add visualizer for eval

* use plaintext for instruction

* reduce timeout for all; only increase timeout for init

* reduce timeout for all; only increase timeout for init

* ignore sid for swe env

* close sandbox in each eval loop

* update visualizer instruction

* increase max chars

* add finish action to history too

* show test result in metrics

* add sidebars for visualizer

* also visualize swe_instance

* cleanup browser when agent controller finish runinng

* do not mount workspace for swe-eval to avoid accidentally overwrite files

* Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files"

This reverts commit 8ef77390543e562e6f0a5a9992418014d8b3010c.

* Revert "Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files""

This reverts commit 016cfbb9f0475f32bacbad5822996b4eaff24a5e.

* run jupyter command via copy to, instead of cp to mount

* only print mixin output when failed

* change ssh box logging

* add visualizer for pass rate

* add instance id to sandbox name

* only remove container we created

* use opendevin logger in main

* support multi-processing infer

* add back metadata, support keyboard interrupt

* remove container with startswith

* make pbar behave correctly

* update instruction w/ multi-processing

* show resolved rate by repo

* rename tmp dir name

* attempt to fix racing for copy to ssh_box

* fix script

* bump swe-bench-all version

* fix ipython with self-contained commands

* add jupyter demo to swe_env_box

* make resolved count two column

* increase height

* do not add glob to url params

* analyze obs length

* print instance id prior to removal handler

* add gold patch in visualizer

* fix interactive git by adding a git --no-pager as alias

* increase max_char to 10k to cover 98% of swe-bench obs cases

* allow parsing note

* prompt v2

* add iteration reminder

* adjust user response

* adjust order

* fix return eval

* fix typo

* add reminder before logging

* remove other resolve rate

* re adjust to new folder structure

* support adding eval note

* fix eval note path

* make sure first log of each instance is printed

* add eval note

* fix the display for visualizer

* tweak visualizer for better git patch reading

* exclude empty patch

* add retry mechanism for swe_env_box start

* fix ssh timeout issue

* add stat field for apply test patch success

* add visualization for fine-grained report

* attempt to support monologue agent by constraining it to single thread

* also log error msg when stopeed

* save error as well

* override WORKSPACE_MOUNT_PATH and WORKSPACE_BASE for monologue to work in mp

* add retry mechanism for sshbox

* remove retry for swe env box

* try to handle loop state stopped

* Add get report scripts

* Add script to convert agent output to swe-bench format

* Merge fine grained report for visualizer

* Update eval readme

* Update README.md

* Add CodeAct gpt4-1106 output and eval logs on swe-bench-lite

* Update the script to get model report

* Update get_model_report.sh

* Update get_agent_report.sh

* Update report merge script

* Add agent output conversion script

* Update swe_lite_env_setup.sh

* Add example swe-bench output files

* Update eval readme

* Remove redundant scripts

* set iteration count down to false by default

* fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm (#1666)

* fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm

* Review Feedback

* Missing None Check

* Review feedback and improved error handling

---------

Co-authored-by: Robert Brennan <accounts@rbren.io>

* fix prepare_swe_util scripts

* update builder images

* update setup script

* remove swe-bench build workflow

* update lock

* remove experiments since they are moved to hf

* remove visualizer (since it is moved to hf repo)

* simply jupyter execution via heredoc

* update ssh_box

* add initial docker readme

* add pkg-config as dependency

* add script for swe_bench all-in-one docker

* add rsync to builder

* rename var

* update commit

* update readme

* update lock

* support specify timeout for long running tasks

* fix path

* separate building of all deps and files

* support returning states at the end of controller

* remove return None

* support specify timeout for long running tasks

* add timeout for all existing sandbox impl

* fix swe_env_box for new codebase

* update llm config in config.py

* support pass sandbox in

* remove force set

* update eval script

* fix issue of overriding final state

* change default eval output to hf demo

* change default eval output to hf demo

* fix config

* only close it when it is NOT external sandbox

* add scripts

* tweak config

* only put in hostory when state has history attr

* fix agent controller on the case of run out interaction budget

* always assume state is always not none

* remove print of final state

* catch all exception when cannot compute completion cost

* Update README.md

* save source into json

* fix path

* update docker path

* return the final state on close

* merge AgentState with State

* fix integration test

* merge AgentState with State

* fix integration test

* add ChangeAgentStateAction to history in attempt to fix integration

* add back set agent state

* update tests

* update tests

* move scripts for setup

* update script and readme for infer

* do not reset logger when n processes == 1

* update eval_infer scripts and readme

* simplify readme

* copy over dir after eval

* copy over dir after eval

* directly return get state

* update lock

* fix output saving of infer

* replace print with logger

* update eval_infer script

* add back the missing .close

* increase timeout

* copy all swe_bench_format file

* attempt to fix output parsing

* log git commit id as metadata

* fix eval script

* update lock

* update unit tests

* fix argparser unit test

* fix lock

* the deps are now lightweight enough to be incude in make build

* add spaces for tests

* add eval outputs to gitignore

* remove git submodule

* readme

* tweak git email

* update upload instruction

* bump codeact version for eval

---------

Co-authored-by: Bowen Li <libowen.ne@gmail.com>
Co-authored-by: huybery <huybery@gmail.com>
Co-authored-by: Bart Shappee <bshappee@gmail.com>
Co-authored-by: Robert Brennan <accounts@rbren.io>
2024-05-15 16:15:55 +00:00
Frank Xu
a84d19f03c
Enable CodeAct agents with browsing, and also enable arbitrary BrowserGym action support (#1807)
* enable browsing in codeact, and  arbitrary browsergym DSL support

* fix

* fix unit test case

* update frontend for the new interactive browsing action

* bump ver

* Fix integration tests

---------

Co-authored-by: OpenDevinBot <bot@opendevin.com>
2024-05-15 11:59:58 -04:00
Boxuan Li
6714000b2c
CodeActAgent: Fix iteration reminder (#1803)
This PR includes three changes:
1) Iteration reminder should start with MAX_ITERATIONS from config rather than default value 100
2) In the first prompt, we should tell the LLM it has `MAX_ITERATIONS - 1` turns left, rather than `MAX_ITERATIONS - 2`
3) Remove legacy ITERATION_REMINDER config
2024-05-15 13:48:47 +08:00
Graham Neubig
3cef8ee187
Add GitHub prompt to CodeAct (#1792)
* Added github to CodeAct

* More codeact

* Simplify prompt

* Modify codeact prompt

* fix integration test for CodeAct

* yet another integration test fix for codeact

* fix plugin use in jupyter

* update edit tests

* fix jupyter plugin potential port conflict

* fix test ipython with latest ipython fix

* update integration test

* wait a bit for jupyter execution

* add one unit tests for sandbox

* fix integration test

---------

Co-authored-by: OpenDevinBot <bot@opendevin.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-05-14 21:25:21 +00:00
Robert Brennan
beb74a19f6
Use event stream for the runtime (#1776)
* rebuild PR from scratch

* fix max_iter

* regenerate tests

* cut down on history

* Update opendevin/controller/agent_controller.py

* regenerate tests

* revert swe agent

* revert some codeact chagnes

* regenerate tests

* add source to dict

* only add source if not none

* try to fix coverage issue

* lock

* add gevent
2024-05-14 13:35:25 +00:00
Robert Brennan
82a798990c
refactor remind_iterations (#1760)
* refactor remind_iterations

* regenerate tests

* concatenate iteration message

* fix merge issues

* update integration tests
2024-05-14 08:27:12 -04:00
Robert Brennan
b028bd46bb
Use messages to drive tasks (#1688)
* finish is working

* start reworking main_goal

* remove main_goal from microagents

* remove main_goal from other agents

* fix issues

* revert codeact line

* make plan a subclass of task

* fix frontend for new plan setup

* lint

* fix type

* more lint

* fix build issues

* fix codeact mgs

* fix edge case in regen script

* fix task validation errors

* regenerate integration tests

* fix up tests

* fix sweagent

* revert codeact prompt

* update integration tests

* update integration tests

* handle loading state

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* Update opendevin/controller/agent_controller.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* Update opendevin/controller/state/plan.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* update docs

* regenerate tests

* remove none from state type

* revert test files

* update integration tests

* rename plan to root_task

* revert plugin perms

* regen integration tests

* tweak integration script

* prettier

* fix test

* set workspace up for regeneration

* regenerate tests

* Change directory of copy

* Updated tests

* Disable PlannerAgent test

* Fix listen

* Updated prompts

* Disable planner again

* Make codecov more lenient

* Update agenthub/README.md

* Update opendevin/server/README.md

* re-enable planner tests

* finish top level tasks

* regen planner

* fix root task factory

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-13 23:14:15 +00:00
Boxuan Li
316a772849
CodeAct: Emphasize open before edit (#1709)
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
2024-05-11 12:20:14 -07:00
Boxuan Li
bde12f4a09
CodeActAgent: Fix hack for multiple edits in same command (#1684)
* Fix edit hack for multiple edits in same command

This PR changes ([\s\S]*) to ([\s\S]*?) to make the capturing
group non-greedy. This change ensures that the regex captures
the smallest set of characters that extends up to the first
end_of_edit it encounters, rather than extending across multiple
edit commands.

Without the fix, a bash command consisting of multiple edits
would be corrupt and lead to unexpected edit results.
2024-05-10 23:32:09 -07:00
Bart Shappee
78cd2e5b47
fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm (#1666)
* fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm

* Review Feedback

* Missing None Check

* Review feedback and improved error handling

---------

Co-authored-by: Robert Brennan <accounts@rbren.io>
2024-05-10 13:57:37 -04:00