OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-26 05:48:36 +08:00

Author	SHA1	Message	Date
Robert Brennan	01ae22ef57	Rename OpenDevin to OpenHands (#3472) * Replace OpenDevin with OpenHands * Update CONTRIBUTING.md * Update README.md * Update README.md * update poetry lock; move opendevin folder to openhands * fix env var * revert image references in docs * revert permissions * revert permissions --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-08-20 00:44:54 +08:00
tobitege	d1b9787751	remove obsolete prompt.py file (codeact_agent) (#3450)	2024-08-19 09:18:36 +08:00
Xingyao Wang	537fb7d985	feat: convert agent prompts into structured Jinja2 templates (#3360) * commit jinja draft * remove extra file * update system prompt * remove github message * update prompts * add prompt manager and its tests * use prompt manager for codeact and bump version * fix integration tests * fix lint * simplify test case * update system * fix integration tests * update credit path for aider * Update CREDITS.md Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-08-18 16:38:46 +00:00
Xingyao Wang	a2ea17909d	chore: remove deprecated RuntimeTool (#3443)	2024-08-18 09:45:45 +08:00
Engel Nyst	92b1a2da5c	Refactor agent to accept agent config (#3430) * refactor agents to receive their agent config * add unit test * fix test * fix tests	2024-08-17 18:11:30 +02:00
Graham Neubig	7d331acffa	Handle error observations in codeact (#3383) * Handle error observations in codeact * Remove comments	2024-08-14 13:47:31 +00:00
Kaushik Deka	415843476c	Feat: Add Vision Input Support for LLM with Vision Capabilities (#2848) * add image feature * fix-linting * check model support for images * add comment * Add image support to other models * Add images to chat * fix linting * fix test issues * refactor variable names and import * fix tests * fix chat message tests * fix linting * add pydantic class message * use message * remove redundant comments * remove redundant comments * change Message class * remove unintended change * fix integration tests using regenerate.sh * rename image_bas64 to images_url, fix tests * rename Message.py to message, change reminder append logic, add unit tests * remove comment, fix error to merge * codeact_swe_agent * fix f string * update eventstream integration tests * add missing if check in codeact_swe_agent * update integration tests * Update frontend/src/components/chat/ChatInput.tsx * Update frontend/src/components/chat/ChatInput.tsx * Update frontend/src/components/chat/ChatInput.tsx * Update frontend/src/components/chat/ChatInput.tsx * Update frontend/src/components/chat/ChatMessage.tsx --------- Co-authored-by: tobitege <tobitege@gmx.de> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2024-08-04 02:26:22 +08:00
Xingyao Wang	2e60d25eae	[Agent, LLM] Make sure codeact agent produce message in u/a/u/a order (#3193) * make sure codeact agent produce message in u/a/u/a order * integration tests * sync message changes to codeact swe * fix integration tests --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-08-02 00:17:53 +08:00
Engel Nyst	3328669b89	fix Finish action to sent its 'thoughts' in the prompt (#3149)	2024-07-27 17:37:44 +00:00
linshaoxin-maker	800e25eac1	Modify codeAct paper link (#3076)	2024-07-23 20:25:54 +00:00
Xingyao Wang	6b16a5da0b	[Eval,Arch] Update GPTQ eval and add `headless_mode` for Controller (#2994) * update and polish gptq eval * fix typo * Update evaluation/gpqa/README.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/gpqa/run_infer.py Co-authored-by: Graham Neubig <neubig@gmail.com> * add headless mode to all appropriate agent controller call * delegate set to error when in headless mode * try to deduplicate a bit * make headless_mode default to True and only change it to false for AgentSession --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-07-20 03:35:48 +00:00
Graham Neubig	c897791024	Refactor LLM config (#2953) * Add max_message_chars to LLM * Refactor LLM config * Fix tests * Made some functions class functions * Fix regression * Fixed comments	2024-07-17 09:16:04 -04:00
Anush Kumar V	8f76587e5c	docs: updated docstrings using ruff's autofix feature (#2923) * Updated documentation using ruff's autofix feature * Updated pyproject.toml to include docstring validations * Updated documentation using ruff's autofix feature * Updated pyproject.toml to include docstring validations * Updated docstrings using ruff's autfix feature * Deleted opendevin/runtime/utils/soource.py, Keeping in sync with main --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-07-16 01:35:33 +00:00
Raj Maheshwari	64be2cb466	[Fix] Minor bug in parse_response of CodeActResponseParser (#2912)	2024-07-13 14:36:27 +00:00
Xingyao Wang	e45ddeb2a2	arch: deprecating recall action and `search_memory` (#2900) * deprecating recall action * fix integration tests * fix integration tests * remove search memory	2024-07-12 19:23:21 +00:00
Xingyao Wang	1b54800a29	[Agent] Improve edits by adding back `edit_file_by_line` (#2722) * add replace-based block edit & preliminary test case fix * further fix the insert behavior * make edit only work on first occurence * bump codeact version since we now use new edit agentskills * update prompt for new agentskills * update integration tests * make run_infer.sh executable * remove code block for edit_file * update integration test for prompt changes * default to not use hint for eval * fix insert emptyfile bug * throw value error when `to_replace` is empty * make `_edit_or_insert_file` return string so we can try to fix some linter errors (best attempt) * add todo * update integration test * fix sandbox test for this PR * fix inserting with additional newline * rename to edit_file_by_replace * add back `edit_file_by_line` * update prompt for new editing tool * fix integration tests * bump codeact version since there are more changes * add back append file * fix current line for append * fix append unit tests * change the location where we show edited line no to agent and fix tests * update integration tests * fix global window size affect by open_file bug * fix global window size affect by open_file bug * increase window size to 300 * add file beginning and ending marker to avoid looping * expand the editor window to better display edit error for model * refractor to breakdown edit to internal functions * reduce window to 200 * move window to 100 * refractor to cleanup some logic into _calculate_window_bounds * fix integration tests * fix sandbox test on new prompt * update demonstration with new changes * fix integration * initialize llm inside process_instance to circumvent "AttributeError: Can't pickle local object" * update kwargs * retry for internal server error * fix max iteration * override max iter from config * fix integration tests * remove edit file by line * fix integration tests * add instruction to avoid hanging * Revert "add instruction to avoid hanging" This reverts commit 06fd2c59387c1c2348bc95cb487af1eb913c6ddd. * handle content policy violation error * fix integration tests * fix typo in prompt - the window is 100 * update all integration tests --------- Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-07-11 15:30:20 +00:00
Boxuan Li	c68478f470	Customize LLM config per agent (#2756) Currently, OpenDevin uses a global singleton LLM config and a global singleton agent config. This PR allows customers to configure an LLM config for each agent. A hypothetically useful scenario is to use a cheaper LLM for repo exploration / code search, and a more powerful LLM to actually do the problem solving (CodeActAgent). Partially solves #2075 (web GUI improvement is not the goal of this PR)	2024-07-09 22:05:54 -07:00
Engel Nyst	d37b2973b2	Refactoring: event stream based agent history (#2709) * add to event stream sync * remove async from tests * small logging spam fix * remove swe agent * arch refactoring: use history from the event stream * refactor agents * monologue agent * ruff * planner agent * micro-agents * refactor history in evaluations * evals history refactoring * adapt evals and tests * unit testing stuck * testing micro agents, event stream * fix planner agent * fix tests * fix stuck after rename * fix test * small clean up * fix merge * fix merge issue * fix integration tests * Update agenthub/dummy_agent/agent.py * fix tests * rename more clearly; add todo; clean up	2024-07-07 21:04:23 +00:00
Xingyao Wang	a47713ecb0	[Arch] Remove supports for Background Commands (#2803) * depracting docker exec box * remove doc exec from workflow and docs * remove background commands * Update tests/unit/test_sandbox.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * replace for-loop with assignment * fix integration tests * fix integration tests for shell script * fix integration tests * increase max iter to fix some monologue agent issue * fix integration test again * fix integration tests (seems related to run_user issue) --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-07-06 03:38:05 +08:00
sven	1b10e2b9d5	Make CodeAct finish task (#2673) * Added feature to CodeAct agent to finish action instead of waiting for user input. * Minor change * Update agenthub/codeact_agent/codeact_agent.py Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com> * updated integration tests with claude-sonnet-3.5 * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * updated tests to remove typo in prompt * resolve merge conflicts II * revert unintended change of regenerate script * re-regenerating prompts to resolve merge conflicts --------- Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-07-04 11:21:46 -07:00
Leo	c2f557edde	refactor: multiple code improvements (#2771)	2024-07-04 18:51:22 +08:00
Xingyao Wang	41ddba84bd	[Agent] (Potentially) improve Editing using `diff` (#2685) * add replace-based block edit & preliminary test case fix * further fix the insert behavior * make edit only work on first occurence * bump codeact version since we now use new edit agentskills * update prompt for new agentskills * update integration tests * make run_infer.sh executable * remove code block for edit_file * update integration test for prompt changes * default to not use hint for eval * fix insert emptyfile bug * throw value error when `to_replace` is empty * make `_edit_or_insert_file` return string so we can try to fix some linter errors (best attempt) * add todo * update integration test * fix sandbox test for this PR	2024-07-02 11:50:15 +09:00
Engel Nyst	e24c52d060	Small refactoring of obs truncation (#2701) * refactor truncate_content a bit to be usable by all agents * adjust doc	2024-06-30 12:12:08 +02:00
Boxuan Li	e45b311c35	Remove MAX_CHARS traffic control (#2694) * Remove MAX_CHARS limiting * More cleanup	2024-06-29 12:59:41 -07:00
Boxuan Li	7766a3283e	CodeActAgent: Fix delegate history (#2672)	2024-06-28 16:37:23 +09:00
Engel Nyst	80fe13f4be	rename our completion as a drop-in replacement of litellm completion (#2509)	2024-06-19 05:25:25 +02:00
tobitege	823298e0d0	fix: Agentskills enhancements (#2384) * avoid repeat logging of unneeded messages * refactored append/edit_file (tests next) * agentskills and unit test fixes * testing * more changes and test prompts * smaller changes * final test fixes * remove dead code from test_agent.py * reverting unneeded changes * updated tests, more tweaks to skills * refactor (#2442) * chores: fix DelegatorAgent description (#2446) * change * change comments * fix * stopped container to prevent port issues. (#2447) * chore: remove useless browsing code in CodeActSWEAgent (#2438) * remove useless * fix integration test * Regenerate test_ipython_module artifacts for CodeActSWEAgent --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Merge remote-tracking branch 'upstream/main' into agent-fileops * unneeded tweak * * fix edit_file to not introduce extra newline * updated docstrings with more details for LLM * fix legacy typo in prompts causing ]] instead of ] * several mock files regenerated * Regen'ed CodeActSWEAgent integration tests * fix _print_window signature; explicit exception type in _is_valid_path * splitlines with named param --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-06-16 15:06:46 -04:00
tobitege	9605106e72	feat: append_file incl. all tests [agentskills] (#2346) * new skill: append_file incl. all tests * more tests needed caring * file_name for append_file/edit_file; updated tests	2024-06-10 17:18:40 +00:00
Yufan Song	f7491bd2fa	Refactor response to action in agent step (#2350) * refactor action parser * Fix typos * fix typo --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-06-10 10:17:30 +00:00
Temo	e925cefeef	Refactored prompt.py to reduce token usage (#1996) * Refactored prompt.py to reduce token usage * Reverted some destructive changes * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * Apply suggestions from code review * Apply suggestions from code review * Update agenthub/codeact_agent/prompt.py * fix integration test * make lint * feat: support ToolQA benchmark (#2263) * Add files via upload * Update README.md * Update run_infer.py * Update utils.py * make lint * Update evaluation/toolqa/run_infer.py --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: yufansong <yufan@risingwave-labs.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * feat: revert hiden special paths change in file action (#2328) * revert change in file action * remove useless code * make lint * Support gpqa benchmark evaluation (#2080) * feat: add gpqa benchmark evaluation * add metrics * reset configs in final block * make lint --------- Co-authored-by: yufansong <yufan@risingwave-labs.com> * fix(frontend): prevent API key from resetting after modal change (#2329) * remove bottom chatbox fade * Modal wider; fix lint error * settings: attempt to not clear api key for same provider * prevent api key from resetting after changing the model * revert other changes and fix post test tear down error --------- Co-authored-by: amanape <83104063+amanape@users.noreply.github.com> * fix: codeact bug [If running a command that never returns, it gets stuck #1895] (#2034) * fix: codeact bug https://github.com/OpenDevin/OpenDevin/issues/1895 * fix: add CmdRunAction timeout hint. * Update agenthub/codeact_agent/prompt.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * regenerate integration test --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: yufansong <yufan@risingwave-labs.com> * Feat: Support Gorilla APIBench (#2081) * removed unused files from gorilla * Update run_infer.py, removed unused imports * Update utils.py * Update ast_eval_hf.py * Update ast_eval_tf.py * Update ast_eval_th.py * Create README.md * Update run_infer.py * make lint * Update run_infer.py * fix lint --------- Co-authored-by: yufansong <yufan@risingwave-labs.com> * remote useless (#2332) * fix integration test * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * fix integration test --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Frank Xu <frankxu2004@gmail.com> Co-authored-by: yufansong <yufan@risingwave-labs.com> Co-authored-by: yueqis <141804823+yueqis@users.noreply.github.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: Jaskirat Singh <1.jaskiratsingh@gmail.com> Co-authored-by: tobitege <tobitege@gmx.de> Co-authored-by: amanape <83104063+amanape@users.noreply.github.com> Co-authored-by: Aaron Xia <zhhuaxia@gmail.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-06-09 10:19:05 -07:00
Aaron Xia	b5a17efc45	fix: codeact bug [If running a command that never returns, it gets stuck #1895] (#2034) * fix: codeact bug https://github.com/OpenDevin/OpenDevin/issues/1895 * fix: add CmdRunAction timeout hint. * Update agenthub/codeact_agent/prompt.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * regenerate integration test --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: yufansong <yufan@risingwave-labs.com>	2024-06-08 16:40:23 +00:00
Boxuan Li	45ce09d70e	CodeActAgent: Delegate to BrowsingAgent for browsing tasks (#2103)	2024-06-07 00:53:47 -07:00
Aaron Xia	42c6b506b5	Lazy launching BrowseEnv / making BrowseEnv optional (#2155) * feat: lazy launching browser; browser optional for diffrent agents. * style: lint * fix: integration test fail due to browser not started. * fix: run by cli and integration test failed. * fix: lint * fix: lint --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-05-31 16:40:42 -04:00
Xingyao Wang	01ef90205d	Add CodeActSWEAgent to remove browsing & github + improvements on agentskills (#2105) * update swe_bench prompt; use minimal prompt for codeact; * upgrade agentskills and update testcases * update infer prompt * fix cwd * add icl for swebench * also log in_context_example to run infer * remove extra print * change prompt to abs path * update error message to include current file info * change cwd for jupyter if needed * update edit error message * update prompt * improve git get patch * update hint string * default to 50 turns * revert changes from codeact agent and create new CodeActSWEAgent * revert changes to codeact * revert instructions for run infer * revert instructions for run infer * update README * update max iter * add codeact swe agent * fix issue for CodeActSWEAgent * allow specifying max iter in cmdline script * stop printing * Update agenthub/codeact_swe_agent/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Fix prompt regression in jupyter plugin --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-29 21:19:00 -07:00
மனோஜ்குமார் பழனிச்சாமி	d4ccd48af8	Persistent docker session (#1998) Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-05-29 13:22:34 +00:00
Xingyao Wang	602ffcdffb	Implement `agentskills` for OpenDevin to helpfully improve edit AND including more useful tools/skills (#1941) * add draft for skills * Implement and test agentskills functions: open_file, goto_line, scroll_down, scroll_up, create_file, search_dir, search_file, find_file * Remove new_sample.txt file * add some work from opendevin w/ fixes * Add unit tests for agentskills module * fix some issues and updated tests * add more tests for open * tweak and handle goto_line * add tests for some edge cases * add tests for scrolling * add tests for edit * add tests for search_dir * update tests to use pytest * use pytest --forked to avoid file op unit tests to interfere with each other via global var * update doc based on swe agent tool * update and add tests for find_file and search_file * move agent_skills to plugins * add agentskills as plugin and docs * add agentskill to ssh box and fix sandbox integration * remove extra returns in doc * add agentskills to initial tool for jupyter * support re-init jupyter kernel (for agentskills) after restart * fix print window's issue with indentation and add testcases * add prompt for codeact with the newest edit primitives * modify the way line number is presented (remove leading space) * change prompt to the newest display format * support tracking of costs via metrics * Update opendevin/runtime/plugins/agent_skills/README.md * Update opendevin/runtime/plugins/agent_skills/README.md * implement and add tests for py linting * remove extra text arg for incompatible subprocess ver * remove sample.txt * update test_edits integration tests * fix all integration * Update opendevin/runtime/plugins/agent_skills/README.md * Update opendevin/runtime/plugins/agent_skills/README.md * Update opendevin/runtime/plugins/agent_skills/README.md * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update opendevin/runtime/plugins/agent_skills/agentskills.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * correctly setup plugins for swebench eval * bump swe-bench version and add logging * correctly setup plugins for swebench eval * bump swe-bench version and add logging * Revert "correctly setup plugins for swebench eval" This reverts commit 2bd10556739e2af602ea85371b976390f7c48077. * bump version * remove _AGENT_SKILLS_DOCS * move flake8 to test dep * update poetry.lock * remove extra arg * reduce max iter for eval * update poetry * fix integration tests --------- Co-authored-by: OpenDevin <opendevin@opendevin.ai> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-23 16:04:09 +00:00
Yufan Song	d18e6c85a0	feat: add metrics related to cost for better observability (#1944) * add metrics for total_cost * make lint * refact codeact * change metrics into llm * add costs list, add into state * refactor log completion * refactor and test others * make lint * Update opendevin/core/metrics.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update opendevin/llm/llm.py Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * refactor * add code --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-05-22 08:53:31 +00:00
Robert Brennan	0ecba83e53	Move message history out of CodeAct (#1847) * stop keeping history state in codeact * regenerate tests * Update agenthub/codeact_agent/codeact_agent.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * revert tests * regen tests * refactor codeact a bit * regenerate without using LLM * simplify logic * change to heredoc * fix heredoc * fix end_of_edit docs * regen tests * regenerate --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-18 18:39:27 +00:00
Gant	f950e3b48e	make CodeAct paper link correct (#1870)	2024-05-18 03:54:10 +00:00
மனோஜ்குமார் பழனிச்சாமி	b0b44ed467	Auto restarted Jupyter kernel (#1808) Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-18 08:40:31 +05:30
Xingyao Wang	2406b901df	feat(SWE-Bench environment) integrate SWE-Bench sandbox (#1468) * add draft dockerfile for build all * add rsync for build * add all-in-one docker * update prepare scripts * Update swe_env_box.py * Add swe_entry.sh (buggy now) * Parse the test command in swe_entry.sh * Update README for instance eval in sandbox * revert specialized config * replace run_as_devin as an init arg * set container & run_as_root via args * update swe entry script * update env * remove mounting * allow error after swe_entry * update swe_env_box * move file * update gitignore * get swe_env_box a working demo * support faking user response & provide sandox ahead of time; also return state for controller * tweak main to support adding controller kwargs * add module * initialize plugin for provided sandbox * add pip cache to plugin & fix jupyter kernel waiting * better print Observation output * add run infer scripts * update readme * add utility for getting diff patch * use get_diff_patch in infer * update readme * support cost tracking for codeact * add swe agent edit hack * disable color in git diff * fix git diff cmd * fix state return * support limit eval * increase t imeout and export pip cache * add eval limit config * return state when hit turn limit * save log to file; allow agent to give up * run eval with max 50 turns * add outputs to gitignore * save swe_instance & instruction * add uuid to swebench * add streamlit dep * fix save series * fix the issue where session id might be duplicated * allow setting temperature for llm (use 0 for eval) * Get report from agent running log * support evaluating task success right after inference. * remove extra log * comment out prompt for baseline * add visualizer for eval * use plaintext for instruction * reduce timeout for all; only increase timeout for init * reduce timeout for all; only increase timeout for init * ignore sid for swe env * close sandbox in each eval loop * update visualizer instruction * increase max chars * add finish action to history too * show test result in metrics * add sidebars for visualizer * also visualize swe_instance * cleanup browser when agent controller finish runinng * do not mount workspace for swe-eval to avoid accidentally overwrite files * Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files" This reverts commit 8ef77390543e562e6f0a5a9992418014d8b3010c. * Revert "Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files"" This reverts commit 016cfbb9f0475f32bacbad5822996b4eaff24a5e. * run jupyter command via copy to, instead of cp to mount * only print mixin output when failed * change ssh box logging * add visualizer for pass rate * add instance id to sandbox name * only remove container we created * use opendevin logger in main * support multi-processing infer * add back metadata, support keyboard interrupt * remove container with startswith * make pbar behave correctly * update instruction w/ multi-processing * show resolved rate by repo * rename tmp dir name * attempt to fix racing for copy to ssh_box * fix script * bump swe-bench-all version * fix ipython with self-contained commands * add jupyter demo to swe_env_box * make resolved count two column * increase height * do not add glob to url params * analyze obs length * print instance id prior to removal handler * add gold patch in visualizer * fix interactive git by adding a git --no-pager as alias * increase max_char to 10k to cover 98% of swe-bench obs cases * allow parsing note * prompt v2 * add iteration reminder * adjust user response * adjust order * fix return eval * fix typo * add reminder before logging * remove other resolve rate * re adjust to new folder structure * support adding eval note * fix eval note path * make sure first log of each instance is printed * add eval note * fix the display for visualizer * tweak visualizer for better git patch reading * exclude empty patch * add retry mechanism for swe_env_box start * fix ssh timeout issue * add stat field for apply test patch success * add visualization for fine-grained report * attempt to support monologue agent by constraining it to single thread * also log error msg when stopeed * save error as well * override WORKSPACE_MOUNT_PATH and WORKSPACE_BASE for monologue to work in mp * add retry mechanism for sshbox * remove retry for swe env box * try to handle loop state stopped * Add get report scripts * Add script to convert agent output to swe-bench format * Merge fine grained report for visualizer * Update eval readme * Update README.md * Add CodeAct gpt4-1106 output and eval logs on swe-bench-lite * Update the script to get model report * Update get_model_report.sh * Update get_agent_report.sh * Update report merge script * Add agent output conversion script * Update swe_lite_env_setup.sh * Add example swe-bench output files * Update eval readme * Remove redundant scripts * set iteration count down to false by default * fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm (#1666) * fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm * Review Feedback * Missing None Check * Review feedback and improved error handling --------- Co-authored-by: Robert Brennan <accounts@rbren.io> * fix prepare_swe_util scripts * update builder images * update setup script * remove swe-bench build workflow * update lock * remove experiments since they are moved to hf * remove visualizer (since it is moved to hf repo) * simply jupyter execution via heredoc * update ssh_box * add initial docker readme * add pkg-config as dependency * add script for swe_bench all-in-one docker * add rsync to builder * rename var * update commit * update readme * update lock * support specify timeout for long running tasks * fix path * separate building of all deps and files * support returning states at the end of controller * remove return None * support specify timeout for long running tasks * add timeout for all existing sandbox impl * fix swe_env_box for new codebase * update llm config in config.py * support pass sandbox in * remove force set * update eval script * fix issue of overriding final state * change default eval output to hf demo * change default eval output to hf demo * fix config * only close it when it is NOT external sandbox * add scripts * tweak config * only put in hostory when state has history attr * fix agent controller on the case of run out interaction budget * always assume state is always not none * remove print of final state * catch all exception when cannot compute completion cost * Update README.md * save source into json * fix path * update docker path * return the final state on close * merge AgentState with State * fix integration test * merge AgentState with State * fix integration test * add ChangeAgentStateAction to history in attempt to fix integration * add back set agent state * update tests * update tests * move scripts for setup * update script and readme for infer * do not reset logger when n processes == 1 * update eval_infer scripts and readme * simplify readme * copy over dir after eval * copy over dir after eval * directly return get state * update lock * fix output saving of infer * replace print with logger * update eval_infer script * add back the missing .close * increase timeout * copy all swe_bench_format file * attempt to fix output parsing * log git commit id as metadata * fix eval script * update lock * update unit tests * fix argparser unit test * fix lock * the deps are now lightweight enough to be incude in make build * add spaces for tests * add eval outputs to gitignore * remove git submodule * readme * tweak git email * update upload instruction * bump codeact version for eval --------- Co-authored-by: Bowen Li <libowen.ne@gmail.com> Co-authored-by: huybery <huybery@gmail.com> Co-authored-by: Bart Shappee <bshappee@gmail.com> Co-authored-by: Robert Brennan <accounts@rbren.io>	2024-05-15 16:15:55 +00:00
Frank Xu	a84d19f03c	Enable CodeAct agents with browsing, and also enable arbitrary BrowserGym action support (#1807 ) * enable browsing in codeact, and arbitrary browsergym DSL support * fix * fix unit test case * update frontend for the new interactive browsing action * bump ver * Fix integration tests --------- Co-authored-by: OpenDevinBot <bot@opendevin.com>	2024-05-15 11:59:58 -04:00
Boxuan Li	6714000b2c	CodeActAgent: Fix iteration reminder (#1803 ) This PR includes three changes: 1) Iteration reminder should start with MAX_ITERATIONS from config rather than default value 100 2) In the first prompt, we should tell the LLM it has `MAX_ITERATIONS - 1` turns left, rather than `MAX_ITERATIONS - 2` 3) Remove legacy ITERATION_REMINDER config	2024-05-15 13:48:47 +08:00
Graham Neubig	3cef8ee187	Add GitHub prompt to CodeAct (#1792 ) * Added github to CodeAct * More codeact * Simplify prompt * Modify codeact prompt * fix integration test for CodeAct * yet another integration test fix for codeact * fix plugin use in jupyter * update edit tests * fix jupyter plugin potential port conflict * fix test ipython with latest ipython fix * update integration test * wait a bit for jupyter execution * add one unit tests for sandbox * fix integration test --------- Co-authored-by: OpenDevinBot <bot@opendevin.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-05-14 21:25:21 +00:00
Robert Brennan	beb74a19f6	Use event stream for the runtime (#1776 ) * rebuild PR from scratch * fix max_iter * regenerate tests * cut down on history * Update opendevin/controller/agent_controller.py * regenerate tests * revert swe agent * revert some codeact changes * regenerate tests * add source to dict * only add source if not none * try to fix coverage issue * lock * add gevent	2024-05-14 13:35:25 +00:00
Robert Brennan	82a798990c	refactor remind_iterations (#1760 ) * refactor remind_iterations * regenerate tests * concatenate iteration message * fix merge issues * update integration tests	2024-05-14 08:27:12 -04:00
Robert Brennan	b028bd46bb	Use messages to drive tasks (#1688 ) * finish is working * start reworking main_goal * remove main_goal from microagents * remove main_goal from other agents * fix issues * revert codeact line * make plan a subclass of task * fix frontend for new plan setup * lint * fix type * more lint * fix build issues * fix codeact mgs * fix edge case in regen script * fix task validation errors * regenerate integration tests * fix up tests * fix sweagent * revert codeact prompt * update integration tests * update integration tests * handle loading state * Update agenthub/codeact_agent/codeact_agent.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Update opendevin/controller/agent_controller.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Update agenthub/codeact_agent/codeact_agent.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Update opendevin/controller/state/plan.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * update docs * regenerate tests * remove none from state type * revert test files * update integration tests * rename plan to root_task * revert plugin perms * regen integration tests * tweak integration script * prettier * fix test * set workspace up for regeneration * regenerate tests * Change directory of copy * Updated tests * Disable PlannerAgent test * Fix listen * Updated prompts * Disable planner again * Make codecov more lenient * Update agenthub/README.md * Update opendevin/server/README.md * re-enable planner tests * finish top level tasks * regen planner * fix root task factory --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-13 23:14:15 +00:00
Boxuan Li	316a772849	CodeAct: Emphasize open before edit (#1709 ) Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>	2024-05-11 12:20:14 -07:00
Boxuan Li	bde12f4a09	CodeActAgent: Fix hack for multiple edits in same command (#1684 ) * Fix edit hack for multiple edits in same command This PR changes `([\s\S]*)` to `([\s\S]*?)` to make the capturing group non-greedy. This change ensures that the regex captures the smallest set of characters that extends up to the first end_of_edit it encounters, rather than extending across multiple edit commands. Without the fix, a bash command consisting of multiple edits would be corrupted and lead to unexpected edit results.	2024-05-10 23:32:09 -07:00
Bart Shappee	78cd2e5b47	fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model exception out of litellm (#1666 ) * fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model exception out of litellm * Review Feedback * Missing None Check * Review feedback and improved error handling --------- Co-authored-by: Robert Brennan <accounts@rbren.io>	2024-05-10 13:57:37 -04:00
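
Commit bde12f4a09 in the list above describes switching a capturing group from greedy to non-greedy so that each edit stops at the first end_of_edit. The following is a minimal sketch of that behavior, not the agent's actual code: the command layout (`edit <start>:<end>` followed by a body and `end_of_edit`) and the names `GREEDY`, `NON_GREEDY`, and `command` are assumptions made up for illustration.

```python
import re

# Hypothetical command text: two edit blocks, each closed by end_of_edit.
# The exact command syntax used by the agent is an assumption here.
command = (
    "edit 1:1\nfirst\nend_of_edit\n"
    "edit 5:5\nsecond\nend_of_edit\n"
)

# Greedy group: [\s\S]* consumes as much as possible, so the capture
# runs through to the LAST end_of_edit in the string.
GREEDY = re.compile(r"edit \d+:\d+\n([\s\S]*)end_of_edit")

# Non-greedy group: [\s\S]*? stops at the FIRST end_of_edit it reaches,
# so each edit block is matched separately.
NON_GREEDY = re.compile(r"edit \d+:\d+\n([\s\S]*?)end_of_edit")

greedy_bodies = GREEDY.findall(command)
non_greedy_bodies = NON_GREEDY.findall(command)

print(len(greedy_bodies))   # one oversized capture spanning both edits
print(non_greedy_bodies)    # one capture per edit block
```

With the greedy pattern, the single captured body swallows the second `edit` header and the first terminator, which is the kind of corruption the commit message describes.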

85 Commits