OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-26 05:48:36 +08:00

Author	SHA1	Message	Date
Robert Brennan	a4a7ad6c87	Create dependabot.yml (#1829 )	2024-05-16 11:55:38 -04:00
Boxuan Li	b6ff201780	Refactor integration test framework and relieve the pain of regeneration (#1818 ) * Update README.md * Fix WORKSPACE_MOUNT_PATH_IN_SANDBOX variable in regenerate.sh * Regenerate prompts without calling real LLM * Disable pytest warning capture * Change planner agent prompt by a bit for demo * Regenerate prompt files following prompt changes * doc: elaborate on FORCE_USE_LLM * Add another prompt change to monologue_agent for demo purpose * Regenerate prompts with FORCE_USE_LLM=true --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>	2024-05-16 08:30:29 -07:00
Leo	e89cc8f19b	Feat: add stream output to exec_run (#1625 ) * Feat: add stream output to exec_run * Using command timeout to control the exec_box's timeout. * add bash -c to source command to compatible for sh. Signed-off-by: ifuryst <ifuryst@gmail.com> * Feat: add stream output to SSHBox execute Signed-off-by: ifuryst <ifuryst@gmail.com> * fix the test case fail. Signed-off-by: ifuryst <ifuryst@gmail.com> * fix the test case import wrong path for method. Signed-off-by: ifuryst <ifuryst@gmail.com> --------- Signed-off-by: ifuryst <ifuryst@gmail.com>	2024-05-16 14:37:49 +00:00
Xingyao Wang	0fdbe1ee93	Update README.md (#1825 )	2024-05-16 11:06:28 +00:00
மனோஜ்குமார் பழனிச்சாமி	7313421ae4	Enabled LLM logs by default (#1819 ) Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>	2024-05-16 10:35:18 +00:00
mzyddd	6cce9c3c28	Fix/attribute error (#1812 ) * refactor : delete useless messages.json messages * Update msg_stack.py * Update msg_stack.py * buf fix #1809 AttributeError * buf fix #1809 AttributeError --------- Co-authored-by: mengziyi.mzy <mengziyi.mzy@alibaba-inc.com> Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>	2024-05-16 10:26:30 +00:00
Frank Xu	adea9b3f32	fix frontend browsing screenshot, allow link following in MD (#1817 )	2024-05-16 18:06:06 +08:00
yangpryili	52e21c20e3	Update msg_stack.py (#1820 ) * Update msg_stack.py 1、[msg.to_dict() for msg in msgs], msg is not instanse of Message, it not has a func of to_dict(), so msg.to_dict() will accur JSONDecodeError; 2、json.dump(new_data, file), it appends new_data to the end of the file instead of overwriting from the beginning, Hence, it's necessary to first perform file.seek(0) and file.truncate(). * Update opendevin/server/session/msg_stack.py --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>	2024-05-16 10:04:05 +00:00
sp.wack	15685f9aba	feat(frontend): uploading multiple files (#1718 ) * create test todos * extend to support uploading directories * remove dir-upload logic and feature drag-and-drop --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-05-16 17:41:15 +08:00
Xingyao Wang	9e59937180	fix utf-8 decoding issue (#1816 )	2024-05-15 22:49:49 -07:00
Xingyao Wang	2406b901df	feat(SWE-Bench environment) integrate SWE-Bench sandbox (#1468 ) * add draft dockerfile for build all * add rsync for build * add all-in-one docker * update prepare scripts * Update swe_env_box.py * Add swe_entry.sh (buggy now) * Parse the test command in swe_entry.sh * Update README for instance eval in sandbox * revert specialized config * replace run_as_devin as an init arg * set container & run_as_root via args * update swe entry script * update env * remove mounting * allow error after swe_entry * update swe_env_box * move file * update gitignore * get swe_env_box a working demo * support faking user response & provide sandox ahead of time; also return state for controller * tweak main to support adding controller kwargs * add module * initialize plugin for provided sandbox * add pip cache to plugin & fix jupyter kernel waiting * better print Observation output * add run infer scripts * update readme * add utility for getting diff patch * use get_diff_patch in infer * update readme * support cost tracking for codeact * add swe agent edit hack * disable color in git diff * fix git diff cmd * fix state return * support limit eval * increase t imeout and export pip cache * add eval limit config * return state when hit turn limit * save log to file; allow agent to give up * run eval with max 50 turns * add outputs to gitignore * save swe_instance & instruction * add uuid to swebench * add streamlit dep * fix save series * fix the issue where session id might be duplicated * allow setting temperature for llm (use 0 for eval) * Get report from agent running log * support evaluating task success right after inference. 
* remove extra log * comment out prompt for baseline * add visualizer for eval * use plaintext for instruction * reduce timeout for all; only increase timeout for init * reduce timeout for all; only increase timeout for init * ignore sid for swe env * close sandbox in each eval loop * update visualizer instruction * increase max chars * add finish action to history too * show test result in metrics * add sidebars for visualizer * also visualize swe_instance * cleanup browser when agent controller finish runinng * do not mount workspace for swe-eval to avoid accidentally overwrite files * Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files" This reverts commit 8ef77390543e562e6f0a5a9992418014d8b3010c. * Revert "Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files"" This reverts commit 016cfbb9f0475f32bacbad5822996b4eaff24a5e. * run jupyter command via copy to, instead of cp to mount * only print mixin output when failed * change ssh box logging * add visualizer for pass rate * add instance id to sandbox name * only remove container we created * use opendevin logger in main * support multi-processing infer * add back metadata, support keyboard interrupt * remove container with startswith * make pbar behave correctly * update instruction w/ multi-processing * show resolved rate by repo * rename tmp dir name * attempt to fix racing for copy to ssh_box * fix script * bump swe-bench-all version * fix ipython with self-contained commands * add jupyter demo to swe_env_box * make resolved count two column * increase height * do not add glob to url params * analyze obs length * print instance id prior to removal handler * add gold patch in visualizer * fix interactive git by adding a git --no-pager as alias * increase max_char to 10k to cover 98% of swe-bench obs cases * allow parsing note * prompt v2 * add iteration reminder * adjust user response * adjust order * fix return eval * fix typo * add reminder before 
logging * remove other resolve rate * re adjust to new folder structure * support adding eval note * fix eval note path * make sure first log of each instance is printed * add eval note * fix the display for visualizer * tweak visualizer for better git patch reading * exclude empty patch * add retry mechanism for swe_env_box start * fix ssh timeout issue * add stat field for apply test patch success * add visualization for fine-grained report * attempt to support monologue agent by constraining it to single thread * also log error msg when stopeed * save error as well * override WORKSPACE_MOUNT_PATH and WORKSPACE_BASE for monologue to work in mp * add retry mechanism for sshbox * remove retry for swe env box * try to handle loop state stopped * Add get report scripts * Add script to convert agent output to swe-bench format * Merge fine grained report for visualizer * Update eval readme * Update README.md * Add CodeAct gpt4-1106 output and eval logs on swe-bench-lite * Update the script to get model report * Update get_model_report.sh * Update get_agent_report.sh * Update report merge script * Add agent output conversion script * Update swe_lite_env_setup.sh * Add example swe-bench output files * Update eval readme * Remove redundant scripts * set iteration count down to false by default * fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm (#1666) * fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm * Review Feedback * Missing None Check * Review feedback and improved error handling --------- Co-authored-by: Robert Brennan <accounts@rbren.io> * fix prepare_swe_util scripts * update builder images * update setup script * remove swe-bench build workflow * update lock * remove experiments since they are moved to hf * remove visualizer (since it is moved to hf repo) * simply jupyter execution via heredoc * update ssh_box * add 
initial docker readme * add pkg-config as dependency * add script for swe_bench all-in-one docker * add rsync to builder * rename var * update commit * update readme * update lock * support specify timeout for long running tasks * fix path * separate building of all deps and files * support returning states at the end of controller * remove return None * support specify timeout for long running tasks * add timeout for all existing sandbox impl * fix swe_env_box for new codebase * update llm config in config.py * support pass sandbox in * remove force set * update eval script * fix issue of overriding final state * change default eval output to hf demo * change default eval output to hf demo * fix config * only close it when it is NOT external sandbox * add scripts * tweak config * only put in hostory when state has history attr * fix agent controller on the case of run out interaction budget * always assume state is always not none * remove print of final state * catch all exception when cannot compute completion cost * Update README.md * save source into json * fix path * update docker path * return the final state on close * merge AgentState with State * fix integration test * merge AgentState with State * fix integration test * add ChangeAgentStateAction to history in attempt to fix integration * add back set agent state * update tests * update tests * move scripts for setup * update script and readme for infer * do not reset logger when n processes == 1 * update eval_infer scripts and readme * simplify readme * copy over dir after eval * copy over dir after eval * directly return get state * update lock * fix output saving of infer * replace print with logger * update eval_infer script * add back the missing .close * increase timeout * copy all swe_bench_format file * attempt to fix output parsing * log git commit id as metadata * fix eval script * update lock * update unit tests * fix argparser unit test * fix lock * the deps are now lightweight enough to be 
incude in make build * add spaces for tests * add eval outputs to gitignore * remove git submodule * readme * tweak git email * update upload instruction * bump codeact version for eval --------- Co-authored-by: Bowen Li <libowen.ne@gmail.com> Co-authored-by: huybery <huybery@gmail.com> Co-authored-by: Bart Shappee <bshappee@gmail.com> Co-authored-by: Robert Brennan <accounts@rbren.io>	2024-05-15 16:15:55 +00:00
Frank Xu	a84d19f03c	Enable CodeAct agents with browsing, and also enable arbitrary BrowserGym action support (#1807 ) * enable browsing in codeact, and arbitrary browsergym DSL support * fix * fix unit test case * update frontend for the new interactive browsing action * bump ver * Fix integration tests --------- Co-authored-by: OpenDevinBot <bot@opendevin.com>	2024-05-15 11:59:58 -04:00
Xia Zhenhua	76abca361c	feat: simplify state.history with to_memory call in micro-agent. Or the call to LLM may exceed the token limit. (#1806 ) * feat: simplify state.history with to_memory call in micro-agent. * feat: merge master and replace to_memory with event_to_memory. --------- Co-authored-by: aaren.xzh <aaren.xzh@antfin.com>	2024-05-15 14:47:37 +02:00
Xia Zhenhua	bf14b47890	feat: make other agents support asking user input in MessageAction. (#1777 ) * feat: make other agents support asking user input in MessageAction. * Update agenthub/micro/_instructions/actions/message.md Co-authored-by: Robert Brennan <accounts@rbren.io> * Update agenthub/micro/_instructions/actions/message.md Co-authored-by: Robert Brennan <accounts@rbren.io> * feat: make other agents support asking user input in MessageAction. * Regenerate test artifacts --------- Co-authored-by: aaren.xzh <aaren.xzh@antfin.com> Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-15 00:44:45 -07:00
Shimada666	817222061f	refactor: jupyter scroll (#1799 ) * refactor: jupyter scroll * Update Jupyter.tsx --------- Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2024-05-15 07:39:55 +00:00
Boxuan Li	6714000b2c	CodeActAgent: Fix iteration reminder (#1803 ) This PR includes three changes: 1) Iteration reminder should start with MAX_ITERATIONS from config rather than default value 100 2) In the first prompt, we should tell the LLM it has `MAX_ITERATIONS - 1` turns left, rather than `MAX_ITERATIONS - 2` 3) Remove legacy ITERATION_REMINDER config	2024-05-15 13:48:47 +08:00
Xingyao Wang	d1fd277ad4	Support return final task states for evaluation (#1755 ) * support returning states at the end of controller * remove return None * fix issue of overriding final state * return the final state on close * merge AgentState with State * fix integration test * add ChangeAgentStateAction to history in attempt to fix integration * add back set agent state * update tests * update tests * directly return get state * add back the missing .close() * Update typo in opendevin/core/main.py --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-15 03:43:01 +00:00
Robert Brennan	c604f8fcd2	change error message to something more descriptive (#1790 )	2024-05-15 08:32:51 +08:00
Robert Brennan	135320861c	set a higer UID_MAX (#1788 ) Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>	2024-05-15 00:28:37 +00:00
Xingyao Wang	123968f887	Runtime only close then sandbox if it is created by itself (#1793 )	2024-05-15 05:47:56 +08:00
Graham Neubig	3cef8ee187	Add GitHub prompt to CodeAct (#1792 ) * Added github to CodeAct * More codeact * Simplify prompt * Modify codeact prompt * fix integration test for CodeAct * yet another integration test fix for codeact * fix plugin use in jupyter * update edit tests * fix jupyter plugin potential port conflict * fix test ipython with latest ipython fix * update integration test * wait a bit for jupyter execution * add one unit tests for sandbox * fix integration test --------- Co-authored-by: OpenDevinBot <bot@opendevin.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-05-14 21:25:21 +00:00
Xingyao Wang	8d8ed0c3be	hotfix: Initialize plugin with new runtime (#1795 ) * fix plugin use in jupyter * fix jupyter plugin potential port conflict * update integration test * wait a bit for jupyter execution * add one unit tests for sandbox * fix integration test * fix integration * fix integration yet again * init sandbox plugins in the server	2024-05-14 21:15:19 +00:00
Shimada666	e4460a974d	feat: chat interface autoscroll (#1761 ) Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2024-05-14 18:00:16 +00:00
Robert Brennan	6ed17aad37	fix serialization (#1785 )	2024-05-14 13:46:15 -04:00
Marshall Roch	64ee5d404d	Fix CodeAct paper link (#1784 ) https://arxiv.org/abs/2402.13463 is RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models https://arxiv.org/abs/2402.01030 is Executable Code Actions Elicit Better LLM Agents	2024-05-14 17:40:07 +00:00
mamoodi	1d8402a14a	doc: Small fixes to documentation (#1783 )	2024-05-14 13:36:53 -04:00
Robert Brennan	dcb5d1ce0a	Add permanent storage option for EventStream (#1697 ) * add storage classes * add minio * add event stream storage * storage test working * use fixture * event stream test passing * better serialization * factor out serialization pkg * move more serialization * fix tests * fix test * remove __all__ * add rehydration test * add more rehydration test * fix fixture * fix dict init * update tests * lock * regenerate tests * Update opendevin/events/stream.py * revert tests * revert old integration tests * only add fields if present * regen tests * pin pyarrow * fix unit tests * remove cause from memories * revert tests * regen tests	2024-05-14 11:09:45 -04:00
Robert Brennan	beb74a19f6	Use event stream for the runtime (#1776 ) * rebuild PR from scratch * fix max_iter * regenerate tests * cut down on history * Update opendevin/controller/agent_controller.py * regenerate tests * revert swe agent * revert some codeact chagnes * regenerate tests * add source to dict * only add source if not none * try to fix coverage issue * lock * add gevent	2024-05-14 13:35:25 +00:00
Robert Brennan	82a798990c	refactor remind_iterations (#1760 ) * refactor remind_iterations * regenerate tests * concatenate iteration message * fix merge issues * update integration tests	2024-05-14 08:27:12 -04:00
Boxuan Li	3d53d363b4	Integration test: Verify finish state & add auto-rerun in regenerate.sh (#1773 ) * regenerate.sh: Allow testing on a specific agent and/or test * Check agent finish state * rengerate.sh: Rerun after fixing the prompts * Fix SWEAgent test_write_simple_script * Add more help message * Add a known issue to README.md * regenerate.sh: Fix help message typo * Fix a typo in README	2024-05-14 03:50:29 -04:00
Boxuan Li	b84f25ab35	Integration test: exit if no prompt match (#1772 )	2024-05-13 20:03:09 -07:00
Robert Brennan	2771328036	use -it and pull=always for docker (#1769 )	2024-05-13 19:17:57 -04:00
Robert Brennan	b028bd46bb	Use messages to drive tasks (#1688 ) * finish is working * start reworking main_goal * remove main_goal from microagents * remove main_goal from other agents * fix issues * revert codeact line * make plan a subclass of task * fix frontend for new plan setup * lint * fix type * more lint * fix build issues * fix codeact mgs * fix edge case in regen script * fix task validation errors * regenerate integration tests * fix up tests * fix sweagent * revert codeact prompt * update integration tests * update integration tests * handle loading state * Update agenthub/codeact_agent/codeact_agent.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Update opendevin/controller/agent_controller.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Update agenthub/codeact_agent/codeact_agent.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Update opendevin/controller/state/plan.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * update docs * regenerate tests * remove none from state type * revert test files * update integration tests * rename plan to root_task * revert plugin perms * regen integration tests * tweak integration script * prettier * fix test * set workspace up for regeneration * regenerate tests * Change directory of copy * Updated tests * Disable PlannerAgent test * Fix listen * Updated prompts * Disable planner again * Make codecov more lenient * Update agenthub/README.md * Update opendevin/server/README.md * re-enable planner tests * finish top level tasks * regen planner * fix root task factory --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-13 23:14:15 +00:00
Robert Brennan	e28b3ef9e8	Fix integration tests (#1764 ) * refactor remind_iterations * regenerate tests * concatenate iteration message * add some helpers to the tests * regenerate tests * add to logs * regenerate tests * add debug info * fix exit_on_message * fix regen script * regenerate tests * Revert "Merge branch 'rb/test-regen' of ssh://github.com/opendevin/opendevin into rb/test-regen" This reverts commit b9cd1acbf2af07d5d01336039d0393eaf2183a41, reversing changes made to c888285304adcdee3e1ca0c62f6aa716e5121b45. * remove prints * revert files * revert more * revert more * regenerate for the last time I hope * add back remind_iter * regenerate * add back remind_iter * regenerate * fix remind_iter * regenerate yet again * regen * remove comment * regen again	2024-05-13 18:08:59 -04:00
wallter	ee66a1d5d1	Fix: Correct --add-host Flag Format in README (#1767 ) This PR updates the README to correct the format of the --add-host flag used in the Docker run command. The previous format, host.docker.internal=host-gateway, was incorrect and resulted in the following error: invalid argument "host.docker.internal=host-gateway" for "--add-host" flag: bad format for add-host: "host.docker.internal=host-gateway" Use code with caution. This PR fixes the issue by updating the flag to the correct format: --add-host host-gateway:host.docker.internal Use code with caution. This ensures that the Docker container can correctly resolve the host.docker.internal hostname to the host machine's gateway IP address.	2024-05-13 22:07:56 +00:00
Graham Neubig	b13d4647ab	Print out the regenerate command (#1759 ) * Print out the output of the regenerate command * Update regenerate.sh	2024-05-13 18:43:58 +00:00
Pete Stenger	a48b02207f	await closing the controller (#1751 ) * await closing the controller * Update manager.py * Cleanly exit * Update agent.py --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: Jim Su <jimsu@protonmail.com>	2024-05-13 14:34:03 -04:00
Xingyao Wang	755a4072b6	Support specify timeout for long running tasks (#1756 ) * support specify timeout for long running tasks * add timeout for all existing sandbox impl * Update opendevin/runtime/docker/local_box.py Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/docker/exec_box.py Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/docker/ssh_box.py Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/e2b/sandbox.py Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/sandbox.py Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>	2024-05-13 10:17:03 +00:00
Xingyao Wang	00c0edae5f	Re-adjust ssh_box for parallel evaluation (#1729 ) * update ssh_box * fix controller in test --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-13 14:35:30 +08:00
Frank Xu	ba8d8634ac	fix browsergym to old ver (#1753 )	2024-05-12 22:05:37 -07:00
Boxuan Li	eba5ef8e67	Fix test_ipython (#1750 )	2024-05-12 16:15:32 -07:00
Xingyao Wang	4db4a84e2e	Simply Jupyter execution via heredoc (#1728 ) * simply jupyter execution via heredoc * make sure /tmp always exists * add integration test for jupyter exec	2024-05-13 04:57:06 +08:00
Boxuan Li	49de262577	opendevin/core/main.py: Graceful shutdown (#1731 ) * opendevin/core/main.py: Graceful shutdown * Shutdown controller at exit * Update opendevin/core/main.py --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-05-12 13:56:35 -07:00
Engel Nyst	e5f1dbf5e7	Move json utility to the custom json parsing; apply it to the monologue-like agents (#1740 )	2024-05-12 13:39:38 -04:00
Aleksandar	f861db6675	Enhance API Documentation (#1727 ) * Add Server Interaction Guide * Fix style * Remove the server_interaction.md and add docstrings doc * Remove very specific setup for the token from the doc * Fix mdx expression failure * Fix all examples * Fix missing empty args {} * Fix the run example to have and background	2024-05-12 08:58:01 -07:00
Robert Brennan	efd0d61e70	Fix the tests (#1737 ) * fix config patching * revert tests	2024-05-12 11:02:10 -04:00
Robert Brennan	d94b575cd4	Sandbox: adjust whitespace processing (#1474 ) * adjust whitespace processing * revert ssh_box * adjust tests * change lstrip to remove prefix * run tests on exec box * remove lstrips * fix multiline * remove stripping logic * fix single multiline commands * fix imports * fix multiline echo * better command splitter * fix merge issue	2024-05-12 14:41:50 +00:00
Xingyao Wang	8bfae8413e	Support passing sandbox as argument and iteration reminder (#1730 ) * support custom sandbox; add iteration_reminder * Enable iteration reminder in CodeActAgent integration test * Don't remove numbers when comparing prompts * Update tests/integration/README.md --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-12 07:57:33 +00:00
Xingyao Wang	1d58917bc8	remove swe-bench build workflow (#1726 )	2024-05-12 06:56:20 +08:00
Jens Roland	6a18cafa40	docs: fixed typo in launch command (#1724 ) The argument `--add-host host.docker.internal:host-gateway` should be `--add-host host.docker.internal=host-gateway` (with an `=` character). Solves `Error creating controller: Could not establish connection to host` errors. Co-authored-by: Jim Su <jimsu@protonmail.com>	2024-05-11 17:48:45 -04:00

681 Commits
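
The message for commit 52e21c20e3 (#1820) notes that `json.dump(new_data, file)` writes at the current file position, so after reading a file the new data is appended unless `file.seek(0)` and `file.truncate()` are called first. A minimal sketch of that pattern follows. The names `append_record` and `roundtrip` and the record layout are hypothetical and are not taken from `msg_stack.py`.

```python
import json
import os
import tempfile


def append_record(path, record):
    """Load a JSON list from path, append record, and overwrite the file in place."""
    with open(path, "r+", encoding="utf-8") as file:
        data = json.load(file)  # reading leaves the position at end of file
        data.append(record)
        file.seek(0)            # move back to the start of the file
        file.truncate()         # discard the old content
        json.dump(data, file)   # write the updated list from offset 0
    return data


def roundtrip():
    """Hypothetical demo: create a temp file, append once, and re-read it."""
    fd, path = tempfile.mkstemp(suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump([{"id": 1}], f)
        append_record(path, {"id": 2})
        with open(path, encoding="utf-8") as f:
            return json.load(f)  # parses only if the file holds one JSON document
    finally:
        os.remove(path)
```

Without the `seek(0)` and `truncate()` calls, the file would contain the old list followed by the new one, and the final `json.load` would fail to parse it.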