OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-26 05:48:36 +08:00

Author	SHA1	Message	Date
Xingyao Wang	31b244f95e	[Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230 ) * move multi-line bash tests to test_runtime; support multi-line bash for esruntime; * add testcase to handle PS2 prompt * use bashlex for bash parsing to handle multi-line commands; add testcases for multi-line commands * revert ghcr runtime change * Apply stash * fix run as other user; make test async; * fix test runtime for run as od * add run-as-devin to all the runtime tests * handle the case when username is root * move all run-as-devin tests from sandbox; only tests a few cases on different user to save time; * move over multi-line echo related tests to test_runtime * fix user-specific jupyter by fixing the pypoetry virtualenv folder * make plugin's init async; chdir at initialization of jupyter plugin; move ipy simple testcase to test runtime; * support agentskills import in move tests for jupyter pwd tests; overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter; make agentskills read env var lazily, in case env var is updated; * fix ServerRuntime agentskills issue * move agnostic image test to test_runtime * merge runtime tests in CI * fix enable auto lint as env var * update warning message * update warning message * test for different container images * change parsing output as debug * add exception handling for update_pwd_decorator * fix unit test indentation * add plugins as default input to Runtime class; remove init_sandbox_plugins; implement add_env_var (include jupyter) in the base class; * fix server runtime auto lint * Revert "add exception handling for update_pwd_decorator" This reverts commit 2b668b1506e02145cb8f87e321aad62febca3d50. * tries to print debugging info for agentskills * explictly setting uid (try fix permission issue) * Revert "tries to print debugging info for agentskills" This reverts commit 8be4c86756f0e3fc62957b327ba2ac4999c419de. * set sandbox user id during testing to hopefully fix the permission issue * add browser tools for server runtime * try to debug for old pwd * update debug cmd * only test agnostic runtime when TEST_RUNTIME is Server * fix temp dir mkdir * load TEST_RUNTIME at the beginning * remove ipython tests * only log to file when DEBUG * default logging to project root * temporarily remove log to file * fix LLM logger dir * fix logger * make set pwd an optional aux action * fix prev pwd * fix infinity recursion * simplify * do not import the whole od library to avoid logger folder by jupyter * fix browsing * increase timeout * attempt to fix agentskills yet again * clean up in testcases, since CI maybe run as non-root * add _cause attribute for event.id * remove parent * add a bunch of debugging statement again for CI :( * fix temp_dir fixture * change all temp dir to follow pytest's tmp_path_factory * remove extra bracket * clean up error printing a bit * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * add typing for tmp dir fixture * clear the directory before running the test to avoid weird CI temp dir * remove agnostic test case for server runtime * Revert "remove agnostic test case for server runtime" This reverts commit 30e2181c3fc1410e69596c2dcd06be01f1d016b3. * disable agnostic tests in CI * fix test * make sure plugin arg is not passed when no plugin is specified; remove redundant on_event function; * move mock prompt * rename runtime * remove extra logging * refactor run_controller's interface; support multiple runtime for integration test; filter out hostname for prompt * uncomment other tests * pass the right runtime to controller * log runtime when start * uncomment tests * improve symbol filters * add intergration test prompts that seemd ok * add integration test workflow * add python3 to default ubuntu image * symlink python and fix permission to jupyter pip * add retry for jupyter execute server * fix jupyter pip install; add post-process for jupyter pip install; simplify init by add agent_skills path to PYTHONPATH; add testcase to tests jupyter pip install; * fix bug * use ubuntu:22.04 for eventstream integration tests * add todo * update testcase * remove redundant code * fix unit test * reduce dependency for runtime * try making llama-index an optional dependency that's not installed by default * remove pip install since it seemd not needed * log ipython execution; await write message since it returns a future * update ipy testcase * do not install llama-index in CI * do not install llama-index in the app docker as well * set sandbox container image in the integration test script * log plugins & env var for runtime * update conftest for sha256 * add git * remove all non-alphanumeric chalracters * add working ipy module tests! * default to use host network * remove is_async from browser to make thing a little more reliable; retry loading browser when error; * add sleep to wait a bit for http server * kill http server before regenerate browsing tests * fix browsing * only set sandbox container image if undefined * skip empty config value * update evaluation to use the latest run_controller * revert logger in execute_server to be compatible with server runtime * revert logging level to fix jupyter * set logger level * revert the logging * chmod for workspace to fix permission * support getting timeout from action * update test for server runtime * try to fix file permission * fix test_cmd_run_action_serialization_deserialization test (added timeout) * poetry: pip 24.2, torch 2.2.2 * revert adding pip to pyproject.toml * add build to dependencies in pyproject.toml * forgot poetry lock --no-update * fix a DelegatorAgent prompt_002.log (timeout) * fix a DelegatorAgent prompt_003.log (timeout) * couple more timeout attribs in prompt files * some more prompt files * prompts galore * add clarification comment for timeout * default timeout to config * add assert * update integraton tests for eventstream * update integration tests * fix timeout for action<->dict * remove redundant on_event * default to use instance image * update run_controller interface * add logging for copy * refactor swe_bench for the new design * fix action execution timeout * updatelock * remove build sandbox locally * fix runtime * use plain for-loop for single process * remove extra print * get swebench inference working * print whole `test_result` dict * got swebench patch post-process working * update swe-bench evaluation readme * refactor using shared reset_logger function * move messy swebench prompt to a different file * support the ability to specify whether to keep prompt * support the ability to specify whether to keep prompt * fix dockerfile * fix import and remove unnecessary strip logic * fix action serialization * get agentbench running * remove extra ls for agent bench * fix agentbench metric * factor out common documentation for eval * update biocoder doc * remove swe_env_box since it is no longer needed * get biocoder working * add func timeout for bird * fix jupyter pwd with ~ as user name * fix jupyter pwd with ~ as user name * get bird working * get browsing evaluation working * make eda runnable * fix id column * fix eda run_infer * unify eval output using a structured format; make swebench coompatible with that format; update client source code for every swebench run; do not inject testcmd for swebench * standardize existing benchs for the new eval output * set update source code = true * get gaia standardized * fix gaia * gorilla refactored but stuck at language.so to test * refactor and make gpqa work * refactor humanevalfix and get it working * refactor logic reasoning and get it working * refactor browser env so it works with eventstream runtime for eval * add initial version of miniwob refactor * fix browsergym environment * get miniwob working!! * allowing injecting additional dependency to OD runtime docker image * allowing injecting additional dependency to OD runtime docker image * support logic reasoning with pre-injected dependency * get mint working * update runtime build * fix mint docker * add test for keep_prompt; add missing await close for some tests * update integration tests for eventstream runtime * fix integration tests for server runtime * refactor ml bench and toolqa * refactor webarena * fix default factory * Update run_infer.py * add APIError to retry * increase timeout for swebench * make sure to hide api key when dump eval output * update the behavior of put source code to put files instead of tarball * add dishash to dependency * sendintr when timeout * fix dockerfile copy * reduce timeout * use dirhash to avoid repeat building for update source * fix runtime_build testcase * add dir_hash to docker build pipeline * revert api error * update poetry lock * add retries for swebench run infer * fix git patch * update poetry lock * adjust config order * fix mount volumns * enforce all eval to use "instance_id" * remove file store from runtime * make file_store public inside eventstream * move the runtime logic inside `main` out * support using async function for process_instance_fn * refactor run_infer with the create_time * fix file store * Update evaluation/toolqa/utils.py Co-authored-by: Graham Neubig <neubig@gmail.com> * fix typo --------- Co-authored-by: tobitege <tobitege@gmx.de> Co-authored-by: super-dainiu <78588128+super-dainiu@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-08-06 17:21:45 +00:00
tobitege	abec52abfe	(fix) Revert #3233 ; more logging in runtimes (#3236 ) * ServerRuntime: config copy in init * revert #3233 but more logging * get_box_classes: reset order back to previous version * 3 logging commands switched to debug (were info) * runtimes debug output of config on initialization * removed unneeded logger message from _init_container	2024-08-04 19:13:37 +00:00
Xingyao Wang	4f0a454ed6	[Arch] Support integration tests using EventStream Runtime (#3184 ) * Remove global config from memory * Remove runtime global config * Remove from storage * Remove global config * Fix event stream tests * Fix sandbox issue * Change config * Removed transferred tests * Add swe env box * Fixes on testing * Fixed some tests * Merge with stashed changes * Fix typing * Fix ipython test * Revive function * Make temp_dir fixture * Remove test to avoid circular import * fix eventstream filestore for test_runtime * fix parse arg issue that cause integration test to fail * support swebench pull from custom namespace * add back simple tests for runtime * move multi-line bash tests to test_runtime; support multi-line bash for esruntime; * add testcase to handle PS2 prompt * use bashlex for bash parsing to handle multi-line commands; add testcases for multi-line commands * revert ghcr runtime change * Apply stash * fix run as other user; make test async; * fix test runtime for run as od * add run-as-devin to all the runtime tests * handle the case when username is root * move all run-as-devin tests from sandbox; only tests a few cases on different user to save time; * move over multi-line echo related tests to test_runtime * fix user-specific jupyter by fixing the pypoetry virtualenv folder * make plugin's init async; chdir at initialization of jupyter plugin; move ipy simple testcase to test runtime; * support agentskills import in move tests for jupyter pwd tests; overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter; make agentskills read env var lazily, in case env var is updated; * fix ServerRuntime agentskills issue * move agnostic image test to test_runtime * merge runtime tests in CI * fix enable auto lint as env var * update warning message * update warning message * test for different container images * change parsing output as debug * add exception handling for update_pwd_decorator * fix unit test indentation * add plugins as default input to Runtime class; remove init_sandbox_plugins; implement add_env_var (include jupyter) in the base class; * fix server runtime auto lint * Revert "add exception handling for update_pwd_decorator" This reverts commit 2b668b1506e02145cb8f87e321aad62febca3d50. * tries to print debugging info for agentskills * explictly setting uid (try fix permission issue) * Revert "tries to print debugging info for agentskills" This reverts commit 8be4c86756f0e3fc62957b327ba2ac4999c419de. * set sandbox user id during testing to hopefully fix the permission issue * add browser tools for server runtime * try to debug for old pwd * update debug cmd * only test agnostic runtime when TEST_RUNTIME is Server * fix temp dir mkdir * load TEST_RUNTIME at the beginning * remove ipython tests * only log to file when DEBUG * default logging to project root * temporarily remove log to file * fix LLM logger dir * fix logger * make set pwd an optional aux action * fix prev pwd * fix infinity recursion * simplify * do not import the whole od library to avoid logger folder by jupyter * fix browsing * increase timeout * attempt to fix agentskills yet again * clean up in testcases, since CI maybe run as non-root * add _cause attribute for event.id * remove parent * add a bunch of debugging statement again for CI :( * fix temp_dir fixture * change all temp dir to follow pytest's tmp_path_factory * remove extra bracket * clean up error printing a bit * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * add typing for tmp dir fixture * clear the directory before running the test to avoid weird CI temp dir * remove agnostic test case for server runtime * Revert "remove agnostic test case for server runtime" This reverts commit 30e2181c3fc1410e69596c2dcd06be01f1d016b3. * disable agnostic tests in CI * fix test * make sure plugin arg is not passed when no plugin is specified; remove redundant on_event function; * move mock prompt * rename runtime * remove extra logging * refactor run_controller's interface; support multiple runtime for integration test; filter out hostname for prompt * uncomment other tests * pass the right runtime to controller * log runtime when start * uncomment tests * improve symbol filters * add intergration test prompts that seemd ok * add integration test workflow * add python3 to default ubuntu image * symlink python and fix permission to jupyter pip * add retry for jupyter execute server * fix jupyter pip install; add post-process for jupyter pip install; simplify init by add agent_skills path to PYTHONPATH; add testcase to tests jupyter pip install; * fix bug * use ubuntu:22.04 for eventstream integration tests * add todo * update testcase * remove redundant code * fix unit test * reduce dependency for runtime * try making llama-index an optional dependency that's not installed by default * remove pip install since it seemd not needed * log ipython execution; await write message since it returns a future * update ipy testcase * do not install llama-index in CI * do not install llama-index in the app docker as well * set sandbox container image in the integration test script * log plugins & env var for runtime * update conftest for sha256 * add git * remove all non-alphanumeric chalracters * add working ipy module tests! * default to use host network * remove is_async from browser to make thing a little more reliable; retry loading browser when error; * add sleep to wait a bit for http server * kill http server before regenerate browsing tests * fix browsing * only set sandbox container image if undefined * skip empty config value * update evaluation to use the latest run_controller * revert logger in execute_server to be compatible with server runtime * revert logging level to fix jupyter * set logger level * revert the logging * chmod for workspace to fix permission * support getting timeout from action * update test for server runtime * try to fix file permission * fix test_cmd_run_action_serialization_deserialization test (added timeout) * poetry: pip 24.2, torch 2.2.2 * revert adding pip to pyproject.toml * add build to dependencies in pyproject.toml * forgot poetry lock --no-update * fix a DelegatorAgent prompt_002.log (timeout) * fix a DelegatorAgent prompt_003.log (timeout) * couple more timeout attribs in prompt files * some more prompt files * prompts galore * add clarification comment for timeout * default timeout to config * add assert * update integraton tests for eventstream * update integration tests * fix timeout for action<->dict * remove redundant on_event * fix action execution timeout * updatelock --------- Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: tobitege <tobitege@gmx.de>	2024-08-01 22:07:39 +00:00
Xingyao Wang	286f10053e	[arch] Implement `copy_to` for Runtime (#3211 ) * add copy to * implement for ServerRuntime * implement copyto for runtime (required by eval); add tests for copy to * fix exist file check * unify copy_to_behavior and fix stuff	2024-08-02 02:46:11 +08:00
Xingyao Wang	bd68249fba	[Arch] Test `EventStreamRuntime` to ensure its feature parity with `ServerRuntime` (#3157 ) * Remove global config from memory * Remove runtime global config * Remove from storage * Remove global config * Fix event stream tests * Fix sandbox issue * Change config * Removed transferred tests * Add swe env box * Fixes on testing * Fixed some tests * Merge with stashed changes * Fix typing * Fix ipython test * Revive function * Make temp_dir fixture * Remove test to avoid circular import * fix eventstream filestore for test_runtime * fix parse arg issue that cause integration test to fail * support swebench pull from custom namespace * add back simple tests for runtime * move multi-line bash tests to test_runtime; support multi-line bash for esruntime; * add testcase to handle PS2 prompt * use bashlex for bash parsing to handle multi-line commands; add testcases for multi-line commands * revert ghcr runtime change * Apply stash * fix run as other user; make test async; * fix test runtime for run as od * add run-as-devin to all the runtime tests * handle the case when username is root * move all run-as-devin tests from sandbox; only tests a few cases on different user to save time; * move over multi-line echo related tests to test_runtime * fix user-specific jupyter by fixing the pypoetry virtualenv folder * make plugin's init async; chdir at initialization of jupyter plugin; move ipy simple testcase to test runtime; * support agentskills import in move tests for jupyter pwd tests; overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter; make agentskills read env var lazily, in case env var is updated; * fix ServerRuntime agentskills issue * move agnostic image test to test_runtime * merge runtime tests in CI * fix enable auto lint as env var * update warning message * update warning message * test for different container images * change parsing output as debug * add exception handling for update_pwd_decorator * fix unit test indentation * add plugins as default input to Runtime class; remove init_sandbox_plugins; implement add_env_var (include jupyter) in the base class; * fix server runtime auto lint * Revert "add exception handling for update_pwd_decorator" This reverts commit 2b668b1506e02145cb8f87e321aad62febca3d50. * tries to print debugging info for agentskills * explictly setting uid (try fix permission issue) * Revert "tries to print debugging info for agentskills" This reverts commit 8be4c86756f0e3fc62957b327ba2ac4999c419de. * set sandbox user id during testing to hopefully fix the permission issue * add browser tools for server runtime * try to debug for old pwd * update debug cmd * only test agnostic runtime when TEST_RUNTIME is Server * fix temp dir mkdir * load TEST_RUNTIME at the beginning * remove ipython tests * only log to file when DEBUG * default logging to project root * temporarily remove log to file * fix LLM logger dir * fix logger * make set pwd an optional aux action * fix prev pwd * fix infinity recursion * simplify * do not import the whole od library to avoid logger folder by jupyter * fix browsing * increase timeout * attempt to fix agentskills yet again * clean up in testcases, since CI maybe run as non-root * add _cause attribute for event.id * remove parent * add a bunch of debugging statement again for CI :( * fix temp_dir fixture * change all temp dir to follow pytest's tmp_path_factory * remove extra bracket * clean up error printing a bit * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * add typing for tmp dir fixture * clear the directory before running the test to avoid weird CI temp dir * remove agnostic test case for server runtime * Revert "remove agnostic test case for server runtime" This reverts commit 30e2181c3fc1410e69596c2dcd06be01f1d016b3. * disable agnostic tests in CI * fix test --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-07-31 04:30:59 +08:00
Xingyao Wang	b1ea204c5b	Migrate multi-line-bash-related sandbox tests into runtime tests and fix multi-line issue (#3128 ) * Remove global config from memory * Remove runtime global config * Remove from storage * Remove global config * Fix event stream tests * Fix sandbox issue * Change config * Removed transferred tests * Add swe env box * Fixes on testing * Fixed some tests * Merge with stashed changes * Fix typing * Fix ipython test * Revive function * Make temp_dir fixture * Remove test to avoid circular import * fix eventstream filestore for test_runtime * fix parse arg issue that cause integration test to fail * support swebench pull from custom namespace * add back simple tests for runtime * move multi-line bash tests to test_runtime; support multi-line bash for esruntime; * add testcase to handle PS2 prompt * use bashlex for bash parsing to handle multi-line commands; add testcases for multi-line commands * revert ghcr runtime change --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-07-27 20:12:57 +00:00
Graham Neubig	275ea706cf	Remove remaining global config (#3099 ) * Remove global config from memory * Remove runtime global config * Remove from storage * Remove global config * Fix event stream tests * Fix sandbox issue * Change config * Removed transferred tests * Add swe env box * Fixes on testing * Fixed some tests * Fix typing * Fix ipython test * Revive function * Make temp_dir fixture * Remove test to avoid circular import	2024-07-26 18:43:32 +00:00
tobitege	d0217b84ef	test_runtime: run tests per runtime, not alternating (#3103 )	2024-07-26 03:01:50 +08:00
Graham Neubig	dada004fac	Remove config from files (#3039 )	2024-07-19 23:20:44 -04:00
Xingyao Wang	ff6ddc831f	fix: runtime test for mac (#3005 ) * move use_host_network to sandbox config * fix test runtime tests * fix kwargs to make it clearer	2024-07-19 03:03:55 +00:00
tobitege	5a5713009f	INT: prevent error on repeat integration tests after failed test(s) (#2935 ) * Integration tests: prevent File not found error * forgot to remove debug calls in regenerate.sh	2024-07-18 06:29:15 +02:00
Xingyao Wang	f80ecec772	[Arch] Add tests for `EventStreamRuntime` and fix bash parsing (#2933 ) * deprecating recall action * fix integration tests * fix integration tests * refractor runtime to use async * remove search memory * rename .initialize to .ainit * draft of runtime image building (separate from img agnostic) * refractor runtime build into separate file and add unit tests for it * fix image agnostic tests * move `split_bash_commands` into a separate util file * fix bash pexcept parsing for env * refractor add_env_var from sandbox to runtime; add test runtime for env var, remove it from sandbox; * remove unclear comment * capture broader error * make `add_env_var` handle multiple export at the same time * add multi env var test * fix tests with new config * make runtime tests a separate ci to avoid full disk * Update Runtime README with architecture diagram and detailed explanations * update test * remove dependency of global config in sandbox test * fix sandbox typo * runtime tests does not need ghcr build now * remove download runtime img * remove dependency of global config in sandbox test * fix sandbox typo * try to free disk before running the tests * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * try to reduce code duplication * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * cleanup before setup * temporarily remove this enable lint test since env var are now handled by runtime * linter --------- Co-authored-by: OpenDevin <opendevin@all-hands.dev> Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>	2024-07-18 06:10:45 +08:00
Graham Neubig	257698e89b	Remove global config from sandbox (#2961 ) * Some changes * Fixed errors * Remove duplicate initialize_plugins * Fix some tests * Fix tests	2024-07-16 18:34:04 +00:00
Xingyao Wang	7e68de746d	arch: refractor eventstream into async (#2907 ) * deprecating recall action * fix integration tests * fix integration tests * refractor runtime to use async * remove search memory * rename .initialize to .ainit	2024-07-12 20:03:28 +00:00
Xingyao Wang	e45ddeb2a2	arch: deprecating recall action and `search_memory` (#2900 ) * deprecating recall action * fix integration tests * fix integration tests * remove search memory	2024-07-12 19:23:21 +00:00
Xingyao Wang	96b5cb78fd	[Arch] `EventStreamRuntime` supports browser (#2899 ) * fix the case when source and tmp are not on the same device * always build a dev box (with updated source code) for development purpose * tail the log before removing the container * move browse function * support browser!	2024-07-11 22:32:12 +00:00
Xingyao Wang	a47713ecb0	[Arch] Remove supports for Background Commands (#2803 ) * depracting docker exec box * remove doc exec from workflow and docs * remove background commands * Update tests/unit/test_sandbox.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * replace for-loop with assignment * fix integration tests * fix integration tests for shell script * fix integration tests * increase max iter to fix some monologue agent issue * fix integration test again * fix integration tests (seems related to run_user issue) --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-07-06 03:38:05 +08:00
மனோஜ்குமார் பழனிச்சாமி	9919d8e448	Provide [Package already installed] info to LLM (#2642 ) * Provide [Package already installed] info to LLM * regenerate tests	2024-06-27 09:03:54 +00:00
tobitege	41344f0dfe	remove backtick handling from run_ipython (#2347 )	2024-06-09 22:53:06 +00:00
tobitege	a97d0767e9	fix: Backticks get always escaped by runtime; add Ipython test (#2321 ) * added tests related to backticks * updated .gitignore * added extra linter test for #2210 * hotfix for integration test * added test_ipython unit test * added test_ipython unit test * remove draft test from test_ipython.py --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-06-08 21:02:27 +00:00
Binyuan Hui	46dcf4bb3e	Support BIRD benchmark (#2117 ) * update: change timeout from 10 to 30 * update: readme for bird evaluation * Update evaluation/bird/run_infer.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Update evaluation/bird/README.md Co-authored-by: Shimada666 <649940882@qq.com> * Update evaluation/bird/README.md Co-authored-by: Shimada666 <649940882@qq.com> * Update evaluation/bird/run_infer.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Shimada666 <649940882@qq.com> Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>	2024-06-01 11:34:36 +00:00
Leo	78e003caf6	Fix: Avoid bash backtick eval in runtime commands. (#2180 ) Signed-off-by: ifuryst <ifuryst@gmail.com>	2024-06-01 19:19:15 +08:00
Xingyao Wang	01ef90205d	Add CodeActSWEAgent to remove browsing & github + improvements on agentskills (#2105 ) * update swe_bench prompt; use minimal prompt for codeact; * upgrade agentskills and update testcases * update infer prompt * fix cwd * add icl for swebench * also log in_context_example to run infer * remove extra print * change prompt to abs path * update error message to include current file info * change cwd for jupyter if needed * update edit error message * update prompt * improve git get patch * update hint string * default to 50 turns * revert changes from codeact agent and create new CodeActSWEAgent * revert changes to codeact * revert instructions for run infer * revert instructions for run infer * update README * update max iter * add codeact swe agent * fix issue for CodeActSWEAgent * allow specifying max iter in cmdline script * stop printing * Update agenthub/codeact_swe_agent/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Fix prompt regression in jupyter plugin --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-29 21:19:00 -07:00
Xingyao Wang	602ffcdffb	Implement `agentskills` for OpenDevin to helpfully improve edit AND including more useful tools/skills (#1941 ) * add draft for skills * Implement and test agentskills functions: open_file, goto_line, scroll_down, scroll_up, create_file, search_dir, search_file, find_file * Remove new_sample.txt file * add some work from opendevin w/ fixes * Add unit tests for agentskills module * fix some issues and updated tests * add more tests for open * tweak and handle goto_line * add tests for some edge cases * add tests for scrolling * add tests for edit * add tests for search_dir * update tests to use pytest * use pytest --forked to avoid file op unit tests to interfere with each other via global var * update doc based on swe agent tool * update and add tests for find_file and search_file * move agent_skills to plugins * add agentskills as plugin and docs * add agentskill to ssh box and fix sandbox integration * remove extra returns in doc * add agentskills to initial tool for jupyter * support re-init jupyter kernel (for agentskills) after restart * fix print window's issue with indentation and add testcases * add prompt for codeact with the newest edit primitives * modify the way line number is presented (remove leading space) * change prompt to the newest display format * support tracking of costs via metrics * Update opendevin/runtime/plugins/agent_skills/README.md * Update opendevin/runtime/plugins/agent_skills/README.md * implement and add tests for py linting * remove extra text arg for incompatible subprocess ver * remove sample.txt * update test_edits integration tests * fix all integration * Update opendevin/runtime/plugins/agent_skills/README.md * Update opendevin/runtime/plugins/agent_skills/README.md * Update opendevin/runtime/plugins/agent_skills/README.md * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update opendevin/runtime/plugins/agent_skills/agentskills.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * correctly setup plugins for swebench eval * bump swe-bench version and add logging * correctly setup plugins for swebench eval * bump swe-bench version and add logging * Revert "correctly setup plugins for swebench eval" This reverts commit 2bd10556739e2af602ea85371b976390f7c48077. * bump version * remove _AGENT_SKILLS_DOCS * move flake8 to test dep * update poetry.lock * remove extra arg * reduce max iter for eval * update poetry * fix integration tests --------- Co-authored-by: OpenDevin <opendevin@opendevin.ai> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-23 16:04:09 +00:00
Robert Brennan	5bdacf738d	Refactor session management (#1810 ) * refactor session mgmt * defer file handling to runtime * add todo * refactor sessions a bit more * remove messages logic from FE * fix up socket handshake * refactor frontend auth a bit * first pass at redoing file explorer * implement directory suffix * fix up file tree * close agent on websocket close * remove session saving * move file refresh * remove getWorkspace * plumb path/code differently * fix build issues * fix the tests * fix npm build * add session rehydration * fix event serialization * logspam * fix user message rehydration * add get_event fn * agent state restoration * change history tracking for codeact * fix responsiveness of init * fix lint * lint * delint * fix prop * update tests * logspam * lint * fix test * revert codeact * change fileService to use API * fix up session loading * delint * delint * fix integration tests * revert test * fix up access to options endpoints * fix initial files load * delint * fix file initialization * fix mock server * fixl int * fix auth for html * Update frontend/src/i18n/translation.json Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * refactor sessions and sockets * avoid reinitializing the same session * fix reconnect issue * change up intro message * more guards on reinit * rename agent_session * delint * fix a bunch of tests * delint * fix last test * remove code editor context * fix build * fix any * fix dot notation * Update frontend/src/services/api.ts Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * fix up error handling * Update opendevin/server/session/agent.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update opendevin/server/session/agent.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update frontend/src/services/session.ts Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * fix build errs * fix else * add closed state * delint * Update opendevin/server/session/session.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-05-22 18:33:16 +00:00
மனோஜ்குமார் பழனிச்சாமி	b0b44ed467	Auto restarted Jupyter kernel (#1808 ) Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-05-18 08:40:31 +05:30
Leo	e89cc8f19b	Feat: add stream output to exec_run (#1625 ) * Feat: add stream output to exec_run * Using command timeout to control the exec_box's timeout. * add bash -c to source command to compatible for sh. Signed-off-by: ifuryst <ifuryst@gmail.com> * Feat: add stream output to SSHBox execute Signed-off-by: ifuryst <ifuryst@gmail.com> * fix the test case fail. Signed-off-by: ifuryst <ifuryst@gmail.com> * fix the test case import wrong path for method. Signed-off-by: ifuryst <ifuryst@gmail.com> --------- Signed-off-by: ifuryst <ifuryst@gmail.com>	2024-05-16 14:37:49 +00:00
Frank Xu	a84d19f03c	Enable CodeAct agents with browsing, and also enable arbitrary BrowserGym action support (#1807 ) * enable browsing in codeact, and arbitrary browsergym DSL support * fix * fix unit test case * update frontend for the new interactive browsing action * bump ver * Fix integration tests --------- Co-authored-by: OpenDevinBot <bot@opendevin.com>	2024-05-15 11:59:58 -04:00
Robert Brennan	dcb5d1ce0a	Add permanent storage option for EventStream (#1697 ) * add storage classes * add minio * add event stream storage * storage test working * use fixture * event stream test passing * better serialization * factor out serialization pkg * move more serialization * fix tests * fix test * remove __all__ * add rehydration test * add more rehydration test * fix fixture * fix dict init * update tests * lock * regenerate tests * Update opendevin/events/stream.py * revert tests * revert old integration tests * only add fields if present * regen tests * pin pyarrow * fix unit tests * remove cause from memories * revert tests * regen tests	2024-05-14 11:09:45 -04:00
Robert Brennan	beb74a19f6	Use event stream for the runtime (#1776 ) * rebuild PR from scratch * fix max_iter * regenerate tests * cut down on history * Update opendevin/controller/agent_controller.py * regenerate tests * revert swe agent * revert some codeact chagnes * regenerate tests * add source to dict * only add source if not none * try to fix coverage issue * lock * add gevent	2024-05-14 13:35:25 +00:00
Xingyao Wang	4db4a84e2e	Simply Jupyter execution via heredoc (#1728 ) * simply jupyter execution via heredoc * make sure /tmp always exists * add integration test for jupyter exec	2024-05-13 04:57:06 +08:00
Robert Brennan	26d82841d5	Create runtime implementation (#1626 ) * first pass at moving runtime * fix import * remove github refs * remove unnecessary import * remove unnecessary import * add e2b * refactor read and write file ops * remove github test * rm action * revert permissions * regenerate tests * re-delete file operations * regenerate integration tests * Update opendevin/runtime/runtime.py Co-authored-by: Graham Neubig <neubig@gmail.com> * fix ref * add docs * remove logspam --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-05-09 19:04:49 -04:00

32 Commits