OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-26 13:52:43 +08:00

Author	SHA1	Message	Date
Bin Lei	ae5f130881	fix potential flake8 miss checking (#3124) * fix potential flake8 miss checking * Add unit test for edit_file_by_replace function with problematic file * Add unit test for edit_file_by_replace function with problematic file * Add unit test for edit_file_by_replace function with problematic file * Add unit test for edit_file_by_replace function with problematic file * Add unit test for edit_file function with problematic file * Add unit test for edit_file function with problematic file * Add unit test for edit_file function with problematic file * Add unit test for edit_file function with problematic file * Add unit test for edit_file function with problematic file * Update opendevin/runtime/plugins/agent_skills/agentskills.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * add test intention description * fix potential flake8 miss checking * fix potential flake8 miss checking * fix potential flake8 miss checking * fix potential flake8 miss checking * fix potential flake8 miss checking --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: tobitege <tobitege@gmx.de>	2024-08-16 09:23:55 +08:00
Graham Neubig	50b1256c49	Add tests for agent controller (#3357) * Add tests for agent controller * Remove dead code * Remove dead code	2024-08-15 04:58:16 +08:00
Xingyao Wang	b6243bb96b	feat: refactor image building logic into runtime builder (#3395) * feat: refactor building logic into runtime builder * return image name * fix testcases * use runtime builder for eventstream runtime	2024-08-14 20:02:12 +00:00
Graham Neubig	7d331acffa	Handle error observations in codeact (#3383) * Handle error observations in codeact * Remove comments	2024-08-14 13:47:31 +00:00
Graham Neubig	92b19ed1fb	Add unit tests for MemoryCondenser in test_condenser.py (#3379) * Add unit tests for MemoryCondenser in test_condenser.py * Formatting * Fix formatting etc --------- Co-authored-by: opendevin <opendevin@all-hands.dev>	2024-08-14 10:20:30 +02:00
adragos	e0b67ad2f1	feat: add Security Analyzer functionality (#3058) * feat: Initial work on security analyzer * feat: Add remote invariant client * chore: improve fault tolerance of client * feat: Add button to enable Invariant Security Analyzer * [feat] confirmation mode for bash actions * feat: Add Invariant Tab with security risk outputs * feat: Add modal setting for Confirmation Mode * fix: frontend tests for confirmation mode switch * fix: add missing CONFIRMATION_MODE value in SettingsModal.test.tsx * fix: update test to integrate new setting * feat: Initial work on security analyzer * feat: Add remote invariant client * chore: improve fault tolerance of client * feat: Add button to enable Invariant Security Analyzer * feat: Add Invariant Tab with security risk outputs * feat: integrate security analyzer with confirmation mode * feat: improve invariant analyzer tab * feat: Implement user confirmation for running bash/python code * fix: don't display rejected actions * fix: make confirmation show only on assistant messages * feat: download traces, update policy, implement settings, auto-approve based on defined risk * Fix: low risk not being shown because it's 0 * fix: duplicate logs in tab * fix: log duplication * chore: prepare for merge, remove logging * Merge confirmation_mode from OpenDevin main * test: update tests to pass * chore: finish merging changes, security analyzer now operational again * feat: document Security Analyzers * refactor: api, monitor * chore: lint, fix risk None, revert policy * fix: check security_risk for None * refactor: rename instances of invariant to security analyzer * feat: add /api/options/security-analyzers endpoint * Move security analyzer from tab to modal * Temporary fix lock when security analyzer is not chosen * feat: don't show lock at all when security analyzer is not enabled * refactor: - Frontend: * change type of SECURITY_ANALYZER from bool to string * add combobox to select SECURITY_ANALYZER, current options are "invariant" and "" (no security analyzer) * Security is now a modal, lock in bottom right is visible only if there's a security analyzer selected - Backend: * add close to SecurityAnalyzer * instantiate SecurityAnalyzer based on provided string from frontend * fix: update close to be async, to be consistent with other close on resources * fix: max height of modal (prevent overflow) * feat: add logo * small fixes * update docs for creating a security analyzer module * fix linting * update timeout for http client * fix: move security_analyzer config from agent to session * feat: add security_risk to browser actions * add optional remark on combobox * fix: asdict not called on dataclass, remove invariant dependency * fix: exclude None values when serializing * feat: take default policy from invariant-server instead of being hardcoded * fix: check if policy is None * update image name * test: fix some failing runs * fix: security analyzer tests * refactor: merge confirmation_mode and security_analyzer into SecurityConfig. Change invariant error message for docker * test: add tests for invariant parsing actions / observations * fix: python linting for test_security.py * Apply suggestions from code review Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * use ActionSecurityRisk | None instead of Optional * refactor action parsing * add extra check * lint parser.py * test: add field keep_prompt to test_security * docs: add information about how to enable the analyzer * test: Remove trailing whitespace in README.md text --------- Co-authored-by: Mislav Balunovic <mislav.balunovic@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-08-13 11:29:41 +00:00
Xingyao Wang	568e6cdb40	feat: change Jupyter cwd along with "bash" (#3331) * remove unused plugin mixin * change the entire jupyter PWD with bash; print jupyter pwd in obs as well; * remove unused field * remove unused comments * change the entire jupyter PWD with bash; print jupyter pwd in obs as well; * fix runtime tests for jupyter * update integration tests * fix test again --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-08-13 06:08:31 -04:00
Xingyao Wang	bdf6df12c3	fix: pip not available in runtime (#3306) * try to fix pip unavailable * update test case for pip * force rebuild in CI * remove extra symlink * fix newline * added semi-colon to line 31 * Dockerfile.j2: activate env at the end * Revert "Dockerfile.j2: activate env at the end" This reverts commit cf2f5651021fe80d4ab69a35a85f0a35b29dc3d7. * cleanup Dockerfile * switch default python image * remove image agnostic (no longer used) * fix tests * switch to nikolaik/python-nodejs:python3.11-nodejs22 * fix test * fix test * revert docker * update template --------- Co-authored-by: tobitege <tobitege@gmx.de> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-08-09 15:04:43 -04:00
Xingyao Wang	2e6b08db4f	fix: workspace folder permission & app container cannot access client API (#3300) * also copy over pyproject and poetry lock * add missing readme * remove extra git config init since it is already done in client.py * only chown if the /workspace dir does not exist * Revert "remove extra git config init since it is already done in client.py" This reverts commit e8556cd76dcb1720b33f5e06904c56efda2e7d9f. * remove extra git config init since it is already done in client.py * fix test runtime * print container log while reconnecting * print log in more readable format * print log in more readable format * increase lines * clean up sandbox and ssh related stuff * remove ssh hostname * remove ssh hostname * fix docker app cannot access runtime API issue * remove ssh password * API HOSTNAME should be prefixed with SANDBOX * update config * fix typo that breaks the test	2024-08-08 19:28:34 -04:00
Xingyao Wang	a5195b0e65	chore: clean up sandbox and ssh related configs (#3301) * clean up sandbox and ssh related stuff * remove ssh hostname * remove ssh hostname * remove ssh password * update config * fix typo that breaks the test	2024-08-08 22:15:40 +00:00
Graham Neubig	f36639be28	Improve listen.py test coverage (#3289) * Add unit tests for listen.py * Added new tests * Improve test coverage for listen.py * Update tests --------- Co-authored-by: opendevin <opendevin@all-hands.dev>	2024-08-08 14:25:12 +00:00
Xingyao Wang	db302fd33c	fix: dubious ownership when running `git` (#3282) * switch default to eventstream runtime * remove pull docker from makefile * fix unittest * fix file store path * try deprecate server runtime * remove persist sandbox * move file utils * remove server runtime related workflow * remove unused method * attempt to remove the reliance on filestore for BE * fix async for list file * fix list_files to post * fix list files * add suffix to directory * make sure list file returns abs path; make sure other backend endpoints accepts abs path * remove server runtime test workflow * set git config in runtime * chown for workspace in client; use INIT_COMMANDS to maintain all commands that need to be run before bash start; * fix client issue; add test case for git; --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-08-08 13:14:45 +00:00
Xingyao Wang	90d0a62469	(arch) Switch default runtime to EventStream Runtime (#3271) * switch default to eventstream runtime * remove pull docker from makefile * fix unittest * fix file store path * try deprecate server runtime * remove persist sandbox * move file utils * remove server runtime related workflow * remove unused method * attempt to remove the reliance on filestore for BE * fix async for list file * fix list_files to post * fix list files * add suffix to directory * make sure list file returns abs path; make sure other backend endpoints accepts abs path * remove server runtime test workflow * set git config in runtime	2024-08-08 10:11:49 +08:00
Xingyao Wang	b30a2dd87a	completely remove update_source_code (#3280)	2024-08-07 16:57:11 +00:00
Xingyao Wang	bb66f15ff6	[Arch] Streamline EventStream Runtime Image Building Logic (#3259) * remove nocache * simplify runtime build to use hash & always update source * style * try to fix temp folder issue * fix rm tree * create build folder first (to get correct hash), then copy it over to actual build folder * fix assert * fix indentation * fix copy over * add runtime documentation * fix runtime docs * fix typo * Update docs/modules/usage/runtime.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update docs/modules/usage/runtime.md Co-authored-by: Graham Neubig <neubig@gmail.com> --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-08-07 06:09:38 +08:00
Xingyao Wang	31b244f95e	[Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230) * move multi-line bash tests to test_runtime; support multi-line bash for esruntime; * add testcase to handle PS2 prompt * use bashlex for bash parsing to handle multi-line commands; add testcases for multi-line commands * revert ghcr runtime change * Apply stash * fix run as other user; make test async; * fix test runtime for run as od * add run-as-devin to all the runtime tests * handle the case when username is root * move all run-as-devin tests from sandbox; only tests a few cases on different user to save time; * move over multi-line echo related tests to test_runtime * fix user-specific jupyter by fixing the pypoetry virtualenv folder * make plugin's init async; chdir at initialization of jupyter plugin; move ipy simple testcase to test runtime; * support agentskills import in move tests for jupyter pwd tests; overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter; make agentskills read env var lazily, in case env var is updated; * fix ServerRuntime agentskills issue * move agnostic image test to test_runtime * merge runtime tests in CI * fix enable auto lint as env var * update warning message * update warning message * test for different container images * change parsing output as debug * add exception handling for update_pwd_decorator * fix unit test indentation * add plugins as default input to Runtime class; remove init_sandbox_plugins; implement add_env_var (include jupyter) in the base class; * fix server runtime auto lint * Revert "add exception handling for update_pwd_decorator" This reverts commit 2b668b1506e02145cb8f87e321aad62febca3d50. * tries to print debugging info for agentskills * explicitly setting uid (try fix permission issue) * Revert "tries to print debugging info for agentskills" This reverts commit 8be4c86756f0e3fc62957b327ba2ac4999c419de.
* set sandbox user id during testing to hopefully fix the permission issue * add browser tools for server runtime * try to debug for old pwd * update debug cmd * only test agnostic runtime when TEST_RUNTIME is Server * fix temp dir mkdir * load TEST_RUNTIME at the beginning * remove ipython tests * only log to file when DEBUG * default logging to project root * temporarily remove log to file * fix LLM logger dir * fix logger * make set pwd an optional aux action * fix prev pwd * fix infinite recursion * simplify * do not import the whole od library to avoid logger folder by jupyter * fix browsing * increase timeout * attempt to fix agentskills yet again * clean up in testcases, since CI may be run as non-root * add _cause attribute for event.id * remove parent * add a bunch of debugging statements again for CI :( * fix temp_dir fixture * change all temp dir to follow pytest's tmp_path_factory * remove extra bracket * clean up error printing a bit * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * add typing for tmp dir fixture * clear the directory before running the test to avoid weird CI temp dir * remove agnostic test case for server runtime * Revert "remove agnostic test case for server runtime" This reverts commit 30e2181c3fc1410e69596c2dcd06be01f1d016b3.
* disable agnostic tests in CI * fix test * make sure plugin arg is not passed when no plugin is specified; remove redundant on_event function; * move mock prompt * rename runtime * remove extra logging * refactor run_controller's interface; support multiple runtime for integration test; filter out hostname for prompt * uncomment other tests * pass the right runtime to controller * log runtime when start * uncomment tests * improve symbol filters * add integration test prompts that seemed ok * add integration test workflow * add python3 to default ubuntu image * symlink python and fix permission to jupyter pip * add retry for jupyter execute server * fix jupyter pip install; add post-process for jupyter pip install; simplify init by add agent_skills path to PYTHONPATH; add testcase to tests jupyter pip install; * fix bug * use ubuntu:22.04 for eventstream integration tests * add todo * update testcase * remove redundant code * fix unit test * reduce dependency for runtime * try making llama-index an optional dependency that's not installed by default * remove pip install since it seemed not needed * log ipython execution; await write message since it returns a future * update ipy testcase * do not install llama-index in CI * do not install llama-index in the app docker as well * set sandbox container image in the integration test script * log plugins & env var for runtime * update conftest for sha256 * add git * remove all non-alphanumeric characters * add working ipy module tests!
* default to use host network * remove is_async from browser to make things a little more reliable; retry loading browser when error; * add sleep to wait a bit for http server * kill http server before regenerate browsing tests * fix browsing * only set sandbox container image if undefined * skip empty config value * update evaluation to use the latest run_controller * revert logger in execute_server to be compatible with server runtime * revert logging level to fix jupyter * set logger level * revert the logging * chmod for workspace to fix permission * support getting timeout from action * update test for server runtime * try to fix file permission * fix test_cmd_run_action_serialization_deserialization test (added timeout) * poetry: pip 24.2, torch 2.2.2 * revert adding pip to pyproject.toml * add build to dependencies in pyproject.toml * forgot poetry lock --no-update * fix a DelegatorAgent prompt_002.log (timeout) * fix a DelegatorAgent prompt_003.log (timeout) * couple more timeout attribs in prompt files * some more prompt files * prompts galore * add clarification comment for timeout * default timeout to config * add assert * update integration tests for eventstream * update integration tests * fix timeout for action<->dict * remove redundant on_event * default to use instance image * update run_controller interface * add logging for copy * refactor swe_bench for the new design * fix action execution timeout * update lock * remove build sandbox locally * fix runtime * use plain for-loop for single process * remove extra print * get swebench inference working * print whole `test_result` dict * got swebench patch post-process working * update swe-bench evaluation readme * refactor using shared reset_logger function * move messy swebench prompt to a different file * support the ability to specify whether to keep prompt * support the ability to specify whether to keep prompt * fix dockerfile * fix import and remove unnecessary strip logic * fix action serialization * get agentbench running * remove extra ls for agent bench * fix agentbench metric * factor out common documentation for eval * update biocoder doc * remove swe_env_box since it is no longer needed * get biocoder working * add func timeout for bird * fix jupyter pwd with ~ as user name * fix jupyter pwd with ~ as user name * get bird working * get browsing evaluation working * make eda runnable * fix id column * fix eda run_infer * unify eval output using a structured format; make swebench compatible with that format; update client source code for every swebench run; do not inject testcmd for swebench * standardize existing benchs for the new eval output * set update source code = true * get gaia standardized * fix gaia * gorilla refactored but stuck at language.so to test * refactor and make gpqa work * refactor humanevalfix and get it working * refactor logic reasoning and get it working * refactor browser env so it works with eventstream runtime for eval * add initial version of miniwob refactor * fix browsergym environment * get miniwob working!!
* allowing injecting additional dependency to OD runtime docker image * allowing injecting additional dependency to OD runtime docker image * support logic reasoning with pre-injected dependency * get mint working * update runtime build * fix mint docker * add test for keep_prompt; add missing await close for some tests * update integration tests for eventstream runtime * fix integration tests for server runtime * refactor ml bench and toolqa * refactor webarena * fix default factory * Update run_infer.py * add APIError to retry * increase timeout for swebench * make sure to hide api key when dump eval output * update the behavior of put source code to put files instead of tarball * add dirhash to dependency * sendintr when timeout * fix dockerfile copy * reduce timeout * use dirhash to avoid repeat building for update source * fix runtime_build testcase * add dir_hash to docker build pipeline * revert api error * update poetry lock * add retries for swebench run infer * fix git patch * update poetry lock * adjust config order * fix mount volumes * enforce all eval to use "instance_id" * remove file store from runtime * make file_store public inside eventstream * move the runtime logic inside `main` out * support using async function for process_instance_fn * refactor run_infer with the create_time * fix file store * Update evaluation/toolqa/utils.py Co-authored-by: Graham Neubig <neubig@gmail.com> * fix typo --------- Co-authored-by: tobitege <tobitege@gmx.de> Co-authored-by: super-dainiu <78588128+super-dainiu@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-08-06 17:21:45 +00:00
Xingyao Wang	a69120d399	[Arch] Use hash to avoid repeat building `EventStreamRuntime` image (#3243) * update the behavior of put source code to put files instead of tarball * add dirhash to dependency * fix dockerfile copy * use dirhash to avoid repeat building for update source * fix runtime_build testcase * add dir_hash to docker build pipeline * add additional tests for source directory * add comment * clear the assertion by explicitly checking existing files * also assert od is a dir	2024-08-05 03:13:32 +00:00
tobitege	abec52abfe	(fix) Revert #3233; more logging in runtimes (#3236) * ServerRuntime: config copy in init * revert #3233 but more logging * get_box_classes: reset order back to previous version * 3 logging commands switched to debug (were info) * runtimes debug output of config on initialization * removed unneeded logger message from _init_container	2024-08-04 19:13:37 +00:00
Xingyao Wang	6a12a9f83c	[Arch, Eval] Allowing injecting additional dependency to OD runtime docker image (#3237) * allowing injecting additional dependency to OD runtime docker image * update runtime build * make `extra_deps` optional str | None	2024-08-04 17:38:56 +00:00
Xingyao Wang	62ce183c2d	[Agent Action] Support the ability to specify whether to keep prompt for CmdRun (#3218) * support the ability to specify whether to keep prompt * fix action serialization * fix jupyter pwd with ~ as user name * add test for keep_prompt; add missing await close for some tests * update integration tests for eventstream runtime * fix integration tests for server runtime	2024-08-04 20:30:25 +08:00
Kaushik Deka	415843476c	Feat: Add Vision Input Support for LLM with Vision Capabilities (#2848) * add image feature * fix-linting * check model support for images * add comment * Add image support to other models * Add images to chat * fix linting * fix test issues * refactor variable names and import * fix tests * fix chat message tests * fix linting * add pydantic class message * use message * remove redundant comments * remove redundant comments * change Message class * remove unintended change * fix integration tests using regenerate.sh * rename image_bas64 to images_url, fix tests * rename Message.py to message, change reminder append logic, add unit tests * remove comment, fix error to merge * codeact_swe_agent * fix f string * update eventstream integration tests * add missing if check in codeact_swe_agent * update integration tests * Update frontend/src/components/chat/ChatInput.tsx * Update frontend/src/components/chat/ChatInput.tsx * Update frontend/src/components/chat/ChatInput.tsx * Update frontend/src/components/chat/ChatInput.tsx * Update frontend/src/components/chat/ChatMessage.tsx --------- Co-authored-by: tobitege <tobitege@gmx.de> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2024-08-04 02:26:22 +08:00
Xingyao Wang	b7061f4497	[Eval, Browser] Refactor Browser Env so it works with `EventStreamRuntime` for Browsing Evaluation (#3235) * refactor browser env so it works with eventstream runtime for eval * fix browsergym environment	2024-08-03 15:06:37 +00:00
Xingyao Wang	69ecde640b	Update integration tests README.md (#3227) * Update README.md * lint	2024-08-02 17:29:11 +00:00
Xingyao Wang	4f0a454ed6	[Arch] Support integration tests using EventStream Runtime (#3184) * Remove global config from memory * Remove runtime global config * Remove from storage * Remove global config * Fix event stream tests * Fix sandbox issue * Change config * Removed transferred tests * Add swe env box * Fixes on testing * Fixed some tests * Merge with stashed changes * Fix typing * Fix ipython test * Revive function * Make temp_dir fixture * Remove test to avoid circular import * fix eventstream filestore for test_runtime * fix parse arg issue that cause integration test to fail * support swebench pull from custom namespace * add back simple tests for runtime * move multi-line bash tests to test_runtime; support multi-line bash for esruntime; * add testcase to handle PS2 prompt * use bashlex for bash parsing to handle multi-line commands; add testcases for multi-line commands * revert ghcr runtime change * Apply stash * fix run as other user; make test async; * fix test runtime for run as od * add run-as-devin to all the runtime tests * handle the case when username is root * move all run-as-devin tests from sandbox; only tests a few cases on different user to save time; * move over multi-line echo related tests to test_runtime * fix user-specific jupyter by fixing the pypoetry virtualenv folder * make plugin's init async; chdir at initialization of jupyter plugin; move ipy simple testcase to test runtime; * support agentskills import in move tests for jupyter pwd tests; overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter; make agentskills read env var lazily, in case env var is updated; * fix ServerRuntime agentskills issue * move agnostic image test to test_runtime * merge runtime tests in CI * fix enable auto lint as env var * update warning message * update warning message * test for different container images * change parsing output as debug * add exception handling for update_pwd_decorator * fix unit test indentation * add plugins as default input to Runtime class; remove init_sandbox_plugins; implement add_env_var (include jupyter) in the base class; * fix server runtime auto lint * Revert "add exception handling for update_pwd_decorator" This reverts commit 2b668b1506e02145cb8f87e321aad62febca3d50. * tries to print debugging info for agentskills * explicitly setting uid (try fix permission issue) * Revert "tries to print debugging info for agentskills" This reverts commit 8be4c86756f0e3fc62957b327ba2ac4999c419de. * set sandbox user id during testing to hopefully fix the permission issue * add browser tools for server runtime * try to debug for old pwd * update debug cmd * only test agnostic runtime when TEST_RUNTIME is Server * fix temp dir mkdir * load TEST_RUNTIME at the beginning * remove ipython tests * only log to file when DEBUG * default logging to project root * temporarily remove log to file * fix LLM logger dir * fix logger * make set pwd an optional aux action * fix prev pwd * fix infinite recursion * simplify * do not import the whole od library to avoid logger folder by jupyter * fix browsing * increase timeout * attempt to fix agentskills yet again * clean up in testcases, since CI may be run as non-root * add _cause attribute for event.id * remove parent * add a bunch of debugging statements again for CI :( * fix temp_dir fixture * change all temp dir to follow pytest's tmp_path_factory * remove extra bracket * clean up error printing a bit * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * add typing for tmp dir fixture * clear the directory before running the test to avoid weird CI temp dir * remove agnostic test case for server runtime * Revert "remove agnostic test case for server runtime" This reverts commit 30e2181c3fc1410e69596c2dcd06be01f1d016b3.
* disable agnostic tests in CI * fix test * make sure plugin arg is not passed when no plugin is specified; remove redundant on_event function; * move mock prompt * rename runtime * remove extra logging * refactor run_controller's interface; support multiple runtime for integration test; filter out hostname for prompt * uncomment other tests * pass the right runtime to controller * log runtime when start * uncomment tests * improve symbol filters * add integration test prompts that seemed ok * add integration test workflow * add python3 to default ubuntu image * symlink python and fix permission to jupyter pip * add retry for jupyter execute server * fix jupyter pip install; add post-process for jupyter pip install; simplify init by add agent_skills path to PYTHONPATH; add testcase to tests jupyter pip install; * fix bug * use ubuntu:22.04 for eventstream integration tests * add todo * update testcase * remove redundant code * fix unit test * reduce dependency for runtime * try making llama-index an optional dependency that's not installed by default * remove pip install since it seemed not needed * log ipython execution; await write message since it returns a future * update ipy testcase * do not install llama-index in CI * do not install llama-index in the app docker as well * set sandbox container image in the integration test script * log plugins & env var for runtime * update conftest for sha256 * add git * remove all non-alphanumeric characters * add working ipy module tests!
* default to use host network * remove is_async from browser to make things a little more reliable; retry loading browser when error; * add sleep to wait a bit for http server * kill http server before regenerate browsing tests * fix browsing * only set sandbox container image if undefined * skip empty config value * update evaluation to use the latest run_controller * revert logger in execute_server to be compatible with server runtime * revert logging level to fix jupyter * set logger level * revert the logging * chmod for workspace to fix permission * support getting timeout from action * update test for server runtime * try to fix file permission * fix test_cmd_run_action_serialization_deserialization test (added timeout) * poetry: pip 24.2, torch 2.2.2 * revert adding pip to pyproject.toml * add build to dependencies in pyproject.toml * forgot poetry lock --no-update * fix a DelegatorAgent prompt_002.log (timeout) * fix a DelegatorAgent prompt_003.log (timeout) * couple more timeout attribs in prompt files * some more prompt files * prompts galore * add clarification comment for timeout * default timeout to config * add assert * update integration tests for eventstream * update integration tests * fix timeout for action<->dict * remove redundant on_event * fix action execution timeout * update lock --------- Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: tobitege <tobitege@gmx.de>	2024-08-01 22:07:39 +00:00
tobitege	a4cb880699	(feat) LLM class: added acompletion and streaming + unit test (#3202) * LLM class: added acompletion and streaming, unit test test_acompletion.py * LLM: cleanup of self.config defaults and their use * added set_missing_attributes to LLMConfig * move default checker up	2024-08-01 22:41:40 +02:00
Xingyao Wang	286f10053e	[arch] Implement `copy_to` for Runtime (#3211) * add copy to * implement for ServerRuntime * implement copyto for runtime (required by eval); add tests for copy to * fix exist file check * unify copy_to_behavior and fix stuff	2024-08-02 02:46:11 +08:00
Xingyao Wang	2e60d25eae	[Agent, LLM] Make sure codeact agent produce message in u/a/u/a order (#3193) * make sure codeact agent produce message in u/a/u/a order * integration tests * sync message changes to codeact swe * fix integration tests --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-08-02 00:17:53 +08:00
Engel Nyst	21ea9953b3	don't use realpath with non-existent files (#3200)	2024-08-01 01:11:22 +02:00
tobitege	938ed027c2	(fix) test_runtime.py parametrization for box_class (#3186) * fix test_runtime.py parametrization; prevent duplicate test runs * trivial file change to unblock stuck CI workflow * fix print_method_name fixture in test_runtime (yield was missing) * revert wrong param fixtures	2024-08-01 01:30:10 +08:00
Graham Neubig	a562a7ac7d	Add unit tests for LLM init function (#3188) * Add unit tests for LLM init function * Fix formatting --------- Co-authored-by: OpenDevin <opendevin@all-hands.dev>	2024-07-31 16:28:50 +02:00
Xingyao Wang	bd68249fba	[Arch] Test `EventStreamRuntime` to ensure its feature parity with `ServerRuntime` (#3157 ) * Remove global config from memory * Remove runtime global config * Remove from storage * Remove global config * Fix event stream tests * Fix sandbox issue * Change config * Removed transferred tests * Add swe env box * Fixes on testing * Fixed some tests * Merge with stashed changes * Fix typing * Fix ipython test * Revive function * Make temp_dir fixture * Remove test to avoid circular import * fix eventstream filestore for test_runtime * fix parse arg issue that cause integration test to fail * support swebench pull from custom namespace * add back simple tests for runtime * move multi-line bash tests to test_runtime; support multi-line bash for esruntime; * add testcase to handle PS2 prompt * use bashlex for bash parsing to handle multi-line commands; add testcases for multi-line commands * revert ghcr runtime change * Apply stash * fix run as other user; make test async; * fix test runtime for run as od * add run-as-devin to all the runtime tests * handle the case when username is root * move all run-as-devin tests from sandbox; only tests a few cases on different user to save time; * move over multi-line echo related tests to test_runtime * fix user-specific jupyter by fixing the pypoetry virtualenv folder * make plugin's init async; chdir at initialization of jupyter plugin; move ipy simple testcase to test runtime; * support agentskills import in move tests for jupyter pwd tests; overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter; make agentskills read env var lazily, in case env var is updated; * fix ServerRuntime agentskills issue * move agnostic image test to test_runtime * merge runtime tests in CI * fix enable auto lint as env var * update warning message * update warning message * test for different container images * change parsing output as debug * add exception handling for update_pwd_decorator * fix unit test indentation * add plugins as default input to Runtime class; remove init_sandbox_plugins; implement add_env_var (include jupyter) in the base class; * fix server runtime auto lint * Revert "add exception handling for update_pwd_decorator" This reverts commit 2b668b1506e02145cb8f87e321aad62febca3d50. * tries to print debugging info for agentskills * explictly setting uid (try fix permission issue) * Revert "tries to print debugging info for agentskills" This reverts commit 8be4c86756f0e3fc62957b327ba2ac4999c419de. * set sandbox user id during testing to hopefully fix the permission issue * add browser tools for server runtime * try to debug for old pwd * update debug cmd * only test agnostic runtime when TEST_RUNTIME is Server * fix temp dir mkdir * load TEST_RUNTIME at the beginning * remove ipython tests * only log to file when DEBUG * default logging to project root * temporarily remove log to file * fix LLM logger dir * fix logger * make set pwd an optional aux action * fix prev pwd * fix infinity recursion * simplify * do not import the whole od library to avoid logger folder by jupyter * fix browsing * increase timeout * attempt to fix agentskills yet again * clean up in testcases, since CI maybe run as non-root * add _cause attribute for event.id * remove parent * add a bunch of debugging statement again for CI :( * fix temp_dir fixture * change all temp dir to follow pytest's tmp_path_factory * remove extra bracket * clean up error printing a bit * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * add typing for tmp dir fixture * clear the directory before running the test to avoid weird CI temp dir * remove agnostic test case for server runtime * Revert "remove agnostic test case for server runtime" This reverts commit 30e2181c3fc1410e69596c2dcd06be01f1d016b3. * disable agnostic tests in CI * fix test --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-07-31 04:30:59 +08:00
tobitege	2533efabbb	(fix) split_bash_commands replaced; temp_dir fixture fix in some tests (#3160 ) * split_bash_commands replaced; temp_dir fixture fix in some tests * tweak test_runtime * skip 2 tests in test_runtime that need fixing in extra PR * reverting bash parsing changes and re-enabled tests * missed to revert a changed assert in test_runtime.py	2024-07-29 17:05:58 +00:00
Xingyao Wang	b1ea204c5b	Migrate multi-line-bash-related sandbox tests into runtime tests and fix multi-line issue (#3128 ) * Remove global config from memory * Remove runtime global config * Remove from storage * Remove global config * Fix event stream tests * Fix sandbox issue * Change config * Removed transferred tests * Add swe env box * Fixes on testing * Fixed some tests * Merge with stashed changes * Fix typing * Fix ipython test * Revive function * Make temp_dir fixture * Remove test to avoid circular import * fix eventstream filestore for test_runtime * fix parse arg issue that cause integration test to fail * support swebench pull from custom namespace * add back simple tests for runtime * move multi-line bash tests to test_runtime; support multi-line bash for esruntime; * add testcase to handle PS2 prompt * use bashlex for bash parsing to handle multi-line commands; add testcases for multi-line commands * revert ghcr runtime change --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-07-27 20:12:57 +00:00
Graham Neubig	275ea706cf	Remove remaining global config (#3099 ) * Remove global config from memory * Remove runtime global config * Remove from storage * Remove global config * Fix event stream tests * Fix sandbox issue * Change config * Removed transferred tests * Add swe env box * Fixes on testing * Fixed some tests * Fix typing * Fix ipython test * Revive function * Make temp_dir fixture * Remove test to avoid circular import	2024-07-26 18:43:32 +00:00
Xingyao Wang	1f6e86c932	Fix(test,CI): runtime build tests (#3126 ) * fix runtime build test * get runtime_build test to run in CI * move test involving docker from `test_ipython` to `test_sandbox`	2024-07-26 22:53:01 +08:00
tobitege	d0217b84ef	test_runtime: run tests per runtime, not alternating (#3103 )	2024-07-26 03:01:50 +08:00
Xingyao Wang	405c8a0456	[Arch] Add runtime image build CI & clean up runtime build using `jinja2` template (#3055 ) * test_runtime_client.py to test _execute_bash() * runtime_build and runtime tweaks * fix in docker script * revert bash changes * use sandbox_config.update_source_code to control source code update * add od_version to the sandbox tag * add doc instruction for update source code * do not remove whole poetry folder; add mamba clean * add missing newlines * cleanup runtime dockerfile into jinja template * make prep temp file a separate function; make that function accessible through cli * modify `runtime_build.py` so it can generate directory for building docker img * add dockerfile and sdist of runtime to gitignore since it will be dynamically generated * add runtime to build * do not rebuild new image when an `od_runtime` is provided * use default container_image for testing if possible * move runtime tests to ghcr runtime workflow * update docker base dir for runtime * fix unittest * fix image name * fix image name for test case * rename to make it consistent --------- Co-authored-by: tobitege <tobitege@gmx.de>	2024-07-24 21:56:12 +08:00
Boxuan Li	445f290beb	Validate to_replace in edit_file_by_replace AgentSkill (#3073 ) * Validate to_replace in edit_file_by_replace AgentSkill * Remove redundant replace reminder prompt * Add unit tests * Fix prompt	2024-07-22 21:01:35 -07:00
Graham Neubig	4099e48122	Removed config from agent controller (#3038 ) * Removed config from agent controller * Fix tests * Increase budget * Update tests * Update prompts * Add missing prompt * Fix mistaken deletions * Fix browsing test * Fixed browse tests	2024-07-22 17:42:57 +00:00
Graham Neubig	04877f8caf	Remove global config from tests (#3052 )	2024-07-20 23:07:09 -04:00
Boxuan Li	be6e6e3add	Bug fix: Metrics not accumulated across agent delegation (#3012 ) * Add test to reproduce cost miscalculation bug * Fix metrics bug * Copy metrics upon AgentRejectAction	2024-07-20 04:05:05 +00:00
Graham Neubig	3a21198424	Remove monologue agent (#3036 ) * Remove monologue agent * Fixes	2024-07-19 19:25:05 +00:00
jigsawlabs-student	fa6c12473e	#2220 , integrated aider style linting, currently passes related o… (#2489 ) * WIP for integrate aider linter, see OpenDevin#2220 Updated aider linter to: * Always return text and line numbers * Moved extract line number more consistently * Changed pylint to stop after first linter detects errors Updated agentskills * To get back a LintResult object and then use lines and text for error message and related line number * Moved code for extracting line number to aider linter Tests: * Added additional unit tests for aider to test for * Return values from lint failures * Confirm linter works for non-configured languages like Ruby * move to agent_skills, fixes not seeing skills error * format/lint to new code, fix failing tests, remove unused code from aider linter * small changes (remove litellm, fix readme typo) * fix failing sandbox test * keep, change dumping of metadata * WIP for integrate aider linter, see OpenDevin#2220 Updated aider linter to: * Always return text and line numbers * Moved extract line number more consistently * Changed pylint to stop after first linter detects errors Updated agentskills * To get back a LintResult object and then use lines and text for error message and related line number * Moved code for extracting line number to aider linter Tests: * Added additional unit tests for aider to test for * Return values from lint failures * Confirm linter works for non-configured languages like Ruby * move to agent_skills, fixes not seeing skills error * format/lint to new code, fix failing tests, remove unused code from aider linter * remove duplication of tree-sitter, grep-ast and update poetry.lock * revert to main branch poetry.lock version * only update necessary package * fix jupyter kernel wrong interpreter issue (only for swebench) * fix failing lint tests * update syntax error checks for flake * update poetry lock file * update poetry.lock file, which update content-hash * add grep ast * remove extra stuff caused by merge * update pyproject * remove extra pytest fixture, ruff styling fixes * lint files * update poetry.lock file --------- Co-authored-by: Jeff Katzy <jeffreyerickatz@gmail.com> Co-authored-by: yufansong <yufan@risingwave-labs.com> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: tobitege <tobitege@gmx.de>	2024-07-19 21:58:54 +08:00
Xingyao Wang	ac27ded81f	Fix: handle the case where env var is empty (#3016 ) * handle the case where env var is empty * fix logging * include obs content in logging * change to add_env_vars	2024-07-19 13:51:06 +00:00
Xingyao Wang	ff6ddc831f	fix: runtime test for mac (#3005 ) * move use_host_network to sandbox config * fix test runtime tests * fix kwargs to make it clearer	2024-07-19 03:03:55 +00:00
Boxuan Li	9d41314d1a	State: Add local_iteration attribute (#2990 ) * Add local_iteration state attribute * Fix typos --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-07-18 14:49:19 +00:00
tobitege	5a5713009f	INT: prevent error on repeat integration tests after failed test(s) (#2935 ) * Integration tests: prevent File not found error * forgot to remove debug calls in regenerate.sh	2024-07-18 06:29:15 +02:00
Xingyao Wang	f80ecec772	[Arch] Add tests for `EventStreamRuntime` and fix bash parsing (#2933 ) * deprecating recall action * fix integration tests * fix integration tests * refractor runtime to use async * remove search memory * rename .initialize to .ainit * draft of runtime image building (separate from img agnostic) * refractor runtime build into separate file and add unit tests for it * fix image agnostic tests * move `split_bash_commands` into a separate util file * fix bash pexcept parsing for env * refractor add_env_var from sandbox to runtime; add test runtime for env var, remove it from sandbox; * remove unclear comment * capture broader error * make `add_env_var` handle multiple export at the same time * add multi env var test * fix tests with new config * make runtime tests a separate ci to avoid full disk * Update Runtime README with architecture diagram and detailed explanations * update test * remove dependency of global config in sandbox test * fix sandbox typo * runtime tests does not need ghcr build now * remove download runtime img * remove dependency of global config in sandbox test * fix sandbox typo * try to free disk before running the tests * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * try to reduce code duplication * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update opendevin/runtime/client/README.md Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * cleanup before setup * temporarily remove this enable lint test since env var are now handled by runtime * linter --------- Co-authored-by: OpenDevin <opendevin@all-hands.dev> Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>	2024-07-18 06:10:45 +08:00
Xingyao Wang	cf3d2298da	Refactor: remove the use of global variable in test_sandbox (#2985 ) * remove dependency of global config in sandbox test * fix sandbox typo * try to reduce code duplication	2024-07-17 20:42:40 +00:00
Graham Neubig	c897791024	Refactor LLM config (#2953 ) * Add max_message_chars to LLM * Refactor LLM config * Fix tests * Made some functions class functions * Fix regression * Fixed comments	2024-07-17 09:16:04 -04:00


200 Commits