OpenHands

Mirror of https://github.com/OpenHands/OpenHands.git, synced 2025-12-26 05:48:36 +08:00

Author	SHA1	Message	Date
John Eismeier	967e9e1891	Propose fix some typos and ignore emacs backup files (#11701) Signed-off-by: John E <jeis4wpi@outlook.com>	2025-11-11 09:20:42 -05:00
Xingyao Wang	ca424ec15d	[agent] Add LLM risk analyzer (#9349) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: llamantino <213239228+llamantino@users.noreply.github.com> Co-authored-by: mamoodi <mamoodiha@gmail.com> Co-authored-by: Tim O'Farrell <tofarr@gmail.com> Co-authored-by: Hiep Le <69354317+hieptl@users.noreply.github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ryan H. Tran <descience.thh10@gmail.com> Co-authored-by: Neeraj Panwar <49247372+npneeraj@users.noreply.github.com> Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com> Co-authored-by: Insop <1240382+insop@users.noreply.github.com> Co-authored-by: test <test@test.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Zhonghao Jiang <zhonghao.J@outlook.com> Co-authored-by: Ray Myers <ray.myers@gmail.com>	2025-08-22 14:02:36 +00:00
Xingyao Wang	4507a25b85	Evaluation: redirect sessions to repo-local .eval_sessions via helper; apply across entrypoints; add tests (#10540) Co-authored-by: openhands <openhands@all-hands.dev>	2025-08-22 13:34:02 +00:00
Graham Neubig	426350224b	Add Playwright-based end-to-end testing workflow (#10116) Co-authored-by: openhands <openhands@all-hands.dev>	2025-08-14 18:59:06 +00:00
ManOwnFire	9e72b69cf8	fix (cli): issue 9386 - show settings.json path in /settings (#9481)	2025-07-10 14:59:06 +00:00
tofarr	3c977bd715	Fix for nested mount volumes (#8888)	2025-06-04 09:30:57 -06:00
Engel Nyst	3c51600260	Add vscode rules/ignores to .gitignore (#8755)	2025-05-28 15:42:11 +02:00
Kent Johnson	35d2281717	feat: Add dev container (#8589)	2025-05-26 21:35:27 -04:00
Nan Jiang	463d4e9a46	eval: add commit0 benchmark (#5153) Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-11-22 19:49:45 +00:00
Graham Neubig	a753babb7a	Integrate OpenHands resolver into main repository (#4964) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Rohit Malhotra <rohitvinodmalhotra@gmail.com>	2024-11-14 09:45:46 -05:00
Ziru "Ron" Chen	db4e1dbbec	[eval] Add ScienceAgentBench. (#4645) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-11-01 02:30:55 +08:00
tobitege	6471d0f94d	.gitignore: ignore all `node_modules` folders (#4491)	2024-10-20 09:17:45 +08:00
sp.wack	bfdd7fd620	feat(frontend): UI overhaul (#3604)	2024-10-07 23:15:38 +04:00
Xingyao Wang	47774e60b0	chore: remove deprecated dockerfile (#4079)	2024-09-27 15:03:23 +00:00
tobitege	c32cec7f89	(enh) send status messages to UI during startup (#3771) Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Robert Brennan <contact@rbren.io> Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2024-09-24 18:46:58 +00:00
Xingyao Wang	8f0f764a85	fix: CI docker image push (#3476) * fix ghcr app * fix ghcr runtime push * rename od_runtime to runtime	2024-08-19 20:53:28 +00:00
Xingyao Wang	31b244f95e	[Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230) * move multi-line bash tests to test_runtime; support multi-line bash for esruntime; * add testcase to handle PS2 prompt * use bashlex for bash parsing to handle multi-line commands; add testcases for multi-line commands * revert ghcr runtime change * Apply stash * fix run as other user; make test async; * fix test runtime for run as od * add run-as-devin to all the runtime tests * handle the case when username is root * move all run-as-devin tests from sandbox; only tests a few cases on different user to save time; * move over multi-line echo related tests to test_runtime * fix user-specific jupyter by fixing the pypoetry virtualenv folder * make plugin's init async; chdir at initialization of jupyter plugin; move ipy simple testcase to test runtime; * support agentskills import in move tests for jupyter pwd tests; overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter; make agentskills read env var lazily, in case env var is updated; * fix ServerRuntime agentskills issue * move agnostic image test to test_runtime * merge runtime tests in CI * fix enable auto lint as env var * update warning message * update warning message * test for different container images * change parsing output as debug * add exception handling for update_pwd_decorator * fix unit test indentation * add plugins as default input to Runtime class; remove init_sandbox_plugins; implement add_env_var (include jupyter) in the base class; * fix server runtime auto lint * Revert "add exception handling for update_pwd_decorator" This reverts commit 2b668b1506e02145cb8f87e321aad62febca3d50. * tries to print debugging info for agentskills * explictly setting uid (try fix permission issue) * Revert "tries to print debugging info for agentskills" This reverts commit 8be4c86756f0e3fc62957b327ba2ac4999c419de. * set sandbox user id during testing to hopefully fix the permission issue * add browser tools for server runtime * try to debug for old pwd * update debug cmd * only test agnostic runtime when TEST_RUNTIME is Server * fix temp dir mkdir * load TEST_RUNTIME at the beginning * remove ipython tests * only log to file when DEBUG * default logging to project root * temporarily remove log to file * fix LLM logger dir * fix logger * make set pwd an optional aux action * fix prev pwd * fix infinity recursion * simplify * do not import the whole od library to avoid logger folder by jupyter * fix browsing * increase timeout * attempt to fix agentskills yet again * clean up in testcases, since CI maybe run as non-root * add _cause attribute for event.id * remove parent * add a bunch of debugging statement again for CI :( * fix temp_dir fixture * change all temp dir to follow pytest's tmp_path_factory * remove extra bracket * clean up error printing a bit * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization * add typing for tmp dir fixture * clear the directory before running the test to avoid weird CI temp dir * remove agnostic test case for server runtime * Revert "remove agnostic test case for server runtime" This reverts commit 30e2181c3fc1410e69596c2dcd06be01f1d016b3. * disable agnostic tests in CI * fix test * make sure plugin arg is not passed when no plugin is specified; remove redundant on_event function; * move mock prompt * rename runtime * remove extra logging * refactor run_controller's interface; support multiple runtime for integration test; filter out hostname for prompt * uncomment other tests * pass the right runtime to controller * log runtime when start * uncomment tests * improve symbol filters * add intergration test prompts that seemd ok * add integration test workflow * add python3 to default ubuntu image * symlink python and fix permission to jupyter pip * add retry for jupyter execute server * fix jupyter pip install; add post-process for jupyter pip install; simplify init by add agent_skills path to PYTHONPATH; add testcase to tests jupyter pip install; * fix bug * use ubuntu:22.04 for eventstream integration tests * add todo * update testcase * remove redundant code * fix unit test * reduce dependency for runtime * try making llama-index an optional dependency that's not installed by default * remove pip install since it seemd not needed * log ipython execution; await write message since it returns a future * update ipy testcase * do not install llama-index in CI * do not install llama-index in the app docker as well * set sandbox container image in the integration test script * log plugins & env var for runtime * update conftest for sha256 * add git * remove all non-alphanumeric chalracters * add working ipy module tests! * default to use host network * remove is_async from browser to make thing a little more reliable; retry loading browser when error; * add sleep to wait a bit for http server * kill http server before regenerate browsing tests * fix browsing * only set sandbox container image if undefined * skip empty config value * update evaluation to use the latest run_controller * revert logger in execute_server to be compatible with server runtime * revert logging level to fix jupyter * set logger level * revert the logging * chmod for workspace to fix permission * support getting timeout from action * update test for server runtime * try to fix file permission * fix test_cmd_run_action_serialization_deserialization test (added timeout) * poetry: pip 24.2, torch 2.2.2 * revert adding pip to pyproject.toml * add build to dependencies in pyproject.toml * forgot poetry lock --no-update * fix a DelegatorAgent prompt_002.log (timeout) * fix a DelegatorAgent prompt_003.log (timeout) * couple more timeout attribs in prompt files * some more prompt files * prompts galore * add clarification comment for timeout * default timeout to config * add assert * update integraton tests for eventstream * update integration tests * fix timeout for action<->dict * remove redundant on_event * default to use instance image * update run_controller interface * add logging for copy * refactor swe_bench for the new design * fix action execution timeout * updatelock * remove build sandbox locally * fix runtime * use plain for-loop for single process * remove extra print * get swebench inference working * print whole `test_result` dict * got swebench patch post-process working * update swe-bench evaluation readme * refactor using shared reset_logger function * move messy swebench prompt to a different file * support the ability to specify whether to keep prompt * support the ability to specify whether to keep prompt * fix dockerfile * fix import and remove unnecessary strip logic * fix action serialization * get agentbench running * remove extra ls for agent bench * fix agentbench metric * factor out common documentation for eval * update biocoder doc * remove swe_env_box since it is no longer needed * get biocoder working * add func timeout for bird * fix jupyter pwd with ~ as user name * fix jupyter pwd with ~ as user name * get bird working * get browsing evaluation working * make eda runnable * fix id column * fix eda run_infer * unify eval output using a structured format; make swebench coompatible with that format; update client source code for every swebench run; do not inject testcmd for swebench * standardize existing benchs for the new eval output * set update source code = true * get gaia standardized * fix gaia * gorilla refactored but stuck at language.so to test * refactor and make gpqa work * refactor humanevalfix and get it working * refactor logic reasoning and get it working * refactor browser env so it works with eventstream runtime for eval * add initial version of miniwob refactor * fix browsergym environment * get miniwob working!! * allowing injecting additional dependency to OD runtime docker image * allowing injecting additional dependency to OD runtime docker image * support logic reasoning with pre-injected dependency * get mint working * update runtime build * fix mint docker * add test for keep_prompt; add missing await close for some tests * update integration tests for eventstream runtime * fix integration tests for server runtime * refactor ml bench and toolqa * refactor webarena * fix default factory * Update run_infer.py * add APIError to retry * increase timeout for swebench * make sure to hide api key when dump eval output * update the behavior of put source code to put files instead of tarball * add dishash to dependency * sendintr when timeout * fix dockerfile copy * reduce timeout * use dirhash to avoid repeat building for update source * fix runtime_build testcase * add dir_hash to docker build pipeline * revert api error * update poetry lock * add retries for swebench run infer * fix git patch * update poetry lock * adjust config order * fix mount volumns * enforce all eval to use "instance_id" * remove file store from runtime * make file_store public inside eventstream * move the runtime logic inside `main` out * support using async function for process_instance_fn * refactor run_infer with the create_time * fix file store * Update evaluation/toolqa/utils.py Co-authored-by: Graham Neubig <neubig@gmail.com> * fix typo --------- Co-authored-by: tobitege <tobitege@gmx.de> Co-authored-by: super-dainiu <78588128+super-dainiu@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-08-06 17:21:45 +00:00
Xingyao Wang	405c8a0456	[Arch] Add runtime image build CI & clean up runtime build using `jinja2` template (#3055) * test_runtime_client.py to test _execute_bash() * runtime_build and runtime tweaks * fix in docker script * revert bash changes * use sandbox_config.update_source_code to control source code update * add od_version to the sandbox tag * add doc instruction for update source code * do not remove whole poetry folder; add mamba clean * add missing newlines * cleanup runtime dockerfile into jinja template * make prep temp file a separate function; make that function accessible through cli * modify `runtime_build.py` so it can generate directory for building docker img * add dockerfile and sdist of runtime to gitignore since it will be dynamically generated * add runtime to build * do not rebuild new image when an `od_runtime` is provided * use default container_image for testing if possible * move runtime tests to ghcr runtime workflow * update docker base dir for runtime * fix unittest * fix image name * fix image name for test case * rename to make it consistent --------- Co-authored-by: tobitege <tobitege@gmx.de>	2024-07-24 21:56:12 +08:00
Xingyao Wang	ce8a11a62f	[Arch] Shrink runtime image size (#3051) * test_runtime_client.py to test _execute_bash() * runtime_build and runtime tweaks * fix in docker script * revert bash changes * use sandbox_config.update_source_code to control source code update * add od_version to the sandbox tag * add doc instruction for update source code * do not remove whole poetry folder; add mamba clean * add missing newlines --------- Co-authored-by: tobitege <tobitege@gmx.de>	2024-07-22 02:34:45 +08:00
Xingyao Wang	6a0ffc5c61	[Evaluation] Use the latest official SWE-Bench Dockerization for evaluation (#2728) * add newline after patch to fix patch apply * new swebench wip * add newline after patch to fix patch apply * only add newline if not empty * update swebench source and update * update gitignore for swebench eval * update old prep_eval * update gitignore * add scripts for push and pull swebench images * update eval_infer.sh * update eval_infer for new docker workflow * update script to create markdown report based on report.json * update eval infer to use update output * update readme * only move result to folder if running whole file * remove set-x * update conversion script * Update evaluation/swe_bench/README.md * Update evaluation/swe_bench/README.md * Update evaluation/swe_bench/README.md * make sure last line end with newline * switch to an fix attempt branch of swebench * Update evaluation/swe_bench/README.md * Update evaluation/swe_bench/README.md --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-07-01 23:58:30 +00:00
Shimada666	26fc3c886a	Make plugins sandbox-agnostic (#2101) * tmp * tmp * merge main * feat: auto build image cache * remove plugins * use config file * update mamba setup shell * support agnostic sandbox image autobuild * remove config * Update .gitignore Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * Update opendevin/runtime/docker/ssh_box.py Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * update setup.sh * readd sudo * add sudo in dockerfile * remove export * move od-runtime dependencies to sandbox dockerfile * factor out re-build logic into a separate util file * tweak existing plugin to use OD specific sandbox * update testcase * attempt to fix unit test using image built in ghcr * use cache tag * try to fix unit tests * add unittest * add unittest * add some unittests * revert gh workflow changes * feat: optimize sandbox image naming rule * add pull latest image hint * add opendevin python hint and use mamba to install gcc * update docker image naming rule and fix mamba issue * Update opendevin/runtime/docker/ssh_box.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * fix: opendevin user use correct pip * fix lint issue * fix custom sandbox base image * rename test name * add skipif --------- Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: tobitege <tobitege@gmx.de>	2024-06-19 19:58:07 -07:00
tobitege	b431fce938	tests: more Agentskills tests; updated .gitignore (#2307) * added tests related to backticks * updated .gitignore * added extra linter test for #2210 * hotfix for integration test --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-06-07 16:29:03 +00:00
Frank Xu	48151bdbb0	[feat] WebArena benchmark, MiniWoB++ benchmark and related arch changes (#2170) * add webarena, and revamp messaging for webarena eval * add changes for browsergym * update infer script * fix unit tests * update * add multiple run for miniwob * update instruction, remove personal path * update * add code for getting final reward, fix integration, add results * add avg cost calculation	2024-06-06 09:01:20 +08:00
Xingyao Wang	2c0a2dbc61	fix yet another swe_bench issue (#2069)	2024-05-26 10:01:43 -07:00
Engel Nyst	46352e890b	Logging security (#1943) * update .gitignore * Rename the confusing 'INFO' style to 'DETAIL' * override str and repr * feat: api_key desensitize * feat: add SensitiveDataFilter in file handler * tweak regex, add tests * more tweaks, include other attrs * add env vars, those with equivalent config * fix tests * tests are invaluable --------- Co-authored-by: Shimada666 <649940882@qq.com>	2024-05-22 18:27:38 +02:00
Xingyao Wang	2406b901df	feat(SWE-Bench environment) integrate SWE-Bench sandbox (#1468) * add draft dockerfile for build all * add rsync for build * add all-in-one docker * update prepare scripts * Update swe_env_box.py * Add swe_entry.sh (buggy now) * Parse the test command in swe_entry.sh * Update README for instance eval in sandbox * revert specialized config * replace run_as_devin as an init arg * set container & run_as_root via args * update swe entry script * update env * remove mounting * allow error after swe_entry * update swe_env_box * move file * update gitignore * get swe_env_box a working demo * support faking user response & provide sandox ahead of time; also return state for controller * tweak main to support adding controller kwargs * add module * initialize plugin for provided sandbox * add pip cache to plugin & fix jupyter kernel waiting * better print Observation output * add run infer scripts * update readme * add utility for getting diff patch * use get_diff_patch in infer * update readme * support cost tracking for codeact * add swe agent edit hack * disable color in git diff * fix git diff cmd * fix state return * support limit eval * increase t imeout and export pip cache * add eval limit config * return state when hit turn limit * save log to file; allow agent to give up * run eval with max 50 turns * add outputs to gitignore * save swe_instance & instruction * add uuid to swebench * add streamlit dep * fix save series * fix the issue where session id might be duplicated * allow setting temperature for llm (use 0 for eval) * Get report from agent running log * support evaluating task success right after inference. * remove extra log * comment out prompt for baseline * add visualizer for eval * use plaintext for instruction * reduce timeout for all; only increase timeout for init * reduce timeout for all; only increase timeout for init * ignore sid for swe env * close sandbox in each eval loop * update visualizer instruction * increase max chars * add finish action to history too * show test result in metrics * add sidebars for visualizer * also visualize swe_instance * cleanup browser when agent controller finish runinng * do not mount workspace for swe-eval to avoid accidentally overwrite files * Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files" This reverts commit 8ef77390543e562e6f0a5a9992418014d8b3010c. * Revert "Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files"" This reverts commit 016cfbb9f0475f32bacbad5822996b4eaff24a5e. * run jupyter command via copy to, instead of cp to mount * only print mixin output when failed * change ssh box logging * add visualizer for pass rate * add instance id to sandbox name * only remove container we created * use opendevin logger in main * support multi-processing infer * add back metadata, support keyboard interrupt * remove container with startswith * make pbar behave correctly * update instruction w/ multi-processing * show resolved rate by repo * rename tmp dir name * attempt to fix racing for copy to ssh_box * fix script * bump swe-bench-all version * fix ipython with self-contained commands * add jupyter demo to swe_env_box * make resolved count two column * increase height * do not add glob to url params * analyze obs length * print instance id prior to removal handler * add gold patch in visualizer * fix interactive git by adding a git --no-pager as alias * increase max_char to 10k to cover 98% of swe-bench obs cases * allow parsing note * prompt v2 * add iteration reminder * adjust user response * adjust order * fix return eval * fix typo * add reminder before logging * remove other resolve rate * re adjust to new folder structure * support adding eval note * fix eval note path * make sure first log of each instance is printed * add eval note * fix the display for visualizer * tweak visualizer for better git patch reading * exclude empty patch * add retry mechanism for swe_env_box start * fix ssh timeout issue * add stat field for apply test patch success * add visualization for fine-grained report * attempt to support monologue agent by constraining it to single thread * also log error msg when stopeed * save error as well * override WORKSPACE_MOUNT_PATH and WORKSPACE_BASE for monologue to work in mp * add retry mechanism for sshbox * remove retry for swe env box * try to handle loop state stopped * Add get report scripts * Add script to convert agent output to swe-bench format * Merge fine grained report for visualizer * Update eval readme * Update README.md * Add CodeAct gpt4-1106 output and eval logs on swe-bench-lite * Update the script to get model report * Update get_model_report.sh * Update get_agent_report.sh * Update report merge script * Add agent output conversion script * Update swe_lite_env_setup.sh * Add example swe-bench output files * Update eval readme * Remove redundant scripts * set iteration count down to false by default * fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm (#1666) * fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm * Review Feedback * Missing None Check * Review feedback and improved error handling --------- Co-authored-by: Robert Brennan <accounts@rbren.io> * fix prepare_swe_util scripts * update builder images * update setup script * remove swe-bench build workflow * update lock * remove experiments since they are moved to hf * remove visualizer (since it is moved to hf repo) * simply jupyter execution via heredoc * update ssh_box * add initial docker readme * add pkg-config as dependency * add script for swe_bench all-in-one docker * add rsync to builder * rename var * update commit * update readme * update lock * support specify timeout for long running tasks * fix path * separate building of all deps and files * support returning states at the end of controller * remove return None * support specify timeout for long running tasks * add timeout for all existing sandbox impl * fix swe_env_box for new codebase * update llm config in config.py * support pass sandbox in * remove force set * update eval script * fix issue of overriding final state * change default eval output to hf demo * change default eval output to hf demo * fix config * only close it when it is NOT external sandbox * add scripts * tweak config * only put in hostory when state has history attr * fix agent controller on the case of run out interaction budget * always assume state is always not none * remove print of final state * catch all exception when cannot compute completion cost * Update README.md * save source into json * fix path * update docker path * return the final state on close * merge AgentState with State * fix integration test * merge AgentState with State * fix integration test * add ChangeAgentStateAction to history in attempt to fix integration * add back set agent state * update tests * update tests * move scripts for setup * update script and readme for infer * do not reset logger when n processes == 1 * update eval_infer scripts and readme * simplify readme * copy over dir after eval * copy over dir after eval * directly return get state * update lock * fix output saving of infer * replace print with logger * update eval_infer script * add back the missing .close * increase timeout * copy all swe_bench_format file * attempt to fix output parsing * log git commit id as metadata * fix eval script * update lock * update unit tests * fix argparser unit test * fix lock * the deps are now lightweight enough to be incude in make build * add spaces for tests * add eval outputs to gitignore * remove git submodule * readme * tweak git email * update upload instruction * bump codeact version for eval --------- Co-authored-by: Bowen Li <libowen.ne@gmail.com> Co-authored-by: huybery <huybery@gmail.com> Co-authored-by: Bart Shappee <bshappee@gmail.com> Co-authored-by: Robert Brennan <accounts@rbren.io>	2024-05-15 16:15:55 +00:00
Robert Brennan	dcb5d1ce0a	Add permanent storage option for EventStream (#1697) * add storage classes * add minio * add event stream storage * storage test working * use fixture * event stream test passing * better serialization * factor out serialization pkg * move more serialization * fix tests * fix test * remove __all__ * add rehydration test * add more rehydration test * fix fixture * fix dict init * update tests * lock * regenerate tests * Update opendevin/events/stream.py * revert tests * revert old integration tests * only add fields if present * regen tests * pin pyarrow * fix unit tests * remove cause from memories * revert tests * regen tests	2024-05-14 11:09:45 -04:00
மனோஜ்குமார் பழனிச்சாமி	73693ba416	Mentioned LLM logs directory (#1587) * Update bug_template.yml * Pythonized * updated configs type * updated opendevin_logger * fixed bool config * fixed bool config	2024-05-09 13:31:14 -04:00
Robert Brennan	242c4a0df6	Remove extra message actions (#1608) * remove extra actions * remove message observations * support null obs * handle null obs * fix frontend for changes * fix the way messages flow to the UI * change think to message * add regen script * regenerate all integration tests * change task * remove gh test * fix messages * fix tests * help agent exit after hitting max iter * Update opendevin/events/observation/success.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Update agenthub/codeact_agent/codeact_agent.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-05-07 21:13:08 +00:00
Boxuan Li	e7b5ddfe06	Add integration test framework with mock llm (#1301) * Add integration test framework with mock llm * Fix MonologueAgent and PlannerAgent tests * Remove adhoc logging * Use existing logs * Fix SWEAgent and PlannerAgent * Check-in test log files * conftest: look up under test name folder only * Add docstring to conftest * Finish dev doc * Avoid non-determinism * Remove dependency on llm embedding model * Init embedding model only for MonologueAgent * Add adhoc fix for sandbox discrepancy * Test ssh and exec sandboxes * CI: fix missing sandbox type * conftest: Remove hack * Reword comment for TODO	2024-04-25 10:56:53 -04:00
Leo	adbcfefd8c	feat: websocket connection management and sandbox bound to session. (#559) * feat: websocket connection management and sandbox bound to session. * fix: set default value to id * feat: add session management. * fix for mypy * fix for mypy * fix the pnpm-lock. * fix the default model is empty will throw error.	2024-04-05 12:19:52 -05:00
Anas DORBANI	5ec0e5b7ec	Switch to Poetry (#378) * create the pyproject file * Fix the pyproject.toml file * Update Makefile * adapt makefile * fix some execution issues * Untrack lock files and wait for the backend to get start before frontend * Remove LangChain dependencies * Add github action for pytest * add missing dependency * rebase and fix the versions adding lock file * add torch and pymupdfb deps * some conflicts fixes * Add dependencies evaluation group * add poetry.lock * Fix unexpected operator --------- Co-authored-by: Robert Brennan <contact@rbren.io>	2024-04-05 00:27:29 +00:00
xcodebuild	d64383a520	fix: let make run output both backend and frontend (#576) * fix: let make run output both backend and frontend * fix: delete pipe on run	2024-04-02 20:54:16 +08:00
Alex Bäuerle	79237210f2	build(add-files-created-for-other-dev-envs-to-gitignore): Add files such as requirements.txt, .python-version, bun.lockb, and yarn.lock so that if anybody uses these systems, they don't accidentally push the files (#519)	2024-04-01 23:21:45 -04:00
Jim Su	b1b96df8a8	Replace environment variables with configuration file (#339) * Replace environment variables with configuration file * Add config.toml to .gitignore * Remove unused os imports * Update README.md * Update README.md * Update README.md * Fix merge conflict * Fallback to environment variables * Use template file for config.toml * Update config.toml.template * Update config.toml.template --------- Co-authored-by: Robert Brennan <accounts@rbren.io>	2024-03-29 15:26:20 -04:00
Robert Brennan	9bc1890d33	add debug dir for prompts (#205) * add debug dir for prompts * add indent to dumps * only wrap completion in debug mode * fix mypy	2024-03-27 12:40:08 -04:00
Xingyao Wang	5ff96111f0	A starting point for SWE-Bench Evaluation with docker (#60) * a starting point for SWE-Bench evaluation with docker * fix the swe-bench uid issue * typo fixed * fix conda missing issue * move files based on new PR * Update doc and gitignore using devin prediction file from #81 * fix typo * add a sentence * fix typo in path * fix path --------- Co-authored-by: Binyuan Hui <binyuan.hby@alibaba-inc.com>	2024-03-22 12:43:49 +08:00
Robert Brennan	b84463f512	Refactor agent interface a bit (#74) * start moving files * initial refactor * factor out command management * fix command runner * add workspace to gitignore * factor out command manager * remove dupe add_event * update docs * fix init * fix langchain agent after merge	2024-03-21 23:35:28 +08:00
Xingyao Wang	2de75d4782	Minimal Docker Sandbox with GPT-3.5 Execution Example (#48) * minimal docker sandbox * make container_image as an argument (fall back to ubuntu); increase timeout to avoid return too early for long running commands; * add a minimal working (imperfect) example * fix typo * change default container name * attempt to fix "Bad file descriptor" error * handle ctrl+D * add Python gitignore * push sandbox to shared dockerhub for ease of use * move codeact example into research folder * add README for opendevin * change container image name to opendevin dockerhub * move folder; change example to a more general agent * update Message and Role * update docker sandbox to support mounting folder and switch to user with correct permission * make network as host * handle erorrs when attrs are not set yet * convert codeact agent into a compatible agent * add workspace to gitignore * make sure the agent interface adjustment works for langchain_agent	2024-03-21 21:54:56 +08:00
Binyuan Hui	a94f3d81cb	fix: merge multiple .gitignore to unify management (#61)	2024-03-20 21:35:51 +08:00
Xingyao Wang	dcff11cd2f	add Python gitignore (#59)	2024-03-20 16:17:16 +08:00

41 Commits