OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-26 13:52:43 +08:00

Author	SHA1	Message	Date
Xingyao Wang	3cf794faef	fix(runtime build): only check for image exist on exact hash tag (#4152)	2024-10-01 22:20:25 +00:00
Robert Brennan	31b2e4b5b2	allow specifying exact remote image (#4135)	2024-10-01 13:17:51 -04:00
Robert Brennan	8059e8e298	make runtime url configurable (#4093)	2024-09-30 18:59:57 +00:00
Xingyao Wang	54ac340e0b	refactor: standardize linter output data structure and interface (#4077) Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-10-01 02:40:23 +08:00
tofarr	5ccee7c8a7	Fix Bash commands now do not block and actually respect the timeout (#4058)	2024-09-28 08:40:00 +08:00
Xingyao Wang	2bed3a424c	chore: pass logger DEBUG mode to client side (#4096)	2024-09-28 08:21:04 +08:00
tobitege	9651368e6a	revert #3871 dockerfile template: don't write to .bashrc file (#4095)	2024-09-27 21:49:51 +00:00
tofarr	c5025fb66e	Fix Reducing the amount being downloaded every time the hash changes. (#4078)	2024-09-27 15:48:33 -06:00
Robert Brennan	3f9111c615	add idle time to client server (#4084)	2024-09-27 19:41:16 +00:00
Xingyao Wang	34f3b61536	[runtime hash] fix runtime hash mismatch between inside `app` image and in "development mode" (#4039)	2024-09-27 15:26:26 +00:00
Amir	3e5c01dfc8	Remove param from docstring that does not exist in the append_file (#4060)	2024-09-26 22:25:11 +02:00
Xingyao Wang	081ebdbdd8	[runtime] do not keep rebuilding from generic image (#4072)	2024-09-26 17:19:46 +00:00
tobitege	2cc1c3ef42	(enh) Docker runtime builder with BuildKit support, enh. caching (#4009)	2024-09-26 08:50:53 +02:00
Xingyao Wang	81b3cd71b3	[eval] log evaluating warnings directly to console (#4026)	2024-09-26 03:42:32 +08:00
tobitege	c32cec7f89	(enh) send status messages to UI during startup (#3771) Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Robert Brennan <contact@rbren.io> Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>	2024-09-24 18:46:58 +00:00
Graham Neubig	dc418e7b71	Update README.md for runtime (#4015)	2024-09-24 02:50:15 +02:00
Xingyao Wang	3435f1e5d8	Store the file edit backup file in `/tmp` (#3958)	2024-09-23 06:32:24 +08:00
Robert Brennan	72ca1690a7	Wait for runtime to be ready in __init__ (#3963)	2024-09-20 17:31:30 +02:00
tobitege	45066f19dc	(fix) restore sudo-capability after recent changes (#3964)	2024-09-19 23:08:13 +02:00
niliy01	0f6fb0f80e	(enh) unify the log output in docker build process (#3961) Signed-off-by: niliy <WannaTen@users.noreply.github.com>	2024-09-19 19:19:16 +02:00
tofarr	ad0b549d8b	Feat Tightening up Timeouts and interrupt conditions. (#3926)	2024-09-18 20:50:42 +00:00
Xingyao Wang	5d7f2fd4ae	[eval] Allow evaluation of SWE-Bench patches on `RemoteRuntime` (#3927) Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-18 16:07:34 -04:00
Robert Brennan	c864715b43	Fix UID management for ubuntu users (#3937)	2024-09-18 16:29:39 +00:00
Engel Nyst	e3be71f523	Fix init order with threading (#3935)	2024-09-18 15:26:51 +00:00
niliy01	07a094e701	(enh) Update Docker pull data in place (#3910) Signed-off-by: Yi Lin <teroincn@gmail.com>	2024-09-17 10:22:07 +02:00
tobitege	52c5abccbf	(enh) Dockerfile.j2: improve env vars for bash and activate in .bashrc (#3871)	2024-09-17 08:49:04 +02:00
tofarr	0db664986d	Tightened up the logic on retries. (#3882)	2024-09-16 07:28:06 -06:00
tobitege	a33f61c025	(feat) Show messages' timestamp in UI (#3869)	2024-09-16 05:41:29 +02:00
tobitege	a45b20a406	(fix) runtime: tweak _wait_until_alive tenacity and exception handling (#3878)	2024-09-16 04:24:58 +02:00
tobitege	ecf4aed28b	(fix) Update logs after run_action (EventStreamRuntime) (#3870)	2024-09-15 18:50:10 +02:00
tobitege	554636cf2a	(fix) Fix runtime (RT) tests and split tests in 2 actions (openhands/root) (#3791) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-09-14 21:51:30 +02:00
tobitege	57390eb26b	(enh) docker pull (if not found locally) with progress info (#3682)	2024-09-14 06:26:42 +02:00
Xingyao Wang	78c5f58adc	refactor & improve retry for the reliability of `RemoteRuntime` & evaluation (#3846)	2024-09-13 07:37:07 -04:00
Xingyao Wang	2fe2f4c530	[eval] increase timeout for SWEBench eval init/complete (#3829 ) * [eval] increase timeout for swebench eval init/complete * allow CmdRunAction to optionally block when .timeout is setted * fix unit test for serialization * fix unit tests for security analyzer * fix integration tests * add more timeout	2024-09-12 15:20:58 +00:00
Robert Brennan	c6105f264f	Improvements to file list UI (#3794 ) * move filematching logic into server * wait until ready before returning * show loading message instead of empty * logspam * delint * fix type * add a few more default ignores	2024-09-11 09:44:37 -04:00
mamoodi	f3b2085f9b	Reduce runtime tests duration by running them across CPUs (#3779 ) * Reduce runtime tests duration by running them across CPUs * fix hardcoded image name * test two cpus * Test folder change * Up the CPU to 4 again to test * Change to 3 CPUs * Down to 2 * Add param to remove all openhands containers * Add comment * Add reruns just in case * Fix ordering of if	2024-09-10 14:31:17 -04:00
tobitege	5ffff742de	Regression fixes: LLM logging; client readiness (EventStreamRuntime) (#3776 ) * Regression fixes: LLM logging; client readiness (EventStreamRuntime) * fix llm.async_completion_wrapper bad edit in previous commit * regen couple of mock files * client: always log initialized status	2024-09-09 21:02:43 +02:00
tobitege	2b7517e542	(enh) add caching@v4 action in workflows (#3780 ) * dummy test change * regen yml: 1st install python 3.11, then poetry * fix caching for poetry; old entry for python was rather useless * fix steps order (cache before poetry) * add poetry caching to ghcr_runtime; fix fork conditions * ghcr_runtime: more caching actions; condition fixes * fix interim action error (order of steps) * cache@v4 instead of v3 * fixed interim typo for 2 fork conditions * runtime/test_env_vars: compacted multiple tests into one to reduce time * ugh if fork condition changes again	2024-09-09 10:49:49 +02:00
Robert Brennan	ab3851593d	Support interactive commands (#3653 ) * hacky solution for interactive commands * add more behavior * debug * fix continue functionality * remove prints * refactor a bit * reduce test sleep * fix python version * fix pre-commit issue * Regenerate integration tests * Update openhands/runtime/client/client.py * revert some prompt stuff * several integration mock files regenerated * execute_action: remove duplicate exception logging --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com>	2024-09-08 21:45:51 +02:00
Xingyao Wang	688068a44e	Fix issues for running `RemoteRuntime` in parallel on SWE-Bench (#3716 ) * feat: add SWE-bench fullset support * fix instance image list * update eval script and documentation * increase timeout for remote runtime * add push script * handle the case when ret push is an generator * update pbar * set SWE-Bench default to run SWE-Bench lite * add script to cleanup remote runtime * fix the cases when tag is too long * update README * update readme for cleanup * rename od to oh * Update evaluation/swe_bench/README.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/README.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * gets API key and Runtime from env var --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-05 10:34:31 +08:00
tobitege	bc31fb15fe	(fix) CodeActAgent: fix issues with vision support in prompts (#3665) * CodeActAgent: fix message prep if prompt caching is not supported * fix python version in regen tests workflow * fix in conftest "mock_completion" method * add disable_vision to LLMConfig; revert change in message parsing in llm.py * format messages in several files for completion * refactored message(s) formatting (llm.py); added vision_is_active() * fix a unit test * regenerate: added LOG_TO_FILE and FORCE_REGENERATE env flags * try to fix path to logs folder in workflow * llm: prevent index error * try FORCE_USE_LLM in regenerate * tweaks everywhere... * fix 2 random unit test errors :( * added FORCE_REGENERATE_TESTS=true to regenerate CLI * fix test_lint_file_fail_typescript again * double-quotes for env vars in workflow; llm logger set to debug * fix typo in regenerate * regenerate iterations now 20; applied iteration counter fix by Li * regenerate: pass FORCE_REGENERATE flag into env * fixes for int tests. several mock files updated. * browsing_agent: fix response_parser.py adding ) to empty response * test_browse_internet: fix skipif and revert obsolete mock files * regenerate: fi bracketing for http server start/kill conditions * disable test_browse_internet for CodeActAgents; mock files updated after merge missed to include more mock files earlier * reverts after review feedback from Li * forgot one * browsing agent test, partial fixes and updated mock files * test_browse_internet works in my WSL now! * adapt unit test test_prompt_caching.py * add DEBUG to regenerate workflow command * convert regenerate workflow params to inputs * more integration test mock files updated * more files * test_prompt_caching: restored test_prompt_caching_headers purpose * file_ops: fix potential exception, like "cross device copy"; fixed mock files accordingly * reverts/changes wrt feedback from xingyao * updated docs and config template * code cleanup wrt review feedback	2024-09-04 17:58:30 +02:00
Xingyao Wang	d8a87d7ccb	[Eval] Make SWE-Bench run_infer.sh to default to run SWE-Bench Lite (#3704 ) * feat: add SWE-bench fullset support * fix instance image list * update eval script and documentation * increase timeout for remote runtime * add push script * handle the case when ret push is an generator * update pbar * set SWE-Bench default to run SWE-Bench lite	2024-09-04 00:58:14 +08:00
Mislav Balunovic	f979d612ec	(fix) confirmation mode bugfix for the EventStreamRuntime (#3695 )	2024-09-02 13:27:33 +00:00
Boxuan Li	75d5591816	file_ops: Use tmp file for original linting (#3681 ) Fix a potential issue that might lead to file corruption when edit linting is enabled #3124 introduces a feature for editing: running linter twice before and after the change and only extract new errors introduced by the agent. This has some potential issues and I am working on #3649 to address them, but I feel like I am not gonna finish it in the next few days, and that PR has become harder and harder to review, thus this PR, which only focuses on a small improvement. So what's the issue? When we run linters on the original file before our edits, we need to copy the original file and use a temporary file to lint, because linting may have side-effect (e.g. modifying the file in-place). I used the word "may" because: Flake8 has no side-effect, so not a problem as of now. We don't enforce this or document this "no side-effect" as a requirement for linter implementation, so side-effect is allowed. Regardless, the "after-edit-linting" uses the same approach: backup the file before linting to avoid data corruption. We should keep our "before-edit-linting" consistent. Why no new unittest that reproduces the issue? Well, as I have mentioned earlier, flake8 has no side-effect, so technically it's not a bug but a flaw. Therefore, there's no way to write a test that reproduces the issue.	2024-09-01 23:36:57 -07:00
tobitege	7068a73ae7	(enh) Improve CodeActAgent's file editing reliability (#3610 ) * improve file editing prompts and unit test converted most raise calls to a _output_error call in file_ops.py * tweaks in test_agent_skill.py wrt to SEP separator * tweaked the separator * remove server runtime remnants and TEST_RUNTIME references * restore use of TEST_RUNTIME args and variables * fix integration tests * added hint to properly escape docstrings * revert latest prompt change --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-09-02 06:03:22 +02:00
Xingyao Wang	090c911a50	(refactor) Make `Runtime` class synchronous (#3661 ) * change runtime to be synchronous * fix test runtime with the new interface * fix arg * fix eval * fix missing config attribute * fix plugins * fix on_event by revert it back to async * update upload_file endpoint * fix argument to upload file * remove unncessary async for eval; fix evaluation run in parallel * use asyncio to run controller for eval * revert file upload * truncate eval test result output	2024-08-30 01:37:03 +00:00
Xingyao Wang	8b1f207d39	feat: support remote runtime (#3406 ) * feat: refactor building logic into runtime builder * return image name * fix testcases * use runtime builder for eventstream runtime * have runtime builder return str * add api_key to sandbox config * draft remote runtime * remove extra if clause * initialize runtime based on box class * add build logic * use base64 for file upload * get runtime image prefix from API * replace ___ with _s_ to make it a valid image name * use /build to start build and /build_status to check the build progress * update logging * fix exit code * always use port * add remote runtime * rename runtime * fix tests import * make dir first if work_dir does not exists; * update debug print to remote runtime * fix exit close_sync * update logging * add retry for stop * use all box class for test keep prompt * fix test browsing * add retry stop * merge init commands to save startup time * fix await * remove sandbox url * support execute through specific runtime url * fix file ops * simplify close * factor out runtime retry code * fix exception handling * fix content type error (e.g., bad gateway when runtime is not ready) * add retry for wait until alive; add retry for check image exists * Revert "add retry for wait until alive;" This reverts commit dd013cd2681a159cd07747497d8c95e145d01c32. * retry when wait until alive * clean up msg * directly save sdist to temp dir for _put_source_code_to_dir * support running testcases in parallel * tweak logging; try to close session * try to close session even on exception * update poetry lock * support remote to run integration tests * add warning for workspace base on remote runtime * set default runtime api * remove server runtime * update poetry lock * support running swe-bench (n=1) eval on remoteruntime * add a timeout of 30 min * add todo for docker namespace * update poetry loc	2024-08-29 15:53:37 +00:00
tobitege	8fca5a5354	linter and test_aider_linter extensions for eslint (#3543 ) * linter and test_aider_linter extensions for eslint * linter tweaks * try enabling verbose output in linter test * one more option for linter test * try conftest.py for tests/unit folder * enable verbose mode in workflow; remove conftest.py again * debug print statements of linter results * skip some tests if eslint is not installed at all * more tweaks * final test skip setups * code quality revisions * fix test again --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-08-29 10:40:43 +02:00
tobitege	daeff3dfaf	startup handling and logging of docker images tweaked (#3645 )	2024-08-28 22:17:58 +00:00
tobitege	9c39f07430	(enh) Aider-Bench: make resumable with skip_num arg (#3626 ) * added optional START_ID env flag to resume from that instance id * prepare_dataset: fix comparisons by using instance id's as int * aider bench complete_runtime: close runtime to close container * added matrix display of instance id for logging * fix typo in summarize_results.py saying summarise_results * changed start_id to skip_num to skip rows from dataset (start_id wasn't supportable) * doc changes about huggingface spaces to temporarily point back to OD	2024-08-28 15:42:01 +00:00


168 Commits