OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2026-03-22 05:37:20 +08:00

Author	SHA1	Message	Date
tobitege	03b5b03bb2	(enh) CodeActAgent: improve logging; sensible retry defaults in config (#3729 ) * CodeActAgent: improve logging; sensible retry defaults for completion errors * CodeActAgent: reduce completion error message sent to UI * tweak values; docs+config template changes * fix format_messages; log exception in codeactagent again	2024-09-05 18:14:15 +00:00
niliy01	82a154f7e7	(feat) making prompt caching optional instead of enabled default (#3689 ) * (feat) making prompt caching optional instead of enabled default At present, only the Claude models support prompt caching as an experimental feature, therefore, this feature should be implemented as an optional setting rather than being enabled by default. Signed-off-by: Yi Lin <teroincn@gmail.com> * handle the conflict * fix unittest mock return value * fix lint error in whitespace --------- Signed-off-by: Yi Lin <teroincn@gmail.com>	2024-09-05 18:52:26 +02:00
Xingyao Wang	688068a44e	Fix issues for running `RemoteRuntime` in parallel on SWE-Bench (#3716 ) * feat: add SWE-bench fullset support * fix instance image list * update eval script and documentation * increase timeout for remote runtime * add push script * handle the case when ret push is a generator * update pbar * set SWE-Bench default to run SWE-Bench lite * add script to cleanup remote runtime * fix the cases when tag is too long * update README * update readme for cleanup * rename od to oh * Update evaluation/swe_bench/README.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/README.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * gets API key and Runtime from env var --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-05 10:34:31 +08:00
tobitege	bc31fb15fe	(fix) CodeActAgent: fix issues with vision support in prompts (#3665 ) * CodeActAgent: fix message prep if prompt caching is not supported * fix python version in regen tests workflow * fix in conftest "mock_completion" method * add disable_vision to LLMConfig; revert change in message parsing in llm.py * format messages in several files for completion * refactored message(s) formatting (llm.py); added vision_is_active() * fix a unit test * regenerate: added LOG_TO_FILE and FORCE_REGENERATE env flags * try to fix path to logs folder in workflow * llm: prevent index error * try FORCE_USE_LLM in regenerate * tweaks everywhere... * fix 2 random unit test errors :( * added FORCE_REGENERATE_TESTS=true to regenerate CLI * fix test_lint_file_fail_typescript again * double-quotes for env vars in workflow; llm logger set to debug * fix typo in regenerate * regenerate iterations now 20; applied iteration counter fix by Li * regenerate: pass FORCE_REGENERATE flag into env * fixes for int tests. several mock files updated. * browsing_agent: fix response_parser.py adding ) to empty response * test_browse_internet: fix skipif and revert obsolete mock files * regenerate: fi bracketing for http server start/kill conditions * disable test_browse_internet for CodeActAgents; mock files updated after merge missed to include more mock files earlier * reverts after review feedback from Li * forgot one * browsing agent test, partial fixes and updated mock files * test_browse_internet works in my WSL now! * adapt unit test test_prompt_caching.py * add DEBUG to regenerate workflow command * convert regenerate workflow params to inputs * more integration test mock files updated * more files * test_prompt_caching: restored test_prompt_caching_headers purpose * file_ops: fix potential exception, like "cross device copy"; fixed mock files accordingly * reverts/changes wrt feedback from xingyao * updated docs and config template * code cleanup wrt review feedback	2024-09-04 17:58:30 +02:00
Shubham raj	2bc3e8d584	Fix: llm completion exception breaks CodeActAgent (#3678 ) * Catch exception and return finish action with an exception message in case of exception in llm completion * Remove exception logs * Raise llm response error for any exception in llm completion * Raise LLMResponseError from async completion and async streaming completion as well	2024-09-04 05:51:49 +02:00
Xingyao Wang	d8a87d7ccb	[Eval] Make SWE-Bench run_infer.sh to default to run SWE-Bench Lite (#3704 ) * feat: add SWE-bench fullset support * fix instance image list * update eval script and documentation * increase timeout for remote runtime * add push script * handle the case when ret push is a generator * update pbar * set SWE-Bench default to run SWE-Bench lite	2024-09-04 00:58:14 +08:00
Mislav Balunovic	f979d612ec	(fix) confirmation mode bugfix for the EventStreamRuntime (#3695 )	2024-09-02 13:27:33 +00:00
Boxuan Li	75d5591816	file_ops: Use tmp file for original linting (#3681 ) Fix a potential issue that might lead to file corruption when edit linting is enabled #3124 introduces a feature for editing: running linter twice before and after the change and only extract new errors introduced by the agent. This has some potential issues and I am working on #3649 to address them, but I feel like I am not gonna finish it in the next few days, and that PR has become harder and harder to review, thus this PR, which only focuses on a small improvement. So what's the issue? When we run linters on the original file before our edits, we need to copy the original file and use a temporary file to lint, because linting may have side-effect (e.g. modifying the file in-place). I used the word "may" because: Flake8 has no side-effect, so not a problem as of now. We don't enforce this or document this "no side-effect" as a requirement for linter implementation, so side-effect is allowed. Regardless, the "after-edit-linting" uses the same approach: backup the file before linting to avoid data corruption. We should keep our "before-edit-linting" consistent. Why no new unittest that reproduces the issue? Well, as I have mentioned earlier, flake8 has no side-effect, so technically it's not a bug but a flaw. Therefore, there's no way to write a test that reproduces the issue.	2024-09-01 23:36:57 -07:00
tobitege	7068a73ae7	(enh) Improve CodeActAgent's file editing reliability (#3610 ) * improve file editing prompts and unit test converted most raise calls to a _output_error call in file_ops.py * tweaks in test_agent_skill.py wrt to SEP separator * tweaked the separator * remove server runtime remnants and TEST_RUNTIME references * restore use of TEST_RUNTIME args and variables * fix integration tests * added hint to properly escape docstrings * revert latest prompt change --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-09-02 06:03:22 +02:00
tobitege	c83fab8a00	llm: add NotFoundError to completion exception handling (#3668 ) Co-authored-by: tofarr <tofarr@gmail.com>	2024-09-01 07:47:21 +00:00
Boxuan Li	1e2796e168	Fix step count out-of-sync bug when child agent fails (#3680 )	2024-09-01 09:36:51 +02:00
niliy01	89e1c4f29c	feat: add more embed models that Ollama supports recently (#3641 ) Signed-off-by: Yi Lin <teroincn@gmail.com>	2024-08-30 07:58:01 -04:00
Xingyao Wang	090c911a50	(refactor) Make `Runtime` class synchronous (#3661 ) * change runtime to be synchronous * fix test runtime with the new interface * fix arg * fix eval * fix missing config attribute * fix plugins * fix on_event by revert it back to async * update upload_file endpoint * fix argument to upload file * remove unnecessary async for eval; fix evaluation run in parallel * use asyncio to run controller for eval * revert file upload * truncate eval test result output	2024-08-30 01:37:03 +00:00
Xingyao Wang	8b1f207d39	feat: support remote runtime (#3406 ) * feat: refactor building logic into runtime builder * return image name * fix testcases * use runtime builder for eventstream runtime * have runtime builder return str * add api_key to sandbox config * draft remote runtime * remove extra if clause * initialize runtime based on box class * add build logic * use base64 for file upload * get runtime image prefix from API * replace ___ with _s_ to make it a valid image name * use /build to start build and /build_status to check the build progress * update logging * fix exit code * always use port * add remote runtime * rename runtime * fix tests import * make dir first if work_dir does not exists; * update debug print to remote runtime * fix exit close_sync * update logging * add retry for stop * use all box class for test keep prompt * fix test browsing * add retry stop * merge init commands to save startup time * fix await * remove sandbox url * support execute through specific runtime url * fix file ops * simplify close * factor out runtime retry code * fix exception handling * fix content type error (e.g., bad gateway when runtime is not ready) * add retry for wait until alive; add retry for check image exists * Revert "add retry for wait until alive;" This reverts commit `dd013cd268`. * retry when wait until alive * clean up msg * directly save sdist to temp dir for _put_source_code_to_dir * support running testcases in parallel * tweak logging; try to close session * try to close session even on exception * update poetry lock * support remote to run integration tests * add warning for workspace base on remote runtime * set default runtime api * remove server runtime * update poetry lock * support running swe-bench (n=1) eval on remoteruntime * add a timeout of 30 min * add todo for docker namespace * update poetry loc	2024-08-29 15:53:37 +00:00
tobitege	a2d94c9cb1	(enh) StuckDetector: fix+enhance syntax error loop detection (#3628 ) * fix StuckDetector and add more errors for detection * more stringent error detection and more unit tests	2024-08-29 17:33:54 +02:00
tobitege	ae153aa8ab	(enh) review of logger.py; less logging in AgentController (#3648 ) * revised logger.py; agent_controller: less debug logging (every second) * agent_controller._step: removed logging upon _pending_action	2024-08-29 16:07:38 +02:00
tobitege	8fca5a5354	linter and test_aider_linter extensions for eslint (#3543 ) * linter and test_aider_linter extensions for eslint * linter tweaks * try enabling verbose output in linter test * one more option for linter test * try conftest.py for tests/unit folder * enable verbose mode in workflow; remove conftest.py again * debug print statements of linter results * skip some tests if eslint is not installed at all * more tweaks * final test skip setups * code quality revisions * fix test again --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-08-29 10:40:43 +02:00
tobitege	daeff3dfaf	startup handling and logging of docker images tweaked (#3645 )	2024-08-28 22:17:58 +00:00
Graham Neubig	c6ba0e8339	Remove singleton config (#3614 ) * Remove singleton config * Fix tests * Fix logging reset * Fix pre-commit	2024-08-28 20:05:49 +01:00
tobitege	9c39f07430	(enh) Aider-Bench: make resumable with skip_num arg (#3626 ) * added optional START_ID env flag to resume from that instance id * prepare_dataset: fix comparisons by using instance id's as int * aider bench complete_runtime: close runtime to close container * added matrix display of instance id for logging * fix typo in summarize_results.py saying summarise_results * changed start_id to skip_num to skip rows from dataset (start_id wasn't supportable) * doc changes about huggingface spaces to temporarily point back to OD	2024-08-28 15:42:01 +00:00
Xingyao Wang	d9a8b53bc2	feat: specialize CodeAct into micro agents by providing markdown files (#3511 ) * update microagent name and update template.toml * substitute actual micro_agent_name for prompt manager * add python-frontmatter * support micro agent in codeact * add test cases * add instruction from require env var * add draft gh micro agent * update poetry lock * update poetry lock	2024-08-28 14:58:16 +00:00
Xingyao Wang	98081b9b1b	(eval) EOF fixes for SWE-Bench evaluation (#3623 ) * add error handling for client eof * remove root check * remove set -e * echo USER to fix for swebench infer * fix entry timeout * add timeout; fix runtime close	2024-08-27 21:09:31 +00:00
tobitege	0b8779447a	New README for OpenHands/openhands/runtime folder (#3576 ) * new OpenHands/openhands/runtime/README.md - made by OpenHands * move parts to server readme; fix OD runtime in docs	2024-08-27 21:04:50 +00:00
tobitege	097fbd6362	(fix) Enable and log if logging to file is enabled (#3556 ) * enable logging to file also when DEBUG is active * Log a message if logging to file is enabled * log a message if DEBUG mode is enabled	2024-08-27 22:36:33 +02:00
tobitege	1fddc77247	(feat) runtime: in _wait_until_alive upon start wait for client to have initialized too (#3612 ) * runtime: in _wait_until_alive wait initially for client to initialize * fix typo in runtime log entry	2024-08-27 17:11:32 +02:00
Kaushik Deka	5bb931e4d6	Add prompt caching (Sonnet, Haiku only) (#3411 ) * Add prompt caching * remove anthropic-version from extra_headers * change supports_prompt_caching method to attribute * change caching strat and log cache statistics * add reminder as a new message to fix caching * fix unit test * append reminder to the end of the last message content * move token logs to post completion function * fix unit test failure * fix reminder and prompt caching * unit tests for prompt caching * add test * clean up tests * separate reminder, use latest two messages * fix tests --------- Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-08-26 20:46:44 -04:00
tobitege	8fcf0817d4	(eval) Aider_bench: add eval_ids arg to run specific instance id's (#3592 ) * add eval_ids arg to run specific instance id's; fix/extend README * fix description in parser for --eval-ids * fix test_arg_parser.py to account for added arg * fix typo in README to say "summarize" instead of "summarise" for script	2024-08-27 00:49:26 +08:00
tofarr	8c4c3b18b5	Feat google cloud storage (#3574 ) * Google cloud storage implementation * Unit test refactor	2024-08-26 08:16:49 -06:00
tofarr	6ce77e157b	Fix pypi build (#3548 ) * Fix pypi build The package on pypi only included opendevin/* (the poetry default). It also needs to include agenthub/* * Bumped version so people will actually get it! * Fix package definition * Updated poetry lock file * Update package name to openhands-ai * Add py.typed to indicate that OpenHands has type annotations * Replace package name with openhands_ai * Fix tests to reflect new name --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-08-26 01:31:37 -06:00
Graham Neubig	f9088766e8	Allow setting of runtime container image (#3573 ) * Add runtime container image setting * Fix typo in test * Fix sandbox base container image * Update variables * Update to base_container_image * Update tests/unit/test_config.py Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * Fixed eval * Fixed container_image * Fix typo --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-08-25 23:05:41 +00:00
Robert Brennan	356d9b34be	Add CLI mode (#3564 ) * set log levels * basic cli flow * basic display * better exits * set log level * fix messages * clean up logs * better exits * better printing * add todo	2024-08-26 06:10:21 +08:00
Robert Brennan	b63dec4b2e	Add back docker caching, simplify docker builds (#3546 ) * fix multiarch * remove extra push * add back tag file * fix cache tag * add login step * fix login * try to fix save * fix output maybe * rm outputs * remove tars * fix refs * fix runtime dep * force rebuild * lowercase image * add suffix to build tags for runtime * update matrix * fix cut * fix cut again * add back matrix * Update containers/build.sh Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-08-23 17:01:18 +00:00
tobitege	fc5f026942	prevent 500 server error on a just removed folder when listing files (#3553 )	2024-08-23 18:05:38 +02:00
tofarr	8d47cebde9	Fix spaces in path (#3547 ) * Fix for issue where spaces in path results in error	2024-08-23 07:29:41 -06:00
Raj Maheshwari	11d8d05b1a	[Fix] Metrics should be updated when agent reaches max iterations. (#3549 )	2024-08-23 02:28:16 +00:00
Ikko Eltociear Ashimine	87cc28beca	chore: update client.py (#3542 ) occurence -> occurrence	2024-08-23 01:18:16 +08:00
Aaron Xia	dc0a1f3940	Fix wrong doc url (#3531 ) * Update custom-sandbox-guide.md update https://docs.all-hands.dev/modules/usage/architecture/runtime * Update runtime_build.py update url * Update README.md update url	2024-08-22 13:16:27 +02:00
Xingyao Wang	b19b724eae	feat: show exact python interpreter to the agent in IPython and Bash (#3448 ) * try to fix pip unavailable * update test case for pip * force rebuild in CI * remove extra symlink * fix newline * added semi-colon to line 31 * Dockerfile.j2: activate env at the end * Revert "Dockerfile.j2: activate env at the end" This reverts commit `cf2f565102`. * cleanup Dockerfile * switch default python image * remove image agnostic (no longer used) * fix tests * simplify integration tests default image * add nodejs specific runtime tests * update tests and workflows * switch to nikolaik/python-nodejs:python3.11-nodejs22 * update build sh to output image name correctly * increase custom images to test * fix test * fix test * fix double quote * try fixing ci * update ghcr workflow * fix artifact name * try to fix ghcr again * fix workflow * save built image to correct dir * remove extra -docker-image * make last tag to be human readable image tag * fix hyphen to underscore * run test runtime on all tags * revert app build * separate ghcr workflow * update dockerfile for eval * fix tag for test run * try fix tag * try fix tag via matrix output * try workflow again * update comments * try fixing test matrix * fix artifact name * try fix tag again * Revert "try fix tag again" This reverts commit `b369badd8c`. * tweak filename * try different path * fix filepath * try fix tag artifact path again * save json instead of line * update matrix * print all tags in workflow * support only streaming diff logs from the runtime client * remove strip from log line to fix indentation * get py interpreter for jupyter * rstrip to remove newline on the rightside for logging * fix blocking issue for stream logs * set python interpreter path in bash ps1 * update testcase for jupyter py interpreter path * remove accidentally added changes * remove accidentally added changes * only print dockerfile when debug * add docs * remove extra tests that weren't supposed to be in this pr * add back missing test * revert * make LogBuffer synchronous to fix hang in integration tests * fix integration tests * Update opendevin/runtime/client/client.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * fix test case * fix integration tests * change deque to list * update integration tests * rename test runtime * fix docs * rename opendevin to openhands in tests --------- Co-authored-by: tobitege <tobitege@gmx.de> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-08-21 20:08:50 +00:00
tobitege	c7886168e1	(feat) implement typescript linting for CodeActAgent (#3452 ) * tweaks to linter.py to prep for typescript linting (not implemented yet) * fix 2 linter unit tests * simpler basic_lint output; updated unit test * fix default gpt-4o model name in aider default config * linter.py: use tsc (typescript compiler) for linting; added more tests * make typescript linting be more forgiving * use npx instead of npm to install typescript in Dockerfile.j2 * Fix merge mistake * removed npx call from Dockerfile.j2 * fix run_cmd to use code parameter; replace regex in test * fix test_lint_file_fail_typescript to ignore leading path characters * added TODO comment to extract_error_line_from * fixed bug in ts_lint with wrong line number parsing	2024-08-21 21:41:35 +02:00
tobitege	7ef5a2d1ff	(fix) Rename last opendevin occurrences (#3490 ) * renaming more opendevin occurrences * remove DOCKER_IMAGE variable from Makefile * Revert rename in evaluation/swe_bench/run_infer.py Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> --------- Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-08-20 16:45:26 +00:00
Mahmood Alhawaj	6487175a31	refactored all relative paths to absolute paths (#3495 )	2024-08-21 00:09:48 +08:00
Xingyao Wang	c8452f5813	fix: custom runtime image won't work for go (#3464 ) * fix request param for container_image; add test for go; * fix go version issue * update test to detect go version	2024-08-20 23:38:59 +08:00
tofarr	f5aa111ba6	Fix: Bump max_iterations when resuming due to throttling (#3410 ) * Fix: Reset iteration count when resuming due to throttling * Fix inadvertent additions * WIP * Changing max_iterations instead of iteration count * Now adjusting max_iterations or max_budget_per_task as appropriate * Fix check on iterations * Fix linter issues * AgentController: remember initial max_iterations and use it to extend state's iterations * increase task budget by initial value (not doubling it) --------- Co-authored-by: Tim O'Farrell <tofarr@gmai.com> Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com> Co-authored-by: mamoodi <mamoodiha@gmail.com>	2024-08-20 06:53:26 -06:00
Xingyao Wang	8f0f764a85	fix: CI docker image push (#3476 ) * fix ghcr app * fix ghcr runtime push * rename od_runtime to runtime	2024-08-19 20:53:28 +00:00
Robert Brennan	01ae22ef57	Rename OpenDevin to OpenHands (#3472 ) * Replace OpenDevin with OpenHands * Update CONTRIBUTING.md * Update README.md * Update README.md * update poetry lock; move opendevin folder to openhands * fix env var * revert image references in docs * revert permissions * revert permissions --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-08-20 00:44:54 +08:00

1945 Commits