OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-26 05:48:36 +08:00

Author	SHA1	Message	Date
tofarr	ad0b549d8b	Feat Tightening up Timeouts and interrupt conditions. (#3926 )	2024-09-18 20:50:42 +00:00
Engel Nyst	47f60b8275	Don't send gemini settings when the llm is not gemini (#3940 )	2024-09-18 20:12:58 +00:00
Xingyao Wang	5d7f2fd4ae	[eval] Allow evaluation of SWE-Bench patches on `RemoteRuntime` (#3927 ) Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-18 16:07:34 -04:00
Robert Brennan	c864715b43	Fix UID management for ubuntu users (#3937 )	2024-09-18 16:29:39 +00:00
tobitege	b4408b41c9	(feat) LLM class: add safety_settings for Gemini; improve max_output_tokens defaulting (#3925 )	2024-09-18 11:51:23 -04:00
Engel Nyst	e3be71f523	Fix init order with threading (#3935 )	2024-09-18 15:26:51 +00:00
tobitege	c3117e8c39	(feat) add --version to cli (#3924 )	2024-09-18 09:44:51 -04:00
niliy01	07a094e701	(enh) Update Docker pull data in place (#3910 ) Signed-off-by: Yi Lin <teroincn@gmail.com>	2024-09-17 10:22:07 +02:00
tobitege	52c5abccbf	(enh) Dockerfile.j2: improve env vars for bash and activate in .bashrc (#3871 )	2024-09-17 08:49:04 +02:00
niliy01	804674bb9f	refactor the logic in agent_controller to improve readability (#3873 ) Signed-off-by: Yi Lin <teroincn@gmail.com>	2024-09-16 14:13:52 -04:00
Engel Nyst	41a54378dc	Add delegates events to eval trajectories (#3881 )	2024-09-16 14:10:42 -04:00
tofarr	0db664986d	Tightened up the logic on retries. (#3882 )	2024-09-16 07:28:06 -06:00
tobitege	a33f61c025	(feat) Show messages' timestamp in UI (#3869 )	2024-09-16 05:41:29 +02:00
tobitege	a45b20a406	(fix) runtime: tweak _wait_until_alive tenacity and exception handling (#3878 )	2024-09-16 04:24:58 +02:00
tobitege	ecf4aed28b	(fix) Update logs after run_action (EventStreamRuntime) (#3870 )	2024-09-15 18:50:10 +02:00
tobitege	554636cf2a	(fix) Fix runtime (RT) tests and split tests in 2 actions (openhands/root) (#3791 ) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-09-14 21:51:30 +02:00
tobitege	57390eb26b	(enh) docker pull (if not found locally) with progress info (#3682 )	2024-09-14 06:26:42 +02:00
tobitege	6111f530c2	(fix) StuckDetector: syntax error loops were not detected (#3663 ) Co-authored-by: mamoodi <mamoodiha@gmail.com>	2024-09-13 16:53:52 +02:00
Xingyao Wang	78c5f58adc	refactor & improve retry for the reliability of `RemoteRuntime` & evaluation (#3846 )	2024-09-13 07:37:07 -04:00
Robert Brennan	1f13d80ddc	fix saves (#3848 )	2024-09-12 21:47:02 +00:00
Robert Brennan	58de5221f5	fix file access (#3847 )	2024-09-12 15:30:21 -04:00
Xingyao Wang	2fe2f4c530	[eval] increase timeout for SWEBench eval init/complete (#3829 ) * [eval] increase timeout for swebench eval init/complete * allow CmdRunAction to optionally block when .timeout is set * fix unit test for serialization * fix unit tests for security analyzer * fix integration tests * add more timeout	2024-09-12 15:20:58 +00:00
Robert Brennan	c6105f264f	Improvements to file list UI (#3794 ) * move filematching logic into server * wait until ready before returning * show loading message instead of empty * logspam * delint * fix type * add a few more default ignores	2024-09-11 09:44:37 -04:00
mamoodi	f3b2085f9b	Reduce runtime tests duration by running them across CPUs (#3779 ) * Reduce runtime tests duration by running them across CPUs * fix hardcoded image name * test two cpus * Test folder change * Up the CPU to 4 again to test * Change to 3 CPUs * Down to 2 * Add param to remove all openhands containers * Add comment * Add reruns just in case * Fix ordering of if	2024-09-10 14:31:17 -04:00
Cole Murray	97a03faf33	Add Handling of Cache Prompt When Formatting Messages (#3773 ) * Add Handling of Cache Prompt When Formatting Messages * Fix Value for Cache Control * Fix Value for Cache Control * Update openhands/core/message.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Fix lint error * Serialize Messages if Prompt Caching Is Enabled * Remove formatting message change --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com>	2024-09-10 16:34:41 +00:00
tobitege	5ffff742de	Regression fixes: LLM logging; client readiness (EventStreamRuntime) (#3776 ) * Regression fixes: LLM logging; client readiness (EventStreamRuntime) * fix llm.async_completion_wrapper bad edit in previous commit * regen couple of mock files * client: always log initialized status	2024-09-09 21:02:43 +02:00
tobitege	2b7517e542	(enh) add caching@v4 action in workflows (#3780 ) * dummy test change * regen yml: 1st install python 3.11, then poetry * fix caching for poetry; old entry for python was rather useless * fix steps order (cache before poetry) * add poetry caching to ghcr_runtime; fix fork conditions * ghcr_runtime: more caching actions; condition fixes * fix interim action error (order of steps) * cache@v4 instead of v3 * fixed interim typo for 2 fork conditions * runtime/test_env_vars: compacted multiple tests into one to reduce time * ugh if fork condition changes again	2024-09-09 10:49:49 +02:00
Cole Murray	dadada18ce	Add Anthropic Models to Cache Prompt (#3775 ) * Add Anthropic Models to Cache Prompt * Update Cache Prompt Active Check for Partial String Matching	2024-09-08 22:09:14 +00:00
Robert Brennan	ab3851593d	Support interactive commands (#3653 ) * hacky solution for interactive commands * add more behavior * debug * fix continue functionality * remove prints * refactor a bit * reduce test sleep * fix python version * fix pre-commit issue * Regenerate integration tests * Update openhands/runtime/client/client.py * revert some prompt stuff * several integration mock files regenerated * execute_action: remove duplicate exception logging --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com>	2024-09-08 21:45:51 +02:00
tobitege	57187417b7	revert enabling litellm verbose mode from testing (#3750 )	2024-09-05 20:12:04 +00:00
tobitege	03b5b03bb2	(enh) CodeActAgent: improve logging; sensible retry defaults in config (#3729 ) * CodeActAgent: improve logging; sensible retry defaults for completion errors * CodeActAgent: reduce completion error message sent to UI * tweak values; docs+config template changes * fix format_messages; log exception in codeactagent again	2024-09-05 18:14:15 +00:00
niliy01	82a154f7e7	(feat) making prompt caching optional instead of enabled default (#3689 ) * (feat) making prompt caching optional instead of enabled default At present, only the Claude models support prompt caching as a experimental feature, therefore, this feature should be implemented as an optional setting rather than being enabled by default. Signed-off-by: Yi Lin <teroincn@gmail.com> * handle the conflict * fix unittest mock return value * fix lint error in whitespace --------- Signed-off-by: Yi Lin <teroincn@gmail.com>	2024-09-05 18:52:26 +02:00
Xingyao Wang	688068a44e	Fix issues for running `RemoteRuntime` in parallel on SWE-Bench (#3716 ) * feat: add SWE-bench fullset support * fix instance image list * update eval script and documentation * increase timeout for remote runtime * add push script * handle the case when ret push is an generator * update pbar * set SWE-Bench default to run SWE-Bench lite * add script to cleanup remote runtime * fix the cases when tag is too long * update README * update readme for cleanup * rename od to oh * Update evaluation/swe_bench/README.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/README.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh Co-authored-by: Graham Neubig <neubig@gmail.com> * gets API key and Runtime from env var --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-09-05 10:34:31 +08:00
tobitege	bc31fb15fe	(fix) CodeActAgent: fix issues with vision support in prompts (#3665 ) * CodeActAgent: fix message prep if prompt caching is not supported * fix python version in regen tests workflow * fix in conftest "mock_completion" method * add disable_vision to LLMConfig; revert change in message parsing in llm.py * format messages in several files for completion * refactored message(s) formatting (llm.py); added vision_is_active() * fix a unit test * regenerate: added LOG_TO_FILE and FORCE_REGENERATE env flags * try to fix path to logs folder in workflow * llm: prevent index error * try FORCE_USE_LLM in regenerate * tweaks everywhere... * fix 2 random unit test errors :( * added FORCE_REGENERATE_TESTS=true to regenerate CLI * fix test_lint_file_fail_typescript again * double-quotes for env vars in workflow; llm logger set to debug * fix typo in regenerate * regenerate iterations now 20; applied iteration counter fix by Li * regenerate: pass FORCE_REGENERATE flag into env * fixes for int tests. several mock files updated. * browsing_agent: fix response_parser.py adding ) to empty response * test_browse_internet: fix skipif and revert obsolete mock files * regenerate: fix bracketing for http server start/kill conditions * disable test_browse_internet for CodeActAgents; mock files updated after merge missed to include more mock files earlier * reverts after review feedback from Li * forgot one * browsing agent test, partial fixes and updated mock files * test_browse_internet works in my WSL now! * adapt unit test test_prompt_caching.py * add DEBUG to regenerate workflow command * convert regenerate workflow params to inputs * more integration test mock files updated * more files * test_prompt_caching: restored test_prompt_caching_headers purpose * file_ops: fix potential exception, like "cross device copy"; fixed mock files accordingly * reverts/changes wrt feedback from xingyao * updated docs and config template * code cleanup wrt review feedback	2024-09-04 17:58:30 +02:00
Shubham raj	2bc3e8d584	Fix: llm completion exception breaks CodeActAgent (#3678 ) * Catch exception and return finish action with an exception message in case of exception in llm completion * Remove exception logs * Raise llm response error for any exception in llm completion * Raise LLMResponseError from async completion and async streaming completion as well	2024-09-04 05:51:49 +02:00
Xingyao Wang	d8a87d7ccb	[Eval] Make SWE-Bench run_infer.sh to default to run SWE-Bench Lite (#3704 ) * feat: add SWE-bench fullset support * fix instance image list * update eval script and documentation * increase timeout for remote runtime * add push script * handle the case when ret push is an generator * update pbar * set SWE-Bench default to run SWE-Bench lite	2024-09-04 00:58:14 +08:00
Mislav Balunovic	f979d612ec	(fix) confirmation mode bugfix for the EventStreamRuntime (#3695 )	2024-09-02 13:27:33 +00:00
Boxuan Li	75d5591816	file_ops: Use tmp file for original linting (#3681 ) Fix a potential issue that might lead to file corruption when edit linting is enabled #3124 introduces a feature for editing: running linter twice before and after the change and only extract new errors introduced by the agent. This has some potential issues and I am working on #3649 to address them, but I feel like I am not gonna finish it in the next few days, and that PR has become harder and harder to review, thus this PR, which only focuses on a small improvement. So what's the issue? When we run linters on the original file before our edits, we need to copy the original file and use a temporary file to lint, because linting may have side-effect (e.g. modifying the file in-place). I used the word "may" because: Flake8 has no side-effect, so not a problem as of now. We don't enforce this or document this "no side-effect" as a requirement for linter implementation, so side-effect is allowed. Regardless, the "after-edit-linting" uses the same approach: backup the file before linting to avoid data corruption. We should keep our "before-edit-linting" consistent. Why no new unittest that reproduces the issue? Well, as I have mentioned earlier, flake8 has no side-effect, so technically it's not a bug but a flaw. Therefore, there's no way to write a test that reproduces the issue.	2024-09-01 23:36:57 -07:00
tobitege	7068a73ae7	(enh) Improve CodeActAgent's file editing reliability (#3610 ) * improve file editing prompts and unit test converted most raise calls to a _output_error call in file_ops.py * tweaks in test_agent_skill.py wrt to SEP separator * tweaked the separator * remove server runtime remnants and TEST_RUNTIME references * restore use of TEST_RUNTIME args and variables * fix integration tests * added hint to properly escape docstrings * revert latest prompt change --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-09-02 06:03:22 +02:00
tobitege	c83fab8a00	llm: add NotFoundError to completion exception handling (#3668 ) Co-authored-by: tofarr <tofarr@gmail.com>	2024-09-01 07:47:21 +00:00
Boxuan Li	1e2796e168	Fix step count out-of-sync bug when child agent fails (#3680 )	2024-09-01 09:36:51 +02:00
niliy01	89e1c4f29c	feat: add more embed models that Ollama supports recently (#3641 ) Signed-off-by: Yi Lin <teroincn@gmail.com>	2024-08-30 07:58:01 -04:00
Xingyao Wang	090c911a50	(refactor) Make `Runtime` class synchronous (#3661 ) * change runtime to be synchronous * fix test runtime with the new interface * fix arg * fix eval * fix missing config attribute * fix plugins * fix on_event by revert it back to async * update upload_file endpoint * fix argument to upload file * remove unnecessary async for eval; fix evaluation run in parallel * use asyncio to run controller for eval * revert file upload * truncate eval test result output	2024-08-30 01:37:03 +00:00
Xingyao Wang	8b1f207d39	feat: support remote runtime (#3406 ) * feat: refactor building logic into runtime builder * return image name * fix testcases * use runtime builder for eventstream runtime * have runtime builder return str * add api_key to sandbox config * draft remote runtime * remove extra if clause * initialize runtime based on box class * add build logic * use base64 for file upload * get runtime image prefix from API * replace ___ with _s_ to make it a valid image name * use /build to start build and /build_status to check the build progress * update logging * fix exit code * always use port * add remote runtime * rename runtime * fix tests import * make dir first if work_dir does not exists; * update debug print to remote runtime * fix exit close_sync * update logging * add retry for stop * use all box class for test keep prompt * fix test browsing * add retry stop * merge init commands to save startup time * fix await * remove sandbox url * support execute through specific runtime url * fix file ops * simplify close * factor out runtime retry code * fix exception handling * fix content type error (e.g., bad gateway when runtime is not ready) * add retry for wait until alive; add retry for check image exists * Revert "add retry for wait until alive;" This reverts commit dd013cd2681a159cd07747497d8c95e145d01c32. * retry when wait until alive * clean up msg * directly save sdist to temp dir for _put_source_code_to_dir * support running testcases in parallel * tweak logging; try to close session * try to close session even on exception * update poetry lock * support remote to run integration tests * add warning for workspace base on remote runtime * set default runtime api * remove server runtime * update poetry lock * support running swe-bench (n=1) eval on remoteruntime * add a timeout of 30 min * add todo for docker namespace * update poetry loc	2024-08-29 15:53:37 +00:00
tobitege	a2d94c9cb1	(enh) StuckDetector: fix+enhance syntax error loop detection (#3628 ) * fix StuckDetector and add more errors for detection * more stringent error detection and more unit tests	2024-08-29 17:33:54 +02:00
tobitege	ae153aa8ab	(enh) review of logger.py; less logging in AgentController (#3648 ) * revised logger.py; agent_controller: less debug logging (every second) * agent_controller._step: removed logging upon _pending_action	2024-08-29 16:07:38 +02:00
tobitege	8fca5a5354	linter and test_aider_linter extensions for eslint (#3543 ) * linter and test_aider_linter extensions for eslint * linter tweaks * try enabling verbose output in linter test * one more option for linter test * try conftest.py for tests/unit folder * enable verbose mode in workflow; remove conftest.py again * debug print statements of linter results * skip some tests if eslint is not installed at all * more tweaks * final test skip setups * code quality revisions * fix test again --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-08-29 10:40:43 +02:00
tobitege	daeff3dfaf	startup handling and logging of docker images tweaked (#3645 )	2024-08-28 22:17:58 +00:00
Graham Neubig	c6ba0e8339	Remove singleton config (#3614 ) * Remove singleton config * Fix tests * Fix logging reset * Fix pre-commit	2024-08-28 20:05:49 +01:00
tobitege	9c39f07430	(enh) Aider-Bench: make resumable with skip_num arg (#3626 ) * added optional START_ID env flag to resume from that instance id * prepare_dataset: fix comparisons by using instance id's as int * aider bench complete_runtime: close runtime to close container * added matrix display of instance id for logging * fix typo in summarize_results.py saying summarise_results * changed start_id to skip_num to skip rows from dataset (start_id wasn't supportable) * doc changes about huggingface spaces to temporarily point back to OD	2024-08-28 15:42:01 +00:00

1775 Commits