OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-26 13:52:43 +08:00

Author	SHA1	Message	Date
Engel Nyst	b295f5775c	Revert "Fix issue #5609: Use litellm's modify_params with default True" (#5631)	2024-12-16 20:39:57 +00:00
OpenHands	09735c7869	Fix issue #5609: Use litellm's modify_params with default True (#5611) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-12-16 20:18:45 +01:00
Engel Nyst	4716955960	Remove unused codeact-SWE agent (#5600) Co-authored-by: openhands <openhands@all-hands.dev>	2024-12-14 20:49:44 +01:00
Ryan H. Tran	8ae2fb636e	Remove symlink use for swebench setup (#5549)	2024-12-13 22:18:14 +08:00
Engel Nyst	b11e905988	Verify costs script (#5469)	2024-12-10 14:20:53 +01:00
Engel Nyst	455e667739	add cost to summary (#5473)	2024-12-10 03:14:03 +08:00
Cheng Yang	8f47547b08	docs: fix markdown linting and broken links (#5401)	2024-12-05 01:28:04 +08:00
Xingyao Wang	9908e1b285	[Evaluation]: Log openhands version in eval output folder, instead of agent version (#5394)	2024-12-04 03:33:43 +00:00
Xingyao Wang	990f277132	misc: Support folder-level exp analysis for SWE-Bench `summarize_outputs.py`; Handle CrashLoopBackoff for RemoteRuntime (#5385)	2024-12-03 15:37:21 +00:00
Engel Nyst	ea994b6209	More integration tests info (#5319)	2024-11-29 16:39:03 +01:00
Cheng Yang	b808a639d9	docs: improve evaluation README with proper links and formatting (#5221)	2024-11-27 18:27:36 -05:00
Xingyao Wang	4d3b035e00	feat(agent): add BrowseURLAction to CodeAct (produce markdown from URL) (#5285)	2024-11-27 21:55:57 +00:00
OpenHands	f0ca2239f3	Fix issue #5076: Integration test github action (#5077) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-11-27 21:31:48 +01:00
Graham Neubig	12dd3352c5	Add remote runtime support to agent_bench (#5280) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-11-26 13:45:49 +00:00
OpenHands	678436da30	Fix issue #5222: [Refactor]: Refactor the evaluation directory (#5223) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-11-25 08:35:52 -05:00
Nan Jiang	463d4e9a46	eval: add commit0 benchmark (#5153) Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-11-22 19:49:45 +00:00
Xingyao Wang	ff84a3eede	chore: remove specified sid (#5127)	2024-11-19 16:41:27 +00:00
Xingyao Wang	a531413d86	fix(eval): support setting hard timeout per evaluation instance (#5110)	2024-11-18 21:22:55 -05:00
Xingyao Wang	bdc4513937	fix(swebench): handle error in eval_infer and run_infer (#5017)	2024-11-15 23:04:56 +08:00
Graham Neubig	ce6f99d80e	Add GITHUB_USERNAME env var to resolver step (#4999) Co-authored-by: openhands <openhands@all-hands.dev>	2024-11-14 18:42:59 +00:00
Ketan Ramaneti	852c90f64a	[fix eval] Fix issues with miniwob remote runtime evaluation (#5001)	2024-11-14 18:00:48 +00:00
Ketan Ramaneti	42b49e6c43	[fix eval] Fix issues with aider_bench remote runtime evaluation (#5000)	2024-11-14 17:58:45 +00:00
Xingyao Wang	07f0d1ccb3	feat(llm): convert function call request for non-funcall OSS model (#4711) Co-authored-by: Calvin Smith <email@cjsmith.io>	2024-11-15 00:40:09 +08:00
Robert Brennan	bc3f0ac24a	fix imports (#4974)	2024-11-13 17:04:16 +00:00
Calvin Smith	50e7da9c3d	fix(evaluation): SWE-bench evaluation script supports multiprocessing (#4943)	2024-11-12 12:19:57 -07:00
Robert Brennan	17f4c6e1a9	Refactor sessions a bit, and fix issue where runtimes get killed (#4900)	2024-11-12 16:20:36 +00:00
Xingyao Wang	a07e8272da	fix: improve remote runtime reliability on large-scale evaluation (#4869)	2024-11-09 20:17:10 +00:00
Robert Brennan	be82832eb1	Use keyword matching for CodeAct microagents (#4568) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-11-09 11:25:02 -05:00
Xingyao Wang	4ce3b9094a	Revert "(feat): Prompt engineering to remind o1 to generate a patch" (#4846)	2024-11-08 16:12:57 +00:00
Alejandro Cuadron Lafuente	a6810fa6ad	(feat): Prompt engineering to remind o1 to generate a patch (#4807) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: mamoodi <mamoodiha@gmail.com> Co-authored-by: tofarr <tofarr@gmail.com> Co-authored-by: Robert Brennan <contact@rbren.io>	2024-11-08 03:10:18 +00:00
Xingyao Wang	53390d9885	Fix issue #4583: [Bug]: Unable to pull the full SWE-Bench test set (#4813) Co-authored-by: openhands <openhands@all-hands.dev>	2024-11-07 22:35:20 +08:00
OpenHands	025dac5d8f	Fix issue #4776: [Bug]: Files are not uploaded to the environment (SWE-Bench) (#4795)	2024-11-06 19:05:06 +00:00
Engel Nyst	eeb2342509	Refactor history/event stream (#3808)	2024-11-05 03:36:14 +01:00
Xingyao Wang	1d2a616be7	Fix issue #4739: '[Bug]: The agent doesn'"'"'t know its name' (#4740) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-11-04 21:24:35 +00:00
Xingyao Wang	966da7b7c8	feat(agent, CodeAct 2.2): native CodeAct support for Browsing (#4667) Co-authored-by: tofarr <tofarr@gmail.com>	2024-11-05 00:27:27 +08:00
Abhijeetsingh Meena	8857f02083	[Eval] DiscoveryBench OpenHands Integration (#4627) Signed-off-by: Abhijeetsingh Meena <abhijeet040403@gmail.com> Co-authored-by: Harshit Surana <surana.h@gmail.com>	2024-11-02 07:24:34 -04:00
Ziru "Ron" Chen	db4e1dbbec	[eval] Add ScienceAgentBench. (#4645) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2024-11-01 02:30:55 +08:00
Xingyao Wang	9c2b48ff5d	fix(eval): SWE-Bench instance with upper-case instance id (#4649)	2024-10-30 21:24:18 +00:00
Xingyao Wang	6d19c93d19	[eval] add evaluation workflow (#4489) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-10-29 13:52:25 +00:00
Xingyao Wang	ae13171194	feat(agent): CodeAct with function calling (#4537) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: tofarr <tofarr@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-10-29 11:06:33 +08:00
Xingyao Wang	1f23dc89b6	fix(eval): add runtime.connect to all eval harness (#4565)	2024-10-26 00:41:30 +08:00
Xingyao Wang	7340b78962	feat(eval): rewrite log_completions to save completions to directory (#4566)	2024-10-25 16:36:11 +00:00
tofarr	c4f5c07be1	Refactor: shorter syntax (#4558)	2024-10-25 06:45:28 -06:00
Graham Neubig	ce2430180f	Update README.md to fix miniwob name (#4534)	2024-10-23 18:24:43 +00:00
Xingyao Wang	2d5b360505	refactor: re-organize different runtime implementations into an impl folder (#4346) Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-10-23 10:10:03 +00:00
Graham Neubig	54250e3fe2	Update evaluation README.md structure (#4516)	2024-10-22 14:42:22 +00:00
Xingyao Wang	da548d308c	[agent] LLM-based editing (#3985) Co-authored-by: Tim O'Farrell <tofarr@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-10-22 04:51:44 +08:00
Alejandro Cuadron Lafuente	a9a593bb21	[Fix] Added support to specify the platform on which the runtime image should be built. (#4402) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: mamoodi <mamoodiha@gmail.com> Co-authored-by: tofarr <tofarr@gmail.com> Co-authored-by: Robert Brennan <contact@rbren.io>	2024-10-20 09:19:05 +08:00
Xingyao Wang	91308ba4dc	feat: clean-up retries RemoteRuntime & add FatalErrorObservation (#4485)	2024-10-18 17:23:13 +00:00
Jiayi Pan	c1b323a076	Show actual dataset name in swebench log directory (#4417)	2024-10-17 10:32:38 +08:00

406 Commits