OpenHands

Mirror of https://github.com/OpenHands/OpenHands.git, last synced 2025-12-26 05:48:36 +08:00

Author	SHA1	Message	Date
Xingyao Wang	648c8ffb21	(llm): Support OpenHands LM (#7598) Co-authored-by: mamoodi <mamoodiha@gmail.com>	2025-03-31 17:29:31 +00:00
Xingyao Wang	54236f9617	[eval] Support SWE-Bench Multimodal (#7122) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-31 07:42:44 -04:00
tofarr	1230b229b5	Replace use of requests with httpx (#7354)	2025-03-26 13:37:10 +00:00
Zach	0b3d15a4d7	Fix missing 'fi' statement in GAIA benchmark scripts/run_infer.sh (#7465)	2025-03-24 16:04:25 +00:00
Xingyao Wang	01e0e29a9f	Reduce bash SOFT timeout from 30 to 10 seconds (#7423) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-22 22:42:24 +00:00
Engel Nyst	83458f5146	Fix style issues with pre-commit (#7318) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-18 01:34:27 +00:00
kjain14	507afd7f06	Add TestGenEval benchmark (#5534) Co-authored-by: Kush Dave Jain <kdjain@pit.isri.cmu.edu> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-03-17 20:16:45 +00:00
Xingyao Wang	9b9e728cf6	Iterative evaluation with rule-based critic (#7293)	2025-03-17 18:37:35 +00:00
Xingyao Wang	a4d632498c	SWE-Gym rollout stability fix & using a validated SWE-Gym set (#7182) Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-03-17 21:15:01 +08:00
Engel Nyst	dd09d46ccb	Remove DelegatorAgent (fix #7280) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-16 16:49:28 +01:00
Calvin Smith	303b7ab180	(fix): Conditional imports resolved in SWE-bench eval script while multiprocessing enabled (#7244) Co-authored-by: Calvin Smith <calvin@all-hands.dev>	2025-03-13 13:29:11 -06:00
Elena Chistova	38e866cde4	Fix official SWE-Bench docker image prefix (#7214)	2025-03-12 18:23:19 +00:00
juanmichelini	b36deca265	Added link to paper in commit0 README (#7221)	2025-03-12 17:17:22 +00:00
Xingyao Wang	a4908f9a75	[agent] system message + SWE-Bench instruction improvements (#7018)	2025-03-08 00:27:02 +08:00
Nan Jiang	ec087993f1	rename commit0_bench to commit0 (#7124)	2025-03-06 02:55:39 +00:00
Xingyao Wang	9f720a9d69	[eval] SWE-Gym Integration (#6651) Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-03-05 20:15:02 +00:00
Xingyao Wang	bbf40c6576	docs: cleanup and update SWE-Bench documentation; and remove the support of non-instance-level image (#7118) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-03-06 03:18:40 +08:00
Xingyao Wang	4be33a079b	Update SWE-Bench README.md about RemoteRuntime (#7108)	2025-03-05 23:00:54 +08:00
He Du	896d7b8b96	Openhands fix issue 7091 (#7092) Co-authored-by: 杜贺 <duhe@duhedeMacBook-Pro-2.local>	2025-03-04 18:39:28 +01:00
Rohit Malhotra	5ffb1ef704	Fix typing (#7083) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-03 20:41:11 +00:00
Engel Nyst	395c1ea9e3	[Refactor] split runtime initialization (create, connect, init) in cli scripts (#7036)	2025-03-03 00:19:25 +01:00
Engel Nyst	660d1d1e64	Fix argument in swe-bench grading scripts (#7046)	2025-03-02 12:37:15 +08:00
Magic Mai	8a58e724c6	fix: Remove nested git repositories before adding files in SWE-bench (#6536) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-02-28 01:19:33 +00:00
Xingyao Wang	33780f97d0	[eval] Upgrade SWE-Bench to use official image and latest harness (#6838) Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-02-27 08:15:05 -05:00
Engel Nyst	4f98bce6df	Add selected_repo to command line (#6949)	2025-02-26 20:42:59 +01:00
Mateusz Kwiatkowski	6562297615	Replace shebang with /usr/bin/env bash for improved portability (#6876) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-02-24 18:07:28 +00:00
Xingyao Wang	e52aee168e	Docs: Clarify config.toml usage in evaluation harness (#6828) Co-authored-by: openhands <openhands@all-hands.dev>	2025-02-20 22:16:17 -08:00
Xingyao Wang	1a7003a705	Add `sysbox` support to remote runtime for eval; Add memory monitor, stress tests to help debug memory issue (#6684) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-02-18 20:02:28 +00:00
Boxuan Li	4443417c75	A few fixes for TAC evaluation harness (#6586)	2025-02-14 21:01:57 -08:00
Boxuan Li	ef12bc5381	Evaluation harness: Add agent config option (#6662)	2025-02-13 15:05:03 -05:00
Graham Neubig	e930cd0aef	Better error logging in posthog (#6346) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Ray Myers <ray.myers@gmail.com>	2025-02-06 20:16:37 +00:00
Xingyao Wang	90bbd4edbe	fix: initialize default metadata with all required fields (#6583) Co-authored-by: openhands <openhands@all-hands.dev>	2025-02-04 02:52:11 +08:00
tofarr	bbfdc62139	Fix for issue where retries continue on a closed runtime (#6564) Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2025-02-03 08:44:09 -07:00
Boxuan Li	62402cd617	The-Agent-Company evaluation harness: Support splits (#6577)	2025-02-02 13:12:01 +08:00
Xingyao Wang	1a9971b1bf	misc: make RemoteRuntime API timeout configurable (#6518) Co-authored-by: Robert Brennan <accounts@rbren.io>	2025-01-30 06:30:18 +08:00
Xingyao Wang	391200510c	fix: revert #5506 for SWE-Bench performance regression (#6491) Co-authored-by: Robert Brennan <accounts@rbren.io>	2025-01-28 22:52:57 +08:00
Aditya Bharat Soni	aebb583779	Support for VisualWebArena evaluation in OpenHands (#4773) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-01-23 20:18:30 +00:00
Engel Nyst	b9a3f1c753	Fix eval on remote runtime (#6398)	2025-01-21 20:49:30 +00:00
Engel Nyst	5b7fcfbe1a	Disable prompt extensions in SWE-bench (#6391)	2025-01-21 17:18:30 +00:00
louria	7f57dbebda	Update MiniWoB README (#6385)	2025-01-21 16:26:47 +01:00
Xingyao Wang	2b04ee2e62	feat(eval): reliability improvement for SWE-Bench eval_infer (#6347)	2025-01-18 14:02:59 -05:00
Calvin Smith	a12087243a	Pydantic-based configuration and setting objects (#6321) Co-authored-by: Calvin Smith <calvin@all-hands.dev> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-01-17 12:33:22 -07:00
Xingyao Wang	899c1f8360	fix(bash): also show timeout reminder when no_change_timeout is triggered (#6318) Co-authored-by: Robert Brennan <accounts@rbren.io>	2025-01-18 03:31:23 +08:00
Xingyao Wang	72af7bbba2	feat(eval): misc SWE-Bench improvement - use different resources for different instances (#6313) Co-authored-by: openhands <openhands@all-hands.dev>	2025-01-17 02:48:41 +08:00
Xingyao Wang	0c961bfd8b	refactor(prompt): move runtime/repo info to user message and disable them in eval (#6291)	2025-01-16 17:53:10 +00:00
Xingyao Wang	0bed17758f	fix: incorrect soft-timeout implementation & fix hard-timeout follow-up command (#6280)	2025-01-17 01:27:00 +08:00
Engel Nyst	b9a70c8d5c	Delegation fixes (#6165)	2025-01-15 03:24:39 +00:00
Boxuan Li	92b8d55c2d	Rename `trajectories_path` config to `save_trajectory_path` (#6216) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-01-14 04:32:45 +00:00
tofarr	23473070b9	Revert "Config objects as Pydantic BaseModels (#6176)" (#6214)	2025-01-13 07:36:25 -07:00
Calvin Smith	873dddb4e8	Config objects as Pydantic BaseModels (#6176) Co-authored-by: Calvin Smith <calvin@all-hands.dev> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-01-12 15:09:45 -05:00

326 Commits