OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-26 05:48:36 +08:00

Author	SHA1	Message	Date
Ryan H. Tran	3980ba53c9	Add option to run patch evaluation on Modal (#8607) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-05-23 00:45:45 +07:00
Engel Nyst	637cb0726a	specify condenser config for evals (#8177) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-21 22:08:57 +02:00
luolin101	1a3cb16ba6	add Visual SWE-bench benchmark (#7131) Co-authored-by: tsukimi <yuailun@pku.edu.cn> Co-authored-by: Ryan H. Tran <descience.thh10@gmail.com>	2025-05-19 12:08:46 +07:00
Xingyao Wang	2ecc39ffcc	[eval]: disable MCP for SWE-Bench evaluation (#8574) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Engel Nyst <engel.nyst@gmail.com>	2025-05-19 01:32:46 +00:00
Yueqi Song	3ca585b79f	Update run_infer.py to incorporate selection of task based on repo (#8509)	2025-05-15 12:27:28 +08:00
omahs	4bb6ec2ee5	Fix typos (#8469)	2025-05-13 09:34:21 +00:00
Graham Neubig	f317c03b1b	Fix inconsistent max_iterations in SWE-bench evaluation (#8467) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-13 02:07:57 +00:00
Graham Neubig	689d3c9046	Update pre-commit hook versions to most recent versions (#8343) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-08 03:59:13 +00:00
Engel Nyst	985e20d529	[chore] Run full agent pre-commit (#8235)	2025-05-03 11:24:03 -04:00
Qi Liu	3d22520992	[Feat] add multi-swe-bench (#8174) Co-authored-by: ByteDance User <tiger@bytedance.localdomain>	2025-05-01 00:23:19 +00:00
Michael Panchenko	14564b25d6	Fix linting (#7965)	2025-04-21 06:34:40 +08:00
Engel Nyst	a2c55cfdef	Refactor to clean up and move utility/legacy out of the agent (#7917)	2025-04-19 01:53:33 +08:00
Xingyao Wang	7c23993344	fix(eval): typo in SWE_Bench evaluation (#7930)	2025-04-19 00:31:08 +08:00
Engel Nyst	9b9b1291fc	[chore] Just linting on swe-bench files (#7918)	2025-04-18 22:12:01 +08:00
Niels Mündler	4b124d5906	Add inference for SWT-Bench (#7201) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Calvin Smith <email@cjsmith.io>	2025-04-17 14:49:42 -06:00
juanmichelini	6bcebd4b9d	Jetbrains CI Benchmark (#7811) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-04-17 15:10:20 +00:00
Engel Nyst	5e5bf23f9c	[Evaluation] Fix KeyError when the instance failed prematurely (#7864)	2025-04-15 15:19:31 +00:00
Engel Nyst	d05a6f30e1	[Refactor] Rename codeact_* agent options to simple name (#7853)	2025-04-15 00:14:13 +02:00
sp.wack	72b5e18898	fix(backend): Return 400 if trying to open a binary file (#7825)	2025-04-11 22:47:57 +00:00
Engel Nyst	bb98d94b35	[evaluation] fix missing metadata (#7819)	2025-04-11 16:58:59 +00:00
juanmichelini	53c0c5a07b	SWE-bench_verified instruction baseline improvements to 60% (#7546) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-04-10 16:08:27 +00:00
Xingyao Wang	0087082643	Improve binary file handling and patch generation in SWE-bench evaluation (#7762) Co-authored-by: openhands <openhands@all-hands.dev>	2025-04-08 22:57:33 +00:00
Xingyao Wang	ddda30d9b7	fix(eval): iterative evaluation improvements; SWE-Bench multimodal fixes (#7739) Co-authored-by: Juan Michelini <juan@juan.com.uy> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: openhands <openhands@all-hands.dev>	2025-04-09 02:44:03 +08:00
Engel Nyst	22cf5144cc	Fix integration test (#7747)	2025-04-07 22:31:50 -04:00
Boxuan Li	d7c49a0656	[Evaluation] Fix sandbox config in TAC (#7684)	2025-04-03 08:19:10 +00:00
Boxuan Li	34bf6a6402	[Evaluation] Fix run_infer.py path in TAC (#7683)	2025-04-03 04:34:02 +00:00
Shixian Sheng	4fb073d1ea	Fixed a few hyperlinks. Translated some texts (#7652)	2025-04-02 22:10:19 +00:00
Rohit Malhotra	9adfcede31	(Hotfix): Track reason for Error AgentState (#7584) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-31 21:24:42 +00:00
Xingyao Wang	648c8ffb21	(llm): Support OpenHands LM (#7598) Co-authored-by: mamoodi <mamoodiha@gmail.com>	2025-03-31 17:29:31 +00:00
Xingyao Wang	54236f9617	[eval] Support SWE-Bench Multimodal (#7122) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-31 07:42:44 -04:00
tofarr	1230b229b5	Replace use of requests with httpx (#7354)	2025-03-26 13:37:10 +00:00
Zach	0b3d15a4d7	Fix missing 'fi' statement in GAIA benchmark scripts/run_infer.sh (#7465)	2025-03-24 16:04:25 +00:00
Xingyao Wang	01e0e29a9f	Reduce bash SOFT timeout from 30 to 10 seconds (#7423) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-22 22:42:24 +00:00
Engel Nyst	83458f5146	Fix style issues with pre-commit (#7318) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-18 01:34:27 +00:00
kjain14	507afd7f06	Add TestGenEval benchmark (#5534) Co-authored-by: Kush Dave Jain <kdjain@pit.isri.cmu.edu> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-03-17 20:16:45 +00:00
Xingyao Wang	9b9e728cf6	Iterative evaluation with rule-based critic (#7293)	2025-03-17 18:37:35 +00:00
Xingyao Wang	a4d632498c	SWE-Gym rollout stability fix & using a validated SWE-Gym set (#7182) Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-03-17 21:15:01 +08:00
Engel Nyst	dd09d46ccb	Remove DelegatorAgent (fix #7280) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-16 16:49:28 +01:00
Calvin Smith	303b7ab180	(fix): Conditional imports resolved in SWE-bench eval script while multiprocessing enabled (#7244) Co-authored-by: Calvin Smith <calvin@all-hands.dev>	2025-03-13 13:29:11 -06:00
Elena Chistova	38e866cde4	Fix official SWE-Bench docker image prefix (#7214)	2025-03-12 18:23:19 +00:00
juanmichelini	b36deca265	Added link to paper in commit0 README (#7221)	2025-03-12 17:17:22 +00:00
Xingyao Wang	a4908f9a75	[agent] system message + SWE-Bench instruction improvements (#7018)	2025-03-08 00:27:02 +08:00
Nan Jiang	ec087993f1	rename commit0_bench to commit0 (#7124)	2025-03-06 02:55:39 +00:00
Xingyao Wang	9f720a9d69	[eval] SWE-Gym Integration (#6651) Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-03-05 20:15:02 +00:00
Xingyao Wang	bbf40c6576	docs: cleanup and update SWE-Bench documentation; and remove the support of non-instance-level image (#7118) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-03-06 03:18:40 +08:00
Xingyao Wang	4be33a079b	Update SWE-Bench README.md about RemoteRuntime (#7108)	2025-03-05 23:00:54 +08:00
He Du	896d7b8b96	Openhands fix issue 7091 (#7092) Co-authored-by: 杜贺 <duhe@duhedeMacBook-Pro-2.local>	2025-03-04 18:39:28 +01:00
Rohit Malhotra	5ffb1ef704	Fix typing (#7083) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-03 20:41:11 +00:00
Engel Nyst	395c1ea9e3	[Refactor] split runtime initialization (create, connect, init) in cli scripts (#7036)	2025-03-03 00:19:25 +01:00
Engel Nyst	660d1d1e64	Fix argument in swe-bench grading scripts (#7046)	2025-03-02 12:37:15 +08:00

404 Commits