OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-26 05:48:36 +08:00

Author	SHA1	Message	Date
Kevin Musgrave	74ba21bad0	feat(evaluation): Added INSTRUCTION_TEMPLATE_NAME to run_infer.py in swe_bench (#10270) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: mamoodi <mamoodiha@gmail.com>	2025-08-18 14:18:08 +00:00
Ibragim Badertdinov	19a6b6b618	feat(eval): Support evaluation on SWE-rebench (#10251)	2025-08-12 14:05:43 +00:00
Xingyao Wang	c4f303a07b	chore(eval): Remove eval_infer_remote.sh script and related references (#10157) Co-authored-by: openhands <openhands@all-hands.dev>	2025-08-07 20:46:59 +00:00
juanmichelini	ea50fe4e3c	Fix: Continue evaluation when an instance fails after max retries (#8868) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Xingyao Wang <xingyaoww@gmail.com> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-07-16 22:42:44 +00:00
Linghao Zhang	a93b0457c6	feat(eval): Support evaluation on SWE-bench-Live (#9137)	2025-06-15 12:30:47 +00:00
Xuhui Zhou	14498c5e25	Feature/swe run interact (#8714) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-05-27 19:35:21 +00:00
Ryan H. Tran	3980ba53c9	Add option to run patch evaluation on Modal (#8607) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-05-23 00:45:45 +07:00
Engel Nyst	637cb0726a	specify condenser config for evals (#8177) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-21 22:08:57 +02:00
Graham Neubig	f317c03b1b	Fix inconsistent max_iterations in SWE-bench evaluation (#8467) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-13 02:07:57 +00:00
Engel Nyst	9b9b1291fc	[chore] Just linting on swe-bench files (#7918)	2025-04-18 22:12:01 +08:00
Niels Mündler	4b124d5906	Add inference for SWT-Bench (#7201) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Calvin Smith <email@cjsmith.io>	2025-04-17 14:49:42 -06:00
Shixian Sheng	4fb073d1ea	Fixed a few hyperlinks. Translated some texts (#7652)	2025-04-02 22:10:19 +00:00
Xingyao Wang	54236f9617	[eval] Support SWE-Bench Multimodal (#7122) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-31 07:42:44 -04:00
Xingyao Wang	9b9e728cf6	Iterative evaluation with rule-based critic (#7293)	2025-03-17 18:37:35 +00:00
Xingyao Wang	9f720a9d69	[eval] SWE-Gym Integration (#6651) Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com>	2025-03-05 20:15:02 +00:00
Xingyao Wang	bbf40c6576	docs: cleanup and update SWE-Bench documentation; and remove the support of non-instance-level image (#7118) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-03-06 03:18:40 +08:00
Xingyao Wang	4be33a079b	Update SWE-Bench README.md about RemoteRuntime (#7108)	2025-03-05 23:00:54 +08:00
Dmitry Kozlov	17d722f3b3	Update README.md (#6076) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-01-06 17:31:19 +00:00
OpenHands	bfb191b5c7	Fix issue #5739: [Bug]: Move ./evaluation/swe_bench/scripts/cleanup_remote_runtime.sh to general eval utils (#5740)	2024-12-25 17:17:06 -05:00
Cheng Yang	8f47547b08	docs: fix markdown linting and broken links (#5401)	2024-12-05 01:28:04 +08:00
OpenHands	678436da30	Fix issue #5222: [Refactor]: Refactor the evaluation directory (#5223) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-11-25 08:35:52 -05:00

21 Commits