OpenHands

Mirror of https://github.com/OpenHands/OpenHands.git, synced 2025-12-26 05:48:36 +08:00

Author	SHA1	Message	Date
Mateusz Kwiatkowski	6562297615	Replace shebang with /usr/bin/env bash for improved portability (#6876) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-02-24 18:07:28 +00:00
Xingyao Wang	391200510c	fix: revert #5506 for SWE-Bench performance regression (#6491) Co-authored-by: Robert Brennan <accounts@rbren.io>	2025-01-28 22:52:57 +08:00
Xingyao Wang	72af7bbba2	feat(eval): misc SWE-Bench improvement - use different resources for different instances (#6313) Co-authored-by: openhands <openhands@all-hands.dev>	2025-01-17 02:48:41 +08:00
Xingyao Wang	ec70af9412	refactor: Replace pexpect with libtmux in BashSession (#4881) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Robert Brennan <accounts@rbren.io>	2025-01-04 05:22:13 +08:00
Xingyao Wang	61ebec9ff7	feat(eval): better visualization for comparing two swe-bench runs (#5993)	2025-01-03 02:36:51 +00:00
OpenHands	8975fcd714	Fix issue #5748: Rename "Ran a Jupyter Command" to "Ran a Python Command" in UI (#5749) Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-12-26 23:30:19 +08:00
OpenHands	bfb191b5c7	Fix issue #5739: [Bug]: Move ./evaluation/swe_bench/scripts/cleanup_remote_runtime.sh to general eval utils (#5740)	2024-12-25 17:17:06 -05:00
Xingyao Wang	c333938384	feat(eval): add standard error to swebench summarize outputs (#5700) Co-authored-by: openhands <openhands@all-hands.dev>	2024-12-20 08:39:43 +08:00
Xingyao Wang	9cdb8d06c0	fix(eval): Use cp -r instead of mv for SWE-Bench Initialization (#5659)	2024-12-17 21:21:27 +00:00
Ryan H. Tran	8ae2fb636e	Remove symlink use for swebench setup (#5549)	2024-12-13 22:18:14 +08:00
Engel Nyst	b11e905988	Verify costs script (#5469)	2024-12-10 14:20:53 +01:00
Engel Nyst	455e667739	add cost to summary (#5473)	2024-12-10 03:14:03 +08:00
Xingyao Wang	9908e1b285	[Evaluation]: Log openhands version in eval output folder, instead of agent version (#5394)	2024-12-04 03:33:43 +00:00
Xingyao Wang	990f277132	misc: Support folder-level exp analysis for SWE-Bench `summarize_outputs.py`; Handle CrashLoopBackoff for RemoteRuntime (#5385)	2024-12-03 15:37:21 +00:00
OpenHands	678436da30	Fix issue #5222: [Refactor]: Refactor the evaluation directory (#5223) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-11-25 08:35:52 -05:00

15 Commits