Robert Brennan
b5e00f577c
Replace All-Hands-AI references with OpenHands (#11287)
...
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <engel.nyst@gmail.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-10-26 01:52:45 +02:00
Xingyao Wang
4507a25b85
Evaluation: redirect sessions to repo-local .eval_sessions via helper; apply across entrypoints; add tests (#10540)
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-22 13:34:02 +00:00
Xingyao Wang
c2f46200c0
chore(lint): Apply comprehensive linting and formatting fixes (#10287)
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-13 21:13:19 +02:00
Xingyao Wang
04ff4a025b
feat(cli): Use CLI to launch OpenHands UI server via Docker (#9783)
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-09 02:04:07 +08:00
Robert Brennan
205f0234e8
Rename Conversation to ServerConversation and AppConfig to OpenHandsConfig (#8754)
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-28 21:48:34 +02:00
Graham Neubig
689d3c9046
Update pre-commit hook versions to most recent versions (#8343)
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-08 03:59:13 +00:00
Xingyao Wang
a4d632498c
SWE-Gym rollout stability fix & using a validated SWE-Gym set (#7182)
...
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2025-03-17 21:15:01 +08:00
Calvin Smith
303b7ab180
(fix): Conditional imports resolved in SWE-bench eval script while multiprocessing enabled (#7244)
...
Co-authored-by: Calvin Smith <calvin@all-hands.dev>
2025-03-13 13:29:11 -06:00
Xingyao Wang
9f720a9d69
[eval] SWE-Gym Integration (#6651)
...
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2025-03-05 20:15:02 +00:00
Engel Nyst
660d1d1e64
Fix argument in swe-bench grading scripts (#7046)
2025-03-02 12:37:15 +08:00
Xingyao Wang
33780f97d0
[eval] Upgrade SWE-Bench to use official image and latest harness (#6838)
...
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2025-02-27 08:15:05 -05:00
Xingyao Wang
1a7003a705
Add sysbox support to remote runtime for eval; Add memory monitor, stress tests to help debug memory issue (#6684)
...
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2025-02-18 20:02:28 +00:00
Xingyao Wang
90bbd4edbe
fix: initialize default metadata with all required fields (#6583)
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-02-04 02:52:11 +08:00
Engel Nyst
b9a3f1c753
Fix eval on remote runtime (#6398)
2025-01-21 20:49:30 +00:00
Xingyao Wang
72af7bbba2
feat(eval): misc SWE-Bench improvement - use different resources for different instances (#6313)
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-01-17 02:48:41 +08:00
Xingyao Wang
0bed17758f
fix: incorrect soft-timeout implementation & fix hard-timeout follow-up command (#6280)
2025-01-17 01:27:00 +08:00
Xingyao Wang
ec70af9412
refactor: Replace pexpect with libtmux in BashSession (#4881)
...
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Robert Brennan <accounts@rbren.io>
2025-01-04 05:22:13 +08:00
Robert Brennan
0e4e1b3316
Factor out ActionExecutionClient (#5796)
2024-12-30 15:32:13 +00:00
OpenHands
678436da30
Fix issue #5222: [Refactor]: Refactor the evaluation directory (#5223)
...
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-11-25 08:35:52 -05:00