Zacharias Fisches
818f743dc7
Bugfix: respect config.toml system_prompt_filename when running swe-bench ( #11091 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2025-10-27 21:55:05 +00:00
Robert Brennan
b5e00f577c
Replace All-Hands-AI references with OpenHands ( #11287 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <engel.nyst@gmail.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-10-26 01:52:45 +02:00
Tim O'Farrell
4b303ec9b4
Fixes to unblock frontend ( #11488 )
...
Co-authored-by: Ray Myers <ray.myers@gmail.com>
2025-10-23 14:43:45 -06:00
Kevin Musgrave
19bae5ac0f
feat(evaluation): Add placeholders to swe_gpt4.j2 ( #11228 )
...
Co-authored-by: Graham Neubig <neubig@gmail.com>
2025-10-13 22:15:05 +08:00
Ryan H. Tran
df9320f8ab
Implement model routing support ( #9738 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-09-08 16:19:34 +07:00
Zacharias Fisches
20e5c40969
Fix swe-bench run_infer.py config parsing from config.toml ( #10792 )
2025-09-04 20:10:08 +08:00
Xingyao Wang
b082ccc0fb
feat(llm): add support for deepseek and gpt-5-mini, util for token count ( #10626 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-27 11:03:35 +08:00
Xingyao Wang
4507a25b85
Evaluation: redirect sessions to repo-local .eval_sessions via helper; apply across entrypoints; add tests ( #10540 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-22 13:34:02 +00:00
Engel Nyst
91d3d1d20a
Fix: expose aggregated LLM metrics in State for evaluation scripts ( #10537 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-21 17:43:09 +02:00
Kevin Musgrave
74ba21bad0
feat(evaluation): Added INSTRUCTION_TEMPLATE_NAME to run_infer.py in swe_bench ( #10270 )
...
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: mamoodi <mamoodiha@gmail.com>
2025-08-18 14:18:08 +00:00
Xingyao Wang
c2f46200c0
chore(lint): Apply comprehensive linting and formatting fixes ( #10287 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-13 21:13:19 +02:00
Ibragim Badertdinov
19a6b6b618
feat(eval): Support evaluation on SWE-rebench ( #10251 )
2025-08-12 14:05:43 +00:00
Insop
1d0d88d491
Readability improvement & remove duplicated and unused prompts ( #10241 )
2025-08-12 12:42:17 +08:00
Xingyao Wang
04ff4a025b
feat(cli): Use CLI to launch OpenHands UI server via Docker ( #9783 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-09 02:04:07 +08:00
Xingyao Wang
c4f303a07b
chore(eval): Remove eval_infer_remote.sh script and related references ( #10157 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-07 20:46:59 +00:00
Boxuan Li
7af35ab827
Evaluation: disable browser when NOT run_with_browsing ( #9837 )
2025-07-22 01:45:52 +00:00
juanmichelini
ea50fe4e3c
Fix: Continue evaluation when an instance fails after max retries ( #8868 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Xingyao Wang <xingyaoww@gmail.com>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-07-16 22:42:44 +00:00
xhguo7
9388fef0ef
feat(eval): loc acc evaluation ( #8515 )
...
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: mamoodi <mamoodiha@gmail.com>
2025-07-11 03:22:35 +08:00
Xingyao Wang
cff5697456
eval: remove gemini-specific swebench template ( #9623 )
2025-07-08 18:34:23 +00:00
better629
432d8829dc
disable mcp in run_localize and install oh-aci[llama] for issue 9150 ( #9151 )
2025-06-16 11:03:17 +00:00
Linghao Zhang
a93b0457c6
feat(eval): Support evaluation on SWE-bench-Live ( #9137 )
2025-06-15 12:30:47 +00:00
Graham Neubig
0c307ea12e
Lint all files in the repo ( #9131 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-06-14 16:25:59 +00:00
Engel Nyst
fd3b4ac8e6
Refactor SWE-bench instruction ( #8010 )
2025-06-13 23:27:52 +02:00
Leander Maben
d84befe28f
Adding LLM Based Editing capability ( #8677 )
...
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Engel Nyst <engel.nyst@gmail.com>
2025-06-09 21:57:20 +08:00
Robert Brennan
205f0234e8
Rename Conversation to ServerConversation and AppConfig to OpenHandsConfig ( #8754 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-28 21:48:34 +02:00
Xuhui Zhou
14498c5e25
Feature/swe run interact ( #8714 )
...
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-05-27 19:35:21 +00:00
Zhaoling Chen
efe287ce34
integrate LocAgent into OpenHands ( #7371 )
...
Co-authored-by: czlll <gangda@huaihe.usc.edu>
Co-authored-by: Hoang Tran <descience.thh10@gmail.com>
2025-05-23 22:42:58 +07:00
Ryan H. Tran
3980ba53c9
Add option to run patch evaluation on Modal ( #8607 )
...
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2025-05-23 00:45:45 +07:00
Engel Nyst
637cb0726a
specify condenser config for evals ( #8177 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-21 22:08:57 +02:00
Xingyao Wang
2ecc39ffcc
[eval]: disable MCP for SWE-Bench evaluation ( #8574 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Engel Nyst <engel.nyst@gmail.com>
2025-05-19 01:32:46 +00:00
Yueqi Song
3ca585b79f
Update run_infer.py to incorporate selection of task based on repo ( #8509 )
2025-05-15 12:27:28 +08:00
Graham Neubig
f317c03b1b
Fix inconsistent max_iterations in SWE-bench evaluation ( #8467 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-13 02:07:57 +00:00
Graham Neubig
689d3c9046
Update pre-commit hook versions to most recent versions ( #8343 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-05-08 03:59:13 +00:00
Michael Panchenko
14564b25d6
Fix linting ( #7965 )
2025-04-21 06:34:40 +08:00
Engel Nyst
a2c55cfdef
Refactor to clean up and move utility/legacy out of the agent ( #7917 )
2025-04-19 01:53:33 +08:00
Xingyao Wang
7c23993344
fix(eval): typo in SWE_Bench evaluation ( #7930 )
2025-04-19 00:31:08 +08:00
Engel Nyst
9b9b1291fc
[chore] Just linting on swe-bench files ( #7918 )
2025-04-18 22:12:01 +08:00
Niels Mündler
4b124d5906
Add inference for SWT-Bench ( #7201 )
...
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Calvin Smith <email@cjsmith.io>
2025-04-17 14:49:42 -06:00
Engel Nyst
5e5bf23f9c
[Evaluation] Fix KeyError when the instance failed prematurely ( #7864 )
2025-04-15 15:19:31 +00:00
Engel Nyst
d05a6f30e1
[Refactor] Rename codeact_* agent options to simple name ( #7853 )
2025-04-15 00:14:13 +02:00
sp.wack
72b5e18898
fix(backend): Return 400 if trying to open a binary file ( #7825 )
2025-04-11 22:47:57 +00:00
Engel Nyst
bb98d94b35
[evaluation] fix missing metadata ( #7819 )
2025-04-11 16:58:59 +00:00
juanmichelini
53c0c5a07b
SWE-bench_verified instruction baseline improvements to 60% ( #7546 )
...
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2025-04-10 16:08:27 +00:00
Xingyao Wang
0087082643
Improve binary file handling and patch generation in SWE-bench evaluation ( #7762 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-04-08 22:57:33 +00:00
Xingyao Wang
ddda30d9b7
fix(eval): iterative evaluation improvements; SWE-Bench multimodal fixes ( #7739 )
...
Co-authored-by: Juan Michelini <juan@juan.com.uy>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: openhands <openhands@all-hands.dev>
2025-04-09 02:44:03 +08:00
Shixian Sheng
4fb073d1ea
Fixed a few hyperlinks. Translated some texts ( #7652 )
2025-04-02 22:10:19 +00:00
Xingyao Wang
648c8ffb21
(llm): Support OpenHands LM ( #7598 )
...
Co-authored-by: mamoodi <mamoodiha@gmail.com>
2025-03-31 17:29:31 +00:00
Xingyao Wang
54236f9617
[eval] Support SWE-Bench Multimodal ( #7122 )
...
Co-authored-by: openhands <openhands@all-hands.dev>
2025-03-31 07:42:44 -04:00
Xingyao Wang
9b9e728cf6
Iterative evaluation with rule-based critic ( #7293 )
2025-03-17 18:37:35 +00:00
Xingyao Wang
a4d632498c
SWE-Gym rollout stability fix & using a validated SWE-Gym set ( #7182 )
...
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2025-03-17 21:15:01 +08:00