OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2026-03-22 13:47:19 +08:00

Author	SHA1	Message	Date
xhguo7	9388fef0ef	feat(eval): loc acc evaluation (#8515) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: mamoodi <mamoodiha@gmail.com>	2025-07-11 03:22:35 +08:00
Xingyao Wang	cff5697456	eval: remove gemini-specific swebench template (#9623)	2025-07-08 18:34:23 +00:00
Ryan H. Tran	dfa54673d2	[OH-Versa] Add remaining browsing & GAIA eval improvement (#9015) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-06-25 12:36:15 +07:00
Maxim Evtush	653a8a7ce2	Refactor: Improve Consistency in Function Signatures and Regex Usage in compute_ism_pm_score.py (#9145)	2025-06-18 04:22:16 +08:00
Ryan H. Tran	ddaa186971	[GAIA] Add prompt improvement to alleviate solution parsing issue & support Tavily search tools (#9057)	2025-06-17 13:16:50 +07:00
better629	432d8829dc	disable mcp in run_localize and install oh-aci[llama] for issue 9150 (#9151)	2025-06-16 11:03:17 +00:00
FT	e5bff91e8e	Fix Typo: Change "accurancy" to "accuracy" in Evaluation Benchmark Comments (#9139)	2025-06-15 12:48:26 +00:00
Linghao Zhang	a93b0457c6	feat(eval): Support evaluation on SWE-bench-Live (#9137)	2025-06-15 12:30:47 +00:00
kilavvy	4e99aabcb2	Minor Code Comment Corrections and Clarifications (#9129)	2025-06-14 18:57:14 +00:00
Graham Neubig	0c307ea12e	Lint all files in the repo (#9131) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-06-14 16:25:59 +00:00
ASTONE	be62ba6b35	add_versicode (#8221)	2025-06-14 13:17:18 +00:00
leopardracer	13c298d35f	Minor Typo Fixes in Comments and Documentation (#9058)	2025-06-14 12:51:38 +00:00
Engel Nyst	fd3b4ac8e6	Refactor SWE-bench instruction (#8010)	2025-06-13 23:27:52 +02:00
Leander Maben	d84befe28f	Adding LLM Based Editing capability (#8677) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Engel Nyst <engel.nyst@gmail.com>	2025-06-09 21:57:20 +08:00
Sergey	49939c1f02	Fix typo in evaluation README.md (#8987)	2025-06-08 14:14:07 +00:00
llamantino	880c05ed94	Fix all broken docs links across the project (#8830) Co-authored-by: llamantino <12345678+yourusername@users.noreply.github.com>	2025-05-31 21:24:59 -04:00
Robert Brennan	205f0234e8	Rename Conversation to ServerConversation and AppConfig to OpenHandsConfig (#8754) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-28 21:48:34 +02:00
Xuhui Zhou	14498c5e25	Feature/swe run interact (#8714) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-05-27 19:35:21 +00:00
Zhaoling Chen	efe287ce34	integrate LocAgent into OpenHands (#7371) Co-authored-by: czlll <gangda@huaihe.usc.edu> Co-authored-by: Hoang Tran <descience.thh10@gmail.com>	2025-05-23 22:42:58 +07:00
Ryan H. Tran	3980ba53c9	Add option to run patch evaluation on Modal (#8607) Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2025-05-23 00:45:45 +07:00
Engel Nyst	637cb0726a	specify condenser config for evals (#8177) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-21 22:08:57 +02:00
luolin101	1a3cb16ba6	add Visual SWE-bench benchmark (#7131) Co-authored-by: tsukimi <yuailun@pku.edu.cn> Co-authored-by: Ryan H. Tran <descience.thh10@gmail.com>	2025-05-19 12:08:46 +07:00
Xingyao Wang	2ecc39ffcc	[eval]: disable MCP for SWE-Bench evaluation (#8574) Co-authored-by: openhands <openhands@all-hands.dev> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Engel Nyst <engel.nyst@gmail.com>	2025-05-19 01:32:46 +00:00
Yueqi Song	3ca585b79f	Update run_infer.py to incorporate selection of task based on repo (#8509)	2025-05-15 12:27:28 +08:00
omahs	4bb6ec2ee5	Fix typos (#8469)	2025-05-13 09:34:21 +00:00
Graham Neubig	f317c03b1b	Fix inconsistent max_iterations in SWE-bench evaluation (#8467) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-13 02:07:57 +00:00
Graham Neubig	689d3c9046	Update pre-commit hook versions to most recent versions (#8343) Co-authored-by: openhands <openhands@all-hands.dev>	2025-05-08 03:59:13 +00:00
Engel Nyst	985e20d529	[chore] Run full agent pre-commit (#8235)	2025-05-03 11:24:03 -04:00
Qi Liu	3d22520992	[Feat] add multi-swe-bench (#8174) Co-authored-by: ByteDance User <tiger@bytedance.localdomain>	2025-05-01 00:23:19 +00:00
Michael Panchenko	14564b25d6	Fix linting (#7965)	2025-04-21 06:34:40 +08:00
Engel Nyst	a2c55cfdef	Refactor to clean up and move utility/legacy out of the agent (#7917)	2025-04-19 01:53:33 +08:00
Xingyao Wang	7c23993344	fix(eval): typo in SWE_Bench evaluation (#7930)	2025-04-19 00:31:08 +08:00
Engel Nyst	9b9b1291fc	[chore] Just linting on swe-bench files (#7918)	2025-04-18 22:12:01 +08:00
Niels Mündler	4b124d5906	Add inference for SWT-Bench (#7201) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Calvin Smith <email@cjsmith.io>	2025-04-17 14:49:42 -06:00
juanmichelini	6bcebd4b9d	Jetbrains CI Benchmark (#7811) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-04-17 15:10:20 +00:00
Engel Nyst	5e5bf23f9c	[Evaluation] Fix KeyError when the instance failed prematurely (#7864)	2025-04-15 15:19:31 +00:00
Engel Nyst	d05a6f30e1	[Refactor] Rename codeact_* agent options to simple name (#7853)	2025-04-15 00:14:13 +02:00
sp.wack	72b5e18898	fix(backend): Return 400 if trying to open a binary file (#7825)	2025-04-11 22:47:57 +00:00
Engel Nyst	bb98d94b35	[evaluation] fix missing metadata (#7819)	2025-04-11 16:58:59 +00:00
juanmichelini	53c0c5a07b	SWE-bench_verified instruction baseline improvements to 60% (#7546) Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>	2025-04-10 16:08:27 +00:00
Xingyao Wang	0087082643	Improve binary file handling and patch generation in SWE-bench evaluation (#7762) Co-authored-by: openhands <openhands@all-hands.dev>	2025-04-08 22:57:33 +00:00
Xingyao Wang	ddda30d9b7	fix(eval): iterative evaluation improvements; SWE-Bench multimodal fixes (#7739) Co-authored-by: Juan Michelini <juan@juan.com.uy> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: openhands <openhands@all-hands.dev>	2025-04-09 02:44:03 +08:00
Engel Nyst	22cf5144cc	Fix integration test (#7747)	2025-04-07 22:31:50 -04:00
Boxuan Li	d7c49a0656	[Evaluation] Fix sandbox config in TAC (#7684)	2025-04-03 08:19:10 +00:00
Boxuan Li	34bf6a6402	[Evaluation] Fix run_infer.py path in TAC (#7683)	2025-04-03 04:34:02 +00:00
Shixian Sheng	4fb073d1ea	Fixed a few hyperlinks. Translated some texts (#7652)	2025-04-02 22:10:19 +00:00
Rohit Malhotra	9adfcede31	(Hotfix): Track reason for Error AgentState (#7584) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-31 21:24:42 +00:00
Xingyao Wang	648c8ffb21	(llm): Support OpenHands LM (#7598) Co-authored-by: mamoodi <mamoodiha@gmail.com>	2025-03-31 17:29:31 +00:00
Xingyao Wang	54236f9617	[eval] Support SWE-Bench Multimodal (#7122) Co-authored-by: openhands <openhands@all-hands.dev>	2025-03-31 07:42:44 -04:00
tofarr	1230b229b5	Replace use of requests with httpx (#7354)	2025-03-26 13:37:10 +00:00

373 Commits