[Evaluation] Use the latest official SWE-Bench Dockerization for evaluation (#2728)

mirror of https://github.com/OpenHands/OpenHands.git synced 2026-03-22 05:37:20 +08:00

* add newline after patch to fix patch apply

* new swebench wip

* add newline after patch to fix patch apply

* only add newline if not empty

* update swebench source and update

* update gitignore for swebench eval

* update old prep_eval

* update gitignore

* add scripts for push and pull swebench images

* update eval_infer.sh

* update eval_infer for new docker workflow

* update script to create markdown report based on report.json

* update eval infer to use update output

* update readme

* only move result to folder if running whole file

* remove set-x

* update conversion script

* Update evaluation/swe_bench/README.md

* Update evaluation/swe_bench/README.md

* Update evaluation/swe_bench/README.md

* make sure last line ends with a newline

* switch to a fix-attempt branch of swebench

* Update evaluation/swe_bench/README.md

* Update evaluation/swe_bench/README.md

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
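
Several of the messages above ("add newline after patch to fix patch apply", "only add newline if not empty", "make sure last line ends with newline") describe the same small normalization step: tools such as `git apply` can reject a patch whose final line is not newline-terminated, while an empty patch should stay empty. A minimal sketch of that idea follows. The function name `ensure_trailing_newline` is hypothetical and is not taken from the commit; the actual implementation in the repository may differ.

```python
def ensure_trailing_newline(patch: str) -> str:
    """Return the patch text terminated by a newline.

    Hypothetical helper for illustration only (not the commit's code).
    An empty patch is returned unchanged, so "no changes" stays empty.
    """
    if patch and not patch.endswith("\n"):
        return patch + "\n"
    return patch


if __name__ == "__main__":
    # A model-generated patch often arrives without a final newline.
    raw_patch = "--- a/foo.py\n+++ b/foo.py\n@@ -1 +1 @@\n-x = 1\n+x = 2"
    fixed = ensure_trailing_newline(raw_patch)
    print(repr(fixed[-6:]))
    print(repr(ensure_trailing_newline("")))
```

Checking for emptiness first keeps an empty prediction distinguishable from a patch that consists of a single newline.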

Xingyao Wang, 2024-07-02 07:58:30 +08:00, committed by GitHub

parent 6246cb8e67

commit 6a0ffc5c61

11 changed files with 803 additions and 307 deletions

6

.gitignore vendored


@@ -212,4 +212,8 @@ cache
 config.toml
 config.toml.bak
 containers/agnostic_sandbox
 containers/agnostic_sandbox
 # swe-bench-eval
 image_build_logs
 run_instance_logs
