[Evaluation] Use the latest official SWE-Bench Dockerization for evaluation (#2728)

* add newline after patch to fix patch apply

* new swebench wip

* add newline after patch to fix patch apply

* only add newline if not empty

* update swebench source and update

* update gitignore for swebench eval

* update old prep_eval

* update gitignore

* add scripts for push and pull swebench images

* update eval_infer.sh

* update eval_infer for new docker workflow

* update script to create markdown report based on report.json

* update eval infer to use update output

* update readme

* only move result to folder if running whole file

* remove set-x

* update conversion script

* Update evaluation/swe_bench/README.md

* Update evaluation/swe_bench/README.md

* Update evaluation/swe_bench/README.md

* make sure the last line ends with a newline

* switch to a fix-attempt branch of swebench

* Update evaluation/swe_bench/README.md

* Update evaluation/swe_bench/README.md
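
The patch-newline items above (add a newline after the patch, but only if the patch is not empty, so the last line ends with a newline) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the function name `ensure_trailing_newline` is hypothetical, and the motivation assumed here is that patch-applying tools can reject a diff whose final line is not newline-terminated.

```python
def ensure_trailing_newline(patch: str) -> str:
    """Return the patch text terminated by a newline, leaving empty patches empty.

    Hypothetical helper for illustration only; it is not taken from the
    repository this commit belongs to.
    """
    # An empty patch means "no change"; appending a newline would turn it
    # into a non-empty (and meaningless) patch, so it is returned untouched.
    if patch and not patch.endswith("\n"):
        return patch + "\n"
    return patch


if __name__ == "__main__":
    # A one-line fragment without a terminating newline gets one appended.
    print(repr(ensure_trailing_newline("+new line")))
    # Already-terminated and empty inputs pass through unchanged.
    print(repr(ensure_trailing_newline("+new line\n")))
    print(repr(ensure_trailing_newline("")))
```

Checking for the empty string first keeps "no patch produced" distinguishable from "a patch containing only a blank line" in later steps.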

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Xingyao Wang
2024-07-02 07:58:30 +08:00
committed by GitHub
parent 6246cb8e67
commit 6a0ffc5c61
11 changed files with 803 additions and 307 deletions

6
.gitignore vendored

@@ -212,4 +212,8 @@ cache
config.toml
config.toml.bak
containers/agnostic_sandbox
containers/agnostic_sandbox
# swe-bench-eval
image_build_logs
run_instance_logs