Add a roadmap for eval (#92)

libowen2121 2024-03-22 20:27:30 +08:00 committed by GitHub
parent 2d5c8f1060
commit 40a3614e80


@ -9,6 +9,14 @@ all the preprocessing/evaluation/analysis scripts.
- Raw data and experimental records should not be stored in this repo; keep them in external storage instead (e.g., Google Drive or Hugging Face Datasets).
- Important data files of manageable size and analysis scripts (e.g., Jupyter notebooks) can be uploaded directly to this repo.
## Roadmap
- Sanity check: reproduce Devin's SWE-bench scores from its released outputs to verify that our evaluation harness pipeline works.
- Open-source model support:
  - Contributors are encouraged to submit their commits to our [forked SWE-bench repo](https://github.com/OpenDevin/SWE-bench).
  - Ensure compatibility with the OpenAI interface for inference.
  - Serve open-source models, prioritizing high concurrency and throughput.
## Tasks
### SWE-bench
- notebooks