Creating the evaluation folder and adding initial analysis of Devin's evaluation outputs (#50)

* Creating the folder for experiments, adding initial analysis of Devin's outputs on SWE-bench

* typo fixed

* update devin's evaluation outputs analysis

* remove data files from commit

* add logistics for managing the experiment folder in README

* Change folder naming and update logistics.

---------

Co-authored-by: Bowen Li <libowen.ne@gmail.com>
Jiaxin Pei 2024-03-20 07:59:51 -04:00 committed by GitHub
parent 1eade7d188
commit 60e043ed8b
2 changed files with 280 additions and 0 deletions

evaluation/README.md Normal file

@@ -0,0 +1,15 @@
# Evaluation
This folder contains code and resources to run experiments and evaluations.
## Logistics
To better organize the evaluation folder, we should follow the rules below:
- Each subfolder contains a specific benchmark or experiment. For example, `evaluation/SWE-bench` should contain
all the preprocessing/evaluation/analysis scripts.
- Raw data and experimental records should not be stored within this repo; host them externally instead (e.g., Google Drive or Hugging Face Datasets).
- Important data files of manageable size and analysis scripts (e.g., Jupyter notebooks) can be directly uploaded to this repo.
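The size rule above could be checked with a small script before committing. The following is a minimal sketch, not part of this repo: the function name `find_oversized_files` and the 5 MB threshold are assumptions made for illustration, since the rules only say "manageable size".

```python
from pathlib import Path

# Hypothetical threshold: the rules above only say "manageable size".
MAX_BYTES = 5 * 1024 * 1024


def find_oversized_files(root, max_bytes=MAX_BYTES):
    """Return (relative_path, size_in_bytes) for files under root larger than max_bytes."""
    root = Path(root)
    oversized = []
    # Walk every entry below root in a stable (sorted) order.
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > max_bytes:
                oversized.append((path.relative_to(root).as_posix(), size))
    return oversized


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for rel_path, size in find_oversized_files("evaluation"):
        print(f"{rel_path}: {size / 1e6:.1f} MB, consider hosting it externally")
```

Files reported by the script are candidates for external hosting; the threshold can be adjusted to whatever the maintainers consider manageable.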
## Tasks
### SWE-bench
- analysis
  - devin_eval_analysis.ipynb: notebook analyzing Devin's outputs

File diff suppressed because one or more lines are too long