Mirror of https://github.com/OpenHands/OpenHands.git (synced 2025-12-26 05:48:36 +08:00)
Creating the evaluation folder and adding initial analysis of Devin's evaluation outputs (#50)
* Creating the folder for experiments, adding initial analysis of Devin's outputs on SWE-bench
* typo fixed
* update devin's evaluation outputs analysis
* remove data files from commit
* add logistics for managing the experiment folder in README
* Change folder naming and update logistics.

---------

Co-authored-by: Bowen Li <libowen.ne@gmail.com>
Parent: 1eade7d188
Commit: 60e043ed8b
evaluation/README.md (normal file, 15 lines added)
@ -0,0 +1,15 @@
# Evaluation

This folder contains code and resources to run experiments and evaluations.
## Logistics

To keep the evaluation folder organized, follow these rules:

- Each subfolder contains a specific benchmark or experiment. For example, `evaluation/SWE-bench` should contain all the preprocessing, evaluation, and analysis scripts for SWE-bench.
- Raw data and experimental records should not be stored in this repo; keep them in external storage instead (e.g., Google Drive or Hugging Face Datasets).
- Important data files of manageable size and analysis scripts (e.g., Jupyter notebooks) can be uploaded directly to this repo.
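To illustrate the rule on raw data, here is a minimal Python sketch that flags files under `evaluation/` larger than a size threshold, i.e. candidates for moving to external storage. It is not part of the repository: the helper `find_oversized_files` and the 10 MB default threshold are hypothetical assumptions, since the README only says "manageable size" without giving a number.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical threshold: the README only says "manageable size", so 10 MB is an assumption.
MAX_BYTES = 10 * 1024 * 1024


def find_oversized_files(root: str | Path, max_bytes: int = MAX_BYTES) -> list[Path]:
    """Return files under `root` larger than `max_bytes`, sorted by path.

    Such files are candidates for moving out of the repo
    (e.g., to Google Drive or Hugging Face Datasets).
    """
    root = Path(root)
    # rglob("*") walks every file and directory below root, recursively.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_size > max_bytes
    )


if __name__ == "__main__":
    evaluation_dir = Path("evaluation")
    # Only scan when run from the repository root, where evaluation/ exists.
    if evaluation_dir.is_dir():
        for path in find_oversized_files(evaluation_dir):
            print(f"consider moving out of the repo: {path}")
```

A size check like this could run as a pre-commit hook; the threshold is a policy choice the team would need to agree on.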
## Tasks

### SWE-bench

- analysis
  - `devin_eval_analysis.ipynb` (in `notebooks/`): notebook analyzing Devin's outputs on SWE-bench
evaluation/SWE-bench/notebooks/devin_eval_analysis.ipynb (normal file, 265 lines added)
File diff suppressed because one or more lines are too long