Jiaxin Pei 60e043ed8b
Creating the evaluation folder and adding initial analysis of Devin's evaluation outputs (#50)
* Creating the folder for experiments, adding initial analysis of Devin's outputs on SWE-bench

* typo fixed

* update devin's evaluation outputs analysis

* remove data files from commit

* add logistics for managing the experiment folder in README

* Change folder naming and update logistics.

---------

Co-authored-by: Bowen Li <libowen.ne@gmail.com>
2024-03-20 19:59:51 +08:00

Evaluation

This folder contains code and resources to run experiments and evaluations.

Logistics

To keep the evaluation folder organized, please follow the rules below:

  • Each subfolder contains a specific benchmark or experiment. For example, evaluation/SWE-bench should contain all the preprocessing, evaluation, and analysis scripts for SWE-bench.
  • Raw data and experimental records should not be stored in this repo; host them externally instead (e.g., on Google Drive or Hugging Face Datasets).
  • Important data files of manageable size and analysis scripts (e.g., Jupyter notebooks) can be committed directly to this repo.
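
One way to check the size rule before committing is a small script that lists files above a limit. The sketch below is only an illustration and is not part of this repo: the function name find_oversized_files and the 5 MiB limit are assumptions, since "manageable size" is not defined here.

```python
from pathlib import Path

# Hypothetical limit: the README does not define "manageable size".
MAX_BYTES = 5 * 1024 * 1024


def find_oversized_files(root, max_bytes=MAX_BYTES):
    """Return (relative path, size in bytes) for each file under root larger than max_bytes."""
    root = Path(root)
    if not root.is_dir():
        # Nothing to scan if the folder does not exist.
        return []
    oversized = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > max_bytes:
                oversized.append((path.relative_to(root).as_posix(), size))
    return oversized


if __name__ == "__main__":
    # Scan the evaluation folder relative to the current working directory.
    for rel_path, size in find_oversized_files("evaluation"):
        print(f"{rel_path}: {size / 1024:.0f} KiB is over the limit; consider hosting it externally")
```

Run from the repo root, this prints any file under evaluation that is larger than the chosen limit, which can then be moved to external storage instead of being committed.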

Tasks

SWE-bench

  • analysis
    • devin_eval_analysis.ipynb: notebook analyzing Devin's evaluation outputs on SWE-bench