github/OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-26 05:48:36 +08:00


Binyuan Hui f99f4ebdaa

fix: typo in the evaluation folder name. (#66 )

2024-03-20 23:00:09 +08:00

Evaluation

This folder contains code and resources to run experiments and evaluations.

Logistics

To better organize the evaluation folder, we should follow the rules below:

- Each subfolder contains a specific benchmark or experiment. For example, evaluation/SWE-bench should contain all of its preprocessing, evaluation, and analysis scripts.
- Raw data and experimental records should not be stored in this repo. Keep them in external storage instead (e.g., Google Drive or Hugging Face Datasets).
- Important data files of manageable size and analysis scripts (e.g., Jupyter notebooks) can be uploaded directly to this repo.
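The rules above leave "manageable size" undefined. One hypothetical way to check a benchmark folder before committing is the Python sketch below, which lists files larger than a cutoff so they can be moved to external storage. The `find_oversized_files` helper and the 10 MiB threshold are assumptions for illustration. They are not part of this repo or its stated policy.

```python
from pathlib import Path

# Hypothetical cutoff: the README only says "manageable size" and gives no number.
MAX_BYTES = 10 * 1024 * 1024  # 10 MiB, an assumed threshold


def find_oversized_files(root: str, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return sorted POSIX-style paths (relative to root) of files larger than max_bytes."""
    root_path = Path(root)
    oversized = []
    # rglob("*") walks every file and directory below root recursively.
    for path in root_path.rglob("*"):
        if path.is_file() and path.stat().st_size > max_bytes:
            oversized.append(path.relative_to(root_path).as_posix())
    return sorted(oversized)
```

For example, running `find_oversized_files("evaluation")` from the repository root returns the candidates to move out of the repo. Tune `max_bytes` to whatever the maintainers consider manageable.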

Tasks

SWE-bench

analysis
- devin_eval_analysis.ipynb: Jupyter notebook analyzing Devin's outputs