mirror of https://github.com/OpenHands/OpenHands.git synced 2025-12-26 05:48:36 +08:00


Turn off auto linting by default, and on for swe_bench (#1861)

Disable Python linting by default, and turn it on for SWE-Bench.

It is turned off by default because the behavior is surprising and somewhat annoying to end users.
It is turned on for SWE-Bench because linting Python files gives the LLM a chance to fix indentation errors.

2024-05-18 04:04:38 +00:00

README.md

Evaluation

This folder contains code and resources to run experiments and evaluations.

Logistics

To better organize the evaluation folder, we should follow the rules below:

- Each subfolder contains a specific benchmark or experiment. For example, evaluation/swe_bench should contain all the preprocessing, evaluation, and analysis scripts for SWE-Bench.
- Raw data and experimental records should not be stored in this repo.
  - Model outputs should be stored at this Hugging Face space for visualization.
- Important data files of manageable size and analysis scripts (e.g., Jupyter notebooks) can be uploaded directly to this repo.
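One way to check the "manageable size" rule before committing is to scan the evaluation folder for large files. The sketch below is only an illustration: the 5 MiB threshold and the function name find_oversized_files are assumptions, not part of this repo, since the rules above do not define a concrete limit.

```python
from pathlib import Path

# Hypothetical threshold: the rules only say "manageable size",
# so 5 MiB is an assumption chosen for illustration.
MAX_BYTES = 5 * 1024 * 1024


def find_oversized_files(root, max_bytes=MAX_BYTES):
    """Return (relative path, size in bytes) for files under root larger than max_bytes."""
    root = Path(root)
    oversized = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > max_bytes:
                oversized.append((path.relative_to(root).as_posix(), size))
    return oversized


if __name__ == "__main__":
    # Run from the repository root; prints nothing if every file is small enough.
    for rel_path, size in find_oversized_files("evaluation"):
        print(f"{rel_path}: {size / 1024 / 1024:.1f} MiB")
```

Files reported by such a check are candidates for external storage rather than this repo.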

Supported Benchmarks

SWE-Bench: evaluation/swe_bench

Result Visualization

Check this Hugging Face space for visualizations of existing experimental results.

Upload your results

You can fork our Hugging Face evaluation outputs and submit your evaluation results to our hosted Hugging Face repo via a PR, following the guide here.