mirror of
https://github.com/OpenHands/OpenHands.git
synced 2025-12-26 13:52:43 +08:00
* add draft dockerfile for build all * add rsync for build * add all-in-one docker * update prepare scripts * Update swe_env_box.py * Add swe_entry.sh (buggy now) * Parse the test command in swe_entry.sh * Update README for instance eval in sandbox * revert specialized config * replace run_as_devin as an init arg * set container & run_as_root via args * update swe entry script * update env * remove mounting * allow error after swe_entry * update swe_env_box * move file * update gitignore * get swe_env_box a working demo * support faking user response & provide sandox ahead of time; also return state for controller * tweak main to support adding controller kwargs * add module * initialize plugin for provided sandbox * add pip cache to plugin & fix jupyter kernel waiting * better print Observation output * add run infer scripts * update readme * add utility for getting diff patch * use get_diff_patch in infer * update readme * support cost tracking for codeact * add swe agent edit hack * disable color in git diff * fix git diff cmd * fix state return * support limit eval * increase t imeout and export pip cache * add eval limit config * return state when hit turn limit * save log to file; allow agent to give up * run eval with max 50 turns * add outputs to gitignore * save swe_instance & instruction * add uuid to swebench * add streamlit dep * fix save series * fix the issue where session id might be duplicated * allow setting temperature for llm (use 0 for eval) * Get report from agent running log * support evaluating task success right after inference. * remove extra log * comment out prompt for baseline * add visualizer for eval * use plaintext for instruction * reduce timeout for all; only increase timeout for init * reduce timeout for all; only increase timeout for init * ignore sid for swe env * close sandbox in each eval loop * update visualizer instruction * increase max chars * add finish action to history too * show test result in metrics * add sidebars for visualizer * also visualize swe_instance * cleanup browser when agent controller finish runinng * do not mount workspace for swe-eval to avoid accidentally overwrite files * Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files" This reverts commit 8ef77390543e562e6f0a5a9992418014d8b3010c. * Revert "Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files"" This reverts commit 016cfbb9f0475f32bacbad5822996b4eaff24a5e. * run jupyter command via copy to, instead of cp to mount * only print mixin output when failed * change ssh box logging * add visualizer for pass rate * add instance id to sandbox name * only remove container we created * use opendevin logger in main * support multi-processing infer * add back metadata, support keyboard interrupt * remove container with startswith * make pbar behave correctly * update instruction w/ multi-processing * show resolved rate by repo * rename tmp dir name * attempt to fix racing for copy to ssh_box * fix script * bump swe-bench-all version * fix ipython with self-contained commands * add jupyter demo to swe_env_box * make resolved count two column * increase height * do not add glob to url params * analyze obs length * print instance id prior to removal handler * add gold patch in visualizer * fix interactive git by adding a git --no-pager as alias * increase max_char to 10k to cover 98% of swe-bench obs cases * allow parsing note * prompt v2 * add iteration reminder * adjust user response * adjust order * fix return eval * fix typo * add reminder before logging * remove other resolve rate * re adjust to new folder structure * support adding eval note * fix eval note path * make sure first log of each instance is printed * add eval note * fix the display for visualizer * tweak visualizer for better git patch reading * exclude empty patch * add retry mechanism for swe_env_box start * fix ssh timeout issue * add stat field for apply test patch success * add visualization for fine-grained report * attempt to support monologue agent by constraining it to single thread * also log error msg when stopeed * save error as well * override WORKSPACE_MOUNT_PATH and WORKSPACE_BASE for monologue to work in mp * add retry mechanism for sshbox * remove retry for swe env box * try to handle loop state stopped * Add get report scripts * Add script to convert agent output to swe-bench format * Merge fine grained report for visualizer * Update eval readme * Update README.md * Add CodeAct gpt4-1106 output and eval logs on swe-bench-lite * Update the script to get model report * Update get_model_report.sh * Update get_agent_report.sh * Update report merge script * Add agent output conversion script * Update swe_lite_env_setup.sh * Add example swe-bench output files * Update eval readme * Remove redundant scripts * set iteration count down to false by default * fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm (#1666) * fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model execption out of litellm * Review Feedback * Missing None Check * Review feedback and improved error handling --------- Co-authored-by: Robert Brennan <accounts@rbren.io> * fix prepare_swe_util scripts * update builder images * update setup script * remove swe-bench build workflow * update lock * remove experiments since they are moved to hf * remove visualizer (since it is moved to hf repo) * simply jupyter execution via heredoc * update ssh_box * add initial docker readme * add pkg-config as dependency * add script for swe_bench all-in-one docker * add rsync to builder * rename var * update commit * update readme * update lock * support specify timeout for long running tasks * fix path * separate building of all deps and files * support returning states at the end of controller * remove return None * support specify timeout for long running tasks * add timeout for all existing sandbox impl * fix swe_env_box for new codebase * update llm config in config.py * support pass sandbox in * remove force set * update eval script * fix issue of overriding final state * change default eval output to hf demo * change default eval output to hf demo * fix config * only close it when it is NOT external sandbox * add scripts * tweak config * only put in hostory when state has history attr * fix agent controller on the case of run out interaction budget * always assume state is always not none * remove print of final state * catch all exception when cannot compute completion cost * Update README.md * save source into json * fix path * update docker path * return the final state on close * merge AgentState with State * fix integration test * merge AgentState with State * fix integration test * add ChangeAgentStateAction to history in attempt to fix integration * add back set agent state * update tests * update tests * move scripts for setup * update script and readme for infer * do not reset logger when n processes == 1 * update eval_infer scripts and readme * simplify readme * copy over dir after eval * copy over dir after eval * directly return get state * update lock * fix output saving of infer * replace print with logger * update eval_infer script * add back the missing .close * increase timeout * copy all swe_bench_format file * attempt to fix output parsing * log git commit id as metadata * fix eval script * update lock * update unit tests * fix argparser unit test * fix lock * the deps are now lightweight enough to be incude in make build * add spaces for tests * add eval outputs to gitignore * remove git submodule * readme * tweak git email * update upload instruction * bump codeact version for eval --------- Co-authored-by: Bowen Li <libowen.ne@gmail.com> Co-authored-by: huybery <huybery@gmail.com> Co-authored-by: Bart Shappee <bshappee@gmail.com> Co-authored-by: Robert Brennan <accounts@rbren.io>
210 lines
3.6 KiB
Plaintext
210 lines
3.6 KiB
Plaintext
# Byte-compiled / optimized / DLL files
|
|
__pycache__/
|
|
*.py[cod]
|
|
*$py.class
|
|
|
|
# C extensions
|
|
*.so
|
|
|
|
# Distribution / packaging
|
|
.Python
|
|
build/
|
|
develop-eggs/
|
|
dist/
|
|
downloads/
|
|
eggs/
|
|
.eggs/
|
|
./lib/
|
|
lib64/
|
|
parts/
|
|
sdist/
|
|
var/
|
|
wheels/
|
|
share/python-wheels/
|
|
*.egg-info/
|
|
.installed.cfg
|
|
*.egg
|
|
MANIFEST
|
|
requirements.txt
|
|
|
|
# PyInstaller
|
|
# Usually these files are written by a python script from a template
|
|
# before PyInstaller builds the exe, so as to inject date/other infos into it.
|
|
*.manifest
|
|
*.spec
|
|
|
|
# Installer logs
|
|
pip-log.txt
|
|
pip-delete-this-directory.txt
|
|
|
|
# Unit test / coverage reports
|
|
htmlcov/
|
|
.tox/
|
|
.nox/
|
|
.coverage
|
|
.coverage.*
|
|
.cache
|
|
nosetests.xml
|
|
coverage.xml
|
|
*.cover
|
|
*.py,cover
|
|
.hypothesis/
|
|
.pytest_cache/
|
|
cover/
|
|
|
|
# Translations
|
|
*.mo
|
|
*.pot
|
|
|
|
# Django stuff:
|
|
local_settings.py
|
|
db.sqlite3
|
|
db.sqlite3-journal
|
|
|
|
# Flask stuff:
|
|
instance/
|
|
.webassets-cache
|
|
|
|
# Scrapy stuff:
|
|
.scrapy
|
|
|
|
# Sphinx documentation
|
|
docs/_build/
|
|
|
|
# PyBuilder
|
|
.pybuilder/
|
|
target/
|
|
|
|
# Jupyter Notebook
|
|
.ipynb_checkpoints
|
|
|
|
# IPython
|
|
profile_default/
|
|
ipython_config.py
|
|
|
|
# pyenv
|
|
# For a library or package, you might want to ignore these files since the code is
|
|
# intended to run in multiple environments; otherwise, check them in:
|
|
.python-version
|
|
|
|
# pipenv
|
|
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
|
|
# However, in case of collaboration, if having platform-specific dependencies or dependencies
|
|
# having no cross-platform support, pipenv may install dependencies that don't work, or not
|
|
# install all needed dependencies.
|
|
#Pipfile.lock
|
|
|
|
# poetry
|
|
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
|
|
# This is especially recommended for binary packages to ensure reproducibility, and is more
|
|
# commonly ignored for libraries.
|
|
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
|
|
# poetry.lock
|
|
|
|
# pdm
|
|
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
|
|
#pdm.lock
|
|
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
|
|
# in version control.
|
|
# https://pdm.fming.dev/#use-with-ide
|
|
.pdm.toml
|
|
|
|
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
|
|
__pypackages__/
|
|
|
|
# Celery stuff
|
|
celerybeat-schedule
|
|
celerybeat.pid
|
|
|
|
# SageMath parsed files
|
|
*.sage.py
|
|
|
|
# Environments
|
|
.env
|
|
.venv
|
|
env/
|
|
venv/
|
|
ENV/
|
|
env.bak/
|
|
venv.bak/
|
|
*venv/
|
|
|
|
# Spyder project settings
|
|
.spyderproject
|
|
.spyproject
|
|
|
|
# Rope project settings
|
|
.ropeproject
|
|
|
|
# mkdocs documentation
|
|
/site
|
|
|
|
# mypy
|
|
.mypy_cache/
|
|
.dmypy.json
|
|
dmypy.json
|
|
|
|
# Pyre type checker
|
|
.pyre/
|
|
|
|
# pytype static type analyzer
|
|
.pytype/
|
|
|
|
# Cython debug symbols
|
|
cython_debug/
|
|
|
|
# PyCharm
|
|
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
|
|
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
|
|
# and can be added to the global gitignore or merged into this file. For a more nuclear
|
|
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
|
|
.idea/
|
|
.vscode/
|
|
|
|
# evaluation
|
|
evaluation/SWE-bench/data
|
|
|
|
# frontend
|
|
|
|
# dependencies
|
|
frontend/node_modules
|
|
frontend/.pnp
|
|
frontend/bun.lockb
|
|
frontend/yarn.lock
|
|
.pnp.js
|
|
|
|
# testing
|
|
frontend/coverage
|
|
|
|
# production
|
|
frontend/build
|
|
frontend/dist
|
|
|
|
# misc
|
|
.DS_Store
|
|
.env.local
|
|
.env.development.local
|
|
.env.test.local
|
|
.env.production.local
|
|
|
|
npm-debug.log*
|
|
yarn-debug.log*
|
|
yarn-error.log*
|
|
|
|
logs
|
|
|
|
# agent
|
|
.envrc
|
|
/workspace
|
|
/_test_workspace
|
|
/debug
|
|
cache
|
|
|
|
# configuration
|
|
config.toml
|
|
evaluation/swe_bench/eval_workspace
|
|
evaluation/outputs
|
|
evaluation/evaluation_outputs
|
|
test_results*
|
|
/_test_files_tmp/
|