38 Commits

Author SHA1 Message Date
tobitege
554636cf2a
(fix) Fix runtime (RT) tests and split tests in 2 actions (openhands/root) (#3791)
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-09-14 21:51:30 +02:00
tobitege
57390eb26b
(enh) docker pull (if not found locally) with progress info (#3682)
2024-09-14 06:26:42 +02:00
Xingyao Wang
78c5f58adc
refactor & improve retry for the reliability of RemoteRuntime & evaluation (#3846)
2024-09-13 07:37:07 -04:00
Xingyao Wang
2fe2f4c530
[eval] increase timeout for SWEBench eval init/complete (#3829)
* [eval] increase timeout for swebench eval init/complete

* allow CmdRunAction to optionally block when .timeout is set

* fix unit test for serialization

* fix unit tests for security analyzer

* fix integration tests

* add more timeout
2024-09-12 15:20:58 +00:00
Robert Brennan
c6105f264f
Improvements to file list UI (#3794)
* move filematching logic into server

* wait until ready before returning

* show loading message instead of empty

* logspam

* delint

* fix type

* add a few more default ignores
2024-09-11 09:44:37 -04:00
mamoodi
f3b2085f9b
Reduce runtime tests duration by running them across CPUs (#3779)
* Reduce runtime tests duration by running them across CPUs

* fix hardcoded image name

* test two cpus

* Test folder change

* Up the CPU to 4 again to test

* Change to 3 CPUs

* Down to 2

* Add param to remove all openhands containers

* Add comment

* Add reruns just in case

* Fix ordering of if
2024-09-10 14:31:17 -04:00
tobitege
5ffff742de
Regression fixes: LLM logging; client readiness (EventStreamRuntime) (#3776)
* Regression fixes: LLM logging; client readiness (EventStreamRuntime)

* fix llm.async_completion_wrapper bad edit in previous commit

* regen couple of mock files

* client: always log initialized status
2024-09-09 21:02:43 +02:00
tobitege
2b7517e542
(enh) add caching@v4 action in workflows (#3780)
* dummy test change

* regen yml: 1st install python 3.11, then poetry

* fix caching for poetry; old entry for python was rather useless

* fix steps order (cache before poetry)

* add poetry caching to ghcr_runtime; fix fork conditions

* ghcr_runtime: more caching actions; condition fixes

* fix interim action error (order of steps)

* cache@v4 instead of v3

* fixed interim typo for 2 fork conditions

* runtime/test_env_vars: compacted multiple tests into one to reduce time

* ugh if fork condition changes again
2024-09-09 10:49:49 +02:00
Robert Brennan
ab3851593d
Support interactive commands (#3653)
* hacky solution for interactive commands

* add more behavior

* debug

* fix continue functionality

* remove prints

* refactor a bit

* reduce test sleep

* fix python version

* fix pre-commit issue

* Regenerate integration tests

* Update openhands/runtime/client/client.py

* revert some prompt stuff

* several integration mock files regenerated

* execute_action: remove duplicate exception logging

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com>
2024-09-08 21:45:51 +02:00
Xingyao Wang
688068a44e
Fix issues for running RemoteRuntime in parallel on SWE-Bench (#3716)
* feat: add SWE-bench fullset support

* fix instance image list

* update eval script and documentation

* increase timeout for remote runtime

* add push script

* handle the case when ret push is a generator

* update pbar

* set SWE-Bench default to run SWE-Bench lite

* add script to cleanup remote runtime

* fix the cases when tag is too long

* update README

* update readme for cleanup

* rename od to oh

* Update evaluation/swe_bench/README.md

Co-authored-by: Graham Neubig <neubig@gmail.com>

* Update evaluation/swe_bench/README.md

Co-authored-by: Graham Neubig <neubig@gmail.com>

* Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh

Co-authored-by: Graham Neubig <neubig@gmail.com>

* Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh

Co-authored-by: Graham Neubig <neubig@gmail.com>

* Update evaluation/swe_bench/scripts/cleanup_remote_runtime.sh

Co-authored-by: Graham Neubig <neubig@gmail.com>

* gets API key and Runtime from env var

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-09-05 10:34:31 +08:00
tobitege
bc31fb15fe
(fix) CodeActAgent: fix issues with vision support in prompts (#3665)
* CodeActAgent: fix message prep if prompt caching is not supported

* fix python version in regen tests workflow

* fix in conftest "mock_completion" method

* add disable_vision to LLMConfig; revert change in message parsing in llm.py

* format messages in several files for completion

* refactored message(s) formatting (llm.py); added vision_is_active()

* fix a unit test

* regenerate: added LOG_TO_FILE and FORCE_REGENERATE env flags

* try to fix path to logs folder in workflow

* llm: prevent index error

* try FORCE_USE_LLM in regenerate

* tweaks everywhere...

* fix 2 random unit test errors :(

* added FORCE_REGENERATE_TESTS=true to regenerate CLI

* fix test_lint_file_fail_typescript again

* double-quotes for env vars in workflow; llm logger set to debug

* fix typo in regenerate

* regenerate iterations now 20; applied iteration counter fix by Li

* regenerate: pass FORCE_REGENERATE flag into env

* fixes for int tests. several mock files updated.

* browsing_agent: fix response_parser.py adding ) to empty response

* test_browse_internet: fix skipif and revert obsolete mock files

* regenerate: fi bracketing for http server start/kill conditions

* disable test_browse_internet for CodeAct*Agents; mock files updated after merge

* missed to include more mock files earlier

* reverts after review feedback from Li

* forgot one

* browsing agent test, partial fixes and updated mock files

* test_browse_internet works in my WSL now!

* adapt unit test test_prompt_caching.py

* add DEBUG to regenerate workflow command

* convert regenerate workflow params to inputs

* more integration test mock files updated

* more files

* test_prompt_caching: restored test_prompt_caching_headers purpose

* file_ops: fix potential exception, like "cross device copy"; fixed mock files accordingly

* reverts/changes wrt feedback from xingyao

* updated docs and config template

* code cleanup wrt review feedback
2024-09-04 17:58:30 +02:00
Xingyao Wang
d8a87d7ccb
[Eval] Make SWE-Bench run_infer.sh to default to run SWE-Bench Lite (#3704)
* feat: add SWE-bench fullset support

* fix instance image list

* update eval script and documentation

* increase timeout for remote runtime

* add push script

* handle the case when ret push is a generator

* update pbar

* set SWE-Bench default to run SWE-Bench lite
2024-09-04 00:58:14 +08:00
Mislav Balunovic
f979d612ec
(fix) confirmation mode bugfix for the EventStreamRuntime (#3695)
2024-09-02 13:27:33 +00:00
Boxuan Li
75d5591816
file_ops: Use tmp file for original linting (#3681)
Fix a potential issue that might lead to file corruption when edit linting is enabled

#3124 introduced a feature for editing: running the linter twice, before and after the change, and extracting only the new errors introduced by the agent. This has some potential issues, and I am working on #3649 to address them. I don't expect to finish that in the next few days, and that PR has become harder and harder to review, hence this PR, which focuses on one small improvement only.

So what's the issue? When we run linters on the original file before our edits, we need to copy the original file and lint the temporary copy, because linting may have side effects (e.g. modifying the file in place). I used the word "may" because:

* Flake8 has no side effects, so this is not a problem as of now.
* We neither enforce nor document "no side effects" as a requirement for linter implementations, so side effects are allowed.

Regardless, the "after-edit linting" uses the same approach: back up the file before linting to avoid data corruption. We should keep our "before-edit linting" consistent with it.

Why is there no new unit test that reproduces the issue? As mentioned above, flake8 has no side effects, so technically this is a flaw rather than a bug. With the current linter there is therefore no way to write a test that reproduces it.
2024-09-01 23:36:57 -07:00
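The temp-file approach described in the #3681 message above could look roughly like the following. This is a minimal sketch and not the actual `file_ops` code: the helper name `lint_on_temp_copy` is hypothetical, and the linter is assumed to be any callable that takes a file path and returns a list of error strings.

```python
import os
import shutil
import tempfile
from typing import Callable, List


def lint_on_temp_copy(path: str, linter: Callable[[str], List[str]]) -> List[str]:
    """Run `linter` on a temporary copy of `path` (hypothetical helper).

    The original file is only read, never passed to the linter, so a linter
    that rewrites its input in place cannot corrupt it.
    """
    # Keep the extension so extension-sensitive linters still recognise the file.
    suffix = os.path.splitext(path)[1]
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; shutil.copy reopens the file
    try:
        shutil.copy(path, tmp_path)
        return linter(tmp_path)
    finally:
        # Remove the copy even if the linter raised.
        os.remove(tmp_path)
```

A linter that truncates its input would then damage only the copy. Note that any file names inside the returned messages refer to the temporary path, so they would need normalising before being compared against the after-edit results.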
tobitege
7068a73ae7
(enh) Improve CodeActAgent's file editing reliability (#3610)
* improve file editing prompts and unit test
converted most raise calls to a _output_error call in file_ops.py

* tweaks in test_agent_skill.py wrt SEP separator

* tweaked the separator

* remove server runtime remnants and TEST_RUNTIME references

* restore use of TEST_RUNTIME args and variables

* fix integration tests

* added hint to properly escape docstrings

* revert latest prompt change

---------

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-09-02 06:03:22 +02:00
Xingyao Wang
090c911a50
(refactor) Make Runtime class synchronous (#3661)
* change runtime to be synchronous

* fix test runtime with the new interface

* fix arg

* fix eval

* fix missing config attribute

* fix plugins

* fix on_event by reverting it back to async

* update upload_file endpoint

* fix argument to upload file

* remove unnecessary async for eval;
fix evaluation run in parallel

* use asyncio to run controller for eval

* revert file upload

* truncate eval test result output
2024-08-30 01:37:03 +00:00
Xingyao Wang
8b1f207d39
feat: support remote runtime (#3406)
* feat: refactor building logic into runtime builder

* return image name

* fix testcases

* use runtime builder for eventstream runtime

* have runtime builder return str

* add api_key to sandbox config

* draft remote runtime

* remove extra if clause

* initialize runtime based on box class

* add build logic

* use base64 for file upload

* get runtime image prefix from API

* replace ___ with _s_ to make it a valid image name

* use /build to start build and /build_status to check the build progress

* update logging

* fix exit code

* always use port

* add remote runtime

* rename runtime

* fix tests import

* make dir first if work_dir does not exist;

* update debug print to remote runtime

* fix exit close_sync

* update logging

* add retry for stop

* use all box class for test keep prompt

* fix test browsing

* add retry stop

* merge init commands to save startup time

* fix await

* remove sandbox url

* support execute through specific runtime url

* fix file ops

* simplify close

* factor out runtime retry code

* fix exception handling

* fix content type error (e.g., bad gateway when runtime is not ready)

* add retry for wait until alive;
add retry for check image exists

* Revert "add retry for wait until alive;"

This reverts commit dd013cd2681a159cd07747497d8c95e145d01c32.

* retry when wait until alive

* clean up msg

* directly save sdist to temp dir for _put_source_code_to_dir

* support running testcases in parallel

* tweak logging;
try to close session

* try to close session even on exception

* update poetry lock

* support remote to run integration tests

* add warning for workspace base on remote runtime

* set default runtime api

* remove server runtime

* update poetry lock

* support running swe-bench (n=1) eval on remoteruntime

* add a timeout of 30 min

* add todo for docker namespace

* update poetry lock
2024-08-29 15:53:37 +00:00
tobitege
8fca5a5354
linter and test_aider_linter extensions for eslint (#3543)
* linter and test_aider_linter extensions for eslint

* linter tweaks

* try enabling verbose output in linter test

* one more option for linter test

* try conftest.py for tests/unit folder

* enable verbose mode in workflow; remove conftest.py again

* debug print statements of linter results

* skip some tests if eslint is not installed at all

* more tweaks

* final test skip setups

* code quality revisions

* fix test again

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-08-29 10:40:43 +02:00
tobitege
daeff3dfaf
startup handling and logging of docker images tweaked (#3645)
2024-08-28 22:17:58 +00:00
tobitege
9c39f07430
(enh) Aider-Bench: make resumable with skip_num arg (#3626)
* added optional START_ID env flag to resume from that instance id

* prepare_dataset: fix comparisons by using instance ids as int

* aider bench complete_runtime: close runtime to close container

* added matrix display of instance id for logging

* fix typo in summarize_results.py saying summarise_results

* changed start_id to skip_num to skip rows from dataset (start_id wasn't supportable)

* doc changes about huggingface spaces to temporarily point back to OD
2024-08-28 15:42:01 +00:00
Xingyao Wang
98081b9b1b
(eval) EOF fixes for SWE-Bench evaluation (#3623)
* add error handling for client eof

* remove root check

* remove set -e

* echo USER to fix for swebench infer

* fix entry timeout

* add timeout;
fix runtime close
2024-08-27 21:09:31 +00:00
tobitege
0b8779447a
New README for OpenHands/openhands/runtime folder (#3576)
* new OpenHands/openhands/runtime/README.md - made by OpenHands

* move parts to server readme; fix OD runtime in docs
2024-08-27 21:04:50 +00:00
tobitege
1fddc77247
(feat) runtime: in _wait_until_alive upon start wait for client to have initialized too (#3612)
* runtime: in _wait_until_alive wait initially for client to initialize

* fix typo in runtime log entry
2024-08-27 17:11:32 +02:00
tofarr
6ce77e157b
Fix pypi build (#3548)
* Fix pypi build

The package on pypi only included opendevin/* (the poetry default). It also needs to include agenthub/*

* Bumped version so people will actually get it!

* Fix package definition
* Updated poetry lock file
* Update package name to openhands-ai
* Add py.typed to indicate that OpenHands has type annotations

* Replace package name with openhands_ai

* Fix tests to reflect new name

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-08-26 01:31:37 -06:00
Graham Neubig
f9088766e8
Allow setting of runtime container image (#3573)
* Add runtime container image setting

* Fix typo in test

* Fix sandbox base container image

* Update variables

* Update to base_container_image

* Update tests/unit/test_config.py

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>

* Fixed eval

* Fixed container_image

* Fix typo

---------

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-08-25 23:05:41 +00:00
Robert Brennan
356d9b34be
Add CLI mode (#3564)
* set log levels

* basic cli flow

* basic display

* better exits

* set log level

* fix messages

* clean up logs

* better exits

* better printing

* add todo
2024-08-26 06:10:21 +08:00
Robert Brennan
b63dec4b2e
Add back docker caching, simplify docker builds (#3546)
* fix multiarch

* remove extra push

* add back tag file

* fix cache tag

* add login step

* fix login

* try to fix save

* fix output maybe

* rm outputs

* remove tars

* fix refs

* fix runtime dep

* force rebuild

* lowercase image

* add suffix to build tags for runtime

* update matrix

* fix cut

* fix cut again

* add back matrix

* Update containers/build.sh

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>

---------

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-08-23 17:01:18 +00:00
tobitege
fc5f026942
prevent 500 server error on a just removed folder when listing files (#3553)
2024-08-23 18:05:38 +02:00
tofarr
8d47cebde9
Fix spaces in path (#3547)
* Fix for issue where spaces in path results in error
2024-08-23 07:29:41 -06:00
Ikko Eltociear Ashimine
87cc28beca
chore: update client.py (#3542)
occurence -> occurrence
2024-08-23 01:18:16 +08:00
Aaron Xia
dc0a1f3940
Fix wrong doc url (#3531)
* Update custom-sandbox-guide.md

update https://docs.all-hands.dev/modules/usage/architecture/runtime

* Update runtime_build.py

update url

* Update README.md

update url
2024-08-22 13:16:27 +02:00
Xingyao Wang
b19b724eae
feat: show exact python interpreter to the agent in IPython and Bash (#3448)
* try to fix pip unavailable

* update test case for pip

* force rebuild in CI

* remove extra symlink

* fix newline

* added semi-colon to line 31

* Dockerfile.j2: activate env at the end

* Revert "Dockerfile.j2: activate env at the end"

This reverts commit cf2f5651021fe80d4ab69a35a85f0a35b29dc3d7.

* cleanup Dockerfile

* switch default python image

* remove image agnostic (no longer used)

* fix tests

* simplify integration tests default image

* add nodejs specific runtime tests

* update tests and workflows

* switch to nikolaik/python-nodejs:python3.11-nodejs22

* update build sh to output image name correctly

* increase custom images to test

* fix test

* fix test

* fix double quote

* try fixing ci

* update ghcr workflow

* fix artifact name

* try to fix ghcr again

* fix workflow

* save built image to correct dir

* remove extra -docker-image

* make last tag to be human readable image tag

* fix hyphen to underscore

* run test runtime on all tags

* revert app build

* separate ghcr workflow

* update dockerfile for eval

* fix tag for test run

* try fix tag

* try fix tag via matrix output

* try workflow again

* update comments

* try fixing test matrix

* fix artifact name

* try fix tag again

* Revert "try fix tag again"

This reverts commit b369badd8cccf4a526e36d27eafb77ea2d32f6be.

* tweak filename

* try different path

* fix filepath

* try fix tag artifact path again

* save json instead of line

* update matrix

* print all tags in workflow

* support only streaming diff logs from the runtime client

* remove strip from log line to fix indentation

* get py interpreter for jupyter

* rstrip to remove newline on the right side for logging

* fix blocking issue for stream logs

* set python interpreter path in bash ps1

* update testcase for jupyter py interpreter path

* remove accidentally added changes

* remove accidentally added changes

* only print dockerfile when debug

* add docs

* remove extra tests that weren't supposed to be in this pr

* add back missing test

* revert

* make LogBuffer synchronous to fix hang in integration tests

* fix integration tests

* Update opendevin/runtime/client/client.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* fix test case

* fix integration tests

* change deque to list

* update integration tests

* rename test runtime

* fix docs

* rename opendevin to openhands in tests

---------

Co-authored-by: tobitege <tobitege@gmx.de>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: tobitege <10787084+tobitege@users.noreply.github.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-08-21 20:08:50 +00:00
tobitege
c7886168e1
(feat) implement typescript linting for CodeActAgent (#3452)
* tweaks to linter.py to prep for typescript linting (not implemented yet)

* fix 2 linter unit tests

* simpler basic_lint output; updated unit test

* fix default gpt-4o model name in aider default config

* linter.py: use tsc (typescript compiler) for linting; added more tests

* make typescript linting be more forgiving

* use npx instead of npm to install typescript in Dockerfile.j2

* Fix merge mistake

* removed npx call from Dockerfile.j2

* fix run_cmd to use code parameter; replace regex in test

* fix test_lint_file_fail_typescript to ignore leading path characters

* added TODO comment to extract_error_line_from

* fixed bug in ts_lint with wrong line number parsing
2024-08-21 21:41:35 +02:00
tobitege
7ef5a2d1ff
(fix) Rename last opendevin occurrences (#3490)
* renaming more opendevin occurrences

* remove DOCKER_IMAGE variable from Makefile

* Revert rename in evaluation/swe_bench/run_infer.py

Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>

---------

Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2024-08-20 16:45:26 +00:00
Mahmood Alhawaj
6487175a31
refactored all relative paths to absolute paths (#3495)
2024-08-21 00:09:48 +08:00
Xingyao Wang
c8452f5813
fix: custom runtime image won't work for go (#3464)
* fix request param for container_image;
add test for go;

* fix go version issue

* update test to detect go version
2024-08-20 23:38:59 +08:00
Xingyao Wang
8f0f764a85
fix: CI docker image push (#3476)
* fix ghcr app

* fix ghcr runtime push

* rename od_runtime to runtime
2024-08-19 20:53:28 +00:00
Robert Brennan
01ae22ef57
Rename OpenDevin to OpenHands (#3472)
* Replace OpenDevin with OpenHands

* Update CONTRIBUTING.md

* Update README.md

* Update README.md

* update poetry lock; move opendevin folder to openhands

* fix env var

* revert image references in docs

* revert permissions

* revert permissions

---------

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-08-20 00:44:54 +08:00