OpenHands/tests/unit/test_runtime_build.py
Xingyao Wang 4f0a454ed6
[Arch] Support integration tests using EventStream Runtime (#3184)
* Remove global config from memory

* Remove runtime global config

* Remove from storage

* Remove global config

* Fix event stream tests

* Fix sandbox issue

* Change config

* Removed transferred tests

* Add swe env box

* Fixes on testing

* Fixed some tests

* Merge with stashed changes

* Fix typing

* Fix ipython test

* Revive function

* Make temp_dir fixture

* Remove test to avoid circular import

* fix eventstream filestore for test_runtime

* fix parse arg issue that causes integration test to fail

* support swebench pull from custom namespace

* add back simple tests for runtime

* move multi-line bash tests to test_runtime;
support multi-line bash for esruntime;

* add testcase to handle PS2 prompt

* use bashlex for bash parsing to handle multi-line commands;
add testcases for multi-line commands

* revert ghcr runtime change

* Apply stash

* fix run as other user;
make test async;

* fix test runtime for run as od

* add run-as-devin to all the runtime tests

* handle the case when username is root

* move all run-as-devin tests from sandbox;
only tests a few cases on different user to save time;

* move over multi-line echo related tests to test_runtime

* fix user-specific jupyter by fixing the pypoetry virtualenv folder

* make plugin's init async;
chdir at initialization of jupyter plugin;
move ipy simple testcase to test runtime;

* support agentskills import in
move tests for jupyter pwd tests;
overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter;
make agentskills read env var lazily, in case env var is updated;

* fix ServerRuntime agentskills issue

* move agnostic image test to test_runtime

* merge runtime tests in CI

* fix enable auto lint as env var

* update warning message

* update warning message

* test for different container images

* change parsing output as debug

* add exception handling for update_pwd_decorator

* fix unit test indentation

* add plugins as default input to Runtime class;
remove init_sandbox_plugins;
implement add_env_var (include jupyter) in the base class;

* fix server runtime auto lint

* Revert "add exception handling for update_pwd_decorator"

This reverts commit 2b668b1506e02145cb8f87e321aad62febca3d50.

* tries to print debugging info for agentskills

* explicitly set uid (try to fix permission issue)

* Revert "tries to print debugging info for agentskills"

This reverts commit 8be4c86756f0e3fc62957b327ba2ac4999c419de.

* set sandbox user id during testing to hopefully fix the permission issue

* add browser tools for server runtime

* try to debug for old pwd

* update debug cmd

* only test agnostic runtime when TEST_RUNTIME is Server

* fix temp dir mkdir

* load TEST_RUNTIME at the beginning

* remove ipython tests

* only log to file when DEBUG

* default logging to project root

* temporarily remove log to file

* fix LLM logger dir

* fix logger

* make set pwd an optional aux action

* fix prev pwd

* fix infinite recursion

* simplify

* do not import the whole od library to avoid logger folder by jupyter

* fix browsing

* increase timeout

* attempt to fix agentskills yet again

* clean up in testcases, since CI may be run as non-root

* add _cause attribute for event.id

* remove parent

* add a bunch of debugging statement again for CI :(

* fix temp_dir fixture

* change all temp dir to follow pytest's tmp_path_factory

* remove extra bracket

* clean up error printing a bit

* jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization

* jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization

* add typing for tmp dir fixture

* clear the directory before running the test to avoid weird CI temp dir

* remove agnostic test case for server runtime

* Revert "remove agnostic test case for server runtime"

This reverts commit 30e2181c3fc1410e69596c2dcd06be01f1d016b3.

* disable agnostic tests in CI

* fix test

* make sure plugin arg is not passed when no plugin is specified;
remove redundant on_event function;

* move mock prompt

* rename runtime

* remove extra logging

* refactor run_controller's interface;
support multiple runtime for integration test;
filter out hostname for prompt

* uncomment other tests

* pass the right runtime to controller

* log runtime when start

* uncomment tests

* improve symbol filters

* add integration test prompts that seemed ok

* add integration test workflow

* add python3 to default ubuntu image

* symlink python and fix permission to jupyter pip

* add retry for jupyter execute server

* fix jupyter pip install;
add post-process for jupyter pip install;
simplify init by add agent_skills path to PYTHONPATH;
add testcase to tests jupyter pip install;

* fix bug

* use ubuntu:22.04 for eventstream integration tests

* add todo

* update testcase

* remove redundant code

* fix unit test

* reduce dependency for runtime

* try making llama-index an optional dependency that's not installed by default

* remove pip install since it seemed not needed

* log ipython execution;
await write message since it returns a future

* update ipy testcase

* do not install llama-index in CI

* do not install llama-index in the app docker as well

* set sandbox container image in the integration test script

* log plugins & env var for runtime

* update conftest for sha256

* add git

* remove all non-alphanumeric characters

* add working ipy module tests!

* default to use host network

* remove is_async from browser to make things a little more reliable;
retry loading browser when error;

* add sleep to wait a bit for http server

* kill http server before regenerate browsing tests

* fix browsing

* only set sandbox container image if undefined

* skip empty config value

* update evaluation to use the latest run_controller

* revert logger in execute_server to be compatible with server runtime

* revert logging level to fix jupyter

* set logger level

* revert the logging

* chmod for workspace to fix permission

* support getting timeout from action

* update test for server runtime

* try to fix file permission

* fix test_cmd_run_action_serialization_deserialization test (added timeout)

* poetry: pip 24.2, torch 2.2.2

* revert adding pip to pyproject.toml

* add build to dependencies in pyproject.toml

* forgot poetry lock --no-update

* fix a DelegatorAgent prompt_002.log (timeout)

* fix a DelegatorAgent prompt_003.log (timeout)

* couple more timeout attribs in prompt files

* some more prompt files

* prompts galore

* add clarification comment for timeout

* default timeout to config

* add assert

* update integration tests for eventstream

* update integration tests

* fix timeout for action<->dict

* remove redundant on_event

* fix action execution timeout

* update lock

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: tobitege <tobitege@gmx.de>
2024-08-01 22:07:39 +00:00

204 lines
6.9 KiB
Python

import os
import tarfile
from importlib.metadata import version
from unittest.mock import MagicMock, patch

import pytest
import toml
from pytest import TempPathFactory

from opendevin.runtime.utils.runtime_build import (
    _generate_dockerfile,
    _get_package_version,
    _put_source_code_to_dir,
    build_runtime_image,
    get_new_image_name,
)

OD_VERSION = f'od_v{_get_package_version()}'
RUNTIME_IMAGE_PREFIX = 'od_runtime'


@pytest.fixture
def temp_dir(tmp_path_factory: TempPathFactory) -> str:
    return str(tmp_path_factory.mktemp('test_runtime_build'))


def test_put_source_code_to_dir(temp_dir):
    folder_name = _put_source_code_to_dir(temp_dir)

    # assert there is a file called 'project.tar.gz' in the temp_dir
    assert os.path.exists(os.path.join(temp_dir, 'project.tar.gz'))

    # untar the file
    with tarfile.open(os.path.join(temp_dir, 'project.tar.gz'), 'r:gz') as tar:
        tar.extractall(path=temp_dir)

    # check the source file is the same as the current code base
    assert os.path.exists(os.path.join(temp_dir, folder_name, 'pyproject.toml'))
    # make sure the version from the pyproject.toml is the same as the current version
    with open(os.path.join(temp_dir, folder_name, 'pyproject.toml'), 'r') as f:
        pyproject = toml.load(f)
        _pyproject_version = pyproject['tool']['poetry']['version']
    assert _pyproject_version == version('opendevin')


def test_generate_dockerfile_scratch():
    base_image = 'debian:11'
    source_code_dirname = 'dummy'
    dockerfile_content = _generate_dockerfile(
        base_image,
        source_code_dirname=source_code_dirname,
        skip_init=False,
    )
    assert base_image in dockerfile_content
    assert 'apt-get update' in dockerfile_content
    assert 'apt-get install -y wget sudo apt-utils' in dockerfile_content
    assert (
        'RUN /opendevin/miniforge3/bin/mamba install conda-forge::poetry python=3.11 -y'
        in dockerfile_content
    )

    # Check the update command
    assert f'mv /opendevin/{source_code_dirname} /opendevin/code' in dockerfile_content
    assert (
        '/opendevin/miniforge3/bin/mamba run -n base poetry install'
        in dockerfile_content
    )


def test_generate_dockerfile_skip_init():
    base_image = 'debian:11'
    source_code_dirname = 'dummy'
    dockerfile_content = _generate_dockerfile(
        base_image,
        source_code_dirname=source_code_dirname,
        skip_init=True,
    )

    # These commands should NOT be included in the Dockerfile when skip_init is True
    assert 'RUN apt update && apt install -y wget sudo' not in dockerfile_content
    assert (
        'RUN /opendevin/miniforge3/bin/mamba install conda-forge::poetry python=3.11 -y'
        not in dockerfile_content
    )

    # These update commands should still be in the Dockerfile
    assert (
        f'RUN mv /opendevin/{source_code_dirname} /opendevin/code' in dockerfile_content
    )
    assert (
        '/opendevin/miniforge3/bin/mamba run -n base poetry install'
        in dockerfile_content
    )


def test_get_new_image_name_eventstream():
    base_image = 'debian:11'
    new_image_name = get_new_image_name(base_image)
    assert new_image_name == f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_debian_tag_11'

    base_image = 'ubuntu:22.04'
    new_image_name = get_new_image_name(base_image)
    assert (
        new_image_name == f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_ubuntu_tag_22.04'
    )

    base_image = 'ubuntu'
    new_image_name = get_new_image_name(base_image)
    assert (
        new_image_name == f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_ubuntu_tag_latest'
    )


def test_get_new_image_name_eventstream_dev_mode():
    base_image = f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_debian_tag_11'
    new_image_name = get_new_image_name(base_image, dev_mode=True)
    assert (
        new_image_name == f'{RUNTIME_IMAGE_PREFIX}_dev:{OD_VERSION}_image_debian_tag_11'
    )

    base_image = f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_ubuntu_tag_22.04'
    new_image_name = get_new_image_name(base_image, dev_mode=True)
    assert (
        new_image_name
        == f'{RUNTIME_IMAGE_PREFIX}_dev:{OD_VERSION}_image_ubuntu_tag_22.04'
    )

    base_image = f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_ubuntu_tag_latest'
    new_image_name = get_new_image_name(base_image, dev_mode=True)
    assert (
        new_image_name
        == f'{RUNTIME_IMAGE_PREFIX}_dev:{OD_VERSION}_image_ubuntu_tag_latest'
    )


def test_get_new_image_name_eventstream_dev_invalid_base_image():
    with pytest.raises(ValueError):
        base_image = 'debian:11'
        get_new_image_name(base_image, dev_mode=True)

    with pytest.raises(ValueError):
        base_image = 'ubuntu:22.04'
        get_new_image_name(base_image, dev_mode=True)

    with pytest.raises(ValueError):
        base_image = 'ubuntu:latest'
        get_new_image_name(base_image, dev_mode=True)


@patch('opendevin.runtime.utils.runtime_build._build_sandbox_image')
@patch('opendevin.runtime.utils.runtime_build.docker.DockerClient')
def test_build_runtime_image_from_scratch(mock_docker_client, mock_build_sandbox_image):
    base_image = 'debian:11'
    mock_docker_client.images.list.return_value = []

    image_name = build_runtime_image(base_image, mock_docker_client)
    assert image_name == f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_debian_tag_11'

    mock_build_sandbox_image.assert_called_once_with(
        base_image,
        f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_debian_tag_11',
        mock_docker_client,
        skip_init=False,
    )


@patch('opendevin.runtime.utils.runtime_build._build_sandbox_image')
@patch('opendevin.runtime.utils.runtime_build.docker.DockerClient')
def test_build_runtime_image_exist_no_update_source(
    mock_docker_client, mock_build_sandbox_image
):
    base_image = 'debian:11'
    mock_docker_client.images.list.return_value = [
        MagicMock(tags=[f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_debian_tag_11'])
    ]

    image_name = build_runtime_image(base_image, mock_docker_client)
    assert image_name == f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_debian_tag_11'

    mock_build_sandbox_image.assert_not_called()


@patch('opendevin.runtime.utils.runtime_build._build_sandbox_image')
@patch('opendevin.runtime.utils.runtime_build.docker.DockerClient')
def test_build_runtime_image_exist_with_update_source(
    mock_docker_client, mock_build_sandbox_image
):
    base_image = 'debian:11'
    mock_docker_client.images.list.return_value = [
        MagicMock(tags=[f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_debian_tag_11'])
    ]

    image_name = build_runtime_image(
        base_image, mock_docker_client, update_source_code=True
    )
    assert image_name == f'{RUNTIME_IMAGE_PREFIX}_dev:{OD_VERSION}_image_debian_tag_11'

    mock_build_sandbox_image.assert_called_once_with(
        f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_debian_tag_11',
        f'{RUNTIME_IMAGE_PREFIX}_dev:{OD_VERSION}_image_debian_tag_11',
        mock_docker_client,
        skip_init=True,
    )
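
The naming rules that the `get_new_image_name` tests assert can be summarised in a small standalone sketch. This is not the code in `runtime_build.py`: `sketch_new_image_name` and its `version` and `prefix` parameters are hypothetical names introduced here, and the logic is inferred only from the assertions in this file. It assumes simple `repo[:tag]` base images and does not handle registry hosts or namespaces.

```python
def sketch_new_image_name(
    base_image: str,
    version: str,
    dev_mode: bool = False,
    prefix: str = 'od_runtime',
) -> str:
    """Hypothetical helper mirroring the naming convention asserted in the tests."""
    if dev_mode:
        # Dev images are derived from an existing runtime image only;
        # a plain base image such as 'debian:11' is rejected.
        if not base_image.startswith(f'{prefix}:'):
            raise ValueError(f'{base_image!r} is not a runtime image')
        # 'od_runtime:<tag>' becomes 'od_runtime_dev:<tag>'.
        return base_image.replace(prefix, f'{prefix}_dev', 1)

    # Split 'repo:tag'; a missing tag defaults to 'latest'.
    repo, _, tag = base_image.partition(':')
    tag = tag or 'latest'
    return f'{prefix}:od_v{version}_image_{repo}_tag_{tag}'
```

Under these assumptions, `sketch_new_image_name('ubuntu', '0.8.3')` yields `od_runtime:od_v0.8.3_image_ubuntu_tag_latest`, and passing that result back with `dev_mode=True` swaps the repository part to `od_runtime_dev`.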