[Arch] Streamline EventStream Runtime Image Building Logic (#3259)

* remove nocache

* simplify runtime build to use hash & always update source

* style

* try to fix temp folder issue

* fix rm tree

* create build folder first (to get correct hash), then copy it over to actual build folder

* fix assert

* fix indentation

* fix copy over

* add runtime documentation

* fix runtime docs

* fix typo

* Update docs/modules/usage/runtime.md

Co-authored-by: Graham Neubig <neubig@gmail.com>

* Update docs/modules/usage/runtime.md

Co-authored-by: Graham Neubig <neubig@gmail.com>

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
This commit is contained in:
Xingyao Wang 2024-08-07 06:09:38 +08:00 committed by GitHub
parent a22ee6656b
commit bb66f15ff6
GPG Key ID: B5690EEEBB952194
5 changed files with 445 additions and 265 deletions

View File

@@ -0,0 +1,181 @@
---
sidebar_position: 4
---
# OpenDevin EventStream Runtime
The OpenDevin EventStream Runtime is the core component that enables secure and flexible execution of AI agents' actions.
It creates a sandboxed environment using Docker, where arbitrary code can be run safely without risking the host system.
## Why do we need a sandboxed runtime?
OpenDevin needs to execute arbitrary code in a secure, isolated environment for several reasons:
1. Security: Executing untrusted code can pose significant risks to the host system. A sandboxed environment prevents malicious code from accessing or modifying the host system's resources.
2. Consistency: A sandboxed environment ensures that code execution is consistent across different machines and setups, eliminating "it works on my machine" issues.
3. Resource Control: Sandboxing allows for better control over resource allocation and usage, preventing runaway processes from affecting the host system.
4. Isolation: Different projects or users can work in isolated environments without interfering with each other or the host system.
5. Reproducibility: Sandboxed environments make it easier to reproduce bugs and issues, as the execution environment is consistent and controllable.
## How does our Runtime work?
The OpenDevin Runtime system uses a client-server architecture implemented with Docker containers. Here's an overview of how it works:
```mermaid
graph TD
A[User-provided Custom Docker Image] --> B[OpenDevin Backend]
B -->|Builds| C[OD Runtime Image]
C -->|Launches| D[Runtime Client]
D -->|Initializes| E[Browser]
D -->|Initializes| F[Bash Shell]
D -->|Initializes| G[Plugins]
G -->|Initializes| L[Jupyter Server]
B -->|Spawn| H[Agent]
B -->|Spawn| I[EventStream]
I <--->|Execute Action to
Get Observation
via REST API
| D
H -->|Generate Action| I
I -->|Obtain Observation| H
subgraph "Docker Container"
D
E
F
G
L
end
```
1. User Input: The user provides a custom base Docker image.
2. Image Building: OpenDevin builds a new Docker image (the "OD runtime image") based on the user-provided image. This new image includes OpenDevin-specific code, primarily the "runtime client."
3. Container Launch: When OpenDevin starts, it launches a Docker container using the OD runtime image.
4. Client Initialization: The runtime client initializes inside the container, setting up necessary components like a bash shell and loading any specified plugins.
5. Communication: The OpenDevin backend (`runtime.py`) communicates with the runtime client over a RESTful API, sending actions and receiving observations.
6. Action Execution: The runtime client receives actions from the backend and executes them in the sandboxed environment.
7. Observation Return: The client sends the execution results back to the OpenDevin backend as observations.
The role of the client is crucial:
- It acts as an intermediary between the OpenDevin backend and the sandboxed environment.
- It executes various types of actions (shell commands, file operations, Python code, etc.) safely within the container.
- It manages the state of the sandboxed environment, including the current working directory and loaded plugins.
- It formats and returns observations to the backend, ensuring a consistent interface for processing results.
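To make this action/observation loop concrete, here is a minimal sketch of the backend-to-client exchange. The endpoint path, port, and payload shape are illustrative assumptions for this example, not the actual OpenDevin API:
```python
import requests

# Illustrative sketch only: the real runtime client's route, port, and
# payload schema may differ from what is assumed here.
RUNTIME_CLIENT_URL = 'http://localhost:63710'  # assumed port


def execute_action(action: dict) -> dict:
    """Send an action to the runtime client and return its observation."""
    response = requests.post(
        f'{RUNTIME_CLIENT_URL}/execute_action',  # assumed route
        json={'action': action},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()  # the observation, e.g. command output


# Example: run a shell command inside the sandbox and print the observation.
observation = execute_action({'action': 'run', 'args': {'command': 'ls -la'}})
print(observation)
```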
## Advanced: How OpenDevin builds and maintains OD Runtime images
OpenDevin uses a sophisticated approach to build and manage runtime images. This process ensures efficiency, consistency, and flexibility in creating and maintaining Docker images for both production and development environments.
Check out the [relevant code](https://github.com/OpenDevin/OpenDevin/blob/main/opendevin/runtime/utils/runtime_build.py) if you are interested in more details.
### Image Tagging System
OpenDevin uses a dual-tagging system for its runtime images to balance reproducibility with flexibility:
1. Hash-based tag: `{target_image_repo}:{target_image_hash_tag}`
Example: `od_runtime:abc123def456`
- This tag is based on the MD5 hash of the Docker build folder, which includes the source code (of runtime client and related dependencies) and Dockerfile.
- Identical hash tags guarantee that the images were built with exactly the same source code and Dockerfile.
- This ensures reproducibility: the same hash always means the same image contents.
2. Generic tag: `{target_image_repo}:{target_image_tag}`
Example: `od_runtime:od_v0.8.3_ubuntu_tag_22.04`
- This tag follows the format: `od_runtime:od_v{OD_VERSION}_{BASE_IMAGE_NAME}_tag_{BASE_IMAGE_TAG}`
- It represents the latest build for a particular base image and OpenDevin version combination.
- This tag is updated whenever a new image is built from the same base image, even if the source code changes.
The hash-based tag ensures exact reproducibility, while the generic tag provides a stable reference to the latest version of a particular configuration. This dual-tagging approach allows OpenDevin to efficiently manage both development and production environments.
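As a rough sketch of how the two tags relate, the snippet below derives both from assumed example values. The `dir_md5` helper is a stand-in for illustration; the real hashing lives in `runtime_build.py`:
```python
import hashlib
import tempfile
from pathlib import Path


def dir_md5(build_folder: str) -> str:
    """Illustrative stand-in: hash every file in the build folder in a
    deterministic order, so identical source + Dockerfile => identical tag."""
    md5 = hashlib.md5()
    for path in sorted(Path(build_folder).rglob('*')):
        if path.is_file():
            md5.update(path.read_bytes())
    return md5.hexdigest()


# A throwaway folder standing in for the real build context
# (Dockerfile + runtime client source code).
build_folder = tempfile.mkdtemp()
(Path(build_folder) / 'Dockerfile').write_text('FROM ubuntu:22.04\n')

target_image_repo = 'od_runtime'  # assumed repo name
od_version = '0.8.3'              # assumed OpenDevin version
base_repo, base_tag = 'ubuntu:22.04'.split(':')

hash_tag = dir_md5(build_folder)  # content-addressed, reproducible
generic_tag = f'od_v{od_version}_{base_repo}_tag_{base_tag}'

print(f'{target_image_repo}:{hash_tag}')     # hash-based tag
print(f'{target_image_repo}:{generic_tag}')  # rolling "latest" for this combo
```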
### Build Process
1. Image Naming Convention:
- Hash-based tag: `{target_image_repo}:{target_image_hash_tag}`
Example: `od_runtime:abc123def456`
- Generic tag: `{target_image_repo}:{target_image_tag}`
Example: `od_runtime:od_v0.8.3_ubuntu_tag_22.04`
2. Build Process:
- a. Convert the base image name to an OD runtime image name.
Example: `ubuntu:22.04` -> `od_runtime:od_v0.8.3_ubuntu_tag_22.04`
- b. Generate a build context (Dockerfile and OpenDevin source code) and calculate its hash.
- c. Check for an existing image with the calculated hash.
- d. If not found, check for a recent compatible image to use as a base.
- e. If no compatible image exists, build from scratch using the original base image.
- f. Tag the new image with both hash-based and generic tags.
3. Image Reuse and Rebuilding Logic:
The system follows these steps to determine whether to build a new image or reuse an existing one, given a user-provided base image (e.g., `ubuntu:22.04`):
a. If an image exists with the same hash (e.g., `od_runtime:abc123def456`), it will be reused as is.
b. If the exact hash is not found, the system will try to rebuild using the latest generic image (e.g., `od_runtime:od_v0.8.3_ubuntu_tag_22.04`) as a base. This saves time by leveraging existing dependencies.
c. If neither the hash-tagged nor the generic-tagged image is found, the system will build the image completely from scratch.
4. Caching and Efficiency:
- The system attempts to reuse existing images when possible to save build time.
- If an exact match (by hash) is found, it's used without rebuilding.
- If a compatible image is found, it's used as a base for rebuilding, saving time on dependency installation.
Here's a flowchart illustrating the build process:
```mermaid
flowchart TD
A[Start] --> B{Convert base image name}
B --> |ubuntu:22.04 -> od_runtime:od_v0.8.3_ubuntu_tag_22.04| C[Generate build context and hash]
C --> D{Check for existing image with hash}
D -->|Found od_runtime:abc123def456| E[Use existing image]
D -->|Not found| F{Check for od_runtime:od_v0.8.3_ubuntu_tag_22.04}
F -->|Found| G[Rebuild based on recent image]
F -->|Not found| H[Build from scratch]
G --> I[Tag with hash and generic tags]
H --> I
E --> J[End]
I --> J
```
This approach ensures that:
1. Identical source code and Dockerfile always produce the same image (via hash-based tags).
2. The system can quickly rebuild images when minor changes occur (by leveraging recent compatible images).
3. The generic tag (e.g., `od_runtime:od_v0.8.3_ubuntu_tag_22.04`) always points to the latest build for a particular base image and OpenDevin version combination.
By using this method, OpenDevin maintains an efficient and flexible system for building and managing runtime images, adapting to both development needs and production requirements.
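In code, that decision logic might look roughly like the sketch below. The helpers are stand-ins for the real functions in `runtime_build.py`, simplified to show the three branches:
```python
# Simplified sketch of the reuse/rebuild decision; not the real implementation.
# `image_exists` and `build_and_tag` are stand-ins for the actual helpers in
# opendevin/runtime/utils/runtime_build.py.


def image_exists(name: str) -> bool:
    """Stand-in: would ask the Docker daemon whether this image exists."""
    return False


def build_and_tag(base: str, repo: str, hash_tag: str, generic_tag: str,
                  skip_init: bool) -> None:
    """Stand-in: would run the Docker build and apply both tags."""
    print(f'building from {base} (skip_init={skip_init}) '
          f'-> {repo}:{hash_tag} and {repo}:{generic_tag}')


def decide_and_build(repo: str, generic_tag: str, from_scratch_hash: str,
                     base_image: str) -> str:
    hash_name = f'{repo}:{from_scratch_hash}'
    generic_name = f'{repo}:{generic_tag}'
    if image_exists(hash_name):
        # 1. Exact hash match: source code and Dockerfile unchanged; reuse as is.
        return hash_name
    if image_exists(generic_name):
        # 2. Rebuild on top of the latest generic image to reuse its installed
        #    dependencies, but still tag the result with the from-scratch hash
        #    so future runs get an exact match.
        build_and_tag(generic_name, repo, from_scratch_hash, generic_tag,
                      skip_init=True)
        return hash_name
    # 3. Nothing usable exists: build completely from scratch.
    build_and_tag(base_image, repo, from_scratch_hash, generic_tag,
                  skip_init=False)
    return hash_name


print(decide_and_build('od_runtime', 'od_v0.8.3_ubuntu_tag_22.04',
                       'abc123def456', 'ubuntu:22.04'))
```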
## Advanced: Runtime Plugin System
The OpenDevin Runtime supports a plugin system that allows for extending functionality and customizing the runtime environment. Plugins are initialized when the runtime client starts up.
Check out [an example of the Jupyter plugin here](https://github.com/OpenDevin/OpenDevin/blob/9c44d94cef32e6426ebd8deeeb52963153b2348a/opendevin/runtime/plugins/jupyter/__init__.py#L30-L63) if you want to implement your own plugin.
*More details about the plugin system are still under construction - contributions are welcome!*
Key aspects of the plugin system:
1. Plugin Definition: Plugins are defined as Python classes that inherit from a base `Plugin` class.
2. Plugin Registration: Available plugins are registered in an `ALL_PLUGINS` dictionary.
3. Plugin Specification: Plugins are associated with `Agent.sandbox_plugins: list[PluginRequirement]`. Users can specify which plugins to load when initializing the runtime.
4. Initialization: Plugins are initialized asynchronously when the runtime client starts.
5. Usage: The runtime client can use initialized plugins to extend its capabilities (e.g., the JupyterPlugin for running IPython cells).
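For orientation, a custom plugin might look roughly like the sketch below. The base-class interface shown here is an assumption for illustration; see the linked Jupyter plugin for the authoritative shape:
```python
import asyncio
from dataclasses import dataclass

# Hypothetical sketch: the real Plugin / PluginRequirement interfaces live in
# opendevin/runtime/plugins and may differ in detail.


@dataclass
class EchoPluginRequirement:
    """What an agent would list in `sandbox_plugins` to request this plugin."""
    name: str = 'echo'


class EchoPlugin:
    """A toy plugin that echoes text back to the caller."""
    name: str = 'echo'

    async def initialize(self, username: str) -> None:
        # The runtime client would await this during startup.
        print(f'EchoPlugin initialized for user: {username}')

    async def run(self, text: str) -> str:
        return f'echo: {text}'


# The plugin would then be registered so the runtime can look it up by name,
# e.g. ALL_PLUGINS['echo'] = EchoPlugin (illustrative).
asyncio.run(EchoPlugin().initialize('opendevin'))
```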

View File

@@ -152,8 +152,6 @@ class SandboxConfig(metaclass=Singleton):
enable_auto_lint: Whether to enable auto-lint.
use_host_network: Whether to use the host network.
initialize_plugins: Whether to initialize plugins.
update_source_code: Whether to update the source code in the EventStreamRuntime.
Used for development of EventStreamRuntime.
od_runtime_extra_deps: The extra dependencies to install in the runtime image (typically used for evaluation).
This will be rendered into the end of the Dockerfile that builds the runtime image.
It can contain any valid shell commands (e.g., pip install numpy).
@@ -180,7 +178,6 @@ class SandboxConfig(metaclass=Singleton):
)
use_host_network: bool = False
initialize_plugins: bool = True
update_source_code: bool = False
od_runtime_extra_deps: str | None = None
od_runtime_startup_env_vars: dict[str, str] = field(default_factory=dict)
browsergym_eval_env: str | None = None

View File

@@ -82,10 +82,6 @@ class EventStreamRuntime(Runtime):
self.container_image = build_runtime_image(
self.container_image,
self.docker_client,
# NOTE: You may need to set DEBUG=true to update the source code
# inside the container. This is useful when you want to test/debug the
# latest code in the runtime docker container.
update_source_code=self.config.sandbox.update_source_code,
extra_deps=self.config.sandbox.od_runtime_extra_deps,
)
self.container = await self._init_container(

View File

@@ -12,6 +12,10 @@ from jinja2 import Environment, FileSystemLoader
import opendevin
from opendevin.core.logger import opendevin_logger as logger
RUNTIME_IMAGE_REPO = os.getenv(
'OD_RUNTIME_RUNTIME_IMAGE_REPO', 'ghcr.io/opendevin/od_runtime'
)
def _get_package_version():
"""Read the version from pyproject.toml as the other one may be outdated."""
@@ -128,107 +132,86 @@ def prep_docker_build_folder(
def _build_sandbox_image(
base_image: str,
target_image_name: str,
docker_folder: str,
docker_client: docker.DockerClient,
skip_init: bool = False,
extra_deps: str | None = None,
target_image_repo: str,
target_image_hash_tag: str,
target_image_tag: str,
) -> str:
"""Build the sandbox image and return the *hash* docker image name.
"""Build the sandbox image.
The hash is calculated from the contents of the docker build folder (source code and Dockerfile), which prevents rebuilding the image when they are unchanged.
The image will be tagged as both:
- target_image_repo:target_image_hash_tag
- target_image_repo:target_image_tag
Args:
docker_folder: str: the path to the docker build folder
docker_client: docker.DockerClient: the docker client
target_image_repo: str: the repository name for the target image
target_image_hash_tag: str: the *hash* tag for the target image that is calculated based
on the contents of the docker build folder (source code and Dockerfile)
e.g., ubuntu:latest -> od_runtime:1234567890abcdef
target_image_tag: str: the tag for the target image that's generic and based on the base image name
e.g., ubuntu:latest -> od_runtime:ubuntu_tag_latest
"""
target_repo, target_image_tag = target_image_name.split(':')
# 1. Always directly build and tag using the dir_hash
target_image_hash_name = f'{target_image_repo}:{target_image_hash_tag}'
try:
with tempfile.TemporaryDirectory() as temp_dir:
if skip_init:
logger.info(
f'Reusing existing od_sandbox image [{target_image_name}] but will update the source code in it.'
)
else:
logger.info(f'Building agnostic sandbox image: {target_image_name}')
dir_hash = prep_docker_build_folder(
temp_dir, base_image, skip_init=skip_init, extra_deps=extra_deps
)
# Use dir_hash as an alternative tag for the image
# This is useful to help prevent rebuilding the image when the source code/Dockerfile is the same
target_image_hash_name = f'{target_repo}:{dir_hash}'
# Check if the hash image exists
if _check_image_exists(target_image_hash_name, docker_client):
logger.info(f'Image {target_image_hash_name} exists, skipping build.')
else:
logger.info(
f'Image {target_image_name} does not exist, neither does its hash {target_image_hash_name}.\n'
'Building the image...'
)
api_client = docker_client.api
build_logs = api_client.build(
path=temp_dir,
tag=target_image_hash_name,
rm=True,
decode=True,
# do not use cache when skip_init is True (i.e., when we want to update the source code in the existing image)
nocache=skip_init,
)
if skip_init:
logger.info(
f'Rebuilding existing od_sandbox image [{target_image_name}] to update the source code.'
)
for log in build_logs:
if 'stream' in log:
print(log['stream'].strip())
elif 'error' in log:
logger.error(log['error'].strip())
else:
logger.info(str(log))
logger.info(f'Image {target_image_hash_name} build finished.')
image = docker_client.images.get(target_image_hash_name)
image.tag(target_repo, target_image_tag)
logger.info(
f'Tagged image {target_image_hash_name} --> {target_image_name}'
)
# check if the image is built successfully
image = docker_client.images.get(target_image_hash_name)
if image is None:
raise RuntimeError(
f'Build failed: Image {target_image_hash_name} / {target_image_name} not found'
)
logger.info(
f'Image {target_image_name} (hash: {target_image_hash_name}) built successfully'
build_logs = docker_client.api.build(
path=docker_folder,
tag=target_image_hash_name,
rm=True,
decode=True,
)
return target_image_hash_name
except docker.errors.BuildError as e:
logger.error(f'Sandbox image build failed: {e}')
raise e
for log in build_logs:
if 'stream' in log:
print(log['stream'].strip())
elif 'error' in log:
logger.error(log['error'].strip())
else:
logger.info(str(log))
def get_new_image_name(base_image: str, dev_mode: bool = False) -> str:
if dev_mode:
if 'od_runtime' not in base_image:
raise ValueError(
f'Base image {base_image} must be a valid od_runtime image to be used for dev mode.'
)
# remove the 'od_runtime' prefix from the base_image
return base_image.replace('od_runtime', 'od_runtime_dev')
elif 'od_runtime' in base_image:
# if the base image is a valid od_runtime image, we will use it as is
logger.info(f'Using existing od_runtime image [{base_image}]')
return base_image
# 2. Re-tag the image with a more generic tag (as somewhat of "latest" tag)
logger.info(f'Image [{target_image_hash_name}] build finished.')
image = docker_client.images.get(target_image_hash_name)
image.tag(target_image_repo, target_image_tag)
logger.info(
f'Re-tagged image [{target_image_hash_name}] with more generic tag [{target_image_tag}]'
)
# check if the image is built successfully
image = docker_client.images.get(target_image_hash_name)
if image is None:
raise RuntimeError(
f'Build failed: Image [{target_image_repo}:{target_image_hash_tag}] not found'
)
logger.info(
f'Image [{target_image_repo}:{target_image_hash_tag}] (generic tag: [{target_image_tag}]) built successfully'
)
return target_image_hash_name
def get_runtime_image_repo_and_tag(base_image: str) -> tuple[str, str]:
if RUNTIME_IMAGE_REPO in base_image:
logger.info(
f'The provided image [{base_image}] is already a valid od_runtime image.\n'
f'Will try to reuse it as is.'
)
if ':' not in base_image:
base_image = base_image + ':latest'
repo, tag = base_image.split(':')
return repo, tag
else:
prefix = 'od_runtime'
if ':' not in base_image:
base_image = base_image + ':latest'
[repo, tag] = base_image.split(':')
repo = repo.replace('/', '___')
od_version = _get_package_version()
return f'{prefix}:od_v{od_version}_image_{repo}_tag_{tag}'
return RUNTIME_IMAGE_REPO, f'od_v{od_version}_image_{repo}_tag_{tag}'
def _check_image_exists(image_name: str, docker_client: docker.DockerClient) -> bool:
@@ -253,95 +236,123 @@ def _check_image_exists(image_name: str, docker_client: docker.DockerClient) ->
def build_runtime_image(
base_image: str,
docker_client: docker.DockerClient,
update_source_code: bool = False,
save_to_local_store: bool = False, # New parameter to control saving to local store
extra_deps: str
| None = None, # whether to install extra dependencies inside the image
extra_deps: str | None = None,
docker_build_folder: str | None = None,
dry_run: bool = False,
) -> str:
"""Build the runtime image for the OpenDevin runtime.
This is only used for **eventstream runtime**.
See https://docs.all-hands.dev/modules/usage/runtime for more details.
"""
new_image_name = get_new_image_name(base_image)
if base_image == new_image_name:
runtime_image_repo, runtime_image_tag = get_runtime_image_repo_and_tag(base_image)
# Calculate the hash for the docker build folder (source code and Dockerfile)
with tempfile.TemporaryDirectory() as temp_dir:
from_scratch_hash = prep_docker_build_folder(
temp_dir,
base_image=base_image,
skip_init=False,
extra_deps=extra_deps,
)
# hash image name, if the hash matches, it means the image is already
# built from scratch with the *exact SAME source code* on the exact Dockerfile
hash_runtime_image_name = f'{runtime_image_repo}:{from_scratch_hash}'
# non-hash generic image name, it could contain *similar* dependencies
# but *might* not exactly match the state of the source code.
# It resembles the "latest" tag in the docker image naming convention for
# a particular {repo}:{tag} pair (e.g., ubuntu:latest -> od_runtime:ubuntu_tag_latest)
# we will build from IT to save time if the `from_scratch_hash` is not found
generic_runtime_image_name = f'{runtime_image_repo}:{runtime_image_tag}'
# 1. If the image exists with the same hash, we will reuse it as is
if _check_image_exists(hash_runtime_image_name, docker_client):
logger.info(
f'Using existing od_runtime image [{base_image}]. Will NOT build a new image.'
f'Image [{hash_runtime_image_name}] exists with a matching hash for the Docker build folder.\n'
'Will reuse it as is.'
)
else:
logger.info(f'New image name: {new_image_name}')
return hash_runtime_image_name
# Ensure new_image_name contains a colon
if ':' not in new_image_name:
raise ValueError(
f'Invalid image name: {new_image_name}. Expected format "repository:tag".'
# 2. If the exact hash is not found, we will FIRST try to re-build it
# by leveraging the non-hash `generic_runtime_image_name` to save some time
# from re-building the dependencies (e.g., poetry install, apt install)
elif _check_image_exists(generic_runtime_image_name, docker_client):
logger.info(
f'Cannot find matched hash for image [{hash_runtime_image_name}]\n'
f'Will try to re-build it from latest [{generic_runtime_image_name}] image to potentially save '
f'time on dependency installation.\n'
)
# Detect if the sandbox image is built
image_exists = _check_image_exists(new_image_name, docker_client)
if image_exists:
logger.info(f'Image {new_image_name} exists')
cur_docker_build_folder = docker_build_folder or tempfile.mkdtemp()
_skip_init_hash = prep_docker_build_folder(
cur_docker_build_folder,
# we want to use the existing generic image as base
# so that we can leverage existing dependencies already installed in the image
base_image=generic_runtime_image_name,
skip_init=True, # skip init since we are re-using the existing image
extra_deps=extra_deps,
)
assert (
_skip_init_hash != from_scratch_hash
), f'The skip_init hash [{_skip_init_hash}] should not match the existing hash [{from_scratch_hash}]'
if not dry_run:
_build_sandbox_image(
docker_folder=cur_docker_build_folder,
docker_client=docker_client,
target_image_repo=runtime_image_repo,
# NOTE: WE ALWAYS use the "from_scratch_hash" tag for the target image
# otherwise, even if the source code is exactly the same, the image *might* be re-built
# because the same source code will generate a different hash when skip_init=True/False
# since the Dockerfile is slightly different
target_image_hash_tag=from_scratch_hash,
target_image_tag=runtime_image_tag,
)
else:
logger.info(
f'Dry run: Skipping image build for [{generic_runtime_image_name}]'
)
if docker_build_folder is None:
shutil.rmtree(cur_docker_build_folder)
# 3. If the image is not found AND we cannot re-use the non-hash latest relevant image,
# we will build it completely from scratch
else:
logger.info(f'Image {new_image_name} does not exist')
cur_docker_build_folder = docker_build_folder or tempfile.mkdtemp()
_new_from_scratch_hash = prep_docker_build_folder(
cur_docker_build_folder,
base_image,
skip_init=False,
extra_deps=extra_deps,
)
assert (
_new_from_scratch_hash == from_scratch_hash
), f'The new from scratch hash [{_new_from_scratch_hash}] does not match the existing hash [{from_scratch_hash}]'
skip_init = False
if image_exists and not update_source_code:
# If (1) Image exists & we are not updating the source code, we can reuse the existing production image
logger.info('No image build done (not updating source code)')
return new_image_name
if not dry_run:
_build_sandbox_image(
docker_folder=cur_docker_build_folder,
docker_client=docker_client,
target_image_repo=runtime_image_repo,
# NOTE: WE ALWAYS use the "from_scratch_hash" tag for the target image
target_image_hash_tag=from_scratch_hash,
target_image_tag=runtime_image_tag,
)
else:
logger.info(
f'Dry run: Skipping image build for [{generic_runtime_image_name}]'
)
elif image_exists and update_source_code:
# If (2) Image exists & we plan to update the source code (in dev mode), we need to rebuild the image
# and give it a special name
# e.g., od_runtime:ubuntu_tag_latest -> od_runtime_dev:ubuntu_tag_latest
logger.info('Image exists, but updating source code requested')
base_image = new_image_name
new_image_name = get_new_image_name(base_image, dev_mode=True)
if docker_build_folder is None:
shutil.rmtree(cur_docker_build_folder)
skip_init = True # since we only need to update the source code
else:
# If (3) Image does not exist, we need to build it from scratch
# e.g., ubuntu:latest -> od_runtime:ubuntu_tag_latest
# This snippet would allow to load from archive:
# tar_path = f'{new_image_name.replace(":", "_")}.tar'
# if os.path.exists(tar_path):
# logger.info(f'Loading image from {tar_path}')
# load_command = ['docker', 'load', '-i', tar_path]
# subprocess.run(load_command, check=True)
# logger.info(f'Image {new_image_name} loaded from {tar_path}')
# return new_image_name
skip_init = False
if not skip_init:
logger.info(f'Building image [{new_image_name}] from scratch')
new_image_name = _build_sandbox_image(
base_image,
new_image_name,
docker_client,
skip_init=skip_init,
extra_deps=extra_deps,
)
# Only for development: allow to save image as archive:
if not image_exists and save_to_local_store:
tar_path = f'{new_image_name.replace(":", "_")}.tar'
save_command = ['docker', 'save', '-o', tar_path, new_image_name]
subprocess.run(save_command, check=True)
logger.info(f'Image saved to {tar_path}')
load_command = ['docker', 'load', '-i', tar_path]
subprocess.run(load_command, check=True)
logger.info(f'Image {new_image_name} loaded back into Docker from {tar_path}')
return new_image_name
return f'{runtime_image_repo}:{from_scratch_hash}'
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--base_image', type=str, default='ubuntu:22.04')
parser.add_argument('--update_source_code', action='store_true')
parser.add_argument('--save_to_local_store', action='store_true')
parser.add_argument('--build_folder', type=str, default=None)
args = parser.parse_args()
@@ -353,31 +364,40 @@ if __name__ == '__main__':
logger.info(
f'Will prepare a build folder by copying the source code and generating the Dockerfile: {build_folder}'
)
new_image_path = get_new_image_name(args.base_image)
dir_hash = prep_docker_build_folder(
build_folder, args.base_image, skip_init=args.update_source_code
runtime_image_repo, runtime_image_tag = get_runtime_image_repo_and_tag(
args.base_image
)
new_image_name, new_image_tag = new_image_path.split(':')
with tempfile.TemporaryDirectory() as temp_dir:
runtime_image_hash_name = build_runtime_image(
args.base_image,
docker_client=docker.from_env(),
docker_build_folder=temp_dir,
dry_run=True,
)
_runtime_image_repo, runtime_image_hash_tag = runtime_image_hash_name.split(
':'
)
# Move contents of temp_dir to build_folder
shutil.copytree(temp_dir, build_folder, dirs_exist_ok=True)
logger.info(
f'Build folder [{build_folder}] is ready: {os.listdir(build_folder)}'
)
with open(os.path.join(build_folder, 'config.sh'), 'a') as file:
file.write(
(
f'\n'
f'DOCKER_IMAGE={new_image_name}\n'
f'DOCKER_IMAGE_TAG={new_image_tag}\n'
f'DOCKER_IMAGE_HASH_TAG={dir_hash}\n'
f'DOCKER_IMAGE={runtime_image_repo}\n'
f'DOCKER_IMAGE_TAG={runtime_image_tag}\n'
f'DOCKER_IMAGE_HASH_TAG={runtime_image_hash_tag}\n'
)
)
logger.info(
f'`config.sh` is updated with the new image name [{new_image_name}] and tag [{new_image_tag}]'
f'`config.sh` is updated with the new image name [{runtime_image_repo}] and tag [{runtime_image_tag}, {runtime_image_hash_tag}]'
)
logger.info(f'Dockerfile and source distribution are ready in {build_folder}')
else:
logger.info('Building image in a temporary folder')
client = docker.from_env()
image_name = build_runtime_image(
args.base_image,
client,
update_source_code=args.update_source_code,
save_to_local_store=args.save_to_local_store,
)
image_name = build_runtime_image(args.base_image, client)
print(f'\nBUILT Image: {image_name}\n')

View File

@@ -1,23 +1,23 @@
import os
import tempfile
from importlib.metadata import version
from unittest.mock import ANY, MagicMock, patch
from unittest.mock import ANY, MagicMock, call, patch
import pytest
import toml
from pytest import TempPathFactory
from opendevin.runtime.utils.runtime_build import (
RUNTIME_IMAGE_REPO,
_generate_dockerfile,
_get_package_version,
_put_source_code_to_dir,
build_runtime_image,
get_new_image_name,
get_runtime_image_repo_and_tag,
prep_docker_build_folder,
)
OD_VERSION = f'od_v{_get_package_version()}'
RUNTIME_IMAGE_PREFIX = 'od_runtime'
@pytest.fixture
@@ -170,69 +170,40 @@ def test_generate_dockerfile_skip_init():
)
def test_get_new_image_name_eventstream():
def test_get_runtime_image_repo_and_tag_eventstream():
base_image = 'debian:11'
new_image_name = get_new_image_name(base_image)
assert new_image_name == f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_debian_tag_11'
img_repo, img_tag = get_runtime_image_repo_and_tag(base_image)
assert (
img_repo == f'{RUNTIME_IMAGE_REPO}'
and img_tag == f'{OD_VERSION}_image_debian_tag_11'
)
base_image = 'ubuntu:22.04'
new_image_name = get_new_image_name(base_image)
img_repo, img_tag = get_runtime_image_repo_and_tag(base_image)
assert (
new_image_name == f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_ubuntu_tag_22.04'
img_repo == f'{RUNTIME_IMAGE_REPO}'
and img_tag == f'{OD_VERSION}_image_ubuntu_tag_22.04'
)
base_image = 'ubuntu'
new_image_name = get_new_image_name(base_image)
img_repo, img_tag = get_runtime_image_repo_and_tag(base_image)
assert (
new_image_name == f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_ubuntu_tag_latest'
img_repo == f'{RUNTIME_IMAGE_REPO}'
and img_tag == f'{OD_VERSION}_image_ubuntu_tag_latest'
)
def test_get_new_image_name_eventstream_dev_mode():
base_image = f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_debian_tag_11'
new_image_name = get_new_image_name(base_image, dev_mode=True)
assert (
new_image_name == f'{RUNTIME_IMAGE_PREFIX}_dev:{OD_VERSION}_image_debian_tag_11'
)
base_image = f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_ubuntu_tag_22.04'
new_image_name = get_new_image_name(base_image, dev_mode=True)
assert (
new_image_name
== f'{RUNTIME_IMAGE_PREFIX}_dev:{OD_VERSION}_image_ubuntu_tag_22.04'
)
base_image = f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_ubuntu_tag_latest'
new_image_name = get_new_image_name(base_image, dev_mode=True)
assert (
new_image_name
== f'{RUNTIME_IMAGE_PREFIX}_dev:{OD_VERSION}_image_ubuntu_tag_latest'
)
def test_get_new_image_name_eventstream_dev_invalid_base_image():
with pytest.raises(ValueError):
base_image = 'debian:11'
get_new_image_name(base_image, dev_mode=True)
with pytest.raises(ValueError):
base_image = 'ubuntu:22.04'
get_new_image_name(base_image, dev_mode=True)
with pytest.raises(ValueError):
base_image = 'ubuntu:latest'
get_new_image_name(base_image, dev_mode=True)
@patch('opendevin.runtime.utils.runtime_build.docker.DockerClient')
def test_build_runtime_image_from_scratch(mock_docker_client, temp_dir):
base_image = 'debian:11'
mock_docker_client.images.list.return_value = []
# for image.tag(target_repo, target_image_tag)
mock_image = MagicMock()
mock_docker_client.images.get.return_value = mock_image
dir_hash = prep_docker_build_folder(
from_scratch_hash = prep_docker_build_folder(
temp_dir,
base_image,
skip_init=False,
@@ -242,67 +213,82 @@ def test_build_runtime_image_from_scratch(mock_docker_client, temp_dir):
# The build call should be called with the hash tag
mock_docker_client.api.build.assert_called_once_with(
path=ANY,
tag=f'{RUNTIME_IMAGE_PREFIX}:{dir_hash}',
rm=True,
decode=True,
nocache=False,
path=ANY, tag=f'{RUNTIME_IMAGE_REPO}:{from_scratch_hash}', rm=True, decode=True
)
# Then the hash tag should be tagged to the version
mock_image.tag.assert_called_once_with(
f'{RUNTIME_IMAGE_PREFIX}', f'{OD_VERSION}_image_debian_tag_11'
f'{RUNTIME_IMAGE_REPO}', f'{OD_VERSION}_image_debian_tag_11'
)
assert image_name == f'{RUNTIME_IMAGE_PREFIX}:{dir_hash}'
assert image_name == f'{RUNTIME_IMAGE_REPO}:{from_scratch_hash}'
@patch('opendevin.runtime.utils.runtime_build.docker.DockerClient')
def test_build_runtime_image_exist_no_update_source(mock_docker_client):
def test_build_runtime_image_exact_hash_exist(mock_docker_client, temp_dir):
base_image = 'debian:11'
from_scratch_hash = prep_docker_build_folder(
temp_dir,
base_image,
skip_init=False,
)
mock_docker_client.images.list.return_value = [
MagicMock(tags=[f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_debian_tag_11'])
MagicMock(tags=[f'{RUNTIME_IMAGE_REPO}:{from_scratch_hash}'])
]
image_name = build_runtime_image(base_image, mock_docker_client)
assert image_name == f'{RUNTIME_IMAGE_PREFIX}:{OD_VERSION}_image_debian_tag_11'
assert image_name == f'{RUNTIME_IMAGE_REPO}:{from_scratch_hash}'
mock_docker_client.api.build.assert_not_called()
@patch('opendevin.runtime.utils.runtime_build._build_sandbox_image')
@patch('opendevin.runtime.utils.runtime_build.docker.DockerClient')
def test_build_runtime_image_exist_with_update_source(mock_docker_client, temp_dir):
def test_build_runtime_image_exact_hash_not_exist(
mock_docker_client, mock_build_sandbox_image, temp_dir
):
base_image = 'debian:11'
expected_new_image_tag = f'{OD_VERSION}_image_debian_tag_11'
od_runtime_base_image = f'{RUNTIME_IMAGE_PREFIX}:{expected_new_image_tag}'
repo, latest_image_tag = get_runtime_image_repo_and_tag(base_image)
latest_image_name = f'{repo}:{latest_image_tag}'
mock_docker_client.images.list.return_value = [
MagicMock(tags=[od_runtime_base_image])
]
# for image.tag(target_repo, target_image_tag)
mock_image = MagicMock()
mock_docker_client.images.get.return_value = mock_image
# call the function to get the dir_hash to calculate the new image name
dir_hash = prep_docker_build_folder(
from_scratch_hash = prep_docker_build_folder(
temp_dir,
od_runtime_base_image,
skip_init=True,
base_image,
skip_init=False,
)
with tempfile.TemporaryDirectory() as temp_dir_2:
non_from_scratch_hash = prep_docker_build_folder(
temp_dir_2,
base_image,
skip_init=True,
)
# actual call to build the image
image_name = build_runtime_image(
base_image, mock_docker_client, update_source_code=True
)
# latest image exists BUT not the exact hash
mock_docker_client.images.list.return_value = [MagicMock(tags=[latest_image_name])]
# check the build call
mock_docker_client.api.build.assert_called_once_with(
path=ANY,
tag=f'{RUNTIME_IMAGE_PREFIX}_dev:{dir_hash}',
rm=True,
decode=True,
nocache=True,
)
# Then check the hash tag should be tagged to expected image tag
mock_image.tag.assert_called_once_with(
f'{RUNTIME_IMAGE_PREFIX}_dev', expected_new_image_tag
)
assert image_name == f'{RUNTIME_IMAGE_PREFIX}_dev:{dir_hash}'
with patch(
'opendevin.runtime.utils.runtime_build.prep_docker_build_folder'
) as mock_prep_docker_build_folder:
mock_prep_docker_build_folder.side_effect = [
from_scratch_hash,
non_from_scratch_hash,
]
image_name = build_runtime_image(base_image, mock_docker_client)
mock_prep_docker_build_folder.assert_has_calls(
[
call(ANY, base_image=base_image, skip_init=False, extra_deps=None),
call(
ANY, base_image=latest_image_name, skip_init=True, extra_deps=None
),
]
)
mock_build_sandbox_image.assert_called_once_with(
docker_folder=ANY,
docker_client=mock_docker_client,
target_image_repo=repo,
target_image_hash_tag=from_scratch_hash,
target_image_tag=latest_image_tag,
)
assert image_name == f'{repo}:{from_scratch_hash}'