41 Commits

Author SHA1 Message Date
John Eismeier
967e9e1891
Propose fix some typos and ignore emacs backup files (#11701)
Signed-off-by: John E <jeis4wpi@outlook.com>
2025-11-11 09:20:42 -05:00
Xingyao Wang
ca424ec15d
[agent] Add LLM risk analyzer (#9349)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: llamantino <213239228+llamantino@users.noreply.github.com>
Co-authored-by: mamoodi <mamoodiha@gmail.com>
Co-authored-by: Tim O'Farrell <tofarr@gmail.com>
Co-authored-by: Hiep Le <69354317+hieptl@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ryan H. Tran <descience.thh10@gmail.com>
Co-authored-by: Neeraj Panwar <49247372+npneeraj@users.noreply.github.com>
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
Co-authored-by: Insop <1240382+insop@users.noreply.github.com>
Co-authored-by: test <test@test.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Zhonghao Jiang <zhonghao.J@outlook.com>
Co-authored-by: Ray Myers <ray.myers@gmail.com>
2025-08-22 14:02:36 +00:00
Xingyao Wang
4507a25b85
Evaluation: redirect sessions to repo-local .eval_sessions via helper; apply across entrypoints; add tests (#10540)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-22 13:34:02 +00:00
Graham Neubig
426350224b
Add Playwright-based end-to-end testing workflow (#10116)
Co-authored-by: openhands <openhands@all-hands.dev>
2025-08-14 18:59:06 +00:00
ManOwnFire
9e72b69cf8
fix (cli): issue 9386 - show settings.json path in /settings (#9481) 2025-07-10 14:59:06 +00:00
tofarr
3c977bd715
Fix for nested mount volumes (#8888) 2025-06-04 09:30:57 -06:00
Engel Nyst
3c51600260
Add vscode rules/ignores to .gitignore (#8755) 2025-05-28 15:42:11 +02:00
Kent Johnson
35d2281717
feat: Add dev container (#8589) 2025-05-26 21:35:27 -04:00
Nan Jiang
463d4e9a46
eval: add commit0 benchmark (#5153)
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2024-11-22 19:49:45 +00:00
Graham Neubig
a753babb7a
Integrate OpenHands resolver into main repository (#4964)
Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: Rohit Malhotra <rohitvinodmalhotra@gmail.com>
2024-11-14 09:45:46 -05:00
Ziru "Ron" Chen
db4e1dbbec
[eval] Add ScienceAgentBench. (#4645)
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
2024-11-01 02:30:55 +08:00
tobitege
6471d0f94d
.gitignore: ignore all node_modules folders (#4491) 2024-10-20 09:17:45 +08:00
sp.wack
bfdd7fd620
feat(frontend): UI overhaul (#3604) 2024-10-07 23:15:38 +04:00
Xingyao Wang
47774e60b0
chore: remove deprecated dockerfile (#4079) 2024-09-27 15:03:23 +00:00
tobitege
c32cec7f89
(enh) send status messages to UI during startup (#3771)
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Robert Brennan <contact@rbren.io>
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2024-09-24 18:46:58 +00:00
Xingyao Wang
8f0f764a85
fix: CI docker image push (#3476)
* fix ghcr app

* fix ghcr runtime push

* rename od_runtime to runtime
2024-08-19 20:53:28 +00:00
Xingyao Wang
31b244f95e
[Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230)
* move multi-line bash tests to test_runtime;
support multi-line bash for esruntime;

* add testcase to handle PS2 prompt

* use bashlex for bash parsing to handle multi-line commands;
add testcases for multi-line commands

* revert ghcr runtime change

* Apply stash

* fix run as other user;
make test async;

* fix test runtime for run as od

* add run-as-devin to all the runtime tests

* handle the case when username is root

* move all run-as-devin tests from sandbox;
only tests a few cases on different user to save time;

* move over multi-line echo related tests to test_runtime

* fix user-specific jupyter by fixing the pypoetry virtualenv folder

* make plugin's init async;
chdir at initialization of jupyter plugin;
move ipy simple testcase to test runtime;

* support agentskills import in
move tests for jupyter pwd tests;
overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter;
make agentskills read env var lazily, in case env var is updated;

* fix ServerRuntime agentskills issue

* move agnostic image test to test_runtime

* merge runtime tests in CI

* fix enable auto lint as env var

* update warning message

* update warning message

* test for different container images

* change parsing output as debug

* add exception handling for update_pwd_decorator

* fix unit test indentation

* add plugins as default input to Runtime class;
remove init_sandbox_plugins;
implement add_env_var (include jupyter) in the base class;

* fix server runtime auto lint

* Revert "add exception handling for update_pwd_decorator"

This reverts commit 2b668b1506e02145cb8f87e321aad62febca3d50.

* tries to print debugging info for agentskills

* explicitly setting uid (try fix permission issue)

* Revert "tries to print debugging info for agentskills"

This reverts commit 8be4c86756f0e3fc62957b327ba2ac4999c419de.

* set sandbox user id during testing to hopefully fix the permission issue

* add browser tools for server runtime

* try to debug for old pwd

* update debug cmd

* only test agnostic runtime when TEST_RUNTIME is Server

* fix temp dir mkdir

* load TEST_RUNTIME at the beginning

* remove ipython tests

* only log to file when DEBUG

* default logging to project root

* temporarily remove log to file

* fix LLM logger dir

* fix logger

* make set pwd an optional aux action

* fix prev pwd

* fix infinity recursion

* simplify

* do not import the whole od library to avoid logger folder by jupyter

* fix browsing

* increase timeout

* attempt to fix agentskills yet again

* clean up in testcases, since CI maybe run as non-root

* add _cause attribute for event.id

* remove parent

* add a bunch of debugging statement again for CI :(

* fix temp_dir fixture

* change all temp dir to follow pytest's tmp_path_factory

* remove extra bracket

* clean up error printing a bit

* jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization

* jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization

* add typing for tmp dir fixture

* clear the directory before running the test to avoid weird CI temp dir

* remove agnostic test case for server runtime

* Revert "remove agnostic test case for server runtime"

This reverts commit 30e2181c3fc1410e69596c2dcd06be01f1d016b3.

* disable agnostic tests in CI

* fix test

* make sure plugin arg is not passed when no plugin is specified;
remove redundant on_event function;

* move mock prompt

* rename runtime

* remove extra logging

* refactor run_controller's interface;
support multiple runtime for integration test;
filter out hostname for prompt

* uncomment other tests

* pass the right runtime to controller

* log runtime when start

* uncomment tests

* improve symbol filters

* add integration test prompts that seemed ok

* add integration test workflow

* add python3 to default ubuntu image

* symlink python and fix permission to jupyter pip

* add retry for jupyter execute server

* fix jupyter pip install;
add post-process for jupyter pip install;
simplify init by add agent_skills path to PYTHONPATH;
add testcase to tests jupyter pip install;

* fix bug

* use ubuntu:22.04 for eventstream integration tests

* add todo

* update testcase

* remove redundant code

* fix unit test

* reduce dependency for runtime

* try making llama-index an optional dependency that's not installed by default

* remove pip install since it seemed not needed

* log ipython execution;
await write message since it returns a future

* update ipy testcase

* do not install llama-index in CI

* do not install llama-index in the app docker as well

* set sandbox container image in the integration test script

* log plugins & env var for runtime

* update conftest for sha256

* add git

* remove all non-alphanumeric characters

* add working ipy module tests!

* default to use host network

* remove is_async from browser to make thing a little more reliable;
retry loading browser when error;

* add sleep to wait a bit for http server

* kill http server before regenerate browsing tests

* fix browsing

* only set sandbox container image if undefined

* skip empty config value

* update evaluation to use the latest run_controller

* revert logger in execute_server to be compatible with server runtime

* revert logging level to fix jupyter

* set logger level

* revert the logging

* chmod for workspace to fix permission

* support getting timeout from action

* update test for server runtime

* try to fix file permission

* fix test_cmd_run_action_serialization_deserialization test (added timeout)

* poetry: pip 24.2, torch 2.2.2

* revert adding pip to pyproject.toml

* add build to dependencies in pyproject.toml

* forgot poetry lock --no-update

* fix a DelegatorAgent prompt_002.log (timeout)

* fix a DelegatorAgent prompt_003.log (timeout)

* couple more timeout attribs in prompt files

* some more prompt files

* prompts galore

* add clarification comment for timeout

* default timeout to config

* add assert

* update integration tests for eventstream

* update integration tests

* fix timeout for action<->dict

* remove redundant on_event

* default to use instance image

* update run_controller interface

* add logging for copy

* refactor swe_bench for the new design

* fix action execution timeout

* update lock

* remove build sandbox locally

* fix runtime

* use plain for-loop for single process

* remove extra print

* get swebench inference working

* print whole `test_result` dict

* got swebench patch post-process working

* update swe-bench evaluation readme

* refactor using shared reset_logger function

* move messy swebench prompt to a different file

* support the ability to specify whether to keep prompt

* support the ability to specify whether to keep prompt

* fix dockerfile

* fix import and remove unnecessary strip logic

* fix action serialization

* get agentbench running

* remove extra ls for agent bench

* fix agentbench metric

* factor out common documentation for eval

* update biocoder doc

* remove swe_env_box since it is no longer needed

* get biocoder working

* add func timeout for bird

* fix jupyter pwd with ~ as user name

* fix jupyter pwd with ~ as user name

* get bird working

* get browsing evaluation working

* make eda runnable

* fix id column

* fix eda run_infer

* unify eval output using a structured format;
make swebench compatible with that format;
update client source code for every swebench run;
do not inject testcmd for swebench

* standardize existing benchs for the new eval output

* set update source code = true

* get gaia standardized

* fix gaia

* gorilla refactored but stuck at language.so to test

* refactor and make gpqa work

* refactor humanevalfix and get it working

* refactor logic reasoning and get it working

* refactor browser env so it works with eventstream runtime for eval

* add initial version of miniwob refactor

* fix browsergym environment

* get miniwob working!!

* allowing injecting additional dependency to OD runtime docker image

* allowing injecting additional dependency to OD runtime docker image

* support logic reasoning with pre-injected dependency

* get mint working

* update runtime build

* fix mint docker

* add test for keep_prompt;
add missing await close for some tests

* update integration tests for eventstream runtime

* fix integration tests for server runtime

* refactor ml bench and toolqa

* refactor webarena

* fix default factory

* Update run_infer.py

* add APIError to retry

* increase timeout for swebench

* make sure to hide api key when dump eval output

* update the behavior of put source code to put files instead of tarball

* add dirhash to dependency

* sendintr when timeout

* fix dockerfile copy

* reduce timeout

* use dirhash to avoid repeat building for update source

* fix runtime_build testcase

* add dir_hash to docker build pipeline

* revert api error

* update poetry lock

* add retries for swebench run infer

* fix git patch

* update poetry lock

* adjust config order

* fix mount volumes

* enforce all eval to use "instance_id"

* remove file store from runtime

* make file_store public inside eventstream

* move the runtime logic inside `main` out

* support using async function for process_instance_fn

* refactor run_infer with the create_time

* fix file store

* Update evaluation/toolqa/utils.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* fix typo

---------

Co-authored-by: tobitege <tobitege@gmx.de>
Co-authored-by: super-dainiu <78588128+super-dainiu@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-08-06 17:21:45 +00:00
Xingyao Wang
405c8a0456
[Arch] Add runtime image build CI & clean up runtime build using jinja2 template (#3055)
* test_runtime_client.py to test _execute_bash()

* runtime_build and runtime tweaks

* fix in docker script

* revert bash changes

* use sandbox_config.update_source_code to control source code update

* add od_version to the sandbox tag

* add doc instruction for update source code

* do not remove whole poetry folder;
add mamba clean

* add missing newlines

* cleanup runtime dockerfile into jinja template

* make prep temp file a separate function;
make that function accessible through cli

* modify `runtime_build.py` so it can generate directory for building docker img

* add dockerfile and sdist of runtime to gitignore since it will be dynamically generated

* add runtime to build

* do not rebuild new image when an `od_runtime` is provided

* use default container_image for testing if possible

* move runtime tests to ghcr runtime workflow

* update docker base dir for runtime

* fix unittest

* fix image name

* fix image name for test case

* rename to make it consistent

---------

Co-authored-by: tobitege <tobitege@gmx.de>
2024-07-24 21:56:12 +08:00
Xingyao Wang
ce8a11a62f
[Arch] Shrink runtime image size (#3051)
* test_runtime_client.py to test _execute_bash()

* runtime_build and runtime tweaks

* fix in docker script

* revert bash changes

* use sandbox_config.update_source_code to control source code update

* add od_version to the sandbox tag

* add doc instruction for update source code

* do not remove whole poetry folder;
add mamba clean

* add missing newlines

---------

Co-authored-by: tobitege <tobitege@gmx.de>
2024-07-22 02:34:45 +08:00
Xingyao Wang
6a0ffc5c61
[Evaluation] Use the latest official SWE-Bench Dockerization for evaluation (#2728)
* add newline after patch to fix patch apply

* new swebench wip

* add newline after patch to fix patch apply

* only add newline if not empty

* update swebench source and update

* update gitignore for swebench eval

* update old prep_eval

* update gitignore

* add scripts for push and pull swebench images

* update eval_infer.sh

* update eval_infer for new docker workflow

* update script to create markdown report based on report.json

* update eval infer to use update output

* update readme

* only move result to folder if running whole file

* remove set-x

* update conversion script

* Update evaluation/swe_bench/README.md

* Update evaluation/swe_bench/README.md

* Update evaluation/swe_bench/README.md

* make sure last line end with newline

* switch to an fix attempt branch of swebench

* Update evaluation/swe_bench/README.md

* Update evaluation/swe_bench/README.md

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-07-01 23:58:30 +00:00
Shimada666
26fc3c886a
Make plugins sandbox-agnostic (#2101)
* tmp

* tmp

* merge main

* feat: auto build image cache

* remove plugins

* use config file

* update mamba setup shell

* support agnostic sandbox image autobuild

* remove config

* Update .gitignore

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>

* Update opendevin/runtime/docker/ssh_box.py

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>

* update setup.sh

* readd sudo

* add sudo in dockerfile

* remove export

* move od-runtime dependencies to sandbox dockerfile

* factor out re-build logic into a separate util file

* tweak existing plugin to use OD specific sandbox

* update testcase

* attempt to fix unit test using image built in ghcr

* use cache tag

* try to fix unit tests

* add unittest

* add unittest

* add some unittests

* revert gh workflow changes

* feat: optimize sandbox image naming rule

* add pull latest image hint

* add opendevin python hint and use mamba to install gcc

* update docker image naming rule and fix mamba issue

* Update opendevin/runtime/docker/ssh_box.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* fix: opendevin user use correct pip

* fix lint issue

* fix custom sandbox base image

* rename test name

* add skipif

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: tobitege <tobitege@gmx.de>
2024-06-19 19:58:07 -07:00
tobitege
b431fce938
tests: more Agentskills tests; updated .gitignore (#2307)
* added tests related to backticks

* updated .gitignore

* added extra linter test for #2210

* hotfix for integration test

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-06-07 16:29:03 +00:00
Frank Xu
48151bdbb0
[feat] WebArena benchmark, MiniWoB++ benchmark and related arch changes (#2170)
* add webarena, and revamp messaging for webarena eval

* add changes for browsergym

* update infer script

* fix unit tests

* update

* add multiple run for miniwob

* update instruction, remove personal path

* update

* add code for getting final reward, fix integration, add results

* add avg cost calculation
2024-06-06 09:01:20 +08:00
Xingyao Wang
2c0a2dbc61
fix yet another swe_bench issue (#2069) 2024-05-26 10:01:43 -07:00
Engel Nyst
46352e890b
Logging security (#1943)
* update .gitignore

* Rename the confusing 'INFO' style to 'DETAIL'

* override str and repr

* feat: api_key desensitize

* feat: add SensitiveDataFilter in file handler

* tweak regex, add tests

* more tweaks, include other attrs

* add env vars, those with equivalent config

* fix tests

* tests are invaluable

---------

Co-authored-by: Shimada666 <649940882@qq.com>
2024-05-22 18:27:38 +02:00
Xingyao Wang
2406b901df
feat(SWE-Bench environment) integrate SWE-Bench sandbox (#1468)
* add draft dockerfile for build all

* add rsync for build

* add all-in-one docker

* update prepare scripts

* Update swe_env_box.py

* Add swe_entry.sh (buggy now)

* Parse the test command in swe_entry.sh

* Update README for instance eval in sandbox

* revert specialized config

* replace run_as_devin as an init arg

* set container & run_as_root via args

* update swe entry script

* update env

* remove mounting

* allow error after swe_entry

* update swe_env_box

* move file

* update gitignore

* get swe_env_box a working demo

* support faking user response & provide sandbox ahead of time;
also return state for controller

* tweak main to support adding controller kwargs

* add module

* initialize plugin for provided sandbox

* add pip cache to plugin & fix jupyter kernel waiting

* better print Observation output

* add run infer scripts

* update readme

* add utility for getting diff patch

* use get_diff_patch in infer

* update readme

* support cost tracking for codeact

* add swe agent edit hack

* disable color in git diff

* fix git diff cmd

* fix state return

* support limit eval

* increase timeout and export pip cache

* add eval limit config

* return state when hit turn limit

* save log to file; allow agent to give up

* run eval with max 50 turns

* add outputs to gitignore

* save swe_instance & instruction

* add uuid to swebench

* add streamlit dep

* fix save series

* fix the issue where session id might be duplicated

* allow setting temperature for llm (use 0 for eval)

* Get report from agent running log

* support evaluating task success right after inference.

* remove extra log

* comment out prompt for baseline

* add visualizer for eval

* use plaintext for instruction

* reduce timeout for all; only increase timeout for init

* reduce timeout for all; only increase timeout for init

* ignore sid for swe env

* close sandbox in each eval loop

* update visualizer instruction

* increase max chars

* add finish action to history too

* show test result in metrics

* add sidebars for visualizer

* also visualize swe_instance

* cleanup browser when agent controller finish running

* do not mount workspace for swe-eval to avoid accidentally overwrite files

* Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files"

This reverts commit 8ef77390543e562e6f0a5a9992418014d8b3010c.

* Revert "Revert "do not mount workspace for swe-eval to avoid accidentally overwrite files""

This reverts commit 016cfbb9f0475f32bacbad5822996b4eaff24a5e.

* run jupyter command via copy to, instead of cp to mount

* only print mixin output when failed

* change ssh box logging

* add visualizer for pass rate

* add instance id to sandbox name

* only remove container we created

* use opendevin logger in main

* support multi-processing infer

* add back metadata, support keyboard interrupt

* remove container with startswith

* make pbar behave correctly

* update instruction w/ multi-processing

* show resolved rate by repo

* rename tmp dir name

* attempt to fix racing for copy to ssh_box

* fix script

* bump swe-bench-all version

* fix ipython with self-contained commands

* add jupyter demo to swe_env_box

* make resolved count two column

* increase height

* do not add glob to url params

* analyze obs length

* print instance id prior to removal handler

* add gold patch in visualizer

* fix interactive git by adding a git --no-pager as alias

* increase max_char to 10k to cover 98% of swe-bench obs cases

* allow parsing note

* prompt v2

* add iteration reminder

* adjust user response

* adjust order

* fix return eval

* fix typo

* add reminder before logging

* remove other resolve rate

* re adjust to new folder structure

* support adding eval note

* fix eval note path

* make sure first log of each instance is printed

* add eval note

* fix the display for visualizer

* tweak visualizer for better git patch reading

* exclude empty patch

* add retry mechanism for swe_env_box start

* fix ssh timeout issue

* add stat field for apply test patch success

* add visualization for fine-grained report

* attempt to support monologue agent by constraining it to single thread

* also log error msg when stopped

* save error as well

* override WORKSPACE_MOUNT_PATH and WORKSPACE_BASE for monologue to work in mp

* add retry mechanism for sshbox

* remove retry for swe env box

* try to handle loop state stopped

* Add get report scripts

* Add script to convert agent output to swe-bench format

* Merge fine grained report for visualizer

* Update eval readme

* Update README.md

* Add CodeAct gpt4-1106 output and eval logs on swe-bench-lite

* Update the script to get model report

* Update get_model_report.sh

* Update get_agent_report.sh

* Update report merge script

* Add agent output conversion script

* Update swe_lite_env_setup.sh

* Add example swe-bench output files

* Update eval readme

* Remove redundant scripts

* set iteration count down to false by default

* fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model exception out of litellm (#1666)

* fix: Issue where CodeAct agent was trying to log cost on local llm and throwing Undefined Model exception out of litellm

* Review Feedback

* Missing None Check

* Review feedback and improved error handling

---------

Co-authored-by: Robert Brennan <accounts@rbren.io>

* fix prepare_swe_util scripts

* update builder images

* update setup script

* remove swe-bench build workflow

* update lock

* remove experiments since they are moved to hf

* remove visualizer (since it is moved to hf repo)

* simplify jupyter execution via heredoc

* update ssh_box

* add initial docker readme

* add pkg-config as dependency

* add script for swe_bench all-in-one docker

* add rsync to builder

* rename var

* update commit

* update readme

* update lock

* support specify timeout for long running tasks

* fix path

* separate building of all deps and files

* support returning states at the end of controller

* remove return None

* support specify timeout for long running tasks

* add timeout for all existing sandbox impl

* fix swe_env_box for new codebase

* update llm config in config.py

* support pass sandbox in

* remove force set

* update eval script

* fix issue of overriding final state

* change default eval output to hf demo

* change default eval output to hf demo

* fix config

* only close it when it is NOT external sandbox

* add scripts

* tweak config

* only put in history when state has history attr

* fix agent controller on the case of run out interaction budget

* always assume state is always not none

* remove print of final state

* catch all exception when cannot compute completion cost

* Update README.md

* save source into json

* fix path

* update docker path

* return the final state on close

* merge AgentState with State

* fix integration test

* merge AgentState with State

* fix integration test

* add ChangeAgentStateAction to history in attempt to fix integration

* add back set agent state

* update tests

* update tests

* move scripts for setup

* update script and readme for infer

* do not reset logger when n processes == 1

* update eval_infer scripts and readme

* simplify readme

* copy over dir after eval

* copy over dir after eval

* directly return get state

* update lock

* fix output saving of infer

* replace print with logger

* update eval_infer script

* add back the missing .close

* increase timeout

* copy all swe_bench_format file

* attempt to fix output parsing

* log git commit id as metadata

* fix eval script

* update lock

* update unit tests

* fix argparser unit test

* fix lock

* the deps are now lightweight enough to be included in make build

* add spaces for tests

* add eval outputs to gitignore

* remove git submodule

* readme

* tweak git email

* update upload instruction

* bump codeact version for eval

---------

Co-authored-by: Bowen Li <libowen.ne@gmail.com>
Co-authored-by: huybery <huybery@gmail.com>
Co-authored-by: Bart Shappee <bshappee@gmail.com>
Co-authored-by: Robert Brennan <accounts@rbren.io>
2024-05-15 16:15:55 +00:00
Robert Brennan
dcb5d1ce0a
Add permanent storage option for EventStream (#1697)
* add storage classes

* add minio

* add event stream storage

* storage test working

* use fixture

* event stream test passing

* better serialization

* factor out serialization pkg

* move more serialization

* fix tests

* fix test

* remove __all__

* add rehydration test

* add more rehydration test

* fix fixture

* fix dict init

* update tests

* lock

* regenerate tests

* Update opendevin/events/stream.py

* revert tests

* revert old integration tests

* only add fields if present

* regen tests

* pin pyarrow

* fix unit tests

* remove cause from memories

* revert tests

* regen tests
2024-05-14 11:09:45 -04:00
மனோஜ்குமார் பழனிச்சாமி
73693ba416
Mentioned LLM logs directory (#1587)
* Update bug_template.yml

* Pythonized

* updated configs type

* updated opendevin_logger

* fixed bool config

* fixed bool config
2024-05-09 13:31:14 -04:00
Robert Brennan
242c4a0df6
Remove extra message actions (#1608)
* remove extra actions

* remove message observations

* support null obs

* handle null obs

* fix frontend for changes

* fix the way messages flow to the UI

* change think to message

* add regen script

* regenerate all integration tests

* change task

* remove gh test

* fix messages

* fix tests

* help agent exit after hitting max iter

* Update opendevin/events/observation/success.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* Update agenthub/codeact_agent/codeact_agent.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-07 21:13:08 +00:00
Boxuan Li
e7b5ddfe06
Add integration test framework with mock llm (#1301)
* Add integration test framework with mock llm

* Fix MonologueAgent and PlannerAgent tests

* Remove adhoc logging

* Use existing logs

* Fix SWEAgent and PlannerAgent

* Check-in test log files

* conftest: look up under test name folder only

* Add docstring to conftest

* Finish dev doc

* Avoid non-determinism

* Remove dependency on llm embedding model

* Init embedding model only for MonologueAgent

* Add adhoc fix for sandbox discrepancy

* Test ssh and exec sandboxes

* CI: fix missing sandbox type

* conftest: Remove hack

* Reword comment for TODO
2024-04-25 10:56:53 -04:00
Leo
adbcfefd8c
feat: websocket connection management and sandbox bound to session. (#559)
* feat: websocket connection management and sandbox bound to session.

* fix: set default value to id

* feat: add session management.

* fix for mypy

* fix for mypy

* fix the pnpm-lock.

* fix the default model is empty will throw error.
2024-04-05 12:19:52 -05:00
Anas DORBANI
5ec0e5b7ec
Switch to Poetry (#378)
* create the pyproject file

* Fix the pyproject.toml file

* Update Makefile

* adapt makefile

* fix some execution issues

* Untrack lock files and wait for the backend to get start before frontend

* Remove LangChain dependencies

* Add github action for pytest

* add missing dependency

* rebase and fix the versions adding lock file

* add torch and pymupdfb deps

* some conflicts fixes

* Add dependencies evaluation group

* add poetry.lock

* Fix unexpected operator

---------

Co-authored-by: Robert Brennan <contact@rbren.io>
2024-04-05 00:27:29 +00:00
xcodebuild
d64383a520
fix: let make run output both backend and frontend (#576)
* fix: let make run output both backend and frontend

* fix: delete pipe on run
2024-04-02 20:54:16 +08:00
Alex Bäuerle
79237210f2
build(add-files-created-for-other-dev-envs-to-gitignore): Add files such as requirements.txt, .python-version, bun.lockb, and yarn.lock so that if anybody uses these systems, they don't accidentally push the files (#519) 2024-04-01 23:21:45 -04:00
Jim Su
b1b96df8a8
Replace environment variables with configuration file (#339)
* Replace environment variables with configuration file

* Add config.toml to .gitignore

* Remove unused os imports

* Update README.md

* Update README.md

* Update README.md

* Fix merge conflict

* Fallback to environment variables

* Use template file for config.toml

* Update config.toml.template

* Update config.toml.template

---------

Co-authored-by: Robert Brennan <accounts@rbren.io>
2024-03-29 15:26:20 -04:00
Robert Brennan
9bc1890d33
add debug dir for prompts (#205)
* add debug dir for prompts

* add indent to dumps

* only wrap completion in debug mode

* fix mypy
2024-03-27 12:40:08 -04:00
Xingyao Wang
5ff96111f0
A starting point for SWE-Bench Evaluation with docker (#60)
* a starting point for SWE-Bench evaluation with docker

* fix the swe-bench uid issue

* typo fixed

* fix conda missing issue

* move files based on new PR

* Update doc and gitignore using devin prediction file from #81

* fix typo

* add a sentence

* fix typo in path

* fix path

---------

Co-authored-by: Binyuan Hui <binyuan.hby@alibaba-inc.com>
2024-03-22 12:43:49 +08:00
Robert Brennan
b84463f512
Refactor agent interface a bit (#74)
* start moving files

* initial refactor

* factor out command management

* fix command runner

* add workspace to gitignore

* factor out command manager

* remove dupe add_event

* update docs

* fix init

* fix langchain agent after merge
2024-03-21 23:35:28 +08:00
Xingyao Wang
2de75d4782
Minimal Docker Sandbox with GPT-3.5 Execution Example (#48)
* minimal docker sandbox

* make container_image as an argument (fall back to ubuntu);
increase timeout to avoid return too early for long running commands;

* add a minimal working (imperfect) example

* fix typo

* change default container name

* attempt to fix "Bad file descriptor" error

* handle ctrl+D

* add Python gitignore

* push sandbox to shared dockerhub for ease of use

* move codeact example into research folder

* add README for opendevin

* change container image name to opendevin dockerhub

* move folder; change example to a more general agent

* update Message and Role

* update docker sandbox to support mounting folder and switch to user with correct permission

* make network as host

* handle errors when attrs are not set yet

* convert codeact agent into a compatible agent

* add workspace to gitignore

* make sure the agent interface adjustment works for langchain_agent
2024-03-21 21:54:56 +08:00
Binyuan Hui
a94f3d81cb
fix: merge multiple .gitignore to unify management (#61) 2024-03-20 21:35:51 +08:00
Xingyao Wang
dcff11cd2f
add Python gitignore (#59) 2024-03-20 16:17:16 +08:00