131 Commits

Author SHA1 Message Date
Xingyao Wang
31b244f95e
[Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230)
* move multi-line bash tests to test_runtime;
support multi-line bash for esruntime;

* add testcase to handle PS2 prompt

* use bashlex for bash parsing to handle multi-line commands;
add testcases for multi-line commands

* revert ghcr runtime change

* Apply stash

* fix run as other user;
make test async;

* fix test runtime for run as od

* add run-as-devin to all the runtime tests

* handle the case when username is root

* move all run-as-devin tests from sandbox;
only tests a few cases on different user to save time;

* move over multi-line echo related tests to test_runtime

* fix user-specific jupyter by fixing the pypoetry virtualenv folder

* make plugin's init async;
chdir at initialization of jupyter plugin;
move ipy simple testcase to test runtime;

* support agentskills import in
move tests for jupyter pwd tests;
overload `add_env_vars` for EventStreamRuntime to update env var also in Jupyter;
make agentskills read env var lazily, in case env var is updated;

* fix ServerRuntime agentskills issue

* move agnostic image test to test_runtime

* merge runtime tests in CI

* fix enable auto lint as env var

* update warning message

* update warning message

* test for different container images

* change parsing output as debug

* add exception handling for update_pwd_decorator

* fix unit test indentation

* add plugins as default input to Runtime class;
remove init_sandbox_plugins;
implement add_env_var (include jupyter) in the base class;

* fix server runtime auto lint

* Revert "add exception handling for update_pwd_decorator"

This reverts commit 2b668b1506e02145cb8f87e321aad62febca3d50.

* tries to print debugging info for agentskills

* explictly setting uid (try fix permission issue)

* Revert "tries to print debugging info for agentskills"

This reverts commit 8be4c86756f0e3fc62957b327ba2ac4999c419de.

* set sandbox user id during testing to hopefully fix the permission issue

* add browser tools for server runtime

* try to debug for old pwd

* update debug cmd

* only test agnostic runtime when TEST_RUNTIME is Server

* fix temp dir mkdir

* load TEST_RUNTIME at the beginning

* remove ipython tests

* only log to file when DEBUG

* default logging to project root

* temporarily remove log to file

* fix LLM logger dir

* fix logger

* make set pwd an optional aux action

* fix prev pwd

* fix infinity recursion

* simplify

* do not import the whole od library to avoid logger folder by jupyter

* fix browsing

* increase timeout

* attempt to fix agentskills yet again

* clean up in testcases, since CI maybe run as non-root

* add _cause attribute for event.id

* remove parent

* add a bunch of debugging statement again for CI :(

* fix temp_dir fixture

* change all temp dir to follow pytest's tmp_path_factory

* remove extra bracket

* clean up error printing a bit

* jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization

* jupyter chdir to self.config.workspace_mount_path_in_sandbox on initialization

* add typing for tmp dir fixture

* clear the directory before running the test to avoid weird CI temp dir

* remove agnostic test case for server runtime

* Revert "remove agnostic test case for server runtime"

This reverts commit 30e2181c3fc1410e69596c2dcd06be01f1d016b3.

* disable agnostic tests in CI

* fix test

* make sure plugin arg is not passed when no plugin is specified;
remove redundant on_event function;

* move mock prompt

* rename runtime

* remove extra logging

* refactor run_controller's interface;
support multiple runtime for integration test;
filter out hostname for prompt

* uncomment other tests

* pass the right runtime to controller

* log runtime when start

* uncomment tests

* improve symbol filters

* add intergration test prompts that seemd ok

* add integration test workflow

* add python3 to default ubuntu image

* symlink python and fix permission to jupyter pip

* add retry for jupyter execute server

* fix jupyter pip install;
add post-process for jupyter pip install;
simplify init by add agent_skills path to PYTHONPATH;
add testcase to tests jupyter pip install;

* fix bug

* use ubuntu:22.04 for eventstream integration tests

* add todo

* update testcase

* remove redundant code

* fix unit test

* reduce dependency for runtime

* try making llama-index an optional dependency that's not installed by default

* remove pip install since it seemd not needed

* log ipython execution;
await write message since it returns a future

* update ipy testcase

* do not install llama-index in CI

* do not install llama-index in the app docker as well

* set sandbox container image in the integration test script

* log plugins & env var for runtime

* update conftest for sha256

* add git

* remove all non-alphanumeric chalracters

* add working ipy module tests!

* default to use host network

* remove is_async from browser to make thing a little more reliable;
retry loading browser when error;

* add sleep to wait a bit for http server

* kill http server before regenerate browsing tests

* fix browsing

* only set sandbox container image if undefined

* skip empty config value

* update evaluation to use the latest run_controller

* revert logger in execute_server to be compatible with server runtime

* revert logging level to fix jupyter

* set logger level

* revert the logging

* chmod for workspace to fix permission

* support getting timeout from action

* update test for server runtime

* try to fix file permission

* fix test_cmd_run_action_serialization_deserialization test (added timeout)

* poetry: pip 24.2, torch 2.2.2

* revert adding pip to pyproject.toml

* add build to dependencies in pyproject.toml

* forgot poetry lock --no-update

* fix a DelegatorAgent prompt_002.log (timeout)

* fix a DelegatorAgent prompt_003.log (timeout)

* couple more timeout attribs in prompt files

* some more prompt files

* prompts galore

* add clarification comment for timeout

* default timeout to config

* add assert

* update integraton tests for eventstream

* update integration tests

* fix timeout for action<->dict

* remove redundant on_event

* default to use instance image

* update run_controller interface

* add logging for copy

* refactor swe_bench for the new design

* fix action execution timeout

* updatelock

* remove build sandbox locally

* fix runtime

* use plain for-loop for single process

* remove extra print

* get swebench inference working

* print whole `test_result` dict

* got swebench patch post-process working

* update swe-bench evaluation readme

* refactor using shared reset_logger function

* move messy swebench prompt to a different file

* support the ability to specify whether to keep prompt

* support the ability to specify whether to keep prompt

* fix dockerfile

* fix import and remove unnecessary strip logic

* fix action serialization

* get agentbench running

* remove extra ls for agent bench

* fix agentbench metric

* factor out common documentation for eval

* update biocoder doc

* remove swe_env_box since it is no longer needed

* get biocoder working

* add func timeout for bird

* fix jupyter pwd with ~ as user name

* fix jupyter pwd with ~ as user name

* get bird working

* get browsing evaluation working

* make eda runnable

* fix id column

* fix eda run_infer

* unify eval output using a structured format;
make swebench coompatible with that format;
update client source code for every swebench run;
do not inject testcmd for swebench

* standardize existing benchs for the new eval output

* set update source code = true

* get gaia standardized

* fix gaia

* gorilla refactored but stuck at language.so to test

* refactor and make gpqa work

* refactor humanevalfix and get it working

* refactor logic reasoning and get it working

* refactor browser env so it works with eventstream runtime for eval

* add initial version of miniwob refactor

* fix browsergym environment

* get miniwob working!!

* allowing injecting additional dependency to OD runtime docker image

* allowing injecting additional dependency to OD runtime docker image

* support logic reasoning with pre-injected dependency

* get mint working

* update runtime build

* fix mint docker

* add test for keep_prompt;
add missing await close for some tests

* update integration tests for eventstream runtime

* fix integration tests for server runtime

* refactor ml bench and toolqa

* refactor webarena

* fix default factory

* Update run_infer.py

* add APIError to retry

* increase timeout for swebench

* make sure to hide api key when dump eval output

* update the behavior of put source code to put files instead of tarball

* add dishash to dependency

* sendintr when timeout

* fix dockerfile copy

* reduce timeout

* use dirhash to avoid repeat building for update source

* fix runtime_build testcase

* add dir_hash to docker build pipeline

* revert api error

* update poetry lock

* add retries for swebench run infer

* fix git patch

* update poetry lock

* adjust config order

* fix mount volumns

* enforce all eval to use "instance_id"

* remove file store from runtime

* make file_store public inside eventstream

* move the runtime logic inside `main` out

* support using async function for process_instance_fn

* refactor run_infer with the create_time

* fix file store

* Update evaluation/toolqa/utils.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* fix typo

---------

Co-authored-by: tobitege <tobitege@gmx.de>
Co-authored-by: super-dainiu <78588128+super-dainiu@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-08-06 17:21:45 +00:00
Kaushik Deka
415843476c
Feat: Add Vision Input Support for LLM with Vision Capabilities (#2848)
* add image feature

* fix-linting

* check model support for images

* add comment

* Add image support to other models

* Add images to chat

* fix linting

* fix test issues

* refactor variable names and import

* fix tests

* fix chat message tests

* fix linting

* add pydantic class message

* use message

* remove redundant comments

* remove redundant comments

* change Message class

* remove unintended change

* fix integration tests using regenerate.sh

* rename image_bas64 to images_url, fix tests

* rename Message.py to message, change reminder append logic, add unit tests

* remove comment, fix error to merge

* codeact_swe_agent

* fix f string

* update eventstream integration tests

* add missing if check in codeact_swe_agent

* update integration tests

* Update frontend/src/components/chat/ChatInput.tsx

* Update frontend/src/components/chat/ChatInput.tsx

* Update frontend/src/components/chat/ChatInput.tsx

* Update frontend/src/components/chat/ChatInput.tsx

* Update frontend/src/components/chat/ChatMessage.tsx

---------

Co-authored-by: tobitege <tobitege@gmx.de>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: sp.wack <83104063+amanape@users.noreply.github.com>
2024-08-04 02:26:22 +08:00
Engel Nyst
9ed95abf83
Fix max budget per task error in headless mode (#3147)
* set agent in ERROR instead of PAUSED when in headless mode

* fallback to config value for budget
2024-07-27 17:35:40 +00:00
Engel Nyst
a29c795418
clear last_error when restoring a session (#3146) 2024-07-27 15:34:36 +00:00
Engel Nyst
f07280153a
restore logging of user messages when using cli (#3145) 2024-07-27 12:58:23 +00:00
Graham Neubig
275ea706cf
Remove remaining global config (#3099)
* Remove global config from memory

* Remove runtime global config

* Remove from storage

* Remove global config

* Fix event stream tests

* Fix sandbox issue

* Change config

* Removed transferred tests

* Add swe env box

* Fixes on testing

* Fixed some tests

* Fix typing

* Fix ipython test

* Revive function

* Make temp_dir fixture

* Remove test to avoid circular import
2024-07-26 18:43:32 +00:00
Xingyao Wang
41a8bb3cf1
[eval,fix]: metrics get carried across eval instances (#3072)
* fix: make max_budget_per_task optional in `run_agent_controller`

* update arg for each run infer

* fix: metrics logging carried along; reset llm metric with the agent;

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-07-23 03:30:28 +00:00
Graham Neubig
4099e48122
Removed config from agent controller (#3038)
* Removed config from agent controller

* Fix tests

* Increase budget

* Update tests

* Update prompts

* Add missing prompt

* Fix mistaken deletions

* Fix browsing test

* Fixed browse tests
2024-07-22 17:42:57 +00:00
Boxuan Li
be6e6e3add
Bug fix: Metrics not accumulated across agent delegation (#3012)
* Add test to reproduce cost miscalculation bug

* Fix metrics bug

* Copy metrics upon AgentRejectAction
2024-07-20 04:05:05 +00:00
Xingyao Wang
6b16a5da0b
[Eval,Arch] Update GPTQ eval and add headless_mode for Controller (#2994)
* update and polish gptq eval

* fix typo

* Update evaluation/gpqa/README.md

Co-authored-by: Graham Neubig <neubig@gmail.com>

* Update evaluation/gpqa/run_infer.py

Co-authored-by: Graham Neubig <neubig@gmail.com>

* add headless mode to all appropriate agent controller call

* delegate set to error when in headless mode

* try to deduplicate a bit

* make headless_mode default to True and only change it to false for AgentSession

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-07-20 03:35:48 +00:00
Graham Neubig
dada004fac
Remove config from files (#3039) 2024-07-19 23:20:44 -04:00
Boxuan Li
9d41314d1a
State: Add local_iteration attribute (#2990)
* Add local_iteration state attribute

* Fix typos

---------

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-07-18 14:49:19 +00:00
Graham Neubig
c897791024
Refactor LLM config (#2953)
* Add max_message_chars to LLM

* Refactor LLM config

* Fix tests

* Made some functions class functions

* Fix regression

* Fixed comments
2024-07-17 09:16:04 -04:00
Anush Kumar V
8f76587e5c
docs: updated docstrings using ruff's autofix feature (#2923)
* Updated documentation using ruff's autofix feature

* Updated pyproject.toml to include docstring validations

* Updated documentation using ruff's autofix feature

* Updated pyproject.toml to include docstring validations

* Updated docstrings using ruff's autfix feature

* Deleted opendevin/runtime/utils/soource.py, Keeping in sync with main

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-07-16 01:35:33 +00:00
Xingyao Wang
e45ddeb2a2
arch: deprecating recall action and search_memory (#2900)
* deprecating recall action

* fix integration tests

* fix integration tests

* remove search memory
2024-07-12 19:23:21 +00:00
adragos
5f61885e44
feat: Implement user confirmation mode, request confirmation when running bash/python code in this mode (#2774)
* [feat] confirmation mode for bash actions

* feat: Add modal setting for Confirmation Mode

* fix: frontend tests for confirmation mode switch

* fix: add missing CONFIRMATION_MODE value in SettingsModal.test.tsx

* fix: update test to integrate new setting

* feat: Implement user confirmation for running bash/python code

* fix: don't display rejected actions

* fix: linting, rename/refactor based on feedback

* fix: add property only to commands, pass serialization tests

* fix: package-lock.json, lint test_action_serialization.py

* test: add is_confirmed to integration test outputs

---------

Co-authored-by: Mislav Balunovic <mislav.balunovic@gmail.com>
2024-07-11 14:57:21 +03:00
Boxuan Li
c68478f470
Customize LLM config per agent (#2756)
Currently, OpenDevin uses a global singleton LLM config and a global singleton agent config. This PR allows customers to configure an LLM config for each agent. A hypothetically useful scenario is to use a cheaper LLM for repo exploration / code search, and a more powerful LLM to actually do the problem solving (CodeActAgent).

Partially solves #2075 (web GUI improvement is not the goal of this PR)
2024-07-09 22:05:54 -07:00
Engel Nyst
2df1d67007
History clean up (#2849)
* clean up add_history

* refactor last agent message
2024-07-08 05:10:21 +02:00
Engel Nyst
d37b2973b2
Refactoring: event stream based agent history (#2709)
* add to event stream sync

* remove async from tests

* small logging spam fix

* remove swe agent

* arch refactoring: use history from the event stream

* refactor agents

* monologue agent

* ruff

* planner agent

* micro-agents

* refactor history in evaluations

* evals history refactoring

* adapt evals and tests

* unit testing stuck

* testing micro agents, event stream

* fix planner agent

* fix tests

* fix stuck after rename

* fix test

* small clean up

* fix merge

* fix merge issue

* fix integration tests

* Update agenthub/dummy_agent/agent.py

* fix tests

* rename more clearly; add todo; clean up
2024-07-07 21:04:23 +00:00
மனோஜ்குமார் பழனிச்சாமி
34c765688b
Streamline Logging Events (#2532)
* Skip duplicate log

* log user actions

* fix tests

* log all action _step

* refactor log

* revert test

* refactor log

* visual diff

* disable overriding event source

* Revert "disable overriding event source"

This reverts commit b0047cc0cdead215044ad8ce73f5bad47df99e08.

* Refactor logic

* refactored runtime on_event

* fix merge conflict

in Web UI, it shows as red color (seems deletion but added)

* linted

---------

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
2024-07-08 05:34:47 +09:00
மனோஜ்குமார் பழனிச்சாமி
adf1a0d556
Bugfix: add missing f-string for logging debug message in task creation (#2836) 2024-07-07 17:36:19 +02:00
Xingyao Wang
a47713ecb0
[Arch] Remove supports for Background Commands (#2803)
* depracting docker exec box

* remove doc exec from workflow and docs

* remove background commands

* Update tests/unit/test_sandbox.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* replace for-loop with assignment

* fix integration tests

* fix integration tests for shell script

* fix integration tests

* increase max iter to fix some monologue agent issue

* fix integration test again

* fix integration tests (seems related to run_user issue)

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-07-06 03:38:05 +08:00
Engel Nyst
0b8d357bef
Add event synchronously (#2700)
* add to event stream sync

* remove async from tests
2024-07-05 00:15:51 +02:00
Leo
c2f557edde
refactor: multiple code improvements (#2771) 2024-07-04 18:51:22 +08:00
Graham Neubig
ffd3c7144c
Remove global args (#2760)
* Remove global args

* Remove global args

* Update files

* Update main

* Bug fixes

* Fix logging
2024-07-03 20:07:52 +09:00
Leo
5e6fb6131f
refactor: Renamed variables to resolve naming conflicts and eliminate warnings (#2732)
* refactor: Renamed variables to resolve naming conflicts and eliminate warnings

Signed-off-by: ifuryst <ifuryst@gmail.com>

* Fix lint failed.

Signed-off-by: ifuryst <ifuryst@gmail.com>

* Combine set_initial_state methods, rename _filed to f, and adjust the AppConfig update codes.

Signed-off-by: ifuryst <ifuryst@gmail.com>

---------

Signed-off-by: ifuryst <ifuryst@gmail.com>
2024-07-02 15:00:58 +08:00
Boxuan Li
8dae1f9307
Bypass MAX_ITERATIONS and MAX_BUDGET_PER_TASK on web GUI (#2697)
Closes #1493

Introduced TRAFFIC_CONTROL_STATE to allow OpenDevin to switch between normal traffic limiting mode and temporarily disabled mode.
2024-06-30 13:19:45 -07:00
Engel Nyst
2d9bb56763
Add ability to restore the cli session (optional) (#2699)
* add ability to restore the main session

* add quick log

* rename to cli session
2024-06-30 06:56:55 +00:00
Engel Nyst
4b1cc56682
Sync history to stream (#2640)
* add event to stream before budget check

* make the budget check before the step

* Update opendevin/controller/agent_controller.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-06-29 21:19:00 +00:00
Boxuan Li
e45b311c35
Remove MAX_CHARS traffic control (#2694)
* Remove MAX_CHARS limiting

* More cleanup
2024-06-29 12:59:41 -07:00
Boxuan Li
7766a3283e
CodeActAgent: Fix delegate history (#2672) 2024-06-28 16:37:23 +09:00
மனோஜ்குமார் பழனிச்சாமி
af9385322b
Refactor: Simplify message formatting (#2670)
Removed redundant `str()` conversion in f-string.
2024-06-28 07:34:26 +02:00
Engel Nyst
58b06cced7
Revert "Show relevant error in UI (#2516)" (#2657)
This reverts commit d0bdae232f0925ba4340ef10c172f2e3383d3efd.
2024-06-27 08:55:41 +00:00
Boxuan Li
ee86d8d25e
Frontend support for delegation and rejection (#2608)
1. Add support for rejection action on frontend
2. Show users the reason for rejection
3. Get rid of weird empty box after delegation
4. On web GUI, show customer when a delegation starts and ends
2024-06-26 00:30:10 -07:00
Boxuan Li
7e78fde48f
Bug fix: add error observation to history (#2610)
* Bug fix: add error observation to history

* Regenerate to demonstrate format error
2024-06-24 21:24:17 -07:00
Boxuan Li
39d90c0b2a
Track metrics throughout delegation & Polish UX for out of budget error (#2595)
* Track metrics (costs) throught delegation

* Metrics should be shared across agents for better UX

* Update cost before starting delegate
2024-06-23 18:38:52 -07:00
மனோஜ்குமார் பழனிச்சாமி
d0bdae232f
Show relevant error in UI (#2516) 2024-06-19 15:58:48 +05:30
Engel Nyst
b2307db010
Document, rename Agent* exceptions to LLM* (#2508)
* rename "Agent" exceptions to LLM*, document

* LLMResponseError
2024-06-18 22:30:22 +00:00
Engel Nyst
bb4ea1e6cb
Adjust is-stuck check for the same steps to 3 until it's stopped (#2437) 2024-06-14 19:20:12 +05:30
Yufan Song
f7491bd2fa
Refactor response to action in agent step (#2350)
* refactor action parser

* Fix typos

* fix typo

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-06-10 10:17:30 +00:00
Boxuan Li
a9a2f10170
Revamp AgentRejectAction and allow ManagerAgent to handle rejection (#1735)
* Fix AgentRejectAction handling

* Add ManagerAgent to integration tests

* Fix regenerate.sh

* Fix merge

* Update README for micro-agents

* Add test reject to regenerate.sh

* regenerate.sh: Add support for running a specific test and/or agent

* Refine reject schema, and allow ManagerAgent to handle reject

* Add test artifacts for test_simple_task_rejection

* Fix manager agent tests

* Fix README

* test_simple_task_rejection: check final agent state

* Integration test: exit if mock prompt not found

* Update test_simple_task_rejection tests

* Fix test_edits test artifacts after prompt update

* Fix ManagerAgent test_edits

* WIP

* Fix tests

* update test_edits for ManagerAgent

* Skip local sandbox for reject test

* Fix test comparison
2024-06-08 23:12:30 -07:00
Aaron Xia
42c6b506b5
Lazy launching BrowseEnv / making BrowseEnv optional (#2155)
* feat: lazy launching browser; browser optional for diffrent agents.

* style: lint

* fix: integration test fail due to browser not started.

* fix: run by cli and integration test failed.

* fix: lint

* fix: lint

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
2024-05-31 16:40:42 -04:00
Aaron Xia
b1ec8e5dc2
style: Update agent_controller.py to clean log (#2124) 2024-05-29 18:56:11 -07:00
Boxuan Li
9b371b1b5f
Refactor agent delegation and tweak micro agents (#1910)
This PR fixes #1897. In addition, this PR fixes and tweaks a few micro-agents.

For the first time, I am able to use ManagerAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds ManagerAgent as part of integration tests. test_write_simple_script involves delegation to CoderAgent while test_edits involves delegation to TypoFixerAgent.

Also for the first time, I am able to use DelegateAgent to complete test_write_simple_script and test_edits tasks in integration tests, so this PR also adds DelegateAgent as part of integration tests. It involves delegation to StudyRepoForTaskAgent, CoderAgent and VerifierAgent.

This PR is a blocker for #1735 and likely #1945.
2024-05-28 20:01:16 -07:00
Aleksandar
18d07bda89
feat: add max_budget_per_task configuration to control task cost (#2070)
* feat: add max_budget_per_task configuration to control task cost

* Fix test_arg_parser.py

* Use the config.max_budget_per_task as default value

* Add max_budget_per_task to core/main.py as well

* Update opendevin/controller/agent_controller.py

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
2024-05-27 02:04:31 +08:00
Engel Nyst
783fea62a0
Ignore pid for loop detection (Was: override eq...) (#2045)
* rewrite, implement pid ignore in the controller

* make the helper method private
2024-05-26 19:27:12 +02:00
Xingyao Wang
e731048ccf
Improve action and observation logging for the CLI interface (#2035)
* properly log user messages;
format browser action/obs, summarize action, messages properly for logging

* add source to message

* add spaces for printing
2024-05-24 08:21:25 -04:00
Robert Brennan
ea9c785075
fix session state after resuming (#1999)
* fix state resuming

* fix session reconnection

* fix lint
2024-05-23 11:47:36 -04:00
Engel Nyst
0eccf31604
Refactor monologue and SWE agent to use the messages in state history (#1863)
* Refactor monologue to use the messages in state history

* add messages, clean up

* fix monologue

* update integration tests

* move private method

* update SWE agent to use the history from State

* integration tests for SWE agent

* rename monologue to initial_thoughts, since that is what it is
2024-05-23 07:29:12 +00:00
Boxuan Li
acb430eef5
Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated (#1888)
* Add MacOS to integration tests

* Switch back to python 3.11

* Install Docker for macos pipeline

* regenerate.sh: Use environmental variable for sandbox type

* Pack different agents' tests into a single check

* Fix CodeAct tests

* Reduce file match and extensive debug logs

* Add TEST_IN_CI mode that reports codecov

* Small fix: don't quit if reusing old responses failed

* Merge codecov results

* Fix typos

* Remove coverage merge step - codecov automatically does that

* Make mac integration tests as optional - too slow

* Fix codecov args

* Add comments in yaml

* Include sandbox type in codecov report name

* Fix codecov report merge

* Revert renaming of test_matrix_success

* Remove SWEAgent and PlannerAgent from tests

* Mark planner agent and SWE agent as deprecated

* CodeCov: Ignore planner and sweagent

* Revert "Remove SWEAgent and PlannerAgent from tests"

This reverts commit 040cb3bfb9496090d4516f7b45f38376f95de7be.

* Remove all tests for SWE Agent

* Only keep basic tests for MonologueAgent and PlannerAgent

* Mark SWE Agent as deprecated, and ignore code coverage for it

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
2024-05-22 20:38:57 -07:00