tobitege
70dd705418
Fix: apply config arguments for miniwob get_sandbox() from loaded config ( #3198 )
2024-07-31 19:38:15 +00:00
Engel Nyst
93433fa849
pass swe-bench box config parameter ( #3189 )
2024-07-31 15:31:50 +00:00
மனோஜ்குமார் பழனிச்சாமி
563ebd406d
Fix: Add missing arguments for SSHBox in evaluation ( #3075 )
...
* Fix WebArena evaluation script to connect to SSH session
* Update run_infer.py
* Add missing arguments for DockerSSHBox
2024-07-29 23:09:39 +08:00
Xingyao Wang
1c813a2fa0
support swebench pull from custom namespace ( #3136 )
2024-07-26 18:46:36 +00:00
Graham Neubig
275ea706cf
Remove remaining global config ( #3099 )
...
* Remove global config from memory
* Remove runtime global config
* Remove from storage
* Remove global config
* Fix event stream tests
* Fix sandbox issue
* Change config
* Removed transferred tests
* Add swe env box
* Fixes on testing
* Fixed some tests
* Fix typing
* Fix ipython test
* Revive function
* Make temp_dir fixture
* Remove test to avoid circular import
2024-07-26 18:43:32 +00:00
Xingyao Wang
da17665cab
fix: make max_budget_per_task optional in run_agent_controller ( #3071 )
...
* fix: make max_budget_per_task optional in `run_agent_controller`
* update arg for each run infer
2024-07-22 21:47:00 -04:00
Xingyao Wang
a61ac5a214
remove extra arg from swebench ssh box ( #3054 )
2024-07-21 14:58:16 +08:00
Xingyao Wang
6b16a5da0b
[Eval,Arch] Update GPQA eval and add headless_mode for Controller ( #2994 )
...
* update and polish gpqa eval
* fix typo
* Update evaluation/gpqa/README.md
Co-authored-by: Graham Neubig <neubig@gmail.com >
* Update evaluation/gpqa/run_infer.py
Co-authored-by: Graham Neubig <neubig@gmail.com >
* add headless mode to all appropriate agent controller call
* delegate set to error when in headless mode
* try to deduplicate a bit
* make headless_mode default to True and only change it to false for AgentSession
---------
Co-authored-by: Graham Neubig <neubig@gmail.com >
2024-07-20 03:35:48 +00:00
Raj Maheshwari
9cf2b5b74b
[FIX] Update SWEBenchSSHBox after global config was removed from sandbox in #2961 ( #3014 )
...
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu >
2024-07-19 14:41:50 -07:00
Graham Neubig
3a21198424
Remove monologue agent ( #3036 )
...
* Remove monologue agent
* Fixes
2024-07-19 19:25:05 +00:00
jigsawlabs-student
fa6c12473e
#2220 , integrated aider style linting, currently passes related o… ( #2489 )
...
* WIP for integrate aider linter, see OpenDevin#2220
Updated aider linter to:
* Always return text and line numbers
* Moved extract line number more consistently
* Changed pylint to stop after first linter detects errors
Updated agentskills
* To get back a LintResult object and then use lines and text for error message and related line number
* Moved code for extracting line number to aider linter
Tests:
* Added additional unit tests for aider to test for
* Return values from lint failures
* Confirm linter works for non-configured languages like Ruby
* move to agent_skills, fixes not seeing skills error
* format/lint to new code, fix failing tests, remove unused code from aider linter
* small changes (remove litellm, fix readme typo)
* fix failing sandbox test
* keep, change dumping of metadata
* remove duplication of tree-sitter, grep-ast and update poetry.lock
* revert to main branch poetry.lock version
* only update necessary package
* fix jupyter kernel wrong interpreter issue (only for swebench)
* fix failing lint tests
* update syntax error checks for flake
* update poetry lock file
* update poetry.lock file, which updates content-hash
* add grep ast
* remove extra stuff caused by merge
* update pyproject
* remove extra pytest fixture, ruff styling fixes
* lint files
* update poetry.lock file
---------
Co-authored-by: Jeff Katzy <jeffreyerickatz@gmail.com >
Co-authored-by: yufansong <yufan@risingwave-labs.com >
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev >
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu >
Co-authored-by: tobitege <tobitege@gmx.de >
2024-07-19 21:58:54 +08:00
Xingyao Wang
ff6ddc831f
fix: runtime test for mac ( #3005 )
...
* move use_host_network to sandbox config
* fix test runtime tests
* fix kwargs to make it clearer
2024-07-19 03:03:55 +00:00
Xingyao Wang
cf910dfa9d
fix eval api_key leak in metadata; fix llm config in run infer ( #2998 )
2024-07-18 15:46:59 +00:00
Jiayi Pan
7111e8ee14
Support Instance Level Images for SWE-Bench Evaluation ( #2874 )
...
* rename pulled instance images
* Swebench: add support to instance level images
* Update evaluation/swe_bench/run_infer.py
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu >
* instance swebench: use env var and docker tags instead
* swebench disable instance report for instance images
* Update evaluation/swe_bench/README.md
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu >
---------
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu >
2024-07-17 01:31:42 +08:00
Xingyao Wang
f45a2ff04e
[Agent, Eval] Fixes LLM config issue for delegation & Add eval to measure the delegation accuracy ( #2948 )
...
* fix json import
* pass llm to delegation action so that sub-agent shares the same llm for cost accum purpose
* add inference script for browser delegation
* add readme
* Update agenthub/codeact_agent/action_parser.py
Co-authored-by: Graham Neubig <neubig@gmail.com >
* revert action parser changes.
* Rework --llm-config CLI arg
* Revert "pass llm to delegation action so that sub-agent shares the same llm for cost accum purpose"
This reverts commit 81034c486e .
* remove view summary
* update readme
* update comment
* update readme
---------
Co-authored-by: Graham Neubig <neubig@gmail.com >
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
2024-07-16 15:51:29 +00:00
Anush Kumar V
8f76587e5c
docs: updated docstrings using ruff's autofix feature ( #2923 )
...
* Updated documentation using ruff's autofix feature
* Updated pyproject.toml to include docstring validations
* Updated docstrings using ruff's autofix feature
* Deleted opendevin/runtime/utils/soource.py, Keeping in sync with main
---------
Co-authored-by: Graham Neubig <neubig@gmail.com >
2024-07-16 01:35:33 +00:00
Boxuan Li
4b4fa1c390
Remove legacy swe_bench/scripts/summarise_results.py ( #2932 )
...
* Remove swe_bench/scripts/summarise_results.py
* Remove mention of legacy script
2024-07-15 15:03:07 -04:00
Boxuan Li
b834b354e5
Add compare_patch_filename.py ( #2934 )
2024-07-15 23:55:45 +08:00
Yufan Song
959d21c48f
remove useless code ( #2922 )
2024-07-13 15:20:31 -07:00
Boxuan Li
c68478f470
Customize LLM config per agent ( #2756 )
...
Currently, OpenDevin uses a global singleton LLM config and a global singleton agent config. This PR allows customers to configure an LLM config for each agent. A hypothetically useful scenario is to use a cheaper LLM for repo exploration / code search, and a more powerful LLM to actually do the problem solving (CodeActAgent).
Partially solves #2075 (web GUI improvement is not the goal of this PR)
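The per-agent lookup described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the class names `LLMConfig` and `AppConfig`, the `llm_for` helper, the agent name `RepoExplorerAgent`, and the model names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LLMConfig:
    # Hypothetical config object; the real one carries more fields.
    model: str
    temperature: float = 0.0


@dataclass
class AppConfig:
    # Fallback LLM used by any agent without its own entry.
    default_llm: LLMConfig
    # Optional per-agent overrides, keyed by agent name.
    agent_llms: dict[str, LLMConfig] = field(default_factory=dict)

    def llm_for(self, agent_name: str) -> LLMConfig:
        """Return the agent-specific LLM config, or the default one."""
        return self.agent_llms.get(agent_name, self.default_llm)


config = AppConfig(
    default_llm=LLMConfig(model="cheap-model"),
    agent_llms={"CodeActAgent": LLMConfig(model="strong-model")},
)
```

Here an exploration agent with no override falls back to the cheaper default, while `CodeActAgent` gets the stronger model.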
2024-07-09 22:05:54 -07:00
Engel Nyst
2df1d67007
History clean up ( #2849 )
...
* clean up add_history
* refactor last agent message
2024-07-08 05:10:21 +02:00
Engel Nyst
d37b2973b2
Refactoring: event stream based agent history ( #2709 )
...
* add to event stream sync
* remove async from tests
* small logging spam fix
* remove swe agent
* arch refactoring: use history from the event stream
* refactor agents
* monologue agent
* ruff
* planner agent
* micro-agents
* refactor history in evaluations
* evals history refactoring
* adapt evals and tests
* unit testing stuck
* testing micro agents, event stream
* fix planner agent
* fix tests
* fix stuck after rename
* fix test
* small clean up
* fix merge
* fix merge issue
* fix integration tests
* Update agenthub/dummy_agent/agent.py
* fix tests
* rename more clearly; add todo; clean up
2024-07-07 21:04:23 +00:00
Graham Neubig
d0384cafdd
Two fixes to swe bench eval ( #2831 )
...
* Two fixes to swe bench eval
* Add error message
* Change dumping of metadata
2024-07-07 07:21:50 +00:00
Bin Lei
c8e5848add
fix git diff TIMEOUT problem in swe_bench evaluation ( #2828 )
...
* fix git diff TIMEOUT problem in swe_bench evaluation
* Update evaluation/swe_bench/swe_env_box.py
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com >
---------
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com >
2024-07-07 06:30:59 +00:00
Xingyao Wang
f6dc89b41a
[Evaluation] Simplify eval and multi-processing related fixes ( #2810 )
...
* initialize agent inside process_instance_fn;
* remove dependency on `config.max_iterations`
* switch back to only include llm config to metadata
2024-07-06 07:18:46 +08:00
Xingyao Wang
a47713ecb0
[Arch] Remove support for Background Commands ( #2803 )
...
* deprecating docker exec box
* remove doc exec from workflow and docs
* remove background commands
* Update tests/unit/test_sandbox.py
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com >
* replace for-loop with assignment
* fix integration tests
* fix integration tests for shell script
* fix integration tests
* increase max iter to fix some monologue agent issue
* fix integration test again
* fix integration tests (seems related to run_user issue)
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com >
2024-07-06 03:38:05 +08:00
Graham Neubig
a081935fd8
Simplify eval code ( #2775 )
...
* Start simplifying eval code
* Update
* Add EDA
* Updated GAIA
* Update gpqa
* Add humanevalfix
* Fix logic_reasoning
* Add miniwob
* Add mint and ml_bench
* toolqa
* Added swe-bench
* Fixed webarena
* Refactor parameters
2024-07-05 19:33:08 +09:00
மனோஜ்குமார் பழனிச்சாமி
143f38d25a
Refactored sandbox config and added fast boot ( #2455 )
...
* Refactored sandbox config and added fastboot
* added tests
* fixed tests
* fixed tests
* inform user about breaking change
* remove default config from eval
* check for lowercase env
* add test
* Revert Migration
* migrate old sandbox configs
* resolve merge conflict
* revert migration 2
* Revert "remove default config from eval"
This reverts commit de57c588db .
* change type to box_type
* fix var name
* linted
* lint
* lint comments
* fix tests
* fix tests
* fix typo
* fix box_type, remove fast_boot
* add tests for sandbox config
* fix test
* update eval docs
* small removal comments
* adapt toml template
* old fields shouldn't be in the app dataclass
* fix old keys in app config
* clean up exec box
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com >
2024-07-05 03:30:21 +00:00
Xingyao Wang
298956c78a
[Eval] initialize llm inside process_instance to circumvent "AttributeError:… ( #2805 )
...
* initialize llm inside process_instance to circumvent "AttributeError: Can't pickle local object"
* update kwargs
2024-07-05 01:26:03 +00:00
Xingyao Wang
e6cdf18d3b
[Evaluation] Log empty patch stats for SWE-Bench ( #2776 )
...
* bump swebench version since the fix PR is merged
* add empty generation stats from latest pr
* delete eval_outputs if it already exists
* handle non string patch
2024-07-05 07:03:27 +08:00
Graham Neubig
ffd3c7144c
Remove global args ( #2760 )
...
* Remove global args
* Remove global args
* Update files
* Update main
* Bug fixes
* Fix logging
2024-07-03 20:07:52 +09:00
Xingyao Wang
4d0c4f37d6
[Evaluation] fix SWE-Bench docker image name ( #2751 )
...
* fix double underscore
* remove unused script
2024-07-03 04:30:38 +08:00
Xingyao Wang
41ddba84bd
[Agent] (Potentially) improve Editing using diff ( #2685 )
...
* add replace-based block edit & preliminary test case fix
* further fix the insert behavior
* make edit only work on first occurrence
* bump codeact version since we now use new edit agentskills
* update prompt for new agentskills
* update integration tests
* make run_infer.sh executable
* remove code block for edit_file
* update integration test for prompt changes
* default to not use hint for eval
* fix insert emptyfile bug
* throw value error when `to_replace` is empty
* make `_edit_or_insert_file` return string so we can try to fix some linter errors (best attempt)
* add todo
* update integration test
* fix sandbox test for this PR
2024-07-02 11:50:15 +09:00
Xingyao Wang
6a0ffc5c61
[Evaluation] Use the latest official SWE-Bench Dockerization for evaluation ( #2728 )
...
* add newline after patch to fix patch apply
* new swebench wip
* add newline after patch to fix patch apply
* only add newline if not empty
* update swebench source and update
* update gitignore for swebench eval
* update old prep_eval
* update gitignore
* add scripts for push and pull swebench images
* update eval_infer.sh
* update eval_infer for new docker workflow
* update script to create markdown report based on report.json
* update eval infer to use update output
* update readme
* only move result to folder if running whole file
* remove set-x
* update conversion script
* Update evaluation/swe_bench/README.md
* Update evaluation/swe_bench/README.md
* Update evaluation/swe_bench/README.md
* make sure last line ends with newline
* switch to a fix attempt branch of swebench
* Update evaluation/swe_bench/README.md
* Update evaluation/swe_bench/README.md
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com >
2024-07-01 23:58:30 +00:00
Engel Nyst
2d9bb56763
Add ability to restore the cli session (optional) ( #2699 )
...
* add ability to restore the main session
* add quick log
* rename to cli session
2024-06-30 06:56:55 +00:00
Engel Nyst
874b4c9075
CLI concurrency ( #2695 )
...
* add session id in cli, evals
* fix main sid
2024-06-30 04:04:30 +02:00
Xingyao Wang
15e0c524f4
default to not use hint for eval ( #2696 )
2024-06-29 21:27:57 +00:00
Xingyao Wang
e8cb6803df
[Evaluation] Improve patch apply in SWE-Bench ( #2684 )
...
* add newline after patch to fix patch apply
* only add newline if not empty
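The two steps above can be sketched as a small helper. This is an illustration under assumptions, not the code from this PR: the function name `ensure_trailing_newline` is hypothetical.

```python
def ensure_trailing_newline(patch: str) -> str:
    """Append a final newline to a non-empty patch that lacks one.

    An empty patch is returned unchanged, so it stays recognizably empty.
    """
    if patch and not patch.endswith("\n"):
        return patch + "\n"
    return patch
```

Leaving the empty string untouched keeps "no patch generated" distinguishable from a patch that contains only a newline.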
2024-06-29 14:11:07 +08:00
மனோஜ்குமார் பழனிச்சாமி
af9385322b
Refactor: Simplify message formatting ( #2670 )
...
Removed redundant `str()` conversion in f-string.
2024-06-28 07:34:26 +02:00
Jiayi Pan
917d96e06f
Fix doc error in evals ( #2654 )
2024-06-27 16:13:47 +00:00
Xavier Vergés
cd91d45b44
Allow SANDBOX_CONTAINER_IMAGEs built from opendevin/sandbox:main ( #2622 )
2024-06-26 12:05:07 +08:00
Xingyao Wang
6de584d77d
update swe-bench output with eval results ( #2606 )
2024-06-24 08:07:28 +09:00
Graham Neubig
cab7a288ca
Add NUM_WORKERS variable to run_infer.sh scripts for configurable worker settings ( #2597 )
...
* Add NUM_WORKERS variable to run_infer.sh scripts for configurable worker settings
* Update evaluation/webarena/scripts/run_infer.sh
---------
Co-authored-by: OpenDevin <opendevin@all-hands.dev >
2024-06-23 03:43:43 +00:00
மனோஜ்குமார் பழனிச்சாமி
41564c2eac
Use :main instead of :latest ( #2539 )
...
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
2024-06-21 03:57:50 +00:00
Boxuan Li
feabc97aba
Evaluation time travel: build sandbox on the fly ( #2491 )
2024-06-20 20:22:02 -06:00
Xingyao Wang
b569ba710d
docs: Add visualizer instruction for SWE-Bench ( #2529 )
...
* Update README.md for visualizer instruction
* Polish the visualization guidance (#2531 )
* fix conda create error
* fix and polish the readme for visualization
* Update README.md
---------
Co-authored-by: Haofei Yu <haofeiy@cs.cmu.edu >
2024-06-19 20:41:09 +00:00
Xingyao Wang
1f379bebc2
Update README.md ( #2505 )
...
LGTM
2024-06-18 18:14:21 +02:00
Boxuan Li
6f235937cf
Evaluation time travel: allow evaluation on a specific version ( #2356 )
...
* Time travel for evaluation
* Fix source script path
* Exit script if given version doesn't exist
* Exit on failure
* Update README
* Change scripts of all other benchmarks
* Modify README files
* Fix logic_reasoning README
2024-06-16 10:25:14 -04:00
super-dainiu
563bc41fd3
Use LLM to analyze ML-Bench failure cases ( #2399 )
...
* add ml-bench w/o exec env
* fix typos (#1956 )
no functional change
* Refactored Logs (#1939 )
* [Feat] A competitive Web Browsing agent (#1856 )
* initial attempt at a browsing only agent
* add browsing agent
* update
* implement agent
* update
* fix comments
* remove unnecessary things from memory extras
* update image processing
---------
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com >
* Update README.md SWE-bench score (#1959 )
* Update README.md SWE-bench score
Our most recent results on swe-bench lite are 25%, so this updates the README accordingly.
* Update
* fix: llm is_local function logic error (#1961 )
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com >
* doc: update documentation about poetry update (#1962 )
* add doc
* Update Development.md
---------
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
* feat: add metrics related to cost for better observability (#1944 )
* add metrics for total_cost
* make lint
* refact codeact
* change metrics into llm
* add costs list, add into state
* refactor log completion
* refactor and test others
* make lint
* Update opendevin/core/metrics.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
* Update opendevin/llm/llm.py
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu >
* refactor
* add code
---------
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu >
* doc: add more cmd in unit test documentation (#1963 )
* --- (#1975 )
updated-dependencies:
- dependency-name: boto3
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1976 )
updated-dependencies:
- dependency-name: litellm
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Logging security (#1943 )
* update .gitignore
* Rename the confusing 'INFO' style to 'DETAIL'
* override str and repr
* feat: api_key desensitize
* feat: add SensitiveDataFilter in file handler
* tweak regex, add tests
* more tweaks, include other attrs
* add env vars, those with equivalent config
* fix tests
* tests are invaluable
---------
Co-authored-by: Shimada666 <649940882@qq.com >
* --- (#1967 )
updated-dependencies:
- dependency-name: react-dom
dependency-type: direct:production
update-type: version-update:semver-minor
- dependency-name: "@types/react-dom"
dependency-type: direct:development
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1968 )
updated-dependencies:
- dependency-name: "@reduxjs/toolkit"
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1969 )
updated-dependencies:
- dependency-name: husky
dependency-type: direct:development
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1970 )
updated-dependencies:
- dependency-name: tailwind-merge
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1971 )
updated-dependencies:
- dependency-name: i18next
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com >
* Refactor session management (#1810 )
* refactor session mgmt
* defer file handling to runtime
* add todo
* refactor sessions a bit more
* remove messages logic from FE
* fix up socket handshake
* refactor frontend auth a bit
* first pass at redoing file explorer
* implement directory suffix
* fix up file tree
* close agent on websocket close
* remove session saving
* move file refresh
* remove getWorkspace
* plumb path/code differently
* fix build issues
* fix the tests
* fix npm build
* add session rehydration
* fix event serialization
* logspam
* fix user message rehydration
* add get_event fn
* agent state restoration
* change history tracking for codeact
* fix responsiveness of init
* fix lint
* lint
* delint
* fix prop
* update tests
* logspam
* lint
* fix test
* revert codeact
* change fileService to use API
* fix up session loading
* delint
* delint
* fix integration tests
* revert test
* fix up access to options endpoints
* fix initial files load
* delint
* fix file initialization
* fix mock server
* fix lint
* fix auth for html
* Update frontend/src/i18n/translation.json
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu >
* refactor sessions and sockets
* avoid reinitializing the same session
* fix reconnect issue
* change up intro message
* more guards on reinit
* rename agent_session
* delint
* fix a bunch of tests
* delint
* fix last test
* remove code editor context
* fix build
* fix any
* fix dot notation
* Update frontend/src/services/api.ts
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
* fix up error handling
* Update opendevin/server/session/agent.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
* Update opendevin/server/session/agent.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
* Update frontend/src/services/session.ts
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
* fix build errs
* fix else
* add closed state
* delint
* Update opendevin/server/session/session.py
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com >
---------
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu >
Co-authored-by: Graham Neubig <neubig@gmail.com >
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com >
* fix #1960 (#1964 )
* Add ruff for shared mutable defaults (B) (#1938 )
* Add ruff for shared mutable defaults (B)
* Apply B006, B008 on current files, except fast API
* Update agenthub/SWE_agent/prompts.py
Co-authored-by: Graham Neubig <neubig@gmail.com >
* fix unintended behavior change
* this is correct, tell Ruff to leave it alone
---------
Co-authored-by: Graham Neubig <neubig@gmail.com >
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
* Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated (#1888 )
* Add MacOS to integration tests
* Switch back to python 3.11
* Install Docker for macos pipeline
* regenerate.sh: Use environmental variable for sandbox type
* Pack different agents' tests into a single check
* Fix CodeAct tests
* Reduce file match and extensive debug logs
* Add TEST_IN_CI mode that reports codecov
* Small fix: don't quit if reusing old responses failed
* Merge codecov results
* Fix typos
* Remove coverage merge step - codecov automatically does that
* Make mac integration tests as optional - too slow
* Fix codecov args
* Add comments in yaml
* Include sandbox type in codecov report name
* Fix codecov report merge
* Revert renaming of test_matrix_success
* Remove SWEAgent and PlannerAgent from tests
* Mark planner agent and SWE agent as deprecated
* CodeCov: Ignore planner and sweagent
* Revert "Remove SWEAgent and PlannerAgent from tests"
This reverts commit 040cb3bfb9 .
* Remove all tests for SWE Agent
* Only keep basic tests for MonologueAgent and PlannerAgent
* Mark SWE Agent as deprecated, and ignore code coverage for it
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com >
* Fix Repeated Responses in Chat by Adding IPythonRunCellObservation (#1987 )
Co-authored-by: jianghongwei <jianghongwei@58.com >
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com >
* Save CI cycles for backend tests (#1985 )
* Fix typo in prompt (#1992 )
* Refactor monologue and SWE agent to use the messages in state history (#1863 )
* Refactor monologue to use the messages in state history
* add messages, clean up
* fix monologue
* update integration tests
* move private method
* update SWE agent to use the history from State
* integration tests for SWE agent
* rename monologue to initial_thoughts, since that is what it is
* fix: catch the exception raised when the session file does not exist while initializing EventStream (e.g. when creating a new session with no stored session files). (#1994 )
* add ml-bench in readme
* Bump boto3 from 1.34.110 to 1.34.111 (#2001 )
Bumps [boto3](https://github.com/boto/boto3 ) from 1.34.110 to 1.34.111.
- [Release notes](https://github.com/boto/boto3/releases )
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst )
- [Commits](https://github.com/boto/boto3/compare/1.34.110...1.34.111 )
---
updated-dependencies:
- dependency-name: boto3
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump docker from 7.0.0 to 7.1.0 (#2002 )
Bumps [docker](https://github.com/docker/docker-py ) from 7.0.0 to 7.1.0.
- [Release notes](https://github.com/docker/docker-py/releases )
- [Commits](https://github.com/docker/docker-py/compare/7.0.0...7.1.0 )
---
updated-dependencies:
- dependency-name: docker
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump litellm from 1.37.20 to 1.38.0 (#2005 )
Bumps [litellm](https://github.com/BerriAI/litellm ) from 1.37.20 to 1.38.0.
- [Release notes](https://github.com/BerriAI/litellm/releases )
- [Commits](https://github.com/BerriAI/litellm/compare/v1.37.20...v1.38.0 )
---
updated-dependencies:
- dependency-name: litellm
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Fix SWE-Bench evaluation due to setuptools version (#1995 )
* correctly setup plugins for swebench eval
* bump swe-bench version and add logging
* Revert "correctly setup plugins for swebench eval"
This reverts commit 2bd1055673 .
* bump version
* fix session state after resuming (#1999 )
* fix state resuming
* fix session reconnection
* fix lint
* Implement `agentskills` for OpenDevin to hopefully improve editing and include more useful tools/skills (#1941 )
* add draft for skills
* Implement and test agentskills functions: open_file, goto_line, scroll_down, scroll_up, create_file, search_dir, search_file, find_file
* Remove new_sample.txt file
* add some work from opendevin w/ fixes
* Add unit tests for agentskills module
* fix some issues and updated tests
* add more tests for open
* tweak and handle goto_line
* add tests for some edge cases
* add tests for scrolling
* add tests for edit
* add tests for search_dir
* update tests to use pytest
* use pytest --forked to keep file op unit tests from interfering with each other via a global var
* update doc based on swe agent tool
* update and add tests for find_file and search_file
* move agent_skills to plugins
* add agentskills as plugin and docs
* add agentskill to ssh box and fix sandbox integration
* remove extra returns in doc
* add agentskills to initial tool for jupyter
* support re-init jupyter kernel (for agentskills) after restart
* fix print window's issue with indentation and add testcases
* add prompt for codeact with the newest edit primitives
* modify the way line number is presented (remove leading space)
* change prompt to the newest display format
* support tracking of costs via metrics
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update opendevin/runtime/plugins/agent_skills/README.md
* implement and add tests for py linting
* remove extra text arg for incompatible subprocess ver
* remove sample.txt
* update test_edits integration tests
* fix all integration
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
* Update opendevin/runtime/plugins/agent_skills/agentskills.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
* correctly setup plugins for swebench eval
* bump swe-bench version and add logging
* correctly setup plugins for swebench eval
* bump swe-bench version and add logging
* Revert "correctly setup plugins for swebench eval"
This reverts commit 2bd1055673 .
* bump version
* remove _AGENT_SKILLS_DOCS
* move flake8 to test dep
* update poetry.lock
* remove extra arg
* reduce max iter for eval
* update poetry
* fix integration tests
---------
Co-authored-by: OpenDevin <opendevin@opendevin.ai >
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com >
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
* build: Add poetry command to use Python 3.11 for environment setup (#1972 )
* Bump @react-types/shared from 3.23.0 to 3.23.1 in /frontend (#2006 )
Bumps [@react-types/shared](https://github.com/adobe/react-spectrum ) from 3.23.0 to 3.23.1.
- [Release notes](https://github.com/adobe/react-spectrum/releases )
- [Commits](https://github.com/adobe/react-spectrum/compare/@react-types/shared@3.23.0...@react-types/shared@3.23.1 )
---
updated-dependencies:
- dependency-name: "@react-types/shared"
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump @types/react-syntax-highlighter in /frontend (#2007 )
Bumps [@types/react-syntax-highlighter](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-syntax-highlighter ) from 15.5.11 to 15.5.13.
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases )
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-syntax-highlighter )
---
updated-dependencies:
- dependency-name: "@types/react-syntax-highlighter"
dependency-type: direct:development
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump @typescript-eslint/parser from 7.9.0 to 7.10.0 in /frontend (#2008 )
Bumps [@typescript-eslint/parser](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/parser ) from 7.9.0 to 7.10.0.
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases )
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/parser/CHANGELOG.md )
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v7.10.0/packages/parser )
---
updated-dependencies:
- dependency-name: "@typescript-eslint/parser"
dependency-type: direct:development
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump lint-staged from 15.2.2 to 15.2.4 in /frontend (#2009 )
Bumps [lint-staged](https://github.com/okonet/lint-staged ) from 15.2.2 to 15.2.4.
- [Release notes](https://github.com/okonet/lint-staged/releases )
- [Changelog](https://github.com/lint-staged/lint-staged/blob/master/CHANGELOG.md )
- [Commits](https://github.com/okonet/lint-staged/compare/v15.2.2...v15.2.4 )
---
updated-dependencies:
- dependency-name: lint-staged
dependency-type: direct:development
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update README.md
* Update README.md
* add run_infer.sh
* fix input output
* fix docker sandbox
* fix run
* update and clean run_infer.py
* add script to clean up dockers
* update repo uid
* add description
* new
* Update README.md
* use root for sandbox
* update readme
* update ml-bench conda env
* update readme
* update readme
* use try except
* modify raise exception
* add int
* update README
* longer time
* fix existing issues
* fix existing issue
* new docker image
* add metrics of cost
* add result parsing cost
* fix
* fix
* update summarize
* fix
* add analyze
* update readme
* use 4o
* add eval output
---------
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: Ubuntu <ubuntu@ip-172-31-31-157.ec2.internal >
Co-authored-by: RainRat <rainrat78@yahoo.ca >
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com >
Co-authored-by: Frank Xu <frankxu2004@gmail.com >
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com >
Co-authored-by: Graham Neubig <neubig@gmail.com >
Co-authored-by: Shimada666 <649940882@qq.com >
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk >
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com >
Co-authored-by: Robert Brennan <accounts@rbren.io >
Co-authored-by: Rahul Anand <62982824+zeul22@users.noreply.github.com >
Co-authored-by: jiangleo <jiangleo@users.noreply.github.com >
Co-authored-by: jianghongwei <jianghongwei@58.com >
Co-authored-by: Jeremi Joslin <jeremi@newlogic.com >
Co-authored-by: Aaron Xia <zhhuaxia@gmail.com >
Co-authored-by: OpenDevin <opendevin@opendevin.ai >
Co-authored-by: DaxServer <7479937+DaxServer@users.noreply.github.com >
Co-authored-by: Robert <871607149@qq.com >
2024-06-13 09:30:55 +08:00
Xingyao Wang
b3bdc44292
mkdir infer_logs instead of logs ( #2382 )
2024-06-11 07:18:19 +08:00