OpenHands

mirror of https://github.com/OpenHands/OpenHands.git synced 2026-03-22 13:47:19 +08:00

Author	SHA1	Message	Date
tobitege	70dd705418	Fix: apply config arguments for miniwob get_sandbox() from loaded config (#3198 )	2024-07-31 19:38:15 +00:00
Engel Nyst	93433fa849	pass swe-bench box config parameter (#3189 )	2024-07-31 15:31:50 +00:00
மனோஜ்குமார் பழனிச்சாமி	563ebd406d	Fix: Add missing arguments for SSHBox in evaluation (#3075 ) * Fix WebArena evaluation script to connect to SSH session * Update run_infer.py * Add missing arguments for DockerSSHBox	2024-07-29 23:09:39 +08:00
Xingyao Wang	1c813a2fa0	support swebench pull from custom namespace (#3136 )	2024-07-26 18:46:36 +00:00
Graham Neubig	275ea706cf	Remove remaining global config (#3099 ) * Remove global config from memory * Remove runtime global config * Remove from storage * Remove global config * Fix event stream tests * Fix sandbox issue * Change config * Removed transferred tests * Add swe env box * Fixes on testing * Fixed some tests * Fix typing * Fix ipython test * Revive function * Make temp_dir fixture * Remove test to avoid circular import	2024-07-26 18:43:32 +00:00
Xingyao Wang	da17665cab	fix: make max_budget_per_task optional in `run_agent_controller` (#3071 ) * fix: make max_budget_per_task optional in `run_agent_controller` * update arg for each run infer	2024-07-22 21:47:00 -04:00
Xingyao Wang	a61ac5a214	remove extra arg from swebench ssh box (#3054 )	2024-07-21 14:58:16 +08:00
Xingyao Wang	6b16a5da0b	[Eval,Arch] Update GPQA eval and add `headless_mode` for Controller (#2994 ) * update and polish gpqa eval * fix typo * Update evaluation/gpqa/README.md Co-authored-by: Graham Neubig <neubig@gmail.com> * Update evaluation/gpqa/run_infer.py Co-authored-by: Graham Neubig <neubig@gmail.com> * add headless mode to all appropriate agent controller call * delegate set to error when in headless mode * try to deduplicate a bit * make headless_mode default to True and only change it to false for AgentSession --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-07-20 03:35:48 +00:00
Raj Maheshwari	9cf2b5b74b	[FIX] Update SWEBenchSSHBox after global config was removed from sandbox in #2961 (#3014 ) Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-07-19 14:41:50 -07:00
Graham Neubig	3a21198424	Remove monologue agent (#3036 ) * Remove monologue agent * Fixes	2024-07-19 19:25:05 +00:00
jigsawlabs-student	fa6c12473e	#2220 , integrated aider style linting, currently passes related o… (#2489 ) * WIP for integrate aider linter, see OpenDevin#2220 Updated aider linter to: * Always return text and line numbers * Moved extract line number more consistently * Changed pylint to stop after first linter detects errors Updated agentskills * To get back a LintResult object and then use lines and text for error message and related line number * Moved code for extracting line number to aider linter Tests: * Added additional unit tests for aider to test for * Return values from lint failures * Confirm linter works for non-configured languages like Ruby * move to agent_skills, fixes not seeing skills error * format/lint to new code, fix failing tests, remove unused code from aider linter * small changes (remove litellm, fix readme typo) * fix failing sandbox test * keep, change dumping of metadata * WIP for integrate aider linter, see OpenDevin#2220 Updated aider linter to: * Always return text and line numbers * Moved extract line number more consistently * Changed pylint to stop after first linter detects errors Updated agentskills * To get back a LintResult object and then use lines and text for error message and related line number * Moved code for extracting line number to aider linter Tests: * Added additional unit tests for aider to test for * Return values from lint failures * Confirm linter works for non-configured languages like Ruby * move to agent_skills, fixes not seeing skills error * format/lint to new code, fix failing tests, remove unused code from aider linter * remove duplication of tree-sitter, grep-ast and update poetry.lock * revert to main branch poetry.lock version * only update necessary package * fix jupyter kernel wrong interpreter issue (only for swebench) * fix failing lint tests * update syntax error checks for flake * update poetry lock file * update poetry.lock file, which update content-hash * add grep ast * remove extra stuff 
caused by merge * update pyproject * remove extra pytest fixture, ruff styling fixes * lint files * update poetry.lock file --------- Co-authored-by: Jeff Katzy <jeffreyerickatz@gmail.com> Co-authored-by: yufansong <yufan@risingwave-labs.com> Co-authored-by: Xingyao Wang <xingyao@all-hands.dev> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: tobitege <tobitege@gmx.de>	2024-07-19 21:58:54 +08:00
Xingyao Wang	ff6ddc831f	fix: runtime test for mac (#3005 ) * move use_host_network to sandbox config * fix test runtime tests * fix kwargs to make it clearer	2024-07-19 03:03:55 +00:00
Xingyao Wang	cf910dfa9d	fix eval api_key leak in metadata; fix llm config in run infer (#2998 )	2024-07-18 15:46:59 +00:00
Jiayi Pan	7111e8ee14	Support Instance Level Images for SWE-Bench Evaluation (#2874 ) * rename pulled instance images * Swebench: add support to instance level images * Update evaluation/swe_bench/run_infer.py Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * instance swebench: use env var and docker tags instead * swebench disable instance report for instance images * Update evaluation/swe_bench/README.md Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>	2024-07-17 01:31:42 +08:00
Xingyao Wang	f45a2ff04e	[Agent, Eval] Fixes LLM config issue for delegation & Add eval to measure the delegation accuracy (#2948 ) * fix json import * pass llm to delegation action so that sub-agent shares the same llm for cost accum purpose * add inference script for browser delegation * add readme * Update agenthub/codeact_agent/action_parser.py Co-authored-by: Graham Neubig <neubig@gmail.com> * revert action parser changes. * Rework --llm-config CLI arg * Revert "pass llm to delegation action so that sub-agent shares the same llm for cost accum purpose" This reverts commit `81034c486e`. * remove view summary * update readme * update comment * update readme --------- Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-07-16 15:51:29 +00:00
Anush Kumar V	8f76587e5c	docs: updated docstrings using ruff's autofix feature (#2923 ) * Updated documentation using ruff's autofix feature * Updated pyproject.toml to include docstring validations * Updated documentation using ruff's autofix feature * Updated pyproject.toml to include docstring validations * Updated docstrings using ruff's autofix feature * Deleted opendevin/runtime/utils/soource.py, Keeping in sync with main --------- Co-authored-by: Graham Neubig <neubig@gmail.com>	2024-07-16 01:35:33 +00:00
Boxuan Li	4b4fa1c390	Remove legacy swe_bench/scripts/summarise_results.py (#2932 ) * Remove swe_bench/scripts/summarise_results.py * Remove mention of legacy script	2024-07-15 15:03:07 -04:00
Boxuan Li	b834b354e5	Add compare_patch_filename.py (#2934 )	2024-07-15 23:55:45 +08:00
Yufan Song	959d21c48f	remove useless code (#2922 )	2024-07-13 15:20:31 -07:00
Boxuan Li	c68478f470	Customize LLM config per agent (#2756 ) Currently, OpenDevin uses a global singleton LLM config and a global singleton agent config. This PR allows customers to configure an LLM config for each agent. A hypothetically useful scenario is to use a cheaper LLM for repo exploration / code search, and a more powerful LLM to actually do the problem solving (CodeActAgent). Partially solves #2075 (web GUI improvement is not the goal of this PR)	2024-07-09 22:05:54 -07:00
Engel Nyst	2df1d67007	History clean up (#2849 ) * clean up add_history * refactor last agent message	2024-07-08 05:10:21 +02:00
Engel Nyst	d37b2973b2	Refactoring: event stream based agent history (#2709 ) * add to event stream sync * remove async from tests * small logging spam fix * remove swe agent * arch refactoring: use history from the event stream * refactor agents * monologue agent * ruff * planner agent * micro-agents * refactor history in evaluations * evals history refactoring * adapt evals and tests * unit testing stuck * testing micro agents, event stream * fix planner agent * fix tests * fix stuck after rename * fix test * small clean up * fix merge * fix merge issue * fix integration tests * Update agenthub/dummy_agent/agent.py * fix tests * rename more clearly; add todo; clean up	2024-07-07 21:04:23 +00:00
Graham Neubig	d0384cafdd	Two fixes to swe bench eval (#2831 ) * Two fixes to swe bench eval * Add error message * Change dumping of metadata	2024-07-07 07:21:50 +00:00
Bin Lei	c8e5848add	fix git diff TIMEOUT problem in swe_bench evaluation (#2828 ) * fix git diff TIMEOUT problem in swe_bench evaluation * fix git diff TIMEOUT problem in swe_bench evaluation * Update evaluation/swe_bench/swe_env_box.py Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com> --------- Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>	2024-07-07 06:30:59 +00:00
Xingyao Wang	f6dc89b41a	[Evaluation] Simplify eval and multi-processing related fixes (#2810 ) * initialize agent inside process_instance_fn; * remove dependency on `config.max_iterations` * switch back to only include llm config to metadata	2024-07-06 07:18:46 +08:00
Xingyao Wang	a47713ecb0	[Arch] Remove support for Background Commands (#2803 ) * deprecating docker exec box * remove doc exec from workflow and docs * remove background commands * Update tests/unit/test_sandbox.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * replace for-loop with assignment * fix integration tests * fix integration tests for shell script * fix integration tests * increase max iter to fix some monologue agent issue * fix integration test again * fix integration tests (seems related to run_user issue) --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-07-06 03:38:05 +08:00
Graham Neubig	a081935fd8	Simplify eval code (#2775 ) * Start simplifying eval code * Update * Add EDA * Updated GAIA * Update gpqa * Add humanevalfix * Fix logic_reasoning * Add miniwob * Add mint and ml_bench * toolqa * Added swe-bench * Fixed webarena * Refactor parameters	2024-07-05 19:33:08 +09:00
மனோஜ்குமார் பழனிச்சாமி	143f38d25a	Refactored sandbox config and added fast boot (#2455 ) * Refactored sandbox config and added fastboot * added tests * fixed tests * fixed tests * intimate user about breaking change * remove default config from eval * check for lowercase env * add test * Revert Migration * migrate old sandbox configs * resolve merge conflict * revert migration 2 * Revert "remove default config from eval" This reverts commit `de57c588db`. * change type to box_type * fix var name * linted * lint * lint comments * fix tests * fix tests * fix typo * fix box_type, remove fast_boot * add tests for sandbox config * fix test * update eval docs * small removal comments * adapt toml template * old fields shouldn't be in the app dataclass * fix old keys in app config * clean up exec box --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-07-05 03:30:21 +00:00
Xingyao Wang	298956c78a	[Eval] initialize llm inside process_instance to circumvent "AttributeError:… (#2805 ) * initialize llm inside process_instance to circumvent "AttributeError: Can't pickle local object" * update kwargs	2024-07-05 01:26:03 +00:00
Xingyao Wang	e6cdf18d3b	[Evaluation] Log empty patch stats for SWE-Bench (#2776 ) * bump swebench version since the fix PR is merged * add empty generation stats from latest pr * delete eval_outputs if it already exists * handle non string patch	2024-07-05 07:03:27 +08:00
Graham Neubig	ffd3c7144c	Remove global args (#2760 ) * Remove global args * Remove global args * Update files * Update main * Bug fixes * Fix logging	2024-07-03 20:07:52 +09:00
Xingyao Wang	4d0c4f37d6	[Evaluation] fix SWE-Bench docker image name (#2751 ) * fix double underscore * remove unused script	2024-07-03 04:30:38 +08:00
Xingyao Wang	41ddba84bd	[Agent] (Potentially) improve Editing using `diff` (#2685 ) * add replace-based block edit & preliminary test case fix * further fix the insert behavior * make edit only work on first occurrence * bump codeact version since we now use new edit agentskills * update prompt for new agentskills * update integration tests * make run_infer.sh executable * remove code block for edit_file * update integration test for prompt changes * default to not use hint for eval * fix insert emptyfile bug * throw value error when `to_replace` is empty * make `_edit_or_insert_file` return string so we can try to fix some linter errors (best attempt) * add todo * update integration test * fix sandbox test for this PR	2024-07-02 11:50:15 +09:00
Xingyao Wang	6a0ffc5c61	[Evaluation] Use the latest official SWE-Bench Dockerization for evaluation (#2728 ) * add newline after patch to fix patch apply * new swebench wip * add newline after patch to fix patch apply * only add newline if not empty * update swebench source and update * update gitignore for swebench eval * update old prep_eval * update gitignore * add scripts for push and pull swebench images * update eval_infer.sh * update eval_infer for new docker workflow * update script to create markdown report based on report.json * update eval infer to use update output * update readme * only move result to folder if running whole file * remove set-x * update conversion script * Update evaluation/swe_bench/README.md * Update evaluation/swe_bench/README.md * Update evaluation/swe_bench/README.md * make sure last line ends with newline * switch to a fix attempt branch of swebench * Update evaluation/swe_bench/README.md * Update evaluation/swe_bench/README.md --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>	2024-07-01 23:58:30 +00:00
Engel Nyst	2d9bb56763	Add ability to restore the cli session (optional) (#2699 ) * add ability to restore the main session * add quick log * rename to cli session	2024-06-30 06:56:55 +00:00
Engel Nyst	874b4c9075	CLI concurrency (#2695 ) * add session id in cli, evals * fix main sid	2024-06-30 04:04:30 +02:00
Xingyao Wang	15e0c524f4	default to not use hint for eval (#2696 )	2024-06-29 21:27:57 +00:00
Xingyao Wang	e8cb6803df	[Evaluation] Improve patch apply in SWE-Bench (#2684 ) * add newline after patch to fix patch apply * only add newline if not empty	2024-06-29 14:11:07 +08:00
மனோஜ்குமார் பழனிச்சாமி	af9385322b	Refactor: Simplify message formatting (#2670 ) Removed redundant `str()` conversion in f-string.	2024-06-28 07:34:26 +02:00
Jiayi Pan	917d96e06f	Fix doc error in evals (#2654 )	2024-06-27 16:13:47 +00:00
Xavier Vergés	cd91d45b44	Allow SANDBOX_CONTAINER_IMAGEs built from opendevin/sandbox:main (#2622 )	2024-06-26 12:05:07 +08:00
Xingyao Wang	6de584d77d	update swe-bench output with eval results (#2606 )	2024-06-24 08:07:28 +09:00
Graham Neubig	cab7a288ca	Add NUM_WORKERS variable to run_infer.sh scripts for configurable worker settings (#2597 ) * Add NUM_WORKERS variable to run_infer.sh scripts for configurable worker settings * Update evaluation/webarena/scripts/run_infer.sh --------- Co-authored-by: OpenDevin <opendevin@all-hands.dev>	2024-06-23 03:43:43 +00:00
மனோஜ்குமார் பழனிச்சாமி	41564c2eac	Use :main instead of :latest (#2539 ) Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>	2024-06-21 03:57:50 +00:00
Boxuan Li	feabc97aba	Evaluation time travel: build sandbox on the fly (#2491 )	2024-06-20 20:22:02 -06:00
Xingyao Wang	b569ba710d	docs: Add visualizer instruction for SWE-Bench (#2529 ) * Update README.md for visualizer instruction * Polish the visualization guidance (#2531) * fix conda create error * fix and polish the readme for visualization * Update README.md --------- Co-authored-by: Haofei Yu <haofeiy@cs.cmu.edu>	2024-06-19 20:41:09 +00:00
Xingyao Wang	1f379bebc2	Update README.md (#2505 ) LGTM	2024-06-18 18:14:21 +02:00
Boxuan Li	6f235937cf	Evaluation time travel: allow evaluation on a specific version (#2356 ) * Time travel for evaluation * Fix source script path * Exit script if given version doesn't exist * Exit on failure * Update README * Change scripts of all other benchmarks * Modify README files * Fix logic_reasoning README	2024-06-16 10:25:14 -04:00
super-dainiu	563bc41fd3	Use LLM to analyze ML-Bench failure cases (#2399 ) * add ml-bench w/o exec env * fix typos (#1956) no functional change * Refactored Logs (#1939) * [Feat] A competitive Web Browsing agent (#1856) * initial attempt at a browsing only agent * add browsing agent * update * implement agent * update * fix comments * remove unnecessary things from memory extras * update image processing --------- Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Update README.md SWE-bench score (#1959) * Update README.md SWE-bench score Our most recent results on swe-bench lite are 25%, so this updates the README accordingly. * Update * fix: llm is_local function logic error (#1961) Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com> * doc: update documentation about poetry update (#1962) * add doc * Update Development.md --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * feat: add metrics related to cost for better observability (#1944) * add metrics for total_cost * make lint * refact codeact * change metrics into llm * add costs list, add into state * refactor log completion * refactor and test others * make lint * Update opendevin/core/metrics.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update opendevin/llm/llm.py Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * refactor * add code --------- Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * doc: add more cmd in unit test documentation (#1963) * --- (#1975) updated-dependencies: - dependency-name: boto3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * --- (#1976) updated-dependencies: - dependency-name: litellm dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Logging security (#1943) * update .gitignore * Rename the confusing 'INFO' style to 'DETAIL' * override str and repr * feat: api_key desensitize * feat: add SensitiveDataFilter in file handler * tweak regex, add tests * more tweaks, include other attrs * add env vars, those with equivalent config * fix tests * tests are invaluable --------- Co-authored-by: Shimada666 <649940882@qq.com> * --- (#1967) updated-dependencies: - dependency-name: react-dom dependency-type: direct:production update-type: version-update:semver-minor - dependency-name: "@types/react-dom" dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * --- (#1968) updated-dependencies: - dependency-name: "@reduxjs/toolkit" dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * --- (#1969) updated-dependencies: - dependency-name: husky dependency-type: direct:development update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * --- (#1970) updated-dependencies: - dependency-name: tailwind-merge dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * --- (#1971) updated-dependencies: - dependency-name: i18next dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> * Refactor session management (#1810) * refactor session mgmt * defer file handling to runtime * add todo * refactor sessions a bit more * remove messages logic from FE * fix up socket handshake * refactor frontend auth a bit * first pass at redoing file explorer * implement directory suffix * fix up file tree * close agent on websocket close * remove session saving * move file refresh * remove getWorkspace * plumb path/code differently * fix build issues * fix the tests * fix npm build * add session rehydration * fix event serialization * logspam * fix user message rehydration * add get_event fn * agent state restoration * change history tracking for codeact * fix responsiveness of init * fix lint * lint * delint * fix prop * update tests * logspam * lint * fix test * revert codeact * change fileService to use API * fix up session loading * delint * delint * fix integration tests * revert test * fix up access to options endpoints * fix initial files load * delint * fix file initialization * fix mock server * fixl int * fix auth for html * Update frontend/src/i18n/translation.json Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> * refactor sessions and sockets * avoid reinitializing the same session * fix reconnect issue * change up intro message * more guards on reinit * rename agent_session * delint * fix a bunch of tests * delint * fix last test * remove code editor context * fix build * fix any * fix dot notation * Update frontend/src/services/api.ts Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * fix up error handling * Update opendevin/server/session/agent.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update opendevin/server/session/agent.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update frontend/src/services/session.ts 
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * fix build errs * fix else * add closed state * delint * Update opendevin/server/session/session.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * fix #1960 (#1964) * Add ruff for shared mutable defaults (B) (#1938) * Add ruff for shared mutable defaults (B) * Apply B006, B008 on current files, except fast API * Update agenthub/SWE_agent/prompts.py Co-authored-by: Graham Neubig <neubig@gmail.com> * fix unintended behavior change * this is correct, tell Ruff to leave it alone --------- Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated (#1888) * Add MacOS to integration tests * Switch back to python 3.11 * Install Docker for macos pipeline * regenerate.sh: Use environmental variable for sandbox type * Pack different agents' tests into a single check * Fix CodeAct tests * Reduce file match and extensive debug logs * Add TEST_IN_CI mode that reports codecov * Small fix: don't quit if reusing old responses failed * Merge codecov results * Fix typos * Remove coverage merge step - codecov automatically does that * Make mac integration tests as optional - too slow * Fix codecov args * Add comments in yaml * Include sandbox type in codecov report name * Fix codecov report merge * Revert renaming of test_matrix_success * Remove SWEAgent and PlannerAgent from tests * Mark planner agent and SWE agent as deprecated * CodeCov: Ignore planner and sweagent * Revert "Remove SWEAgent and PlannerAgent from tests" This reverts commit `040cb3bfb9`. 
* Remove all tests for SWE Agent * Only keep basic tests for MonologueAgent and PlannerAgent * Mark SWE Agent as deprecated, and ignore code coverage for it --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * Fix Repeated Responses in Chat by Adding IPythonRunCellObservation (#1987) Co-authored-by: jianghongwei <jianghongwei@58.com> Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com> * Save CI cycles for backend tests (#1985) * Fix typo in prompt (#1992) * Refactor monologue and SWE agent to use the messages in state history (#1863) * Refactor monologue to use the messages in state history * add messages, clean up * fix monologue * update integration tests * move private method * update SWE agent to use the history from State * integration tests for SWE agent * rename monologue to initial_thoughts, since that is what it is * fix: catch session file not existed exception when init EventStream(maybe creating a new session with no session files stored). (#1994) * add ml-bench in readme * Bump boto3 from 1.34.110 to 1.34.111 (#2001) Bumps [boto3](https://github.com/boto/boto3) from 1.34.110 to 1.34.111. - [Release notes](https://github.com/boto/boto3/releases) - [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst) - [Commits](https://github.com/boto/boto3/compare/1.34.110...1.34.111) --- updated-dependencies: - dependency-name: boto3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump docker from 7.0.0 to 7.1.0 (#2002) Bumps [docker](https://github.com/docker/docker-py) from 7.0.0 to 7.1.0. 
- [Release notes](https://github.com/docker/docker-py/releases) - [Commits](https://github.com/docker/docker-py/compare/7.0.0...7.1.0) --- updated-dependencies: - dependency-name: docker dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump litellm from 1.37.20 to 1.38.0 (#2005) Bumps [litellm](https://github.com/BerriAI/litellm) from 1.37.20 to 1.38.0. - [Release notes](https://github.com/BerriAI/litellm/releases) - [Commits](https://github.com/BerriAI/litellm/compare/v1.37.20...v1.38.0) --- updated-dependencies: - dependency-name: litellm dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix SWE-Bench evaluation due to setuptools version (#1995) * correctly setup plugins for swebench eval * bump swe-bench version and add logging * Revert "correctly setup plugins for swebench eval" This reverts commit `2bd1055673`. 
* bump version * fix session state after resuming (#1999) * fix state resuming * fix session reconnection * fix lint * Implement `agentskills` for OpenDevin to helpfully improve edit AND including more useful tools/skills (#1941) * add draft for skills * Implement and test agentskills functions: open_file, goto_line, scroll_down, scroll_up, create_file, search_dir, search_file, find_file * Remove new_sample.txt file * add some work from opendevin w/ fixes * Add unit tests for agentskills module * fix some issues and updated tests * add more tests for open * tweak and handle goto_line * add tests for some edge cases * add tests for scrolling * add tests for edit * add tests for search_dir * update tests to use pytest * use pytest --forked to avoid file op unit tests to interfere with each other via global var * update doc based on swe agent tool * update and add tests for find_file and search_file * move agent_skills to plugins * add agentskills as plugin and docs * add agentskill to ssh box and fix sandbox integration * remove extra returns in doc * add agentskills to initial tool for jupyter * support re-init jupyter kernel (for agentskills) after restart * fix print window's issue with indentation and add testcases * add prompt for codeact with the newest edit primitives * modify the way line number is presented (remove leading space) * change prompt to the newest display format * support tracking of costs via metrics * Update opendevin/runtime/plugins/agent_skills/README.md * Update opendevin/runtime/plugins/agent_skills/README.md * implement and add tests for py linting * remove extra text arg for incompatible subprocess ver * remove sample.txt * update test_edits integration tests * fix all integration * Update opendevin/runtime/plugins/agent_skills/README.md * Update opendevin/runtime/plugins/agent_skills/README.md * Update opendevin/runtime/plugins/agent_skills/README.md * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li 
<liboxuan@connect.hku.hk> * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update agenthub/codeact_agent/prompt.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * Update opendevin/runtime/plugins/agent_skills/agentskills.py Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * correctly setup plugins for swebench eval * bump swe-bench version and add logging * correctly setup plugins for swebench eval * bump swe-bench version and add logging * Revert "correctly setup plugins for swebench eval" This reverts commit `2bd1055673`. * bump version * remove _AGENT_SKILLS_DOCS * move flake8 to test dep * update poetry.lock * remove extra arg * reduce max iter for eval * update poetry * fix integration tests --------- Co-authored-by: OpenDevin <opendevin@opendevin.ai> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * build: Add poetry command to use Python 3.11 for environment setup (#1972) * Bump @react-types/shared from 3.23.0 to 3.23.1 in /frontend (#2006) Bumps [@react-types/shared](https://github.com/adobe/react-spectrum) from 3.23.0 to 3.23.1. - [Release notes](https://github.com/adobe/react-spectrum/releases) - [Commits](https://github.com/adobe/react-spectrum/compare/@react-types/shared@3.23.0...@react-types/shared@3.23.1) --- updated-dependencies: - dependency-name: "@react-types/shared" dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump @types/react-syntax-highlighter in /frontend (#2007) Bumps [@types/react-syntax-highlighter](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-syntax-highlighter) from 15.5.11 to 15.5.13. 
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases) - [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-syntax-highlighter) --- updated-dependencies: - dependency-name: "@types/react-syntax-highlighter" dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump @typescript-eslint/parser from 7.9.0 to 7.10.0 in /frontend (#2008) Bumps [@typescript-eslint/parser](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/parser) from 7.9.0 to 7.10.0. - [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases) - [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/parser/CHANGELOG.md) - [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v7.10.0/packages/parser) --- updated-dependencies: - dependency-name: "@typescript-eslint/parser" dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump lint-staged from 15.2.2 to 15.2.4 in /frontend (#2009) Bumps [lint-staged](https://github.com/okonet/lint-staged) from 15.2.2 to 15.2.4. - [Release notes](https://github.com/okonet/lint-staged/releases) - [Changelog](https://github.com/lint-staged/lint-staged/blob/master/CHANGELOG.md) - [Commits](https://github.com/okonet/lint-staged/compare/v15.2.2...v15.2.4) --- updated-dependencies: - dependency-name: lint-staged dependency-type: direct:development update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update README.md * Update README.md * add run_infer.sh * fix input output * fix docker sandbox * fix run * update and clean run_infer.py * add script to clean up dockers * update repo uid * add description * new * Update README.md * use root for sandbox * update readme * update ml-bench conda env * update readme * update readme * use try except * modify raise exception * add int * update README * longer time * fix existing issues * fix existing issue * new docker image * add metrics of cost * add result parsing cost * fix * fix * update summarize * fix * add analyze * update readme * use 4o * add eval output --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Ubuntu <ubuntu@ip-172-31-31-157.ec2.internal> Co-authored-by: RainRat <rainrat78@yahoo.ca> Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com> Co-authored-by: Frank Xu <frankxu2004@gmail.com> Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: Shimada666 <649940882@qq.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Robert Brennan <accounts@rbren.io> Co-authored-by: Rahul Anand <62982824+zeul22@users.noreply.github.com> Co-authored-by: jiangleo <jiangleo@users.noreply.github.com> Co-authored-by: jianghongwei <jianghongwei@58.com> Co-authored-by: Jeremi Joslin <jeremi@newlogic.com> Co-authored-by: Aaron Xia <zhhuaxia@gmail.com> Co-authored-by: OpenDevin <opendevin@opendevin.ai> Co-authored-by: DaxServer <7479937+DaxServer@users.noreply.github.com> Co-authored-by: Robert <871607149@qq.com>	2024-06-13 09:30:55 +08:00
Xingyao Wang	b3bdc44292	mkdir `infer_logs` instead of `logs` (#2382 )	2024-06-11 07:18:19 +08:00

128 Commits