chore: add claude 4 to verified models & globally replace claude 3.7 with claude 4 (#8665)

Co-authored-by: openhands <openhands@all-hands.dev>
Xingyao Wang 2025-05-24 01:35:30 +08:00 committed by GitHub
parent 5e43dbadcb
commit 31ad7fc175
No known key found for this signature in database (GPG Key ID: B5690EEEBB952194)
22 changed files with 45 additions and 27 deletions

View File

@@ -24,7 +24,7 @@ on:
LLM_MODEL:
required: false
type: string
-default: "anthropic/claude-3-7-sonnet-20250219"
+default: "anthropic/claude-sonnet-4-20250514"
LLM_API_VERSION:
required: false
type: string

View File

@@ -67,7 +67,7 @@ docker run -it --rm --pull=always \
You'll find OpenHands running at [http://localhost:3000](http://localhost:3000)!
When you open the application, you'll be asked to choose an LLM provider and add an API key.
-[Anthropic's Claude 3.7 Sonnet](https://www.anthropic.com/api) (`anthropic/claude-3-7-sonnet-20250219`)
+[Anthropic's Claude Sonnet 4](https://www.anthropic.com/api) (`anthropic/claude-sonnet-4-20250514`)
works best, but you have [many options](https://docs.all-hands.dev/modules/usage/llms).
## 💡 Other ways to run OpenHands

View File

@@ -52,4 +52,4 @@ $ poetry run python docs/translation_updater.py
# ...
```
-This process uses `claude-3-7-sonnet-20250219` as base model and each language consumes at least ~30k input tokens and ~35k output tokens.
+This process uses `claude-sonnet-4-20250514` as base model and each language consumes at least ~30k input tokens and ~35k output tokens.
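Given the stated usage (~30k input and ~35k output tokens per language), a rough per-language cost can be sketched. The per-token prices below are illustrative placeholders, not Anthropic's actual rates; check the current pricing page before relying on the result.

```python
# Back-of-envelope cost per language for the translation run above.
# Prices are ASSUMED placeholders, not real Anthropic rates.
INPUT_TOKENS = 30_000      # ~input tokens per language (from the docs)
OUTPUT_TOKENS = 35_000     # ~output tokens per language (from the docs)
PRICE_IN_PER_MTOK = 3.0    # assumed USD per 1M input tokens
PRICE_OUT_PER_MTOK = 15.0  # assumed USD per 1M output tokens

cost = (INPUT_TOKENS / 1e6) * PRICE_IN_PER_MTOK + (OUTPUT_TOKENS / 1e6) * PRICE_OUT_PER_MTOK
print(f'estimated cost per language: ${cost:.2f}')
```

Multiply by the number of target languages (four in this diff) for a full-run estimate.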

View File

@@ -13,7 +13,7 @@ recommendations for model selection. Our latest evaluation results
Based on these findings and community feedback, the following models have been verified to work reasonably well with OpenHands:
-- [anthropic/claude-3-7-sonnet-20250219](https://www.anthropic.com/api) (recommended)
+- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended)
- [gemini/gemini-2.5-pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/)
- [deepseek/deepseek-chat](https://api-docs.deepseek.com/)
- [openai/o3-mini](https://openai.com/index/openai-o3-mini/)

View File

@@ -13,7 +13,7 @@ OpenHands can connect to any LLM supported by LiteLLM.
Based on these findings and community feedback, the following models have been confirmed to work well with OpenHands:
-- [anthropic/claude-3-7-sonnet-20250219](https://www.anthropic.com/api) (recommended)
+- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended)
- [gemini/gemini-2.5-pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/)
- [deepseek/deepseek-chat](https://api-docs.deepseek.com/)
- [openai/o3-mini](https://openai.com/index/openai-o3-mini/)

View File

@@ -13,7 +13,7 @@ recommendations for model selection. Our latest benchmarking results
Based on these findings and community feedback, the following models have been verified to work reasonably well with OpenHands:
-- [anthropic/claude-3-7-sonnet-20250219](https://www.anthropic.com/api) (recommended)
+- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended)
- [gemini/gemini-2.5-pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/)
- [deepseek/deepseek-chat](https://api-docs.deepseek.com/)
- [openai/o3-mini](https://openai.com/index/openai-o3-mini/)

View File

@@ -12,7 +12,7 @@ OpenHands can connect to any LLM supported by LiteLLM, but it requires a
Based on these findings and community feedback, the following models have been verified to work reasonably well with OpenHands:
-- [anthropic/claude-3-7-sonnet-20250219](https://www.anthropic.com/api) (recommended)
+- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended)
- [gemini/gemini-2.5-pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/)
- [deepseek/deepseek-chat](https://api-docs.deepseek.com/)
- [openai/o3-mini](https://openai.com/index/openai-o3-mini/)

View File

@@ -23,7 +23,7 @@ This command opens an interactive prompt where you can type tasks or commands an
1. Set the following environment variables in your terminal:
- `SANDBOX_VOLUMES` to specify the directory you want OpenHands to access ([See using SANDBOX_VOLUMES for more info](../runtimes/docker#using-sandbox_volumes))
-- `LLM_MODEL` - the LLM model to use (e.g. `export LLM_MODEL="anthropic/claude-3-7-sonnet-20250219"`)
+- `LLM_MODEL` - the LLM model to use (e.g. `export LLM_MODEL="anthropic/claude-sonnet-4-20250514"`)
- `LLM_API_KEY` - your API key (e.g. `export LLM_API_KEY="sk_test_12345"`)
2. Run the following command:

View File

@@ -23,7 +23,7 @@ To run OpenHands in Headless mode with Docker:
1. Set the following environment variables in your terminal:
- `SANDBOX_VOLUMES` to specify the directory you want OpenHands to access ([See using SANDBOX_VOLUMES for more info](../runtimes/docker#using-sandbox_volumes))
-- `LLM_MODEL` - the LLM model to use (e.g. `export LLM_MODEL="anthropic/claude-3-7-sonnet-20250219"`)
+- `LLM_MODEL` - the LLM model to use (e.g. `export LLM_MODEL="anthropic/claude-sonnet-4-20250514"`)
- `LLM_API_KEY` - your API key (e.g. `export LLM_API_KEY="sk_test_12345"`)
2. Run the following Docker command:

View File

@@ -13,7 +13,7 @@ recommendations for model selection. Our latest benchmarking results can be foun
Based on these findings and community feedback, these are the latest models that have been verified to work reasonably well with OpenHands:
-- [anthropic/claude-3-7-sonnet-20250219](https://www.anthropic.com/api) (recommended)
+- [anthropic/claude-sonnet-4-20250514](https://www.anthropic.com/api) (recommended)
- [openai/o4-mini](https://openai.com/index/introducing-o3-and-o4-mini/)
- [gemini/gemini-2.5-pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/)
- [deepseek/deepseek-chat](https://api-docs.deepseek.com/)

View File

@@ -57,7 +57,7 @@ def translate_content(content, target_lang):
system_prompt = f'You are a professional translator. Translate the following content into {target_lang}. Preserve all Markdown formatting, code blocks, and front matter. Keep any {{% jsx %}} tags and similar intact. Do not translate code examples, URLs, or technical terms.'
message = client.messages.create(
-model='claude-3-7-sonnet-20250219',
+model='claude-sonnet-4-20250514',
max_tokens=4096,
temperature=0,
system=system_prompt,
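The hunk above shows only a fragment of the call. A self-contained sketch of what such a `translate_content` helper could look like, assuming the Anthropic Python SDK's `messages.create` shape; `build_system_prompt` is a hypothetical helper condensing the prompt from the diff:

```python
def build_system_prompt(target_lang: str) -> str:
    # Assumed helper: condensed version of the system prompt shown in the diff.
    return (
        'You are a professional translator. Translate the following content '
        f'into {target_lang}. Preserve all Markdown formatting, code blocks, '
        'and front matter. Do not translate code examples, URLs, or technical terms.'
    )

def translate_content(content: str, target_lang: str) -> str:
    # Requires the `anthropic` package and ANTHROPIC_API_KEY in the environment.
    import anthropic

    client = anthropic.Anthropic()
    message = client.messages.create(
        model='claude-sonnet-4-20250514',
        max_tokens=4096,
        temperature=0,  # deterministic output suits translation
        system=build_system_prompt(target_lang),
        messages=[{'role': 'user', 'content': content}],
    )
    return message.content[0].text
```

The zero temperature mirrors the diff: translations should be repeatable across runs.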

View File

@@ -48,7 +48,7 @@ describe("Content", () => {
await waitFor(() => {
expect(provider).toHaveValue("Anthropic");
-expect(model).toHaveValue("claude-3-7-sonnet-20250219");
+expect(model).toHaveValue("claude-sonnet-4-20250514");
expect(apiKey).toHaveValue("");
expect(apiKey).toHaveProperty("placeholder", "");
@@ -135,7 +135,7 @@ describe("Content", () => {
);
const condensor = screen.getByTestId("enable-memory-condenser-switch");
-expect(model).toHaveValue("anthropic/claude-3-7-sonnet-20250219");
+expect(model).toHaveValue("anthropic/claude-sonnet-4-20250514");
expect(baseUrl).toHaveValue("");
expect(apiKey).toHaveValue("");
expect(apiKey).toHaveProperty("placeholder", "");
@@ -542,7 +542,7 @@ describe("Form submission", () => {
// select model
await userEvent.click(model);
-const modelOption = screen.getByText("claude-3-7-sonnet-20250219");
+const modelOption = screen.getByText("claude-sonnet-4-20250514");
await userEvent.click(modelOption);
const submitButton = screen.getByTestId("submit-button");
@@ -550,7 +550,7 @@ describe("Form submission", () => {
expect(saveSettingsSpy).toHaveBeenCalledWith(
expect.objectContaining({
-llm_model: "anthropic/claude-3-7-sonnet-20250219",
+llm_model: "anthropic/claude-sonnet-4-20250514",
llm_base_url: "",
confirmation_mode: false,
}),

View File

@@ -71,6 +71,18 @@ describe("extractModelAndProvider", () => {
separator: "/",
});
+expect(extractModelAndProvider("claude-sonnet-4-20250514")).toEqual({
+provider: "anthropic",
+model: "claude-sonnet-4-20250514",
+separator: "/",
+});
+expect(extractModelAndProvider("claude-opus-4-20250514")).toEqual({
+provider: "anthropic",
+model: "claude-opus-4-20250514",
+separator: "/",
+});
expect(extractModelAndProvider("claude-3-haiku-20240307")).toEqual({
provider: "anthropic",
model: "claude-3-haiku-20240307",

View File

@@ -100,7 +100,7 @@ const openHandsHandlers = [
"gpt-4o",
"gpt-4o-mini",
"anthropic/claude-3.5",
-"anthropic/claude-3-7-sonnet-20250219",
+"anthropic/claude-sonnet-4-20250514",
]),
),

View File

@@ -279,7 +279,7 @@ function LlmSettingsScreen() {
<ModelSelector
models={modelsAndProviders}
currentModel={
-settings.LLM_MODEL || "anthropic/claude-3-5-sonnet-20241022"
+settings.LLM_MODEL || "anthropic/claude-sonnet-4-20250514"
}
onChange={handleModelIsDirty}
/>
@@ -342,9 +342,9 @@ function LlmSettingsScreen() {
name="llm-custom-model-input"
label={t(I18nKey.SETTINGS$CUSTOM_MODEL)}
defaultValue={
-settings.LLM_MODEL || "anthropic/claude-3-7-sonnet-20250219"
+settings.LLM_MODEL || "anthropic/claude-sonnet-4-20250514"
}
-placeholder="anthropic/claude-3-7-sonnet-20250219"
+placeholder="anthropic/claude-sonnet-4-20250514"
type="text"
className="w-[680px]"
onChange={handleCustomModelIsDirty}

View File

@@ -3,7 +3,7 @@ import { Settings } from "#/types/settings";
export const LATEST_SETTINGS_VERSION = 5;
export const DEFAULT_SETTINGS: Settings = {
-LLM_MODEL: "anthropic/claude-3-7-sonnet-20250219",
+LLM_MODEL: "anthropic/claude-sonnet-4-20250514",
LLM_BASE_URL: "",
AGENT: "CodeActAgent",
LANGUAGE: "en",

View File

@@ -6,6 +6,8 @@ export const VERIFIED_MODELS = [
"o4-mini-2025-04-16",
"claude-3-5-sonnet-20241022",
"claude-3-7-sonnet-20250219",
+"claude-sonnet-4-20250514",
+"claude-opus-4-20250514",
"deepseek-chat",
];
@@ -39,4 +41,6 @@ export const VERIFIED_ANTHROPIC_MODELS = [
"claude-3-opus-20240229",
"claude-3-sonnet-20240229",
"claude-3-7-sonnet-20250219",
+"claude-sonnet-4-20250514",
+"claude-opus-4-20250514",
];

View File

@@ -167,6 +167,8 @@ VERIFIED_ANTHROPIC_MODELS = [
'claude-3-opus-20240229',
'claude-3-sonnet-20240229',
'claude-3-7-sonnet-20250219',
+'claude-sonnet-4-20250514',
+'claude-opus-4-20250514',
]
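Verified lists like the one above make it possible to infer a provider for bare model IDs. A minimal sketch of that idea, assuming a simplified `extract_model_and_provider` (not the project's actual helper), consistent with the behavior the tests later in this diff assert:

```python
# Simplified, assumed implementation for illustration only.
VERIFIED_ANTHROPIC_MODELS = [
    'claude-3-opus-20240229',
    'claude-3-sonnet-20240229',
    'claude-3-7-sonnet-20250219',
    'claude-sonnet-4-20250514',
    'claude-opus-4-20250514',
]

def extract_model_and_provider(model: str) -> dict:
    # Explicit "provider/model" IDs are split on the slash.
    if '/' in model:
        provider, _, name = model.partition('/')
        return {'provider': provider, 'model': name, 'separator': '/'}
    # Bare IDs on the verified Anthropic list are attributed to anthropic.
    if model in VERIFIED_ANTHROPIC_MODELS:
        return {'provider': 'anthropic', 'model': model, 'separator': '/'}
    # Anything else has no known provider.
    return {'provider': '', 'model': model, 'separator': ''}
```

This is why adding the two Claude 4 IDs to the list is enough for the UI to recognize them without a prefix.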

View File

@@ -47,7 +47,7 @@ class LLMConfig(BaseModel):
seed: The seed to use for the LLM.
"""
-model: str = Field(default='claude-3-7-sonnet-20250219')
+model: str = Field(default='claude-sonnet-4-20250514')
api_key: SecretStr | None = Field(default=None)
base_url: str | None = Field(default=None)
api_version: str | None = Field(default=None)

View File

@@ -109,7 +109,7 @@ export GIT_USERNAME="your-gitlab-username" # Optional, defaults to token owner
# LLM configuration
-export LLM_MODEL="anthropic/claude-3-7-sonnet-20250219" # Recommended
+export LLM_MODEL="anthropic/claude-sonnet-4-20250514" # Recommended
export LLM_API_KEY="your-llm-api-key"
export LLM_BASE_URL="your-api-url" # Optional, for API proxies
```

View File

@@ -24,7 +24,7 @@ jobs:
macro: ${{ vars.OPENHANDS_MACRO || '@openhands-agent' }}
max_iterations: ${{ fromJson(vars.OPENHANDS_MAX_ITER || 50) }}
base_container_image: ${{ vars.OPENHANDS_BASE_CONTAINER_IMAGE || '' }}
-LLM_MODEL: ${{ vars.LLM_MODEL || 'anthropic/claude-3-7-sonnet-20250219' }}
+LLM_MODEL: ${{ vars.LLM_MODEL || 'anthropic/claude-sonnet-4-20250514' }}
target_branch: ${{ vars.TARGET_BRANCH || 'main' }}
runner: ${{ vars.TARGET_RUNNER }}
secrets:

View File

@@ -354,11 +354,11 @@ class TestModelAndProviderFunctions:
assert result['separator'] == '/'
def test_extract_model_and_provider_anthropic_implicit(self):
-model = 'claude-3-7-sonnet-20250219'
+model = 'claude-sonnet-4-20250514'
result = extract_model_and_provider(model)
assert result['provider'] == 'anthropic'
-assert result['model'] == 'claude-3-7-sonnet-20250219'
+assert result['model'] == 'claude-sonnet-4-20250514'
assert result['separator'] == '/'
def test_extract_model_and_provider_versioned(self):
@@ -380,7 +380,7 @@ class TestModelAndProviderFunctions:
def test_organize_models_and_providers(self):
models = [
'openai/gpt-4o',
-'anthropic/claude-3-7-sonnet-20250219',
+'anthropic/claude-sonnet-4-20250514',
'o3-mini',
'anthropic.claude-3-5', # Should be ignored as it uses dot separator for anthropic
'unknown-model',
@@ -397,7 +397,7 @@ class TestModelAndProviderFunctions:
assert 'o3-mini' in result['openai']['models']
assert len(result['anthropic']['models']) == 1
-assert 'claude-3-7-sonnet-20250219' in result['anthropic']['models']
+assert 'claude-sonnet-4-20250514' in result['anthropic']['models']
assert len(result['other']['models']) == 1
assert 'unknown-model' in result['other']['models']
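The assertions above pin down the grouping behavior: explicit `provider/model` IDs are split, bare verified IDs are attributed to their provider, dot-separated anthropic IDs are dropped, and everything else falls into `other`. A hypothetical sketch consistent with those assertions (assumed names and lists, not the project's actual implementation):

```python
# Illustrative verified lists; the real ones are longer (see earlier hunks).
VERIFIED_OPENAI_MODELS = ['o3-mini', 'o4-mini-2025-04-16']
VERIFIED_ANTHROPIC_MODELS = ['claude-sonnet-4-20250514', 'claude-opus-4-20250514']

def organize_models_and_providers(models: list[str]) -> dict:
    result: dict[str, dict] = {}
    for m in models:
        if m.startswith('anthropic.'):
            continue  # dot-separated anthropic IDs are ignored
        if '/' in m:
            provider, _, name = m.partition('/')
        elif m in VERIFIED_OPENAI_MODELS:
            provider, name = 'openai', m
        elif m in VERIFIED_ANTHROPIC_MODELS:
            provider, name = 'anthropic', m
        else:
            provider, name = 'other', m
        result.setdefault(provider, {'models': []})['models'].append(name)
    return result
```

Run against the test's input list, this groups `gpt-4o` and `o3-mini` under `openai`, the Claude 4 ID under `anthropic`, and `unknown-model` under `other`.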