mirror of
https://github.com/OpenHands/OpenHands.git
synced 2025-12-26 05:48:36 +08:00
Update SWE-Bench README.md about RemoteRuntime (#7108)
This commit is contained in:
parent
c76a659cde
commit
4be33a079b
@ -56,9 +56,10 @@ You can update the arguments in the script
|
||||
./evaluation/benchmarks/aider_bench/scripts/run_infer.sh eval_gpt35_turbo HEAD CodeActAgent 100 1 "1,3,10"
|
||||
```
|
||||
|
||||
### Run Inference on `RemoteRuntime` (experimental)
|
||||
### Run Inference on `RemoteRuntime`
|
||||
|
||||
This is in beta. Fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to apply if you want to try this out!
|
||||
|
||||
This is in limited beta. Contact Xingyao over slack if you want to try this out!
|
||||
|
||||
```bash
|
||||
./evaluation/benchmarks/aider_bench/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [eval-num-workers] [eval_ids]
|
||||
|
||||
@ -58,9 +58,10 @@ then your command would be:
|
||||
./evaluation/benchmarks/commit0_bench/scripts/run_infer.sh lite llm.eval_sonnet HEAD CodeActAgent 10 30 1 wentingzhao/commit0_combined test
|
||||
```
|
||||
|
||||
### Run Inference on `RemoteRuntime` (experimental)
|
||||
### Run Inference on `RemoteRuntime`
|
||||
|
||||
This is in beta. Fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to apply if you want to try this out!
|
||||
|
||||
This is in limited beta. Contact Xingyao over slack if you want to try this out!
|
||||
|
||||
```bash
|
||||
./evaluation/benchmarks/commit0_bench/scripts/run_infer.sh [repo_split] [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers] [dataset] [dataset_split]
|
||||
|
||||
@ -19,9 +19,10 @@ Access with browser the above MiniWoB URLs and see if they load correctly.
|
||||
./evaluation/benchmarks/miniwob/scripts/run_infer.sh llm.claude-35-sonnet-eval
|
||||
```
|
||||
|
||||
### Run Inference on `RemoteRuntime` (experimental)
|
||||
### Run Inference on `RemoteRuntime`
|
||||
|
||||
This is in beta. Fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to apply if you want to try this out!
|
||||
|
||||
This is in limited beta. Contact Xingyao over slack if you want to try this out!
|
||||
|
||||
```bash
|
||||
./evaluation/benchmarks/miniwob/scripts/run_infer.sh [model_config] [git-version] [agent] [note] [eval_limit] [num_workers]
|
||||
|
||||
@ -65,9 +65,9 @@ then your command would be:
|
||||
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 10
|
||||
```
|
||||
|
||||
### Run Inference on `RemoteRuntime` (experimental)
|
||||
### Run Inference on `RemoteRuntime`
|
||||
|
||||
This is in limited beta. Contact Xingyao over slack if you want to try this out!
|
||||
This is in beta. Fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to apply if you want to try this out!
|
||||
|
||||
```bash
|
||||
./evaluation/benchmarks/swe_bench/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers] [dataset] [dataset_split]
|
||||
@ -163,9 +163,9 @@ The final results will be saved to `evaluation/evaluation_outputs/outputs/swe_be
|
||||
- `report.json`: a JSON file that contains keys like `"resolved_ids"` pointing to instance IDs that are resolved by the agent.
|
||||
- `logs/`: a directory of test logs
|
||||
|
||||
### Run evaluation with `RemoteRuntime` (experimental)
|
||||
### Run evaluation with `RemoteRuntime`
|
||||
|
||||
This is in limited beta. Contact Xingyao over slack if you want to try this out!
|
||||
This is in beta. Fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to apply if you want to try this out!
|
||||
|
||||
```bash
|
||||
./evaluation/benchmarks/swe_bench/scripts/eval_infer_remote.sh [output.jsonl filepath] [num_workers]
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user