Update SWE-Bench README.md about RemoteRuntime (#7108)

2025-12-26 05:48:36 +08:00 · 2025-03-05 10:00:54 -05:00
commit 4be33a079b
parent c76a659cde
4 changed files with 13 additions and 10 deletions
--- a/evaluation/benchmarks/aider_bench/README.md
+++ b/evaluation/benchmarks/aider_bench/README.md
@@ -56,9 +56,10 @@ You can update the arguments in the script
 ./evaluation/benchmarks/aider_bench/scripts/run_infer.sh eval_gpt35_turbo HEAD CodeActAgent 100 1 "1,3,10"
 ```

-### Run Inference on `RemoteRuntime` (experimental)
+### Run Inference on `RemoteRuntime`
+
+This is in beta. Fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to apply if you want to try this out!

-This is in limited beta. Contact Xingyao over slack if you want to try this out!

 ```bash
 ./evaluation/benchmarks/aider_bench/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [eval-num-workers] [eval_ids]
--- a/evaluation/benchmarks/commit0_bench/README.md
+++ b/evaluation/benchmarks/commit0_bench/README.md
@@ -58,9 +58,10 @@ then your command would be:
 ./evaluation/benchmarks/commit0_bench/scripts/run_infer.sh lite llm.eval_sonnet HEAD CodeActAgent 10 30 1 wentingzhao/commit0_combined test
 ```

-### Run Inference on `RemoteRuntime` (experimental)
+### Run Inference on `RemoteRuntime`
+
+This is in beta. Fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to apply if you want to try this out!

-This is in limited beta. Contact Xingyao over slack if you want to try this out!

 ```bash
 ./evaluation/benchmarks/commit0_bench/scripts/run_infer.sh [repo_split] [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers] [dataset] [dataset_split]
--- a/evaluation/benchmarks/miniwob/README.md
+++ b/evaluation/benchmarks/miniwob/README.md
@@ -19,9 +19,10 @@ Access with browser the above MiniWoB URLs and see if they load correctly.
 ./evaluation/benchmarks/miniwob/scripts/run_infer.sh llm.claude-35-sonnet-eval
 ```

-### Run Inference on `RemoteRuntime` (experimental)
+### Run Inference on `RemoteRuntime`
+
+This is in beta. Fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to apply if you want to try this out!

-This is in limited beta. Contact Xingyao over slack if you want to try this out!

 ```bash
 ./evaluation/benchmarks/miniwob/scripts/run_infer.sh [model_config] [git-version] [agent] [note] [eval_limit] [num_workers]
--- a/evaluation/benchmarks/swe_bench/README.md
+++ b/evaluation/benchmarks/swe_bench/README.md
@@ -65,9 +65,9 @@ then your command would be:
 ./evaluation/benchmarks/swe_bench/scripts/run_infer.sh llm.eval_gpt4_1106_preview HEAD CodeActAgent 10
 ```

-### Run Inference on `RemoteRuntime` (experimental)
+### Run Inference on `RemoteRuntime`

-This is in limited beta. Contact Xingyao over slack if you want to try this out!
+This is in beta. Fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to apply if you want to try this out!

 ```bash
 ./evaluation/benchmarks/swe_bench/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers] [dataset] [dataset_split]
@@ -163,9 +163,9 @@ The final results will be saved to `evaluation/evaluation_outputs/outputs/swe_be
 - `report.json`: a JSON file that contains keys like `"resolved_ids"` pointing to instance IDs that are resolved by the agent.
 - `logs/`: a directory of test logs

-### Run evaluation with `RemoteRuntime` (experimental)
+### Run evaluation with `RemoteRuntime`

-This is in limited beta. Contact Xingyao over slack if you want to try this out!
+This is in beta. Fill out [this form](https://docs.google.com/forms/d/e/1FAIpQLSckVz_JFwg2_mOxNZjCtr7aoBFI2Mwdan3f75J_TrdMS1JV2g/viewform) to apply if you want to try this out!

 ```bash
 ./evaluation/benchmarks/swe_bench/scripts/eval_infer_remote.sh [output.jsonl filepath] [num_workers]