diff --git a/README.md b/README.md
index 6794add..a907d5e 100644
--- a/README.md
+++ b/README.md
@@ -249,9 +249,14 @@ The web interface is built using Gradio and runs locally on your machine. No dat
 
 # 🧪 Experiments
 
-We provided a script to reproduce the results on GAIA.
-You can check the `run_gaia_roleplaying.py` file and run the following command:
+To reproduce OWL's GAIA benchmark score of 58.18:
 
+1. Switch to the `gaia58.18` branch:
+```bash
+git checkout gaia58.18
+```
+
+1. Run the evaluation script:
 ```bash
 python run_gaia_roleplaying.py
 ```
diff --git a/README_zh.md b/README_zh.md
index 24850e3..7f1cd91 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -244,9 +244,15 @@ python run_app.py
 
 # 🧪 实验
 
-我们提供了一个脚本用于复现 GAIA 上的实验结果。
-你可以查看 `run_gaia_roleplaying.py` 文件，并运行以下命令：
+我们提供了一个脚本用于复现 GAIA 上的实验结果。
+要复现我们在 GAIA 基准测试中获得的 58.18 分：
 
+1. 切换到 `gaia58.18` 分支：
+```bash
+git checkout gaia58.18
+```
+
+2. 运行评估脚本：
 ```bash
 python run_gaia_roleplaying.py
 ```