update gaia (#123)

2026-03-22 05:57:17 +08:00 · 2025-03-10 07:52:27 +08:00
parent 1da380538d 49979497d7
commit 34d9384a58
2 changed files with 15 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -249,9 +249,14 @@ The web interface is built using Gradio and runs locally on your machine. No dat
 # 🧪 Experiments
-We provided a script to reproduce the results on GAIA. 
+To reproduce OWL's GAIA benchmark score of 58.18:
 You can check the `run_gaia_roleplaying.py` file and run the following command:
 1. Switch to the `gaia58.18` branch:
 ```bash
 git checkout gaia58.18
 ```
 2. Run the evaluation script:
 ```bash
 python run_gaia_roleplaying.py
 ```
--- a/README_zh.md
+++ b/README_zh.md
@@ -244,9 +244,15 @@ python run_app.py
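 # 🧪 Experiments
-We provide a script for reproducing the experimental results on GAIA.  
+We provide a script for reproducing the experimental results on GAIA.
-You can check the `run_gaia_roleplaying.py` file and run the following command:
+To reproduce our score of 58.18 on the GAIA benchmark:
 1. Switch to the `gaia58.18` branch: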
 ```bash
 git checkout gaia58.18
 ```
 2. Run the evaluation script:
 ```bash
 python run_gaia_roleplaying.py
 ```