From 49979497d745fc5cc690f183402805551188eb27 Mon Sep 17 00:00:00 2001
From: lazychih114 <55657767+Aaron617@users.noreply.github.com>
Date: Mon, 10 Mar 2025 07:52:07 +0800
Subject: [PATCH] update gaia

---
 README.md    |  9 +++++++--
 README_zh.md | 10 ++++++++--
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index adb70af..b3419c6 100644
--- a/README.md
+++ b/README.md
@@ -172,9 +172,14 @@ Example tasks you can try:
 
 # 🧪 Experiments
 
-We provided a script to reproduce the results on GAIA.
-You can check the `run_gaia_roleplaying.py` file and run the following command:
+To reproduce OWL's GAIA benchmark score of 58.18:
 
+1. Switch to the `gaia58.18` branch:
+```bash
+git checkout gaia58.18
+```
+
+2. Run the evaluation script:
 ```bash
 python run_gaia_roleplaying.py
 ```
diff --git a/README_zh.md b/README_zh.md
index e26338e..64af309 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -164,9 +164,15 @@ logger.success(f"Answer: {answer}")
 - "总结这篇研究论文的主要观点:[论文URL]"
 
 # 🧪 实验
-我们提供了一个脚本用于复现 GAIA 上的实验结果。
-你可以查看 `run_gaia_roleplaying.py` 文件,并运行以下命令:
+我们提供了一个脚本用于复现 GAIA 上的实验结果。
+要复现我们在 GAIA 基准测试中获得的 58.18 分:
 
+1. 切换到 `gaia58.18` 分支:
+```bash
+git checkout gaia58.18
+```
+
+2. 运行评估脚本:
 ```bash
 python run_gaia_roleplaying.py
 ```