Update README.md

2026-03-22 05:57:17 +08:00 · 2025-06-10 00:15:36 +08:00
parent 2fae29d101
commit 599c8bafcb
1 changed file with 5 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -603,19 +603,19 @@ The web interface is built using Gradio and runs locally on your machine. No dat

 # 🧪 Experiments

-To reproduce OWL's GAIA benchmark score of 58.18:
-Furthermore, to ensure optimal performance on the GAIA benchmark, please note that our `gaia58.18` branch includes a customized version of the CAMEL framework in the `owl/camel` directory. This version contains enhanced toolkits with improved stability for gaia benchmark compared to the standard CAMEL installation.
+To reproduce OWL's GAIA benchmark score, use the `gaia69` branch, which includes a customized version of the CAMEL framework in the `owl/camel` directory.
+This version contains enhanced toolkits that are more stable on the GAIA benchmark than the standard CAMEL installation.

 When running the benchmark evaluation:

-1. Switch to the `gaia58.18` branch:
+1. Switch to the `gaia69` branch:
   ```bash
-   git checkout gaia58.18
+   git checkout gaia69
   ```

 2. Run the evaluation script:
   ```bash
-   python run_gaia_roleplaying.py
+   python run_gaia_workforce_claude.py
   ```

 This will execute the same configuration that achieved our top-ranking performance on the GAIA benchmark.