Update README.md

lazychih114
2025-06-10 00:15:36 +08:00
parent 2fae29d101
commit 599c8bafcb

@@ -603,19 +603,19 @@ The web interface is built using Gradio and runs locally on your machine. No dat
 # 🧪 Experiments
-To reproduce OWL's GAIA benchmark score of 58.18:
+To reproduce OWL's GAIA benchmark score:
-Furthermore, to ensure optimal performance on the GAIA benchmark, please note that our `gaia58.18` branch includes a customized version of the CAMEL framework in the `owl/camel` directory. This version contains enhanced toolkits with improved stability for gaia benchmark compared to the standard CAMEL installation.
+Furthermore, to ensure optimal performance on the GAIA benchmark, please note that our `gaia69` branch includes a customized version of the CAMEL framework in the `owl/camel` directory. This version contains enhanced toolkits with improved stability for gaia benchmark compared to the standard CAMEL installation.
 When running the benchmark evaluation:
-1. Switch to the `gaia58.18` branch:
+1. Switch to the `gaia69` branch:
 ```bash
-git checkout gaia58.18
+git checkout gaia69
 ```
 2. Run the evaluation script:
 ```bash
-python run_gaia_roleplaying.py
+python run_gaia_workforce_claude.py
 ```
 This will execute the same configuration that achieved our top-ranking performance on the GAIA benchmark.