Aaron Sequeira 4c0f0a1e9b feat: Support Tau-Bench and BFCL evaluation benchmarks (#11953)

Co-authored-by: openhands <openhands@all-hands.dev>

2025-12-31 03:12:50 +00:00

Tau-Bench Evaluation

This directory contains the evaluation scripts for Tau-Bench.

Setup

First, install the tau-bench package:

pip install tau-bench
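
To confirm the install succeeded before starting a long run, a quick import check can help. The sketch below is only illustrative: the helper `is_installed` is hypothetical, and the import name `tau_bench` is an assumption (the pip package name with the hyphen replaced by an underscore), so adjust it if the package exposes a different module name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "tau_bench" is an assumed import name, not confirmed by this README.
if not is_installed("tau_bench"):
    print("tau_bench not found; try: pip install tau-bench")
```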

To run the evaluation, use the following command, replacing <your_llm_config> with your LLM configuration. This example runs the retail environment:

python evaluation/benchmarks/tau_bench/run_infer.py \
  --agent-cls CodeActAgent \
  --llm-config <your_llm_config> \
  --env retail
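
If you launch runs from a script rather than by hand, the same command can be assembled programmatically. The following is a minimal sketch, not part of the benchmark code: `build_command` is a hypothetical helper, and it uses only the flags shown in the command above (`--agent-cls`, `--llm-config`, `--env`).

```python
import shlex

# Script path as given in the command above.
SCRIPT = "evaluation/benchmarks/tau_bench/run_infer.py"


def build_command(llm_config: str, env: str = "retail",
                  agent_cls: str = "CodeActAgent") -> list[str]:
    """Return the argument list for one Tau-Bench run (hypothetical helper)."""
    return [
        "python", SCRIPT,
        "--agent-cls", agent_cls,
        "--llm-config", llm_config,
        "--env", env,
    ]


def as_shell(cmd: list[str]) -> str:
    """Render the argument list as a single, safely quoted shell line."""
    return shlex.join(cmd)
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute the run; `as_shell` is handy for logging the exact command that was used.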