Mirror of https://github.com/OpenHands/OpenHands.git, synced 2025-12-26 05:48:36 +08:00
22 lines
714 B
Bash
Executable File
#!/bin/bash

AGENT=CodeActAgent

# IMPORTANT: Because the Agent's prompt changes fairly often in the rapidly
# evolving OpenDevin codebase, we track the Agent's version in the evaluation
# to make sure results are comparable.
AGENT_VERSION=v$(python3 -c "from agenthub.codeact_agent import CodeActAgent; print(CodeActAgent.VERSION)")
MODEL_CONFIG=$1

echo "AGENT: $AGENT"
echo "AGENT_VERSION: $AGENT_VERSION"
echo "MODEL_CONFIG: $MODEL_CONFIG"

# You should add $MODEL_CONFIG as an LLM config section in your `config.toml`
poetry run python3 evaluation/swe_bench/run_infer.py \
  --agent-cls $AGENT \
  --llm-config $MODEL_CONFIG \
  --max-iterations 50 \
  --max-chars 10000000 \
  --eval-num-workers 8 \
  --eval-note $AGENT_VERSION
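
The `config.toml` LLM entry referenced in the script above might look roughly like the sketch below. The section name `eval_gpt4` and the exact keys are illustrative assumptions, not the project's canonical schema; check the repository's `config.template.toml` for the authoritative field names.

```toml
# Hypothetical LLM config section; pass its name as $MODEL_CONFIG,
# e.g. `./run_infer.sh eval_gpt4`.
[llm.eval_gpt4]
model = "gpt-4"
api_key = "sk-..."   # your provider API key
temperature = 0.0    # deterministic runs help keep eval results comparable
```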