Boxuan Li 6f235937cf
Evaluation time travel: allow evaluation on a specific version (#2356)
* Time travel for evaluation

* Fix source script path

* Exit script if given version doesn't exist

* Exit on failure

* Update README

* Change scripts of all other benchmarks

* Modify README files

* Fix logic_reasoning README
2024-06-16 10:25:14 -04:00
..