diff --git a/README.md b/README.md
index 7d1781e..d810ce9 100644
--- a/README.md
+++ b/README.md
@@ -289,7 +289,7 @@ I kept the evaluation simple, LLM-as-a-judge and collect some [ego questions](./
 I mainly look at 3 things: total steps, total tokens, and the correctness of the final answer.
 
 ```bash
-npm run eval ./src/evals/ego-questions.json
+npm run eval ./src/evals/ego-questions
 ```
 
 Here's the table comparing plain `gemini-2.0-flash` and `gemini-2.0-flash + node-deepresearch` on the ego set.