diff --git a/README.md b/README.md
index 7d1781e..d810ce9 100644
--- a/README.md
+++ b/README.md
@@ -289,7 +289,7 @@ I kept the evaluation simple, LLM-as-a-judge and collect some [ego questions](./
 I mainly look at 3 things: total steps, total tokens, and the correctness of the final answer.
 
 ```bash
-npm run eval ./src/evals/ego-questions.json
+npm run eval ./src/evals/ego-questions
 ```
 
 Here's the table comparing plain `gemini-2.0-flash` and `gemini-2.0-flash + node-deepresearch` on the ego set.