feat: improve prompting

2026-03-22 15:39:06 +08:00 · 2025-02-10 12:10:46 +08:00
parent 18f0312c38
commit 441654ac5d
9 changed files with 177 additions and 31 deletions
--- a/README.md
+++ b/README.md
@@ -289,7 +289,7 @@ I kept the evaluation simple, LLM-as-a-judge and collect some [ego questions](./
 I mainly look at 3 things: total steps, total tokens, and the correctness of the final answer.

 ```bash
-npm run eval ./src/evals/ego-questions
+npm run eval ./src/evals/questions.json
 ```

 Here's the table comparing plain `gemini-2.0-flash` and `gemini-2.0-flash + node-deepresearch` on the ego set.