return `You are an expert at analyzing search and reasoning processes. Your task is to analyze the given sequence of steps and identify what went wrong in the search process.
system: `You are an expert at analyzing search and reasoning processes. Your task is to analyze the given sequence of steps and identify what went wrong in the search process.
<rules>
1. The sequence of actions taken
@@ -90,14 +90,13 @@ The answer is not definitive and fails to provide the requested information. La
]
}
</output>
</example>
Review the steps below carefully and generate your analysis following this format.
${diaryContext.join('\n')}
`;
</example>`,
user: `${diaryContext.join('\n')}`
}
}
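The change above moves the prompt from a single returned template string to an object with separate `system` and `user` fields. A minimal sketch of that shape follows, assuming a hypothetical `PromptPair` interface and a hypothetical `buildErrorAnalysisPrompt` helper; neither name appears in the diff, and the system text is shortened for illustration.

```typescript
// Hypothetical shape for a prompt split into system and user parts.
// The name PromptPair is an assumption made for this sketch.
interface PromptPair {
  system: string;
  user: string;
}

// Sketch only: static instructions go into `system`, while the
// per-request diary entries are joined with newlines into `user`.
function buildErrorAnalysisPrompt(diaryContext: string[]): PromptPair {
  return {
    system: 'You are an expert at analyzing search and reasoning processes.',
    user: diaryContext.join('\n'),
  };
}

const pair = buildErrorAnalysisPrompt(['step 1: search', 'step 2: visit']);
console.log(pair.user);
```

Keeping the diary out of the system text means the instruction block stays identical across calls, and only the `user` field varies per request.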
const TOOL_NAME = 'errorAnalyzer';
export async function analyzeSteps(
diaryContext: string[],
trackers: TrackerContext,
@@ -110,7 +109,8 @@ export async function analyzeSteps(
system: `You are an evaluator of answer definitiveness. Analyze if the given answer provides a definitive response or not.
<rules>
First, if the answer is not a direct response to the question, it must return false.
@@ -157,15 +160,16 @@ Evaluation: {
"think": "The answer provides concrete mathematical approaches to proving P ≠ NP without uncertainty markers, presenting definitive methods that could be used."
return `You are an evaluator that analyzes if answer content is likely outdated based on mentioned dates (or implied datetime) and current system time: ${currentTime}
system: `You are an evaluator that analyzes if answer content is likely outdated based on mentioned dates (or implied datetime) and current system time: ${currentTime}
4. **Source Reliability**: Pair freshness metrics with source credibility scores for better quality assessment.
5. **Domain Specificity**: Some specialized fields (medical research during pandemics, financial data during market volatility) may require dynamically adjusted thresholds.
6. **Geographic Relevance**: Regional considerations may alter freshness requirements for local regulations or events.
system: `You are an evaluator that determines if an answer addresses all explicitly mentioned aspects of a multi-aspect question.
<rules>
For questions with **explicitly** multiple aspects:
@@ -305,15 +311,17 @@ Aspects_Provided: "cycles de croissance, distribution des ravageurs, adaptations
Think: "La question demande explicitement les effets du changement climatique sur trois aspects: la production agricole, les écosystèmes marins et la santé publique dans les régions côtières. La réponse aborde la production agricole (en discutant des 'cycles de croissance', de la 'distribution des ravageurs' et des 'adaptations des pratiques de culture') et les écosystèmes marins (en couvrant 'l'acidification des océans', le 'réchauffement des eaux', le 'blanchissement des coraux', la 'migration des espèces marines' et la 'perturbation des chaînes alimentaires'). Cependant, elle omet complètement toute discussion sur les effets sur la santé publique dans les régions côtières, qui était explicitement demandée dans la question."
return `You are an evaluator that determines if a question requires freshness, plurality, and/or completeness checks in addition to the required definitiveness check.
system: `You are an evaluator that determines if a question requires freshness, plurality, and/or completeness checks in addition to the required definitiveness check.
<evaluation_types>
1. freshness - Checks if the question is time-sensitive or requires very recent information
@@ -393,9 +403,11 @@ function getQuestionEvaluationPrompt(question: string): string {
Question: "fam PLEASE help me calculate the eigenvalues of this 4x4 matrix ASAP!! [matrix details] got an exam tmrw 😭"
fam PLEASE help me calculate the eigenvalues of this 4x4 matrix ASAP!! [matrix details] got an exam tmrw 😭
<think>
This is a math question about eigenvalues which doesn't change over time, so I don't need fresh info. A 4x4 matrix has multiple eigenvalues, so I'll need to provide several results. The student just wants the eigenvalues calculated, not asking me to address multiple specific topics.
</think>
<output>
"think": "This is a math question about eigenvalues which doesn't change over time, so I don't need fresh info. A 4x4 matrix has multiple eigenvalues, so I'll need to provide several results. The student just wants the eigenvalues calculated, not asking me to address multiple specific topics.",
"needsFreshness": false,
"needsPlurality": true,
"needsCompleteness": false,
@@ -413,9 +427,11 @@ Question: "fam PLEASE help me calculate the eigenvalues of this 4x4 matrix ASAP!
</example-2>
<example-3>
Question: "Quelles sont les principales différences entre le romantisme et le réalisme dans la littérature du 19ème siècle?"
Quelles sont les principales différences entre le romantisme et le réalisme dans la littérature du 19ème siècle?
<output>
"think": "C'est une question sur l'histoire littéraire, donc je n'ai pas besoin d'informations récentes. Je dois comparer deux mouvements spécifiques: le romantisme et le réalisme. Ma réponse doit couvrir ces deux éléments, donc l'exhaustivité est importante ici. La pluralité n'est pas la priorité dans ce cas.",
<think>
C'est une question sur l'histoire littéraire, donc je n'ai pas besoin d'informations récentes. Je dois comparer deux mouvements spécifiques: le romantisme et le réalisme. Ma réponse doit couvrir ces deux éléments, donc l'exhaustivité est importante ici. La pluralité n'est pas la priorité dans ce cas.
</think>
"needsFreshness": false,
"needsPlurality": false,
"needsCompleteness": true,
@@ -423,9 +439,11 @@ Question: "Quelles sont les principales différences entre le romantisme et le r
Question: "What are the current interest rates for mortgage loans from Bank of America, Wells Fargo, and Chase Bank in the US?"
What are the current interest rates for mortgage loans from Bank of America, Wells Fargo, and Chase Bank in the US?
<think>
This is asking about 'current' interest rates, so I definitely need up-to-date info. The person wants rates from three specific banks: Bank of America, Wells Fargo, and Chase. I need to cover all three to properly answer, so addressing these specific elements is more important than providing multiple different answers.
</think>
<output>
"think": "This is asking about 'current' interest rates, so I definitely need up-to-date info. The person wants rates from three specific banks: Bank of America, Wells Fargo, and Chase. I need to cover all three to properly answer, so addressing these specific elements is more important than providing multiple different answers.",
"needsFreshness": true,
"needsPlurality": false,
"needsCompleteness": true,
@@ -443,9 +463,10 @@ Question: "What are the current interest rates for mortgage loans from Bank of A
Question: "Was sind die besten Strategien für nachhaltiges Investieren in der heutigen Wirtschaft?"
Was sind die besten Strategien für nachhaltiges Investieren in der heutigen Wirtschaft?
<think>
Hier geht's um Investieren in der 'heutigen Wirtschaft', also brauche ich aktuelle Informationen. Die Frage ist nach 'Strategien' im Plural gestellt, daher sollte ich mehrere Beispiele nennen. Es werden keine bestimmten Aspekte genannt, die ich alle behandeln muss - ich soll einfach verschiedene gute Strategien vorschlagen. Aktualität und mehrere Antworten sind hier wichtig.
</think>
<output>
"think": "Hier geht's um Investieren in der 'heutigen Wirtschaft', also brauche ich aktuelle Informationen. Die Frage ist nach 'Strategien' im Plural gestellt, daher sollte ich mehrere Beispiele nennen. Es werden keine bestimmten Aspekte genannt, die ich alle behandeln muss - ich soll einfach verschiedene gute Strategien vorschlagen. Aktualität und mehrere Antworten sind hier wichtig.",
"needsFreshness": true,
"needsPlurality": true,
"needsCompleteness": false,
@@ -463,20 +486,22 @@ Question: "Was sind die besten Strategien für nachhaltiges Investieren in der h
return `You are an expert search query generator with deep psychological understanding. You optimize user queries by extensively analyzing potential user intents and generating comprehensive search variations.
return {system: `You are an expert search query generator with deep psychological understanding. You optimize user queries by extensively analyzing potential user intents and generating comprehensive search variations.
needsFreshness: z.boolean().describe('If the question requires freshness check'),
needsPlurality: z.boolean().describe('If the question requires plurality check'),
needsCompleteness: z.boolean().describe('If the question requires completeness check'),
think: z.string().describe(`A very concise explain of why you choose those checks are needed. ${this.getLanguagePrompt()}`).max(500),
think: z.string().describe(`A very concise explanation of why those checks are needed. ${this.getLanguagePrompt()}`).max(500),
});
}
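The schema fields above describe the object the question evaluator is expected to return. As a rough, dependency-free illustration of the same constraints, the sketch below uses a plain type guard instead of the schema library; the names `QuestionEvaluation`, `MAX_THINK_LENGTH`, and `isQuestionEvaluation` are assumptions introduced here and are not part of the diff.

```typescript
// Hypothetical interface mirroring the four fields in the schema above.
interface QuestionEvaluation {
  think: string;
  needsFreshness: boolean;
  needsPlurality: boolean;
  needsCompleteness: boolean;
}

// Assumed constant mirroring the .max(500) limit on `think`.
const MAX_THINK_LENGTH = 500;

// Sketch of a runtime check: three boolean flags plus a `think`
// string of at most 500 characters. Extra keys are ignored.
function isQuestionEvaluation(value: unknown): value is QuestionEvaluation {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.think === 'string' &&
    v.think.length <= MAX_THINK_LENGTH &&
    typeof v.needsFreshness === 'boolean' &&
    typeof v.needsPlurality === 'boolean' &&
    typeof v.needsCompleteness === 'boolean'
  );
}

// Flags taken from the eigenvalue example earlier in the diff.
const eigenvalueExample: unknown = {
  think: 'Math question; no fresh info needed, several eigenvalues expected.',
  needsFreshness: false,
  needsPlurality: true,
  needsCompleteness: false,
};
console.log(isQuestionEvaluation(eigenvalueExample));
```

This guard only checks shape and length; it does not attempt to reproduce the language instruction that `getLanguagePrompt()` appends to the field description.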