The final reported score is the average value of these scores.

Digital Assistant Routing: Model predictions for this dataset are evaluated by direct comparison with the ground truth, yielding a score of zero or one.
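The scoring described here can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `exact_match` and `average_score` and the example route labels are hypothetical, and it assumes "direct comparison" means strict string equality with no normalization, which the source does not specify.

```python
from typing import Sequence


def exact_match(prediction: str, ground_truth: str) -> int:
    """Return 1 if the prediction equals the ground truth, else 0."""
    # Assumption: strict equality; the source does not say whether
    # case or whitespace is normalized before comparing.
    return int(prediction == ground_truth)


def average_score(predictions: Sequence[str], ground_truths: Sequence[str]) -> float:
    """Average the per-example zero/one scores into a single reported score."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    if not predictions:
        raise ValueError("at least one example is required")
    scores = [exact_match(p, g) for p, g in zip(predictions, ground_truths)]
    return sum(scores) / len(scores)


# Hypothetical routing labels, for illustration only.
predicted = ["calendar", "music", "weather", "music"]
expected = ["calendar", "music", "timer", "music"]
print(average_score(predicted, expected))  # 3 of 4 match
```

With zero/one per-example scores, the average is simply the fraction of predictions that match the ground truth.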

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading

cs.AI · 2026-04-30 · unverdicted · novelty 6.0

Prompt optimization per model substantially alters LLM rankings on both public and internal benchmarks compared to using fixed unoptimized prompts.


