Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.IR (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
An eye-tracking study shows that the F-pattern and the examination hypothesis from web search do not hold in carousel interfaces: users follow an L-pattern on clicks and ignore headings, and examination does not predict clicks as assumed.
citing papers explorer
- As It Was: Aligning LLM Search Evaluation with Historical User Preferences
Augmenting LLM search judges with historical QRI cards improves Spearman correlation with user preferences by ~5% overall (91% relative on disagreements) and 15% in multilingual settings, with better alignment to live A/B test outcomes.
- Following the Eye-Tracking Evidence: Established Web-Search Assumptions Fail in Carousel Interfaces
An eye-tracking study shows that the F-pattern and the examination hypothesis from web search do not hold in carousel interfaces: users follow an L-pattern on clicks and ignore headings, and examination does not predict clicks as assumed.