Title resolution pending

Ziad Obermeyer, Brian Powers, Christine Vogeli, Sendhil Mullainathan · 2019

2 Pith papers cite this work. Polarity classification is still indexing.

The title metadata for this work has not finished resolving. This hub is built from the citation graph, and the title resolver will retry its DOI and OpenAlex lookups on the next pass.

representative citing papers

Who Defines "Best"? Towards Interactive, User-Defined Evaluation of LLM Leaderboards

cs.AI · 2026-04-23 · unverdicted · novelty 6.0

Analysis of the LMArena dataset reveals heavy topic skew and varying model rankings, leading to an interactive visualization tool for users to define custom evaluation priorities on LLM leaderboards.

Beyond Semantic Similarity: A Component-Wise Evaluation Framework for Medical Question Answering Systems with Health Equity Implications

cs.HC · 2026-04-21 · unverdicted · novelty 6.0

VB-Score shows three major LLMs have severe failures in medical entity recognition and factual consistency, with 13.8% lower performance on chronic conditions affecting older and minority groups, indicating condition-based algorithmic discrimination.

citing papers explorer

Who Defines "Best"? Towards Interactive, User-Defined Evaluation of LLM Leaderboards

cs.AI · 2026-04-23 · unverdicted · none · ref 40

Analysis of the LMArena dataset reveals heavy topic skew and varying model rankings, leading to an interactive visualization tool for users to define custom evaluation priorities on LLM leaderboards.
