Title resolution pending

Mitchell L Gordon, Michelle S Lam, Joon Sung Park, Kayur Patel, Jeff Hancock, Tatsunori Hashimoto, Michael S Bernstein · 2022

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

"What Are You Really Trying to Do?": Co-Creating Life Goals from Everyday Computer Use

cs.HC · 2026-05-01 · unverdicted · novelty 7.0

A co-creation process for inferring and refining personal strivings from computer activity logs yields more representative goals and higher user agency than baselines in a 14-person week-long study.

Three Models of RLHF Annotation: Extension, Evidence, and Authority

cs.CY · 2026-04-28 · unverdicted · novelty 7.0

RLHF should decompose annotations into dimensions each matched to one of three models—extension, evidence, or authority—instead of applying a single unified pipeline.

MultEval: Supporting Collaborative Alignment for LLM-as-a-Judge Evaluation Criteria

cs.HC · 2026-04-29 · unverdicted · novelty 6.0

MultEval supports collaborative creation of LLM-as-a-judge criteria by surfacing disagreements via consensus-building methods, allowing iterative revisions with examples and history, and keeping transparent how human judgments become automated rules.

Who Defines "Best"? Towards Interactive, User-Defined Evaluation of LLM Leaderboards

cs.AI · 2026-04-23 · unverdicted · novelty 6.0

Analysis of the LMArena dataset reveals heavy topic skew and varying model rankings, leading to an interactive visualization tool for users to define custom evaluation priorities on LLM leaderboards.

citing papers explorer

Who Defines "Best"? Towards Interactive, User-Defined Evaluation of LLM Leaderboards

cs.AI · 2026-04-23 · unverdicted · none · ref 17

Analysis of the LMArena dataset reveals heavy topic skew and varying model rankings, leading to an interactive visualization tool for users to define custom evaluation priorities on LLM leaderboards.
