From Generation to Judgment: Opportunities and Challenges of LLM-as-a-Judge
7 Pith papers cite this work, alongside 44 external citations. Polarity classification is still indexing.
7 representative citing papers (2026)
Citing papers explorer
-
Cornerstones or Stumbling Blocks? Deciphering the Rock Tokens in On-Policy Distillation
Persistent 'Rock Tokens' in on-policy distillation resist teacher corrections, consume large gradient norms, yet add negligible value to reasoning, allowing targeted bypassing to streamline alignment.
-
Logical Consistency as a Bridge: Improving LLM Hallucination Detection via Label Constraint Modeling between Responses and Self-Judgments
LaaB improves LLM hallucination detection by mapping self-judgment labels back into neural feature space and using mutual learning under logical consistency constraints between responses and meta-judgments.
-
ARA: Agentic Reproducibility Assessment For Scalable Support Of Scientific Peer-Review
ARA extracts workflow graphs from papers and scores reproducibility, reaching 61% accuracy on 213 ReScience C articles and outperforming priors on ReproBench and GoldStandardDB.
-
Label Effects: Shared Heuristic Reliance in Trust Assessment by Humans and LLM-as-a-Judge
Both humans and LLMs trust content more when labeled human-authored than AI-generated, with LLMs showing denser attention to labels and higher uncertainty under AI labels, mirroring human heuristic patterns.
-
"Like Taking the Path of Least Resistance": Exploring the Impact of LLM Interaction on the Creative Process of Programming
LLM assistance shortens idea-generation periods and reduces creative moments during programming tasks while yielding solutions with comparable idea counts and greater functional correctness.
-
Beyond Overlap Metrics: Rewarding Reasoning and Preferences for Faithful Multi-Role Dialogue Summarization
A reasoning-distillation plus dual-reward GRPO method for multi-role dialogue summarization matches ROUGE and BERTScore baselines while improving factual faithfulness and preference alignment on CSDS and SAMSum.
-
Estimating LLM Grading Ability and Response Difficulty in Automatic Short Answer Grading via Item Response Theory
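As background for the last entry, the core of Item Response Theory is the Rasch (1PL) model: the probability that grader i judges item j correctly is sigmoid(ability_i − difficulty_j). The sketch below is a generic illustration of that model fitted by joint maximum likelihood, not code from the cited paper; all names and parameters are hypothetical.

```python
import numpy as np

# Hypothetical Rasch (1PL) IRT sketch, not the paper's implementation.
# Model assumption: P(grader i correct on item j) = sigmoid(ability[i] - difficulty[j]).

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_rasch(correct, steps=1000, lr=0.01):
    """Jointly fit grader abilities and item difficulties by gradient
    ascent on the Bernoulli log-likelihood of a binary correctness matrix."""
    n_graders, n_items = correct.shape
    ability = np.zeros(n_graders)
    difficulty = np.zeros(n_items)
    for _ in range(steps):
        p = sigmoid(ability[:, None] - difficulty[None, :])
        resid = correct - p                      # gradient of log-likelihood w.r.t. logits
        ability += lr * resid.sum(axis=1)
        difficulty -= lr * resid.sum(axis=0)
        difficulty -= difficulty.mean()          # pin the origin (only differences are identified)
    return ability, difficulty

# Synthetic check: recover known difficulties from simulated gradings.
rng = np.random.default_rng(0)
true_ability = rng.normal(size=200)
true_difficulty = np.linspace(-2.0, 2.0, 50)
p_true = sigmoid(true_ability[:, None] - true_difficulty[None, :])
correct = (rng.random(p_true.shape) < p_true).astype(float)
ability_hat, difficulty_hat = fit_rasch(correct)
r = np.corrcoef(true_difficulty, difficulty_hat)[0, 1]
```

In this framing, applied to short-answer grading, "ability" plays the role of an LLM grader's reliability and "difficulty" the hardness of each student response, so both can be estimated from nothing more than a matrix of correct/incorrect grading decisions.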