Can Large Language Models Be an Alternative to Human Evaluations?

16 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

years

2026: 14 · 2025: 2

representative citing papers

DECK: A Consistency x Confidence Taxonomy of LLM Hallucinations

cs.CL · 2026-06-01 · unverdicted · novelty 7.0

The DECK taxonomy partitions LLM hallucinations into four detectability regimes using consistency and confidence axes, mapping each to scorer families and identifying a universal blind spot for output-level uncertainty quantification on knowledge-gap inputs.

NARRA-Gym for Evaluating Interactive Narrative Agents

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

NARRA-Gym is an executable benchmark that generates complete interactive narrative episodes from emotional seeds and logs full model trajectories to expose gaps in coherence, adaptation, and personalization that static story tests miss.

LLM Advertisement based on Neuron Auctions

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

The Neuron Auctions mechanism auctions continuous neuron intervention budgets on brand-specific orthogonal subspaces in LLMs to achieve strategy-proof revenue optimization while penalizing user utility loss.
