JECS aggregates per-model conformal p-values via their maximum and reconstructs a conservative envelope of the max-p null distribution to select benchmarks with global contamination rate control.
arXiv preprint arXiv:2309.10677 , year=
5 Pith papers cite this work. Polarity classification is still indexing.
5
Pith papers citing it
citation-role summary
background 1
other 1
citation-polarity summary
verdicts
UNVERDICTED 5representative citing papers
A new paired-prompt protocol reveals alignment-pipeline-specific heterogeneity in how open-weight LLMs respond to evaluation versus deployment framings.
First unified survey formalizing Pretraining Data Exposure across exposure levels and reviewing attack, defense, and contamination methods for LLMs.
A majorization-minimization framework turns IRT into scalable matrix factorization subproblems for LLM evaluation, delivering orders-of-magnitude speedups with identifiability guarantees.
A survey reviewing benchmark data contamination in LLMs, its impact on evaluation, and alternative assessment approaches.
citing papers explorer
No citing papers match the current filters.