QVal is a new evaluation framework that directly measures dense supervision quality via Q-alignment to a reference policy, showing simple prompting baselines outperform 21 other methods across environments and models.
5 Pith papers cite this work.
representative citing papers
For binary LLM judge validation, Pearson's r, Spearman's ρ, Kendall's τ_b, phi, and the Matthews correlation all reduce to a single number on non-degenerate data; Cohen's κ supplies the extra signal on label-rate drift. A reporting checklist is provided.
The thesis identifies theoretical, empirical, and conceptual flaws in offline fairness measures for recommender systems and contributes new evaluation methods and practical guidelines.
Compares six meta-learners (Cox/RSF risk models paired with elastic net/RF CATE models) via simulations differing in hazard complexity and censoring, and releases the R package crsurvlearners.
citing papers explorer
- Offline Evaluation Measures of Fairness in Recommender Systems