For binary LLM judge validation, Pearson's r, Spearman's ρ, Kendall's τ_b, the phi coefficient, and the Matthews correlation coefficient all reduce to the same number on non-degenerate data. Cohen's κ supplies the extra signal on label-rate drift, and the work includes a reporting checklist.
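The equivalence claimed above can be checked numerically. The following is a minimal sketch in pure Python, not code from the work itself: the toy labels `human` and `judge` and all function names are assumptions made for illustration. It computes each coefficient from its textbook definition on one small non-degenerate binary sample, then computes Cohen's κ, which comes out lower here because the judge labels 60% of items positive while the human labels 50%.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def midranks(v):
    # Rank the values, giving tied values the average of their positions.
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman's rho is Pearson's r applied to midranks.
    return pearson(midranks(x), midranks(y))

def kendall_tau_b(x, y):
    conc = disc = untied_x = untied_y = 0
    for (a1, b1), (a2, b2) in combinations(list(zip(x, y)), 2):
        s = (a1 - a2) * (b1 - b2)
        conc += s > 0
        disc += s < 0
        untied_x += a1 != a2
        untied_y += b1 != b2
    return (conc - disc) / sqrt(untied_x * untied_y)

def confusion(x, y):
    # x is treated as the reference (human), y as the judge.
    tp = sum(a == 1 and b == 1 for a, b in zip(x, y))
    tn = sum(a == 0 and b == 0 for a, b in zip(x, y))
    fp = sum(a == 0 and b == 1 for a, b in zip(x, y))
    fn = sum(a == 1 and b == 0 for a, b in zip(x, y))
    return tp, tn, fp, fn

def mcc(x, y):
    # On a 2x2 table the phi coefficient has this same formula.
    tp, tn, fp, fn = confusion(x, y)
    return (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

def cohen_kappa(x, y):
    tp, tn, fp, fn = confusion(x, y)
    n = tp + tn + fp + fn
    p_obs = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical toy labels (illustration only).
human = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
judge = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]

coeffs = {
    "pearson_r": pearson(human, judge),
    "spearman_rho": spearman(human, judge),
    "kendall_tau_b": kendall_tau_b(human, judge),
    "phi_mcc": mcc(human, judge),
}
kappa = cohen_kappa(human, judge)

for name, value in coeffs.items():
    print(f"{name}: {value:.6f}")
print(f"cohen_kappa: {kappa:.6f}")
```

On this sample the four correlation-style coefficients agree up to floating-point rounding, while κ differs from them. That gap reflects the mismatch between the two raters' positive-label rates, which is the kind of drift the summary says κ picks up.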
4 Pith papers cite this work.
citing papers explorer
- Offline Evaluation Measures of Fairness in Recommender Systems
  The thesis identifies theoretical, empirical, and conceptual flaws in offline fairness measures for recommender systems and contributes new evaluation methods and practical guidelines.
- A Guide to Estimating Conditional Average Treatment Effects in Competing Risks Settings
  Compares six meta-learners (Cox/RSF risk models paired with elastic net/RF CATE models) via simulations differing in hazard complexity and censoring, and releases the R package crsurvlearners.