Title resolution pending

The · 1945 · DOI 10.1093/biomet/33.3.239

5 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

QVal: Cheaply Evaluating Dense Supervision Signals for Long-Horizon LLM Agents

cs.LG · 2026-06-30 · unverdicted · novelty 7.0

QVal is a new evaluation framework that directly measures dense supervision quality via Q-alignment to a reference policy, showing simple prompting baselines outperform 21 other methods across environments and models.

Agreement Metrics for LLM-as-Judge Evaluation: What to Report and Why

cs.CL · 2026-05-25 · conditional · novelty 6.0

For binary LLM judge validation, Pearson's r, Spearman's ρ, Kendall's τ_b, phi, and the Matthews correlation all reduce to the same number on non-degenerate data; Cohen's κ supplies the extra signal on label-rate drift. A reporting checklist is provided.

Offline Evaluation Measures of Fairness in Recommender Systems

cs.IR · 2026-04-27 · unverdicted · novelty 4.0

The thesis identifies theoretical, empirical, and conceptual flaws in offline fairness measures for recommender systems and contributes new evaluation methods and practical guidelines.

A Guide to Estimating Conditional Average Treatment Effects in Competing Risks Settings

stat.AP · 2026-06-08 · unverdicted · novelty 3.0

Compares six meta-learners (Cox/RSF risk models paired with elastic net/RF CATE models) via simulations differing in hazard complexity and censoring, and releases the R package crsurvlearners.

Verbalized Algorithms: Classical Algorithms are All You Need (Mostly)

cs.CL · 2025-09-09

citing papers explorer

Showing 1 of 1 citing paper after filters.

Offline Evaluation Measures of Fairness in Recommender Systems

cs.IR · 2026-04-27 · unverdicted · none · ref 122
