arXiv preprint arXiv:2404.18824

19 Pith papers cite this work.

representative citing papers

Meta-Benchmarks for Financial-Services LLM Evaluation

cs.AI · 2026-07-02 · unverdicted · novelty 7.0

A meta-benchmarking framework organizes 452 LLM benchmarks into 41 O*NET Generalized Work Activities and 38 BIAN domains, using discrimination-coverage-recency weights to scale K-factors in an Elo tournament for comparable financial-services scores.

Can AI Agents Synthesize Scientific Conclusions?

cs.AI · 2026-06-09 · unverdicted · novelty 7.0

A new benchmark and clean-room harness show frontier AI agents reach only 0.337 factual F1 when synthesizing conclusions from scientific evidence.

Uncertainty-based Debiasing and Unlearning for Decontamination

cs.CY · 2026-06-22 · unverdicted · novelty 6.0

UBD leverages ensemble uncertainty to estimate per-sample memorization and construct debiased targets for post-hoc correction or unlearning, yielding output distributions closer to uncontaminated models on MMLU-Pro and MATH-MCQA than baselines.
