Polylogarithmic-time deterministic network decomposition and distributed derandomization
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
roles: background 1
citation-polarity summary
polarities: background 1
citing papers explorer
- Pretraining Exposure Explains Popularity Judgments in Large Language Models
  LLM popularity judgments align more closely with pretraining data exposure counts than with Wikipedia popularity, with stronger effects in pairwise comparisons and larger models.
- Optimal Stable Coresets for Geometric Median via Uniform Sampling
  A uniform sample of size O(ε^{-2} log 1/ε) is a stable (ε, O(ε))-coreset for the geometric median with high probability, tight up to the log factor.
- Distributed Stochastic Graph Algorithms
  Introduces a distributed stochastic setting for graph optimization and supplies fast approximation algorithms for matching, vertex cover, and dominating set that surpass non-stochastic lower bounds.
- Meta-Theorems for Cuttable Distributed Problems
  Meta-theorems convert planar-graph α-approximation LOCAL algorithms for cuttable minimization problems into f(g)-round (3α+1)-approximations on bounded-genus graphs, yielding a (34+ε) approximation for MDS that improves prior bounds.
- Constraint-Data-Value-Maximization: Utilizing Data Attribution for Effective Data Pruning in Low-Data Environments
  CDVM formulates data pruning as maximizing total data influence while constraining excessive contributions to any single test point, yielding robust performance on the OpenDataVal benchmark in low-data regimes.
- GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
  LLMs display high variance and major accuracy drops on GSM-Symbolic variants of grade-school math problems, indicating they replicate training patterns rather than execute logical reasoning.
- Approximation Preserving Coresets
  Introduces approximation-preserving coresets that guarantee cost preservation for near-optimal solutions and proves that even tiny approximation-factor distortion forbids coresets of that size.
- TabKDE: Simple and Scalable Tabular Data Generation with Kernel Density Estimates
  TabKDE generates synthetic tabular data using copula transformations followed by kernel density estimation, matching prior accuracy with negligible training time and reduced storage via coresets.
- Rethinking Federated Unlearning via the Lens of Memorization
  Introduces Grouped Memorization Evaluation and FedMemPrune to remove unique memorized information in federated unlearning while preserving overlapping knowledge.
- Mathematical Reasoning in Large Language Models: Benchmarks, Architectures, Evaluation, and Open Challenges
  A literature survey synthesizing benchmarks, architectures, training strategies, and evaluation methods for mathematical reasoning in LLMs, based on roughly 120 papers.