Model collapse occurs in structured interactive learning if and only if the directed interaction graph satisfies a specific topological condition, with finite-sample guarantees for linear regression and asymptotic results for M-estimators.
In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, pages 139–151
11 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 11
roles
background 1
polarities
background 1
representative citing papers
citing papers explorer
- When Does Model Collapse Occur in Structured Interactive Learning?
  Model collapse occurs in structured interactive learning if and only if the directed interaction graph satisfies a specific topological condition, with finite-sample guarantees for linear regression and asymptotic results for M-estimators.
- HopWeaver: Cross-Document Synthesis of High-Quality and Authentic Multi-Hop Questions
  HopWeaver automatically synthesizes authentic bridge and comparison multi-hop questions from cross-document sources via a pipeline that identifies complementary documents and builds reasoning paths.
- SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data
  SADGE is a new fused similarity metric combining DINOv3 appearance and MASt3R geometry via constrained bilinear interaction that correlates with downstream synthetic-to-real performance at Pearson r=0.88 across multiple benchmarks.
- What Makes Synthetic Data Effective in Image Segmentation
  Dense scene composition and instance fidelity in synthetic diffusion images drive better segmentation performance; SENSE framework exploits this to improve models on Cityscapes, COCO, and ADE20K.
- Firefly: Illuminating Large-Scale Verified Tool-Call Data Generation from Real APIs
  FireFly inverts task synthesis by exploring real MCP servers first via pairwise tool graphs and sub-DAG sampling, then generates 5,144 verified tasks backward from outcomes to train a 4B model that matches Claude Sonnet 4.6 on tool-calling benchmarks.
- Generative AI-Based Monte Carlo Simulation for Method Evaluation Using Synthetic Multilevel Data
  A framework using generative AI to produce synthetic multilevel data for Monte Carlo simulations that evaluate the performance and parameter recovery of quantitative methods.
- Stochastic dynamics learning with state-space systems
  Establishes that fading memory and solution stability hold generically in state-space systems for reservoir computing even without the echo state property, with a distributional attractor perspective for stochastic cases.
- CoX-MoE: Coalesced Expert Execution for High-Throughput MoE Inference with AMX-Enabled CPU-GPU Co-Execution
  CoX-MoE achieves up to 7.1x higher throughput than FlexGen for MoE inference via coalesced expert execution and AMX-enabled CPU-GPU orchestration with static expert stratification.
- Fundamental Trade-Offs in Multi-Bit Watermarking of Stochastic Processes
  Derives matched converse and achievability bounds that characterize optimal trade-offs among false-alarm probability, detection error probability, distortion, and information rate for multi-bit watermarking of stationary ergodic stochastic processes.
- Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training
  TabGRAA applies group-relative advantage alignment in an iterative reward-guided post-training loop to improve tabular language model generators on fidelity, utility, and privacy trade-offs across five benchmarks.
- Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms
  DECAF synthetic data generator best balances privacy and fairness while fairness pre-processing improves outcomes more on synthetic data than real data, though at some cost to predictive accuracy.