{"total":11,"items":[{"citing_arxiv_id":"2605.22467","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SADGE: Structure and Appearance Domain Gap Estimation of Synthetic and Real Data","primary_cat":"cs.CV","submitted_at":"2026-05-21T13:27:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SADGE is a new fused similarity metric combining DINOv3 appearance and MASt3R geometry via constrained bilinear interaction that correlates with downstream synthetic-to-real performance at Pearson r=0.88 across multiple benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20151","ref_index":21,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"When Does Model Collapse Occur in Structured Interactive Learning?","primary_cat":"cs.LG","submitted_at":"2026-05-19T17:41:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Model collapse occurs in structured interactive learning if and only if the directed interaction graph satisfies a specific topological condition, with finite-sample guarantees for linear regression and asymptotic results for M-estimators.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19289","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"What Makes Synthetic Data Effective in Image Segmentation","primary_cat":"cs.CV","submitted_at":"2026-05-19T03:07:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Dense scene composition and instance fidelity in synthetic diffusion images drive better segmentation performance; SENSE framework exploits this to improve models on Cityscapes, COCO, and ADE20K.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17889","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CoX-MoE: Coalesced Expert Execution for High-Throughput MoE Inference with AMX-Enabled CPU-GPU Co-Execution","primary_cat":"cs.LG","submitted_at":"2026-05-18T05:54:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CoX-MoE achieves up to 7.1x higher throughput than FlexGen for MoE inference via coalesced expert execution and AMX-enabled CPU-GPU orchestration with static expert stratification.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17558","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Firefly: Illuminating Large-Scale Verified Tool-Call Data Generation from Real APIs","primary_cat":"cs.SE","submitted_at":"2026-05-17T17:38:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"FireFly inverts task synthesis by exploring real MCP servers first via pairwise tool graphs and sub-DAG sampling, then generates 5,144 verified tasks backward from outcomes to train a 4B model that matches Claude Sonnet 4.6 on tool-calling benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08826","ref_index":6,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Fundamental Trade-Offs in Multi-Bit Watermarking of Stochastic Processes","primary_cat":"cs.IT","submitted_at":"2026-05-09T09:26:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Derives matched converse and achievability bounds that characterize optimal trade-offs among false-alarm probability, detection error probability, distortion, and information rate for multi-bit watermarking of stationary ergodic stochastic processes.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Many modern systems generate data by sampling from stochastic processes specified by a known or learned generative distribution. This umbrella includes classical Markov [1] and autoregressive models, contemporary diffusion models [2], and large language models (LLMs) [3], and extends to protein/DNA sequence generators [4], randomized simulators [5], and synthetic data generation [6]. Within such stochastic systems, it is increasingly desirable to embed hidden information directly into the generation process so that each generated sample carries verifiable watermarks. The embedded watermarks can certify ownership by identifying the source model or provider, or establish provenance by recording the generation context or other traceable identifiers."},{"citing_arxiv_id":"2605.05752","ref_index":36,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Generative AI-Based Monte Carlo Simulation for Method Evaluation Using Synthetic Multilevel Data","primary_cat":"stat.ME","submitted_at":"2026-05-07T06:45:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A framework using generative AI to produce synthetic multilevel data for Monte Carlo simulations that evaluate the performance and parameter recovery of quantitative methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18966","ref_index":51,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Self-Improving Tabular Language Models via Iterative Reward-Guided Post-Training","primary_cat":"cs.LG","submitted_at":"2026-04-21T01:29:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TabGRAA applies group-relative advantage alignment in an iterative reward-guided post-training loop to improve tabular language model generators on fidelity, utility, and privacy trade-offs across five benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2508.07876","ref_index":58,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Stochastic dynamics learning with state-space systems","primary_cat":"stat.ML","submitted_at":"2025-08-11T11:49:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Establishes that fading memory and solution stability hold generically in state-space systems for reservoir computing even without the echo state property, with a distributional attractor perspective for stochastic cases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.15087","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HopWeaver: Cross-Document Synthesis of High-Quality and Authentic Multi-Hop Questions","primary_cat":"cs.CL","submitted_at":"2025-05-21T04:14:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HopWeaver automatically synthesizes authentic bridge and comparison multi-hop questions from cross-document sources via a pipeline that identifies complementary documents and builds reasoning paths.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2501.01785","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms","primary_cat":"cs.LG","submitted_at":"2025-01-03T12:35:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DECAF synthetic data generator best balances privacy and fairness while fairness pre-processing improves outcomes more on synthetic data than real data, though at some cost to predictive accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}