{
  "total": 10,
  "items": [
    {
      "citing_arxiv_id": "2605.13681",
      "ref_index": 20,
      "ref_count": 1,
      "confidence": 0.55,
      "is_internal_anchor": false,
      "paper_title": "Sampling from Flow Language Models via Marginal-Conditioned Bridges",
      "primary_cat": "cs.LG",
      "submitted_at": "2026-05-13T15:38:48+00:00",
      "verdict": "UNVERDICTED",
      "verdict_confidence": "LOW",
      "novelty_score": 7.0,
      "formal_verification": "none",
      "one_line_summary": "Marginal-conditioned bridges enable training-free sampling from Flow Language Models by drawing clean one-hot endpoints from factorized posteriors and using Ornstein-Uhlenbeck bridges, preserving token marginals and reducing denoising error versus conditional-mean bridges.",
      "context_count": 0,
      "top_context_role": null,
      "top_context_polarity": null,
      "context_text": null
    },
    {
      "citing_arxiv_id": "2605.12940",
      "ref_index": 34,
      "ref_count": 1,
      "confidence": 0.55,
      "is_internal_anchor": false,
      "paper_title": "The Expressivity Boundary of Probabilistic Circuits: A Comparison with Large Language Models",
      "primary_cat": "cs.LG",
      "submitted_at": "2026-05-13T03:22:10+00:00",
      "verdict": "UNVERDICTED",
      "verdict_confidence": "LOW",
      "novelty_score": 7.0,
      "formal_verification": "none",
      "one_line_summary": "Probabilistic circuits have an output bottleneck with convex probability combinations and a context bottleneck limited to fixed vtree-aligned partitions, making them less expressive than transformers for language data with heterogeneous dependencies, though decomposable PCs are strictly more capable.",
      "context_count": 0,
      "top_context_role": null,
      "top_context_polarity": null,
      "context_text": null
    },
    {
      "citing_arxiv_id": "2605.11317",
      "ref_index": 39,
      "ref_count": 1,
      "confidence": 0.55,
      "is_internal_anchor": false,
      "paper_title": "SOMA: Efficient Multi-turn LLM Serving via Small Language Model",
      "primary_cat": "cs.CL",
      "submitted_at": "2026-05-11T23:07:33+00:00",
      "verdict": "UNVERDICTED",
      "verdict_confidence": "LOW",
      "novelty_score": 6.0,
      "formal_verification": "none",
      "one_line_summary": "SOMA estimates a local response manifold from early turns and adapts a small surrogate model via divergence-maximizing prompts and localized LoRA fine-tuning for efficient multi-turn serving.",
      "context_count": 0,
      "top_context_role": null,
      "top_context_polarity": null,
      "context_text": null
    },
    {
      "citing_arxiv_id": "2605.09271",
      "ref_index": 2,
      "ref_count": 1,
      "confidence": 0.55,
      "is_internal_anchor": false,
      "paper_title": "Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding",
      "primary_cat": "cs.AI",
      "submitted_at": "2026-05-10T02:42:29+00:00",
      "verdict": "UNVERDICTED",
      "verdict_confidence": "LOW",
      "novelty_score": 3.0,
      "formal_verification": "none",
      "one_line_summary": "Advanced language representations shape LLMs' schemas to improve knowledge activation and problem-solving.",
      "context_count": 1,
      "top_context_role": "background",
      "top_context_polarity": "background",
      "context_text": "iments showing that LLM performance and its internal feature activations vary under different language representations of the same underlying task. Together, these findings highlight language representation design as a promising direction for future research. 1 Introduction \"The limits of my language mean the limits of my world.\" - Ludwig Wittgenstein [1] Large Language Models (LLMs) [2, 3, 4, 5, 6] have emerged as the dominant paradigm in contemporary artificial intelligence [2], largely driven by the empirical success of scaling model size and training data [7, 8]. While this scaling strategy enables LLMs to internalize vast amounts of knowledge within their parameters [9, 10], the mere presence of knowledge does not guarantee"
    },
    {
      "citing_arxiv_id": "2605.08734",
      "ref_index": 26,
      "ref_count": 1,
      "confidence": 0.55,
      "is_internal_anchor": false,
      "paper_title": "AdaPreLoRA: Adafactor Preconditioned Low-Rank Adaptation",
      "primary_cat": "cs.LG",
      "submitted_at": "2026-05-09T06:37:44+00:00",
      "verdict": "UNVERDICTED",
      "verdict_confidence": "LOW",
      "novelty_score": 6.0,
      "formal_verification": "none",
      "one_line_summary": "AdaPreLoRA pairs the Adafactor diagonal Kronecker preconditioner on the full weight matrix with a closed-form factor-space solve that selects the update minimizing an H_t-weighted imbalance, yielding competitive results on GPT-2, Mistral-7B, Qwen2-7B and diffusion personalization tasks.",
      "context_count": 0,
      "top_context_role": null,
      "top_context_polarity": null,
      "context_text": null
    },
    {
      "citing_arxiv_id": "2605.10971",
      "ref_index": 27,
      "ref_count": 1,
      "confidence": 0.55,
      "is_internal_anchor": false,
      "paper_title": "Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models",
      "primary_cat": "cs.LG",
      "submitted_at": "2026-05-08T18:52:17+00:00",
      "verdict": "UNVERDICTED",
      "verdict_confidence": "LOW",
      "novelty_score": 8.0,
      "formal_verification": "none",
      "one_line_summary": "Adaptive scheduling of interventions in discrete diffusion language models, timed to attribute-specific commitment schedules discovered with sparse autoencoders, delivers precise multi-attribute steering up to 93% strength while preserving generation quality.",
      "context_count": 0,
      "top_context_role": null,
      "top_context_polarity": null,
      "context_text": null
    },
    {
      "citing_arxiv_id": "2605.07199",
      "ref_index": 3,
      "ref_count": 1,
      "confidence": 0.55,
      "is_internal_anchor": false,
      "paper_title": "Three-in-One World Model: Energy-Based Consistency, Prediction, and Counterfactual Inference for Marketing Intervention",
      "primary_cat": "cs.AI",
      "submitted_at": "2026-05-08T03:47:44+00:00",
      "verdict": "UNVERDICTED",
      "verdict_confidence": "LOW",
      "novelty_score": 6.0,
      "formal_verification": "none",
      "one_line_summary": "A DBM-based architecture learns consumer beliefs to enable consistent prediction and counterfactual inference for marketing interventions, outperforming baselines on heterogeneous treatment effects in simulation.",
      "context_count": 0,
      "top_context_role": null,
      "top_context_polarity": null,
      "context_text": null
    },
    {
      "citing_arxiv_id": "2604.01833",
      "ref_index": 44,
      "ref_count": 1,
      "confidence": 0.55,
      "is_internal_anchor": false,
      "paper_title": "Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks",
      "primary_cat": "cs.CV",
      "submitted_at": "2026-04-02T09:46:20+00:00",
      "verdict": "UNVERDICTED",
      "verdict_confidence": "LOW",
      "novelty_score": 5.0,
      "formal_verification": "none",
      "one_line_summary": "Random label bridge training aligns LLM parameters with vision tasks, and partial training of certain layers often suffices due to their foundational properties.",
      "context_count": 0,
      "top_context_role": null,
      "top_context_polarity": null,
      "context_text": null
    },
    {
      "citing_arxiv_id": "2505.10465",
      "ref_index": 40,
      "ref_count": 1,
      "confidence": 0.55,
      "is_internal_anchor": false,
      "paper_title": "Superposition Yields Robust Neural Scaling",
      "primary_cat": "cs.LG",
      "submitted_at": "2025-05-15T16:18:13+00:00",
      "verdict": null,
      "verdict_confidence": null,
      "novelty_score": null,
      "formal_verification": null,
      "one_line_summary": null,
      "context_count": 0,
      "top_context_role": null,
      "top_context_polarity": null,
      "context_text": null
    },
    {
      "citing_arxiv_id": "2502.09992",
      "ref_index": 3,
      "ref_count": 1,
      "confidence": 0.55,
      "is_internal_anchor": false,
      "paper_title": "Large Language Diffusion Models",
      "primary_cat": "cs.CL",
      "submitted_at": "2025-02-14T08:23:51+00:00",
      "verdict": "UNVERDICTED",
      "verdict_confidence": "LOW",
      "novelty_score": 8.0,
      "formal_verification": "none",
      "one_line_summary": "LLaDA is a scalable diffusion-based language model that matches autoregressive LLMs like LLaMA3 8B on tasks and surpasses GPT-4o on reversal poem completion.",
      "context_count": 0,
      "top_context_role": null,
      "top_context_polarity": null,
      "context_text": null
    }
  ],
  "limit": 50,
  "offset": 0
}