{"total":25,"items":[{"citing_arxiv_id":"2606.28057","ref_index":39,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MultiHashFormer: Hash-based Generative Language Models","primary_cat":"cs.CL","submitted_at":"2026-06-26T13:03:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MultiHashFormer enables hash-based autoregression in LMs by encoding tokens as multi-hash signatures, outperforming standard Transformers at 100M-3B scales while keeping parameter count constant for multilingual expansion.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26969","ref_index":16,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Einstein World Models","primary_cat":"cs.AI","submitted_at":"2026-06-25T12:42:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Einstein World Models integrate visual rollouts from a callable world-module into LLM reasoning traces to support complex thought beyond language.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.18056","ref_index":22,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ConSA: Controllable Sparsity in Hybrid Attention via Learnable Allocation","primary_cat":"cs.CL","submitted_at":"2026-06-16T15:33:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ConSA learns FA/SWA allocation via L0 masks and augmented Lagrangian constraints, outperforming rule-based baselines on 0.6B and 1.7B models with consistent layer 
patterns.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04612","ref_index":36,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Hybrid Adversarial Defence for Natural Language Understanding Tasks","primary_cat":"cs.CL","submitted_at":"2026-06-03T08:49:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Hybrid entropy-uncertainty-geometric defence improves clean accuracy by up to 43% and adversarial robustness by up to 65% on NLU and security benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.18607","ref_index":97,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Forecasting Downstream Performance of LLMs With Proxy Metrics","primary_cat":"cs.CL","submitted_at":"2026-05-18T16:17:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.15768","ref_index":5,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ALSO: Adversarial Online Strategy Optimization for Social Agents","primary_cat":"cs.AI","submitted_at":"2026-05-15T09:25:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ALSO frames social agent interactions as an adversarial bandit problem with a neural reward predictor to enable online strategy optimization in non-stationary multi-agent 
simulations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10793","ref_index":35,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ConQuR: Corner Aligned Activation Quantization via Optimized Rotations for LLMs","primary_cat":"cs.LG","submitted_at":"2026-05-11T16:23:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ConQuR is a post-training rotation calibration technique that aligns activations to hypercube corners via Procrustes optimization and online updates, delivering competitive LLM quantization performance without end-to-end training or offline activation storage.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"token quantization, and, when applicable, the KV cache is quantized using asymmetric quantization with group size 128. Additional implementation details are provided in Appendix C. We report perplexity (PPL) on WikiText-2 [29], Penn Treebank (PTB) [30], and C4 [31]. We also evaluate common-sense reasoning performance on nine downstream benchmarks: WinoGrande [32], SocialIQA [33], LAMBADA [34], MMLU [35], ARC-Easy, ARC-Challenge [36], HellaSwag [37], OpenBookQA [38], and PIQA [39]. 
[Figure: cumulative probability vs. normalized PR; panels: All Layers, Layer 7, Layer 18, Layer 29]"},{"citing_arxiv_id":"2605.08636","ref_index":22,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"EdgeFlowerTune: Evaluating Federated LLM Fine-Tuning Under Realistic Edge System Constraints","primary_cat":"cs.CL","submitted_at":"2026-05-09T03:02:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"EdgeFlowerTune is a real-device benchmark that jointly assesses model quality and system costs for federated LLM fine-tuning on edge hardware using three protocols: Quality-under-Budget, Cost-to-Target, and Robustness.","context_count":1,"top_context_role":"dataset","top_context_polarity":"use_dataset","context_text":"Choose evaluates whether a model can select the most appropriate option from multiple candidates. This category reflects scenarios where an edge assistant or local controller needs to choose among candidate replies, recommendations, actions, or explanations based on contextual information. We instantiate this category with PIQA [3], HellaSwag [31], and SocialIQA [22], covering physical commonsense, event continuation, and social commonsense. Reason evaluates whether a model can infer the correct answer from contextual clues and commonsense knowledge. This category reflects scenarios where a user asks about local content, a device explains its current state, or reasons over event descriptions. 
We instantiate this category"},{"citing_arxiv_id":"2605.06856","ref_index":207,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility","primary_cat":"cs.LG","submitted_at":"2026-05-07T18:56:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05971","ref_index":50,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Training Transformers for KV Cache Compressibility","primary_cat":"cs.LG","submitted_at":"2026-05-07T10:17:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Training transformers with KV sparsification during continued pretraining produces representations that admit better post-hoc KV cache compression, improving quality under memory budgets for long-context tasks.","context_count":1,"top_context_role":"other","top_context_polarity":"unclear","context_text":"Commonsense reasoning about social interactions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4463-4473, Hong Kong, China, 2019. Association for Computational Linguistics. doi: 10.18653/v1/D19-1454. URL https://aclanthology.org/D19-1454/. [50] Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, and Song Han. QUEST: Query-aware sparsity for efficient long-context LLM inference. 
In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 47901-47911, 2024. [51] Qwen Team. Qwen2.5: A party of foundation models, September 2024."},{"citing_arxiv_id":"2605.01046","ref_index":27,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Learning in the Fisher Subspace: A Guided Initialization for LoRA Fine-Tuning","primary_cat":"cs.LG","submitted_at":"2026-05-01T19:20:25+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25578","ref_index":7,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling","primary_cat":"cs.CL","submitted_at":"2026-04-28T12:45:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Marco-MoE delivers open multilingual MoE models with 5% activation sparsity that outperform similarly sized dense models on English and multilingual benchmarks through efficient upcycling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21901","ref_index":8,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"GiVA: Gradient-Informed Bases for Vector-Based Adaptation","primary_cat":"cs.CL","submitted_at":"2026-04-23T17:48:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"GiVA uses gradients to initialize vector adapters so they match LoRA performance at eight times lower rank while keeping extreme parameter 
efficiency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21100","ref_index":69,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences","primary_cat":"cs.LG","submitted_at":"2026-04-22T21:38:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Preconditioned delta-rule models with a diagonal curvature approximation improve upon standard DeltaNet, GDN, and KDA by better approximating the test-time regression objective.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.11080","ref_index":9,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation","primary_cat":"cs.CV","submitted_at":"2026-04-13T07:00:26+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08016","ref_index":85,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs","primary_cat":"cs.AI","submitted_at":"2026-04-09T09:16:00+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"The paper delivers the first survey of abductive reasoning in LLMs, a unified two-stage taxonomy, a compact benchmark, and an analysis of gaps relative to deductive and inductive 
reasoning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.06291","ref_index":30,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-04-07T14:57:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TalkLoRA equips MoE-LoRA experts with a communication module that smooths routing dynamics and improves performance on language tasks under similar parameter budgets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.02764","ref_index":64,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models","primary_cat":"cs.CL","submitted_at":"2025-12-02T13:44:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PEFT-Factory supplies a ready-to-use, extensible codebase that unifies 19 PEFT methods and evaluation pipelines for fine-tuning large autoregressive language models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.21285","ref_index":50,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"PEFT-Bench: A Parameter-Efficient Fine-Tuning Methods Benchmark","primary_cat":"cs.CL","submitted_at":"2025-11-26T11:18:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PEFT-Bench is a standardized end-to-end benchmark for 7 PEFT methods across 27 NLP datasets on autoregressive LLMs, accompanied by the PSCP metric that penalizes based on trainable parameters, inference speed, and 
training memory.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2511.08565","ref_index":25,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models","primary_cat":"cs.CL","submitted_at":"2025-11-11T18:47:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLM moral robustness under persona role-play is largely determined by model family with Claude models most consistent, while susceptibility shows little family dependence.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"eliciting, and enhancing role-playing abilities of large language models, 2023. URL https://arxiv.org/abs/2310.00746. [24] Rui Xu, Xintao Wang, Jiangjie Chen, Siyu Yuan, Xinfeng Yuan, Jiaqing Liang, Zulong Chen, Xiaoqing Dong, and Yanghua Xiao. Character is destiny: Can role-playing language agents make persona-driven decisions?, 2024. URL https://arxiv.org/abs/2404.12138. [25] Pengfei Yu, Dongming Shen, Silin Meng, Jaewon Lee, Weisu Yin, Andrea Yaoyun Cui, Zhenlin Xu, Yi Zhu, Xingjian Shi, Mu Li, and Alex Smola. Rpgbench: Evaluating large language models as role-playing game engines, 2025. URL https://arxiv.org/abs/2502.00595. [26] Xuhui Zhou et al. 
Sotopia: Interactive evaluation for social intelligence in language agents."},{"citing_arxiv_id":"2501.00663","ref_index":93,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Titans: Learning to Memorize at Test Time","primary_cat":"cs.LG","submitted_at":"2024-12-31T22:32:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Hong Kong, China: Association for Computational Linguistics, Nov. 2019, pp. 4463-4473. doi: 10.18653/v1/D19-1454. url: https://aclanthology.org/D19-1454/. [92] Imanol Schlag, Kazuki Irie, and Jürgen Schmidhuber. \"Linear transformers are secretly fast weight programmers\". In: International Conference on Machine Learning . PMLR. 2021, pp. 9355-9366. [93] JH Schmidhuber. \"Learning to control fast-weight memories: An alternative to recurrent nets. Accepted for publication in\". In: Neural Computation (1992). [94] Jürgen Schmidhuber. \"Reducing the ratio between learning complexity and number of time varying variables in fully recurrent nets\". 
In: ICANN'93: Proceedings of the International Conference on Artificial Neural Networks"},{"citing_arxiv_id":"2412.06464","ref_index":35,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Gated Delta Networks: Improving Mamba2 with Delta Rule","primary_cat":"cs.CL","submitted_at":"2024-12-09T13:09:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Gated DeltaNet integrates gating and delta rules into linear transformers, outperforming Mamba2 and DeltaNet on language modeling, reasoning, retrieval, and long-context tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2410.17891","ref_index":174,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Scaling Diffusion Language Models via Adaptation from Autoregressive Models","primary_cat":"cs.CL","submitted_at":"2024-10-23T14:04:22+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Adapting autoregressive models via continual pre-training yields diffusion language models from 127M to 7B parameters that outperform prior diffusion models and compete with their autoregressive counterparts on language, reasoning, and commonsense benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2406.11794","ref_index":161,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"DataComp-LM: In search of the next generation of training sets for language models","primary_cat":"cs.LG","submitted_at":"2024-06-17T17:42:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% 
less compute.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"Patwary, Mohammad Shoeybi, and Bryan Catanzaro. Nemotron-cc: Transforming common crawl into a refined long-horizon pretraining dataset. arXiv preprint arXiv:2412.02595, 2024. [160] Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding. ArXiv preprint, abs/2104.09864, 2021. URL https://arxiv.org/abs/2104.09864. [161] Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. CommonsenseQA: A question answering challenge targeting commonsense knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4149-4158, Minneapolis, Minnesota, 2019."},{"citing_arxiv_id":"2305.14233","ref_index":85,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Enhancing Chat Language Models by Scaling High-quality Instructional Conversations","primary_cat":"cs.CL","submitted_at":"2023-05-23T16:49:14+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}