{"total":20,"items":[{"citing_arxiv_id":"2605.08423","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms","primary_cat":"cs.LG","submitted_at":"2026-05-08T19:32:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Queryable LoRA adds dynamic routing over shared low-rank atoms with attention and language-instruction regularization to make parameter-efficient fine-tuning more adaptive across inputs and layers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.06402","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SparseForge: Efficient Semi-Structured LLM Sparsification via Annealing of Hessian-Guided Soft-Mask","primary_cat":"cs.LG","submitted_at":"2026-05-07T15:11:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SparseForge achieves 57.27% zero-shot accuracy on LLaMA-2-7B at 2:4 sparsity using only 5B retraining tokens, beating the dense baseline and nearly matching a 40B-token SOTA method.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02285","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Complexity Horizons of Compressed Models in Analog Circuit Analysis","primary_cat":"cs.AI","submitted_at":"2026-05-04T07:19:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Prerequisite graphs map compressed LLM performance boundaries in analog circuit analysis to allow selecting the smallest viable model for a given task complexity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08885","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Uncertainty-Aware Transformers: Conformal Prediction for Language Models","primary_cat":"cs.LG","submitted_at":"2026-04-10T02:48:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CONFIDE applies conformal prediction to transformer embeddings for valid prediction sets, improving accuracy up to 4.09% and efficiency over baselines on models like BERT-tiny.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2501.14249","ref_index":57,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Humanity's Last Exam","primary_cat":"cs.LG","submitted_at":"2025-01-24T05:27:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Humanity's Last Exam is a new 2,500-question benchmark at the frontier of human knowledge where state-of-the-art LLMs show low accuracy.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Mmlu-pro: A more robust and challenging multi-task language understanding benchmark (published at neurips 2024 track datasets and benchmarks), 2024. URL https://arxiv.org/abs/2406.01574. [56] J. Wei, N. Karina, H. W. Chung, Y . J. Jiao, S. Papay, A. Glaese, J. Schulman, and W. Fedus. 
Measuring short-form factuality in large language models, 2024. URLhttps://arxiv.org/abs/2411.04368. [57] H. Wijk, T. Lin, J. Becker, S. Jawhar, N. Parikh, T. Broadley, L. Chan, M. Chen, J. Clymer, J. Dhyani, E. Ericheva, K. Garcia, B. Goodrich, N. Jurkovic, M. Kinniment, A. Lajko, S. Nix, L. Sato, W. Saunders, M. Taran, B. West, and E. Barnes. Re-bench: Evaluating frontier ai r&d capabilities of language model agents against human experts, 2024. URLhttps://arxiv."},{"citing_arxiv_id":"2403.14720","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Defending Against Indirect Prompt Injection Attacks With Spotlighting","primary_cat":"cs.CR","submitted_at":"2024-03-20T15:26:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Spotlighting prompt transformations cut indirect prompt injection success rates from >50% to <2% on GPT models while preserving task performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2309.14509","ref_index":147,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models","primary_cat":"cs.LG","submitted_at":"2023-09-25T20:15:57+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DeepSpeed-Ulysses keeps communication volume constant for sequence-parallel attention when sequence length and device count scale together, delivering 2.5x faster training on 4x longer sequences than prior SOTA.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2307.08621","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Retentive Network: A Successor to Transformer for Large Language Models","primary_cat":"cs.CL","submitted_at":"2023-07-17T16:40:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RetNet is a new sequence modeling architecture that delivers parallel training, constant-time inference, and competitive language modeling performance as a potential replacement for Transformers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2306.14824","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Kosmos-2: Grounding Multimodal Large Language Models to the World","primary_cat":"cs.CL","submitted_at":"2023-06-26T16:32:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Kosmos-2 grounds text to image regions by encoding refer expressions as Markdown links to sequences of location tokens and trains on a new GrIT dataset of grounded image-text pairs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2305.10403","ref_index":150,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"PaLM 2 Technical Report","primary_cat":"cs.CL","submitted_at":"2023-05-17T17:46:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"PaLM 2 reports state-of-the-art results on language, reasoning, and multilingual tasks with improved efficiency 
over PaLM.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2207.05221","ref_index":144,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Language Models (Mostly) Know What They Know","primary_cat":"cs.CL","submitted_at":"2022-07-11T22:59:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Language models show good calibration when asked to estimate the probability that their own answers are correct, with performance improving as models get larger.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2112.04359","ref_index":285,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Ethical and social risks of harm from Language Models","primary_cat":"cs.CL","submitted_at":"2021-12-08T16:09:48+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The authors provide a detailed taxonomy of 21 risks associated with language models, covering discrimination, information leaks, misinformation, malicious applications, interaction harms, and societal impacts like job loss and environmental costs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2112.00861","ref_index":86,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"A General Language Assistant as a Laboratory for Alignment","primary_cat":"cs.CL","submitted_at":"2021-12-01T22:24:34+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Ranked preference modeling outperforms imitation learning for language model alignment and scales more favorably with model size.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2009.03300","ref_index":272,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Measuring Massive Multitask Language Understanding","primary_cat":"cs.CY","submitted_at":"2020-09-07T17:59:25+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"Introduces the MMLU benchmark of 57 tasks and shows that current models, including GPT-3, achieve low accuracy far below expert level across academic and professional domains.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2005.11401","ref_index":65,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks","primary_cat":"cs.CL","submitted_at":"2020-05-22T21:34:34+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RAG models set new state-of-the-art results on open-domain QA by retrieving Wikipedia passages and conditioning a generative model on them, while also producing more factual text than parametric baselines.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2002.05202","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"GLU Variants Improve 
Transformer","primary_cat":"cs.LG","submitted_at":"2020-02-12T19:57:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Some GLU variants using non-sigmoid nonlinearities improve Transformer quality over ReLU and GELU in feed-forward sublayers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2001.08361","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Scaling Laws for Neural Language Models","primary_cat":"cs.LG","submitted_at":"2020-01-23T03:59:20+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1910.10683","ref_index":76,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer","primary_cat":"cs.LG","submitted_at":"2019-10-23T17:37:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colossal Clean Crawled Corpus.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1910.03771","ref_index":186,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"HuggingFace's Transformers: State-of-the-art Natural Language Processing","primary_cat":"cs.CL","submitted_at":"2019-10-09T03:23:22+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Hugging Face releases an open-source Python library that supplies a unified API and pretrained weights for major Transformer architectures used in natural language processing.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.11692","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"RoBERTa: A Robustly Optimized BERT Pretraining Approach","primary_cat":"cs.CL","submitted_at":"2019-07-26T17:48:29+00:00","verdict":"ACCEPT","verdict_confidence":"HIGH","novelty_score":5.0,"formal_verification":"none","one_line_summary":"With better hyperparameters, more data, and longer training, an unchanged BERT-Large architecture matches or exceeds XLNet and other successors on GLUE, SQuAD, and RACE.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}