{"total":14,"items":[{"citing_arxiv_id":"2605.14055","ref_index":95,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts","primary_cat":"cs.CL","submitted_at":"2026-05-13T19:25:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.08840","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ReST-KV: Robust KV Cache Eviction with Layer-wise Output Reconstruction and Spatial-Temporal Smoothing","primary_cat":"cs.CL","submitted_at":"2026-05-09T09:49:32+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ReST-KV formulates KV eviction as layer-wise output reconstruction optimization with spatial-temporal smoothing, outperforming baselines by 2.58% on LongBench and 15.2% on RULER while cutting decoding latency by 10.61x at 128k context.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"KV cache entries ⟨ ˆKT , ˆVT ⟩ containing up to B elements from the total cache entries ⟨KT , VT ⟩ at the step t, with the goal of maximizing the retention of the orignial MHA output. We use ℓ2 distance to calculate reconstruction error, the objective for a single attention head can be deﬁned as: argmin ⟨ ˆKT , ˆVT ⟩ MHA (xt, ⟨KT , VT ⟩) − MHA \u0010 xt, ⟨ ˆKT , ˆVT ⟩ \u0011 2 s.t. ⟨ ˆKT , ˆVT ⟩ ≤ B, (5) where ⟨ ˆKT , ˆVT ⟩ is the number of selected KV pairs. To efﬁciently compute Eq. 5, we adopt a greedy selection strategy that retains the top- B KV pairs estimated to have the greatest impact on the attention output. Speciﬁcally, for the n-th KV pair, its importance is measured by the increase in reconstruction error when it is removed, which based on"},{"citing_arxiv_id":"2605.05974","ref_index":51,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts","primary_cat":"cs.CR","submitted_at":"2026-05-07T10:19:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PragLocker generates function-preserving but non-portable prompts for LLM agents via code-symbol semantic anchoring followed by target-model feedback noise injection.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14825","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Nautilus: An Auto-Scheduling Tensor Compiler for Efficient Tiled GPU Kernels","primary_cat":"cs.PL","submitted_at":"2026-04-16T09:55:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Nautilus auto-compiles math-like tensor descriptions into optimized GPU kernels, delivering up to 42% higher throughput than prior compilers on transformer models across NVIDIA GPUs.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"tile optimizations to multiple tile compilers (Triton, Tawa, and TileLang), and chooses between tile compilers dynam- ically based on performance. Nautilus is integrated with Triton, Tawa, and TileLang, showing that its principle is general and can be applied to other tile compilers. Scheduling Tensor Compilers and Auto-Schedulers. Schedule-based tensor compilers such as Halide [30], TVM [8], AKG [15], and Tiramisu [ 4], separate algorithm from op- timization decisions (\"schedule\"). Neptune [48] limits the schedule search space by delegating low-level optimization to tile optimizers. Auto-scheduling frameworks like Ansor [49], MetaSchedule [34], and FlexTensor [50] automate schedule generation and free users from manually supplying sched- ules."},{"citing_arxiv_id":"2604.08299","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SeLaR: Selective Latent Reasoning in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-09T14:32:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SeLaR selectively applies latent soft reasoning in LLMs via entropy gating and contrastive regularization, outperforming standard CoT on five benchmarks without training.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"t , v∗∗ t are the top-1/top-2 tokens and et,˜et are the soft embeddings without and with contrastive reg- ularization. For each pass, we apply the logit lens at every layer ℓ and take the top- k projected to- kens (k=10), denoted T · ℓ . We then measure how much each soft-embedding pass shares with the two references: Otop1(ℓ) = |T soft ℓ ∩ T top1 ℓ | k ,(11) Otop2(ℓ) = |T soft ℓ ∩ T top2 ℓ | k ,(12) quantifying how much of each candidate's reason- ing content the soft-embedding forward pass still carries at layer ℓ. We average both across all N steps. Logit Lens Results.Figure 4 shows the aggre- gated curves. Without contrastive regularization (left), Otop1 rises from ∼0.45 to ∼0.73 while Otop2 stagnates around ∼0."},{"citing_arxiv_id":"2410.13903","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment","primary_cat":"cs.CR","submitted_at":"2024-10-16T08:14:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CoreGuard introduces a computation- and communication-efficient protocol claimed to deliver upper-bound security against model stealing for edge-deployed LLMs with negligible overhead.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2406.08035","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LVBench: An Extreme Long Video Understanding Benchmark","primary_cat":"cs.CV","submitted_at":"2024-06-12T09:36:52+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LVBench is a new benchmark for extreme long video understanding that evaluates multimodal large language models on hour-scale videos using tasks designed to probe extended memory and comprehension.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2404.14294","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Survey on Efficient Inference for Large Language Models","primary_cat":"cs.CL","submitted_at":"2024-04-22T15:53:08+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"attention from both academia and industry in recent years. The field of LLMs has experienced notable growth and sig- nificant achievements. Numerous open-source LLMs have emerged, including the GPT-series (GPT-1 [1], GPT-2 [2], and GPT-3 [3]), OPT [4], LLaMA-series (LLaMA [5], LLaMA 2 [5], Baichuan 2 [6], Vicuna [7], LongChat [8]), BLOOM [9], FALCON [10], GLM [11], and Mistral [12], which are used for both academic research and commercial purposes. The success of LLMs stems from their robust capability in han- dling diverse tasks such as neural language understanding • Z. Zhou, K. Hong, T. Fu, S. Li, L. Wang are with Infinigence-AI and the Department of Electronic Engineering, Tsinghua University, China. E-mail: zhouzx21@mails."},{"citing_arxiv_id":"2404.06395","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies","primary_cat":"cs.CL","submitted_at":"2024-04-09T15:36:50+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MiniCPM 1.2B and 2.4B models reach parity with 7B-13B LLMs via model wind-tunnel scaling and a WSD scheduler that yields a higher optimal data-to-model ratio than Chinchilla scaling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2403.20330","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Are We on the Right Way for Evaluating Large Vision-Language Models?","primary_cat":"cs.CV","submitted_at":"2024-03-29T17:59:34+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Current LVLM benchmarks overestimate capabilities because many questions can be answered without images due to design flaws or data leakage; MMStar is a human-curated set of 1,500 vision-indispensable samples across 6 capabilities and 18 axes with new metrics for leakage and true multi-modal gain.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Instructblip: Towards general-purpose vision-language models with instruction tuning, 2023. [12] X. Dong, P. Zhang, Y . Zang, Y . Cao, B. Wang, L. Ouyang, X. Wei, S. Zhang, H. Duan, M. Cao, et al. Internlm-xcomposer2: Mastering free-form text-image composition and comprehension in vision- language large model. arXiv preprint arXiv:2401.16420, 2024. [13] Z. Du, Y . Qian, X. Liu, M. Ding, J. Qiu, Z. Yang, and J. Tang. Glm: General language model pretraining with autoregressive blank infilling. arXiv preprint arXiv:2103.10360, 2021. [14] C. Fu, P. Chen, Y . Shen, Y . Qin, M. Zhang, X. Lin, Z. Qiu, W. Lin, J. Yang, X. Zheng, K. Li, X. Sun, and R. Ji. Mme: A comprehensive evaluation benchmark for multimodal large language models."},{"citing_arxiv_id":"2401.15947","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MoE-LLaVA: Mixture of Experts for Large Vision-Language Models","primary_cat":"cs.CV","submitted_at":"2024-01-29T08:13:40+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MoE-LLaVA applies mixture-of-experts sparsity to LVLMs via MoE-Tuning, delivering LLaVA-1.5-7B level visual understanding and better hallucination resistance with only ~3B active parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2312.16886","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices","primary_cat":"cs.CV","submitted_at":"2023-12-28T08:21:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MobileVLM achieves on-par performance with much larger vision-language models on standard benchmarks while delivering state-of-the-art inference speeds of 21.5 tokens per second on Snapdragon 888 CPU and 65.3 on Jetson Orin GPU.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"corpora, exhibiting emergent capabilities [123] that have not been witnessed before. They have reshaped the field of natural language processing and are being used in a wide range of applications. To date, proprietary LLMs like GPT- 4 [89] prevail over open-sourced models. Nevertheless, the community is exuberant with the continuous model re- leases, including GLM [35], BLOOM [65], OPT [131] and LLaMA series [115, 116]. Many recent works [4, 132] have been built on top of them. Noticeably, there is a trend to build smaller language models, i.e., whose parameters are around 1B or fewer. To name a few, GPT-Neo [9], Pythia [7], GALACTICA [112], OpenLLaMA [43], Phi [46, 70], Qwen [4] all ship lan- guage models at such sizes."},{"citing_arxiv_id":"2311.12793","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ShareGPT4V: Improving Large Multi-Modal Models with Better Captions","primary_cat":"cs.CV","submitted_at":"2023-11-21T18:58:11+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A new 1.2M-caption dataset generated via GPT-4V improves LMMs on MME and MMBench by 222.8/22.0/22.3 and 2.7/1.3/1.5 points respectively when used for supervised fine-tuning.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, and Steven Hoi. Instructblip: Towards general- purpose vision-language models with instruction tuning, 2023. 2, 3 [11] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018. 2 [12] Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang. Glm: General language model pretraining with autoregressive blank infilling. arXiv preprint arXiv:2103.10360, 2021. 2 [13] Lijie Fan, Dilip Krishnan, Phillip Isola, Dina Katabi, and Yonglong Tian. Improving clip training with language rewrites. arXiv preprint arXiv:2305."},{"citing_arxiv_id":"2309.10253","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts","primary_cat":"cs.AI","submitted_at":"2023-09-19T02:19:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"GPTFuzz is a black-box fuzzing framework that mutates seed jailbreak templates to automatically generate effective attacks, achieving over 90% success rates on models including ChatGPT and Llama-2.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}