{"total":95,"items":[{"citing_arxiv_id":"2607.01789","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"EPnG: Adaptive Expert Prune-and-Grow for Parameter-Efficient MoE Fine-tuning","primary_cat":"cs.LG","submitted_at":"2026-07-02T07:02:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EPnG reallocates LoRA capacity in MoE models by pruning experts with low router gate probabilities and expanding high-importance ones via rank growth, outperforming standard LoRA and nearing full fine-tuning performance with 0.55-0.72% parameters updated.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2607.00379","ref_index":31,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Attribute-Prompted Kernel Hashing for Unsupervised Data-Efficient Cross-Modal Retrieval","primary_cat":"cs.IR","submitted_at":"2026-07-01T03:23:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"APKH uses prompt-optimized attribute kernel mapping and kernel-smoothed contrastive alignment to improve generalization from seen to unseen categories in data-constrained unsupervised cross-modal hashing.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.30775","ref_index":31,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Single Rewrite Suffices: Empirical Lessons from Production Skill Description Optimization","primary_cat":"cs.CL","submitted_at":"2026-06-29T18:06:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A single LLM rewrite of skill descriptions using false positive and negative cases matches manual optimization performance in production, with most other 
pipeline components adding little value.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26749","ref_index":178,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Structure Before Collapse: Transient semantic geometry in next-token prediction","primary_cat":"cs.LG","submitted_at":"2026-06-25T08:33:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Semantic geometry emerges transiently early in next-token prediction training before collapsing to Neural Collapse symmetry in synthetic settings with latent semantic factors.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.26183","ref_index":26,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"LiMoDE: Rethinking Lifelong Robot Manipulation from a Mixture-of-Dynamic-Experts Perspective","primary_cat":"cs.RO","submitted_at":"2026-06-24T13:18:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LiMoDE uses dynamic MoE pre-training on motion cues followed by lifelong expert addition for continuous robot task adaptation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.23897","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The Professor: Multi-Teacher Unsupervised Prompt Distillation for Vision-Language Models","primary_cat":"cs.CV","submitted_at":"2026-06-22T19:53:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Multi-teacher confidence-weighted ensembling in unsupervised prompt distillation raises average harmonic mean from 87.52 to 89.28 across four base-to-novel datasets, with largest gains on 
domain-shifted EuroSAT.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.24937","ref_index":112,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"The Hitchhiker's Guide to Agentic AI: From Foundations to Systems","primary_cat":"cs.AI","submitted_at":"2026-06-22T17:48:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A comprehensive reference book organizing existing techniques for agentic AI systems across LLM substrate, reasoning, agent design patterns, inter-agent coordination, and production deployment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.21645","ref_index":130,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Behavioral and Representational Evidence of Binomial Ordering Preferences in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-06-19T17:56:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLMs recover dominant binomial orders from corpora but align less closely with exact preference distributions, with preference strength partially encoded in middle-to-late layers and manipulable via steering.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11854","ref_index":7,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training","primary_cat":"cs.LG","submitted_at":"2026-06-10T09:30:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ART optimizes visual pixel inputs to frozen MLLMs to achieve LoRA-competitive accuracy on math and structured tool-use benchmarks without 
modifying computational graphs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11853","ref_index":99,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Task-Aware Structured Memory for Dynamic Multi-modal In-Context Learning","primary_cat":"cs.CV","submitted_at":"2026-06-10T09:30:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TASM proposes a task-aware structured memory framework using task-vector compression, bipartite token merging, and a Core Memory plus Latent Bank hierarchy to enable efficient dynamic multi-modal in-context learning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.11712","ref_index":52,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Substrate Asymmetry in User-Side Memory: A Diagnostic Framework","primary_cat":"cs.CL","submitted_at":"2026-06-10T06:39:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"User memory in LLMs factors into three orthogonal axes where parametric adapters and retrieval show opposite strengths, with causal evidence from attention interventions and an alignment tax on RLHF models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10488","ref_index":5,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"5% > 100%: Flatness Preference is All You Need for Multimodal Parameter-Efficient Fine-Tuning","primary_cat":"cs.CV","submitted_at":"2026-06-09T07:03:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Flatness Preference Optimization (FlatPO) improves multimodal PEFT generalization by flattening a small set of sharp 
dimensions that dominate performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09125","ref_index":27,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Unveiling Privacy Risks in Multi-modal Large Language Models: Task-specific Vulnerabilities and Mitigation Challenges","primary_cat":"cs.CR","submitted_at":"2026-06-08T07:19:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces MM-Privacy dataset and evaluations showing MLLMs leak sensitive data from images in various tasks, highlighting task inconsistency effects.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.04325","ref_index":39,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Parameter-Efficient Fine-Tuning with Learnable Rank","primary_cat":"cs.CL","submitted_at":"2026-06-03T00:57:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LR-LoRA learns per-layer adapter ranks during training and reports outperforming fixed-rank LoRA and other PEFT baselines on language understanding and commonsense reasoning tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.03979","ref_index":43,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories","primary_cat":"cs.LG","submitted_at":"2026-06-02T17:56:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Language models can use a two-stage sleep process of upward distillation for memory consolidation and RL-based dreaming for unsupervised self-improvement to enable continual 
learning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00776","ref_index":43,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Latent Diffusion Pretraining for Crystal Property Prediction","primary_cat":"cs.LG","submitted_at":"2026-05-30T15:44:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"MODERATE","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CrysLDNet combines VAE and latent diffusion pretraining on unlabeled crystals to improve graph encoder performance on property prediction by about 4-5% on JARVIS and MP datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00508","ref_index":3,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"V-LynX: Token Interface Alignment for Video+X LLMs","primary_cat":"cs.CV","submitted_at":"2026-05-30T03:54:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"V-LynX integrates novel modalities into frozen Video LLMs by aligning to an internalized continuous token manifold using unpaired unimodal data and attention/statistical matching.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02624","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"TadA-Bench: A Million-Variant Benchmark for Future-Round Discovery Toward Agentic Protein Engineering","primary_cat":"q-bio.QM","submitted_at":"2026-05-29T12:12:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TadA-Bench supplies a chronological million-variant wet-lab replay benchmark from 31 TadA directed-evolution rounds that evaluates models on future-round variant ranking given only earlier 
data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.31108","ref_index":19,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Remembering by Reconstructing: Domain Incremental Learning With Test-Time Training on Video Streams","primary_cat":"cs.CV","submitted_at":"2026-05-29T10:17:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Domain-incremental video learning that permits forgetting through per-domain LoRA adapters and recovers the matching adapter at inference via test-time training on a self-supervised MAE reconstruction head.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29498","ref_index":30,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Mask the Target: A Plug-and-Play Regularizer Against LoRA Forgetting","primary_cat":"cs.CL","submitted_at":"2026-05-28T07:22:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A plug-and-play KL regularizer that masks the target token and renormalizes probabilities to improve the learning-forgetting trade-off in LoRA adaptation of LLMs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27962","ref_index":9,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Bridging the Generalization Gap in Adverse Weather Segmentation: A Training Recipe Perspective","primary_cat":"cs.CV","submitted_at":"2026-05-27T04:55:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A training recipe combining domain-adaptive fine-tuning, multi-source mixing, balanced sampling, and synthetic augmentations on SegMAN-S achieves 59.9% mIoU on the 
adverse weather test set with a 6.5-point validation-test gap.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.27313","ref_index":1,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"When Does Demographic Information Help? Data and Modeling Regimes for Perspective-Aware Hate Speech Detection","primary_cat":"cs.CL","submitted_at":"2026-05-26T17:24:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Demographic information aids perspective-aware hate speech detection in regimes of low training disagreement and high test disagreement, with a gated residual model proving effective on high-disagreement examples across MHS and POPQUORN datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.07567","ref_index":35,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SurfDesign: Effective Protein Design on Molecular Surfaces","primary_cat":"q-bio.BM","submitted_at":"2026-05-25T19:53:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SurfDesign introduces surface-conditioned protein design via manifold modeling and equivariant message passing on surfaces integrated with pretrained language models, outperforming prior methods on binder and enzyme design benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25479","ref_index":22,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"MAIL++: Multi-Modal Bi-directional Agent Layer for Vision-Language Models","primary_cat":"cs.CV","submitted_at":"2026-05-25T06:35:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MAIL++ embeds 
bidirectional multi-modal agent layers with meta bridges directly into VLM computation modules for improved few-shot classification and retrieval over prior PEFT approaches.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24807","ref_index":19,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"CLIP-Guided SAM: Parameter-Efficient Semantic Conditioning for Promptable Segmentation","primary_cat":"cs.CV","submitted_at":"2026-05-24T01:40:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"CLIP-Guided SAM injects CLIP-derived features into SAM via lightweight adapters for semantic conditioning, supporting text and spatial prompts while remaining parameter-efficient and achieving competitive performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19478","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Exposing Functional Fusion: A New Class of Strategic Backdoor in Dynamic Prompt Architectures","primary_cat":"cs.CR","submitted_at":"2026-05-19T07:29:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"VIPER exposes Functional Fusion in dynamic prompt architectures, enabling a backdoor that resists pruning by tightly integrating attack and utility parameters in the same high-magnitude core.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.17270","ref_index":188,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Beyond Detection: A Structure-Aware Framework for Scene Text 
Tracking","primary_cat":"cs.CV","submitted_at":"2026-05-17T05:40:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SymTrack is the first systematic detection-free framework for scene text tracking that constructs benchmarks from video text spotting datasets and reports up to 11.97% AUC gains over prior trackers.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14948","ref_index":27,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ACE-LoRA: Adaptive Orthogonal Decoupling for Continual Image Editing","primary_cat":"cs.CV","submitted_at":"2026-05-14T15:20:27+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ACE-LoRA introduces adaptive orthogonal decoupling and rank-invariant compression for continual image editing in diffusion models, plus the CIE-Bench benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14938","ref_index":32,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Octopus: History-Free Gradient Orthogonalization for Continual Learning in Multimodal Large Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-14T15:13:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Octopus introduces history-free gradient orthogonalization in a two-stage finetuning framework to achieve state-of-the-art continual learning results for multimodal LLMs on the UCIT benchmark.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.14055","ref_index":53,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PEML: Parameter-efficient Multi-Task Learning with Optimized 
Continuous Prompts","primary_cat":"cs.CL","submitted_at":"2026-05-13T19:25:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.13421","ref_index":179,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Combining pre-trained models via localized model averaging","primary_cat":"stat.ME","submitted_at":"2026-05-13T12:16:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Localized model averaging with covariate-dependent weights achieves asymptotic optimality and weight consistency for combining pre-trained models under a general loss framework.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07244","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Experience Sharing in Mutual Reinforcement Learning for Heterogeneous Language Models","primary_cat":"cs.LG","submitted_at":"2026-05-08T05:01:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Mutual Reinforcement Learning allows heterogeneous LLMs to exchange experience through mechanisms like Peer Rollout Pooling, Cross-Policy GRPO Advantage Sharing, and Success-Gated Transfer, with outcome-level sharing identified as favorable on the stability-support 
trade-off.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.07096","ref_index":116,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Query-efficient model evaluation using cached responses","primary_cat":"cs.LG","submitted_at":"2026-05-08T01:24:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DKPS-based methods predict new model benchmark scores using cached responses, matching baseline mean absolute error with substantially fewer queries and an offline query selection approach.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05974","ref_index":132,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts","primary_cat":"cs.CR","submitted_at":"2026-05-07T10:19:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"PragLocker generates function-preserving but non-portable prompts for LLM agents via code-symbol semantic anchoring followed by target-model feedback noise injection.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.05776","ref_index":89,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"HEDP: A Hybrid Energy-Distance Prompt-based Framework for Domain Incremental Learning","primary_cat":"cs.AI","submitted_at":"2026-05-07T07:09:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HEDP uses energy regularization inspired by Helmholtz free energy plus hybrid energy-distance weighting in prompts to improve domain selection and achieve a 2.57% accuracy gain on benchmarks 
like CORe50 while mitigating catastrophic forgetting.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04651","ref_index":13,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation","primary_cat":"cs.LG","submitted_at":"2026-05-06T08:58:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FAAST performs test-time supervised adaptation by analytically deriving fast weights from examples in one forward pass, matching backprop performance with over 90% less adaptation time and up to 95% memory savings versus memory-based methods.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Specifically, let the SVD ofKbe K=UΣR ⊤,(11) 4 Forward-Only Associative Learning for Test-Time Adaptation where singular values Σ = diag(σ 1, . . . , σr) with σ1 ≥ · · · ≥σ r >0 , and singular vectors U and R have orthonor- mal columns. The pseudoinverse is then given by K † =RΣ † U ⊤,Σ † = diag(σ−1 1 , . . . , σ−1 r ),(12) and the fast weights can be written as W ⋆ =RΣ † U ⊤V.(13) The computation of W ⋆ involves only asingle forward pass over the dataand yields a deterministic solution with theoretical optimality (see Appendix B.1). 
Incremental Update Rule.A key challenge in supervised adaptation is scale: while classification tasks may involve up to 106 input-output pairs, language models may involve the order of 1010 tokens."},{"citing_arxiv_id":"2605.04447","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Deep Reprogramming Distillation for Medical Foundation Models","primary_cat":"cs.CV","submitted_at":"2026-05-06T03:22:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DRD introduces a reprogramming module and CKA-based distillation to enable efficient, robust adaptation of medical foundation models to downstream 2D/3D classification and segmentation tasks, outperforming prior PEFT and KD methods on 18 tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"into three types. First, adapter-based methods involve intro- ducing additional trainable components into the frozen back- JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 3 bone [27]-[30]. Second, prompt-based methods incorporate additional soft tokens (prompts) into the initial input, focusing on fine-tuning these specific parameters [10], [31]-[33]. The third category includes LoRA [12] and its variants [34]-[36], which are particularly notable for not increasing the inference burden. These techniques utilize low-rank matrices to simulate weight modifications during fine-tuning, allowing them to be seamlessly integrated with pre-trained weights before infer- ence. 
Moreover, the application of PEFT in medical imaging"},{"citing_arxiv_id":"2605.00650","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AdaMeZO: Adam-style Zeroth-Order Optimizer for LLM Fine-tuning Without Maintaining the Moments","primary_cat":"cs.LG","submitted_at":"2026-05-01T13:31:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AdaMeZO adapts Adam moment estimates to zeroth-order LLM fine-tuning without extra memory storage, outperforming MeZO with up to 70% fewer forward passes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00930","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"CellxPert: Inference-Time MCMC Steering of a Multi-Omics Single-Cell Foundation Model for In-Silico Perturbation","primary_cat":"q-bio.GN","submitted_at":"2026-04-30T21:34:16+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CellxPert uses inference-time MCMC steering on a multi-omics single-cell foundation model to predict genome-wide transcriptomic responses to gene perturbations and outperforms baselines on cell-type annotation, perturbation prediction, and multi-omic integration benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.26388","ref_index":14,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning","primary_cat":"cs.DC","submitted_at":"2026-04-29T07:58:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SplitFT adapts cut-layer selection and reduces LoRA rank per client in federated split learning to improve efficiency and 
performance when fine-tuning LLMs on heterogeneous devices and data.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"domains that involve private data, thereby improving their capabilities in specialized areas. As a result, fine-tuning pre-trained models for vertical domains has become a prominent research focus. To mitigate computational overhead for the large-scale model training, several parameter-efficient fine-tuning (PEFT) [10]-[12] strategies have been developed, including prompt learning [13], adapters [14], and low-rank adaptation (LoRA) [15]. These strategies involve freezing the pre-trained parameters of LLMs and training a small set of additional parameters. LoRA, in particular, has gained widespread adoption across various applications. Federated learning offers a promising approach for fine-tuning LLMs across disparate data silos containing private data [16], [17]-"},{"citing_arxiv_id":"2604.19342","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Are Large Language Models Economically Viable for Industry Deployment?","primary_cat":"cs.CL","submitted_at":"2026-04-21T11:25:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Small LLMs under 2B parameters achieve better economic break-even, energy efficiency, and hardware density than larger models on legacy GPUs for industrial tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.19015","ref_index":67,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"FedProxy: Federated Fine-Tuning of LLMs via Proxy SLMs and Heterogeneity-Aware Fusion","primary_cat":"cs.LG","submitted_at":"2026-04-21T03:06:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FedProxy replaces weak adapters with a proxy SLM for federated LLM fine-tuning, outperforming prior methods and approaching centralized performance via compression, heterogeneity-aware aggregation, and training-free fusion.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18559","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"ConforNets: Latents-Based Conformational Control in OpenFold3","primary_cat":"q-bio.BM","submitted_at":"2026-04-20T17:47:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ConforNets use channel-wise affine transforms on pre-Pairformer pair latents in OpenFold3 to achieve state-of-the-art unsupervised generation of alternate protein states and supervised conformational transfer across families.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18124","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"TLoRA: Task-aware Low Rank Adaptation of Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-20T11:43:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TLoRA jointly optimizes LoRA initialization via task-data SVD and sensitivity-driven rank allocation, delivering stronger results than standard LoRA across NLU, reasoning, math, code, and chat tasks while using fewer trainable parameters.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18026","ref_index":54,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"RASP-Tuner: Retrieval-Augmented Soft Prompts for Context-Aware Black-Box Optimization in Non-Stationary Environments","primary_cat":"cs.LG","submitted_at":"2026-04-20T09:52:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"RASP-Tuner matches or beats GP-UCB and CMA-ES regret on seven of nine synthetic non-stationary tasks while running 8-12 times faster per step.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17751","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"HiP-LoRA: Budgeted Spectral Plasticity for Robust Low-Rank Adaptation","primary_cat":"cs.LG","submitted_at":"2026-04-20T03:11:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"HiP-LoRA decomposes LoRA updates into principal and residual spectral channels with a singular-value-weighted stability budget to reduce forgetting and interference during foundation model adaptation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17725","ref_index":96,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"RePrompT: Recurrent Prompt Tuning for Integrating Structured EHR Encoders with Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-04-20T02:20:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RePrompT uses recurrent prompt tuning to inject prior-visit latent states and cohort-derived population prompt tokens into LLMs, yielding better performance than pure EHR or pure LLM baselines on MIMIC clinical prediction tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.15998","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"SCHK-HTC: Sibling Contrastive Learning with Hierarchical Knowledge-Aware Prompt Tuning for Hierarchical Text Classification","primary_cat":"cs.CL","submitted_at":"2026-04-17T12:22:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SCHK-HTC uses sibling contrastive learning plus hierarchical prompt tuning to improve discrimination between confusable sibling classes in few-shot hierarchical text classification.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.12610","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs","primary_cat":"cs.CL","submitted_at":"2026-04-14T11:36:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Tri-RAG turns external knowledge into Condition-Proof-Conclusion triplets and retrieves via the Condition anchor to improve efficiency and quality in LLM RAG.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"tured knowledge transformation retrieval framework tailored for LLM-based RAG. Tri-RAG converts external knowledge written in natural language into standardized triplets of Condition, Proof, and Conclusion, enabling retrieval over semantically focused units that expose key relational structure. The transformation is learned via soft prompt tuning [22]: with the LLM backbone frozen, a small set of trainable prompt vectors guides the model to extract explicit triplet representations from unstructured knowledge sources. During retrieval, Tri-RAG treats the triplet head (Condition) as a dynamic semantic anchor for matching and returns the full triplet to support inference. This design aims to reduce ineffective concatenation"},{"citing_arxiv_id":"2604.09034","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"The nextAI Solution to the NeurIPS 2023 LLM Efficiency Challenge","primary_cat":"cs.LG","submitted_at":"2026-04-10T06:52:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"A competition entry achieved efficient fine-tuning of LLaMa2 70B on one GPU in 24 hours with competitive QA benchmark performance.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}