MoELoRA: Contrastive learning guided mixture of experts on parameter-efficient fine-tuning for large language models
9 Pith papers cite this work.
citation-role summary
background: 2
citing papers explorer
- When Text Hijacks Vision: Benchmarking and Mitigating Text Overlay-Induced Hallucination in Vision Language Models
  VLMs hallucinate by prioritizing contradictory on-screen text over visual content. The paper addresses this with the VisualTextTrap benchmark (6,057 human-validated samples) and the VTHM-MoE dual-encoder framework, which uses dimension-specific experts and adaptive routing.
- Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms
  Queryable LoRA adds dynamic routing over shared low-rank atoms with attention and language-instruction regularization to make parameter-efficient fine-tuning more adaptive across inputs and layers.
- Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning
  DMEP prunes experts module-by-module in LoRA-MoE and removes load balancing after pruning, cutting trainable parameters 35-43% and raising throughput ~10% while matching or exceeding uniform baselines on reasoning tasks.
- Rethinking Parameter Sharing for LLM Fine-Tuning with Multiple LoRAs
  By sharing the B matrix across adapters instead of the A matrix, ALoRA and Fed-ALoRA deliver more balanced performance in multi-task and federated LLM fine-tuning.
- Sub-Token Routing in LoRA for Adaptation and Query-Aware KV Compression
  Sub-token routing in LoRA-adapted transformers adds a finer compression axis for KV caches, with query-independent and query-aware designs that improve efficiency under reduced budgets when combined with token-level selection.
- TalkLoRA: Communication-Aware Mixture of Low-Rank Adaptation for Large Language Models
  TalkLoRA equips MoE-LoRA experts with a communication module that smooths routing dynamics and improves performance on language tasks under similar parameter budgets.
- LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing
  LoRA-Mixer routes modular LoRA experts into attention projection matrices with an adaptive Routing Specialization Loss to improve multi-task performance while using fewer trainable parameters than prior LoRA-MoE methods.
- TriageRA-CCF: Source-Side Clinical Confidence and Coverage Signals for Adaptive Rank Budgeting in Medical LLMs
  TriageRA-CCF combines source-side confidence, coverage, and counterfactual signals to supervise an adaptive LoRA rank router, reporting modest average accuracy gains over LoRA/DoRA/MoELoRA baselines on two 8B models under matched training.
- KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Controls
  KORE combines knowledge-oriented data augmentations with null-space projections on activation covariance matrices to inject new knowledge into LMMs while preserving prior knowledge.