DMEP prunes experts module-by-module in LoRA-MoE and removes load balancing after pruning, cutting trainable parameters 35-43% and raising throughput ~10% while matching or exceeding uniform baselines on reasoning tasks.
hub Mixed citations
Lora: Low-rank adaptation of large language models
Mixed citation behavior. Most common role is method (67%).
hub tools
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 22representative citing papers
HOSL reduces client memory up to 3.7x versus full first-order split learning while staying within 0.20-4.23% accuracy on OPT models by pairing client zeroth-order estimation with server first-order optimization.
S2ST-Omni 2 uses typology-informed hierarchical encoding, gated Dual-CTC, and typology-aware prompting to improve multilingual S2ST over flat-label baselines on CVSS-C, with gains in low-data regimes.
The work creates identity-consistent synthetic makeup data via ConsistentBeauty and adapts models to real images using reinforcement learning in RealBeauty, achieving better identity preservation and real-world performance than prior methods.
Bayesian fine-tuning of large models can be done efficiently by projecting uncertainties into low-dimensional subspaces, yielding improved calibration and generalization while keeping computational costs low.
DiCLIP uses diffusion-based visual correlation enhancement and text semantic augmentation to improve CLIP-generated class activation maps for weakly supervised semantic segmentation, outperforming prior methods on PASCAL VOC and MS COCO.
BID-LoRA uses bi-directional low-rank adapters with retain/new/unlearn pathways and escape unlearning to enable continual learning and unlearning while minimizing knowledge leakage and parameter updates.
Tri-RAG turns external knowledge into Condition-Proof-Conclusion triplets and retrieves via the Condition anchor to improve efficiency and quality in LLM RAG.
StructDiff adds adaptive receptive fields and 3D positional encoding to a single-scale diffusion model to preserve structure and enable spatial control in single-image generation.
EventFace achieves 94.19% Rank-1 accuracy and 5.35% EER on a new small event-based face dataset by transferring facial structure priors via LoRA and fusing them with temporal motion features.
LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.
NavCrafter generates controllable novel-view videos from one image via video diffusion, geometry-aware expansion, and enhanced 3D Gaussian Splatting to achieve state-of-the-art synthesis under large viewpoint changes.
LLMs using in-context learning and fine-tuning on listener experiment data generate equalization settings that align better with population preferences than random sampling or static presets.
CoMelSinger introduces a discrete token-based zero-shot SVS framework on MaskGCT with coarse-to-fine contrastive learning and an SVT module to improve melody control and reduce prosody leakage.
Introduces CompliVision dataset and active learning framework for rule-based hazard compliance assessment using vision-language models grounded in safety standards.
iGSP uses implicit gradient subspace projection in two phases to enable efficient continual adaptation of vision-language models, claiming SOTA accuracy with 42.7% fewer trainable parameters and 86.9% less total parameter growth.
SplitFT adapts cut-layer selection and reduces LoRA rank per client in federated split learning to improve efficiency and performance when fine-tuning LLMs on heterogeneous devices and data.
CreatiParser decomposes raster graphic designs into editable text, background, and sticker layers via a hybrid VLM-diffusion model with ParserReward and GRPO optimization, reporting 23.7% average metric gains on Parser-40K and Crello datasets.
ZSG-IAD is a zero-shot multimodal system that uses language-guided two-hop grounding and rule-based reinforcement learning to produce anomaly masks and explainable reports from industrial sensor data.
RadarPLM adapts PLMs for marine radar target detection with lightweight adaptation and selective fine-tuning based on online learning values, reporting at least 6.35% average detection gains in low SCR conditions.
BGG adapts vision foundation models using multi-granularity dilated convolutions and frequency-domain patch aggregation to achieve state-of-the-art cross-view geo-localization on University-1652 and SUES-200 with low training cost.
A lightweight hybrid CNN-Transformer framework for heterogeneous face recognition achieves competitive performance on cross-spectral benchmarks and standard RGB tasks using contrastive alignment and distillation.
citing papers explorer
-
Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning
DMEP prunes experts module-by-module in LoRA-MoE and removes load balancing after pruning, cutting trainable parameters 35-43% and raising throughput ~10% while matching or exceeding uniform baselines on reasoning tasks.
-
HOSL: Hybrid-Order Split Learning for Memory-Constrained Edge Training
HOSL reduces client memory up to 3.7x versus full first-order split learning while staying within 0.20-4.23% accuracy on OPT models by pairing client zeroth-order estimation with server first-order optimization.
-
From Flat Language Labels to Typological Priors: Structured Language Conditioning for Multilingual Speech-to-Speech Translation
S2ST-Omni 2 uses typology-informed hierarchical encoding, gated Dual-CTC, and typology-aware prompting to improve multilingual S2ST over flat-label baselines on CVSS-C, with gains in low-data regimes.
-
From Synthetic to Real: Toward Identity-Consistent Makeup Transfer with Synthetic and Real Data
The work creates identity-consistent synthetic makeup data via ConsistentBeauty and adapts models to real images using reinforcement learning in RealBeauty, achieving better identity preservation and real-world performance than prior methods.
-
Bayesian Fine-tuning in Projected Subspaces
Bayesian fine-tuning of large models can be done efficiently by projecting uncertainties into low-dimensional subspaces, yielding improved calibration and generalization while keeping computational costs low.
-
DiCLIP: Diffusion Model Enhances CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation
DiCLIP uses diffusion-based visual correlation enhancement and text semantic augmentation to improve CLIP-generated class activation maps for weakly supervised semantic segmentation, outperforming prior methods on PASCAL VOC and MS COCO.
-
BID-LoRA: A Parameter-Efficient Framework for Continual Learning and Unlearning
BID-LoRA uses bi-directional low-rank adapters with retain/new/unlearn pathways and escape unlearning to enable continual learning and unlearning while minimizing knowledge leakage and parameter updates.
-
Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs
Tri-RAG turns external knowledge into Condition-Proof-Conclusion triplets and retrieves via the Condition anchor to improve efficiency and quality in LLM RAG.
-
StructDiff: A Structure-Preserving and Spatially Controllable Diffusion Model for Single-Image Generation
StructDiff adds adaptive receptive fields and 3D positional encoding to a single-scale diffusion model to preserve structure and enable spatial control in single-image generation.
-
EventFace: Event-Based Face Recognition via Structure-Driven Spatiotemporal Modeling
EventFace achieves 94.19% Rank-1 accuracy and 5.35% EER on a new small event-based face dataset by transferring facial structure priors via LoRA and fusing them with temporal motion features.
-
LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection
LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.
-
NavCrafter: Exploring 3D Scenes from a Single Image
NavCrafter generates controllable novel-view videos from one image via video diffusion, geometry-aware expansion, and enhanced 3D Gaussian Splatting to achieve state-of-the-art synthesis under large viewpoint changes.
-
One Prompt, Many Sounds: Modeling Listener Variability in LLM-Based Equalization
LLMs using in-context learning and fine-tuning on listener experiment data generate equalization settings that align better with population preferences than random sampling or static presets.
-
CoMelSinger: Discrete Token-Based Zero-Shot Singing Synthesis With Structured Melody Control and Guidance
CoMelSinger introduces a discrete token-based zero-shot SVS framework on MaskGCT with coarse-to-fine contrastive learning and an SVT module to improve melody control and reduce prosody leakage.
-
General Hazard Detection
Introduces CompliVision dataset and active learning framework for rule-based hazard compliance assessment using vision-language models grounded in safety standards.
-
iGSP:Implicit Gradient Subspace Projection for Efficient Continual Learning of Vision-Language Models
iGSP uses implicit gradient subspace projection in two phases to enable efficient continual adaptation of vision-language models, claiming SOTA accuracy with 42.7% fewer trainable parameters and 86.9% less total parameter growth.
-
SplitFT: An Adaptive Federated Split Learning System For LLMs Fine-Tuning
SplitFT adapts cut-layer selection and reduces LoRA rank per client in federated split learning to improve efficiency and performance when fine-tuning LLMs on heterogeneous devices and data.
-
CreatiParser: Generative Image Parsing of Raster Graphic Designs into Editable Layers
CreatiParser decomposes raster graphic designs into editable text, background, and sticker layers via a hybrid VLM-diffusion model with ParserReward and GRPO optimization, reporting 23.7% average metric gains on Parser-40K and Crello datasets.
-
ZSG-IAD: A Multimodal Framework for Zero-Shot Grounded Industrial Anomaly Detection
ZSG-IAD is a zero-shot multimodal system that uses language-guided two-hop grounding and rule-based reinforcement learning to produce anomaly masks and explainable reports from industrial sensor data.
-
RadarPLM: Adapting Pre-trained Language Models for Marine Radar Target Detection by Selective Fine-tuning
RadarPLM adapts PLMs for marine radar target detection with lightweight adaptation and selective fine-tuning based on online learning values, reporting at least 6.35% average detection gains in low SCR conditions.
-
BGG: Bridging the Geometric Gap between Cross-View images by Vision Foundation Model Adaptation for Geo-Localization
BGG adapts vision foundation models using multi-granularity dilated convolutions and frequency-domain patch aggregation to achieve state-of-the-art cross-view geo-localization on University-1652 and SUES-200 with low training cost.
-
Lightweight Cross-Spectral Face Recognition via Contrastive Alignment and Distillation
A lightweight hybrid CNN-Transformer framework for heterogeneous face recognition achieves competitive performance on cross-spectral benchmarks and standard RGB tasks using contrastive alignment and distillation.