6 Pith papers cite this work.
citing papers explorer
- Lost in the Hype: Revealing and Dissecting the Performance Degradation of Medical Multimodal Large Language Models in Image Classification
  Medical MLLMs degrade on image classification due to four failure modes in visual representation quality, connector projection fidelity, LLM comprehension, and semantic mapping alignment, quantified by feature probing on 14 models across 3 datasets.
- Compute Only Once: UG-Separation for Efficient Large Recommendation Models
  The UG-Separation framework disentangles user-side and item-side flows in TokenMixer dense-interaction models to enable reusable user computations, cutting inference latency by up to 20% in ByteDance production scenarios.
- GenHAR: Generalizing Cross-domain Human Activity Recognition for Last-mile Delivery
  GenHAR improves cross-domain human activity recognition accuracy by 9.97% with 6.4x fewer FLOPs via tokenized sensor data, frequency channel correlations, selective masking, and efficient attention, with a deployment detecting 2.15 billion activities.
- KAN-MLP-Mixer: A comprehensive investigation of the usage of Kolmogorov-Arnold Networks (KANs) for improving IMU-based Human Activity Recognition
  A hybrid KAN-MLP architecture with a KAN input embedding and a specialized LarctanKAN classification layer yields a 5.33% average macro F1 gain over pure-MLP baselines in IMU-based human activity recognition.
- ALAS: Adaptive Long-Horizon Action Synthesis via Async-pathway Stream Disentanglement
  ALAS disentangles environment and self-state streams via bio-inspired modules to deliver 23% higher subtask success and 29% better execution efficiency on long-horizon HSI tasks.
- Beyond Dense Connectivity: Explicit Sparsity for Scalable Recommendation
  SSR uses static random filters and iterative competitive sparse mechanisms to explicitly enforce sparsity in recommendation models, outperforming dense baselines on public and billion-scale industrial datasets.