DAIN reframes multimodal fusion as dynamic agent collaboration with sparse activation, claiming state-of-the-art results across five benchmarks, including a 2.6% accuracy gain on ADNI.
FineVideo: A fine-grained dataset for video understanding. arXiv preprint arXiv:2405.00000
3 Pith papers cite this work, all from 2026. Polarity classification is still indexing.
Citation role: background (1). Verdicts: 3 unverdicted.
Representative citing papers:
LiteFrame is an efficient vision encoder backbone trained with Compressed Token Distillation and Language Model Adaptation to scale frame count in Video LLMs while cutting latency and raising accuracy.
Neural networks for HEP tasks can be fooled at significant rates by subtle perturbations inside uncertainty envelopes, revealing hidden systematics not captured by conventional methods.
Citing papers explorer:
LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs