XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks
8 Pith papers cite this work.
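The operation named in the title — replacing the real-valued dot products inside convolutions with XNOR and popcount over sign bits, then rescaling by the mean weight and input magnitudes — can be sketched in NumPy. This is a minimal illustration, not the paper's kernel; the function name `binary_dot` and the bit-packing scheme are assumptions:

```python
import numpy as np

def binary_dot(w, x):
    """XNOR-Net-style approximation of <w, x>: binarize both vectors to
    {-1, +1}, compute the dot product via XNOR + popcount on packed bits,
    then rescale by the mean magnitudes (alpha for w, beta for x).
    Illustrative sketch; zeros binarize to +1 here."""
    n = len(w)
    wb = np.packbits(w >= 0)           # sign bits, 1 = non-negative
    xb = np.packbits(x >= 0)
    xnor = ~(wb ^ xb)                  # bit set where the two signs agree
    agree = sum(bin(int(b)).count("1") for b in xnor)  # popcount
    agree -= wb.size * 8 - n           # drop padding bits (both pad with 0, so they "agree")
    dot_pm1 = 2 * agree - n            # agreements minus disagreements
    return dot_pm1 * np.abs(w).mean() * np.abs(x).mean()
```

In the paper this structure lets a convolution run almost entirely in bitwise operations: weights are binarized once, activations per input, and only the two scalar scaling factors remain floating point.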
representative citing papers
-
Measuring What Matters Beyond Text: Evaluating Multimodal Summaries by Quality, Alignment, and Diversity
MM-Eval unifies evaluation of multimodal summaries by integrating factual text quality, cross-modal relevance via an MLLM judge, and visual diversity via truncated CLIP entropy, then calibrates their combination on human preferences.
-
MIRL: Mutual Information-Guided Reinforcement Learning for Vision-Language Models
MIRL uses mutual information to guide trajectory selection and to provide separate rewards for visual perception in RLVR for VLMs, achieving 70.22% average accuracy with 25% fewer full trajectories.
-
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
LLM.int8() performs 8-bit inference for transformers with up to 175B parameters and no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.
-
Towards Visually Grounded Multimodal Summarization via Cross-Modal Transformer and Gated Attention
SPeCTrA-Sum uses hierarchical cross-modal fusion via DVP and DPP-distilled image selection via VRP to generate more accurate and visually grounded multimodal summaries.
-
Hidden in Plain Sight: Visual-to-Symbolic Analytical Solution Inference from Field Visualizations
ViSA-R2 recovers single executable SymPy expressions for linear steady-state fields from visualizations using a self-verifying chain-of-thought that recognizes patterns, hypothesizes solution families, derives parameters, and checks consistency.
-
Quantization robustness from dense representations of sparse functions in high-capacity kernel associative memory
KLR Hopfield networks exhibit robustness to quantization but sensitivity to pruning, interpreted as arising from dense bimodal parameterization of sparse input mappings.
-
Gated-SwinRMT: Unifying Swin Windowed Attention with Retentive Manhattan Decay via Input-Dependent Gating
Gated-SwinRMT unifies Swin windowed attention with retentive Manhattan decay via input-dependent gating, reaching 80.22% top-1 accuracy on Mini-ImageNet versus 73.74% for the RMT baseline.
-
Developing a Strong Pre-Trained Base Model for Plant Leaf Disease Classification
A DenseNet201 base model trained on a constructed plant leaf disease dataset outperforms baselines and enables faster, more robust transfer learning with less data than general models.
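The vector-wise-plus-outliers scheme described in the LLM.int8() entry above can be sketched in NumPy. This is a toy illustration under assumed names (`mixed_int8_matmul`, `threshold`); the actual implementation is the bitsandbytes library running fp16/int8 kernels on GPU:

```python
import numpy as np

def mixed_int8_matmul(X, W, threshold=6.0):
    """Toy sketch of the LLM.int8() decomposition: feature dimensions of X
    that contain an outlier (|x| > threshold) bypass quantization; the rest
    use vector-wise int8 quantization with per-row / per-column scales."""
    outlier = (np.abs(X) > threshold).any(axis=0)   # outlier feature dims
    regular = ~outlier
    Xr, Wr = X[:, regular], W[regular, :]
    # Vector-wise scales: one per row of X, one per column of W.
    sx = np.maximum(np.abs(Xr).max(axis=1, keepdims=True), 1e-12) / 127.0
    sw = np.maximum(np.abs(Wr).max(axis=0, keepdims=True), 1e-12) / 127.0
    Xq = np.round(Xr / sx).astype(np.int8)
    Wq = np.round(Wr / sw).astype(np.int8)
    # int8 x int8 accumulated in int32, dequantized by the outer product of scales.
    Y = (Xq.astype(np.int32) @ Wq.astype(np.int32)) * (sx * sw)
    # Outlier dimensions stay in higher precision (fp16 in the paper).
    return Y + X[:, outlier] @ W[outlier, :]
```

Separating the outlier dimensions matters because a handful of systematically large activation features would otherwise inflate the quantization scales and destroy the resolution available to the remaining ~99.9% of values.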