{"total":10,"items":[{"citing_arxiv_id":"2605.18591","ref_index":59,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation","primary_cat":"cs.LG","submitted_at":"2026-05-18T16:05:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"RAT reformulates regularized natural policy gradients as vanilla gradients with a transformed advantage, computed efficiently via randomized block Kaczmarz iterations on on-policy data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11753","ref_index":184,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Towards Visually Grounded Multimodal Summarization via Cross-Modal Transformer and Gated Attention","primary_cat":"cs.AI","submitted_at":"2026-05-12T08:28:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SPeCTrA-Sum uses hierarchical cross-modal fusion via DVP and DPP-distilled image selection via VRP to generate more accurate and visually grounded multimodal summaries.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11693","ref_index":185,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Measuring What Matters Beyond Text: Evaluating Multimodal Summaries by Quality, Alignment, and Diversity","primary_cat":"cs.AI","submitted_at":"2026-05-12T07:50:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MM-Eval unifies evaluation of multimodal summaries by integrating factual text quality, cross-modal relevance via MLLM judge, and visual diversity via truncated CLIP entropy, then calibrates their combination on human preferences.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01520","ref_index":20,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"MIRL: Mutual Information-Guided Reinforcement Learning for Vision-Language Models","primary_cat":"cs.CV","submitted_at":"2026-05-02T16:21:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MIRL uses mutual information to guide trajectory selection and provide separate rewards for visual perception in RLVR for VLMs, achieving 70.22% average accuracy with 25% fewer full trajectories.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01283","ref_index":124,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Developing a Strong Pre-Trained Base Model for Plant Leaf Disease Classification","primary_cat":"cs.CV","submitted_at":"2026-05-02T06:33:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A DenseNet201 base model trained on a constructed plant leaf disease dataset outperforms baselines and enables faster, more robust transfer learning with less data than general models.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"As such, the creators of ConvNeXt built a model with similar characteristics based on CNN to rival these Vision Transformer and to combine the advantages of both. It does that by including techniques such as large kernels (7x7), layer normalization, residual connections GELU activations and inverted bottleneck (see Figure B.3). 5.8 DenseNet The DenseNet [124] family of networks is also based on residual type connections (see Figure B.4), but unlike ResNet, here dense connections are used. They work differently where, instead of adding layers, they append the channels to one another, also preserving earlier features and reducing the vanishing gradient problem in the mean time. 44 6. Benchmarking To this day there seems to be no real consensus on which model is best suited to handle the task"},{"citing_arxiv_id":"2604.20333","ref_index":5,"ref_count":2,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Quantization robustness from dense representations of sparse functions in high-capacity kernel associative memory","primary_cat":"cs.NE","submitted_at":"2026-04-22T08:29:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"KLR Hopfield networks exhibit robustness to quantization but sensitivity to pruning, interpreted as arising from dense bimodal parameterization of sparse input mappings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08863","ref_index":2,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Hidden in Plain Sight: Visual-to-Symbolic Analytical Solution Inference from Field Visualizations","primary_cat":"cs.AI","submitted_at":"2026-04-10T01:52:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ViSA-R2 recovers single executable SymPy expressions for linear steady-state fields from visualizations using a self-verifying chain-of-thought that recognizes patterns, hypothesizes solution families, derives parameters, and checks consistency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.06014","ref_index":3,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Gated-SwinRMT: Unifying Swin Windowed Attention with Retentive Manhattan Decay via Input-Dependent Gating","primary_cat":"cs.LG","submitted_at":"2026-04-07T16:12:31+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Gated-SwinRMT unifies Swin windowed attention with retentive Manhattan decay via gating, reaching 80.22% top-1 accuracy on Mini-ImageNet versus 73.74% for the RMT baseline.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2510.09696","ref_index":40,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Vanishing Contributions: A Unified Framework for Smooth and Iterative Model Compression","primary_cat":"cs.LG","submitted_at":"2025-10-09T15:17:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"UNKNOWN","novelty_score":5.0,"formal_verification":"none","one_line_summary":"VCON is a unified framework for smooth iterative DNN compression that uses parallel execution and an affine combination to progressively replace the original model with its compressed form during fine-tuning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2208.07339","ref_index":37,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale","primary_cat":"cs.LG","submitted_at":"2022-08-15T17:08:50+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}