{"total":17,"items":[{"citing_arxiv_id":"2605.29391","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Phast: Simultaneous reconstruction of photoelectron count and time profiles from PMT waveforms via machine learning","primary_cat":"hep-ex","submitted_at":"2026-05-28T05:44:01+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Phast applies a transformer encoder plus count-conditioned query decoder to reconstruct photoelectron count and time profiles from simulated PMT waveforms on toy Monte Carlo datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20649","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI","primary_cat":"eess.SP","submitted_at":"2026-05-20T03:09:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AMAR uses a transformer with learnable query embeddings for set-based prediction of concurrent activities from composite Wi-Fi CSI, combined with edge feature extraction and vector quantization for bandwidth-efficient deployment.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.00244","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation","primary_cat":"cs.RO","submitted_at":"2026-04-30T21:25:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"[15] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko. End-to-end object detection with transformers, 2020. URLhttps://arxiv.org/abs/2005.12872. 9 [16] T. Garipov, S. D. Peuter, G. Yang, V . Garg, S. Kaski, and T. Jaakkola. Compositional sculpting of iterative generative processes, 2023. URLhttps://arxiv.org/abs/2309.16115. [17] C. Chi, Z. Xu, S. Feng, E. Cousineau, Y . Du, B. Burchfiel, R. Tedrake, and S. Song. Diffusion policy: Visuomotor policy learning via action diffusion.The International Journal of Robotics Research, 2024. [18] M. Janner, Y . Du, J. B. Tenenbaum, and S. Levine. Planning with diffusion for flexible behavior synthesis, 2022. URLhttps://arxiv.org/abs/2205."},{"citing_arxiv_id":"2604.26633","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SynSur: An end-to-end generative pipeline for synthetic industrial surface defect generation and detection","primary_cat":"cs.CV","submitted_at":"2026-04-29T12:57:32+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A generative pipeline creates realistic synthetic pitting defects and other surface flaws that, when added to real training data, yield modest gains in industrial defect detectors without replacing the need for authentic samples.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.21184","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Predicting the thermodynamics in the chromosphere from the translation of SDO data into the IRIS$^{2}$ inversion results using a visual transformer model","primary_cat":"astro-ph.SR","submitted_at":"2026-04-23T01:00:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A visual transformer model trained on IRIS inversions predicts chromospheric temperature and density from SDO data with correlations around 0.8 on 80% of test cases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20000","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"RareSpot+: A Benchmark, Model, and Active Learning Framework for Small and Rare Wildlife in Aerial Imagery","primary_cat":"cs.CV","submitted_at":"2026-04-21T21:18:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"RareSpot+ boosts small-object detection mAP by 0.13 on aerial wildlife data and cuts annotation needs to 1.7% of tiles via consistency losses and spatial priors.","context_count":1,"top_context_role":"background","top_context_polarity":"unclear","context_text":"labeledPrediction Uncertainty Score,U score, com- puted on a held-out validation split with the base detector frozen. The score aggregates multiple sources of detection error, including localization inaccuracies, classification ambiguity, false nega- tives, and false positives: Uscore = 1 |P|+|G| − |M| \" X (pi,gj )∈M 1−c i ·IoU(p i, gj) \u0001 +|G F N|+ X pi∈PF P min(1,0.5 +c i) # . (7) wherePandGdenote the sets of predictions and ground-truth objects, respectively,Mis the set of matched prediction-ground-truth pairs,c i is the confidence score of predictionp i, and IoU(·) mea- sures localization overlap. The normalization term ensures thatU score is invariant to the number of objects per tile. By training MUH to regressU score, the detec-"},{"citing_arxiv_id":"2604.13479","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning Class Difficulty in Imbalanced Histopathology Segmentation via Dynamic Focal Attention","primary_cat":"eess.IV","submitted_at":"2026-04-15T05:06:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Dynamic Focal Attention learns class-specific difficulty via per-class biases in attention logits, improving Dice and IoU on imbalanced histopathology segmentation benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.11081","ref_index":2,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MapATM: Enhancing HD Map Construction through Actor Trajectory Modeling","primary_cat":"cs.CV","submitted_at":"2026-04-13T07:02:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MapATM improves lane divider AP by 4.6 and mAP by 2.6 on NuScenes by treating actor trajectories as structural priors for road geometry.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.11042","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization","primary_cat":"cs.CV","submitted_at":"2026-04-13T06:14:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VLM-based harmonization of inconsistent annotations across two document layout corpora raises detection F-score from 0.860 to 0.883 and table TEDS from 0.750 to 0.814 while tightening embedding clusters.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"[Online]. Available: https://arxiv.org/abs/1506.02640 [11] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, \"YOLOX: Exceeding YOLO Series in 2021,\" arXiv preprint arXiv:2107.08430, 2021, [Online]. Available: https://arxiv.org/abs/2107. 08430 [12] G. Jocher, A. Chaurasia, and J. Qiu, \"YOLOv8.\" [Online]. Available: https://github.com/ ultralytics/ultralytics [13] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, \"End-to- End Object Detection with Transformers,\" in European Conference on Computer Vision (ECCV), 2020. [Online]. Available: https://arxiv.org/abs/2005.12872 [14] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, \"Deformable DETR: Deformable Transformers for End-to-End Object Detection,\" in International Conference on Learning"},{"citing_arxiv_id":"2604.09996","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Comparative Study of Modern Object Detectors for Robust Apple Detection in Orchard Imagery","primary_cat":"cs.CV","submitted_at":"2026-04-11T02:56:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"YOLO11n achieves the highest mAP@0.5:0.95 of 0.6065 for apple localization, with other detectors showing trade-offs in recall and precision at low confidence thresholds.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.04127","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection","primary_cat":"cs.CV","submitted_at":"2026-04-05T14:15:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SARES-DEIM achieves 76.4% mAP50:95 and 93.8% mAP50 on HRSID by routing SAR features through sparse frequency and wavelet experts plus a high-resolution preservation neck, outperforming prior YOLO and SAR detectors.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"The YOLO-series [6]-[8], while establishing popular bench- marks for one-stage architectures, relies heavily on hand- crafted heuristics such as predefined anchor settings and Non- Maximum Suppression (NMS). These manual components often exhibit limited generalization across diverse SAR sensors and sea conditions where scattering patterns vary significantly. Conversely, DETR-style detectors [9]-[13] offer a streamlined Fenghao Song, Shaojing Yang, and Xi Zhou are with Yunnan Normal University, No. 768, Jucheng Avenue, Chenggong District, Kunming 650500, China. Corresponding author: Fenghao Song (e-mail: 2424420029@ynnu.edu.cn). end-to-end framework, eliminating the need for complex hand- crafted priors by formulating detection as a bipartite matching"},{"citing_arxiv_id":"2604.04086","ref_index":59,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection","primary_cat":"cs.CV","submitted_at":"2026-04-05T12:08:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Girshick, \"Masked au- toencoders are scalable vision learners,\" inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022, pp. 16 000-16 009. [58] Y .-J. Heo, W.-H. Yeo, and B.-G. Kim, \"Deepfake detection algorithm based on improved vision transformer,\"Applied Intelligence, vol. 53, no. 7, pp. 7512-7527, 2023. [59] D. Wodajo and S. Atnafu, \"Deepfake video detection using convolutional vision transformer,\"arXiv preprint arXiv:2102.11126, 2021. [60] S. A. Khan and H. Dai, \"Video transformer for deepfake detection with incremental learning,\" inProceedings of the 29th ACM international conference on multimedia, 2021, pp. 1821-1828. [61] J. Wang, Z. Wu, W. Ouyang, X."},{"citing_arxiv_id":"2604.02753","ref_index":3,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection","primary_cat":"cs.CV","submitted_at":"2026-04-03T05:56:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DeCo-DETR builds hierarchical semantic prototypes offline and uses decoupled training streams to deliver competitive zero-shot open-vocabulary detection with improved inference speed.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.13193","ref_index":51,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control","primary_cat":"cs.RO","submitted_at":"2026-02-13T18:57:56+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.07553","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Transformers Provably Learn Sparse XOR with Polylogarithmic Parameters","primary_cat":"cs.LG","submitted_at":"2025-02-11T13:41:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Single-layer two-head Transformers learn sparse XOR with O(polylog(d)) parameters in one gradient step, breaking the Omega(d) parameter bottleneck of FFNNs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2304.13705","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware","primary_cat":"cs.RO","submitted_at":"2023-04-23T19:10:53+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Low-cost imprecise robots achieve 80-90% success on six fine bimanual manipulation tasks using imitation learning with a new Action Chunking with Transformers algorithm trained on only 10 minutes of demonstrations.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"of hardware costs. Early works tackle bimanual manipulation from a classical control perspective, with known environment dynamics [ 54, 48], but designing such models can be time- consuming, and they may not be accurate for objects with complex physical properties. More recently, learning has been incorporated into bimanual systems, such as reinforcement learning [9, 10], imitating human demonstrations [ 34, 37, 59, 67, 32], or learning to predict key points that chain together top camera front camera wrist camerawrist camera grip tape see-through gripper adjustable velcro 50cm 60cm #Dofs Reach Span Repeatability Accuracy Working Payload 6+gripper 750mm 1500mm 1mm 5-8mm 750g ViperX 6dof Arm (follower) red: bimanual workspace"},{"citing_arxiv_id":"2212.12130","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning to Detect and Segment for Open Vocabulary Object Detection","primary_cat":"cs.CV","submitted_at":"2022-12-23T03:54:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CondHead conditionally parameterizes detection heads on semantic embeddings via aggregated expert and dynamically generated streams to improve generalization for novel categories.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}