{"total":14,"items":[{"citing_arxiv_id":"2606.24122","ref_index":52,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bengal-HP_RU: A Dataset of Bengal People For Head Pose Estimation","primary_cat":"cs.CV","submitted_at":"2026-06-23T04:13:59+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Bengal-HP_RU is the first publicly available head pose dataset for Bengali subjects, with 12,894 images collected from Wikimedia Commons and partitioned by uploader identity.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.22945","ref_index":13,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Controllable Texture Tiling with Transformed RoPE-Enhanced Diffusion Models","primary_cat":"cs.GR","submitted_at":"2026-06-22T07:24:15+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A Diffusion Transformer framework applies coordinate-transformed RoPE and disjoint attention masks to achieve controllable, high-fidelity texture tiling that preserves reference structure and scene lighting.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.19002","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Enhancing Multilingual Reasoning via Steerable Model Merging","primary_cat":"cs.CL","submitted_at":"2026-06-17T12:28:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ST-Merge uses gated cross-attention to adaptively weight source models during merging, outperforming baselines on multilingual reasoning tasks across 21 
languages.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.06103","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MS-DKC: A Dataset Knowledge Card Framework for Designing and Adapting Medical Image Segmentation Models","primary_cat":"cs.CV","submitted_at":"2026-06-04T12:45:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MS-DKC is a dataset knowledge card framework that maps image, morphology, supervision, context, and risk descriptors to design priors and failure modes, shown to produce dataset-specific model adaptations with improved metrics on DRIVE, ISIC2018, and ACDC.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.02569","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AdaCodec: A Predictive Visual Code for Video MLLMs","primary_cat":"cs.CV","submitted_at":"2026-06-01T17:56:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AdaCodec introduces a predictive visual code that cuts visual token use in video MLLMs by sending full frames only on high predictive cost and otherwise encoding inter-frame changes as P-tokens, yielding better benchmark scores at lower budgets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.29260","ref_index":60,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Deep Psychovisual Image Representations","primary_cat":"cs.CV","submitted_at":"2026-05-28T02:24:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Proposes a psychovisual-inspired deep learning method that encodes images in learned frequency 
sub-bands for interpretable semantic structures and reduced depth dependence.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12792","ref_index":110,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions","primary_cat":"cs.LG","submitted_at":"2026-05-12T22:10:01+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"NTGA is the first clean-label generalization attack under black-box settings but is vulnerable to adversarial training and image transformations, with newer attacks outperforming it.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.18088","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Autonomous Unmanned Aircraft Systems for Enhanced Search and Rescue of Drowning Swimmers: Image-Based Localization and Mission Simulation","primary_cat":"cs.CV","submitted_at":"2026-04-20T11:05:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"A UAS with YOLO-based swimmer detection and DES simulations reduces drowning rescue response time by a factor of five versus standard operations in tested lake areas.","context_count":1,"top_context_role":"dataset","top_context_polarity":"background","context_text":"collection and use of sensitive training data raises ethical and legal questions, especially regarding data privacy. These considerations are particularly important in the European Union (EU), where data protection regulations are stringent [9]. However, a variety of datasets have already been created to support human detection in S&R context, such as VisDrone [10], SeaDronesSee [11], UMA-SAR [12], and SARD [13]. 
Frequently used detection frameworks for these tasks include variants of YOLO [14, 15], Single Shot Detectors (SSDs) [16] and Region-based Convolutional Neural Networks (R-CNNs) [17, 18]. Hasan et al. [19] introduced a dataset on human behavior in swimming pool environments, combining above-water and underwater im-"},{"citing_arxiv_id":"2604.13309","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Utilizing Inpainting for Keypoint Detection for Vision-Based Control of Robotic Manipulators","primary_cat":"cs.RO","submitted_at":"2026-04-14T21:23:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A framework trains keypoint detectors on inpainted markerless robot images and uses runtime inpainting plus UKF for robust vision-based control without models or calibration.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"estimation of ˆJc as shown in Eq. 10. ∆ ˆJci[k] =γ ˙QT ( ˙QJci[k]− ˙P)(9) ˆJci[k+ 1] = ∆ ˆJci[k] + ˆJci[k](10) Finally, the image feature error is utilized with a proportional control law and the current estimate of ˆJc, as shown in Eq. 11, to generate reference joint velocities that drive the robot to its desired configuration in the image. ˙ qr =−λJ + c e(11) In our implementation, we initialize the adaptive scheme online by providing the system with a small excitation velocity trajectory to estimate an initial combined Jacobian. After this initial estimation, the algorithm continuously updates the Jacobian in runtime during control. 
For adaptive schemes, it is important to select the window size carefully."},{"citing_arxiv_id":"2505.20535","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Rotary Masked Autoencoders are Versatile Learners","primary_cat":"cs.LG","submitted_at":"2025-05-26T21:45:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RoMAE applies rotary positional embeddings to masked autoencoders to enable representation learning and interpolation on continuous positional data across irregular time-series, images, and audio without modality-specific modifications.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2503.13868","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Out-of-Distribution Generalization in Time Series: A Survey","primary_cat":"cs.LG","submitted_at":"2025-03-18T03:35:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"This is the first comprehensive survey of OOD generalization methodologies for time series, organized across data distribution, representation learning, and OOD evaluation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Out-of-Distribution Generalization in Time Series: A Survey Out-of-distribution Generalization in Time Series Data Distribution (Sec. 4) Covariate Shift BTOA [170] / Improved LIME [97] / [110] /SEGAL [98]/ PE-Att [104] / Two-stage method [163] / TKNets [164] / CTOT [67] / RevIN [71] / Connect Later [118] / ETSC [96] / TIMEX++ [88] / TFPS [132] / IN-Flow [33] / HTSF [27] / / GradExp[5] / JointPGM [60] / FSD [72]. 
Concept Drift DIOSC [111] / RAINCOAT [58] / MetaEformer [64] / TCVAE [59] / [122] / INNT [20] / SOLID [11] / OneNet [173] / CATS [90] / DIDA [175] / [94] / TS_Adam [22] / [77] / DeCau [87] / WormKAN [155] / PROCEED [177] / Cogra [101] / HSN-LSTM [179] / CORAL [156] / TD-IVDM [145] / [40] / [108] / TS-DM [144] / CatSight [39] / PSO [55] /"},{"citing_arxiv_id":"2407.07726","ref_index":95,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PaliGemma: A versatile 3B VLM for transfer","primary_cat":"cs.CV","submitted_at":"2024-07-10T14:57:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"PaliGemma is an open 3B VLM based on SigLIP and Gemma that achieves strong performance on nearly 40 diverse open-world tasks including benchmarks, remote-sensing, and segmentation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"ing [55, 131], to leverage all data from the web without the need for onerous human labeling, replacing a fixed and large set of classes by a caption embedding instead. The caption embeddings are mostly obtained using language encoders (similar to BERT [33]) and allow to open up the vocabulary of classification and retrieval tasks. The second generation, akin to T5 [95] in language, is a unification of captioning and question-answering tasks via generative encoder-decoder modeling [27, 111, 120, 138], often backed by the progress in generative language models. Corresponding author(s): lbeyer,xzhai@google.com © 2024 Google DeepMind. 
All rights reserved arXiv:2407.07726v2 [cs.CV] 10 Oct 2024 PaliGemma: A versatile 3B VLM for transfer"},{"citing_arxiv_id":"2406.12632","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Medical Image Synthesis: T1w MRI to Tau PET","primary_cat":"eess.IV","submitted_at":"2024-06-18T13:59:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Proposes a cyclic 2.5D perceptual loss with manufacturer SUVR standardization for T1w MRI to tau PET synthesis, reporting improved regional agreement on ADNI and SCAN cohorts across U-Net, UNETR, SwinUNETR, CycleGAN, and Pix2Pix.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2305.10355","ref_index":7,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Evaluating Object Hallucination in Large Vision-Language Models","primary_cat":"cs.CV","submitted_at":"2023-05-17T16:34:01+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Large vision-language models exhibit severe object hallucination that varies with training instructions, and the proposed POPE polling method evaluates it more stably and flexibly than prior approaches.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}