author Tang, Y
13 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 4
citing papers explorer
- Cyclic 2.5D Perceptual Loss for Cross-Modal 3D Medical Image Synthesis: T1w MRI to Tau PET
  Proposes a cyclic 2.5D perceptual loss with manufacturer SUVR standardization for T1w MRI to tau PET synthesis, reporting improved regional agreement on ADNI and SCAN cohorts across U-Net, UNETR, SwinUNETR, CycleGAN, and Pix2Pix.
- Evaluating Object Hallucination in Large Vision-Language Models
  Large vision-language models exhibit severe object hallucination that varies with training instructions, and the proposed POPE polling method evaluates it more stably and flexibly than prior approaches.
- Bengal-HP_RU: A Dataset of Bengal People For Head Pose Estimation
  Bengal-HP_RU is the first publicly available head pose dataset for Bengali subjects, with 12,894 images collected from Wikimedia Commons and partitioned by uploader identity.
- Controllable Texture Tiling with Transformed RoPE-Enhanced Diffusion Models
  A Diffusion Transformer framework applies coordinate-transformed RoPE and disjoint attention masks to achieve controllable, high-fidelity texture tiling that preserves reference structure and scene lighting.
- Enhancing Multilingual Reasoning via Steerable Model Merging
  ST-Merge uses gated cross-attention to adaptively weight source models during merging, outperforming baselines on multilingual reasoning tasks across 21 languages.
- MS-DKC: A Dataset Knowledge Card Framework for Designing and Adapting Medical Image Segmentation Models
  MS-DKC is a dataset knowledge card framework that maps image, morphology, supervision, context, and risk descriptors to design priors and failure modes, shown to produce dataset-specific model adaptations with improved metrics on DRIVE, ISIC2018, and ACDC.
- AdaCodec: A Predictive Visual Code for Video MLLMs
  AdaCodec introduces a predictive visual code that cuts visual token use in video MLLMs by sending full frames only on high predictive cost and otherwise encoding inter-frame changes as P-tokens, yielding better benchmark scores at lower budgets.
- Utilizing Inpainting for Keypoint Detection for Vision-Based Control of Robotic Manipulators
  A framework trains keypoint detectors on inpainted markerless robot images and uses runtime inpainting plus UKF for robust vision-based control without models or calibration.
- Rotary Masked Autoencoders are Versatile Learners
  RoMAE applies rotary positional embeddings to masked autoencoders to enable representation learning and interpolation on continuous positional data across irregular time-series, images, and audio without modality-specific modifications.
- Out-of-Distribution Generalization in Time Series: A Survey
  This is the first comprehensive survey of OOD generalization methodologies for time series, organized across data distribution, representation learning, and OOD evaluation.
- Autonomous Unmanned Aircraft Systems for Enhanced Search and Rescue of Drowning Swimmers: Image-Based Localization and Mission Simulation
  A UAS with YOLO-based swimmer detection and DES simulations reduces drowning rescue response time by a factor of five versus standard operations in tested lake areas.
- PaliGemma: A versatile 3B VLM for transfer
  PaliGemma is an open 3B VLM based on SigLIP and Gemma that achieves strong performance on nearly 40 diverse open-world tasks including benchmarks, remote-sensing, and segmentation.
- SoK: A Comprehensive Analysis of the Current Status of Neural Tangent Generalization Attacks with Research Directions
  NTGA is the first clean-label generalization attack under black-box settings but is vulnerable to adversarial training and image transformations, with newer attacks outperforming it.