RDFace: A Benchmark Dataset for Rare Disease Facial Image Analysis under Extreme Data Scarcity and Phenotype-Aware Synthetic Generation

2026 · cs.CV · arXiv 2604.03454

1 Pith paper cites this work. Polarity classification is still being indexed.



abstract

Rare diseases often manifest with distinctive facial phenotypes in children, offering valuable diagnostic cues for clinicians and AI-assisted screening systems. However, progress in this field is severely limited by the scarcity of curated, ethically sourced facial data and the high similarity among phenotypes across different conditions. To address these challenges, we introduce RDFace, a curated benchmark dataset comprising 456 pediatric facial images spanning 103 rare genetic conditions (average 4.4 samples per condition). Each ethically verified image is paired with standardized metadata. RDFace enables the development and evaluation of data-efficient AI models for rare disease diagnosis under real-world low-data constraints. We benchmark multiple pretrained vision backbones using cross-validation and explore synthetic augmentation with DreamBooth and FastGAN. Generated images are filtered via facial landmark similarity to maintain phenotype fidelity and merged with real data, improving diagnostic accuracy by up to 13.7% in ultra-low-data regimes. To assess semantic validity, phenotype descriptions generated by a vision-language model from real and synthetic images achieve a report similarity score of 0.84. RDFace establishes a transparent, benchmark-ready dataset for equitable rare disease AI research and presents a scalable framework for evaluating both diagnostic performance and the integrity of synthetic medical imagery.
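The landmark-based filtering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity metric or threshold the authors use, so everything below is an assumption made for illustration: the function names (`normalize`, `landmark_similarity`, `filter_synthetic`), the choice of cosine similarity over centered and scale-normalized 2D landmarks, and the `threshold=0.95` default are all hypothetical. Landmarks are assumed to be already extracted as ordered `(x, y)` pairs by some face landmark detector.

```python
import math


def normalize(landmarks):
    """Center landmarks on their centroid and scale them to unit RMS radius.

    This removes translation and scale so that only face shape is compared.
    """
    n = len(landmarks)
    cx = sum(x for x, _ in landmarks) / n
    cy = sum(y for _, y in landmarks) / n
    centered = [(x - cx, y - cy) for x, y in landmarks]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    if scale == 0:
        raise ValueError("degenerate landmark set: all points coincide")
    return [(x / scale, y / scale) for x, y in centered]


def landmark_similarity(a, b):
    """Cosine similarity between two landmark sets with the same ordering.

    Assumed metric for illustration; not taken from the paper.
    """
    if len(a) != len(b):
        raise ValueError("landmark sets must have the same number of points")
    na, nb = normalize(a), normalize(b)
    dot = sum(xa * xb + ya * yb for (xa, ya), (xb, yb) in zip(na, nb))
    norm_a = math.sqrt(sum(x * x + y * y for x, y in na))
    norm_b = math.sqrt(sum(x * x + y * y for x, y in nb))
    return dot / (norm_a * norm_b)


def filter_synthetic(real_sets, synthetic_sets, threshold=0.95):
    """Return indices of synthetic samples close enough to any real sample.

    A synthetic sample is kept when its best similarity against the real
    landmark sets of the same condition reaches the (hypothetical) threshold.
    """
    kept = []
    for idx, syn in enumerate(synthetic_sets):
        best = max(landmark_similarity(syn, real) for real in real_sets)
        if best >= threshold:
            kept.append(idx)
    return kept


# Toy example: four landmarks per face, one real face, two synthetic candidates.
real = [(0, 0), (2, 0), (1, 1), (1, 3)]
syn_good = [(10, 10), (14, 10), (12, 12), (12, 16)]  # shifted, 2x scaled copy
syn_bad = [(0, 0), (0, 2), (3, 1), (1, 1)]           # different geometry
kept_indices = filter_synthetic([real], [syn_good, syn_bad])
```

In this toy example only the first synthetic candidate survives, because it differs from the real face by translation and scale alone. With as few as four or five real images per condition, a per-condition comparison like this is cheap; a rotation-invariant alignment (for example Procrustes analysis) would be a natural refinement, though the abstract does not say whether the authors use one.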

representative citing papers

Synthetic Data Alone is Enough? Rethinking Data Scarcity in Pediatric Rare Disease Recognition

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

Synthetic facial images alone can train models for pediatric rare disease recognition to performance levels comparable to real-data baselines when generated at sufficient scale.

citing papers explorer


Synthetic Data Alone is Enough? Rethinking Data Scarcity in Pediatric Rare Disease Recognition

cs.CV · 2026-05-21 · unverdicted · none · ref 4 · internal anchor
