Rare Disease Detection by Sequence Modeling with Generative Adversarial Networks

Cao Xiao; Emily Zhao; Jimeng Sun; Kezi Yu; Lucas Glass; Yong Cai; Yunlong Wang

arxiv: 1907.01022 · v1 · pith:T6VOR6PWnew · submitted 2019-07-01 · 💻 cs.LG · stat.ML


Kezi Yu, Yunlong Wang, Yong Cai, Cao Xiao, Emily Zhao, Lucas Glass, Jimeng Sun

Pith reviewed 2026-05-25 11:37 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords rare disease detection, generative adversarial networks, recurrent neural networks, medical claims data, exocrine pancreatic insufficiency, sequence modeling, class imbalance, longitudinal data


The pith

GANs generate synthetic rare disease sequences that, with RNN modeling of claims data, yield 0.56 PR-AUC for EPI detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep learning approach to identify patients with the rare condition exocrine pancreatic insufficiency from longitudinal medical claims records. Generative adversarial networks create additional synthetic examples of the rare class to address severe imbalance, while recurrent neural networks capture the sequential patterns across patient histories. On a dataset covering 1.8 million patients and 29,149 EPI cases drawn from seven years of claims, the combined model reaches a PR-AUC of 0.56 and exceeds several benchmark methods in precision and recall. If the synthetic data preserve the essential statistical structure of real trajectories, this framework could support earlier identification of rare-disease patients who currently face frequent diagnostic delays.

Core claim

A deep learning model that pairs generative adversarial networks to augment the rare EPI class with recurrent neural networks to process patient sequence data from medical claims achieves a PR-AUC of 0.56 on a cohort of 1.8 million patients and outperforms benchmark models.

What carries the argument

Generative adversarial networks that produce synthetic rare-disease sequences, paired with recurrent neural networks that model temporal trajectories in medical claims.
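This page does not say which recurrent cell the model uses (the referee's minor comment below asks the same). As a minimal sketch, assuming a GRU cell of the kind described in the Cho et al. and Chung et al. works in the reference list, the code shows how an RNN folds a patient's sequence of code vectors into one hidden state that a classifier could read. `GRUCell` and its random weights are hypothetical, biases are omitted for brevity, and nothing here is the authors' implementation.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Matrix-vector product for lists of lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


class GRUCell:
    """Minimal GRU cell with hypothetical random weights and no biases."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # Input (W) and recurrent (U) weights for update gate z, reset gate r, candidate n.
        self.Wz, self.Uz = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.Wr, self.Ur = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)
        self.Wn, self.Un = mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim)

    def step(self, x, h):
        z = [sigmoid(a) for a in add(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a) for a in add(matvec(self.Wr, x), matvec(self.Ur, h))]
        # The reset gate decides how much of the previous state feeds the candidate.
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a) for a in add(matvec(self.Wn, x), matvec(self.Un, reset_h))]
        # The update gate interpolates between the previous state and the candidate.
        return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

    def encode(self, sequence):
        """Run the cell over a sequence of input vectors; return the final hidden state."""
        h = [0.0] * self.hidden_dim
        for x in sequence:
            h = self.step(x, h)
        return h


cell = GRUCell(input_dim=4, hidden_dim=3)
# Three hypothetical code vectors, one per visit or prescription.
h = cell.encode([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]])
```

A trained model would learn these weights by backpropagation; the final state `h` is the fixed-length patient summary a downstream classifier would consume.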

If this is right

The method directly addresses class imbalance in rare-disease prediction without requiring external data sources.
Sequence modeling on claims histories can extract predictive signal even when positive examples remain scarce.
Performance gains appear in both precision and recall relative to standard baselines.
The framework operates on routinely collected longitudinal claims spanning multiple years.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same GAN-plus-RNN pattern could be tested on other rare conditions that produce similar longitudinal records.
Independent checks of how closely GAN outputs match real marginal distributions would clarify whether performance gains rely on faithful synthesis.
Deployment in a live claims-processing pipeline could be evaluated by measuring diagnostic lead time in a prospective cohort.
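The second check above can be made concrete. A minimal sketch, assuming the generator emits discrete code sequences in the Dx/Rx notation of Figure 2 (this page does not say what the GAN actually outputs), compares the marginal code frequencies of real and synthetic sequences using total variation distance; other divergences, such as Jensen-Shannon, would serve equally well. The helper names and toy data are hypothetical.

```python
from collections import Counter


def code_distribution(sequences):
    """Empirical marginal distribution of medical codes, pooled over all sequences."""
    counts = Counter(code for seq in sequences for code in seq)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def total_variation(p, q):
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    codes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in codes)


# Hypothetical toy data; real use would pool thousands of EPI trajectories.
real = [["Dx1", "Rx1"], ["Dx1", "Dx2"]]
synthetic = [["Dx1", "Dx1"], ["Dx1", "Dx2"]]
print(total_variation(code_distribution(real), code_distribution(synthetic)))  # prints 0.25
```

Matching marginals is necessary but not sufficient: a generator can reproduce code frequencies while scrambling their order or co-occurrence.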

Load-bearing premise

The synthetic sequences produced by the GAN match the statistical properties of real EPI patient trajectories closely enough that they do not introduce spurious patterns inflating held-out performance.

What would settle it

Training the same RNN classifier on the real data alone without any GAN-generated samples and observing whether PR-AUC falls substantially below 0.56 on the same held-out test set.
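Settling the question requires applying the same PR-AUC estimator to both models on the same real test set. This page does not say which estimator the authors used. A common choice is average precision, sketched here in plain Python. For untied scores, it matches the step-wise definition behind scikit-learn's `average_precision_score`. The labels and score lists are invented for illustration.

```python
def average_precision(labels, scores):
    """Step-wise PR-AUC: mean precision at the rank of each true positive.

    Ties in `scores` keep their input order (a simplification; library
    implementations group tied thresholds).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive labels")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / n_pos


# Hypothetical held-out labels and scores from two classifiers on the same real test set.
y_test = [1, 0, 1, 0, 0]
scores_with_gan = [0.9, 0.8, 0.7, 0.2, 0.1]
scores_real_only = [0.6, 0.9, 0.5, 0.7, 0.1]
print(average_precision(y_test, scores_with_gan), average_precision(y_test, scores_real_only))
```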

Figures

Figures reproduced from arXiv: 1907.01022 by Cao Xiao, Emily Zhao, Jimeng Sun, Kezi Yu, Lucas Glass, Yong Cai, Yunlong Wang.

**Figure 1.** Illustration of the framework architecture. $z$ is a random noise input to the generator of the GAN. Each patient is represented by a sequence $v_i = \{v_{ij}, j = 1, \ldots, N\}$, where $v_{ij}$ is a medical code indicating a type of hospital visit (Dx) or prescription (Rx). A graphical illustration of such a representation is shown in …
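Figure 1 represents each patient as an ordered sequence of medical codes. The minimal sketch below turns such sequences into fixed-length integer arrays that an RNN can batch. Reserving index 0 for padding, keeping the most recent `max_len` codes, and the helper names are all illustrative assumptions, not details taken from the paper.

```python
PAD = 0  # index reserved for padding


def build_vocab(sequences):
    """Assign each distinct medical code an integer index, starting at 1."""
    vocab = {}
    for seq in sequences:
        for code in seq:
            if code not in vocab:
                vocab[code] = len(vocab) + 1
    return vocab


def encode_and_pad(sequences, vocab, max_len):
    """Map code sequences to equal-length index lists plus a 0/1 mask of real positions."""
    if max_len < 1:
        raise ValueError("max_len must be positive")
    batch, masks = [], []
    for seq in sequences:
        ids = [vocab[code] for code in seq[-max_len:]]  # keep the most recent codes
        n_pad = max_len - len(ids)
        batch.append(ids + [PAD] * n_pad)
        masks.append([1] * len(ids) + [0] * n_pad)
    return batch, masks


patients = [["Dx1", "Rx1", "Dx2"], ["Rx1"]]
vocab = build_vocab(patients)  # {"Dx1": 1, "Rx1": 2, "Dx2": 3}
ids, mask = encode_and_pad(patients, vocab, max_len=2)
# ids == [[2, 3], [2, 0]], mask == [[1, 1], [1, 0]]
```

A real pipeline would also need an out-of-vocabulary index for codes unseen at training time; here, an unknown code raises `KeyError`.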

**Figure 2.** A toy example of a patient medical history sequence. Dx refers to diagnosis, Rx refers to prescription, and Px refers to medical procedure. The subscripts denote different codes within each category.

2.1. Patient Record Embedding. Encode medical codes. In the patient medical history sequence, each medical code is essentially a categorical variable. The number of categories (different types of medical codes…
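Because each code is a categorical variable with many possible values, a one-hot encoding needs one dimension per code. A dense embedding table replaces this with a short vector per code. The sketch shows the lookup and its equivalence to multiplying a one-hot row by the table. The randomly initialized table is hypothetical: the paper's reference list includes GloVe, but this page does not say how the code embeddings were obtained.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """Hypothetical random embedding table; row 0 is a zero vector for padding."""
    rng = random.Random(seed)
    table = [[0.0] * dim]
    table += [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
    return table


def embed(ids, table):
    """Embedding lookup: replace each code index with its dense row."""
    return [table[i] for i in ids]


def one_hot(ids, vocab_size):
    """Sparse alternative: vocab_size + 1 dimensions (index 0 is padding)."""
    return [[1.0 if j == i else 0.0 for j in range(vocab_size + 1)] for i in ids]


table = make_embedding_table(vocab_size=3, dim=4)
dense = embed([2, 3, 0], table)  # three 4-dimensional vectors
# Multiplying a one-hot row by the table selects the same row as the lookup.
row = one_hot([2], vocab_size=3)[0]
product = [sum(row[j] * table[j][k] for j in range(len(table))) for k in range(4)]
assert product == dense[0]
```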

**Figure 4.** Precision-recall curves of the SSL GAN and benchmark models, where sGAN refers to the SSL GAN model.

5. Discussion. The problem of semi-supervised learning often comes with the issue of limited labeled data, and sometimes extreme class imbalance. In our problem of interest, we had both issues. In order to improve the classification performance, it is crucial to fully make use of unlabeled data. By comparing th…
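The figure labels the proposed model an SSL (semi-supervised) GAN, and the discussion stresses the need to use unlabeled data. One well-known way a GAN discriminator does this, from Salimans et al. (2016), is to output K class logits and treat D(x) = Z / (Z + 1), where Z is the sum of the exponentiated logits, as the probability that x is real. This page does not say whether the paper uses exactly this formulation. The loss terms below sketch the technique under the assumption K = 2 (non-EPI vs. EPI).

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(l)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def supervised_loss(logits, label):
    """Cross-entropy over the K real classes, for a labeled real patient."""
    return log_sum_exp(logits) - logits[label]


def unlabeled_real_loss(logits):
    """-log D(x) for an unlabeled real patient, where D(x) = Z / (Z + 1)."""
    lse = log_sum_exp(logits)  # log Z
    return softplus(lse) - lse  # log(1 + Z) - log Z


def generated_loss(logits):
    """-log(1 - D(G(z))) for a generated sequence, which equals log(1 + Z)."""
    return softplus(log_sum_exp(logits))


# K = 2 logits (non-EPI, EPI) for one hypothetical discriminator output.
logits = [0.0, 0.0]
print(supervised_loss(logits, 1), unlabeled_real_loss(logits), generated_loss(logits))
```

Unlabeled real patients contribute only through `unlabeled_real_loss`. This is how the discriminator can learn from claims records that carry no EPI label.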

**Figure 3.** t-SNE visualization of medical codes. Blue and orange dots are respiratory diagnosis (Dx) and prescription (Rx) codes, respectively; green and red dots are Dx and Rx codes for mental diseases.

4.2. Model comparison. The PR-AUC of the SSL GAN was 0.56, and the deep neural network with the same architecture as the discriminator had a score of 0.52. We saw a relative increase of 6% over the best …
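The caption's own numbers allow a quick check. Going from 0.52 to 0.56 is a relative increase of about 7.7%. Both scores are rounded to two decimals, however, so the true gain over the 0.52 model lies somewhere in a range. The caption's comparison baseline is also truncated ("the best …"), so the 6% may refer to a different model. A small sketch of the arithmetic, assuming round-to-nearest reporting:

```python
def relative_increase(new, old):
    return (new - old) / old


def gain_bounds(new, old, half_step=0.005):
    """Smallest and largest relative gain consistent with two-decimal rounding."""
    low = (new - half_step) / (old + half_step) - 1.0
    high = (new + half_step) / (old - half_step) - 1.0
    return low, high


print(round(relative_increase(0.56, 0.52), 4))  # 0.0769
low, high = gain_bounds(0.56, 0.52)
print(round(low, 4), round(high, 4))  # 0.0571 0.0971
```

The lower bound is about 5.7%. A reported 6% is therefore arithmetically possible against the 0.52 model, but only if the unrounded scores sit near the edges of their rounding intervals.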

Original abstract

Rare diseases affecting 350 million individuals are commonly associated with delayed diagnosis or misdiagnosis. To improve these patients' outcomes, rare disease detection is an important task for identifying patients with rare conditions based on longitudinal medical claims. In this paper, we present a deep learning method for detecting patients with exocrine pancreatic insufficiency (EPI), a rare disease. The contributions include 1) a large longitudinal study using 7 years of medical claims from 1.8 million patients, including 29,149 EPI patients, 2) a new deep learning model that uses generative adversarial networks (GANs) to boost the rare disease class and leverages recurrent neural networks to model patient sequence data, and 3) an accurate prediction with 0.56 PR-AUC, which outperformed benchmark models in terms of precision and recall.
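The cohort numbers in the abstract help put the 0.56 PR-AUC in context. A classifier that ranks patients at random has an expected PR-AUC roughly equal to the positive prevalence. The back-of-the-envelope sketch below assumes that the held-out test set keeps the overall prevalence (the split is not described on this page) and treats "1.8 million" as exact.

```python
n_epi = 29_149
n_patients = 1_800_000  # the abstract's rounded "1.8 million"

prevalence = n_epi / n_patients  # also the expected PR-AUC of a random ranking
lift = 0.56 / prevalence  # how far the reported PR-AUC sits above that baseline

print(f"prevalence = {prevalence:.4f}, lift = {lift:.1f}x")
# prints: prevalence = 0.0162, lift = 34.6x
```

By contrast, ROC-AUC for a random ranking is 0.5 regardless of class imbalance. This is one reason PR-AUC is the more informative headline metric here.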

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

GAN-RNN combo applied to EPI detection in claims data hits 0.56 PR-AUC on a large cohort but skips any check that the synthetic sequences match real patient trajectories.


The main point is that this paper takes an existing GAN-plus-RNN pattern, applies it to exocrine pancreatic insufficiency detection in a 1.8 million patient claims dataset spanning seven years, and reports a PR-AUC of 0.56 that beats its benchmarks. The scale of the study and the focus on a genuinely rare condition with severe imbalance are the parts that stand out as useful. They generate synthetic sequences to enlarge the positive class and then train an RNN on the combined data, which is a straightforward way to handle the problem they describe. That alone gives the work some practical relevance for people working on medical claims prediction. The dataset size and the concrete performance number on a previously unreported EPI cohort are the clearest contributions. The soft spot is the missing validation of the GAN output itself. Nothing shows whether the generated sequences preserve real event timing, co-occurrence rates, or diversity, so the reported gain could come from the classifier learning artifacts that do not appear in held-out real data. The abstract and available description also say little about architecture choices, training procedure, or ablation results, which makes it hard to judge how much the method actually drives the result versus tuning. This is the kind of paper that might interest applied groups in healthcare ML who need ideas for handling imbalance in longitudinal data, but it will not change core methods or serve as a strong baseline without the fidelity checks. It deserves peer review because the clinical task is real and the data scale gives it weight, even if the current version needs those additions to hold up.

Referee Report

1 major / 1 minor

Summary. The paper presents a deep learning method for detecting exocrine pancreatic insufficiency (EPI) patients from longitudinal medical claims data. It uses GANs to augment the rare EPI class (29,149 cases) and RNNs to model patient sequences, reporting a PR-AUC of 0.56 on a dataset of 1.8 million patients over 7 years that outperforms benchmark models.

Significance. If the GAN augmentation is shown to preserve real distributional properties without artifacts, the work could meaningfully advance rare-disease detection by mitigating extreme class imbalance in claims data. The scale of the study is a positive feature, but the absence of fidelity diagnostics makes the practical significance difficult to assess.

major comments (1)

[Abstract / Methods (implied)] The central performance claim (0.56 PR-AUC) rests on training the RNN classifier with GAN-generated synthetic EPI sequences, yet the manuscript contains no quantitative checks (sequence-length statistics, event co-occurrence frequencies, or temporal autocorrelation) comparing real versus generated trajectories; without these, it is impossible to exclude the possibility that the reported gain arises from GAN artifacts rather than genuine generalization.
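Two of the three checks the referee requests are straightforward to compute. The sketch below assumes that both real and generated trajectories are available as lists of code strings; the toy data and helper names are hypothetical. It reports sequence-length statistics and the largest gap in per-patient pair co-occurrence rates. Temporal autocorrelation would also require visit timestamps, so this sketch leaves it out.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, median


def length_stats(sequences):
    lengths = [len(s) for s in sequences]
    return {"mean": mean(lengths), "median": median(lengths), "max": max(lengths)}


def cooccurrence_rates(sequences):
    """Fraction of patients whose record contains each unordered pair of distinct codes."""
    counts = Counter()
    for seq in sequences:
        for pair in combinations(sorted(set(seq)), 2):
            counts[pair] += 1
    return {pair: c / len(sequences) for pair, c in counts.items()}


def max_cooccurrence_gap(real, synthetic):
    """Largest absolute difference in co-occurrence rate over pairs seen in either set."""
    r, s = cooccurrence_rates(real), cooccurrence_rates(synthetic)
    return max((abs(r.get(p, 0.0) - s.get(p, 0.0)) for p in set(r) | set(s)), default=0.0)


real = [["Dx1", "Rx1", "Dx1"], ["Dx1", "Dx2"]]
synthetic = [["Dx1", "Rx1"], ["Dx1", "Rx1"]]
print(length_stats(real), length_stats(synthetic))
print(max_cooccurrence_gap(real, synthetic))
```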

minor comments (1)

[Abstract] The abstract would be clearer if it named the specific GAN variant and RNN architecture (e.g., LSTM vs. GRU) and stated the train/validation/test split ratios.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive criticism. We address the single major comment below.


Referee: The central performance claim (0.56 PR-AUC) rests on training the RNN classifier with GAN-generated synthetic EPI sequences, yet the manuscript contains no quantitative checks (sequence-length statistics, event co-occurrence frequencies, or temporal autocorrelation) comparing real versus generated trajectories; without these, it is impossible to exclude the possibility that the reported gain arises from GAN artifacts rather than genuine generalization.

Authors: We agree that the manuscript does not report direct quantitative fidelity diagnostics comparing real and GAN-generated sequences. We note, however, that all reported metrics (including the 0.56 PR-AUC) are obtained on a held-out test set consisting exclusively of real patient trajectories never seen during training or GAN generation. Systematic artifacts in the synthetic data would therefore be expected to harm rather than improve performance on this real test distribution. The observed outperformance of multiple benchmarks thus supplies indirect evidence that the generated sequences capture useful structure. Nevertheless, we will add the requested comparisons (sequence-length histograms, event co-occurrence frequencies, and temporal autocorrelation) between real and synthetic trajectories to the revised manuscript. revision: yes
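The rebuttal's argument depends on a particular data discipline: real patients are held out before any synthetic data exist, and generated sequences enter only the training split. A minimal sketch of that ordering follows. The stratified split, the 50% test fraction in the usage lines, and the `generate` callable (a stand-in for a GAN generator fitted only on training positives) are all hypothetical.

```python
import random


def stratified_split(labels, test_frac, seed=0):
    """Split indices per class so the rare positive class appears in both splits."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return train_idx, test_idx


def split_then_augment(patients, labels, generate, n_synthetic, test_frac=0.2, seed=0):
    """Hold out real patients first; add synthetic positives to the training split only."""
    train_idx, test_idx = stratified_split(labels, test_frac, seed)
    train = [(patients[i], labels[i], "real") for i in train_idx]
    train += [(generate(), 1, "synthetic") for _ in range(n_synthetic)]
    test = [(patients[i], labels[i], "real") for i in test_idx]
    return train, test


patients = [[f"Dx{i}"] for i in range(10)]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
train, test = split_then_augment(patients, labels, generate=lambda: ["Dx_syn"],
                                 n_synthetic=5, test_frac=0.5)
```

Fitting the generator after the split, using training positives only, is what keeps test patients from leaking into the synthetic data.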

Circularity Check

0 steps flagged

No circularity; empirical PR-AUC is external held-out evaluation


The paper describes an empirical pipeline: GAN augmentation of the minority EPI class followed by RNN training and evaluation on held-out real patient sequences, yielding 0.56 PR-AUC. No equations, fitted parameters renamed as predictions, or self-citation chains reduce this metric to a definitional tautology. The GAN-fidelity assumption is a modeling risk but does not create circularity by construction. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated premise that the GAN produces unbiased synthetic sequences.



Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 8 internal anchors

[1] Che, C., Xiao, C., Liang, J., Jin, B., Zhou, J., and Wang, F. An RNN architecture with dynamic temporal matching for personalized predictions of Parkinson's disease. In Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 198–206. SIAM, 2017.
[2] Che, Z., Purushotham, S., Khemani, R. G., and Liu, Y. Interpretable deep models for ICU outcome prediction. AMIA ... Annual Symposium proceedings. AMIA Symposium, 2016: 371–380, 2016.
[3] Cho, K., Van Merriënboer, B., Bahdanau, D., and Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.
[4] Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
[5] Ghassemi, M., Naumann, T., Schulam, P., Beam, A. L., and Ranganath, R. Opportunities in machine learning for healthcare. arXiv preprint arXiv:1806.00388, 2018.
[6] Goodfellow, I. J. On distinguishability criteria for estimating generative models. arXiv preprint arXiv:1412.6515, 2014.
[7] Huang, Z., Xu, W., and Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991, 2015.
[8] Kaplan, W., Wirtz, V., Mantel, A., and Béatrice, P. Priority medicines for Europe and the world update 2013 report. Methodology, 2(7):99–102, 2013.
[9] Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[10] Li, W., Wang, Y., Cai, Y., Arnold, C., Zhao, E., and Yuan, Y. Semi-supervised rare disease detection using generative adversarial network. arXiv preprint arXiv:1812.00547, 2018.
[11] Lipton, Z. C., Kale, D. C., Elkan, C., and Wetzel, R. Learning to diagnose with LSTM recurrent neural networks. arXiv preprint arXiv:1511.03677, 2015.
[12] Pennington, J., Socher, R., and Manning, C. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.
[13] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
[14] Zhao, J., Mathieu, M., and LeCun, Y. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126, 2016.
