pith.

arxiv: 1907.01022 · v1 · pith:T6VOR6PW · submitted 2019-07-01 · 💻 cs.LG · stat.ML

Rare Disease Detection by Sequence Modeling with Generative Adversarial Networks

Pith reviewed 2026-05-25 11:37 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords rare disease detection · generative adversarial networks · recurrent neural networks · medical claims data · exocrine pancreatic insufficiency · sequence modeling · class imbalance · longitudinal data

The pith

GANs generate synthetic rare disease sequences that, with RNN modeling of claims data, yield 0.56 PR-AUC for EPI detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep learning approach to identify patients with the rare condition exocrine pancreatic insufficiency from longitudinal medical claims records. Generative adversarial networks create additional synthetic examples of the rare class to address severe imbalance, while recurrent neural networks capture the sequential patterns across patient histories. On a dataset covering 1.8 million patients and 29,149 EPI cases drawn from seven years of claims, the combined model reaches a PR-AUC of 0.56 and exceeds several benchmark methods in precision and recall. If the synthetic data preserve the essential statistical structure of real trajectories, this framework could support earlier identification of rare-disease patients who currently face frequent diagnostic delays.
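A PR-AUC of 0.56 is easier to judge against the prevalence of the positive class, since a classifier that ranks patients at random has an expected PR-AUC close to that prevalence. The back-of-the-envelope check below uses the cohort counts from the abstract and assumes, because the paper's split is not described here, that the held-out test set has the same EPI prevalence as the full cohort; it also treats "1.8 million" as exact.

```python
# Back-of-the-envelope baseline for PR-AUC (assumption: the test set shares
# the full cohort's EPI prevalence; counts are taken from the abstract).
EPI_PATIENTS = 29_149
TOTAL_PATIENTS = 1_800_000  # "1.8 million" in the abstract, treated as exact


def prevalence(positives: int, total: int) -> float:
    """Fraction of patients in the positive (EPI) class."""
    return positives / total


def lift_over_random(pr_auc: float, base_rate: float) -> float:
    """Ratio of a model's PR-AUC to a random ranker's, which is about the base rate."""
    return pr_auc / base_rate


base = prevalence(EPI_PATIENTS, TOTAL_PATIENTS)
print(f"prevalence ~ {base:.4f}")                     # prevalence ~ 0.0162
print(f"lift ~ {lift_over_random(0.56, base):.1f}x")  # lift ~ 34.6x
```

On that assumption, 0.56 is roughly 35 times the random-ranking baseline; if the test set was rebalanced or subsampled, the baseline shifts accordingly.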

Core claim

A deep learning model that pairs generative adversarial networks to augment the rare EPI class with recurrent neural networks to process patient sequence data from medical claims achieves a PR-AUC of 0.56 on a cohort of 1.8 million patients and outperforms benchmark models.

What carries the argument

Generative adversarial networks that produce synthetic rare-disease sequences, paired with recurrent neural networks that model temporal trajectories in medical claims.
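To make the sequence-modeling half concrete, the sketch below runs a gated recurrent unit (GRU) over a patient's sequence of embedded medical codes and maps the final hidden state to an EPI probability. It is a minimal pure-Python illustration, not the paper's model: the choice of a GRU rather than an LSTM, the dimensions, the random untrained weights, and every name in it are assumptions, and the GAN and the training loop are omitted.

```python
import math
import random

# Hypothetical GRU classifier sketch: untrained random weights, illustrative sizes.


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vadd(*vs):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vs)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUClassifier:
    """Reads a sequence of code embeddings and returns P(positive class)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # W: input weights, U: recurrent weights, b: biases, for the
        # update gate z, reset gate r, and candidate state n.
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}
        self.b = {g: [0.0] * hidden_dim for g in "zrn"}
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def step(self, x, h):
        """One GRU update: h_t = (1 - z) * n + z * h_{t-1}."""
        z = [sigmoid(a) for a in vadd(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"])]
        r = [sigmoid(a) for a in vadd(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"])]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a) for a in vadd(matvec(self.W["n"], x), matvec(self.U["n"], reset_h), self.b["n"])]
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def predict_proba(self, sequence):
        """Run over the whole sequence, then apply a logistic output to the last state."""
        h = [0.0] * self.hidden_dim
        for x in sequence:
            h = self.step(x, h)
        return sigmoid(sum(w * hi for w, hi in zip(self.w_out, h)) + self.b_out)


# Toy usage: three hypothetical 4-dimensional embeddings standing in for Dx/Rx/Px codes.
model = GRUClassifier(input_dim=4, hidden_dim=8)
sequence = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.1, -0.1, 0.2], [-0.3, 0.4, 0.0, 0.1]]
p = model.predict_proba(sequence)
print(f"P(EPI) = {p:.3f}")  # untrained weights, so the value itself is arbitrary
```

A real implementation would use a framework's built-in recurrent layer and learned code embeddings; the explicit loop here only shows what each step computes.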

If this is right

  • The method directly addresses class imbalance in rare-disease prediction without requiring external data sources.
  • Sequence modeling on claims histories can extract predictive signal even when positive examples remain scarce.
  • Performance gains appear in both precision and recall relative to standard baselines.
  • The framework operates on routinely collected longitudinal claims spanning multiple years.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same GAN-plus-RNN pattern could be tested on other rare conditions that produce similar longitudinal records.
  • Independent checks of how closely GAN outputs match real marginal distributions would clarify whether performance gains rely on faithful synthesis.
  • Deployment in a live claims-processing pipeline could be evaluated by measuring diagnostic lead time in a prospective cohort.

Load-bearing premise

The synthetic sequences produced by the GAN match the statistical properties of real EPI patient trajectories closely enough that they do not introduce spurious patterns inflating held-out performance.

What would settle it

Training the same RNN classifier on real data alone, without any GAN-generated samples, and checking whether PR-AUC falls substantially below 0.56 on the same held-out test set.
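This test comes down to scoring two classifiers, one trained with GAN-generated samples and one without, on the same real held-out patients and comparing PR-AUC. A minimal sketch of that comparison, computing PR-AUC as average precision (one common estimator; whether the paper used it or a trapezoidal area is not stated here). The labels and scores are made-up toy values, not results from the paper.

```python
# PR-AUC estimated as average precision (an assumed estimator).
# All labels and scores below are toy values for illustration only.


def average_precision(labels, scores):
    """Mean of precision@k taken at the rank k of each positive example.

    Ties in score keep their input order, because sorted() is stable.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / n_pos


labels = [1, 0, 1, 0, 0, 1, 0, 0]                    # same real test patients for both models
scores_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.1, 0.05]  # hypothetical: trained with synthetic samples
scores_b = [0.6, 0.9, 0.4, 0.7, 0.2, 0.5, 0.3, 0.1]   # hypothetical: trained on real data only

ap_a = average_precision(labels, scores_a)
ap_b = average_precision(labels, scores_b)
print(f"augmented: {ap_a:.3f}, real-only: {ap_b:.3f}")  # augmented: 0.806, real-only: 0.478
```

On a real cohort the same function would be applied to both models' test-set scores; a bootstrap over test patients would indicate whether any gap is larger than sampling noise.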

Figures

Figures reproduced from arXiv: 1907.01022 by Cao Xiao, Emily Zhao, Jimeng Sun, Kezi Yu, Lucas Glass, Yong Cai, Yunlong Wang.

Figure 1. Framework architecture. $z$ is a random noise input to the GAN generator. Each patient is represented by a sequence $v_i = \{v_{ij},\ j = 1, \dots, N\}$, where each $v_{ij}$ is a medical code indicating a type of hospital visit (Dx) or prescription (Rx). A graphical illustration of such a representation is shown in …
Figure 2. A toy example of a patient medical history sequence. Dx refers to diagnosis, Rx refers to prescription and Px refers to medical procedure. The subscripts denote different codes within each category.
2.1. Patient Record Embedding. Encode medical codes. In the patient medical history sequence, each medical code is essentially a categorical variable. The number of categories (different types of medical codes…
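The point that each medical code is a categorical variable can be made concrete: codes are mapped to integer ids over a vocabulary, and those ids then index an embedding table. The sketch assumes string tokens named after the Dx/Rx/Px toy example in Figure 2, a reserved padding id of 0, and truncation to the most recent events; the paper's own encoding is cut off in the excerpt, so this is one plausible scheme rather than the authors' method, and all names are hypothetical.

```python
# Hypothetical encoding of medical codes as integer ids (not the paper's exact scheme).
PAD = "<pad>"  # reserved padding token, assigned id 0


def build_vocab(sequences):
    """Give each distinct code an integer id in order of first appearance."""
    vocab = {PAD: 0}
    for seq in sequences:
        for code in seq:
            if code not in vocab:
                vocab[code] = len(vocab)
    return vocab


def encode(seq, vocab, max_len):
    """Map codes to ids, keep the most recent max_len events, and right-pad with 0."""
    ids = [vocab[code] for code in seq]
    ids = ids[max(0, len(ids) - max_len):]
    return ids + [vocab[PAD]] * (max_len - len(ids))


patients = [["Dx1", "Rx2", "Dx3", "Px1"], ["Rx2", "Dx1"]]
vocab = build_vocab(patients)
print(vocab)                                  # {'<pad>': 0, 'Dx1': 1, 'Rx2': 2, 'Dx3': 3, 'Px1': 4}
print(encode(patients[1], vocab, max_len=4))  # [2, 1, 0, 0]
```

Codes unseen when the vocabulary was built would raise `KeyError` here; a production pipeline would usually map them to a dedicated unknown-code id instead.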
Figure 4. Precision-recall curves of the SSL GAN and benchmark models, where sGAN refers to the SSL GAN model.
5. Discussion. The problem of semi-supervised learning often comes with the issue of limited labeled data, and sometimes extreme class imbalance. In our problem of interest, we had both issues. In order to improve the classification performance, it is crucial to fully make use of unlabeled data. By comparing th…
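The Discussion stresses making full use of unlabeled data, which is what a semi-supervised GAN discriminator is built for. The sketch below assumes the common K+1-class construction from Salimans et al. (2016), "Improved Techniques for Training GANs": the discriminator emits K logits for the real classes, here K = 2 for non-EPI and EPI, and the fake-class logit is fixed at zero. Whether the paper's SSL GAN uses exactly this parameterisation is not confirmed by the excerpt, and the helper names are hypothetical. In this construction, generator samples are scored against the fake class, while labeled and unlabeled real patients shape the K real-class logits.

```python
import math

# Assumed K+1-class SSL GAN discriminator head: K real-class logits, fake logit fixed
# at 0, so P(fake | x) = 1 / (1 + Z) with Z = sum_k exp(l_k).


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ssl_gan_outputs(logits):
    """Return (posterior over the K real classes, P(real)) for one example."""
    lse = logsumexp(logits)
    posteriors = [math.exp(l - lse) for l in logits]
    p_real = math.exp(-softplus(-lse))  # equals Z / (Z + 1)
    return posteriors, p_real


def discriminator_loss_terms(labeled_logits, label, unlabeled_logits, fake_logits):
    """Per-example loss terms (hypothetical helper):
    supervised cross-entropy on a labeled real patient,
    -log P(real) on an unlabeled real patient,
    -log P(fake) on a generator sample."""
    supervised = logsumexp(labeled_logits) - labeled_logits[label]
    unsup_real = softplus(-logsumexp(unlabeled_logits))
    unsup_fake = softplus(logsumexp(fake_logits))
    return supervised, unsup_real, unsup_fake


# Toy usage with K = 2 real classes, ordered (non-EPI, EPI) by assumption.
post, p_real = ssl_gan_outputs([2.0, -1.0])
print([round(q, 3) for q in post], round(p_real, 3))  # [0.953, 0.047] 0.886
```

Fixing the fake logit at zero removes a redundant degree of freedom: shifting all K+1 logits by a constant leaves the softmax unchanged, so one of them can be pinned without loss of generality.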
Figure 3. The t-SNE visualization of medical codes. Blue and orange dots are respiratory diagnosis (Dx) and prescription (Rx) codes, respectively. The green and red dots are Dx and Rx codes for mental diseases.
4.2. Model comparison. The PR-AUC by the SSL GAN was 0.56, and the deep neural network with the same architecture as the discriminator had a score of 0.52. We saw a relative increase of 6% over the best…
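The Figure 3 text reports 0.56 for the SSL GAN and 0.52 for the DNN, then a 6% relative increase over "the best…", a phrase the extraction truncates. A quick arithmetic check, assuming "relative increase" means (new - old) / old and that both scores were rounded to two decimals:

```python
def relative_increase(new, old):
    """Relative gain of `new` over `old`; 0.10 means 10% higher."""
    return (new - old) / old


# As printed: SSL GAN 0.56 vs. the discriminator-architecture DNN 0.52.
print(f"{relative_increase(0.56, 0.52):.1%}")  # 7.7%

# If both scores were rounded to two decimals, the true gain roughly lies in this range.
low = relative_increase(0.555, 0.525)
high = relative_increase(0.565, 0.515)
print(f"{low:.1%} to {high:.1%}")  # 5.7% to 9.7%

# Benchmark score that a 6% gain over an exact 0.56 would imply.
print(f"{0.56 / 1.06:.3f}")  # 0.528
```

On these numbers, the 6% figure is compatible with the DNN being the comparison point once rounding is allowed for, but it could equally refer to a different benchmark scoring about 0.53; the excerpt does not say which.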
Original abstract

Rare diseases, which affect 350 million individuals, are commonly associated with delayed diagnosis or misdiagnosis. To improve these patients' outcomes, rare disease detection, which identifies patients with rare conditions from longitudinal medical claims, is an important task. In this paper, we present a deep learning method for detecting patients with exocrine pancreatic insufficiency (EPI), a rare disease. The contributions include 1) a large longitudinal study using 7 years of medical claims from 1.8 million patients, including 29,149 EPI patients; 2) a new deep learning model that uses generative adversarial networks (GANs) to boost the rare disease class and recurrent neural networks to model patient sequence data; and 3) an accurate prediction with 0.56 PR-AUC, which outperformed benchmark models in terms of precision and recall.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper presents a deep learning method for detecting exocrine pancreatic insufficiency (EPI) patients from longitudinal medical claims data. It uses GANs to augment the rare EPI class (29,149 cases) and RNNs to model patient sequences, reporting a PR-AUC of 0.56 on a dataset of 1.8 million patients over 7 years that outperforms benchmark models.

Significance. If the GAN augmentation is shown to preserve real distributional properties without artifacts, the work could meaningfully advance rare-disease detection by mitigating extreme class imbalance in claims data. The scale of the study is a positive feature, but the absence of fidelity diagnostics makes the practical significance difficult to assess.

major comments (1)
  1. [Abstract / Methods (implied)] The central performance claim (0.56 PR-AUC) rests on training the RNN classifier with GAN-generated synthetic EPI sequences, yet the manuscript contains no quantitative checks (sequence-length statistics, event co-occurrence frequencies, or temporal autocorrelation) comparing real versus generated trajectories; without these, it is impossible to exclude the possibility that the reported gain arises from GAN artifacts rather than genuine generalization.
minor comments (1)
  1. [Abstract] The abstract would be clearer if it named the specific GAN variant and RNN architecture (e.g., LSTM vs. GRU) and stated the train/validation/test split ratios.
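The three diagnostics the referee asks for are cheap to compute once real and synthetic trajectories share a token format. A minimal sketch, assuming each trajectory is a list of code strings such as "Dx1" or "Rx2". The specific statistics are illustrative choices that neither the paper nor the referee prescribes: a two-sample Kolmogorov-Smirnov statistic on sequence lengths, total variation distance between code-pair co-occurrence frequencies, and, as a simple temporal summary, the lag-1 autocorrelation of an is-prescription indicator. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical fidelity diagnostics for real vs. synthetic trajectories.


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    def cdf(xs, t):
        return sum(x <= t for x in xs) / len(xs)
    return max(abs(cdf(a, t) - cdf(b, t)) for t in sorted(set(a) | set(b)))


def cooccurrence(seqs):
    """Relative frequencies of unordered code pairs appearing in the same trajectory."""
    counts = Counter()
    for seq in seqs:
        counts.update(combinations(sorted(set(seq)), 2))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()} if total else {}


def total_variation(p, q):
    """Total variation distance between two discrete distributions given as dicts."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a numeric series; 0.0 when undefined."""
    n = len(xs)
    mean = sum(xs) / n if n else 0.0
    var = sum((x - mean) ** 2 for x in xs)
    if n < 2 or var == 0:
        return 0.0
    return sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1)) / var


def mean_rx_autocorr(seqs):
    """Average lag-1 autocorrelation of an is-prescription indicator per trajectory."""
    vals = [lag1_autocorr([1.0 if c.startswith("Rx") else 0.0 for c in s]) for s in seqs]
    return sum(vals) / len(vals)


def fidelity_report(real, synthetic):
    """0.0 on every entry means these three summaries match exactly."""
    return {
        "length_ks": ks_statistic([len(s) for s in real], [len(s) for s in synthetic]),
        "cooccurrence_tv": total_variation(cooccurrence(real), cooccurrence(synthetic)),
        "rx_autocorr_gap": abs(mean_rx_autocorr(real) - mean_rx_autocorr(synthetic)),
    }


real = [["Dx1", "Rx1", "Dx2"], ["Dx1", "Rx1"], ["Px1", "Dx2", "Rx2", "Rx1"]]
synthetic = [["Dx1", "Rx1"], ["Dx1", "Rx1", "Rx1", "Rx1"], ["Dx2"]]
print(fidelity_report(real, synthetic))
```

Values near 0 indicate close agreement on that summary only; matching all three does not by itself show that the synthetic trajectories are free of artifacts.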

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive criticism. We address the single major comment below.

Point-by-point responses
  1. Referee: The central performance claim (0.56 PR-AUC) rests on training the RNN classifier with GAN-generated synthetic EPI sequences, yet the manuscript contains no quantitative checks (sequence-length statistics, event co-occurrence frequencies, or temporal autocorrelation) comparing real versus generated trajectories; without these, it is impossible to exclude the possibility that the reported gain arises from GAN artifacts rather than genuine generalization.

    Authors: We agree that the manuscript does not report direct quantitative fidelity diagnostics comparing real and GAN-generated sequences. We note, however, that all reported metrics (including the 0.56 PR-AUC) are obtained on a held-out test set consisting exclusively of real patient trajectories never seen during training or GAN generation. Systematic artifacts in the synthetic data would therefore be expected to harm rather than improve performance on this real test distribution. The observed outperformance of multiple benchmarks thus supplies indirect evidence that the generated sequences capture useful structure. Nevertheless, we will add the requested comparisons (sequence-length histograms, event co-occurrence frequencies, and temporal autocorrelation) between real and synthetic trajectories to the revised manuscript. revision: yes
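The rebuttal's argument holds only if the GAN is fit on training patients alone and synthetic sequences enter the training set alone. A minimal sketch of that protocol, assuming a patient-level split. `generate` is a hypothetical stand-in for the GAN sampler (here a stub that merely copies training positives), synthetic samples are labeled EPI following the augmentation reading in the summary above, and the 20% test fraction is an arbitrary choice, since the paper's split ratios are not reported here.

```python
import random

# Hypothetical leakage-safe augmentation protocol; not the paper's pipeline.


def patient_level_split(patient_ids, test_frac, seed=0):
    """Split by patient so no individual contributes records to both sets."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    return set(ids[n_test:]), set(ids[:n_test])


def build_datasets(records, labels, generate, n_synthetic, test_frac=0.2, seed=0):
    """records: {patient_id: code sequence}; labels: {patient_id: 0 or 1}.

    `generate(train_positive_sequences, n)` only ever sees training positives, and
    its outputs are appended to the training set only, so the test set stays real.
    """
    train_ids, test_ids = patient_level_split(records, test_frac, seed)
    train = [(records[p], labels[p]) for p in sorted(train_ids)]
    test = [(records[p], labels[p]) for p in sorted(test_ids)]
    train_pos = [seq for seq, y in train if y == 1]
    train += [(seq, 1) for seq in generate(train_pos, n_synthetic)]
    return train, test


def resample_stub(positives, n):
    """Placeholder generator that copies training positives; not a GAN."""
    return [list(positives[i % len(positives)]) for i in range(n)] if positives else []


records = {f"p{i}": [f"Dx{i % 3}", "Rx1"] for i in range(10)}
labels = {f"p{i}": int(i % 4 == 0) for i in range(10)}
train, test = build_datasets(records, labels, resample_stub, n_synthetic=5)
print(len(train), len(test))  # 13 2
```

Splitting by patient rather than by record matters for claims data, where one patient contributes many events; a record-level split could leak a test patient's history into the GAN's training input.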

Circularity Check

0 steps flagged

No circularity; empirical PR-AUC is external held-out evaluation

full rationale

The paper describes an empirical pipeline: GAN augmentation of the minority EPI class followed by RNN training and evaluation on held-out real patient sequences, yielding 0.56 PR-AUC. No equations, fitted parameters renamed as predictions, or self-citation chains reduce this metric to a definitional tautology. The GAN-fidelity assumption is a modeling risk but does not create circularity by construction. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated premise that the GAN produces unbiased synthetic sequences.

pith-pipeline@v0.9.0 · 5677 in / 1175 out tokens · 35430 ms · 2026-05-25T11:37:49.948738+00:00 · methodology

