pith. machine review for the scientific record.

arxiv: 2604.18898 · v2 · submitted 2026-04-20 · 📊 stat.AP

Recognition: unknown

A Review of Statistical Methods for Spontaneous Reporting System Data Mining: Signal Detection and Beyond

Marianthi Markatou, Saptarshi Chakraborty, Yihao Tan

Pith reviewed 2026-05-10 02:43 UTC · model grok-4.3

classification 📊 stat.AP
keywords spontaneous reporting systems · signal detection · pharmacovigilance · adverse events · data mining · statistical methods · drug safety · contingency tables

The pith

Contemporary statistical methods for spontaneous reporting system data support both binary signal detection and estimation of signal strength with uncertainty for drug safety.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reviews statistical approaches to mining data from spontaneous reporting systems such as FAERS, EudraVigilance and VigiBase in order to identify associations between drugs and adverse events. It covers traditional methods that frame detection as a binary decision and more recent techniques that estimate signal strength along with uncertainty measures. The authors supply practical steps for building contingency tables from the aggregated counts that public databases release. A sympathetic reader would care because postmarketing surveillance depends on these tools to flag potential safety problems after drugs are approved. The review demonstrates the steps with opioid datasets drawn from real sources.
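The contrast between a binary decision and a strength estimate with uncertainty can be illustrated with a minimal sketch. It assumes a single conjugate gamma prior with fixed, hypothetical hyperparameters. This is a deliberate simplification and is not the nonparametric empirical Bayes (general-gamma) model named in the paper's figures. The function names, the threshold of 2, and the counts are all illustrative assumptions.

```python
import math

def gamma_poisson_posterior(n, expected, alpha=1.0, beta=1.0):
    """Posterior mean and sd of the signal strength lam, assuming
    n | lam ~ Poisson(lam * expected) and lam ~ Gamma(shape=alpha, rate=beta).
    The prior hyperparameters are hypothetical, not fitted to any data."""
    shape = alpha + n          # conjugate update of the gamma shape
    rate = beta + expected     # conjugate update of the gamma rate
    mean = shape / rate
    sd = math.sqrt(shape) / rate
    return mean, sd

def binary_flag(n, expected, threshold=2.0):
    """Binary decision: flag the pair if observed/expected reaches the threshold."""
    return n / expected >= threshold

# Hypothetical (observed, expected) counts for two drug-AE pairs.
pairs = {"pair_A": (4, 1.0), "pair_B": (400, 100.0)}
for name, (n, e) in pairs.items():
    mean, sd = gamma_poisson_posterior(n, e)
    print(name, "flagged:", binary_flag(n, e),
          "posterior mean:", round(mean, 3), "posterior sd:", round(sd, 3))
```

Both hypothetical pairs have an observed-to-expected ratio of 4 and both pass the threshold. The posterior standard deviation is much larger for the pair with few reports, which is the information a binary flag discards.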

Core claim

The paper claims that a review of contemporary SRS data mining methods and their statistical underpinnings, paired with explicit guidance on constructing contingency tables from aggregated AE-drug counts, supplies a usable foundation for safety assessment across major pharmacovigilance databases.

What carries the argument

Statistical signal detection methods (including disproportionality analyses) together with the preprocessing step of building SRS contingency tables from publicly available aggregated counts.
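As a rough illustration of a disproportionality analysis, the sketch below derives a 2x2 table from four aggregated counts and computes two standard measures: the proportional reporting ratio (PRR) and the reporting odds ratio (ROR) with a Wald interval on the log scale. The counts are hypothetical and the helper names are assumptions. The sketch also assumes that all four cells are positive and that the totals are counted on the same basis as the pair count.

```python
import math

def two_by_two(n_pair, n_drug, n_ae, n_total):
    """Derive the 2x2 cells from aggregated counts:
    a = drug & AE, b = drug & other AEs, c = other drugs & AE, d = the rest."""
    a = n_pair
    b = n_drug - n_pair
    c = n_ae - n_pair
    d = n_total - n_drug - n_ae + n_pair
    if min(a, b, c, d) < 0:
        raise ValueError("aggregated counts are inconsistent")
    return a, b, c, d

def prr(a, b, c, d):
    """Proportional reporting ratio: AE share for the drug vs. all other drugs."""
    return (a / (a + b)) / (c / (c + d))

def ror_with_ci(a, b, c, d, z=1.96):
    """Reporting odds ratio with an approximate 95% Wald interval (log scale)."""
    est = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return est, est * math.exp(-z * se), est * math.exp(z * se)

# Hypothetical aggregated counts for one drug-AE pair.
cells = two_by_two(n_pair=20, n_drug=200, n_ae=500, n_total=10000)
print("cells (a, b, c, d):", cells)
print("PRR:", round(prr(*cells), 3))
print("ROR and interval:", tuple(round(x, 3) for x in ror_with_ci(*cells)))
```

The interval is what separates a point estimate from a decision: a common screening convention flags a pair only when the lower bound exceeds 1, though thresholds vary across practice.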

Load-bearing premise

The selected contemporary methods and preprocessing steps using aggregated counts adequately represent current best practice and can be applied directly without further validation or dataset-specific adjustments.

What would settle it

An analysis of a confirmed drug-adverse event pair that produces materially weaker or stronger signals when the recommended preprocessing steps are omitted.

Figures

Figures reproduced from arXiv: 2604.18898 by Marianthi Markatou, Saptarshi Chakraborty, Yihao Tan.

Figure 1: Signal detection results on the FAERS-opioid–mental data obtained from pseudo-LRT (panel A), general …
Figure 2: Nonparametric empirical Bayes (general-gamma) signal strength estimation results for the FAERS …
Figure 3: Signal detection results on the VigiBase-opioid–mental data obtained from pseudo-LRT (panel A), general …
Figure 4: Nonparametric empirical Bayes (general-gamma) signal strength estimation results for the VigiBase …
Original abstract

Postmarketing safety surveillance relies on data from spontaneous reporting systems (SRS) such as FAERS, EudraVigilance and VigiBase, and commonly uses SRS data mining methods to assess the associations between drugs and adverse events (AEs). Traditionally, these analyses have focused on signal detection framed as a binary decision problem, whereas more recent work has emphasized more nuanced inference involving signal strength estimation and uncertainty quantification. In this paper, we review contemporary SRS data mining approaches and their statistical underpinnings for safety assessment using data from major pharmacovigilance databases worldwide. In addition to methodological review, we provide practical guidance on data preprocessing for such analysis, including construction of SRS contingency tables using only aggregated AE-drug counts, as are publicly available from databases such as VigiBase and EudraVigilance. We illustrate the guidance via opioid-related datasets obtained from FAERS and VigiBase, complied with subsequent downstream SRS data analyses.
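One way to picture building a contingency table from aggregated AE-drug counts alone is sketched below. This is not the authors' procedure. It assumes that the pair counts, the per-AE totals, the per-drug totals, and the grand total are all available and additive on the same counting basis, and it closes the table with "Other AE" and "Other drug" reference categories. All names and numbers are hypothetical.

```python
def build_table(pair_counts, ae_totals, drug_totals, grand_total):
    """Return table[ae][drug], adding 'Other AE' and 'Other drug' categories
    so that every cell is derived from aggregated counts only."""
    aes, drugs = list(ae_totals), list(drug_totals)
    table = {}
    for ae in aes:
        row = {d: pair_counts.get((ae, d), 0) for d in drugs}
        row["Other drug"] = ae_totals[ae] - sum(row.values())
        table[ae] = row
    # Reports for each selected drug that involve none of the selected AEs.
    other = {d: drug_totals[d] - sum(table[ae][d] for ae in aes) for d in drugs}
    # Whatever remains of the grand total goes into the corner cell.
    other["Other drug"] = grand_total - sum(ae_totals.values()) - sum(other.values())
    table["Other AE"] = other
    if any(v < 0 for row in table.values() for v in row.values()):
        raise ValueError("aggregated counts are inconsistent (negative cell)")
    return table

def expected_counts(table):
    """Expected counts under row-column independence: row total * column total / N."""
    row_tot = {ae: sum(row.values()) for ae, row in table.items()}
    cols = list(next(iter(table.values())))
    col_tot = {d: sum(table[ae][d] for ae in table) for d in cols}
    total = sum(row_tot.values())
    return {ae: {d: row_tot[ae] * col_tot[d] / total for d in cols} for ae in table}

# Hypothetical aggregated counts (not taken from FAERS or VigiBase).
pair_counts = {("anxiety", "drug_X"): 30, ("anxiety", "drug_Y"): 10,
               ("insomnia", "drug_X"): 5, ("insomnia", "drug_Y"): 25}
ae_totals = {"anxiety": 200, "insomnia": 150}
drug_totals = {"drug_X": 400, "drug_Y": 300}
table = build_table(pair_counts, ae_totals, drug_totals, grand_total=5000)
expected = expected_counts(table)
for ae, row in table.items():
    print(ae, row)
print("expected count for (anxiety, drug_X):", expected["anxiety"]["drug_X"])
```

The negative-cell check matters in practice: if the totals and the pair counts are on different counting bases (for example, reports versus drug-AE mentions), subtraction can yield impossible cells, and the table should not be used without further adjustment.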

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript reviews contemporary statistical methods for mining spontaneous reporting system (SRS) data from databases such as FAERS, EudraVigilance, and VigiBase, covering traditional signal detection framed as binary decisions as well as more recent approaches to signal strength estimation and uncertainty quantification. It also supplies practical guidance on preprocessing steps to construct contingency tables from publicly available aggregated AE-drug counts and illustrates the guidance with opioid-related datasets from FAERS and VigiBase.

Significance. If the summaries of methods are accurate and the preprocessing guidance is internally consistent with the stated scope of public aggregated tables, the paper would serve as a useful reference for pharmacovigilance researchers seeking to move beyond binary signal detection toward nuanced inference while working with readily accessible data sources.

minor comments (1)
  1. [Abstract] Abstract, final sentence: the word 'complied' is almost certainly a typographical error and should read 'combined' to make the intended meaning clear.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our manuscript, accurate characterization of its scope, and recommendation for minor revision. The referee's assessment aligns well with our intent to provide both a methodological review and practical preprocessing guidance for SRS data mining.

Circularity Check

0 steps flagged

No significant circularity in this review paper


This is a review paper summarizing existing SRS data mining methods from external literature and offering practical preprocessing guidance for aggregated counts from public databases. No new derivations, predictions, or equations are introduced that could reduce to the paper's own inputs by construction. The claims are descriptive and illustrative (e.g., opioid example as demonstration, not proof), with methods attributed to cited sources rather than self-referential fits or definitions. Any self-citations are incidental and non-load-bearing for novel results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a review paper; no new free parameters, axioms, or invented entities are introduced.

pith-pipeline@v0.9.0 · 5466 in / 1011 out tokens · 44024 ms · 2026-05-10T02:43:09.810739+00:00 · methodology

