pith. machine review for the scientific record.

arxiv: 2605.13905 · v1 · submitted 2026-05-13 · 💻 cs.SE · cs.AI

Recognition: no theorem link

A Non-Destructive Methodological Framework for Modernizing Legacy Clinical Reporting Systems for AI-Driven Pharmacoinformatics: A SAS Case Study

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 05:58 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords: legacy systems · clinical reporting · AI integration · pharmacoinformatics · SAS · metadata layer · intermediate representation · regulatory compliance

The pith

A metadata layer wraps legacy SAS reporting systems to enable AI integration without altering source code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a non-destructive framework that adds a metadata layer around existing legacy clinical reporting pipelines, especially SAS-based ones used in drug development. This layer uses a bridge map, a typed intermediate representation, and an orchestrator to convert opaque outputs into structured, machine-readable data that LLMs can consume directly. It supports immediate coexistence with unchanged legacy components or optional incremental replacement of selected parts. Validation on a 558-component SAS library showed a 92 percent reduction in proprietary code where consolidation occurred, cell-level parity of 80 percent or above on 11 of 14 tested reports, and full parity on a 5-report CDISC benchmark. The approach allows AI tasks such as pharmacovigilance and report summarization while preserving regulatory compliance.

Core claim

We present a non-destructive methodological framework achieving AI-driven pharmacoinformatics readiness without altering legacy source code. A metadata layer—comprising a bridge map, a typed Intermediate Representation (IR), and an orchestrator—wraps existing components and re-exposes their outputs as structured data consumable by LLMs. It enables optional incremental consolidation, replacing selected legacy components with metadata-configured core routines while the remainder operates unchanged.
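In Python terms (the paper's orchestration layer is Python), the typed IR can be pictured as a small cell grid plus a structure record that a render engine serializes to JSON. This is an illustrative sketch only: the field names and the `ir_to_json` helper below are assumptions, not the paper's actual schema, although `ir_cells` and `ir_structure` datasets do appear in its architecture.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical typed cell: one value in the report grid, carrying enough
# metadata for an LLM or downstream tool to interpret it without SAS.
@dataclass(frozen=True)
class IRCell:
    row: int
    col: int
    value: str
    dtype: str          # e.g. "count", "percent", "pvalue" (invented labels)
    source_macro: str   # legacy component that produced the value

# Hypothetical structure record: labels the grid axes.
@dataclass(frozen=True)
class IRStructure:
    row_labels: tuple
    col_labels: tuple

def ir_to_json(cells, structure):
    """Re-expose legacy output as machine-readable JSON."""
    return json.dumps({
        "structure": asdict(structure),
        "cells": [asdict(c) for c in cells],
    })

cells = [IRCell(0, 0, "42", "count", "ae_summary"),
         IRCell(0, 1, "12.3", "percent", "ae_summary")]
structure = IRStructure(("Any AE",), ("n", "%"))
doc = json.loads(ir_to_json(cells, structure))
```

The point of the decoupling is visible even at this scale: nothing downstream of `ir_to_json` needs to know which SAS macro produced the numbers.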

What carries the argument

Metadata layer with bridge map, typed Intermediate Representation (IR), and orchestrator that wraps legacy components to produce structured, LLM-consumable output.

Load-bearing premise

The metadata layer and typed IR can accurately capture all regulatory-grade logic from the legacy SAS components without loss of fidelity or introduction of errors.

What would settle it

Running the framework on a new report type and finding that the IR output deviates materially from the legacy SAS output or produces LLM summaries that fail manual expert review for regulatory accuracy.
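That settling test is mechanical to phrase: run both pipelines on the new report, compare the grids cell by cell, and count deviations. A minimal sketch of such a comparison, with an invented tolerance rule and keying scheme — the paper's actual PASS/FAIL/ERROR/SKIP criteria are not specified here:

```python
# Minimal sketch of a cell-level parity check in the spirit of the
# paper's parity harness. Tolerance and cell keys are assumptions.
def cell_parity(legacy, modern, tol=1e-9):
    """Compare two {(row, col): value} grids; return triage counts."""
    triage = {"PASS": 0, "FAIL": 0, "SKIP": 0}
    for key, lv in legacy.items():
        if key not in modern:
            triage["SKIP"] += 1      # cell missing from modern output
            continue
        mv = modern[key]
        try:
            ok = abs(float(lv) - float(mv)) <= tol
        except ValueError:           # non-numeric cells: exact match
            ok = lv == mv
        triage["PASS" if ok else "FAIL"] += 1
    return triage

legacy = {(0, 0): "42", (0, 1): "12.3", (1, 0): "N/A"}
modern = {(0, 0): "42", (0, 1): "12.30", (1, 0): "N/A"}
result = cell_parity(legacy, modern)
# all three cells agree -> {"PASS": 3, "FAIL": 0, "SKIP": 0}
```

A material deviation would surface here as a nonzero FAIL count on the new report type.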

Figures

Figures reproduced from arXiv: 2605.13905 by Jaime Yan.

Figure 1. Component taxonomy of the legacy 558-macro library, classified by functional role. Formatting and display macros constitute the largest category (27%), reflecting the high proportion of code dedicated to output appearance rather than statistical computation.

Figure 2. Five-phase pipeline architecture with explicit input/output contracts at each layer boundary. YAML configuration (top) feeds into every phase, while the Intermediate Representation between Compute and Render serves as the key decoupling contract. Cross-cutting concerns span all phases.

Figure 3. Layered architecture showing six SAS layers (ADaM Derivation, Data Preparation, Compute, Framework, Bridge Compatibility, Render) with explicit contracts at each boundary, plus the Python orchestration layer and AI interfaces. The bridge layer provides a 365-entry facade over the legacy library.

Figure 4. Intermediate Representation (IR) schema showing the decoupling of statistical computation from rendering. The cl_to_ir macro transforms long-format compute output (left) into a grid-based cell model consisting of ir_cells and ir_structure datasets (centre). The format-agnostic IR feeds multiple render engines (right). Numeric integrity is verified by cl_ir_reconcile.

Figure 5. AI integration pathways under the non-destructive adoption pattern. Left column: legacy workflow producing RTF that is opaque to AI agents. Centre column: coexistence-mode adoption, in which the legacy library is preserved unchanged and the metadata layer (bridge map + IR) re-exposes outputs as JSON consumable by LLMs, R, and Python. Right column: optional consolidation-mode adoption, in which selected l…

Figure 6. Parity validation methodology. (A) The parity harness executes each report through both the legacy macro library and the modernized pipeline, then performs cell-level comparison to produce a PASS/FAIL/ERROR/SKIP triage. (B) The seven-gate progression model ensures incremental validation from configuration syntax checks through the full regression matrix.

Figure 7. Component count reduction from legacy library (558 components) to modernized SAS core (80 core macros plus 78 supporting/framework/bridge files = 158 total SAS files), by functional category. The largest reductions occur in Formatting/Display (100%, absorbed into the IR and render layer) and Statistical Compute (83%, consolidated from 120 specialized macros into parameterized core macros). Right panel show…
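The seven-gate progression in Figure 6(B) is, structurally, a short-circuiting pipeline of checks: each gate must pass before the next, more expensive one runs. A hedged sketch of that control flow, with paraphrased gate names — the paper's exact gate list is not reproduced here:

```python
# Illustrative gate runner: cheap checks first, full regression last.
# Gate names are paraphrased from Figure 6(B), not the paper's list.
def run_gates(gates):
    """Run (name, check) pairs in order; stop at the first failure."""
    passed = []
    for name, check in gates:
        if not check():
            return passed, name   # name of the failed gate
        passed.append(name)
    return passed, None

gates = [
    ("config-syntax", lambda: True),
    ("ir-schema", lambda: True),
    ("single-report-parity", lambda: True),
    ("full-regression-matrix", lambda: False),  # e.g. one FAIL cell
]
passed, failed = run_gates(gates)
# three gates pass; the run stops at "full-regression-matrix"
```

The design choice this encodes is that a configuration typo should never consume a full regression run.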
read the original abstract

Drug development and pharmacovigilance are frequently bottlenecked by legacy clinical reporting pipelines. These monolithic systems encode regulatory-grade logic but resist AI integration by producing opaque output with no machine-readable intermediate layer. Existing modernization approaches force a choice between full rewrites and incremental refactoring that preserves structural barriers. We present a non-destructive methodological framework achieving AI-driven pharmacoinformatics readiness without altering legacy source code. A metadata layer--comprising a bridge map, a typed Intermediate Representation (IR), and an orchestrator--wraps existing components and re-exposes their outputs as structured data consumable by LLMs. It enables optional incremental consolidation, replacing selected legacy components with metadata-configured core routines while the remainder operates unchanged. Validated on a 558-component SAS reporting library (373,000 lines of code), the framework demonstrated immediate AI-readiness under coexistence mode, yielding machine-readable output. Where consolidation was elected, the modernized core achieved a 92% reduction in proprietary code. Parity validation on 14 report types from a Phase III study achieved cell-level parity of 80% or above on 11 reports (mean 82.7%, best 99.2%). A benchmark using CDISC CDISCPilot01 data achieved 100% parity across 5 reports. LLM experiments confirmed the IR enables automated pharmacovigilance, table summarization, and trial configuration generation. The framework offers a regulation-aware path to AI-integrated clinical reporting, accelerating drug development without interrupting regulatory submissions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to present a non-destructive methodological framework for modernizing legacy SAS clinical reporting systems (558 components, 373k lines) via a metadata layer consisting of a bridge map, typed Intermediate Representation (IR), and orchestrator. This wrapper enables AI-driven pharmacoinformatics readiness and optional incremental consolidation (92% proprietary code reduction) without altering original source code, supported by cell-level output parity results (mean 82.7% on 14 Phase III reports, 100% on 5 CDISC CDISCPilot01 reports) and LLM usability demonstrations.

Significance. If the typed IR and metadata layer can be shown to preserve regulatory-grade logic fidelity, the framework would offer a practical path for the pharmaceutical industry to integrate AI tools into existing clinical pipelines while avoiding disruptive rewrites, potentially reducing bottlenecks in drug development and pharmacovigilance.

major comments (2)
  1. [Validation results] Cell-level output parity (80%+ on 11 of 14 Phase III reports, mean 82.7%) is used to support the claim of no loss of fidelity in capturing regulatory logic, but this metric does not establish logic equivalence; SAS macros, implicit data steps, and conditional validations can produce matching cells via alternate paths that the IR may omit or simplify, and the paper offers no exhaustive path comparisons or formal equivalence checks.
  2. [Framework description] The manuscript provides high-level descriptions of the bridge map, typed IR, and orchestrator but lacks concrete details on how the typed IR is derived from the 373k-line SAS library (e.g., handling of specific macro expansions or regulatory checks), which is load-bearing for the non-destructive and fidelity claims.
minor comments (2)
  1. [Abstract] The phrase 'immediate AI-readiness under coexistence mode' would benefit from a brief concrete example of an LLM query against the IR output to illustrate the claimed usability.
  2. [Results] The 92% code reduction figure is reported for elected consolidation cases but does not specify which subset of the 558 components was replaced or the baseline for the reduction calculation.
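On the first minor comment, the kind of concrete example the referee asks for is straightforward to sketch. Assuming the IR is re-exposed as JSON (as coexistence mode describes), an LLM query reduces to pasting that JSON into a prompt; the report content and question below are invented for illustration, not taken from the paper:

```python
import json

# Hypothetical IR output in JSON form. An LLM reads structured cells
# directly instead of parsing an opaque RTF table.
ir_json = json.dumps({
    "report": "AE summary",
    "cells": [
        {"row": "Any AE", "col": "n", "value": 42},
        {"row": "Any AE", "col": "%", "value": 12.3},
    ],
})

def build_prompt(ir_json):
    """Embed the structured IR in a plain question for an LLM."""
    return (
        "You are reviewing a clinical trial report.\n"
        "The table below is given as structured JSON cells.\n"
        f"{ir_json}\n"
        "Question: how many subjects reported any adverse event?"
    )

prompt = build_prompt(ir_json)
```

No RTF parsing, OCR, or layout recovery is needed; the model's only task is reading labeled cells.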

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. The comments highlight important distinctions between output parity and formal logic equivalence, as well as the need for greater specificity in describing the IR derivation process. We address each point below and have revised the manuscript to incorporate clarifications and additional details.

read point-by-point responses
  1. Referee: [Validation results] Cell-level output parity (80%+ on 11 of 14 Phase III reports, mean 82.7%) is used to support the claim of no loss of fidelity in capturing regulatory logic, but this metric does not establish logic equivalence; SAS macros, implicit data steps, and conditional validations can produce matching cells via alternate paths that the IR may omit or simplify, and the paper offers no exhaustive path comparisons or formal equivalence checks.

    Authors: We agree that cell-level output parity does not constitute formal proof of logic equivalence, as alternate execution paths in SAS (e.g., via macros or conditional data steps) could yield identical cells without the IR capturing every intermediate step. Our validation was designed to demonstrate practical fidelity for regulatory submissions, where the final report outputs determine compliance rather than internal execution traces. To address the referee's concern, we have added a dedicated limitations subsection in the revised manuscript that explicitly notes the absence of exhaustive path coverage or formal equivalence proofs (such as bisimulation or model checking) and clarifies that the framework's non-destructive claim rests on output parity for coexistence and consolidation scenarios. This revision maintains the original empirical results while tempering the language around 'no loss of fidelity' to 'preservation of regulatory-grade output parity.'

    Revision: yes

  2. Referee: [Framework description] The manuscript provides high-level descriptions of the bridge map, typed IR, and orchestrator but lacks concrete details on how the typed IR is derived from the 373k-line SAS library (e.g., handling of specific macro expansions or regulatory checks), which is load-bearing for the non-destructive and fidelity claims.

    Authors: We acknowledge that the original description of IR derivation was kept at a high level to emphasize the overall methodology. In the revised manuscript, we have expanded the 'Deriving the Typed Intermediate Representation' subsection with concrete implementation details. This includes: (1) pseudocode for the bridge map's macro expansion traversal, showing how SAS %macro calls are resolved into typed nodes without executing the original code; (2) an example of encoding a regulatory check (e.g., a CDISC SDTM variable validation) as a typed IR constraint; and (3) a small annotated excerpt from the 373k-line library illustrating the mapping of a data step with implicit conditions to IR structures. These additions directly support the non-destructive and fidelity claims by making the derivation process reproducible from the provided high-level architecture.

    Revision: yes
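The traversal the authors promise can be hedged into a few lines: resolve each %macro call by lookup in the bridge map, emitting typed nodes without executing any SAS. The map entries, node shape, and regex below are illustrative assumptions, not the paper's implementation, and the 365-entry bridge map is not reproduced here.

```python
import re

# Invented bridge-map entries mapping legacy macro names to typed roles.
BRIDGE_MAP = {
    "ae_summary": {"kind": "compute", "output": "ir_cells"},
    "fmt_rtf":    {"kind": "render",  "output": "rtf"},
}

# Matches a SAS macro invocation like %ae_summary(data=adae);
MACRO_CALL = re.compile(r"%(\w+)\s*\(")

def resolve_macros(sas_source):
    """Map each %macro call in SAS source to a typed bridge-map node."""
    nodes = []
    for name in MACRO_CALL.findall(sas_source):
        entry = BRIDGE_MAP.get(name, {"kind": "unmapped", "output": None})
        nodes.append({"macro": name, **entry})
    return nodes

nodes = resolve_macros("%ae_summary(data=adae); %fmt_rtf(out=t1.rtf);")
# yields a "compute" node for ae_summary and a "render" node for fmt_rtf
```

The "unmapped" fallback matters for the non-destructive claim: an unrecognized macro is flagged for the coexistence path rather than silently dropped.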

Circularity Check

0 steps flagged

No circularity: framework validated on external benchmarks without self-referential reductions

full rationale

The paper describes a methodological framework that wraps legacy SAS components via a metadata layer, typed IR, and orchestrator, achieving AI-readiness without source changes. All load-bearing claims rest on direct empirical validation through cell-level output parity against independent external datasets (14 Phase III reports and CDISC CDISCPilot01), with no fitted parameters, self-defined predictions, equations, or self-citations that reduce the central result to its own inputs by construction. The derivation chain is self-contained as a descriptive proposal plus parity checks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on assumptions about software wrapping and metadata sufficiency rather than new mathematical derivations or invented entities.

axioms (1)
  • Domain assumption: Legacy SAS components produce consistent outputs that can be accurately captured and represented in a typed intermediate representation without loss of regulatory logic.
    Invoked to support parity validation claims and AI-readiness assertions.

pith-pipeline@v0.9.0 · 5572 in / 1176 out tokens · 47721 ms · 2026-05-15T05:58:11.954271+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages

  1. Harrer S, Shah P, Antony B, Hu J. Artificial intelligence for clinical trial design. Trends Pharmacol Sci. 2019;40(8):577–591. doi:10.1016/j.tips.2019.05.005
  2. Griebel L, Prokosch HU, Köpcke F, et al. A scoping review of cloud computing in healthcare. BMC Med Inform Decis Mak. 2015;15:17. doi:10.1186/s12911-015-0145-7
  3. U.S. Food and Drug Administration. Providing regulatory submissions in electronic format – standardized study data: guidance for industry. Silver Spring, MD: FDA; 2021. Available from: https://www.fda.gov/regulatory-information/search-fda-guidance-documents
  4. U.S. Food and Drug Administration. Electronic Submissions Gateway: submission statistics. Silver Spring, MD: FDA; 2024. Available from: https://www.fda.gov/industry/electronic-submissions-gateway
  5. Shostak J. SAS programming in the pharmaceutical industry. 2nd ed. Cary, NC: SAS Institute; 2014
  6. Fu S, Wu J. Organizing and building a centralized SAS macro library. In: Proceedings of PharmaSUG; 2004. Paper AD11
  7. Muller RD. Managing the organization of SAS format and macro code libraries in complex environments. In: Proceedings of the SAS Global Forum; 2014. Paper 226-2014
  8. Redner V, Coughlin MM. Architecting a regulatory compliant macro library using SAS Drug Development. In: Proceedings of PharmaSUG; 2007. Paper PO17
  9. Carpenter AL. Carpenter's complete guide to the SAS macro language. 3rd ed. Cary, NC: SAS Institute; 2016
  10. Bass L, Clements P, Kazman R. Software architecture in practice. 4th ed. Boston, MA: Addison-Wesley; 2021
  11. SAS Institute. SAS Viya: architecture overview. Cary, NC: SAS Institute; 2023. Available from: https://www.sas.com/en_us/software/viya.html
  12. Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023;29(8):1930–1940. doi:10.1038/s41591-023-02448-8
  13. Van Veen D, Van Uden C, Blankemeier L, et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat Med. 2024;30(4):1134–1142. doi:10.1038/s41591-024-02855-5
  14. Clinical Data Interchange Standards Consortium. Analysis Results Standard v1.0. Austin, TX: CDISC; 2024. Available from: https://www.cdisc.org/standards/foundational/analysis-results-standard
  15. International Council for Harmonisation. ICH E8(R1): general considerations for clinical studies. Geneva: ICH; 2021. Available from: https://www.ich.org/page/efficacy-guidelines
  16. Slaughter SJ, Carpenter AL. SAS macro programming made easy. 3rd ed. Cary, NC: SAS Institute; 2018
  17. James M, Maass C, Redner G. Developing and managing a SAS macro library. In: Proceedings of PharmaSUG; 2006. Paper PO10
  18. Clinical Data Interchange Standards Consortium. Analysis Data Model (ADaM) v2.1. Austin, TX: CDISC; 2021. Available from: https://www.cdisc.org/standards/foundational/adam
  19. Nye B, Li JJ, Patel R, et al. A corpus with multi-level annotations of patients, interventions and outcomes to support language processing for medical literature. Proc Assoc Comput Linguist. 2018;2018:197–207
  20. Liu R, Rizzo S, Whipple S, et al. Evaluating eligibility criteria of oncology trials using real-world data and AI. Nature. 2021;592(7855):629–633. doi:10.1038/s41586-021-03430-5
  21. Salas M, Petracek J, Yalamanchili P, et al. The use of artificial intelligence in pharmacovigilance: a systematic review of the literature. Pharm Med. 2022;36(5):295–306. doi:10.1007/s40290-022-00441-z
  22. Chen L, Zaharia M, Zou J. How is ChatGPT's behavior changing over time? arXiv preprint arXiv:2307.09009. 2023
  23. Malcolm S. Large-scale TFL automation for regulated pharmaceutical trials using CDISC Analysis Results Metadata (ARM). In: Proceedings of PharmaSUG; 2019. Paper AD-203
  24. Yang Y, Krusche P, Pantoja K, Shi C, Ludmir E, Roberts K, Zhu G. Using large language models to generate clinical trial tables and figures. arXiv preprint arXiv:2409.12046. 2024
  25. Luo R, Sun L, Xia Y, et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief Bioinform. 2022;23(6):bbac409. doi:10.1093/bib/bbac409
  26. Sneed HM. Migrating from COBOL to Java. In: Proceedings of the IEEE International Conference on Software Maintenance; 2010:1–7. doi:10.1109/ICSM.2010.5609583
  27. RTCA. DO-178C: software considerations in airborne systems and equipment certification. Washington, DC: RTCA; 2012
  28. U.S. Food and Drug Administration. General principles of software validation: guidance for industry and FDA staff. Silver Spring, MD: FDA; 2002. Available from: https://www.fda.gov/regulatory-information/search-fda-guidance-documents
  29. Comella-Dorda S, Wallnau K, Seacord R, Robert J. A survey of legacy system modernization approaches. Pittsburgh, PA: Software Engineering Institute, Carnegie Mellon University
  30. Technical Note CMU/SEI-2000-TN-003
  31. R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2024. Available from: https://www.R-project.org/
  32. Van Rossum G, Drake FL. Python 3 reference manual. Scotts Valley, CA: CreateSpace; 2009
  33. Pinnacle 21. Pinnacle 21 Enterprise: CDISC compliance and data review. 2024. Available from: https://www.pinnacle21.com/
  34. McCabe TJ. A complexity measure. IEEE Trans Softw Eng. 1976;SE-2(4):308–320. doi:10.1109/TSE.1976.233837
  35. International Council for Harmonisation. ICH E3: structure and content of clinical study reports. Geneva: ICH; 1995. Available from: https://www.ich.org/page/efficacy-guidelines
  36. Fowler M. Patterns of enterprise application architecture. Boston, MA: Addison-Wesley; 2002
  37. U.S. Food and Drug Administration. 21 CFR Part 11: electronic records; electronic signatures. Fed Regist. 1997;62(54):13430–13466
  38. Cleland-Huang J, Gotel O, Zisman A. Software and systems traceability. London: Springer; 2012
  39. pharmaverse. Open-source R packages for clinical reporting. 2024. Available from: https://pharmaverse.org/
  40. Clinical Data Interchange Standards Consortium. CDISC Pilot Project (CDISCPilot01). Austin, TX: CDISC; 2013. Available from: https://github.com/cdisc-org/sdtm-adam-pilot-project
  41. International Society for Pharmaceutical Engineering. GAMP 5: a risk-based approach to compliant GxP computerized systems. 2nd ed. Tampa, FL: ISPE; 2022
  42. Fowler M. StranglerFigApplication. 2004. Available from: https://martinfowler.com/bliki/StranglerFigApplication.html
  43. Gamma E, Helm R, Johnson R, Vlissides J. Design patterns: elements of reusable object-oriented software. Reading, MA: Addison-Wesley; 1994
  44. Evans E. Domain-driven design: tackling complexity in the heart of software. Boston, MA: Addison-Wesley; 2003