pith. machine review for the scientific record.

arxiv: 2604.11261 · v1 · submitted 2026-04-13 · 💻 cs.AI

Recognition: unknown

Inspectable AI for Science: A Research Object Approach to Generative AI Governance

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:02 UTC · model grok-4.3

classification 💻 cs.AI
keywords generative AI · research objects · AI governance · scientific workflow · provenance · documentation · FAIR principles · accountability

The pith

Treating AI interactions as structured Research Objects makes generative model use in science accountable through documentation and provenance rather than authorship debates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes shifting focus from whether generative AI qualifies as an author to how its interactions are integrated, logged, and made inspectable within the research workflow. It draws on Research Object theory and FAIR principles to package model configurations, prompts, and outputs as metadata that supports verification and compliance. This approach is presented as especially necessary in security and privacy research, where generic disclosures cannot meet confidentiality and audit requirements. The authors illustrate the idea with a pipeline in which an AI synthesizes literature notes under explicit constraints while generating a complete provenance record.

Core claim

The legitimacy of an AI-assisted scientific paper depends on how model use is integrated into the workflow, documented, and made accountable. By treating AI interactions as AI-ROs, the framework records configuration details, prompts, and outputs through interaction logs and metadata packaging to create verifiable provenance that satisfies integrity and compliance needs.

What carries the argument

AI as a Research Object (AI-RO), which structures each generative model interaction as an inspectable component containing configuration, prompts, outputs, and metadata to enable provenance capture and accountability.
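
A minimal sketch of what one such inspectable component could look like in code; the field names are illustrative assumptions, not a schema the paper publishes:

```python
# Hypothetical AI-RO interaction record; field names are assumptions for
# illustration, not the paper's published schema.
import hashlib
import json
from dataclasses import asdict, dataclass, field

@dataclass
class AIROInteraction:
    model_id: str                   # model identity, e.g. name and version
    configuration: dict             # generation parameters (temperature, seed, ...)
    prompt: str                     # full prompt text
    output: str                     # full model response
    metadata: dict = field(default_factory=dict)  # workflow context, constraints

    def provenance_digest(self) -> str:
        """SHA-256 of the canonical JSON form, for tamper-evident logging."""
        canonical = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()
```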

If this is right

  • Governance of generative AI in science can be implemented through structured documentation, controlled disclosure, and integrity-preserving provenance capture.
  • Provenance artifacts produced by the framework can address confidentiality, integrity, and auditability requirements specific to security and privacy research.
  • A lightweight pipeline demonstrates how language models can synthesize constrained content while automatically producing verifiable interaction records (a minimal version is sketched after this list).
  • Additional developments in tooling and standards will be needed to make such documentation practices practical and widely adopted.
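
As a sketch of that pipeline shape (not the paper's implementation), the wrapper below refuses to hand back model output without first appending an interaction record; `call_model` is a stand-in for whatever inference API is actually used:

```python
# Sketch of constrained synthesis with automatic provenance capture.
# call_model is a placeholder, not an API from the paper.
import hashlib
import json
import time

def synthesize_with_provenance(call_model, prompt: str, config: dict,
                               log_path: str) -> str:
    output = call_model(prompt, config)
    record = {
        "model": config.get("model", "unknown"),
        "config": config,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
        "utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    with open(log_path, "a", encoding="utf-8") as f:  # append-only log
        f.write(json.dumps(record) + "\n")
    return output
```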

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Similar logging requirements could be extended to non-scientific domains where AI assists in decision-making or content creation.
  • Automated capture tools may be necessary to reduce the risk of incomplete human-recorded logs.
  • Publishing venues could require AI-RO packages as supplementary material to standardize disclosure beyond current guidelines.

Load-bearing premise

That researchers can and will accurately record every AI interaction without omission, and that the resulting logs and metadata will prove sufficient to verify integrity and meet confidentiality constraints in practice.

What would settle it

A published AI-assisted paper that follows the proposed logging and packaging steps but later reveals key undisclosed AI contributions that changed its scientific conclusions or violated confidentiality rules.

Figures

Figures reproduced from arXiv: 2604.11261 by Chamikara Mahawaga, Mario Fritz, Ming Ding, Natasha Fernandes, Ruta Binkyte, Sharif Abuaddba.

Figure 1. Illustrative AI Research Object Inspection Card.
Figure 2. AI-RO Workflow for Literature Review Writing.
Figure 3. Prompt template used to guide the language model.
read the original abstract

This paper introduces AI as a Research Object (AI-RO), a paradigm for governing the use of generative AI in scientific research. Instead of debating whether AI is an author or merely a tool, we propose treating AI interactions as structured, inspectable components of the research process. Under this view, the legitimacy of an AI-assisted scientific paper depends on how model use is integrated into the workflow, documented, and made accountable. Drawing on Research Object theory and FAIR principles, we propose a framework for recording model configuration, prompts, and outputs through interaction logs and metadata packaging. These properties are particularly consequential in security and privacy (S&P) research, where provenance artifacts must satisfy confidentiality constraints, integrity guarantees, and auditability requirements that generic disclosure practices do not address. We implement a lightweight writing pipeline in which a language model synthesizes human-authored structured literature review notes under explicit constraints and produces a verifiable provenance record. We present this work as a position supported by an initial demonstrative workflow, arguing that governance of generative AI in science can be implemented as structured documentation, controlled disclosure, and integrity-preserving provenance capture. Based on this example, we outline and motivate a set of necessary future developments required to make such practices practical and widely adoptable.

Editorial analysis

A structured set of objections, weighed in public.

Referee report, simulated author's rebuttal, circularity check, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes AI as a Research Object (AI-RO) as a governance paradigm for generative AI in science. It argues that the legitimacy of AI-assisted work stems from integrating model use into workflows via structured documentation (interaction logs, model configurations, prompts, outputs, and metadata packaging), drawing on Research Object theory and FAIR principles. This is positioned as particularly suited to security and privacy (S&P) research for meeting confidentiality, integrity, and auditability needs. The work is supported by a position statement and a demonstrative workflow in which a language model synthesizes human-authored structured literature review notes under explicit constraints to generate a verifiable provenance record, along with an outline of required future developments for broader adoption.

Significance. If operationalized, the framework could shift AI governance in science from authorship debates toward inspectable, provenance-based accountability, leveraging established Research Object and FAIR concepts to make AI interactions first-class, documented components of research. The S&P emphasis correctly identifies the tension between auditability and confidentiality as a key challenge not met by generic disclosure. The demonstrative pipeline provides a concrete, constrained example of controlled synthesis with provenance capture, which is a strength in grounding the conceptual proposal.

major comments (2)
  1. [AI-RO framework] Section on the AI-RO framework: the assertion that provenance artifacts simultaneously satisfy confidentiality constraints and auditability requirements in S&P research lacks any described mechanism for reconciling complete records with selective non-disclosure; this assumption is load-bearing for the central claim that structured documentation suffices for accountability.
  2. [Demonstrative workflow] Demonstrative workflow section: the single pipeline (human notes synthesized under constraints) does not test or address completeness enforcement for iterative, exploratory, or partially internal AI uses, leaving the practical sufficiency of self-reported logs unexamined despite being central to the accountability argument.
minor comments (2)
  1. [Abstract] The abstract and conclusion could more clearly delineate the scope and limitations of the single demonstrative example to manage reader expectations about generalizability.
  2. [Future developments] Consider citing additional prior work on scientific provenance systems or AI reproducibility frameworks to better situate the future developments outlined.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our position paper. The comments identify key areas where the manuscript can be strengthened by providing more explicit mechanisms and acknowledging limitations of the demonstration. We address each point below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [AI-RO framework] Section on the AI-RO framework: the assertion that provenance artifacts simultaneously satisfy confidentiality constraints and auditability requirements in S&P research lacks any described mechanism for reconciling complete records with selective non-disclosure; this assumption is load-bearing for the central claim that structured documentation suffices for accountability.

    Authors: We agree that the manuscript does not currently describe a concrete mechanism for reconciling complete provenance records with selective non-disclosure while preserving auditability in security and privacy contexts. This is a valid observation on a load-bearing aspect of the framework. In the revised version, we will add a new subsection under the AI-RO framework that outlines practical approaches, including cryptographic commitments such as Merkle trees for verifiable partial disclosure and policy-driven redaction tied to Research Object metadata packaging. These additions will explicitly show how structured documentation can support accountability without violating confidentiality constraints. revision: yes
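
The Merkle-tree suggestion is concrete enough to sketch. Below, a full interaction log is committed to by a single root hash, and any one entry can later be disclosed and checked against that root without revealing the others. This is an editorial illustration of the rebuttal's proposal, not code from the paper:

```python
# Editorial sketch of the rebuttal's Merkle-tree idea, not the paper's code:
# commit to a full interaction log with one root hash, then disclose
# individual entries with authentication paths.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _pad(level: list) -> list:
    return level + level[-1:] if len(level) % 2 else level  # duplicate last on odd levels

def merkle_root(leaves: list) -> bytes:
    level = [h(x) for x in leaves]
    while len(level) > 1:
        level = _pad(level)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list, index: int) -> list:
    """Sibling hashes for one leaf, each tagged with whether it sits on the right."""
    level, proof = [h(x) for x in leaves], []
    while len(level) > 1:
        level = _pad(level)
        sibling = index ^ 1
        proof.append((level[sibling], sibling > index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    node = h(leaf)
    for sibling, on_right in proof:
        node = h(node + sibling) if on_right else h(sibling + node)
    return node == root
```

For a log committed as `entries`, `verify(entries[i], merkle_proof(entries, i), merkle_root(entries))` returns True, so a single entry can be audited without exposing the rest.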

  2. Referee: [Demonstrative workflow] Demonstrative workflow section: the single pipeline (human notes synthesized under constraints) does not test or address completeness enforcement for iterative, exploratory, or partially internal AI uses, leaving the practical sufficiency of self-reported logs unexamined despite being central to the accountability argument.

    Authors: The demonstrative workflow is intentionally presented as a constrained, verifiable proof-of-concept to illustrate provenance capture rather than a comprehensive evaluation across all AI usage patterns. We acknowledge that it does not examine completeness enforcement for iterative, exploratory, or internal AI interactions, nor does it test the sufficiency of self-reported logs in those settings. In the revision, we will expand the discussion and future developments sections to analyze these limitations explicitly and specify requirements for automated logging and enforcement mechanisms that could address them in broader deployments. revision: partial
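
One plausible shape for such automated logging, sketched as a decorator so that every call to a wrapped inference function is captured without relying on the researcher to remember; the inference function itself is assumed, not named in the paper:

```python
# Sketch of automated interaction capture via a decorator; infer() is an
# assumed stand-in, not an API from the paper.
import functools
import hashlib
import json
import time

def logged(log_path: str):
    def wrap(infer):
        @functools.wraps(infer)
        def inner(prompt: str, **params):
            output = infer(prompt, **params)
            record = {
                "function": infer.__name__,
                "params": params,
                "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
                "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
                "utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            }
            with open(log_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(record, default=str) + "\n")
            return output
        return inner
    return wrap

@logged("interactions.jsonl")
def infer(prompt: str, **params) -> str:
    return ""  # stand-in: replace with a real model call
```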

Circularity Check

0 steps flagged

No circularity: the proposal rests on external theories and a demonstrative example

full rationale

The paper advances a position that AI-assisted science legitimacy follows from treating interactions as inspectable Research Objects via logs, metadata, and provenance. This is explicitly framed as drawing on established external Research Object theory and FAIR principles rather than deriving from self-citations or prior author results. The single demonstrative pipeline is presented as an illustrative workflow, not as a fitted prediction or self-definitional reduction. No equations, parameter fits, uniqueness theorems, or load-bearing self-citations appear in the derivation chain; the framework is proposed rather than forced by construction from its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on extending Research Object theory to AI interactions and assuming provenance records can satisfy governance needs; AI-RO is introduced as the core new framing without independent empirical support beyond the demo.

axioms (2)
  • domain assumption Research Object theory and FAIR principles can be directly extended to document generative AI interactions without loss of utility and without additional adaptation for model-specific behaviors
    Invoked when proposing the framework for recording prompts, configurations, and outputs.
  • ad hoc to paper Structured documentation and integrity-preserving provenance capture are sufficient to address accountability, confidentiality, and auditability requirements in AI-assisted research
    Core assumption underlying the argument that governance can be implemented this way.
invented entities (1)
  • AI as a Research Object (AI-RO) no independent evidence
    purpose: To frame AI interactions as structured, inspectable components of the research process for governance and provenance
    New concept introduced to organize the proposal; no independent falsifiable evidence provided beyond the conceptual argument.

pith-pipeline@v0.9.0 · 5533 in / 1495 out tokens · 85309 ms · 2026-05-10T16:02:15.528441+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

33 extracted references · 7 canonical work pages · 3 internal anchors

  [1] S. Altmäe, A. Sola-Leyva, and A. Salumets. Artificial intelligence in scientific writing: a friend or a foe? Reproductive BioMedicine Online, 47(1):3–9, 2023.
  [2] S. Ansari. Compound deception in elite peer review: A failure mode taxonomy of 100 fabricated citations at NeurIPS 2025, 2026.
  [3] A. Asai, J. He, R. Shao, W. Shi, A. Singh, J. C. Chang, K. Lo, L. Soldaini, S. Feldman, M. D’Arcy, et al. Synthesizing scientific literature with retrieval-augmented language models. Nature, pages 1–7, 2026.
  [4] K. Belhajjame, C. Goble, and D. De Roure. Research object management: opportunities and challenges. In Conference on Computer Supported Cooperative Work (CSCW), 2012.
  [5] R. E. Blackwell, J. Barry, and A. G. Cohn. Towards reproducible LLM evaluation: Quantifying uncertainty in LLM benchmark scores. arXiv preprint arXiv:2410.03492, 2024.
  [6] E. Bonadio and H. Felisberto. Copyrightability of AI outputs: The US Copyright Office’s perspective. Available at SSRN 5252465, 2025.
  [7] Committee on Publication Ethics (COPE). Authorship and AI tools: COPE position. 2023. Accessed: 2026-02-01.
  [8] G. Core. Louisiana, the potential safe-haven for creators of artificial intelligence: Louisiana’s response to Thaler v. Perlmutter. La. L. Rev., 85:651, 2024.
  [9] J. Dempere, L. K. Ramasamy, J. Harris, et al. AI as a research partner: Advocating for co-authorship in academic publications. The Artificial Intelligence Business Review, 1(2), 2025.
  [10] N. Editorials. Tools such as ChatGPT threaten transparent science; here are our ground rules for their use. Nature, 613(7945):612, 2023.
  [11] Elsevier. Generative AI policies for journals, 2026.
  [12] J. Fritz. Understanding authorship in artificial intelligence-assisted works. Journal of Intellectual Property Law and Practice, page jpae119, 2025.
  [13] C. Ganjavi, M. B. Eppler, A. Pekcan, B. Biedermann, A. Abreu, G. S. Collins, I. S. Gill, and G. E. Cacciamani. Bibliometric analysis of publisher and journal instructions to authors on generative-AI in academic and scientific publishing. arXiv preprint arXiv:2307.11918, 2023.
  [14] J. Gottweis, W. Weng, A. Daryin, T. Tu, A. Palepu, P. Sirkovic, and V. Natarajan. Towards an AI co-scientist: A multi-agent system for scientific discovery. arXiv preprint arXiv:2502.18864, 2025.
  [15] T. Hesman. The machine that invents. St. Louis Post-Dispatch, 2004.
  [16] Y. Hwang, D. Shin, and J. H. Lee. Who owns AI-generated artwork? Revisiting the work of generative AI based on human-AI co-creation. Telematics and Informatics, 98:102266, 2025.
  [17] IEEE Publication Services and Products Board. IEEE PSPB operations manual. 2018. Accessed: 2026-02-01.
  [18] W. Liang, Y. Zhang, Z. Wu, H. Lepp, W. Ji, X. Zhao, H. Cao, S. Liu, S. He, Z. Huang, et al. Mapping the increasing use of LLMs in scientific papers. arXiv preprint arXiv:2404.01268, 2024.
  [19] N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, and P. Liang. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157–173, 2024.
  [20] W. Liu and W. Huang. Authorship in human-AI collaborative creation: A creative control theory perspective. Computer Law & Security Review, 57:106139, 2025.
  [21] C. Lu, C. Lu, R. T. Lange, J. Foerster, J. Clune, and D. Ha. The AI Scientist: Towards fully automated open-ended scientific discovery. arXiv preprint arXiv:2408.06292, 2024.
  [22] J. Miao, J. R. Davis, Y. Zhang, J. K. Pritchard, and J. Zou. Paper2Agent: Reimagining research papers as interactive and reliable AI agents. arXiv preprint arXiv:2509.06917, 2025.
  [23] M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, and T. Gebru. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 220–229, 2019.
  [24] A. Sarkar. Enough with “human-AI collaboration”. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, pages 1–8, 2023.
  [25] O. Schilke and M. Reimann. The transparency dilemma: How AI disclosure erodes trust. Organizational Behavior and Human Decision Processes, 188:104405, 2025.
  [26] S. Soiland-Reyes, P. Sefton, L. J. Castro, F. Coppens, D. Garijo, S. Leo, M. Portier, and P. Groth. Creating lightweight FAIR digital objects with RO-Crate. Research Ideas and Outcomes, 8:e93937, 2022.
  [27] S. Soiland-Reyes, S. Wheater, T. Giles, J. Couldridge, P. Quinlan, and C. Goble. The Five Safes RO-Crate: FAIR digital objects for trusted research environments for health data research. In Open Conference Proceedings, volume 5, 2024.
  [28] Taylor & Francis. AI policy, 2026.
  [29] H. H. Thorp. ChatGPT is fun, but not an author, 2023.
  [30] R. Van Noorden and J. M. Perkel. AI and science: what 1,600 researchers think. Nature, 621(7980):672–675, 2023.
  [31] S. S. M. Vasu, I. Sheth, H.-P. Wang, R. Binkyte, and M. Fritz. Justice in judgment: Unveiling (hidden) bias in LLM-assisted peer reviews. arXiv preprint arXiv:2509.13400, 2025.
  [32] R. Watkins. Guidance for researchers and peer-reviewers on the ethical use of large language models (LLMs) in scientific research workflows. AI and Ethics, 4(4):969–974, 2024.
  [33] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg, J.-W. Boiten, L. B. da Silva Santos, P. E. Bourne, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data, 3(1):1–9, 2016.

Appendix

This appendix describes how the AI-RO corresponds to the literature review workflow exa…

Workflow Overview

AI-assisted writing is operationalized as a bounded synthesis process embedded in a human research pipeline. The workflow proceeds in five stages: (1) preparation of structured input notes, (2) constrained model invocation, (3) intermediate artifact generation, (4) …, and (5) provenance recording. The language model functions as a synthesis and organization tool.

Structured Inputs

The model is never prompted with open-ended literature queries. Instead it receives a curated input bundle consisting of human-authored reading notes. Each record follows a fixed schema: identifier, citation, persistent identifier, summary, strengths, limitations, and relation to the contribution. These notes are derived from direct rea…
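
A minimal sketch of one such note record, following the fixed schema named above; the key names are paraphrased and every value is invented for illustration:

```python
# One hypothetical reading-note record matching the appendix's schema;
# values are invented for illustration.
note = {
    "identifier": "note-07",
    "citation": "Author et al., Example Title, 2024",
    "persistent_identifier": "doi:10.0000/example",
    "summary": "One-paragraph summary written by the human reader.",
    "strengths": ["clear threat model"],
    "limitations": ["small evaluation set"],
    "relation_to_contribution": "Motivates the provenance requirement.",
}
```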

Model Invocation

The model is invoked in two stages. First, the model groups papers into thematic clusters and provides a brief rationale for each cluster. Second, the model produces a structured narrative outline derived only from the provided notes. Hard constraints embedded in the prompt require that citations originate from the input bundle, compar…
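
A hedged sketch of how such hard constraints might be embedded when assembling the prompt; the wording here is illustrative, and the paper's actual template (Figure 3) may differ:

```python
# Illustrative prompt assembly with embedded hard constraints; not the
# paper's real template.
import json

def build_outline_prompt(notes: list) -> str:
    constraints = (
        "Use ONLY the notes below. Every citation must come from the "
        "input bundle; do not introduce external papers or claims."
    )
    bundle = json.dumps(notes, indent=2)
    return f"{constraints}\n\nInput bundle:\n{bundle}\n\nProduce a structured narrative outline."
```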

Generated Artifacts

Each run produces inspection artifacts rather than only a final text. The artifact includes:
  • a taxonomy file describing conceptual groupings,
  • a demonstration draft showing synthesis structure,
  • an audit checklist linking claims to sources,
  • a verification table for human review,
  • a provenance record of model interaction.
These output…

Provenance Logging

Every model invocation generates a structured log containing:
  • model identity and configuration,
  • generation parameters,
  • cryptographic hashes of prompt, response, and input bundle.
Hashes allow verification that an artifact corresponds to a specific input without exposing the underlying text. The released artifact therefore enables a…
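
A sketch of one such log entry, assuming SHA-256 as the digest and field names of the editor's choosing, since the appendix names neither:

```python
# One structured log entry as described above; SHA-256 and the field names
# are assumptions.
import hashlib
import json

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def log_entry(model: str, params: dict, prompt: str,
              response: str, input_bundle: str) -> str:
    return json.dumps({
        "model_identity": model,
        "generation_parameters": params,
        "prompt_sha256": sha256_hex(prompt),
        "response_sha256": sha256_hex(response),
        "input_bundle_sha256": sha256_hex(input_bundle),
    })
```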

Log Redaction and Anonymization

For double-blind review, interaction transcripts are not publicly distributed. Instead:
  • full prompts and responses are replaced by integrity hashes,
  • timestamps are removed,
  • local file paths and machine identifiers are removed,
  • backend service endpoints are generalized.
This preserves verifiability while preventing attri…
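
A sketch of that redaction pass, assuming a raw entry that still carried transcripts and environment details; the field names are hypothetical:

```python
# Hypothetical redaction pass over a raw log entry; field names are
# invented to match the description above.
import hashlib

def redact(entry: dict) -> dict:
    redacted = dict(entry)
    for key in ("prompt", "response"):           # replace text with integrity hashes
        if key in redacted:
            digest = hashlib.sha256(redacted.pop(key).encode("utf-8")).hexdigest()
            redacted[f"{key}_sha256"] = digest
    for key in ("timestamp", "file_path", "machine_id"):
        redacted.pop(key, None)                  # strip attribution signals
    if "endpoint" in redacted:
        redacted["endpoint"] = "generalized"     # e.g. keep provider family only
    return redacted
```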

RO-Crate Structure

The artifact is packaged using the RO-Crate specification, which describes research objects using machine-readable metadata. The released archive contains:
  • source code implementing the workflow,
  • prompt templates,
  • structured input notes,
  • redacted provenance logs,
  • generated demonstration outputs,
  • RO-Crate metadata describing file…
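
A minimal ro-crate-metadata.json skeleton per the RO-Crate 1.1 specification; the listed parts mirror the archive contents named above, but the exact file and directory names are assumptions:

```python
# Minimal RO-Crate 1.1 metadata skeleton; part names are assumed, not the
# released archive's actual layout.
import json

metadata = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "hasPart": [
                {"@id": "workflow/"},               # source code
                {"@id": "prompts/"},                # prompt templates
                {"@id": "notes/"},                  # structured input notes
                {"@id": "logs/provenance_redacted.jsonl"},
                {"@id": "outputs/"},                # demonstration outputs
            ],
        },
    ],
}

with open("ro-crate-metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```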

Verification Procedure

The audit workflow is designed so a reviewer can:
  • inspect structured notes,
  • examine generated clusters and draft structure,
  • check claim-to-source mappings,
  • verify artifact hashes,
  • confirm that outputs derive from the recorded inputs.
Human verification is required before any generated text can be treated as scholarly writing.
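
A reviewer-side sketch of the hash-verification step, reusing the hypothetical field names from the logging sketches above; the file layout is likewise assumed:

```python
# Reviewer-side audit of artifact hashes against a redacted log; field and
# file names carry over from the earlier sketches and are assumptions.
import hashlib
import json
import pathlib

def sha256_file(path: pathlib.Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def audit(log_path: str, artifact_dir: str) -> bool:
    ok = True
    for line in pathlib.Path(log_path).read_text().splitlines():
        entry = json.loads(line)
        artifact = pathlib.Path(artifact_dir) / entry["artifact_file"]
        if sha256_file(artifact) != entry["response_sha256"]:
            print(f"MISMATCH: {artifact}")
            ok = False
    return ok
```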