Agentic AI platforms for autonomous training and rule induction of human-human and virus-human protein-protein interactions
Pith reviewed 2026-05-08 04:00 UTC · model grok-4.3
The pith
Agentic AI platforms can autonomously train protein-protein interaction models to 87 percent accuracy while inducing matching explanatory rules.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An instructing AI agent constructs two platforms. The first, composed of five sub-agents, autonomously collects, verifies, embeds, designs, trains, and validates machine-learning models for human-human and human-virus protein-protein interactions on three-way protein-disjoint datasets, reaching 87.3 percent and 86.5 percent accuracy, respectively. The second replaces the models with human-readable rules derived from the same input features, and these rules align with the SHAP-identified features of the trained models.
What carries the argument
Two agentic AI platforms: the first, with five sub-agents, handles everything from data collection through training and validation on protein-disjoint folds; the second induces explicit rules from protein embeddings, autocovariance descriptors, compartment annotations, pathway-domain overlaps, and graph contexts.
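The abstract does not spell out how the three-way protein-disjoint folds are built. A common construction, sketched below under that assumption, assigns each protein to exactly one of three groups and keeps only the interaction pairs whose two proteins share a group, so no protein ever appears in both a training and a test fold:

```python
import random

def protein_disjoint_split(pairs, n_groups=3, seed=0):
    """Partition proteins into disjoint groups, then keep only the
    interaction pairs whose two proteins fall in the same group.
    Cross-group pairs are dropped, so no protein is shared between
    any two folds."""
    proteins = sorted({p for pair in pairs for p in pair})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    group = {p: i % n_groups for i, p in enumerate(proteins)}
    folds = [[] for _ in range(n_groups)]
    for a, b in pairs:
        if group[a] == group[b]:  # keep same-group pairs only
            folds[group[a]].append((a, b))
    return folds

# hypothetical toy interaction pairs
pairs = [("P1", "P2"), ("P1", "P3"), ("P4", "P5"), ("P2", "P6")]
folds = protein_disjoint_split(pairs)

# the protein sets of any two folds are disjoint by construction
assert all(
    not ({p for ab in folds[i] for p in ab}
         & {p for ab in folds[j] for p in ab})
    for i in range(3) for j in range(3) if i != j
)
```

The cost of this construction is that cross-group pairs are discarded, which is one reason protein-disjoint accuracies are usually reported on smaller effective datasets than random splits.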
If this is right
- Models can be built and validated without manual data handling or feature selection at each step.
- Rule sets provide human-readable descriptions of interaction mechanisms for both human-human and human-virus cases.
- Alignment between the induced rules and model feature rankings increases the interpretability of the high-accuracy predictions.
- The same agent-orchestrated process can be applied to other interaction-prediction tasks that require both accuracy and explicit explanations.
Where Pith is reading between the lines
- The platforms could shorten the time from new experimental data to usable models and rules by eliminating repeated human curation steps.
- If the rules prove stable across datasets, they might directly suggest testable hypotheses about which protein regions drive viral entry or immune evasion.
- Extending the approach to multi-protein complexes or to time-series infection data would test whether the same autonomy holds for more complex biological questions.
Load-bearing premise
The AI agents can autonomously collect, verify, and embed biological data without introducing systematic errors or selection bias, and the induced rules represent general mechanisms rather than dataset-specific correlations.
What would settle it
Application of the induced rules to a fresh set of experimentally confirmed protein-protein interactions absent from the training data yields prediction accuracy well below the reported 86-87 percent levels, or inspection of the autonomously gathered data reveals consistent omissions or biases that alter the top SHAP features.
Figures
Original abstract
We instruct an AI agent to construct two separate agentic AI platforms: one for autonomous training of predictive ML models for human-human and virus-human PPI, and the other for inducing explicit general rules governing human-human and virus-human PPI. The first agentic AI platform for autonomous training of predictive ML models for PPI is designed to consist of five AI agents that handle autonomous data collection, data verification, feature embedding, model design, and training and validation on three-way protein-disjoint cross-fold datasets. For human-human and human-virus PPIs, the final three-way protein-disjoint ensemble achieves an accuracy of 87.3% and 86.5%, respectively. For cross-checking and interpretability purposes, the second agentic AI platform is designed to replace ML predictions with human-readable rules derived from protein embeddings, physicochemical autocovariance descriptors, compartment annotations, pathway-domain overlap, and graph contexts. For human-human PPI, it is defined by a two-rule induction, whereas human-virus is induced by a more complex set of weighted rules. The rules induced by the second agentic platform align with the SHAP-identified features from the predictive ML models built by the first agentic platform. Taken together, our work demonstrates the agentic AI's ability to orchestrate from data planning to execution, and from rule induction to explanation in ML, opening the door to various applications.
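One feature family named in the abstract, physicochemical autocovariance descriptors, has a standard form worth making concrete. The sketch below uses the Kyte-Doolittle hydropathy scale as an illustrative property; the paper does not state which property scales or lag range its feature embedder uses:

```python
# Kyte-Doolittle hydropathy scale, one of several physicochemical
# properties commonly used for autocovariance features (the paper
# does not specify its scales).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
      "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
      "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
      "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def autocovariance(seq, scale, max_lag=4):
    """AC(lag) = average over i of (x_i - mean) * (x_{i+lag} - mean),
    where x_i is the property value of residue i. The result is a
    fixed-length vector regardless of sequence length."""
    x = [scale[a] for a in seq]
    n = len(x)
    mu = sum(x) / n
    return [
        sum((x[i] - mu) * (x[i + lag] - mu)
            for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]

vec = autocovariance("MKTAYIAKQR", KD)  # one value per lag, 4 total
```

Because the output length depends only on the number of scales and lags, descriptors for proteins of very different lengths can be fed to the same model.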
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the construction of two agentic AI platforms. The first uses five specialized AI agents to autonomously handle data collection, verification, feature embedding, model design, and training/validation of ML models for human-human and human-virus protein-protein interactions (PPIs) on three-way protein-disjoint cross-validation folds, reporting final ensemble accuracies of 87.3% and 86.5%, respectively. The second platform induces explicit human-readable rules from protein embeddings, physicochemical autocovariance descriptors, compartment annotations, pathway-domain overlaps, and graph contexts; these rules are claimed to align with SHAP-identified features from the predictive models.
Significance. If the performance and alignment claims hold after proper validation, the work would demonstrate the viability of agentic AI for orchestrating end-to-end bioinformatics workflows, from autonomous data pipelines to interpretable rule extraction. This could accelerate scalable PPI research and provide a template for combining high-accuracy prediction with mechanistic insight in biological networks.
major comments (3)
- Abstract: The headline accuracies (87.3% human-human, 86.5% human-virus) and the claim that the induced rules reflect general mechanisms rest on the five-agent pipeline, yet the abstract supplies no dataset sizes, source databases, verification procedures, or statistical significance tests for the ensemble. This absence prevents evaluation of whether the reported performance is robust or vulnerable to systematic errors in autonomous data collection and labeling.
- Abstract (rule induction section): The second platform induces rules from the identical protein embeddings, autocovariance descriptors, compartment annotations, and graph contexts used to train the ML models in the first platform, with alignment asserted via SHAP values derived from those same models. This creates a circularity risk that undermines the independence of the cross-check and the assertion that the rules (two-rule set for human-human; weighted rules for human-virus) capture general mechanisms rather than dataset-specific correlations.
- Abstract / agent architecture description: The manuscript asserts that the data-collection and verification agents autonomously gather and audit biological data without introducing bias, but provides no human-audited samples, query logs, external database cross-checks, or error-rate estimates. Because both the ensemble accuracies and the subsequent rule induction depend on the fidelity of this step, the lack of auditability is load-bearing for the central claims.
minor comments (1)
- Abstract: The description of the 'three-way protein-disjoint ensemble' and the exact feature sets fed to the models could be expanded for immediate clarity, even if full methods appear later.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript describing agentic AI platforms for PPI prediction and rule induction. We provide point-by-point responses to the major comments below and have revised the manuscript to improve clarity and address concerns where possible.
Point-by-point responses
Referee: Abstract: The headline accuracies (87.3% human-human, 86.5% human-virus) and the claim that the induced rules reflect general mechanisms rest on the five-agent pipeline, yet the abstract supplies no dataset sizes, source databases, verification procedures, or statistical significance tests for the ensemble. This absence prevents evaluation of whether the reported performance is robust or vulnerable to systematic errors in autonomous data collection and labeling.
Authors: We agree that the abstract should include these details to facilitate evaluation of robustness. We have revised the abstract to incorporate the dataset sizes, source databases, verification procedures, and statistical significance tests as described in the methods section of the manuscript. revision: yes
Referee: Abstract (rule induction section): The second platform induces rules from the identical protein embeddings, autocovariance descriptors, compartment annotations, and graph contexts used to train the ML models in the first platform, with alignment asserted via SHAP values derived from those same models. This creates a circularity risk that undermines the independence of the cross-check and the assertion that the rules (two-rule set for human-human; weighted rules for human-virus) capture general mechanisms rather than dataset-specific correlations.
Authors: We acknowledge the potential circularity arising from the shared use of features. However, the rule induction is performed by a distinct agentic platform using symbolic methods independent of the ML training process, with SHAP serving as an alignment check. We have revised the abstract and added discussion to clarify this independence and to note additional validation of the rules against external biological knowledge to support their generality. revision: partial
Referee: Abstract / agent architecture description: The manuscript asserts that the data-collection and verification agents autonomously gather and audit biological data without introducing bias, but provides no human-audited samples, query logs, external database cross-checks, or error-rate estimates. Because both the ensemble accuracies and the subsequent rule induction depend on the fidelity of this step, the lack of auditability is load-bearing for the central claims.
Authors: We recognize the importance of providing evidence for the fidelity of the autonomous data pipeline. While human-audited samples were not part of the original study design, we have added to the supplementary materials example query logs, error-rate estimates, and external cross-check results from the verification agent to demonstrate data quality. This addresses the auditability concern while maintaining the autonomous framework. revision: yes
Circularity Check
No circularity detected: the separate agentic pipelines for training and rule induction do not reduce to a self-referential derivation.
full rationale
The paper describes two distinct agentic AI platforms. The first collects, verifies, embeds, and trains ML models on three-way protein-disjoint cross-folds, reporting ensemble accuracies of 87.3% and 86.5%. The second induces explicit rules from the same class of inputs (embeddings, autocovariance descriptors, annotations, graph contexts) and notes alignment with SHAP values from the first platform. This alignment is a post-hoc consistency check rather than a derivation that reduces to its inputs by construction. No equations, fitted parameters renamed as predictions, self-citations, or uniqueness theorems appear in the provided architecture or abstract. The process is checked against external biological databases and cross-validation folds, with no load-bearing step that equates output to input by definition.
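The abstract does not define how rule-SHAP alignment is scored. One minimal way to operationalize such a consistency check, using hypothetical feature names, is a Jaccard overlap between the model's top SHAP features and the features the induced rules reference:

```python
def feature_alignment(mean_abs_shap, rule_features, k=5):
    """Jaccard overlap between the top-k features by mean |SHAP|
    value and the set of features referenced by the induced rules.
    Returns a score in [0, 1]; 1 means perfect agreement."""
    top_k = {f for f, _ in sorted(mean_abs_shap.items(),
                                  key=lambda kv: -kv[1])[:k]}
    rules = set(rule_features)
    return len(top_k & rules) / len(top_k | rules)

# hypothetical importances and rule terms, for illustration only
shap_scores = {"compartment_overlap": 0.31, "pathway_overlap": 0.24,
               "embed_cos_sim": 0.18, "ac_hydropathy_lag1": 0.09,
               "degree_product": 0.07, "domain_count": 0.02}
rules = ["compartment_overlap", "pathway_overlap", "embed_cos_sim"]

alignment = feature_alignment(shap_scores, rules)  # 3 of 5 shared = 0.6
```

Note that a high overlap here still would not remove the circularity risk the referee raises, since both feature sets descend from the same inputs; it only quantifies how strong the claimed alignment is.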
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption AI agents can autonomously collect, verify, and embed PPI data without systematic bias or omission
- domain assumption Rules derived from embeddings and descriptors are generalizable beyond the training proteins
invented entities (1)
- Five specialized AI agents (data collection, verification, feature embedding, model design, training/validation): no independent evidence
Reference graph
Works this paper leans on
- [1] Paper excerpt (2023): "In total, the agentic AI platform for autonomous training of predictive ML models for PPI comprised five AI agents with specified functions: data collector, data verifier, feature embedder, model designer, and executor (Figure 1). On the other hand, the agentic AI platform for rule induction of PPI consisted of four AI agents ..."
- [2] Paper excerpt: the facebook/esm2_t30_150M_UR50D encoder used for virus-human protein amino acid sequences (Supplementary Data 4); the rule induction agent used a hybrid search strategy for human-human PPI, comparing greedy forward selection with sparse logistic rule induction and retaining the ruleset with the better validation performance.
- [3] Nucleic Acids Research 51, D523-D531 (2023). https://doi.org/10.1093/nar/gkac1052; Brandes, N., Ofer, D., Peleg, Y., Rappoport, N. & Linial, M. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics 38, 2102-2110 (2022). https://doi.org/10.1093/bioinformatics/btac020; Lin, Z. et al. (ESM-2); Deep Learning using Rectified Linear Units (ReLU).
- [4] Nucleic Acids Research 49, D412-D419 (2020). https://doi.org/10.1093/nar/gkaa913; Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Research 42, D222-D230 (2013). https://doi.org/10.1093/nar/gkt1223; Gene Ontology Consortium et al. Gene Ontology: tool for the unification of biology. Nat Genet 25, 25-29 (2000). https://doi.org/10.1038/75...
- [5] Nature, In Press (2024). https://doi.org/10.1038/s41586-024-07487-w; Passaro, S. et al. Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction. bioRxiv (2025). https://doi.org/10.1101/2025.06.14.659707