q-bio.BM — Pith

q-bio.BM 2026-05-12 Recognition

Compact tokenizer generates plausible proteins from scratch with 10x fewer parameters

Yeti: A compact protein structure tokenizer for reconstruction and multi-modal generation

Yeti achieves top codebook use and token diversity, letting a small multimodal model match larger systems in joint sequence and structure output.

Multimodal models that jointly reason over protein sequences, structures, and function annotations within a unified representation hold immense potential for integrating multimodal data and generating new proteins with designed functional properties. To use transformer architectures, such models require a tokenizer that converts protein structure from continuous atomic coordinates into discrete representations suitable for scalable multimodal training. The quality of such models is fundamentally upper-bounded by the fidelity and expressiveness of the underlying structure tokenization. However, existing tokenizers prioritize reconstruction over generative ability. To address this gap, we introduce Yeti, a simple and compact protein structure tokenizer based on lookup-free quantization and trained end-to-end with a flow matching objective for multimodal learning. Compared to existing models, Yeti generally achieves the best codebook utilization and token diversity, and the second-best reconstruction accuracy (with 10x fewer parameters than ESM3), on diverse datasets. To validate Yeti's generative capability, we trained a compact multimodal model jointly over its structure tokens and amino acid sequences entirely from scratch, with no pretrained initialization. The resulting multimodal model generates plausible structures under unconditional cogeneration of protein sequences and structures, achieving results comparable to those of 10x larger models. Together, these results demonstrate that Yeti is a compact and expressive protein structure tokenizer suitable for training multimodal models that cogenerate highly plausible sequences and structures.
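
Lookup-free quantization, named in the Yeti abstract, replaces a learned codebook search with a per-channel sign decision. The sketch below is a minimal, hypothetical illustration of that general idea, not Yeti's implementation: each latent channel is binarized, the bit pattern is read as an integer token, and codebook utilization is measured as the fraction of the 2^C possible codes that actually occur. The function names and toy latents are assumptions.

```python
from typing import List, Sequence


def lfq_quantize(latent: Sequence[float]) -> List[int]:
    """Map each latent channel to +1 (if >= 0) or -1; no codebook lookup needed."""
    return [1 if x >= 0 else -1 for x in latent]


def lfq_token(latent: Sequence[float]) -> int:
    """Read the sign pattern as a binary number: bit i is set if channel i is >= 0."""
    token = 0
    for i, x in enumerate(latent):
        if x >= 0:
            token |= 1 << i
    return token


def codebook_utilization(tokens: Sequence[int], num_channels: int) -> float:
    """Fraction of the 2**num_channels possible codes that appear at least once."""
    return len(set(tokens)) / (2 ** num_channels)


# Hypothetical per-residue latents with 3 channels -> 2**3 = 8 possible tokens.
residue_latents = [
    [0.4, -1.2, 0.3],
    [-0.7, 0.1, 0.9],
    [0.2, -0.5, 0.8],
    [-0.1, -0.3, -2.0],
]
tokens = [lfq_token(z) for z in residue_latents]
print(tokens)                                  # -> [5, 6, 5, 0]
print(codebook_utilization(tokens, 3))         # -> 0.375
```

The implicit codebook grows as 2^C without any stored embedding table, which is one reason LFQ-style tokenizers can stay compact; whether Yeti uses exactly this sign convention is not stated in the abstract.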

q-bio.BM 2026-05-11 2 theorems

New diffusion model generates directional agonists and antagonists

TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation

TD3B controls protein state transitions to produce binders whose functional bias is independent of binding affinity.

Protein function is often controlled by ligands that bias the direction of state transitions, such as agonists and antagonists, rather than stabilizing a single conformation. This is especially important for clinically relevant G protein-coupled receptors (GPCRs), where therapeutic efficacy depends on functional directionality. Structure-based design methods optimize binding to static conformations and cannot represent non-reversible, directional effects or systematically distinguish agonist from antagonist behavior. To address this gap, we introduce Transition-Directed Discrete Diffusion for Allosteric Binder Design (TD3B), a sequence-based generative framework that designs binders with specified agonist or antagonist behavior via a directional transition control objective. TD3B combines a target-aware Direction Oracle, a soft binding-affinity gate, and amortized fine-tuning of a pre-trained discrete diffusion model, enabling targeted agonist and antagonist generation decoupled from binding affinity and unattainable by equilibrium-based or inference-only guidance baselines. The code and checkpoints are available at https://huggingface.co/ChatterjeeLab/TD3B.
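
The abstract combines a target-aware Direction Oracle with a soft binding-affinity gate. One plausible reading of that combination, sketched here purely as an assumption (the oracle score range, threshold, temperature, and multiplicative form are hypothetical and not TD3B's published objective), is a reward in which binding acts as a smooth gate and the oracle supplies the direction:

```python
import math


def soft_gate(affinity: float, threshold: float, temperature: float) -> float:
    """Sigmoid gate: near 1 when affinity is well above threshold, near 0 well below."""
    return 1.0 / (1.0 + math.exp(-(affinity - threshold) / temperature))


def transition_reward(direction_score: float, affinity: float, desired: str,
                      threshold: float = 0.5, temperature: float = 0.05) -> float:
    """Hypothetical reward for directional binder design.

    direction_score: assumed oracle output in [-1, 1]; > 0 agonist-like, < 0 antagonist-like.
    affinity: assumed normalized binding score in [0, 1].
    """
    if desired not in ("agonist", "antagonist"):
        raise ValueError("desired must be 'agonist' or 'antagonist'")
    sign = 1.0 if desired == "agonist" else -1.0
    # The gate only decides whether a binder counts at all; above the threshold
    # the reward is driven by direction, not by how tightly the binder binds.
    return sign * direction_score * soft_gate(affinity, threshold, temperature)


print(transition_reward(0.8, 0.9, "agonist"))     # strong binder, agonist-like
print(transition_reward(0.8, 0.9, "antagonist"))  # same binder, wrong direction
print(transition_reward(0.8, 0.1, "agonist"))     # non-binder: gated out
```

Because the sigmoid saturates, two binders above the threshold receive nearly the same gate value, which is one way a reward could keep the directional signal decoupled from affinity.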

q-bio.BM 2026-05-11 2 theorems

BP180 homotrimer model stays folded in simulations

A putative, computationally stable structure of homotrimeric BP180/collagen XVII

Molecular dynamics shows the AI-predicted structure of the skin adhesion protein remains mostly stable over 500 ns with domain-specific flexibility.

Background: BP180, also known as collagen XVII and BPAG2 (bullous pemphigoid antigen 2), is a 180-kDa transmembrane protein within the hemidesmosomal plaque complex and is known to be a major antigen in bullous pemphigoid, gestational pemphigoid, cicatricial (mucous membrane) pemphigoid, and linear IgA bullous disease. Objective: At present, the 3D structure of BP180 is not known. The goal is to predict a reasonable structure for BP180 through machine learning and molecular dynamics. Methods: In this work, we use the recent Boltz-2 model to predict a putative structure for the intracellular, transmembrane, and proximal extracellular domains, including the NC16A antigenic region and a portion of its first extracellular collagenous domain, Col-15. We computationally embed BP180 in a simple phospholipid bilayer, demonstrate that the putative structure is stable using molecular dynamics, and analyze its allosteric properties. Results: The structures presented satisfy the symmetry and secondary-structure properties expected from homology modelling. Over three 500 ns trajectories, there is minor instability of the predicted globular head domain, but the homotrimer otherwise stays mostly folded. The putative NC16A domain is stiff, whereas the truncated Col-15 domain is highly flexible. There does not appear to be a nearby stable conformation distinct from the initial state. Conclusion: The structure presented is a useful starting point for targeting BP180 pharmacologically, for further experimental characterization of BP180, and for generating hypotheses regarding the relevant epitopes contributing to bullous disease. Diffusion models such as Boltz-2 and AlphaFold3 are useful, but their results must be evaluated carefully.
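
A contrast such as "stiff NC16A, flexible Col-15" is commonly read off per-atom root-mean-square fluctuation (RMSF). The abstract does not say which analysis was used, so this is only a minimal sketch of that standard metric, assuming the trajectory frames have already been superposed onto a common reference; the toy trajectory is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsf(frames: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-atom RMSF over pre-aligned frames: sqrt(mean_t |x_t - <x>|^2)."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for a in range(n_atoms):
        # Time-averaged position of atom a.
        mean = [sum(f[a][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(math.dist(f[a], mean) ** 2 for f in frames) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy trajectory: atom 0 is rigid, atom 1 oscillates by +/-1 Å along x.
frames = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (7.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (7.0, 0.0, 0.0)],
]
print(rmsf(frames))  # -> [0.0, 1.0]
```

Averaging these per-atom values over the residues of each domain gives a per-domain flexibility summary; without prior superposition, global rotation and translation would inflate every value.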

q-bio.BM 2026-05-11 2 theorems

Benchmark trains models on noisy DEL screens and tests them on real Ki affinities

CA-DEL: An Open Multi-Target, Multi-Modal Benchmark for Learning from DNA-Encoded Library Screens

CA-DEL pairs screens for three carbonic anhydrase isoforms with ChEMBL validation data to check generalization from indirect signals.

The success of machine learning in drug discovery hinges on learning the relationship between a chemical structure and its biological activity. While DNA-Encoded Library (DEL) technology can generate the massive datasets required for this task, its primary signal -- sequencing read counts -- is an indirect and often noisy proxy for true molecular binding affinity. To address the scarcity of public benchmarks for developing robust models that can overcome this data challenge, we introduce CA-DEL, a multi-dimensional public benchmark featuring screens against three homologous carbonic anhydrase isoforms. While recent benchmarks like KinDEL have introduced 3D poses for kinase targets, CA-DEL distinguishes itself by focusing on the selectivity challenge among homologous carbonic anhydrase isoforms (CAII, CAIX, CAXII). Unlike benchmarks relying solely on noisy enrichment scores, CA-DEL integrates a rigorous validation set of experimentally determined binding affinities ($K_i$) from ChEMBL, establishing a critical Sim-to-Real evaluation paradigm: training on noisy DEL screens and testing on high-fidelity biophysical data.

q-bio.BM 2026-05-08

Tree search guides diffusion to balance multiple protein design goals

MP2D: Constrained Monte Carlo Tree-Guided Diffusion for Multi-Objective Protein Sequence Design

MP2D explores denoising paths with Pareto constraints to improve four or five conflicting properties without retraining the base model.

Designing functional protein sequences that satisfy multiple desired properties is a core research focus of protein engineering. Prior methods either cannot handle numerous, often conflicting, properties or handle them inefficiently. We propose Multi-Property Protein Diffusion (MP2D), a unified framework for multi-objective protein sequence optimization that integrates conditional discrete diffusion with constrained Monte Carlo tree search (MCTS) and global iterative refinement. MP2D formulates diffusion denoising as a constrained sequential decision-making process and employs MCTS to explore diverse denoising trajectories guided by Pareto-based rewards. A global iterative refinement strategy further enables repeated remasking and re-optimization of candidate sequences, while a dynamic Pareto constraint prevents candidate bloat and maintains balanced trade-offs across objectives. We evaluate MP2D on two challenging multi-objective protein design tasks, antimicrobial peptide optimization and protein binder optimization, involving four to five conflicting properties. Experimental results demonstrate that MP2D consistently outperforms existing multi-objective baselines, achieving robust and balanced improvements across all objectives without retraining generative models. These results highlight MP2D as a practical and scalable solution for multi-objective functional protein design.
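
"Pareto-based rewards" rest on the notion of non-dominance: a candidate is kept if no other candidate is at least as good on every objective and strictly better on one. The sketch below shows only that basic test, assuming all objectives are maximized; MP2D's MCTS, remasking loop, and dynamic Pareto constraint are not modeled, and the scores are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is >= b on every objective and > b on at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Sequence[Sequence[float]]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical objective scores (higher is better) for five candidate sequences.
candidates = [
    (0.9, 0.2, 0.5),
    (0.6, 0.6, 0.6),
    (0.5, 0.5, 0.5),  # dominated by candidate 1
    (0.3, 0.9, 0.4),
    (0.9, 0.2, 0.4),  # dominated by candidate 0
]
print(pareto_front(candidates))  # -> [0, 1, 3]
```

The front keeps specialists (candidates 0 and 3) alongside the balanced candidate 1; a cap on how many front members are retained is one simple way to limit the "candidate bloat" the abstract mentions.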

q-bio.BM 2026-05-07

AlphaFold guidance raises TM-scores in Phenix cryo-EM models

Enhancing Cryo-EM Density Map Segmentation in Phenix for Improved Atomic Model Building

Inserting predicted structures into the segmentation step improves sequence accuracy over standard Phenix on noisy maps.

We introduce PhenixCraft, a fully automated pipeline for building atomic models from cryo-EM density maps. By integrating AlphaFold predictions, we enhance the map-segmentation step in Phenix during model building, addressing challenges posed by noise and artifacts that traditionally hinder this step. Our results demonstrate PhenixCraft's superior performance in TM-scores and sequence accuracy, significantly improving upon the limitations and inefficiencies of traditional model building using Phenix.
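
PhenixCraft is evaluated by TM-score. For reference, the standard TM-score for a model already superposed onto a target of length $L$ is $\mathrm{TM} = \frac{1}{L}\sum_i \frac{1}{1 + (d_i/d_0)^2}$ with $d_0 = 1.24\sqrt[3]{L - 15} - 1.8$, where $d_i$ is the distance between aligned residue pairs. This minimal sketch evaluates one fixed superposition only; it does not perform the superposition search of the reference TM-score program, and the 0.5 Å floor on $d_0$ for short chains is an assumption of the sketch.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Distance scale d0(L) = 1.24 * (L - 15)^(1/3) - 1.8, floored at 0.5 Å (assumed floor)."""
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(aligned_distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition; unaligned residues contribute nothing."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances) / target_length


# A perfect model of a 100-residue target scores 1.0; residues deviating by d0 score 0.5 each.
print(tm_score([0.0] * 100, 100))          # -> 1.0
print(tm_score([tm_d0(100)] * 100, 100))   # -> 0.5
```

Normalizing by the target length means a model that places only half of the residues, even perfectly, cannot exceed 0.5, which is why incomplete models built from noisy maps are penalized.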

q-bio.BM 2026-05-06

Boltz-2 and fine-tuned DrugFormDTA top antiviral affinity benchmarks

Benchmarking open-source tools for in silico antiviral drug discovery

Tests on 853 compounds across 16 viral targets show ML models outperform docking, with fine-tuning lifting correlation to 0.7.

Antivirals are uniquely positioned to be deployed quickly during a new outbreak, especially when repurposed from approved drugs. Yet there are no FDA-approved antivirals for the majority of viral families with pandemic potential. Here we lay out the case for investing in technologies and techniques for antiviral drug discovery and designing antiviral combinations. We present a survey of open source datasets and computational tools for in silico antiviral drug discovery, with a particular focus on the latest AI-based systems and docking tools. We then present our custom dataset of 43,005 viral protein-ligand binding measurements that we curated from BindingDB and other sources. Importantly, we found that 31% of viral protein binding data in BindingDB required polyprotein sequences to be carefully split before the data were suitable for training or testing ML models. Using our custom dataset we fine-tuned the DrugFormDTA binding affinity prediction model (Khokhlov et al. 2025). We then benchmarked 15 open-source binding affinity prediction tools on a custom test set of 853 antiviral compounds spread across 16 different protein targets from 10 virus species. Models tested include Boltz-2, GNINA, FlowDock, Interformer, AutoDock-GPU, and others. We found that Boltz-2 and DrugFormDTA ranked highest overall among ML-based approaches, and GNINA did best among docking approaches, with notable variance across specific viral proteins. Fine-tuning DrugFormDTA on our custom cleaned antiviral dataset boosted performance from $r=0.5$ to $r=0.7$. As part of this work we also compiled a library of approved drugs and a comprehensive list of investigational and approved antiviral drugs that can be viewed at https://antivirals-database.radvac.org. Together, this work provides a foundation for future work towards new tools and platforms for rapid drug repurposing and rapid design of antiviral combinations.
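
The fine-tuning gain is reported as a correlation coefficient moving from $r=0.5$ to $r=0.7$. Assuming $r$ denotes Pearson's correlation between predicted and measured affinities (the usual meaning, though the abstract does not spell it out), a minimal sketch of the metric is below; the values are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured affinities."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical predicted vs. measured affinity values (e.g. pKd) for one viral target.
predicted = [6.1, 7.4, 5.2, 8.0, 6.8]
measured = [5.9, 7.0, 5.8, 8.3, 6.1]
print(round(pearson_r(predicted, measured), 3))
```

Computing $r$ separately per viral target, rather than only on the pooled test set, is what exposes the per-protein variance the abstract notes. On Python 3.10+, `statistics.correlation` returns the same Pearson coefficient.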

q-bio.BM 2026-05-06

Agentic AI matches Smina at 50% docking pose accuracy

AgenticPosesRanker: An Agentic AI Framework for Physically Grounded Ranking of Protein-Ligand Docking Poses

It recovers some baseline failures through physical tool analysis and LLM reasoning on a balanced benchmark of protein-ligand systems.

Scoring functions remain the principal bottleneck in molecular docking: they routinely fail to rank near-native poses above decoys, and their composite single-score design obscures the physicochemical basis of each ranking error. We present AgenticPosesRanker, an agentic AI framework that combines six deterministic, physically grounded analysis tools (interaction fingerprinting, solvent-accessible burial, conformational strain, steric-clash detection, unsatisfied-polar-atom penalty, and chemical-identity extraction) with large-language-model (GPT-5) chain-of-thought reasoning to evaluate and rank docking poses. On a curated benchmark of ten protein-ligand systems (162 poses) balanced by construction between Smina scoring-function successes and failures, the agent achieved 50.0% best-pose accuracy, matching the design-fixed Smina baseline of 50.0% and significantly exceeding a 7.7% uniformly random baseline (p < 0.001, one-sided exact binomial test). The balanced-benchmark accuracy decomposes symmetrically: the agent retained 80% (4/5) of the Smina-success systems and recovered 20% (1/5) of the Smina-failure systems, so the aggregate 50% reflects one regression offset by one recovery rather than any net improvement over the Smina reference. Decision-attribution analysis showed high alignment between the agent's self-reported tool weights and objective metric separations of the selected pose (median $\rho = +0.83$), consistent across correct and incorrect outcomes, localising the performance ceiling to tool-suite coverage rather than reasoning inconsistency. These results establish a methodological template for evaluating agentic AI against objective ground truth in the natural sciences and position the framework as an interpretable curation layer for late-stage pose refinement in structure-based drug design.
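
The reported significance can be reproduced from the counts in the abstract. Assuming the test treats each of the 10 systems as an independent trial with success probability 0.077 under the random baseline (as the abstract's wording suggests), the one-sided exact binomial p-value is the upper tail $P(X \ge 5)$:

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Counts from the abstract: 4 of 5 Smina-success systems retained,
# 1 of 5 Smina-failure systems recovered.
retained, recovered = 4, 1
n_systems = 10
successes = retained + recovered
p_random = 0.077  # uniformly random best-pose baseline quoted in the abstract

accuracy = successes / n_systems
p_value = binomial_tail(successes, n_systems, p_random)
print(f"accuracy = {accuracy:.1%}, one-sided p = {p_value:.1e}")
```

The tail probability comes out at roughly 5 × 10⁻⁴, consistent with the reported p < 0.001. The same counts also show why 50% equals the Smina baseline: 4 + 1 correct systems out of 10.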

q-bio.BM 2026-05-01

Salt changes BSA scattering via ions and hydration water

Complex Effects of Salt on Small-Angle X-ray Scattering of BSA Originate From the Interplay of Ions and Hydration Water

Simulations trace the complex SAXS patterns to the combined distributions of ions and water around the protein

Salts are an integral part of the environment for living systems and, therefore, understanding their effects on proteins and other biomolecules is of fundamental interest. Small-angle X-ray scattering (SAXS) of protein solutions can provide valuable information on salt effects, but extracting this information has been a significant challenge. For example, SAXS data of bovine serum albumin (BSA) at various salt concentrations were fit to three different spherical models. Here we combined the newly developed FMAPIq approach with explicit-solvent all-atom molecular dynamics simulations to show that the complex effects of salt on the SAXS of BSA originate from the interplay of ions and hydration water, leading to a general picture of protein-ion-water interactions.
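
For orientation, the simplest way to compute a SAXS profile from atomic coordinates is the Debye formula, $I(q) = \sum_i \sum_j f_i f_j \frac{\sin(q r_{ij})}{q r_{ij}}$. The sketch below implements only that in-vacuum sum with constant form factors. It is not the FMAPIq approach, and it deliberately omits the ion and hydration-water contributions that the paper identifies as the origin of the salt effects; the two-atom system is a toy.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def debye_intensity(q: float, coords: Sequence[Coord], form_factors: Sequence[float]) -> float:
    """I(q) = sum_i sum_j f_i f_j sin(q r_ij) / (q r_ij), with sin(x)/x -> 1 as x -> 0."""
    total = 0.0
    for i, ri in enumerate(coords):
        for j, rj in enumerate(coords):
            x = q * math.dist(ri, rj)
            sinc = 1.0 if x == 0.0 else math.sin(x) / x
            total += form_factors[i] * form_factors[j] * sinc
    return total


# Two unit scatterers 2 Å apart: I(0) = 4, and I(q) dips to 2 where q * r = pi.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
for q in (0.0, math.pi / 2):
    print(round(q, 3), round(debye_intensity(q, coords, [1.0, 1.0]), 6))
```

In a solution measurement, ions and ordered water around the protein add their own scatterers and contrast to this sum, which is why a protein-only calculation cannot by itself explain salt-dependent profiles.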

q-bio.BM 2026-04-29

Three strategies organize AI work on protein dynamics

Learning Structure, Energy, and Dynamics: A Survey of Artificial Intelligence for Protein Dynamics

Structure learning, energy signals, and simulation acceleration cover recent methods while noting open problems in scale and physical match.

Protein dynamics underlie many biological functions, yet remain difficult to characterize due to the high computational cost of molecular dynamics simulations and the scarcity of dynamic structural data. This survey reviews recent advances in artificial intelligence for protein dynamics from three perspectives: learning from structural ensembles and trajectories, learning from physical energy signals, and learning to accelerate molecular simulations. We summarize representative methods for conformation ensemble generation, trajectory generation, Boltzmann generators, physics-aware adaptation, machine learning potentials, coarse-grained modeling, and collective variable discovery. We further discuss available datasets and key open challenges, such as scalability, thermodynamic consistency, kinetic fidelity, and integration with experimental constraints.

q-bio.BM 2026-04-21

Channel-wise transforms give AlphaFold3 control over protein conformations

ConforNets: Latents-Based Conformational Control in OpenFold3

The reusable latent adjustments recover alternate states at state-of-the-art rates and transfer conformational changes across protein families.

Models from the AlphaFold (AF) family reliably predict one dominant conformation for most well-ordered proteins but struggle to capture biologically relevant alternate states. Several efforts have focused on eliciting greater conformational variability through ad hoc inference-time perturbations of AF models or their inputs. Despite their progress, these approaches remain inefficient and fail to consistently recover major conformational modes. Here, we investigate both the optimal location and manner-of-operation for perturbing latent representations in the AF3 architecture. We distill our findings in ConforNets: channel-wise affine transforms of the pre-Pairformer pair latents. Unlike previous methods, ConforNets globally modulate AF3 representations, making them reusable across proteins. On unsupervised generation of alternate states, ConforNets achieve state-of-the-art success rates on all existing multi-state benchmarks. On the novel supervised task of conformational transfer, ConforNets trained on one source protein can induce a conserved conformational change across a protein family. Collectively, these results introduce a mechanism for conformational control in AF3-based models.
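
A channel-wise affine transform of a pair latent applies the same scale and shift to channel $c$ of every residue pair: $z'_{ijc} = \gamma_c z_{ijc} + \beta_c$. The minimal sketch below (plain Python lists, hypothetical names) illustrates only that operation; where ConforNets sit inside OpenFold3 and how $\gamma, \beta$ are fitted are not reproduced.

```python
from typing import List, Sequence

PairTensor = List[List[List[float]]]  # shape [L][L][C]


def channel_affine(pair: PairTensor, gamma: Sequence[float], beta: Sequence[float]) -> PairTensor:
    """Apply z'[i][j][c] = gamma[c] * z[i][j][c] + beta[c] to every residue pair."""
    n_channels = len(gamma)
    if len(beta) != n_channels:
        raise ValueError("gamma and beta must have one entry per channel")
    return [
        [[gamma[c] * z_ij[c] + beta[c] for c in range(n_channels)] for z_ij in row]
        for row in pair
    ]


gamma, beta = [2.0, 1.0], [0.0, 0.5]
# The same two-channel parameters apply to a 2-residue and a 3-residue protein.
pair2 = [[[1.0, 2.0], [0.0, 1.0]],
         [[3.0, -1.0], [2.0, 2.0]]]
pair3 = [[[0.1, 0.2]] * 3 for _ in range(3)]
print(channel_affine(pair2, gamma, beta))
print(len(channel_affine(pair3, gamma, beta)))
```

Because the parameters depend only on the channel index, not on residue positions, one learned transform can be applied to proteins of any length, which is one reading of why the abstract calls ConforNets reusable across proteins.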

q-bio.BM 2026-04-21

Parallel MCMC makes protein coupling estimates reproducible

Boltzmann Machine Learning with a Parallel, Persistent Markov chain Monte Carlo method for Estimating Evolutionary Fields and Couplings from a Protein Multiple Sequence Alignment

Persistent chains and conformation-based tuning let Boltzmann learning recover fields from alignments without contact-precision reliance.

The inverse Potts problem, which estimates evolutionary single-site fields and pairwise couplings of homologous protein sequences from the single-site and pairwise amino acid frequencies observed in their multiple sequence alignment, remains a useful method for studying protein structure and evolution. Because the reproducibility of the fields and couplings is most important, the Boltzmann machine method is employed here, although it is computationally intensive. To reduce the computational time required by the Boltzmann machine, a parallel, persistent Markov chain Monte Carlo method is employed to estimate the single-site and pairwise marginal distributions in each learning step. In addition, stochastic gradient descent methods are used to reduce the computational time of each learning step. Another problem is how to adjust the hyperparameters: there are two regularization parameters, one for the evolutionary fields and one for the couplings. The precision of contact residue pair prediction is often used to adjust these hyperparameters; however, it is not sensitive to these regularization parameters. Here, they are instead adjusted so that the fields and couplings satisfy a specific condition appropriate for protein conformations. This method has been applied to eight protein families.
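
Boltzmann-machine learning for a Potts model alternates between sampling the current model and nudging fields $h_i(a)$ and couplings $J_{ij}(a,b)$ by the gap between data and model marginals. The toy sketch below shows that loop with persistent chains, which are initialized once and continued rather than restarted at each step. It is a simplification under stated assumptions: plain regularized gradient ascent instead of the SGD variants the authors use, chains updated serially though they are independent, arbitrary regularization strengths, and no implementation of the paper's hyperparameter-tuning condition.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Fields = List[List[float]]                            # h[i][a]
Couplings = Dict[Tuple[int, int], List[List[float]]]  # J[(i, j)][a][b], i < j


def energy(seq: Sequence[int], h: Fields, J: Couplings) -> float:
    """Potts energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j)."""
    e = -sum(h[i][a] for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        e -= J[(i, j)][seq[i]][seq[j]]
    return e


def gibbs_sweep(seq: List[int], h: Fields, J: Couplings, q: int, rng: random.Random) -> None:
    """Resample every site of one persistent chain from p(s_i | rest) ∝ exp(-E)."""
    L = len(seq)
    for i in range(L):
        logits = []
        for a in range(q):
            s = h[i][a]
            for j in range(L):
                if j < i:
                    s += J[(j, i)][seq[j]][a]
                elif j > i:
                    s += J[(i, j)][a][seq[j]]
            logits.append(s)
        m = max(logits)
        seq[i] = rng.choices(range(q), weights=[math.exp(x - m) for x in logits])[0]


def marginals(seqs: Sequence[Sequence[int]], L: int, q: int):
    """Single-site and pairwise frequencies of a set of sequences."""
    n = len(seqs)
    f1 = [[0.0] * q for _ in range(L)]
    f2 = {p: [[0.0] * q for _ in range(q)] for p in combinations(range(L), 2)}
    for s in seqs:
        for i, a in enumerate(s):
            f1[i][a] += 1.0 / n
        for i, j in combinations(range(L), 2):
            f2[(i, j)][s[i]][s[j]] += 1.0 / n
    return f1, f2


def boltzmann_learning(msa, q, steps=200, n_chains=20, lr=0.1,
                       lam_h=0.01, lam_j=0.01, seed=0):
    """Regularized Boltzmann-machine learning with persistent chains (toy scale)."""
    rng = random.Random(seed)
    L = len(msa[0])
    h = [[0.0] * q for _ in range(L)]
    J = {p: [[0.0] * q for _ in range(q)] for p in combinations(range(L), 2)}
    f1_data, f2_data = marginals(msa, L, q)
    # Persistent chains: initialized once, then continued at every learning step.
    chains = [[rng.randrange(q) for _ in range(L)] for _ in range(n_chains)]
    for _ in range(steps):
        for chain in chains:  # independent chains; these could run in parallel
            gibbs_sweep(chain, h, J, q, rng)
        f1_model, f2_model = marginals(chains, L, q)
        # Gradient ascent on the L2-regularized log-likelihood: data minus model moments.
        for i in range(L):
            for a in range(q):
                h[i][a] += lr * (f1_data[i][a] - f1_model[i][a] - lam_h * h[i][a])
        for p in J:
            for a in range(q):
                for b in range(q):
                    J[p][a][b] += lr * (f2_data[p][a][b] - f2_model[p][a][b] - lam_j * J[p][a][b])
    return h, J


# Toy alignment of length 3 over a 2-letter alphabet: sites 0 and 1 always co-vary.
msa = [[0, 0, 1], [1, 1, 0], [0, 0, 0], [1, 1, 1]]
h, J = boltzmann_learning(msa, q=2)
print(J[(0, 1)])
```

The learned coupling between sites 0 and 1 favors matching states, recovering the co-variation in the toy alignment; with persistent chains, each step needs only one sweep per chain instead of a fresh equilibration.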

q-bio.BM 2026-04-17

Graph model partitions proteins into functional units

PUFFIN: Protein Unit Discovery with Functional Supervision

A neural network learns to split structures into coherent groups that associate with specific molecular roles and align with existing domain annotations.

Proteins carry out biological functions through the coordinated action of groups of residues organized into structural arrangements. These arrangements, which we refer to as protein units, exist at an intermediate scale, being larger than individual residues yet smaller than entire proteins. A deeper understanding of protein function can be achieved by identifying these units and their associations with function. However, existing approaches either focus on residue-level signals, rely on curated annotations, or segment protein structures without incorporating functional information, thereby limiting interpretable analysis of structure-function relationships. We introduce PUFFIN, a data-driven framework for discovering protein units by jointly learning structural partitioning and functional supervision. PUFFIN represents proteins as residue-level structure graphs and applies a graph neural network with a structure-aware pooling mechanism that partitions each protein into multi-residue units, with functional supervision that shapes the partition. We show that the learned units are structurally coherent, exhibit organized associations with molecular function, and show meaningful correspondence with curated InterPro annotations. Together, these results demonstrate that PUFFIN provides an interpretable framework for analyzing structure-function relationships using learned protein units and their statistical function associations. We made our source code available at https://github.com/boun-tabi-lifelu/puffin.
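
PUFFIN starts from residue-level structure graphs and pools residues into multi-residue units. The sketch below illustrates only the two surrounding steps under stated assumptions: building a contact graph from Cα coordinates with an 8 Å cutoff (a common convention, assumed here), and mean-pooling residue features given a unit assignment. PUFFIN's GNN and its learned, structure-aware pooling are not reproduced; the assignment here is supplied by hand.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def residue_graph(ca_coords: Sequence[Coord], cutoff: float = 8.0) -> List[Tuple[int, int]]:
    """Undirected edges between residues whose C-alpha atoms lie within `cutoff` Å."""
    edges = []
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


def pool_units(features: Sequence[Sequence[float]], assignment: Sequence[int]) -> Dict[int, List[float]]:
    """Mean-pool residue feature vectors into one vector per unit (hard assignment)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for feat, unit in zip(features, assignment):
        if unit not in sums:
            sums[unit] = [0.0] * len(feat)
            counts[unit] = 0
        sums[unit] = [s + f for s, f in zip(sums[unit], feat)]
        counts[unit] += 1
    return {u: [s / counts[u] for s in sums[u]] for u in sums}


# Toy chain with 3.8 Å spacing between consecutive C-alpha atoms.
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
print(residue_graph(ca))
print(pool_units([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]], [0, 0, 1, 1]))
```

In a learned version, the hard assignment would be replaced by a trainable pooling step whose unit-level vectors feed the functional supervision, so function labels can shape where the partition boundaries fall.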

q-bio.BM 2026-04-17

Distance term doubles recovery of linkable fragment pose pairs

Simultaneous Fragment Docking for Geometrically Linkable Pose Pairs

Q-SFD with inter-fragment distance favors chemically connectable arrangements in two-fragment docking.

Computational molecular design requires binding arrangements that are not only energetically favorable but also chemically realizable. However, computational methods remain limited in directly recovering fragment pose pairs that can later be connected into a single molecule. To address this problem, we formulated the simultaneous placement of two fragments as a quadratic unconstrained binary optimization problem, Q-SFD, and introduced an explicit inter-fragment distance term to favor reconstruction-feasible arrangements. Relative to the formulation without this term, Q-SFD approximately doubled top-1 recovery of reconstruction-feasible pairs, and the top-5 solutions contained at least one feasible pair for more than 90% of benchmark cases without loss of fragment-level pose accuracy.
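
In a QUBO formulation, each candidate pose of each fragment is a binary variable, one-hot penalties force exactly one pose per fragment, and an extra quadratic term penalizes pose pairs whose linkable anchors are too far apart or too close. The sketch below is a hypothetical toy of that structure, not Q-SFD's actual Hamiltonian: the energies, distances, linkable window, penalty weight, and the 0/1 form of the distance term are all assumptions, and the QUBO is solved by brute force rather than by whatever solver the authors use.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Qubo = Dict[Tuple[int, int], float]


def build_qubo(e_a: Sequence[float], e_b: Sequence[float],
               dist: Sequence[Sequence[float]], d_min: float, d_max: float,
               lam: float, penalty: float) -> Qubo:
    """Variables 0..nA-1 are poses of fragment A, nA..nA+nB-1 poses of fragment B."""
    n_a, n_b = len(e_a), len(e_b)
    q: Qubo = {}

    def add(i: int, j: int, value: float) -> None:
        key = (min(i, j), max(i, j))
        q[key] = q.get(key, 0.0) + value

    # Diagonal: pose energy plus the linear part of penalty * (sum(x) - 1)^2 (constant dropped).
    for a, e in enumerate(e_a):
        add(a, a, e - penalty)
    for b, e in enumerate(e_b):
        add(n_a + b, n_a + b, e - penalty)
    # Quadratic part of the one-hot penalties: +2 * penalty per pair of poses of one fragment.
    for group in (list(range(n_a)), list(range(n_a, n_a + n_b))):
        for k, i in enumerate(group):
            for j in group[k + 1:]:
                add(i, j, 2.0 * penalty)
    # Inter-fragment distance term: penalize pose pairs whose anchors cannot be linked.
    for a in range(n_a):
        for b in range(n_b):
            if not (d_min <= dist[a][b] <= d_max):
                add(a, n_a + b, lam)
    return q


def qubo_energy(q: Qubo, z: Sequence[int]) -> float:
    return sum(v * z[i] * z[j] for (i, j), v in q.items())


def solve_brute_force(q: Qubo, n_vars: int) -> Tuple[int, ...]:
    """Exhaustive search; only feasible for a handful of poses."""
    return min(product((0, 1), repeat=n_vars), key=lambda z: qubo_energy(q, z))


e_a = [-5.0, -4.0]          # hypothetical docking energies of fragment A poses
e_b = [-6.0, -5.5]          # hypothetical docking energies of fragment B poses
dist = [[9.0, 4.0],         # hypothetical anchor-atom distances (Å) between poses
        [3.5, 12.0]]
for lam in (0.0, 3.0):
    q = build_qubo(e_a, e_b, dist, d_min=3.0, d_max=5.0, lam=lam, penalty=10.0)
    print(lam, solve_brute_force(q, 4))
```

Without the distance term the optimum is the lowest-energy but unlinkable pair (A pose 0, B pose 0); with it, the optimum shifts to a slightly higher-energy pair whose anchors fall inside the linkable window. The one-hot penalty must exceed the energy scale for the constraint to hold.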

q-bio.BM 2026-04-16

Generative model samples protein shapes at any temperature

Polyformer: a generative framework for thermodynamic modeling of polymeric molecules

Given sequence and temperature, it produces ensembles that match molecular dynamics results for protein domains.

The classic paradigm of structural biology is that the sequence of a biomolecule (protein, nucleic acid, lipid, etc.) determines its conformation (shape), which determines its biological function. Protein folding programs like AlphaFold address this paradigm by predicting the single best conformation given a sequence that defines the molecule. However, biomolecules are not static structures, and their conformational ensemble determines their function. We present the Polyformer -- a generative framework for thermodynamic modeling of polymeric molecules. Given the sequence and temperature (or another thermodynamic variable), the Polyformer generates conformations faithful to the molecule's thermodynamic conformational ensemble. It is the first generative model that solves three problems simultaneously: how does a molecule fold, what is its conformational ensemble, and how does the conformational ensemble change as we change physical temperature. As a concrete test case, we apply Polyformer to protein domains with 50-111 residues and report good agreement between model predictions and Molecular Dynamics (MD) trajectories.
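
A temperature-conditioned generator aims to sample from the Boltzmann distribution $p(x \mid T) \propto \exp(-G(x)/RT)$. The toy sketch below shows only that target distribution over a few coarse states, not Polyformer itself; the two-state free energies are hypothetical, and holding them fixed across temperature is a simplification (real unfolding free energies are themselves temperature dependent).

```python
import math
from typing import List, Sequence

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def boltzmann_populations(free_energies: Sequence[float], temperature: float) -> List[float]:
    """p_i = exp(-G_i / RT) / Z, computed relative to the minimum for numerical stability."""
    beta = 1.0 / (R_KCAL * temperature)
    g_min = min(free_energies)
    weights = [math.exp(-beta * (g - g_min)) for g in free_energies]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical two-state toy: folded at 0 kcal/mol, unfolded at 2 kcal/mol.
for t in (300.0, 350.0, 400.0):
    print(t, [round(p, 3) for p in boltzmann_populations([0.0, 2.0], t)])
```

Raising $T$ flattens the populations, so the unfolded state gains weight; a generator that is faithful to the ensemble should reproduce such shifts when the temperature input changes.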

q-bio.BM 2026-04-13 2 theorems

Antibody oracles gain 12-20% on developability assays

Biologically-Grounded Multi-Encoder Architectures as Developability Oracles for Antibody Design

Property-specific decoders invert expected chain-interaction needs, with self-attention sufficing for aggregation and cross-attention needed for stability.

Generative models can now propose thousands of de novo antibody sequences, yet translating these designs into viable therapeutics remains constrained by the cost of biophysical characterization. Here we present CrossAbSense, a framework of property-specific neural oracles that combine frozen protein language model encoders with configurable attention decoders, identified through a systematic hyperparameter campaign totaling over 200 runs per property. On the GDPa1 benchmark of 242 therapeutic IgGs, our oracles achieve notable improvements of 12-20% over established baselines on three of five developability assays and competitive performance on the remaining two. The central finding is that optimal decoder architectures invert our initial biological hypotheses: self-attention alone suffices for aggregation-related properties (hydrophobic interaction chromatography, polyreactivity), where the relevant sequence signatures -- such as CDR-H3 hydrophobic patches -- are already fully resolved within single-chain embeddings by the high-capacity 6B encoder. Bidirectional cross-attention, by contrast, is required for expression yield and thermal stability -- properties that inherently depend on the compatibility between heavy and light chains. Learned chain fusion weights independently confirm heavy-chain dominance in aggregation ($w_H = 0.62$) versus balanced contributions for stability ($w_H = 0.51$). We demonstrate practical utility by deploying CrossAbSense on 100 IgLM-generated antibody designs, illustrating a path toward substantial reduction in experimental screening costs.
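
The abstract reports learned chain fusion weights ($w_H = 0.62$ for aggregation, $0.51$ for stability) but not their functional form. A minimal sketch, assuming a softmax over two learnable logits that mixes pooled heavy- and light-chain embeddings (the function name, logits, and toy embeddings are hypothetical):

```python
import math
from typing import List, Sequence, Tuple


def fuse_chains(heavy: Sequence[float], light: Sequence[float],
                logit_h: float, logit_l: float) -> Tuple[List[float], float]:
    """Softmax-normalized weighted sum of pooled heavy- and light-chain embeddings."""
    m = max(logit_h, logit_l)
    e_h, e_l = math.exp(logit_h - m), math.exp(logit_l - m)
    w_h = e_h / (e_h + e_l)
    w_l = 1.0 - w_h
    return [w_h * h + w_l * l for h, l in zip(heavy, light)], w_h


# Logits chosen so that w_H = 0.62, the aggregation value reported in the abstract.
fused, w_h = fuse_chains([1.0, 0.0], [0.0, 1.0], math.log(0.62), math.log(0.38))
print(round(w_h, 2), [round(v, 2) for v in fused])
```

Under this form, $w_H$ is directly interpretable as the heavy chain's share of the fused representation, so $0.62$ versus $0.51$ reads as heavy-chain dominance versus near-equal contribution.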

q-bio.BM 2026-04-10 2 theorems

Dense platelet plugs speed edge fibrin but block core gelation

Platelet plug microstructure and flow modulate fibrin gelation dynamics: Insights from computational simulations

The tradeoff implies quick wound sealing may compromise long-term clot durability by limiting interior fibrin.

During the formation of a thrombus, the architecture of the growing platelet aggregate is heterogeneous, with areas of dense and loosely packed platelets. The surfaces of activated platelets facilitate biochemical coagulation reactions that ultimately result in the formation of a fibrin network, which stabilizes the thrombus. How platelet-plug microstructure and flow jointly govern the onset and development of fibrin is incompletely understood. We developed a novel 2D computational framework that integrates (1) a pre-adhered, discrete platelet aggregate, (2) a reduced coagulation model that generates thrombin, and (3) a fibrin polymerization model. Three platelet-plug configurations were constructed with prescribed interplatelet gaps, and simulations were performed with various wall shear rates. We quantified spatiotemporal clotting metrics, including coagulation factor concentrations, fibrin evolution, and gelation onset. Across geometries, gelation initiation accelerated with increasing plug density. For denser geometries, gelation emerged first near the plug periphery. As the platelet density increased, intraplug transport was increasingly restricted and the thrombin concentrations between platelets increased. In contrast, the loose plug supported fibrinogen replenishment deeper into the plug core. Despite slower coagulation initiation due to reduced platelet surface area, monomer generation persisted in the interior, causing gelation to begin at the vessel wall. These results suggest a mechanistic tradeoff: rapid sealing of the injured vessel wall by early platelet contraction, i.e., plug densification, may impede the intraplug fibrin formation needed for durable stabilization. The proposed model provides a basis for studies of platelet-coagulation interactions under flow, including therapeutic developments relevant to prevention of cardiovascular disease.

q-bio.BM 2026-04-07 2 theorems

Policy-driven model matches folding rates for 73 proteins

Towards protein folding pathways by reconstructing protein residue networks with a policy-driven model

Reconstructing residue networks with suitable node and edge policies yields outputs correlating at r below -0.83 with published rates.

A method that reconstructs protein residue networks using suitable node selection and edge recovery policies produced numerical observations that correlate strongly (Pearson's correlation coefficient < -0.83) with published folding rates for 52 two-state folders and 21 multi-state folders; correlations are also strong at the fold-family level. These results were obtained serendipitously with the ND model, which was introduced previously but is here extended with policies that dictate actions according to feature states. This result points to the importance of both the starting search point and the prevailing condition (random seed) for the quick success of policy search by a simple hill-climber. The two conditions, suitable policies and random seed, which (as evidenced by the strong correlation statistic) set up a conducive environment for modelling protein folding within ND, could be compared to the appropriate physiological conditions that proteins require to fold naturally. Of interest is whether the sequence of restored edges could serve as a plausible protein folding pathway. Towards this end, trajectory data is collected for analysis and further model evaluation and development.
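
The abstract attributes success to "policy search by a simple hill-climber" whose outcome depends on the random seed. The ND model's node-selection and edge-recovery policies are not described here, so the sketch below illustrates only a generic single-mutation hill-climber over a discrete policy vector, with a hypothetical fitness function; the seed fixes both the starting point and the sequence of proposed moves.

```python
import random
from typing import Callable, List, Sequence, Tuple


def hill_climb(fitness: Callable[[Sequence[int]], float], n_slots: int,
               n_options: int, steps: int, seed: int) -> Tuple[List[int], float]:
    """Single-mutation hill-climber; the seed fixes the start point and the move sequence."""
    rng = random.Random(seed)
    policy = [rng.randrange(n_options) for _ in range(n_slots)]
    best = fitness(policy)
    for _ in range(steps):
        candidate = list(policy)
        candidate[rng.randrange(n_slots)] = rng.randrange(n_options)
        score = fitness(candidate)
        if score >= best:  # accept sideways moves as well as improvements
            policy, best = candidate, score
    return policy, best


# Hypothetical fitness: number of slots that match a target policy.
TARGET = [2, 0, 1, 3]


def match_count(policy: Sequence[int]) -> float:
    return float(sum(p == t for p, t in zip(policy, TARGET)))


for seed in (0, 1):
    print(seed, hill_climb(match_count, n_slots=4, n_options=4, steps=500, seed=seed))
```

On this smooth toy landscape every seed reaches the optimum; on rugged landscapes, different seeds can stall on different local optima, which is the sensitivity to the starting point that the abstract highlights.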

q-bio.BM 2026-04-06 2 theorems

Dual-modal AI identifies 33 shared host factors in flu viruses

ViraHinter: a dual-modal artificial intelligence framework for predicting virus-host interactions

By blending generated structures with sequence data, the model ranks true virus-host pairs more reliably and flags common targets across flu subtypes.

Protein-protein interactions (PPIs) between a virus and its host govern infection, replication, and pathogenesis. While high-throughput mapping has identified thousands of virus-host associations, much of the virus-host interactome remains uncharacterized due to the labor-intensive nature of experimental screens, the inherent difficulty in capturing transient interactions, and the limited sequence homology across divergent viral families. Here, we introduce ViraHinter, a dual-modal deep learning framework for the precise prediction of virus-host interactions and large-scale inference of interaction landscapes. ViraHinter couples a structure-generation branch with a sequence-representation branch, integrating structure-informed pair representations with ESM-derived embeddings to learn generalizable interaction rules across unseen viruses. We benchmark ViraHinter on pathogenic coronaviruses and influenza A viruses and show that it consistently outperforms RoseTTAFold2-PPI, AlphaFold 3 and RoseTTAFold2-Lite in prioritizing high-confidence candidates even under severe class imbalance and across diverse interface regimes. Notably, it successfully identifies novel functionally relevant host factors and recapitulates the structural plasticity of the complex interfaces. By intersecting predictions across multiple influenza subtypes, ViraHinter reveals 33 shared host factors, offering a roadmap for broad-spectrum antiviral discovery. ViraHinter therefore serves as a robust computational approach for studying virus-host interactions, enabling systematic screening of host factors for all known human-infecting viruses, providing new insights into the shared mechanisms of viral pathogenesis, and accelerating the discovery of novel therapeutic targets and the development of broad-spectrum antivirals.
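
The 33 shared host factors come from "intersecting predictions across multiple influenza subtypes". A minimal sketch of that step, assuming each subtype yields a score per host protein and that a fixed score cutoff defines a positive prediction (the cutoff, subtype labels, host IDs, and scores are placeholders, not ViraHinter output):

```python
from typing import Dict, Set


def shared_host_factors(predictions: Dict[str, Dict[str, float]], cutoff: float) -> Set[str]:
    """Host proteins predicted to interact (score >= cutoff) in every viral subtype."""
    per_subtype = [
        {host for host, score in scores.items() if score >= cutoff}
        for scores in predictions.values()
    ]
    return set.intersection(*per_subtype) if per_subtype else set()


# Placeholder subtype labels, host IDs, and scores.
predictions = {
    "subtype_1": {"HOST_A": 0.91, "HOST_B": 0.40, "HOST_C": 0.88},
    "subtype_2": {"HOST_A": 0.85, "HOST_B": 0.95, "HOST_C": 0.77},
    "subtype_3": {"HOST_A": 0.79, "HOST_C": 0.82, "HOST_D": 0.90},
}
print(sorted(shared_host_factors(predictions, cutoff=0.75)))  # -> ['HOST_A', 'HOST_C']
```

The size of the intersection is sensitive to the cutoff: raising it to 0.8 here empties the set, so any reported count of shared factors is tied to the threshold used for each subtype.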
