Do Papers Tell the Whole Story? A Benchmark and Framework for Uncovering Hidden Implementation Gaps in Bioinformatics
Pith reviewed 2026-05-15 00:30 UTC · model grok-4.3
The pith
A new benchmark and cross-modal framework can detect when bioinformatics papers diverge from their actual code.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that a high-quality sentence-to-function paired dataset in bioinformatics, constructed through fine-grained alignment and expert annotation, combined with a unified cross-modal framework that jointly encodes paper text and code with pre-trained models, enables effective consistency discrimination at the sentence, retrieval, and project levels.
What carries the argument
The unified cross-modal consistency detection framework that jointly encodes paper sentences and code functions using pre-trained models to quantify semantic alignment.
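The abstract does not name the exact encoders or scoring rule, so here is a minimal sketch of the general recipe, assuming a CodeBERT-style bi-encoder with mean pooling and cosine similarity. The model choice, pooling, and thresholding are all hedged illustrations, not the authors' confirmed configuration.

```python
# Sketch of cross-modal consistency scoring: encode a paper sentence and a
# code function with one pre-trained encoder, then compare embeddings.
# "microsoft/codebert-base" is an assumed stand-in for the paper's models.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool the encoder's last hidden states into one vector."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)    # ignore padding positions
    return (hidden * mask).sum(1) / mask.sum(1)

sentence = "Reads are aligned to the reference genome with a seed-and-extend strategy."
code = "def align(read, ref):\n    seeds = find_seeds(read, ref)\n    return extend(seeds)"

# Cosine similarity as the consistency score; thresholding this score would
# drive the sentence-level classification view.
score = torch.cosine_similarity(embed(sentence), embed(code)).item()
print(f"consistency score: {score:.3f}")
```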
If this is right
- Consistency between papers and code can be assessed systematically across classification, retrieval, and full-project views.
- Reproducibility problems in bioinformatics software become quantifiable rather than anecdotal.
- The benchmark supplies training and evaluation data for future consistency-checking tools.
- Project-level scores can flag entire software releases that diverge from their published methods (a toy aggregation sketch follows this list).
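The paper's project-level aggregation rule is not disclosed in the abstract. A natural candidate, sketched below under that assumption, is to take each method sentence's best-matching function score and average across sentences; the function scores and the flagging threshold are hypothetical.

```python
# Hypothetical project-level consistency score: average the best match per
# sentence. Low averages would flag a release whose code drifts from the paper.
from typing import Dict, List

def project_consistency(scores: Dict[str, List[float]]) -> float:
    """scores maps each paper sentence to similarity scores against
    every candidate function in the repository."""
    if not scores:
        return 0.0
    best_per_sentence = [max(candidates) for candidates in scores.values()]
    return sum(best_per_sentence) / len(best_per_sentence)

release = {
    "sentence about alignment": [0.90, 0.34, 0.12],
    "sentence about filtering": [0.20, 0.18, 0.09],  # no good match: a gap
}
print(f"project score: {project_consistency(release):.2f}")  # 0.55
```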
Where Pith is reading between the lines
- Automated checks built on the framework could be run before publication or code release to catch mismatches early.
- Similar datasets and frameworks could be created for other computational domains where paper-code drift is common.
- High-consistency scores might eventually serve as a filter when selecting tools for downstream research.
- The same alignment process could help maintainers update outdated documentation to match current code.
Load-bearing premise
Expert annotations of sentence-to-function alignments together with hard negative sampling produce labels that faithfully capture real-world implementation gaps without systematic bias or omitted mismatch types.
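To make the premise concrete: the abstract says only "hard negative sampling strategies," so the sketch below shows one standard variant under that assumption, where each sentence's hardest negative is the most similar non-matching function. All names are hypothetical.

```python
# Illustrative hard-negative sampling over precomputed embeddings.
import numpy as np

def hard_negatives(sent_emb: np.ndarray, code_emb: np.ndarray,
                   positive_idx: np.ndarray) -> np.ndarray:
    """sent_emb: (n, d) sentence embeddings; code_emb: (m, d) function
    embeddings; positive_idx[i] is the true function for sentence i.
    Returns the index of the hardest negative for each sentence."""
    # Cosine similarity matrix between all sentences and functions.
    s = sent_emb / np.linalg.norm(sent_emb, axis=1, keepdims=True)
    c = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)
    sim = s @ c.T                                       # (n, m)
    sim[np.arange(len(sim)), positive_idx] = -np.inf    # mask the true match
    return sim.argmax(axis=1)                           # most confusable non-match
```

Pairing each sentence with its most confusable function is what makes the negative labels informative, and also exactly where encoder bias or omitted mismatch types would leak into the benchmark.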
What would settle it
Independent experts re-annotating a sample of the BioCon pairs and showing low agreement with the original labels would demonstrate that the benchmark does not reliably reflect actual gaps.
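Such a re-annotation study reduces to a standard agreement computation. A minimal sketch with scikit-learn's cohen_kappa_score; the label values are invented for illustration.

```python
# Agreement between original BioCon labels and a hypothetical re-annotation.
# Kappa near 0 would support the objection; values above ~0.6-0.8 are
# conventionally read as substantial agreement.
from sklearn.metrics import cohen_kappa_score

original = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]     # published consistency labels
reannotated = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]  # independent expert's labels

kappa = cohen_kappa_score(original, reannotated)
print(f"Cohen's kappa: {kappa:.2f}")  # 0.58 for these toy labels
```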
Original abstract
Ensuring consistency between research papers and their corresponding software code implementations is a fundamental prerequisite for guaranteeing the reproducibility of scientific findings and the reliability of software systems. However, this issue has received limited attention to date, particularly in the field of bioinformatics, where inconsistencies between methodological descriptions in papers and their actual code implementations are prevalent. To address this gap, we introduce a novel research task, namely paper-code consistency detection, which aims to characterize the cross-modal semantic alignment between methodological descriptions in papers and their corresponding code implementations. At the data level, we construct the first benchmark dataset for this task in the bioinformatics domain, termed BioCon, comprising 48 bioinformatics software projects and their associated publications. BioCon is built by fine-grained alignment between sentence-level methodological descriptions in papers and function-level code snippets, combined with expert annotation and hard negative sampling strategies, resulting in a high-quality sentence-code paired dataset. At the methodological level, we propose a unified cross-modal consistency detection framework that leverages pre-trained models to jointly encode paper sentences and code functions. We conduct a systematic analysis from three perspectives: sentence-level classification, cross-modal retrieval, and project-level consistency assessment. Experimental results demonstrate that the proposed approach achieves strong performance in both consistency discrimination and semantic alignment. Overall, this work establishes the first systematic benchmark and framework for paper-code consistency analysis, opening a new research direction and providing a foundation for improving reproducibility and reliability in bioinformatics software.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the task of paper-code consistency detection in bioinformatics, constructs the BioCon benchmark dataset from 48 projects via fine-grained sentence-to-function alignments, expert annotation, and hard negative sampling, and proposes a unified cross-modal framework using pre-trained models. It evaluates the framework on sentence-level classification, cross-modal retrieval, and project-level consistency assessment, claiming strong performance and establishing the first systematic benchmark for uncovering hidden implementation gaps.
Significance. If the BioCon labels prove reliable, the work has clear significance as the first dedicated benchmark and framework for paper-code consistency analysis in bioinformatics, directly addressing reproducibility challenges by enabling systematic detection of mismatches between methodological descriptions and code implementations.
Major comments (2)
- BioCon dataset construction (abstract and §3): no inter-annotator agreement metrics (Cohen's or Fleiss' kappa), annotation guidelines, annotator background details, or ablation of the hard-negative selection are reported, yet these labels are load-bearing for all downstream classification, retrieval, and project-level results.
- Experimental evaluation (abstract and §4): the claims of "strong performance" on classification, retrieval, and project-level tasks are presented without specific metrics, baselines, error bars, or analysis of how post-hoc modeling choices affected results, leaving the central empirical support under-specified.
Minor comments (1)
- Framework description: the abstract could state which specific pre-trained models are used for joint encoding and whether any domain adaptation was applied.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of dataset reliability and empirical reporting. We address each point below and plan revisions to strengthen the manuscript.
Point-by-point responses
- Referee: BioCon dataset construction (abstract and §3): no inter-annotator agreement metrics (Cohen's or Fleiss' kappa), annotation guidelines, annotator background details, or ablation of the hard-negative selection are reported, yet these labels are load-bearing for all downstream classification, retrieval, and project-level results.
  Authors: We agree these details are essential for establishing label reliability. In the revised manuscript we will report inter-annotator agreement using Cohen's kappa on the expert annotations, include the full annotation guidelines as supplementary material, describe annotator backgrounds (bioinformatics researchers with 5+ years of experience), and add an ablation study quantifying the effect of hard-negative sampling on downstream task performance. Revision planned: yes.
- Referee: Experimental evaluation (abstract and §4): the claims of "strong performance" on classification, retrieval, and project-level tasks are presented without specific metrics, baselines, error bars, or analysis of how post-hoc modeling choices affected results, leaving the central empirical support under-specified.
  Authors: The full manuscript already reports concrete metrics (accuracy, F1, MRR, Recall@K), baseline comparisons (BERT, CodeBERT, random), and project-level consistency scores, but we acknowledge the abstract and §4 could be more explicit. We will revise the abstract to list key metrics, add error bars from five random seeds, and include a dedicated subsection analyzing sensitivity to post-hoc modeling choices such as temperature scaling and threshold selection. Revision planned: yes.
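For reference, the retrieval metrics named in this response have compact definitions. A self-contained sketch follows, with hypothetical function identifiers; it illustrates the metric definitions only, not the authors' evaluation code.

```python
# MRR and Recall@K for sentence-to-function retrieval, assuming each paper
# sentence has exactly one gold function and a ranked candidate list.
from typing import List

def mrr(ranked: List[List[str]], gold: List[str]) -> float:
    """Mean reciprocal rank of the gold function in each ranked list."""
    total = 0.0
    for candidates, target in zip(ranked, gold):
        if target in candidates:
            total += 1.0 / (candidates.index(target) + 1)
    return total / len(gold)

def recall_at_k(ranked: List[List[str]], gold: List[str], k: int) -> float:
    """Fraction of queries whose gold function appears in the top k."""
    hits = sum(target in candidates[:k] for candidates, target in zip(ranked, gold))
    return hits / len(gold)

ranked = [["f_align", "f_sort"], ["f_norm", "f_filter"]]
gold = ["f_align", "f_filter"]
print(mrr(ranked, gold), recall_at_k(ranked, gold, k=1))  # 0.75 0.5
```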
Circularity Check
No circularity in derivation chain
Full rationale
The paper introduces a new task of paper-code consistency detection and constructs the BioCon benchmark dataset from 48 projects via sentence-to-function alignment, expert annotation, and hard negative sampling. It then applies standard pre-trained models for joint encoding without any equations, fitted parameters, or predictions that reduce to the inputs by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked in a load-bearing manner; the experimental results on classification, retrieval, and project-level assessment are independent evaluations on the newly created dataset rather than self-referential outputs.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: pre-trained models can jointly encode paper sentences and code functions to measure cross-modal semantic alignment.
Invented entities (1)
- BioCon dataset (no independent evidence)
Reference graph
Works this paper leans on
- [1] Xu-Kai Ma, Yan Yu, Tao Huang, Dake Zhang, Caihuan Tian, Wenli Tang, Ming Luo, Pufeng Du, Guangchuang Yu, and Li Yang. Bioinformatics software development: Principles and future directions. The Innovation Life, 2(3):100083, 2024.
- [2] Teresa K Attwood, Sarah Blackford, Michelle D Brazas, Angela Davies, and Maria Victoria Schneider. A global perspective on evolving bioinformatics and data science training needs. Briefings in Bioinformatics, 20(2):398–404, 2019.
- [3] Xiaoming Liu, Wei Zhang, et al. Bioinformatics in the age of big data: leveraging computational tools for biological discoveries. Computational Molecular Biology, 14, 2024.
- [4] Nitesh Kumar Sharma, Ram Ayyala, Dhrithi Deshpande, Yesha Patel, Viorel Munteanu, Dumitru Ciorba, Viorel Bostan, Andrada Fiscutean, Mohammad Vahed, Aditya Sarkar, et al. Analytical code sharing practices in biomedical research. PeerJ Computer Science, 10:e2066, 2024.
- [5] Yujun Xu and Ulrich Mansmann. Validating the knowledge bank approach for personalized prediction of survival in acute myeloid leukemia: a reproducibility study. Human Genetics, 141(9):1467–1480, 2022.
- [6] Benjamin J Heil, Michael M Hoffman, Florian Markowetz, Su-In Lee, Casey S Greene, and Stephanie C Hicks. Reproducibility standards for machine learning in the life sciences. Nature Methods, 18(10):1132–1135, 2021.
- [7] Odd Erik Gundersen and Sigbjørn Kjensmo. State of the art: Reproducibility in artificial intelligence. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
- [8] Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
- [9] Lázaro Costa, Susana Barbosa, and Jácome Cunha. Let's talk about it: Making scientific computational reproducibility easier. In 2025 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), pages 46–56. IEEE, 2025.
- [10] Jose Armando Hernandez and Miguel Colom. Reproducible research policies and software/data management in scientific computing journals: a survey, discussion, and perspectives. Frontiers in Computer Science, 6:1491823, 2025.
- [11] Oscar Karnalim, Simon, and William Chivers. Layered similarity detection for programming plagiarism and collusion on weekly assessments. Computer Applications in Engineering Education, 30(6):1739–1752, 2022.
- [12] Yicheng Tao, Yao Qin, and Yepang Liu. Retrieval-augmented code generation: A survey with focus on repository-level approaches. arXiv preprint arXiv:2510.04905, 2025.
- [13] Quanjun Zhang, Chunrong Fang, Yang Xie, Yaxin Zhang, Yun Yang, Weisong Sun, Shengcheng Yu, and Zhenyu Chen. A survey on large language models for software engineering. arXiv preprint arXiv:2312.15223, 2023.
- [14] Daye Nam, Andrew Macvean, Vincent Hellendoorn, Bogdan Vasilescu, and Brad Myers. Using an LLM to help with code understanding. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pages 1–13, 2024.
- [15] Yixuan Li, Xinyi Liu, Weidong Yang, Ben Fei, Shuhao Li, Mingjie Zhou, and Lipeng Ma. PseudoBridge: Pseudo code as the bridge for better semantic and logic alignment in code retrieval. arXiv preprint arXiv:2509.20881, 2025.
- [16] Qianhui Zhao, Li Zhang, Fang Liu, Junhang Cheng, Chengru Wu, Junchen Ai, Qiaoyuanhe Meng, Lichen Zhang, Xiaoli Lian, Shubin Song, et al. Towards realistic project-level code generation via multi-agent collaboration and semantic architecture modeling. arXiv preprint arXiv:2511.03404, 2025.
- [17] Victor Weixin Liang, Yuhui Zhang, Yongchan Kwon, Serena Yeung, and James Y Zou. Mind the gap: Understanding the modality gap in multi-modal contrastive representation learning. Advances in Neural Information Processing Systems, 35:17612–17625, 2022.
- [18] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al. CodeBERT: A pre-trained model for programming and natural languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1536–1547, 2020.
- [19] Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, and Jian Yin. UniXcoder: Unified cross-modal pre-training for code representation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7212–7225, 2022.
- [20] Yue Wang, Weishi Wang, Shafiq Joty, and Steven CH Hoi. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 8696–8708, 2021.
- [21] Yue Wang, Hung Le, Akhilesh Gotmare, Nghi Bui, Junnan Li, and Steven Hoi. CodeT5+: Open code large language models for code understanding and generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 1069–1088, 2023.
- [22] Dejiao Zhang, Wasi Ahmad, Ming Tan, Hantian Ding, Ramesh Nallapati, Dan Roth, Xiaofei Ma, and Bing Xiang. Code representation learning at scale. arXiv preprint arXiv:2402.01935, 2024.
- [23] Shangqing Liu, Daya Guo, Jian Zhang, Wei Ma, Yanzhou Li, and Yang Liu. An empirical study of exploring the capabilities of large language models in code learning. IEEE Transactions on Software Engineering, 2025.
- [24] Jiabo Huang, Jianyu Zhao, Yuyang Rong, Yiwen Guo, Yifeng He, and Hao Chen. Code representation pre-training with complements from program executions. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 267–278, 2024.
- [25] Shiva Radmanesh, Aaron Imani, Iftekhar Ahmed, and Mohammad Moshirpour. Investigating the impact of code comment inconsistency on bug introducing. arXiv preprint arXiv:2409.10781, 2024.
- [26] Inderjot Kaur Ratol and Martin P Robillard. Detecting fragile comments. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 112–122. IEEE, 2017.
- [27] Theo Steiner and Rui Zhang. Code comment inconsistency detection with BERT and Longformer. arXiv preprint arXiv:2207.14444, 2022.
- [28] Sheena Panthaplackel, Junyi Jessy Li, Milos Gligoric, and Raymond J Mooney. Deep just-in-time inconsistency detection between comments and source code. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 427–435, 2021.
- [29] Zhengkang Xu, Shikai Guo, Yumiao Wang, Rong Chen, Hui Li, Xiaochen Li, and He Jiang. Code comment inconsistency detection based on confidence learning. IEEE Transactions on Software Engineering, 50(3):598–617, 2024.
- [30] Guoping Rong, Yongda Yu, Song Liu, Xin Tan, Tianyi Zhang, Haifeng Shen, and Jidong Hu. Code comment inconsistency detection and rectification using a large language model. In Proceedings of the IEEE/ACM 47th International Conference on Software Engineering, pages 1832–1843, 2025.
- [31] Pelin Icer Baykal, Paweł Piotr Łabaj, Florian Markowetz, Lynn M Schriml, Daniel J Stekhoven, Serghei Mangul, and Niko Beerenwinkel. Genomic reproducibility in the bioinformatics era. Genome Biology, 25(1):213, 2024.
- [32] Guoyi Zhang, Pekka Ristola, Han Su, Bipin Kumar, Boyu Zhang, Yujin Hu, Michael G Elliot, Viktor Drobot, Jie Zhu, Jens Staal, et al. BioArchLinux: community-driven fresh reproducible software repository for life sciences. Bioinformatics, 41(3):btaf106, 2025.
- [33] Tim Baumgärtner and Iryna Gurevych. SciCoQA: Quality assurance for scientific paper–code alignment. arXiv preprint arXiv:2601.12910, 2026.
- [34] Xiaoyan Zhu, Tianxiang Xu, Xin Lai, Xin Lian, Hangyu Cheng, and Jiayin Wang. Reaching software quality for bioinformatics applications: How far are we? IEEE Transactions on Software Engineering, 2025.
- [35] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988, 2017.
- [36] Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen: An open large language model for code with multi-turn program synthesis. arXiv preprint arXiv:2203.13474, 2022.
- [37] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, 2020.
- [38] Davide Chicco, Matthijs J Warrens, and Giuseppe Jurman. The Matthews correlation coefficient (MCC) is more informative than Cohen's kappa and Brier score in binary classification assessment. IEEE Access, 9:78368–78381, 2021.
- [39] Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q Weinberger. On calibration of modern neural networks. In International Conference on Machine Learning, pages 1321–1330. PMLR, 2017.
- [40] Ki-Hwa Kim, Avinash Yaganapu, Sai Kosaraju, Aashish Bhatt, Yun Lyna Luo, Sai Phani Parsa, Juyeon Park, Hyun Lee, Jun Hyuck Lee, Tae-Jin Oh, et al. Prediction of bacterial protein–compound interactions with only positive samples. Bioinformatics, 42(3):btag067, 2026.
- [41] Serena Rosignoli, Sophie Taraglio, Francesco Di Luzio, Elisa Lustrino, Dario Marzella, Arne Elofsson, Massimo Panella, and Alessandro Paiardini. A deep learning framework for comprehensive prediction of human RNA G-quadruplex-binding proteins. Bioinformatics, 42(3):btag088, 2026.
- [42] Benjamin Rombaut, Arne Defauw, Frank Vernaillen, Julien Mortier, Evelien Van Hamme, Sofie Van Gassen, Ruth Seurinck, and Yvan Saeys. Scalable analysis of whole slide spatial proteomics with Harpy. Bioinformatics, 42(3):btag122, 2026.
- [43] Johannes Wirth, Anna Chernysheva, Birthe Lemke, Isabel Giray, and Katja Steiger. InSituPy: a framework for histology-guided, multi-sample analysis of single-cell spatial omics data. Bioinformatics, 42(3):btag073, 2026.
- [44] Brian Tjaden. TerminatorNet: comprehensive identification of intrinsic transcription terminators in bacteria. Bioinformatics, 42(3):btag116, 2026.