Endeavor: Efficient PairHMM for Detection of DNA Variants in Genome-Scale Datasets

Aleksandar Ilic; Miguel Graça

arxiv: 2606.25738 · v1 · pith:5QMMH35Pnew · submitted 2026-06-24 · 💻 cs.DC


Pith reviewed 2026-06-25 19:52 UTC · model grok-4.3

classification 💻 cs.DC

keywords PairHMM · variant calling · parallel algorithms · SIMD · DNA sequences · bioinformatics pipelines · CPUs · GPUs


The pith

Endeavor redefines PairHMM to unlock row-level parallelism for accurate variant calling on sequences up to 100k basepairs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a new formulation of the Pair Hidden Markov Model that shifts from anti-diagonal to row-level parallelism while keeping the same numerical results. This change opens the way to SIMD vectorization on both CPUs and GPUs. The approach processes DNA sequences far longer than prior methods could handle efficiently. In large genomic datasets the restructured computation reduces the time spent on the main bottleneck of variant detection pipelines. A reader would care because genomic data volumes are growing rapidly and current tools cannot keep pace on standard hardware.

Core claim

Endeavor redefines the traditional PairHMM formulation to explore row-level fine-grained parallelism without loss in solution accuracy. Based on this, a novel and portable SIMD-based approach is derived for efficient and high-performance processing of short and long sequences in CPUs and GPUs, leveraging novel levels of parallelism and synchronization to achieve high throughput in sequences up to 100k basepairs for the first time.

What carries the argument

The redefinition of the PairHMM recurrence relations that exposes independent row computations instead of the conventional anti-diagonal wavefront.
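
For orientation, the conventional PairHMM forward recurrences can be written as below. This is the textbook form commonly used in GATK-style implementations, not an equation reproduced from the paper; the transition symbols $t_{XY}$ and the convention that rows index the read position $i$ and columns the haplotype position $j$ are assumptions made here.

```latex
\begin{aligned}
M_{i,j} &= P(r_i \mid h_j)\,\bigl[\,t_{MM}\,M_{i-1,j-1} + t_{IM}\,I_{i-1,j-1} + t_{DM}\,D_{i-1,j-1}\,\bigr] \\
I_{i,j} &= t_{MI}\,M_{i-1,j} + t_{II}\,I_{i-1,j} \\
D_{i,j} &= t_{MD}\,M_{i,j-1} + t_{DD}\,D_{i,j-1}
\end{aligned}
```

In this form, cells with the same $i + j$ never read one another, which is what the anti-diagonal wavefront exploits. Within a row, $M_{i,j}$ and $I_{i,j}$ read only row $i-1$, while $D_{i,j}$ chains on $D_{i,j-1}$. Any row-level scheme therefore has to resolve that chain, for instance with a prefix-scan style evaluation; how Endeavor does so is not stated in the material summarized here.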

If this is right

CPUs achieve up to 2.14 times higher peak throughput than GKL.
Real-world GATK HaplotypeCaller runs become at least twice as fast.
GPUs deliver up to 2.05 times speedup over existing GPU PairHMM implementations.
Sequences of 100k basepairs become practical on commodity hardware.
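
The throughput plots listed under Figures use TCUPS. Assuming this follows the usual cell-updates-per-second convention (one cell per read-position and haplotype-position pair, scaled by 10^12), the metric and a speedup ratio can be computed as in this small sketch. The function names and the sample workload are hypothetical.

```python
def cell_updates(pairs):
    """Total dynamic-programming cells for (read_length, haplotype_length) pairs."""
    return sum(read_len * hap_len for read_len, hap_len in pairs)


def tcups(pairs, seconds):
    """Tera cell updates per second, assuming 1 TCUPS = 1e12 cells per second."""
    return cell_updates(pairs) / seconds / 1e12


def speedup(baseline_seconds, new_seconds):
    """Ratio of baseline runtime to new runtime; values above 1 mean faster."""
    return baseline_seconds / new_seconds


if __name__ == "__main__":
    # Hypothetical workload: 1000 pairs of a 100-base read and a 200-base haplotype.
    workload = [(100, 200)] * 1000
    print(tcups(workload, seconds=0.5))
    print(speedup(baseline_seconds=2.0, new_seconds=1.0))
```

Because the metric normalizes by the number of cells, it allows comparison across datasets with different sequence-length mixes, which a raw runtime does not.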

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same row-wise reformulation could be applied to other dynamic-programming bioinformatics kernels that currently rely on anti-diagonal parallelism.
Portable SIMD code generated from the new formulation might reduce the need for separate CPU and GPU code paths in production pipelines.
If the numerical invariance holds under reduced precision, further speedups on low-precision accelerators become possible.

Load-bearing premise

Changing the order of PairHMM operations preserves exact numerical accuracy while exposing new parallelism.

What would settle it

Running the original and redefined formulations on identical input sequences of length 50k basepairs and checking whether the computed variant probabilities differ by more than floating-point roundoff.
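
A scaled-down version of this check can be sketched in Python. It is an illustration only, not the paper's kernel: the transition values in `T`, the function names, and the blocked-scan evaluation of the D row are assumptions introduced here as a stand-in for any reformulation that re-associates the arithmetic. Sequences are kept short because pure Python is slow.

```python
import random

# Hypothetical transition probabilities, chosen only for illustration.
T = {"mm": 0.9, "im": 0.9, "dm": 0.9, "mi": 0.05, "ii": 0.1, "md": 0.05, "dd": 0.1}


def forward_reference(prior, t=T):
    """Cell-by-cell PairHMM forward pass using the standard recurrences."""
    R, H = len(prior), len(prior[0])
    M = [0.0] * (H + 1)
    I = [0.0] * (H + 1)
    D = [1.0 / H] * (H + 1)  # row 0: alignment may start anywhere on the haplotype
    for i in range(1, R + 1):
        Mn, In, Dn = [0.0] * (H + 1), [0.0] * (H + 1), [0.0] * (H + 1)
        for j in range(1, H + 1):
            Mn[j] = prior[i - 1][j - 1] * (
                t["mm"] * M[j - 1] + t["im"] * I[j - 1] + t["dm"] * D[j - 1]
            )
            In[j] = t["mi"] * M[j] + t["ii"] * I[j]
            Dn[j] = t["md"] * Mn[j - 1] + t["dd"] * Dn[j - 1]
        M, I, D = Mn, In, Dn
    return sum(M[j] + I[j] for j in range(1, H + 1))


def forward_rowwise(prior, t=T, block=4):
    """Same model, but each row is produced in two row-wide steps."""
    R, H = len(prior), len(prior[0])
    M = [0.0] * (H + 1)
    I = [0.0] * (H + 1)
    D = [1.0 / H] * (H + 1)
    for i in range(1, R + 1):
        # Step 1: M and I read only the previous row, so columns are independent.
        Mn = [0.0] + [
            prior[i - 1][j - 1]
            * (t["mm"] * M[j - 1] + t["im"] * I[j - 1] + t["dm"] * D[j - 1])
            for j in range(1, H + 1)
        ]
        In = [0.0] + [t["mi"] * M[j] + t["ii"] * I[j] for j in range(1, H + 1)]
        # Step 2: D chains along the row. Evaluate each block with a zero
        # carry-in, then add the decayed carry from the previous block.
        Dn = [0.0] * (H + 1)
        carry = 0.0
        for start in range(1, H + 1, block):
            end = min(start + block, H + 1)
            local, decay = 0.0, 1.0
            for j in range(start, end):
                local = t["md"] * Mn[j - 1] + t["dd"] * local
                decay *= t["dd"]
                Dn[j] = local + decay * carry
            carry = Dn[end - 1]
        M, I, D = Mn, In, Dn
    return sum(M[j] + I[j] for j in range(1, H + 1))


def relative_gap(a, b):
    """Relative difference between two positive likelihoods."""
    return abs(a - b) / max(abs(a), abs(b))


if __name__ == "__main__":
    random.seed(0)
    # Random stand-in for the emission table P(r_i | h_j): 40 read x 60 haplotype positions.
    prior = [[random.uniform(0.01, 1.0) for _ in range(60)] for _ in range(40)]
    print(relative_gap(forward_reference(prior), forward_rowwise(prior)))
```

A gap near machine epsilon would be consistent with the accuracy claim; a gap that grows with sequence length would point to accumulated reassociation error and would call for the same comparison at the 50k scale named above.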

Figures

Figures reproduced from arXiv: 2606.25738 by Aleksandar Ilic, Miguel Graça.

**Figure 1.** Antidiagonal Dependencies of PairHMM.
[…] the conditional probability of the read at position $i$, given the haplotype at position $j$, calculated as

$$P(r_i \mid h_j) = \begin{cases} 10^{-(Q_i - 33)/10} / 3 & \text{if } r_i \neq h_j \\ 1 - 10^{-(Q_i - 33)/10} & \text{if } r_i = h_j \end{cases} \qquad (3)$$

where $Q_i$ is a base quality score for the read at position $i$. The final result is a likelihood, $L$, which is the cumulative probability of all sequence alignments, calculated as $L = \sum_j$ …

**Figure 2.** Rowwise Dependencies of Endeavor

**Figure 3.** Multithreading+SIMD-Based (CPU) and Warp…

**Figure 4.** Endeavor’s GPU Pipeline for $M$ (for this example, $N = 4$, with 2 threads per read-haplotype pair).
[…] defined as m256 arrays of size $K$ where, for example, the first 32 bits in each position define the $M$ and $I$ rows for the first read-haplotype pair to process. The first read and quality characters are read in lines 15 to 17. In line 18, a gather operation (m256_gather_ps) loads from memory the powers of 10 associ…

**Figure 5.** TCUPS evolution with the elements processed by each thread (…

**Figure 6.** TCUPS evolution in Intel Xeon Gold 6438 for En…

**Figure 7.** TCUPS evolution in AMD EPYC Zen 4 for Endeavor

**Figure 9.** TCUPS evolution with sequence length in Endeavor

**Figure 10.** CARM Roofline for Endeavor and gpuPairHMM.

**Figure 11.** Speedup of Endeavor-CPU over GATK AVX-512

**Figure 12.** Speedup of Endeavor-GPU over gpuPairHMM (higher is better).
[…] and datasets from the original paper [45]. Finally, the 10s dataset [9], a typical benchmark used in the literature to test novel approaches for PairHMM, is also evaluated with Endeavor and compared to gpuPairHMM and GKL, as well as other well-known PairHMM implementations in the literature (based on CDP [27] and on inter- and intra-task paralleli…
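
The emission probability of Eq. (3), captured with the Figure 1 caption, can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that $Q_i$ is the ASCII code of a Phred+33 quality character, which is how the $-33$ offset in the formula reads.

```python
def base_error_prob(quality_char):
    """Error probability 10^(-(Q - 33) / 10) for a Phred+33 quality character."""
    return 10.0 ** (-(ord(quality_char) - 33) / 10.0)


def emission_prob(read_base, hap_base, quality_char):
    """P(r_i | h_j) following Eq. (3).

    A matching base gets 1 - eps. A mismatch gets eps / 3, which spreads the
    error probability evenly over the three other nucleotides.
    """
    eps = base_error_prob(quality_char)
    if read_base == hap_base:
        return 1.0 - eps
    return eps / 3.0


if __name__ == "__main__":
    # 'I' has ASCII code 73, i.e. Phred quality 40 under the +33 offset.
    print(emission_prob("A", "A", "I"))
    print(emission_prob("A", "C", "I"))
```

Since the error probability depends only on the quality character, it can be precomputed once per possible quality value and looked up afterwards, which fits the Figure 4 text about loading powers of 10 from memory.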

Original abstract

DNA variant calling represents a key operation in bioinformatics pipelines that aims at identifying genetic variants. Given an evidenced explosion in genomic data availability, there is an urgent need for a high-performant, portable and efficient solution for variant calling, which can further improve our understanding of genomic structure and genetic basis for complex diseases. In its most common formulation, the Pair Hidden Markov Model (PairHMM) algorithm for variant calling stands as the main bottleneck in the pipeline, accounting for up to 70% of the execution time in large-scale genomic datasets. The state-of-the-art approaches for accelerating PairHMM in CPUs and GPUs do not scale to long DNA sequences and only explore very limited anti-diagonal data parallelism, which yields poor performance. In this work, Endeavor is proposed as a new parallelization strategy for PairHMM that redefines its traditional formulation to explore row-level fine-grained parallelism without loss in solution accuracy. Based on this, a novel and portable SIMD-based approach is derived for efficient and high-performance processing of short and long sequences in CPUs and GPUs, leveraging novel levels of parallelism and synchronization to achieve high throughput in sequences up to 100k basepairs for the first time. Evaluation on Intel and AMD CPUs shows that Endeavor outperforms GKL up to 2.14x in peak throughput and GATK HaplotypeCaller by at least 2x in real-world datasets, while NVIDIA and AMD GPUs achieve up to 2.05x speedups in genome-scale datasets when compared to state-of-the-art GPU-based methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Endeavor offers a row-level reformulation of PairHMM enabling better parallelism for long sequences with practical speedups and no apparent flaws in the core argument.


The main point is that Endeavor reformulates PairHMM to expose row-level, fine-grained parallelism. This enables a SIMD approach that, according to the abstract, handles sequences up to 100k basepairs for the first time, with reported speedups of up to 2.14x on CPUs and up to 2.05x on GPUs.

The paper does well in focusing on the practical bottleneck in DNA variant calling, which can take up to 70% of pipeline time. It targets both short and long sequences and reports gains over GKL and GATK HaplotypeCaller on Intel and AMD CPUs, and over prior GPU implementations on NVIDIA and AMD GPUs. The stress-test note indicates that the argument for preserving accuracy holds up without internal contradictions once the full details are considered.

Soft spots are limited. The abstract does not include the specific reformulation equations or a detailed error analysis, so the exact mechanism for maintaining numerical accuracy while changing the order of computation can only be assessed from the full manuscript. The evaluation results are presented at a summary level without a detailed description of methodology, but this is common and not a major concern here.

This work is aimed at bioinformatics practitioners and hardware-aware algorithm developers dealing with genome-scale data. Readers who need faster PairHMM implementations on heterogeneous systems would find it relevant. It deserves a serious referee because the performance improvement is practical and the core idea appears sound.

I recommend putting it through peer review to verify the implementation and results in detail.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces Endeavor, a new parallelization strategy for the Pair Hidden Markov Model (PairHMM) algorithm used in DNA variant calling. It redefines the traditional formulation to enable row-level fine-grained parallelism without loss in solution accuracy, deriving a portable SIMD-based approach for CPUs and GPUs that achieves high throughput on sequences up to 100k basepairs. Evaluations report speedups of up to 2.14x over GKL on Intel/AMD CPUs and 2.05x over prior GPU methods on NVIDIA/AMD GPUs in genome-scale datasets.

Significance. If the row-level reformulation preserves numerical accuracy while unlocking the claimed parallelism and scalability, the work could meaningfully accelerate a key bottleneck (up to 70% of runtime) in bioinformatics pipelines for large genomic datasets. The emphasis on portability across CPU and GPU architectures and the handling of long sequences represent a practical advance over prior approaches limited to anti-diagonal parallelism.

minor comments (2)

[Abstract] Performance numbers and accuracy-preservation claims are asserted without any reference to the specific reformulation equations, error bounds, or benchmark methodology; adding a one-sentence pointer to the relevant section would improve readability.
The manuscript would benefit from an explicit statement (perhaps in the evaluation section) of the sequence-length distribution in the real-world datasets used for the GATK HaplotypeCaller comparison.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary and recommendation of minor revision. The report highlights the potential significance of the row-level reformulation for PairHMM and its portability across architectures, which aligns with our goals. Since the report raises no major comments, there is nothing to address point by point. We will incorporate the minor suggestions in the revised version.

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper claims a row-level reformulation of PairHMM that enables fine-grained SIMD parallelism on long sequences while preserving exact numerical accuracy. No equations, fitted parameters, self-citations, or ansatzes appear in the supplied abstract or skeptic analysis that reduce any prediction or uniqueness claim to the inputs by construction. The central premise is granted as a novel reformulation, after which standard SIMD/GPU techniques are applied; the argument structure is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations, parameters, or background assumptions; ledger is empty by necessity.

pith-pipeline@v0.9.1-grok · 5812 in / 1043 out tokens · 32649 ms · 2026-06-25T19:52:01.820198+00:00 · methodology


Reference graph

Works this paper leans on

52 references · 1 canonical work page

Extracted from: Endeavor: Efficient PairHMM for Detection of DNA Variants in Genome-Scale Datasets. HPDC ’26, July 13–16, 2026, Cleveland, OH, USA.

[1] Andrew Adinetz. 2014. Adaptive parallel computation with CUDA dynamic parallelism. NVIDIA Corporation) Retrieved January 4 (2014), 2016.

[2] Srinivas Aluru, Natsuhiko Futamura, and Kishan Mehrotra. 2003. Parallel biological sequence comparison using prefix computations. J. Parallel and Distrib. Comput. 63, 3 (2003), 264–272.

[3] Euan A Ashley. 2016. Towards precision medicine. Nature Reviews Genetics 17, 9 (2016), 507–522.

[4] Subho S Banerjee, Mohamed El-Hadedy, Ching Y Tan, Zbigniew T Kalbarczyk, Steve Lumetta, and Ravishankar K Iyer. 2017. On accelerating pair-HMM computations in programmable hardware. In 2017 27th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 1–8.

[5] Ravi Bhargava and Kai Troester. 2024. AMD next-generation “Zen 4” core and 4th gen AMD EPYC server CPUs. IEEE Micro 44, 3 (2024), 8–17.

[6] Beatrice Branchini, Alberto Zeni, and Marco D Santambrogio. 2021. A Methodology for Accelerating Variant Calling on GPU. (2021).

[7] Benjamin Buchfink, Klaus Reuter, and Hajk-Georg Drost. 2021. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nature methods 18, 4 (2021), 366–368.

[8] Christiam Camacho, Grzegorz M Boratyn, Victor Joukov, Roberto Vera Alvarez, and Thomas L Madden. 2023. ElasticBLAST: accelerating sequence search via cloud computing. BMC bioinformatics 24, 1 (2023), 117.

[9] M Carneiro. 2013. Optimization of a Haplotype Pair-HMM class for GPU/FPGA and AVX processing. https://github.com/MauricioCarneiro/PairHMM

[10] Tiago Carneiro Pessoa, Jan Gmys, Francisco Heron de Carvalho Júnior, Nouredine Melab, and Daniel Tuyttens. 2018. GPU-accelerated backtracking using CUDA Dynamic Parallelism. Concurrency and Computation: Practice and Experience 30, 9 (2018), e4374.

[11] Ming-Hung Chen, Mao-Jan Lin, Yu-Cheng Li, and Yi-Chang Lu. 2019. Banded Pair-HMM Algorithm for DNA Variant Calling and Its Hardware Accelerator Design. In 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE). IEEE, 563–566.

[12] 1000 Genomes Project Consortium et al. 2015. A global reference for human genetic variation. Nature 526, 7571 (2015), 68.

[13] Mark A DePristo, Eric Banks, Ryan Poplin, Kiran V Garimella, Jared R Maguire, Christopher Hartl, Anthony A Philippakis, Guillermo Del Angel, Manuel A Rivas, Matt Hanna, et al. 2011. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics 43, 5 (2011).

[14] Richard Durbin, Sean R Eddy, Anders Krogh, and Graeme Mitchison. 1998. Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge university press.

[15] Sean R Eddy. 2004. What is dynamic programming? Nature biotechnology 22, 7 (2004), 909–910.

[16] Patrick Foley, Abirami Prabhakaran, Karthik Gururaj, Mishali Naik, Shiva Gopalan, Aleksandr Shargorodskiy, and Ernesto Brau. 2017. Accelerate Genomics Research with the Broad-Intel Genomics Stack.

[17] Efstathia Giannopoulou, Theodora Katsila, Christina Mitropoulou, Evangelia-Eirini Tsermpini, and George P Patrinos. 2019. Integrating next-generation sequencing in the clinical pharmacogenomics workflow. Frontiers in pharmacology 10 (2019), 384.

[18] Richard A Gibbs. 2020. The human genome project changed everything. Nature Reviews Genetics 21, 10 (2020), 575–576.

[19] Mark Harris, Shubhabrata Sengupta, and John D Owens. 2007. Parallel prefix sum (scan) with CUDA. GPU gems 3, 39 (2007), 851–876.

[20] Taishan Hu, Nilesh Chitnis, Dimitri Monos, and Anh Dinh. 2021. Next-generation sequencing technologies: An overview. Human Immunology 82, 11 (2021).

[21] Sitao Huang, Gowthami Jayashri Manikandan, Anand Ramachandran, Kyle Rupnow, Wen-mei W Hwu, and Deming Chen. 2017. Hardware acceleration of the pair-HMM algorithm for DNA variant calling. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays.

[22] Miten Jain, Sergey Koren, Karen H Miga, Josh Quick, Arthur C Rand, Thomas A Sasani, John R Tyson, Andrew D Beggs, Alexander T Dilthey, Ian T Fiddes, et al. 2018. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nature biotechnology 36, 4 (2018), 338–345.

[23] Hákon Jónsson, Patrick Sulem, Birte Kehr, Snaedis Kristmundsdottir, Florian Zink, Eirikur Hjartarson, Marteinn T Hardarson, Kristjan E Hjorleifsson, Hannes P Eggertsson, Sigurjon Axel Gudjonsson, et al. 2017. Whole genome characterization of sequence diversity of 15,220 Icelanders. Scientific data 4, 1 (2017), 1–9.

[24] Ali Khajeh-Saeed, Stephen Poole, and J Blair Perot. 2010. Acceleration of the Smith–Waterman algorithm using single and multiple graphics processors. J. Comput. Phys. 229, 11 (2010), 4247–4258.

[25] Daniel C Koboldt. 2020. Best practices for variant calling in clinical sequencing. Genome Medicine 12, 1 (2020), 91.

[26] Enliang Li, Subho S Banerjee, Sitao Huang, Ravishankar K Iyer, and Deming Chen. 2021. Improved gpu implementations of the pair-hmm forward algorithm for dna sequence alignment. In 2021 IEEE 39th International Conference on Computer Design (ICCD). IEEE, 299–306.

[27] Zhuren Liu, Shouzhe Zhang, Justin Garrigus, and Hui Zhao. 2023. Genomics-GPU: A Benchmark Suite for GPU-accelerated Genome Analysis. In 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, 178–188.

[28] Chengwei Luo, Despina Tsementzi, Nikos Kyrpides, Timothy Read, and Konstantinos T Konstantinidis. 2012. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample. PloS one 7, 2 (2012), e30087.

[29] Bui Quang Minh, Heiko A Schmidt, Olga Chernomor, Dominik Schrempf, Michael D Woodhams, Arndt Von Haeseler, and Robert Lanfear. 2020. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Molecular biology and evolution 37, 5 (2020), 1530–1534.

[30] José Morgado, Leonel Sousa, and Aleksandar Ilic. 2024. CARM Tool: Cache-Aware Roofline Model Automatic Benchmarking and Application Analysis. In 2024 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 68–81.

[31] Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V Bzikadze, Alla Mikheenko, Mitchell R Vollger, Nicolas Altemose, Lev Uralsky, Ariel Gershman, et al. 2022. The complete sequence of a human genome. Science 376, 6588 (2022), 44–53.

[32] National Institute of Health. 2024. NA12878 Pacific Biosciences BAM Dataset. Available at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/PacBio_SequelII_CCS_11kb/HG001.SequelII.pbmm2.hs37d5.whatshap.haplotag.RTG.trio.bam

[33] National Institute of Health. 2024. NA24149 Chromium Long Ranger BAM Dataset. Available at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/10XGenomics_ChromiumGenome_LongRanger2.0_06202016/HG003_NA24149_father/NA24149_GRCh37.bam

[34] National Institute of Health. 2024. NA24695 Oxford Nanopore Technologies BAM Dataset. Available at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ChineseTrio/HG007_NA24695-hu38168_mother/UCSC_Ultralong_OxfordNanopore_Promethion/HG007_GRCh37_ONT-UL_UCSC_20200109.phased.bam

[35] National Institute of Health. 2024. NIH NA12878 Illumina BAM Dataset. Available at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/RMNISTHS_30xdownsample.bam

[36] National Institute of Health. 2024. NIH NA12878 Ion Torrent BAM Dataset. Available at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/ion_exome/IonXpress_020_rawlib.b37.bam

[37] National Institute of Health. 2024. NIH NA12878 SoLiD BAM Dataset. Available at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/technical/NA12878_data_other_projects/alignment/NA12878.SOLID.SRP012400.Xprize_SRR643700.bam

[38] National Institute of Health. 2024. NIH NA24631 BGISEQ500 BAM Dataset. https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ChineseTrio/HG005_NA24631_son/NIST_BGIseq_2x150bp_100x/GRCh38/HG005_GRCh38_BGIseq-2x150-100x_NIST_20211126.bam

[39] Nathan D Olson, Justin Wagner, Nathan Dwarshuis, Karen H Miga, Fritz J Sedlazeck, Marc Salit, and Justin M Zook. 2023. Variant calling and benchmarking in an era of complete human genome sequences. Nature Reviews Genetics 24, 7 (2023), 464–483.

[40] Johan Peltenburg, Shanshan Ren, and Zaid Al-Ars. 2016. Maximizing systolic array efficiency to accelerate the PairHMM forward algorithm. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE.

[41] Shanshan Ren, Koen Bertels, and Zaid Al-Ars. 2017. GPU-accelerated GATK haplotypecaller with load-balanced multi-process optimization. In 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE). IEEE, 497–502.

[42] Shanshan Ren, Koen Bertels, and Zaid Al-Ars. 2018. Efficient acceleration of the pair-hmms forward algorithm for gatk haplotypecaller on graphics processing units. Evolutionary Bioinformatics 14 (2018), 1176934318760543.

[43] Tony Robinson, Jim Harkin, and Priyank Shukla. 2021. Hardware acceleration of genomics data analysis: challenges and opportunities. Bioinformatics 37, 13 (2021), 1785–1795.

[44] Davide Sampietro, Chiara Crippa, Lorenzo Di Tucci, Emanuele Del Sozzo, and Marco D Santambrogio. 2018. Fpga-based pairhmm forward algorithm for dna variant calling. In 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP). IEEE, 1–8.

[45] Bertil Schmidt, Felix Kallenborn, Alexander Wichmann, Alejandro Chacon, and Christian Hundt. 2026. gpuPairHMM: High-Speed Pair-HMM Forward Algorithm for DNA Variant Calling on GPUs. IEEE Transactions on Computational Biology and Bioinformatics (2026), 1–8. doi:10.1109/TCBBIO.2026.3657252

[46] Roman Snytsar. 2023. PairHMM Improvements for Modern Instruction Set Architectures. In 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 3328–3331.

[47] TOP500.org. [n. d.]. TOP500 June 2025. https://www.top500.org/lists/top500/2025/06/. [Online; Jun-2025]

[48] Jin Wang and Sudhakar Yalamanchili. 2014. Characterization and analysis of dynamic parallelism in unstructured GPU applications. In 2014 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 51–60.

[49] Rick Wertenbroek and Yann Thoma. 2019. Acceleration of the Pair-HMM forward algorithm on FPGA with cloud integration for GATK. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 534–541.

[50] Chunlin Xiao, Justin Zook, Shane Trask, Stephen Sherry, and Genome in-a Bottle Consortium. 2014. GIAB: Genome reference material development resources for clinical sequencing. Cancer Research 74, 19_Supplement (2014), 5328–5328.

[51] Byung-Jun Yoon. 2009. Hidden Markov models and their applications in biological sequence analysis. Current genomics 10, 6 (2009), 402–415.

[52] Zhonghai Zhang, Yewen Li, Ke Meng, Chunming Zhang, and Guangming Tan. Faster and Cheaper: Pushing the Sequence Alignment Throughput with Commercial CPUs. In Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. 466–479.

[1] [1]

Andrew Adinetz. 2014. Adaptive parallel computation with CUDA dynamic parallelism.NVIDIA Corporation) Retrieved January4 (2014), 2016

2014

[2] [2]

Srinivas Aluru, Natsuhiko Futamura, and Kishan Mehrotra. 2003. Parallel bio- logical sequence comparison using prefix computations.J. Parallel and Distrib. Comput.63, 3 (2003), 264–272

2003

[3] [3]

Euan A Ashley. 2016. Towards precision medicine.Nature Reviews Genetics17, 9 (2016), 507–522

2016

[4] [4]

Subho S Banerjee, Mohamed El-Hadedy, Ching Y Tan, Zbigniew T Kalbarczyk, Steve Lumetta, and Ravishankar K Iyer. 2017. On accelerating pair-HMM compu- tations in programmable hardware. In2017 27th International Conference on Field Programmable Logic and Applications (FPL). IEEE, 1–8

2017

[5] [5]

Ravi Bhargava and Kai Troester. 2024. AMD next-generation “Zen 4” core and 4th gen AMD EPYC server CPUs.IEEE Micro44, 3 (2024), 8–17

2024

[6] [6]

Beatrice Branchini, Alberto Zeni, and Marco D Santambrogio. 2021. A Methodol- ogy for Accelerating Variant Calling on GPU. (2021)

2021

[7] [7]

Benjamin Buchfink, Klaus Reuter, and Hajk-Georg Drost. 2021. Sensitive protein alignments at tree-of-life scale using DIAMOND.Nature methods18, 4 (2021), 366–368

2021

[8] [8]

Christiam Camacho, Grzegorz M Boratyn, Victor Joukov, Roberto Vera Alvarez, and Thomas L Madden. 2023. ElasticBLAST: accelerating sequence search via cloud computing.BMC bioinformatics24, 1 (2023), 117

2023

[9] [9]

M Carneiro. 2013. Optimization of a Haplotype Pair-HMM class for GPU/FPGA and AVX processing. https://github.com/MauricioCarneiro/PairHMM

2013

[10] [10]

Tiago Carneiro Pessoa, Jan Gmys, Francisco Heron de Carvalho Júnior, Nouredine Melab, and Daniel Tuyttens. 2018. GPU-accelerated backtracking using CUDA Dynamic Parallelism.Concurrency and Computation: Practice and Experience30, 9 (2018), e4374

2018

[11] [11]

Ming-Hung Chen, Mao-Jan Lin, Yu-Cheng Li, and Yi-Chang Lu. 2019. Banded Pair- HMM Algorithm for DNA Variant Calling and Its Hardware Accelerator Design. In2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE). IEEE, 563–566

2019

[12] [12]

1000 Genomes Project Consortium et al . 2015. A global reference for human genetic variation.Nature526, 7571 (2015), 68

2015

[13] [13]

Mark A DePristo, Eric Banks, Ryan Poplin, Kiran V Garimella, Jared R Maguire, Christopher Hartl, Anthony A Philippakis, Guillermo Del Angel, Manuel A Rivas, Matt Hanna, et al. 2011. A framework for variation discovery and genotyping using next-generation DNA sequencing data.Nature genetics43, 5 (2011)

2011

[14] [14]

1998.Biolog- ical sequence analysis: probabilistic models of proteins and nucleic acids

Richard Durbin, Sean R Eddy, Anders Krogh, and Graeme Mitchison. 1998.Biolog- ical sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge university press

1998

[15] [15]

Sean R Eddy. 2004. What is dynamic programming?Nature biotechnology22, 7 (2004), 909–910

2004

[16] [16]

Patrick Foley, Abirami Prabhakaran, Karthik Gururaj, Mishali Naik, Shiva Gopalan, Aleksandr Shargorodskiy, and Ernesto Brau. 2017. Accelerate Genomics Research with the Broad-Intel Genomics Stack

2017

[17] [17]

Efstathia Giannopoulou, Theodora Katsila, Christina Mitropoulou, Evangelia- Eirini Tsermpini, and George P Patrinos. 2019. Integrating next-generation sequencing in the clinical pharmacogenomics workflow.Frontiers in pharmacol- ogy10 (2019), 384

2019

[18] [18]

Richard A Gibbs. 2020. The human genome project changed everything.Nature Reviews Genetics21, 10 (2020), 575–576

2020

[19] [19]

Mark Harris, Shubhabrata Sengupta, and John D Owens. 2007. Parallel prefix sum (scan) with CUDA.GPU gems3, 39 (2007), 851–876

2007

[20] [20]

Taishan Hu, Nilesh Chitnis, Dimitri Monos, and Anh Dinh. 2021. Next-generation sequencing technologies: An overview.Human Immunology82, 11 (2021)

2021

[21] [21]

Sitao Huang, Gowthami Jayashri Manikandan, Anand Ramachandran, Kyle Rup- now, Wen-mei W Hwu, and Deming Chen. 2017. Hardware acceleration of the pair-HMM algorithm for DNA variant calling. InProceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

2017

[22] [22]

Miten Jain, Sergey Koren, Karen H Miga, Josh Quick, Arthur C Rand, Thomas A Sasani, John R Tyson, Andrew D Beggs, Alexander T Dilthey, Ian T Fiddes, et al

[23] [23]

Nanopore sequencing and assembly of a human genome with ultra-long reads.Nature biotechnology36, 4 (2018), 338–345

2018

[24] [24]

Hákon Jónsson, Patrick Sulem, Birte Kehr, Snaedis Kristmundsdottir, Florian Zink, Eirikur Hjartarson, Marteinn T Hardarson, Kristjan E Hjorleifsson, Hannes P Eg- gertsson, Sigurjon Axel Gudjonsson, et al. 2017. Whole genome characterization of sequence diversity of 15,220 Icelanders.Scientific data4, 1 (2017), 1–9

2017

[25] [25]

Ali Khajeh-Saeed, Stephen Poole, and J Blair Perot. 2010. Acceleration of the Smith–Waterman algorithm using single and multiple graphics processors.J. Comput. Phys.229, 11 (2010), 4247–4258

2010

[26] [26]

Daniel C Koboldt. 2020. Best practices for variant calling in clinical sequencing. Genome Medicine12, 1 (2020), 91

2020

[27] [27]

Enliang Li, Subho S Banerjee, Sitao Huang, Ravishankar K Iyer, and Deming Chen

[28] [28]

In2021 IEEE 39th International Conference on Computer Design (ICCD)

Improved gpu implementations of the pair-hmm forward algorithm for dna sequence alignment. In2021 IEEE 39th International Conference on Computer Design (ICCD). IEEE, 299–306

[29] [29]

Zhuren Liu, Shouzhe Zhang, Justin Garrigus, and Hui Zhao. 2023. Genomics- GPU: A Benchmark Suite for GPU-accelerated Genome Analysis. In2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). IEEE, 178–188

2023

[30] [30]

Chengwei Luo, Despina Tsementzi, Nikos Kyrpides, Timothy Read, and Kon- stantinos T Konstantinidis. 2012. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.PloS one7, 2 (2012), e30087

2012

[31] [31]

Bui Quang Minh, Heiko A Schmidt, Olga Chernomor, Dominik Schrempf, Michael D Woodhams, Arndt Von Haeseler, and Robert Lanfear. 2020. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era.Molecular biology and evolution37, 5 (2020), 1530–1534

2020

[32] [32]

José Morgado, Leonel Sousa, and Aleksandar Ilic. 2024. CARM Tool: Cache-Aware Roofline Model Automatic Benchmarking and Application Analysis. In2024 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 68–81

2024

[33] [33]

Sergey Nurk, Sergey Koren, Arang Rhie, Mikko Rautiainen, Andrey V Bzikadze, Alla Mikheenko, Mitchell R Vollger, Nicolas Altemose, Lev Uralsky, Ariel Gersh- man, et al. 2022. The complete sequence of a human genome.Science376, 6588 (2022), 44–53

2022

[34] [34]

National Institute of Health. 2024. NA12878 Pacific Biosciences BAM Dataset. Available at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/PacBio_ SequelII_CCS_11kb/HG001.SequelII.pbmm2.hs37d5.whatshap.haplotag.RTG. trio.bam

2024

[35] [35]

National Institute of Health. 2024. NA24149 Chromium Long Ranger BAM Dataset. Available at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ AshkenazimTrio/analysis/10XGenomics_ChromiumGenome_LongRanger2.0_ 06202016/HG003_NA24149_father/NA24149_GRCh37.bam

2024

[36] [36]

National Institute of Health. 2024. NA24695 Oxford Nanopore Tech- nologies BAM Dataset. Available at https://ftp-trace.ncbi.nlm. nih.gov/giab/ftp/data/ChineseTrio/HG007_NA24695-hu38168_mother/ UCSC_Ultralong_OxfordNanopore_Promethion/HG007_GRCh37_ONT- UL_UCSC_20200109.phased.bam

[37]

National Institute of Health. 2024. NIH NA12878 Illumina BAM Dataset. Available at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/NIST_NA12878_HG001_HiSeq_300x/RMNISTHS_30xdownsample.bam

Endeavor: Efficient PairHMM for Detection of DNA Variants in Genome-Scale Datasets. HPDC ’26, July 13–16, 2026, Cleveland, OH, USA

[38]

National Institute of Health. 2024. NIH NA12878 Ion Torrent BAM Dataset. Available at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/NA12878/ion_exome/IonXpress_020_rawlib.b37.bam

[39]

National Institute of Health. 2024. NIH NA12878 SoLiD BAM Dataset. Available at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/technical/NA12878_data_other_projects/alignment/NA12878.SOLID.SRP012400.Xprize_SRR643700.bam

[40]

National Institute of Health. 2024. NIH NA24631 BGISEQ500 BAM Dataset. Available at https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ChineseTrio/HG005_NA24631_son/NIST_BGIseq_2x150bp_100x/GRCh38/HG005_GRCh38_BGIseq-2x150-100x_NIST_20211126.bam

[41]

Nathan D Olson, Justin Wagner, Nathan Dwarshuis, Karen H Miga, Fritz J Sedlazeck, Marc Salit, and Justin M Zook. 2023. Variant calling and benchmarking in an era of complete human genome sequences. Nature Reviews Genetics 24, 7 (2023), 464–483

[42]

Johan Peltenburg, Shanshan Ren, and Zaid Al-Ars. 2016. Maximizing systolic array efficiency to accelerate the PairHMM forward algorithm. In 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE

[43]

Shanshan Ren, Koen Bertels, and Zaid Al-Ars. 2017. GPU-accelerated GATK haplotypecaller with load-balanced multi-process optimization. In 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE). IEEE, 497–502

[44]

Shanshan Ren, Koen Bertels, and Zaid Al-Ars. 2018. Efficient acceleration of the pair-hmms forward algorithm for gatk haplotypecaller on graphics processing units. Evolutionary Bioinformatics 14 (2018), 1176934318760543

[45]

Tony Robinson, Jim Harkin, and Priyank Shukla. 2021. Hardware acceleration of genomics data analysis: challenges and opportunities. Bioinformatics 37, 13 (2021), 1785–1795

[46]

Davide Sampietro, Chiara Crippa, Lorenzo Di Tucci, Emanuele Del Sozzo, and Marco D Santambrogio. 2018. Fpga-based pairhmm forward algorithm for dna variant calling. In 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP). IEEE, 1–8

[47]

Bertil Schmidt, Felix Kallenborn, Alexander Wichmann, Alejandro Chacon, and Christian Hundt. 2026. gpuPairHMM: High-Speed Pair-HMM Forward Algorithm for DNA Variant Calling on GPUs. IEEE Transactions on Computational Biology and Bioinformatics (2026), 1–8. doi:10.1109/TCBBIO.2026.3657252

[48]

Roman Snytsar. 2023. PairHMM Improvements for Modern Instruction Set Architectures. In 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 3328–3331

[49]

TOP500.org. [n. d.]. TOP500 June 2025. https://www.top500.org/lists/top500/2025/06/. [Online; Jun-2025]

[50]

Jin Wang and Sudhakar Yalamanchili. 2014. Characterization and analysis of dynamic parallelism in unstructured GPU applications. In 2014 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 51–60

[51]

Rick Wertenbroek and Yann Thoma. 2019. Acceleration of the Pair-HMM forward algorithm on FPGA with cloud integration for GATK. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 534–541

[52]

Chunlin Xiao, Justin Zook, Shane Trask, Stephen Sherry, and Genome in a Bottle Consortium. 2014. GIAB: Genome reference material development resources for clinical sequencing. Cancer Research 74, 19_Supplement (2014), 5328–5328

[53]

Byung-Jun Yoon. 2009. Hidden Markov models and their applications in biological sequence analysis. Current genomics 10, 6 (2009), 402–415

[54]

Zhonghai Zhang, Yewen Li, Ke Meng, Chunming Zhang, and Guangming Tan. Faster and Cheaper: Pushing the Sequence Alignment Throughput with Commercial CPUs. In Proceedings of the 31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming. 466–479