pith. sign in

arxiv: 2601.14653 · v2 · pith:HBTHV2G4new · submitted 2026-01-21 · 💻 cs.LG · q-bio.GN

Efficient Imputation for Patch-based Missing Single-cell Data via Cluster-regularized Optimal Transport

classification 💻 cs.LG q-bio.GN
keywords dataimputationmissingdatasetsoptimalbiologicalchallengescluster-regularized
0
0 comments X
read the original abstract

Missing data in single-cell sequencing datasets poses significant challenges for extracting meaningful biological insights. However, existing imputation approaches, which often assume uniformity and data completeness, struggle to address cases with large patches of missing data. In this paper, we present CROT (Cluster-Regularized Optimal Transport), an optimal transport-based imputation algorithm designed to handle patch-based missing data in tabular formats. Our approach effectively captures the underlying data structure in the presence of significant missingness. Notably, it achieves superior imputation accuracy while significantly reducing runtime, demonstrating its scalability and efficiency for large-scale datasets. This work introduces a robust solution for imputation in heterogeneous, high-dimensional datasets with structured data absence, addressing critical challenges in both biological and clinical data analysis. Our code is available on GitHub, https://github.com/yuyuliu11037/CROT.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

    cs.AI 2026-04 unverdicted novelty 6.0

    HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.