Spherical Mixture Integration for Latent Embedding Alignment across Multi-Source Feature Spaces
Pith reviewed 2026-05-12 00:50 UTC · model grok-4.3
The pith
A spherical mixture model integrates embeddings from multiple EHR sources to align feature spaces and recover synonym clusters with proven error bounds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SMILE models synonymy in multi-source clinical embeddings via a mixture of von Mises-Fisher distributions to produce unified latent representations. A composite quasi-likelihood estimator is developed for the latent embeddings and mixture parameters, for which non-asymptotic error bounds are established, along with consistent recovery of the synonym clusters. The theoretical results demonstrate the gains in statistical efficiency from integrating multiple sources and auxiliary information.
What carries the argument
Mixture of von Mises-Fisher distributions on the sphere for synonym modeling, with composite quasi-likelihood estimation for alignment.
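For readers less familiar with directional statistics, the standard von Mises-Fisher density and a mixture built from it can be written as below. The notation is chosen to agree with the passage quoted in the Lean theorem section (ambient dimension r, normalizing constant C_r). The mixture weights π_k, the component count K, and the shared concentration κ are illustrative assumptions and may differ from the paper's parameterization.

```latex
f_r(x;\mu,\kappa) = C_r(\kappa)\,\exp\!\left(\kappa\,\mu^{\top}x\right),
\qquad
C_r(\kappa) = \frac{\kappa^{r/2-1}}{(2\pi)^{r/2}\, I_{r/2-1}(\kappa)},
\qquad x,\mu \in \mathbb{S}^{r-1},\ \kappa > 0,

p(x) = \sum_{k=1}^{K} \pi_k\, f_r(x;\mu_k,\kappa),
\qquad \pi_k \ge 0,\ \sum_{k=1}^{K}\pi_k = 1 .
```

Here I_ν denotes the modified Bessel function of the first kind. Larger κ concentrates mass around the mean direction μ, which is why each mixture component can stand for one cluster of synonymous codes.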
Load-bearing premise
Embeddings from different sources lie in a shared latent space that can be aligned through its spherical geometry, and the sparse auxiliary pairs provide enough supervision to identify the mixture components.
What would settle it
Run the method on simulated data with known ground-truth latent embeddings, mixture means, and synonym labels, then check whether the observed estimation errors exceed the derived non-asymptotic bounds or whether cluster-recovery accuracy falls short of the consistency claim.
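A minimal sketch of this kind of check, under strong simplifying assumptions: it is not the SMILE estimator. It samples from a single von Mises-Fisher component in three dimensions (where a closed-form sampler exists), estimates the mean direction by the normalized sample mean, and compares the average error at two sample sizes. The mean direction, κ = 20, the sample sizes, and all function names are hypothetical choices for illustration.

```python
import math
import random


def sample_vmf_3d(mu, kappa, rng):
    """Draw one sample from a von Mises-Fisher distribution on the unit sphere in R^3."""
    # Closed-form inverse CDF for w = cos(angle to mu); valid only in three dimensions.
    u = rng.random()
    w = 1.0 + math.log(u + (1.0 - u) * math.exp(-2.0 * kappa)) / kappa
    # Orthonormal basis (e1, e2) of the plane tangent to the sphere at mu.
    helper = (1.0, 0.0, 0.0) if abs(mu[0]) < 0.9 else (0.0, 1.0, 0.0)
    dot = sum(h * m for h, m in zip(helper, mu))
    e1 = [h - dot * m for h, m in zip(helper, mu)]
    norm = math.sqrt(sum(c * c for c in e1))
    e1 = [c / norm for c in e1]
    e2 = [
        mu[1] * e1[2] - mu[2] * e1[1],
        mu[2] * e1[0] - mu[0] * e1[2],
        mu[0] * e1[1] - mu[1] * e1[0],
    ]
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(max(0.0, 1.0 - w * w))
    return [
        s * (math.cos(phi) * a + math.sin(phi) * b) + w * m
        for a, b, m in zip(e1, e2, mu)
    ]


def mean_direction_error(mu, kappa, n, rng):
    """Euclidean error of the normalized sample mean as an estimate of mu."""
    total = [0.0, 0.0, 0.0]
    for _ in range(n):
        x = sample_vmf_3d(mu, kappa, rng)
        total = [t + c for t, c in zip(total, x)]
    norm = math.sqrt(sum(t * t for t in total))
    mu_hat = [t / norm for t in total]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_hat, mu)))


def average_error(n, reps=20, kappa=20.0, seed=0):
    """Average estimation error over independent replicates."""
    rng = random.Random(seed)
    mu = (0.0, 0.6, 0.8)  # hypothetical ground-truth mean direction (unit norm)
    return sum(mean_direction_error(mu, kappa, n, rng) for _ in range(reps)) / reps


err_small = average_error(n=50)
err_large = average_error(n=5000)
print(f"n=50: {err_small:.4f}  n=5000: {err_large:.4f}")
```

In a real check, the empirical errors would be plotted against the paper's bound as a function of sample size, number of sources, and number of auxiliary pairs; the sketch only shows the ground-truth comparison step.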
Original abstract
Multi-institutional electronic health record (Multi-EHR) data have emerged as a powerful resource for developing predictive models to support clinical decisions and for generating reliable real-world evidence. By aggregating information from diverse patient populations and institutions, they enhance the robustness and generalizability of models and findings. However, analyzing multi-EHR data remains challenging because disparate institutions rarely map all data elements to a common ontology, and raw EHR codes are often overly granular and institution-specific, fragmenting representations of the same clinical concept. Hence, integrative analysis must overcome two key hurdles: harmonizing codes with the same clinical meaning (synonymy), and aligning institutional feature spaces. To address these challenges, we propose SMILE, a Spherical Mixture Integration for Latent Embedding alignment across multi-source feature spaces, where embeddings from heterogeneous sources serve as privacy-preserving summaries of clinical concepts and sparse auxiliary relationship pairs provide weak supervision on the latent geometry. Synonymy is modeled via a mixture of von Mises-Fisher distributions, yielding unified representations that consolidate semantically equivalent raw codes. We develop a composite quasi-likelihood estimation procedure and establish non-asymptotic error bounds for latent representations and mixture mean directions, together with consistent recovery of synonym clusters. The theory quantifies statistical gains from integrating multiple sources and auxiliary knowledge graph information. Simulations and a multi-institutional EHR application demonstrate improved alignment and synonym clustering.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SMILE, a spherical mixture model using von Mises-Fisher distributions to align latent embeddings across heterogeneous multi-source feature spaces (e.g., multi-institutional EHR data). Embeddings serve as privacy-preserving summaries, synonymy is captured via mixture components, and sparse auxiliary knowledge-graph pairs provide weak supervision for geometry. A composite quasi-likelihood estimator is developed, with non-asymptotic error bounds derived for latent representations and mixture mean directions, plus consistency results for synonym-cluster recovery. The theory claims to quantify statistical gains from multi-source integration and auxiliary information; simulations and a real EHR application illustrate improved alignment and clustering.
Significance. If the non-asymptotic bounds and identifiability results hold under the stated conditions, the work offers a principled, privacy-aware framework for harmonizing fragmented clinical codes across institutions, which is a pressing need in real-world evidence generation. The vMF mixture is a natural choice for directional embeddings, and providing explicit non-asymptotic rates plus quantification of multi-source gains strengthens the contribution beyond purely empirical alignment methods. Credit is due for combining theoretical guarantees with empirical validation on both simulated and real multi-EHR data.
major comments (2)
- [§4] §4 (Theoretical Analysis), Theorem on non-asymptotic bounds: The error bounds for latent representations and mean directions rest on the auxiliary relationship pairs supplying sufficient weak supervision to resolve spherical rotational invariance and ensure component separation in the vMF mixture. No explicit lower bound on the number, density, or connectivity of these pairs is stated to guarantee the claimed rates uniformly; if the pairs are too sparse, the identifiability step fails and the bounds do not hold, which is load-bearing for the central consistency and gain-quantification claims.
- [§3] §3 (Estimation Procedure), composite quasi-likelihood: The procedure integrates multiple sources and auxiliary pairs, but the derivation does not explicitly address how heterogeneous source-specific concentration parameters or mixture weights are jointly optimized without introducing additional bias terms that could offset the claimed statistical gains from integration.
minor comments (2)
- Notation for the von Mises-Fisher concentration parameter κ and mean direction μ should be introduced with a brief reminder of the density formula at first use to aid readers unfamiliar with directional statistics.
- In the simulation section, the metrics for 'improved alignment' (e.g., Procrustes distance or cluster purity) need explicit definitions and baseline comparisons to make the reported gains interpretable.
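To illustrate what an explicit metric definition could look like, here is a sketch of cluster purity under one common definition: the fraction of items that carry the majority true label of their predicted cluster. Whether the paper uses this metric is not stated; the function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(predicted, truth):
    """Fraction of items that carry the majority true label of their predicted cluster."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not predicted:
        raise ValueError("need at least one item")
    # Count true labels inside each predicted cluster.
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(predicted, truth):
        by_cluster[cluster][label] += 1
    # Sum the size of the largest true-label group per cluster.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(predicted)


# Hypothetical toy data: six codes, two predicted clusters, three true concepts.
predicted = [0, 0, 0, 1, 1, 1]
truth = ["a", "a", "b", "b", "b", "c"]
purity = cluster_purity(predicted, truth)
print(round(purity, 3))  # 0.667
```

Purity rewards over-segmentation (one cluster per item scores 1.0), so it is usually reported alongside a baseline and a second metric.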
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review, as well as the positive assessment of the significance of SMILE for multi-EHR alignment. We appreciate the recognition of the vMF mixture approach and the non-asymptotic theory. We address each major comment below and will incorporate revisions to strengthen the manuscript.
- Referee: [§4] §4 (Theoretical Analysis), Theorem on non-asymptotic bounds: The error bounds for latent representations and mean directions rest on the auxiliary relationship pairs supplying sufficient weak supervision to resolve spherical rotational invariance and ensure component separation in the vMF mixture. No explicit lower bound on the number, density, or connectivity of these pairs is stated to guarantee the claimed rates uniformly; if the pairs are too sparse, the identifiability step fails and the bounds do not hold, which is load-bearing for the central consistency and gain-quantification claims.
  Authors: We agree that the current presentation would benefit from an explicit minimal condition on the auxiliary pairs. The manuscript assumes the pairs resolve rotational invariance and ensure separation but does not state a quantitative lower bound (e.g., on the number of pairs per component or graph connectivity). We will revise the theorem in §4 to include such a condition, for instance requiring that the auxiliary knowledge graph is connected and contains at least Ω(K log K) pairs for K components, under which the stated rates hold uniformly. This makes the assumptions transparent while preserving the core results on multi-source gains.
  Revision: yes
- Referee: [§3] §3 (Estimation Procedure), composite quasi-likelihood: The procedure integrates multiple sources and auxiliary pairs, but the derivation does not explicitly address how heterogeneous source-specific concentration parameters or mixture weights are jointly optimized without introducing additional bias terms that could offset the claimed statistical gains from integration.
  Authors: The composite quasi-likelihood is the sum of source-specific vMF quasi-log-likelihoods plus the auxiliary-pair term; source-specific concentrations κ_s and weights π_s are treated as separate parameters and updated jointly with the shared embeddings and common mean directions via a block-coordinate EM procedure. Because the heterogeneity is explicitly parameterized and the quasi-likelihood remains consistent for the shared latent geometry, no offsetting bias is introduced beyond the standard quasi-likelihood approximation already accounted for in the §4 bounds. To improve clarity we will add a short remark in §3 describing the alternation steps and confirming that the multi-source efficiency gains are retained. We would welcome further specification if a particular bias mechanism is intended.
  Revision: partial
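The condition proposed in the first response (a connected auxiliary graph with at least Ω(K log K) pairs) could be checked on a concrete pair list roughly as follows. This is a sketch under assumptions: nodes are taken to be concept indices 0..num_nodes-1, and the constant c hidden by the Ω notation is hypothetical, since the rebuttal does not specify it.

```python
import math


def is_connected(num_nodes, pairs):
    """Union-find check that the undirected graph on num_nodes nodes is connected."""
    parent = list(range(num_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = num_nodes
    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return components == 1


def meets_pair_condition(num_nodes, num_components, pairs, c=1.0):
    """True if the graph is connected and has at least c * K * log(K) pairs.

    c is a hypothetical constant standing in for the one hidden by Omega(K log K).
    """
    k = num_components
    enough_pairs = len(pairs) >= c * k * math.log(k)
    return enough_pairs and is_connected(num_nodes, pairs)


# Hypothetical example: five concepts, K = 3 components, a chain of four pairs.
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(meets_pair_condition(5, 3, chain))           # True
print(meets_pair_condition(5, 3, [(0, 1), (3, 4)]))  # False
```

A check like this only verifies the stated sufficient condition on a given dataset; it says nothing about whether the condition is necessary for the bounds.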
Circularity Check
No circularity: derivation rests on external statistical assumptions rather than self-reference.
The abstract and described procedure introduce a new spherical mixture model (von Mises-Fisher) with composite quasi-likelihood estimation and derive non-asymptotic bounds under explicit modeling assumptions on embeddings as privacy-preserving summaries and sparse auxiliary KG pairs as weak supervision. No quoted equations or steps reduce predictions to fitted inputs by construction, invoke self-citations as load-bearing uniqueness theorems, or smuggle ansatzes via prior work. The central claims (error bounds, cluster recovery, multi-source gains) are presented as consequences of standard concentration arguments for mixtures once identifiability is granted by the auxiliary pairs; the sufficiency of those pairs is an assumption, not a tautology. This matches the default expectation for a methods paper whose theory is externally falsifiable via simulation and real-data application.
Axiom & Free-Parameter Ledger
free parameters (1)
- mixture weights and concentration parameters of the von Mises-Fisher components
axioms (2)
- domain assumption: Embeddings from heterogeneous sources serve as privacy-preserving summaries of clinical concepts
- domain assumption: Sparse auxiliary relationship pairs supply weak supervision on the latent geometry
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: $V_i \sim f_r(x; \mu_{z_i}, \kappa)$, $f_r(x; \mu, \kappa) = C_r(\kappa) \exp(\kappa \mu^\top x)$ ... composite quasi-likelihood ... non-asymptotic error bounds for latent representations and mixture mean directions
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, theorem embed_injective (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: identifiable only up to ... orthogonal matrix $O \in \mathcal{O}_{r \times r}$
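The orthogonal ambiguity in the second quoted passage follows from a one-line identity: the vMF density depends on its arguments only through the inner product μ^T x, which is unchanged when both are transformed by the same orthogonal matrix. A short derivation, using the symbols of the quoted passages:

```latex
(O\mu)^{\top}(Ox) = \mu^{\top} O^{\top} O\, x = \mu^{\top} x
\quad\Longrightarrow\quad
f_r(Ox;\, O\mu,\, \kappa) = C_r(\kappa)\exp\!\left(\kappa\,\mu^{\top}x\right) = f_r(x;\,\mu,\,\kappa),
\qquad O \in \mathcal{O}_{r\times r}.
```

A likelihood built only from such terms therefore cannot distinguish (μ, x) from (Oμ, Ox), which is consistent with the referee's point that additional supervision is needed to fix the rotation.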
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.