Variational inference and density estimation with non-negative tensor of hierarchical tucker format

Haoxuan Chen; Lexing Ying; Xun Tang

arxiv: 2606.23949 · v1 · pith:W4KHRDZEnew · submitted 2026-06-22 · 🧮 math.NA · cs.NA

Variational inference and density estimation with non-negative tensor of hierarchical tucker format

Xun Tang , Haoxuan Chen , Lexing Ying This is my paper

Pith reviewed 2026-06-26 07:17 UTC · model grok-4.3

classification 🧮 math.NA cs.NA

keywords hierarchical Tucker formatnon-negative tensorsprobability tensorstensor compressionhigh-dimensional datavariational inferencedensity estimation

0 comments

The pith

A two-stage procedure compresses high-dimensional probability tensors into non-negative hierarchical Tucker format with O(d) complexity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to compress high-dimensional discrete probability functions into non-negative hierarchical Tucker tensors. It proceeds in two stages: an interpolation-based compression to hierarchical Tucker format, followed by fitting to a non-negative version using a second-order method. Both stages have computational complexity linear in the tensor order d. This linear scaling makes the method applicable to high-dimensional problems in density estimation and variational inference, as demonstrated by numerical experiments on various tensors.

Core claim

The authors claim that a two-stage procedure consisting of interpolation to a hierarchical Tucker tensor followed by non-negative fitting allows compression of order-d probability tensors with O(d) complexity, enabling extension to high-dimensional settings, with success shown in numerical experiments.

What carries the argument

The two-stage procedure: interpolation method for hierarchical Tucker compression similar to CUR decomposition, followed by second-order fitting to non-negative hierarchical Tucker ansatz.

If this is right

The methodology extends into high-dimensional settings because both stages have O(d) complexity.
Numerical experiments demonstrate success in compressing various high-dimensional probability tensors.
The compressed form supports applications in variational inference and density estimation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the fitting stage maintains accuracy, the method could reduce memory requirements for storing high-dimensional distributions by orders of magnitude.
Connections to other tensor formats might allow hybrid representations for different data types.
Testing on tensors from specific applications like image processing could reveal practical performance gains.

Load-bearing premise

The non-negative fitting step does not substantially degrade the quality of the interpolation-based approximation and the result remains a faithful representation of the original distribution.

What would settle it

Observing whether the reconstruction error for a test probability tensor remains bounded as the dimension d increases, or whether the runtime scales linearly with d.

Figures

Figures reproduced from arXiv: 2606.23949 by Haoxuan Chen, Lexing Ying, Xun Tang.

**Figure 1.** Figure 1: Hierarchical Tucker model for d = 8 under a complete binary tree structure. The top figure illustrates the tensor network structure, and the bottom figure is the corresponding dimension tree, cf. [12]. Non-negativity in low-rank representation. We illustrate the idea of a non-negative tensor network in the simplest d = 2 case, where the proposal would reduce to non-negative matrix factorization (NMF) [19].… view at source ↗

**Figure 2.** Figure 2: NHT fitting in the variational inference case. Each panel plots the relative squared [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗

**Figure 3.** Figure 3: NHT fitting in the density estimation case. As in Figure 2, every curve, including the [PITH_FULL_IMAGE:figures/full_fig_p014_3.png] view at source ↗

**Figure 4.** Figure 4: Ablation study that compares the default NHT ansatz implementation against other [PITH_FULL_IMAGE:figures/full_fig_p015_4.png] view at source ↗

**Figure 5.** Figure 5: A tree tensor network on seven nodes. The external nodes carry the physical legs [PITH_FULL_IMAGE:figures/full_fig_p016_5.png] view at source ↗

read the original abstract

In this work, we present an efficient method to compress a high-dimensional discrete probability function, i.e., a probability tensor, into a non-negative hierarchical Tucker format. The methodology is a two-stage procedure. In the first stage, we take an existing interpolation method to compress the target tensor into a hierarchical Tucker (HT) in a manner similar to the CUR decomposition for low-rank matrix reconstruction. In the second stage, we fit the first-stage output against a non-negative hierarchical Tucker ansatz using a second-order method tailored specifically for this setting. When the tensor is of order \(d\), both stages admit an \(\mathcal{O}(d)\) computational complexity, and therefore the proposed methodology readily extends into high-dimensional settings. Numerical experiments show success in compressing various high-dimensional probability tensors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The two-stage non-negative HT method claims linear cost in d but gives no analysis showing the fitting step preserves approximation quality from the first stage.

read the letter

The main takeaway is a two-stage procedure that first compresses a probability tensor via interpolation-style HT (CUR-like) and then fits the result to a non-negative HT ansatz with a second-order optimizer. Both stages are said to run in O(d) time.

What is new is the specific pairing of those two steps for probability tensors, where non-negativity matters. The paper correctly identifies that standard HT decompositions can produce negatives, so the second stage addresses a real practical issue. The linear scaling claim, if it holds, would be useful for high-dimensional discrete distributions.

The soft spot is the absence of any error control between stages. The abstract states that numerical experiments succeed but supplies no metrics, test tensors, baselines, or convergence details. More critically, there is no bound or even heuristic showing that the non-negative fit does not inflate the truncation error across the hierarchy. If the second stage requires rank increases or shifts the cores substantially, the final tensor can deviate from the target even while per-stage cost stays linear in d. That gap is load-bearing for the high-dimensional claim.

The work is aimed at researchers already using tensor formats for probabilistic modeling. A reader in that niche might pick up the two-stage idea, but the current version is too light on validation to stand on its own. It does not yet merit sending out for serious refereeing; the missing error analysis needs to be supplied first.

Referee Report

2 major / 0 minor

Summary. The manuscript presents a two-stage procedure for compressing a high-dimensional discrete probability tensor into non-negative hierarchical Tucker (HT) format. Stage 1 applies an interpolation-based compression similar to CUR decomposition to obtain an HT representation. Stage 2 fits the result to a non-negative HT ansatz via a tailored second-order method. Both stages are stated to have O(d) complexity for an order-d tensor, enabling high-dimensional extension, with numerical experiments claimed to demonstrate success in compressing various probability tensors.

Significance. If the non-negative fitting step preserves approximation quality without substantially increasing error relative to the first-stage interpolant, the approach could offer a scalable route to high-dimensional density estimation and variational inference. The explicit O(d) complexity claim, if rigorously supported, would be a notable strength for extending beyond low-order tensors.

major comments (2)

[Abstract] Abstract: the central claim that 'the resulting non-negative hierarchical Tucker tensor remains a faithful representation of the original probability distribution' lacks any supporting error analysis, bounds, or convergence guarantees showing that the second-stage non-negative fitting does not amplify the hierarchical truncation error from stage 1 across the d levels of the Tucker tree.
[Abstract] Abstract: the assertion that 'numerical experiments show success' is unsupported by any reported error metrics, test cases, baselines, initialization strategies, or validation procedures, leaving the empirical evidence for the O(d) high-dimensional extension unassessable.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We respond to each major comment below and indicate the revisions we will make to the abstract.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that 'the resulting non-negative hierarchical Tucker tensor remains a faithful representation of the original probability distribution' lacks any supporting error analysis, bounds, or convergence guarantees showing that the second-stage non-negative fitting does not amplify the hierarchical truncation error from stage 1 across the d levels of the Tucker tree.

Authors: We agree that the manuscript provides no theoretical error analysis, bounds, or convergence guarantees for the non-negative fitting step relative to the stage-1 interpolant. The paper's emphasis is on algorithmic complexity rather than approximation theory. We will revise the abstract to qualify the claim, stating that the procedure produces a non-negative approximation to the first-stage result whose quality is assessed empirically. revision: yes
Referee: [Abstract] Abstract: the assertion that 'numerical experiments show success' is unsupported by any reported error metrics, test cases, baselines, initialization strategies, or validation procedures, leaving the empirical evidence for the O(d) high-dimensional extension unassessable.

Authors: The full manuscript contains numerical experiments on high-dimensional probability tensors, but the abstract does not report specific metrics or procedures. We will revise the abstract to include brief references to the error metrics, test cases, and validation approach used in the experiments section, thereby making the empirical support more assessable. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper presents an algorithmic two-stage procedure (CUR-style interpolation followed by second-order non-negative fitting) whose O(d) complexity is obtained by direct operation counting on the hierarchical Tucker tree traversals and local solves; this counting does not rely on any fitted quantity being renamed as a prediction, nor on any self-citation chain that supplies the central claim. The method is self-contained once the standard HT and CUR primitives are granted, and the manuscript validates performance via external numerical experiments rather than internal self-definition or load-bearing self-citations. No quoted equation or step reduces the claimed result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit details on free parameters, axioms, or invented entities beyond reliance on existing interpolation and optimization methods.

pith-pipeline@v0.9.1-grok · 5661 in / 964 out tokens · 27849 ms · 2026-06-26T07:17:13.118531+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 4 linked inside Pith

[1]

Albergo, N

M. Albergo, N. M. Boffi, and E. V anden-Eijnden , Stochastic interpolants: A unifying framework for flows and diffusions , Journal of Machine Learning Research, 26 (2025), pp. 1–80

2025
[2]

M. S. Albergo, M. Goldstein, N. M. Boffi, R. Ranganath, and E. V anden-Eijnden , Stochastic inter- polants with data-dependent couplings , arXiv preprint arXiv:2310.03725, (2023)

arXiv 2023
[3]

Ballani, L

J. Ballani, L. Grasedyck, and M. Kluge , Black box approximation of tensors in hierarchical tucker format, Linear algebra and its applications, 438 (2013), pp. 639–657

2013
[4]

Barndorff-Nielsen , Information and exponential families: in statistical theory , John Wiley & Sons, 2014

O. Barndorff-Nielsen , Information and exponential families: in statistical theory , John Wiley & Sons, 2014

2014
[5]

Biamonte and V

J. Biamonte and V. Bergholm , Tensor networks in a nutshell , arXiv preprint arXiv:1708.00006, (2017)

Pith/arXiv arXiv 2017
[6]

D. M. Blei, A. Kucukelbir, and J. D. McAuliffe , Variational inference: A review for statisticians, Journal of the American statistical Association, 112 (2017), pp. 859–877

2017
[7]

L. D. Brown, Fundamentals of statistical exponential families: with applications in statistical decision theory, Ims, 1986

1986
[8]

W. E, W. Ren, and E. V anden-Eijnden , Minimum action method for the study of rare events , Communi- cations on pure and applied mathematics, 57 (2004), pp. 637–656

2004
[9]

Eisert, M

J. Eisert, M. Cramer, and M. B. Plenio , Area laws for the entanglement entropy-a review , arXiv preprint arXiv:0808.3773, (2008)

Pith/arXiv arXiv 2008
[10]

F annes, B

M. F annes, B. Nachtergaele, and R. F. Werner , Finitely correlated states on quantum spin chains , Communications in mathematical physics, 144 (1992), pp. 443–490

1992
[11]

V. L. Ginzburg, V. L. Ginzburg, and L. Landau , On the theory of superconductivity , Springer, 2009

2009
[12]

Hackbusch and S

W. Hackbusch and S. K ¨uhn, A new scheme for the tensor representation , Journal of Fourier analysis and applications, 15 (2009), pp. 706–722

2009
[13]

G. E. Hinton , Training products of experts by minimizing contrastive divergence , Neural computation, 14 (2002), pp. 1771–1800

2002
[14]

Hoffmann and Q

K.-H. Hoffmann and Q. Tang , Ginzburg-Landau phase transition theory and superconductivity , vol. 134, Birkh¨ auser, 2012. DENSITY COMPRESSION WITH NON-NEGATIVE HIERARCHICAL TUCKER 19

2012
[15]

P. C. Hohenberg and A. P. Krekhov , An introduction to the ginzburg–landau theory of phase transitions and nonequilibrium patterns, Physics Reports, 572 (2015), pp. 1–42

2015
[16]

M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul , An introduction to variational methods for graphical models, Machine learning, 37 (1999), pp. 183–233

1999
[17]

LeCun, S

Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, and F. Huang , A tutorial on energy-based learning , Predicting structured data, 1 (2006)

2006
[18]

Lee and H

D. Lee and H. S. Seung , Algorithms for non-negative matrix factorization , Advances in neural information processing systems, 13 (2000)

2000
[19]

D. D. Lee and H. S. Seung , Learning the parts of objects by non-negative matrix factorization , nature, 401 (1999), pp. 788–791

1999
[20]

Lipman, R

Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le , Flow matching for generative modeling , arXiv preprint arXiv:2210.02747, (2022)

Pith/arXiv arXiv 2022
[21]

Liu , Monte Carlo strategies in scientific computing , vol

J. Liu , Monte Carlo strategies in scientific computing , vol. 75, Springer, 2001

2001
[22]

A. Lou, C. Meng, and S. Ermon , Discrete diffusion modeling by estimating the ratios of the data distribution, arXiv preprint arXiv:2310.16834, (2023)

Pith/arXiv arXiv 2023
[23]

Nocedal, A

J. Nocedal, A. W ¨achter, and R. A. W altz , Adaptive barrier update strategies for nonlinear interior methods, SIAM Journal on Optimization, 19 (2009), pp. 1674–1693

2009
[24]

Nocedal and S

J. Nocedal and S. J. Wright , Numerical optimization, Springer, 1999

1999
[25]

Oseledets and E

I. Oseledets and E. Tyrtyshnikov , Tt-cross approximation for multidimensional arrays , Linear Algebra and its Applications, 432 (2010), pp. 70–88

2010
[26]

I. V. Oseledets , Tensor-train decomposition, SIAM Journal on Scientific Computing, 33 (2011), pp. 2295– 2317

2011
[27]

¨Ostlund and S

S. ¨Ostlund and S. Rommer , Thermodynamic limit of density matrix renormalization, Physical review letters, 75 (1995), p. 3537

1995
[28]

Y. Peng, Y. Chen, E. M. Stoudenmire, and Y. Khoo , Generative modeling via hierarchical tensor sketch- ing, arXiv preprint arXiv:2304.05305, (2023)

arXiv 2023
[29]

Rezende and S

D. Rezende and S. Mohamed , Variational inference with normalizing flows , in International conference on machine learning, PMLR, 2015, pp. 1530–1538

2015
[30]

Ryzhakov, A

G. Ryzhakov, A. Chertkov, A. Basharin, and I. Oseledets , Black-box approximation and optimization with hierarchical tucker decomposition, arXiv preprint arXiv:2402.02890, (2024)

arXiv 2024
[31]

Shcherbakova, Nonnegative tensor train factorization with dmrg technique, Lobachevskii Journal of Math- ematics, 40 (2019), pp

E. Shcherbakova, Nonnegative tensor train factorization with dmrg technique, Lobachevskii Journal of Math- ematics, 40 (2019), pp. 1863–1872

2019
[32]

B. W. Silverman , Density estimation for statistics and data analysis , Routledge, 2018

2018
[33]

Y. Song, C. Durkan, I. Murray, and S. Ermon , Maximum likelihood training of score-based diffusion models, Advances in Neural Information Processing Systems, 34 (2021), pp. 1415–1428

2021
[34]

Song and S

Y. Song and S. Ermon , Generative modeling by estimating gradients of the data distribution , Advances in Neural Information Processing Systems, 32 (2019)

2019
[35]

E. G. Tabak and E. V anden-Eijnden , Density estimation by dual ascent of the log-likelihood , Communica- tions in Mathematical Sciences, 8 (2010), pp. 217–233

2010
[36]

X. Tang, R. Dwaraknath, and L. Ying , Variational inference and density estimation with non-negative tensor train, arXiv preprint arXiv:2507.21519, (2025)

arXiv 2025
[37]

X. Tang, Y. Hur, Y. Khoo, and L. Ying , Generative modeling via tree tensor network states , Research in the Mathematical Sciences, 10 (2023), p. 19

2023
[38]

Tang and L

X. Tang and L. Ying , Solving high-dimensional fokker-planck equation with functional hierarchical tensor , Journal of Computational Physics, 511 (2024), p. 113110

2024
[39]

Tang and L

X. Tang and L. Ying , Wavelet-based density sketching with functional hierarchical tensor , arXiv preprint arXiv:2502.20655, (2025)

arXiv 2025
[40]

Vidal , Efficient classical simulation of slightly entangled quantum computations , Physical review letters, 91 (2003), p

G. Vidal , Efficient classical simulation of slightly entangled quantum computations , Physical review letters, 91 (2003), p. 147902

2003
[41]

M. J. W ainwright and M. I. Jordan , Graphical models, exponential families, and variational inference , Foundations and Trends® in Machine Learning, 1 (2008), pp. 1–305

2008
[42]

S. R. White , Density matrix formulation for quantum renormalization groups , Physical review letters, 69 (1992), p. 2863

1992

[1] [1]

Albergo, N

M. Albergo, N. M. Boffi, and E. V anden-Eijnden , Stochastic interpolants: A unifying framework for flows and diffusions , Journal of Machine Learning Research, 26 (2025), pp. 1–80

2025

[2] [2]

M. S. Albergo, M. Goldstein, N. M. Boffi, R. Ranganath, and E. V anden-Eijnden , Stochastic inter- polants with data-dependent couplings , arXiv preprint arXiv:2310.03725, (2023)

arXiv 2023

[3] [3]

Ballani, L

J. Ballani, L. Grasedyck, and M. Kluge , Black box approximation of tensors in hierarchical tucker format, Linear algebra and its applications, 438 (2013), pp. 639–657

2013

[4] [4]

Barndorff-Nielsen , Information and exponential families: in statistical theory , John Wiley & Sons, 2014

O. Barndorff-Nielsen , Information and exponential families: in statistical theory , John Wiley & Sons, 2014

2014

[5] [5]

Biamonte and V

J. Biamonte and V. Bergholm , Tensor networks in a nutshell , arXiv preprint arXiv:1708.00006, (2017)

Pith/arXiv arXiv 2017

[6] [6]

D. M. Blei, A. Kucukelbir, and J. D. McAuliffe , Variational inference: A review for statisticians, Journal of the American statistical Association, 112 (2017), pp. 859–877

2017

[7] [7]

L. D. Brown, Fundamentals of statistical exponential families: with applications in statistical decision theory, Ims, 1986

1986

[8] [8]

W. E, W. Ren, and E. V anden-Eijnden , Minimum action method for the study of rare events , Communi- cations on pure and applied mathematics, 57 (2004), pp. 637–656

2004

[9] [9]

Eisert, M

J. Eisert, M. Cramer, and M. B. Plenio , Area laws for the entanglement entropy-a review , arXiv preprint arXiv:0808.3773, (2008)

Pith/arXiv arXiv 2008

[10] [10]

F annes, B

M. F annes, B. Nachtergaele, and R. F. Werner , Finitely correlated states on quantum spin chains , Communications in mathematical physics, 144 (1992), pp. 443–490

1992

[11] [11]

V. L. Ginzburg, V. L. Ginzburg, and L. Landau , On the theory of superconductivity , Springer, 2009

2009

[12] [12]

Hackbusch and S

W. Hackbusch and S. K ¨uhn, A new scheme for the tensor representation , Journal of Fourier analysis and applications, 15 (2009), pp. 706–722

2009

[13] [13]

G. E. Hinton , Training products of experts by minimizing contrastive divergence , Neural computation, 14 (2002), pp. 1771–1800

2002

[14] [14]

Hoffmann and Q

K.-H. Hoffmann and Q. Tang , Ginzburg-Landau phase transition theory and superconductivity , vol. 134, Birkh¨ auser, 2012. DENSITY COMPRESSION WITH NON-NEGATIVE HIERARCHICAL TUCKER 19

2012

[15] [15]

P. C. Hohenberg and A. P. Krekhov , An introduction to the ginzburg–landau theory of phase transitions and nonequilibrium patterns, Physics Reports, 572 (2015), pp. 1–42

2015

[16] [16]

M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul , An introduction to variational methods for graphical models, Machine learning, 37 (1999), pp. 183–233

1999

[17] [17]

LeCun, S

Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, and F. Huang , A tutorial on energy-based learning , Predicting structured data, 1 (2006)

2006

[18] [18]

Lee and H

D. Lee and H. S. Seung , Algorithms for non-negative matrix factorization , Advances in neural information processing systems, 13 (2000)

2000

[19] [19]

D. D. Lee and H. S. Seung , Learning the parts of objects by non-negative matrix factorization , nature, 401 (1999), pp. 788–791

1999

[20] [20]

Lipman, R

Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le , Flow matching for generative modeling , arXiv preprint arXiv:2210.02747, (2022)

Pith/arXiv arXiv 2022

[21] [21]

Liu , Monte Carlo strategies in scientific computing , vol

J. Liu , Monte Carlo strategies in scientific computing , vol. 75, Springer, 2001

2001

[22] [22]

A. Lou, C. Meng, and S. Ermon , Discrete diffusion modeling by estimating the ratios of the data distribution, arXiv preprint arXiv:2310.16834, (2023)

Pith/arXiv arXiv 2023

[23] [23]

Nocedal, A

J. Nocedal, A. W ¨achter, and R. A. W altz , Adaptive barrier update strategies for nonlinear interior methods, SIAM Journal on Optimization, 19 (2009), pp. 1674–1693

2009

[24] [24]

Nocedal and S

J. Nocedal and S. J. Wright , Numerical optimization, Springer, 1999

1999

[25] [25]

Oseledets and E

I. Oseledets and E. Tyrtyshnikov , Tt-cross approximation for multidimensional arrays , Linear Algebra and its Applications, 432 (2010), pp. 70–88

2010

[26] [26]

I. V. Oseledets , Tensor-train decomposition, SIAM Journal on Scientific Computing, 33 (2011), pp. 2295– 2317

2011

[27] [27]

¨Ostlund and S

S. ¨Ostlund and S. Rommer , Thermodynamic limit of density matrix renormalization, Physical review letters, 75 (1995), p. 3537

1995

[28] [28]

Y. Peng, Y. Chen, E. M. Stoudenmire, and Y. Khoo , Generative modeling via hierarchical tensor sketch- ing, arXiv preprint arXiv:2304.05305, (2023)

arXiv 2023

[29] [29]

Rezende and S

D. Rezende and S. Mohamed , Variational inference with normalizing flows , in International conference on machine learning, PMLR, 2015, pp. 1530–1538

2015

[30] [30]

Ryzhakov, A

G. Ryzhakov, A. Chertkov, A. Basharin, and I. Oseledets , Black-box approximation and optimization with hierarchical tucker decomposition, arXiv preprint arXiv:2402.02890, (2024)

arXiv 2024

[31] [31]

Shcherbakova, Nonnegative tensor train factorization with dmrg technique, Lobachevskii Journal of Math- ematics, 40 (2019), pp

E. Shcherbakova, Nonnegative tensor train factorization with dmrg technique, Lobachevskii Journal of Math- ematics, 40 (2019), pp. 1863–1872

2019

[32] [32]

B. W. Silverman , Density estimation for statistics and data analysis , Routledge, 2018

2018

[33] [33]

Y. Song, C. Durkan, I. Murray, and S. Ermon , Maximum likelihood training of score-based diffusion models, Advances in Neural Information Processing Systems, 34 (2021), pp. 1415–1428

2021

[34] [34]

Song and S

Y. Song and S. Ermon , Generative modeling by estimating gradients of the data distribution , Advances in Neural Information Processing Systems, 32 (2019)

2019

[35] [35]

E. G. Tabak and E. V anden-Eijnden , Density estimation by dual ascent of the log-likelihood , Communica- tions in Mathematical Sciences, 8 (2010), pp. 217–233

2010

[36] [36]

X. Tang, R. Dwaraknath, and L. Ying , Variational inference and density estimation with non-negative tensor train, arXiv preprint arXiv:2507.21519, (2025)

arXiv 2025

[37] [37]

X. Tang, Y. Hur, Y. Khoo, and L. Ying , Generative modeling via tree tensor network states , Research in the Mathematical Sciences, 10 (2023), p. 19

2023

[38] [38]

Tang and L

X. Tang and L. Ying , Solving high-dimensional fokker-planck equation with functional hierarchical tensor , Journal of Computational Physics, 511 (2024), p. 113110

2024

[39] [39]

Tang and L

X. Tang and L. Ying , Wavelet-based density sketching with functional hierarchical tensor , arXiv preprint arXiv:2502.20655, (2025)

arXiv 2025

[40] [40]

Vidal , Efficient classical simulation of slightly entangled quantum computations , Physical review letters, 91 (2003), p

G. Vidal , Efficient classical simulation of slightly entangled quantum computations , Physical review letters, 91 (2003), p. 147902

2003

[41] [41]

M. J. W ainwright and M. I. Jordan , Graphical models, exponential families, and variational inference , Foundations and Trends® in Machine Learning, 1 (2008), pp. 1–305

2008

[42] [42]

S. R. White , Density matrix formulation for quantum renormalization groups , Physical review letters, 69 (1992), p. 2863

1992