3DTurboQuant: Training-Free Near-Optimal Quantization for 3D Reconstruction Models

Jae Joong Lee

arxiv: 2604.05366 · v1 · submitted 2026-04-07 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-10 19:55 UTC · model grok-4.3

keywords 3D Gaussian Splatting · quantization · training-free compression · NeRF · DUSt3R · spherical harmonics · Beta distribution · model compression

The pith

A single random rotation turns the storage-dominant parameter vectors of 3D reconstruction models into Beta-distributed coordinates, enabling precomputed, near-optimal quantization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that the storage-dominant vectors in 3D Gaussian Splatting (45-dimensional spherical harmonics) and DUSt3R (1024-dimensional key-value vectors) lie in a dimension range where a single random rotation of the normalized vectors produces coordinates with a known Beta distribution. This regularity allows fully precomputed Lloyd-Max quantizers that need no training, no scene-specific codebooks, and no calibration data, while keeping mean-squared error within a factor of 2.7 of the information-theoretic lower bound. The result is 3.5 times compression of 3DGS at 0.02 dB PSNR loss and 7.9 times compression of DUSt3R key-value caches at 39.7 dB pointmap PSNR, each computed in seconds. According to the paper, existing compression techniques for these models all require per-scene fine-tuning to learn codebooks, so removing that step simplifies deployment. The work also supplies an a priori dimension criterion for safe bit widths, norm-separation bounds that tie quantization error to rendering quality, and an entry-grouping method for hash-grid features.

Core claim

Once the high-dimensional parameter vectors are normalized to unit length and multiplied by one random orthogonal matrix, their coordinate marginals no longer depend on the original data: each follows a (rescaled) Beta distribution whose shape depends only on the dimension. Precomputed Lloyd-Max quantizers derived from this distribution then deliver mean-squared error within a factor of 2.7 of the information-theoretic lower bound, enabling training-free compression whose quality is predictable from the dimension and the chosen bit width alone.
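The factor of 2.7 appears to be the constant √3π/2 ≈ 2.72 from the TurboQuant analysis [5] on which this work builds. Assuming that setting (x a unit vector in R^d, b bits per rotated coordinate, x̂ its reconstruction), the two bounds can be written as below: the lower bound holds for any b-bit quantizer on worst-case inputs, and the upper bound for rotation followed by the precomputed Lloyd-Max quantizer. For an input of norm r, both sides scale by r², which is why the norm is stored separately.

```latex
D_{\mathrm{mse}} = \mathbb{E}\,\lVert x - \hat{x} \rVert_2^2, \qquad \lVert x \rVert_2 = 1,
\qquad
\frac{1}{4^{b}} \;\le\; D_{\mathrm{mse}} \;\le\; \frac{\sqrt{3}\,\pi}{2}\cdot\frac{1}{4^{b}},
\qquad
\frac{\sqrt{3}\,\pi}{2} \approx 2.72 .
```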

What carries the argument

A random orthogonal rotation that maps unit-normalized input vectors to Beta-distributed coordinates, enabling precomputed, data-independent Lloyd-Max quantization tables.
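A quick numerical check of this mechanism, written as a sketch that assumes NumPy and a Haar-random orthogonal matrix built from a sign-corrected QR decomposition (all helper names are hypothetical, not from the paper). A spiky input and a dense input end up with the same coordinate statistics once normalized and rotated: both match the fourth moment 3/(d(d+2)) of one coordinate of a uniformly random unit vector. Note that the distribution is over the draw of the rotation.

```python
import numpy as np


def haar_orthogonal(d, rng):
    """Draw a d x d orthogonal matrix from the Haar (uniform) distribution."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Fix column signs so the result is exactly Haar-distributed, not biased by QR's sign convention.
    return q * np.sign(np.diag(r))


def normalize_and_rotate(v, rot):
    """Split v into its norm and unit direction, then rotate the direction."""
    norm = np.linalg.norm(v)
    return rot @ (v / norm), norm


def rotated_fourth_moment(v, n_rotations, rng):
    """Empirical E[x^4] of the rotated coordinates, pooled over independent rotations."""
    d = v.shape[0]
    coords = np.concatenate(
        [normalize_and_rotate(v, haar_orthogonal(d, rng))[0] for _ in range(n_rotations)]
    )
    return float(np.mean(coords ** 4))


d = 45                                  # e.g. the 45 higher-order SH coefficients of one Gaussian
rng = np.random.default_rng(0)
spiky = np.zeros(d)
spiky[0] = 3.0                          # all energy in a single coordinate
dense = np.ones(d)                      # energy spread evenly over all coordinates
sphere_moment = 3.0 / (d * (d + 2))     # E[x^4] for one coordinate of a uniform unit vector

print(rotated_fourth_moment(spiky, 4000, rng) / sphere_moment)  # close to 1
print(rotated_fourth_moment(dense, 4000, rng) / sphere_moment)  # close to 1
print(np.mean((spiky / 3.0) ** 4) / sphere_moment)              # unrotated: about 15.7
```

Without the rotation, the spiky vector's coordinate statistics are far from the reference; with it, both inputs become indistinguishable, which is what lets a single data-independent codebook serve every scene.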

If this is right

Bit widths for quantization can be chosen before any experiment using only the vector dimension.
Norm-separation bounds connect quantization mean-squared error directly to per-scene PSNR or pointmap fidelity loss.
Entry grouping extends the rotation technique to two-dimensional hash-grid features.
A pruning-plus-quantization pipeline yields a closed-form overall compression ratio.
3DGS models reach 3.5x compression with 0.02 dB PSNR loss and DUSt3R KV caches reach 7.9x with 39.7 dB fidelity on standard benchmarks.
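The closed-form compression ratio itself is not reproduced in this review, so the following is only a hypothetical bit accounting, not the paper's formula. It assumes fp32 baselines, one fp32 norm stored per quantized vector, and 59 fp32 values per 3DGS Gaussian (position 3, scale 3, rotation quaternion 4, opacity 1, DC color 3, higher-order SH 45), of which only the 45 SH values are quantized; pruning keeps a fraction of the Gaussians.

```python
def vector_ratio(d, b, base_bits=32, norm_bits=32):
    """Compression of one d-dim vector stored as d b-bit indices plus one norm."""
    return (d * base_bits) / (d * b + norm_bits)


def gaussian_ratio(b, keep_fraction=1.0, sh_dim=45, other_values=14,
                   base_bits=32, norm_bits=32):
    """Hypothetical 3DGS accounting: quantize only the SH vector, keep the
    other 14 per-Gaussian values in full precision, and keep only
    `keep_fraction` of the Gaussians after pruning."""
    original = (sh_dim + other_values) * base_bits
    compressed = other_values * base_bits + sh_dim * b + norm_bits
    return original / (keep_fraction * compressed)


print(round(vector_ratio(1024, 4), 2))    # DUSt3R-sized KV vector at b = 4: 7.94
print(round(gaussian_ratio(3), 2))        # 1888 / 615 bits per Gaussian: 3.07
print(round(gaussian_ratio(3, 0.5), 2))   # pruning half the Gaussians doubles it: 6.14
```

Under this accounting the KV-cache figure comes out at about 7.94× for b = 4, consistent with the reported 7.9×; the 3DGS figure depends on which attributes are quantized and how much is pruned, so it is not meant to reproduce the reported 3.5×.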

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The rotation-to-Beta property may hold for other high-dimensional features that appear in neural radiance fields or vision transformers.
The seconds-scale compression time could support on-the-fly model deployment on memory-limited devices.
The dimension criterion offers a way to decide quantization policy across additional 3D reconstruction architectures without retraining experiments.

Load-bearing premise

Two things must hold: the dominant parameter vectors must fall in the dimension range where a random rotation yields Beta-distributed coordinates, and the norm-separation bounds must translate quantization MSE into scene rendering quality without hidden dependencies.

What would settle it

Apply a random rotation to the actual spherical-harmonic coefficients or key-value vectors from a trained model, compute the empirical histograms of the resulting coordinates, and check whether they match the theoretical Beta probability density for those dimensions; a clear mismatch would falsify the near-optimality claim.
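A minimal sketch of that check, assuming NumPy; the helper names are hypothetical, and the Gaussian data matrix is only a stand-in (isotropic data passes trivially, so this run merely sanity-checks the reference density and the histogram distance). With a trained model, `V` would hold the actual 45-dimensional SH vectors or 1024-dimensional KV vectors. The reference density is the marginal of one coordinate of a uniformly random unit vector, proportional to (1 - x²)^((d-3)/2), which is a Beta((d-1)/2, (d-1)/2) law after the change of variable t = (x + 1)/2.

```python
import math
import numpy as np


def coordinate_pdf(x, d):
    """Density of one coordinate of a uniformly random unit vector in R^d, on (-1, 1):
    Gamma(d/2) / (sqrt(pi) * Gamma((d-1)/2)) * (1 - x^2)^((d-3)/2)."""
    log_c = math.lgamma(d / 2) - 0.5 * math.log(math.pi) - math.lgamma((d - 1) / 2)
    return math.exp(log_c) * (1.0 - x ** 2) ** ((d - 3) / 2)


def rotate_rows(V, rot):
    """Normalize each row of V to unit length and apply one shared rotation."""
    return (V / np.linalg.norm(V, axis=1, keepdims=True)) @ rot.T


def histogram_l1_distance(coords, d, bins=80):
    """L1 distance between the empirical histogram of coords and the reference density."""
    hist, edges = np.histogram(coords, bins=bins, range=(-1.0, 1.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return float(np.sum(np.abs(hist - coordinate_pdf(centers, d))) * (edges[1] - edges[0]))


d = 45
rng = np.random.default_rng(0)
q, r = np.linalg.qr(rng.standard_normal((d, d)))
rot = q * np.sign(np.diag(r))                       # one Haar-random rotation

# Stand-in data (hypothetical): replace V with the trained model's SH or KV vectors.
V = rng.standard_normal((20_000, d))
print(histogram_l1_distance(rotate_rows(V, rot).ravel(), d))   # small: sampling noise only

one_hot = np.eye(d)[rng.integers(0, d, 20_000)]     # spiky vectors, left unrotated
print(histogram_l1_distance(one_hot.ravel(), d))    # large: what a clear mismatch looks like
```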

Figures

Figures reproduced from arXiv: 2604.05366 by Jae Joong Lee.

**Figure 1.** Overview of 3DTurboQuant. Parameter vectors from any 3D reconstruction model are normalized, randomly rotated, and scalar-quantized using a precomputed codebook. The same algorithm applies across all three approaches; only the dimension d differs. Crucially, this codebook depends only on d and b, so it can be precomputed once and reused for all scenes.

**Figure 2.** Qualitative results of 3DTurboQuant on 3DGS. Rendered images across bit widths b = 1, 2, 3, 4 on the Lego (top) and Mic (bottom) scenes. Rightmost column: 10× amplified error map at b = 1 relative to the fp32 baseline. At b = 3, the renders are visually indistinguishable from the uncompressed model.

**Figure 3.** DUSt3R KV cache quantization: depth map visualization. Predicted depth maps (turbo colormap) from DUSt3R ViT-Large with the KV cache quantized at various bit widths. At b = 4 (7.9× KV compression, 39.7 dB pointmap PSNR), the depth structure is indistinguishable from the unquantized baseline.
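The pipeline in the Figure 1 caption (normalize, rotate, scalar-quantize with a codebook that depends only on d and b) can be sketched as follows. This is an illustration assuming NumPy, a numerical-grid Lloyd-Max solver, and an fp32 norm kept per vector; the function names are hypothetical and this is not the released implementation. It also prints the unit-vector MSE, d times the per-coordinate MSE, relative to the 4^(-b) lower bound, to show it stays below the factor of about 2.72.

```python
import math
import numpy as np


def coordinate_density_grid(d, n_grid=50_001):
    """Grid on (-1, 1) with normalized weights proportional to (1 - x^2)^((d-3)/2)."""
    x = np.linspace(-1.0, 1.0, n_grid + 2)[1:-1]  # skip endpoints, where the density is 0 or infinite
    w = (1.0 - x ** 2) ** ((d - 3) / 2)
    return x, w / w.sum()


def lloyd_max_codebook(d, b, iters=500):
    """Lloyd-Max quantizer with 2^b levels for the rotated-coordinate density.
    It depends only on (d, b), so it can be computed once and reused for every scene."""
    x, w = coordinate_density_grid(d)
    k = 2 ** b
    centroids = x[np.searchsorted(np.cumsum(w), (np.arange(k) + 0.5) / k)]  # start at quantiles
    for _ in range(iters):
        cell = np.searchsorted(0.5 * (centroids[:-1] + centroids[1:]), x)  # nearest-centroid cells
        centroids = (np.bincount(cell, weights=w * x, minlength=k)
                     / np.bincount(cell, weights=w, minlength=k))          # cell conditional means
    cell = np.searchsorted(0.5 * (centroids[:-1] + centroids[1:]), x)
    return centroids, float(np.sum(w * (x - centroids[cell]) ** 2))        # per-coordinate MSE


def quantize(v, rot, centroids):
    """Normalize, rotate, and map each coordinate to the index of its nearest centroid."""
    norm = np.linalg.norm(v)
    y = rot @ (v / norm)
    return np.searchsorted(0.5 * (centroids[:-1] + centroids[1:]), y), norm


def dequantize(idx, norm, rot, centroids):
    """Look up centroids, undo the rotation, and restore the stored norm."""
    return norm * (rot.T @ centroids[idx])


d = 45
bound = math.sqrt(3) * math.pi / 2             # about 2.72, the constant assumed from TurboQuant
mse = {b: lloyd_max_codebook(d, b)[1] for b in range(1, 5)}
for b, m in mse.items():
    # Unit-vector MSE is d times the per-coordinate MSE; divide by the 4^-b lower bound.
    print(b, round(d * m, 4), round(d * m * 4 ** b, 2), round(bound, 2))

rng = np.random.default_rng(0)
q, r = np.linalg.qr(rng.standard_normal((d, d)))
rot = q * np.sign(np.diag(r))                  # one Haar-random rotation shared by all vectors
codebook, _ = lloyd_max_codebook(d, 3)
errs = []
for v in rng.standard_normal((2000, d)):
    idx, norm = quantize(v, rot, codebook)
    errs.append(np.sum((v - dequantize(idx, norm, rot, codebook)) ** 2) / np.sum(v ** 2))
empirical = float(np.mean(errs))
print(empirical, d * mse[3])                   # the two should roughly agree
```

Only the b-bit indices and one norm per vector need to be stored; the codebook and rotation can be regenerated from (d, b) and a seed.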

Original abstract

Every existing method for compressing 3D Gaussian Splatting, NeRF, or transformer-based 3D reconstructors requires learning a data-dependent codebook through per-scene fine-tuning. We show this is unnecessary. The parameter vectors that dominate storage in these models, 45-dimensional spherical harmonics in 3DGS and 1024-dimensional key-value vectors in DUSt3R, fall in a dimension range where a single random rotation transforms any input into coordinates with a known Beta distribution. This makes precomputed, data-independent Lloyd-Max quantization near-optimal, within a factor of 2.7 of the information-theoretic lower bound. We develop 3DTurboQuant, deriving (1) a dimension-dependent criterion that predicts which parameters can be quantized and at what bit-width before running any experiment, (2) norm-separation bounds connecting quantization MSE to rendering PSNR per scene, (3) an entry-grouping strategy extending rotation-based quantization to 2-dimensional hash grid features, and (4) a composable pruning-quantization pipeline with a closed-form compression ratio. On NeRF Synthetic, 3DTurboQuant compresses 3DGS by 3.5x with 0.02 dB PSNR loss and DUSt3R KV caches by 7.9x with 39.7 dB pointmap fidelity. No training, no codebook learning, no calibration data. Compression takes seconds. The code will be released (https://github.com/JaeLee18/3DTurboQuant).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

This paper gives a training-free quantization method for 3D reconstruction models that uses random rotations to enable precomputed, near-optimal quantizers, but its error bounds may need more validation.

The main point of this paper is a training-free way to quantize the heaviest parameters in 3D Gaussian Splatting and DUSt3R models. The vectors are normalized and randomly rotated so that their coordinates follow a known Beta distribution, and precomputed Lloyd-Max quantizers, which need no data, are then near-optimal. This is new because prior work on quantizing these models relies on learning codebooks from the actual data or fine-tuning per scene. Here, the rotation trick makes the quantizer data-independent, and the paper adds a way to predict suitable bit widths from the dimension alone, before trying anything. It also derives bounds that connect quantization mean-squared error to final rendering PSNR, plus a way to handle 2D hash grids by grouping entries.

The paper does a good job showing practical results. On the NeRF Synthetic scenes, it gets 3.5 times compression for 3DGS with almost no quality loss (0.02 dB PSNR), and nearly 8 times for the key-value caches in DUSt3R. Compression takes seconds, with no training involved. These results are consistent with, though they do not by themselves verify, the claim that the quantization error is within 2.7 times the information-theoretic limit.

The soft spot is how well the norm-separation bounds hold up. They assume that the error from quantizing the spherical harmonics or key-value vectors propagates in a controlled way to the rendered output. But rendering involves non-linear operations and interactions between Gaussians, so scene-specific effects or certain views might cause bigger drops than predicted. The abstract's numbers are consistent, but without the full derivation it is unclear whether the bounds are conservative enough or were tuned to fit the results.

Overall, this is for people building or deploying 3D reconstruction systems who care about memory use on devices with limited RAM. A reader looking for simple, implementable compression tricks without heavy computation will find it useful. The work shows clear thinking about the statistical properties of the parameters. I would send this to peer review. The idea is fresh enough and the experiments are concrete, even if the bounds need more scrutiny in revision.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces 3DTurboQuant, a training-free quantization framework for dominant parameters in 3D reconstruction models such as 3D Gaussian Splatting (45-dimensional spherical harmonics) and DUSt3R (1024-dimensional key-value vectors). It claims that a single random rotation maps these vectors to coordinates following a known Beta distribution, enabling precomputed Lloyd-Max quantization that achieves near-optimality within a factor of 2.7 of the information-theoretic lower bound. The method includes a dimension-dependent criterion for selecting bit-widths, norm-separation bounds linking quantization MSE to per-scene PSNR, an entry-grouping strategy for 2D hash-grid features, and a composable pruning-quantization pipeline with closed-form compression ratios. Experiments report 3.5× compression on NeRF Synthetic for 3DGS with 0.02 dB PSNR loss and 7.9× for DUSt3R KV caches with 39.7 dB pointmap fidelity, all without training, codebooks, or calibration data.

Significance. If the dimension criterion and norm-separation bounds hold rigorously, the result would be significant for the field: it removes the per-scene fine-tuning requirement common to prior compression methods for NeRF, 3DGS, and transformer-based reconstructors, while providing reproducible, data-independent, and fast (seconds-scale) compression with explicit guarantees. The parameter-free derivations, closed-form ratios, and planned code release are particular strengths that would support reproducibility and adoption.

major comments (3)

[§3.2] Norm-separation bounds: The bounds are stated to connect the post-rotation quantization MSE (itself within a factor of 2.7 of the information-theoretic lower bound) to per-scene rendering PSNR loss, but the derivation assumes a direct, linear translation of parameter error into output fidelity. It does not explicitly address non-linear propagation through the rendering pipeline (alpha blending, view-dependent spherical-harmonic evaluation, or Gaussian density interactions), which the skeptic note identifies as a potential source of scene-dependent variance exceeding the claimed bound.
[§3.1] Dimension-dependent criterion: The criterion is presented as predicting, before any experiment, which parameter vectors can be quantized at which bit widths. However, the manuscript summarizes rather than fully derives the dimension range into which the 45- and 1024-dimensional vectors are said to fall, as well as the Beta-distribution property under random rotation; without the intermediate steps, it is unclear whether the criterion follows from first principles or was calibrated to the target models.
[§4] Experimental validation of bounds: The reported PSNR and fidelity numbers are given as single aggregate scalars (0.02 dB PSNR loss, 39.7 dB pointmap PSNR). If the norm-separation analysis is to support the “near-optimal” and “guaranteed” claims, the manuscript should report per-scene variance and the worst-case deviation from the predicted bound rather than aggregate figures.
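To give a sense of scale for these comments, a short sketch of the PSNR arithmetic, under exactly the first-order assumption the §3.2 comment questions: quantization adds an error uncorrelated with the baseline rendering error, so their MSEs add. The function names are hypothetical. Under that assumption, a 0.02 dB PSNR drop means the quantization error adds only about 0.46% to the baseline MSE, which is why per-scene worst cases matter more than the aggregate figure.

```python
import math


def psnr(mse, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    return 10.0 * math.log10(peak ** 2 / mse)


def psnr_drop(base_mse, added_mse):
    """PSNR loss when quantization adds `added_mse` on top of the baseline
    rendering error, assuming the two errors are uncorrelated (MSEs add)."""
    return psnr(base_mse) - psnr(base_mse + added_mse)


def max_relative_added_mse(drop_db):
    """Largest added MSE, as a fraction of the baseline MSE, for a given PSNR drop."""
    return 10.0 ** (drop_db / 10.0) - 1.0


print(max_relative_added_mse(0.02))   # about 0.0046, i.e. roughly 0.46 percent
```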

minor comments (2)

[§3.3] The abstract and §3 mention an “entry-grouping strategy” for hash-grid features, but the precise grouping rule and its interaction with the rotation step are not illustrated with a small example or pseudocode.
[§3.1] Notation for the Beta-distribution parameters after rotation is introduced without an explicit reference to the supporting lemma or appendix derivation.
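Since the grouping rule is not spelled out here, the following is only a hypothetical illustration of what such a rule could look like, not the paper's method: concatenate g consecutive 2-D hash-grid entries into one 2g-dimensional vector before normalizing and rotating. Two effects motivate this: the per-vector norm is amortized over more values, and a larger d moves the rotated coordinates into a concentrated regime (for d = 2 the density (1 - x²)^(-1/2) piles up at ±1 instead of near 0).

```python
import numpy as np


def group_entries(features, g):
    """Hypothetical rule: concatenate g consecutive F-dim entries (F = 2 for
    hash-grid features) into one (g*F)-dim vector, zero-padding the last group."""
    t, f = features.shape
    pad = (-t) % g
    if pad:
        features = np.vstack([features, np.zeros((pad, f), dtype=features.dtype)])
    return features.reshape(-1, g * f), pad


def ungroup_entries(grouped, f, pad):
    """Invert group_entries and drop the zero padding."""
    flat = grouped.reshape(-1, f)
    return flat[: len(flat) - pad]


def bits_per_value(d, b, norm_bits=32):
    """Storage per scalar when a d-dim vector is kept as d b-bit codes plus one norm."""
    return (d * b + norm_bits) / d


table = np.arange(20, dtype=np.float32).reshape(10, 2)   # 10 toy 2-D entries
grouped, pad = group_entries(table, 4)
print(grouped.shape, pad)                                # (3, 8) 2
print(bits_per_value(2, 2), bits_per_value(64, 2))       # 18.0 2.5
```

At b = 2, storing each 2-D entry with its own fp32 norm costs 18 bits per value, while groups of 32 entries (d = 64) cost 2.5 bits per value.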

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which help clarify the presentation of our theoretical contributions. We address each major comment point by point below and outline the revisions we will incorporate.

Referee: [§3.2] The bounds are stated to connect the post-rotation quantization MSE (itself within a factor of 2.7 of the information-theoretic lower bound) to per-scene rendering PSNR loss, but the derivation assumes a direct, linear translation of parameter error into output fidelity. It does not explicitly address non-linear propagation through the rendering pipeline (alpha blending, view-dependent spherical-harmonic evaluation, or Gaussian density interactions), which the skeptic note identifies as a potential source of scene-dependent variance exceeding the claimed bound.

Authors: We acknowledge that the norm-separation bounds rely on a first-order analysis of error propagation from parameter space to output. While the bounds are derived as conservative upper limits on MSE, we agree that explicit treatment of non-linear effects in the rendering pipeline (alpha compositing, SH evaluation, and density interactions) would strengthen the claims. In the revised manuscript we will add a subsection to §3.2 that (i) states the linear approximation explicitly, (ii) discusses why higher-order terms remain bounded for the small quantization errors considered, and (iii) reports additional per-scene experiments confirming that observed PSNR deviations stay within the predicted bounds even under full non-linear rendering. Revision: yes.
Referee: [§3.1] The criterion is presented as predicting, before any experiment, which parameter vectors can be quantized at which bit widths. However, the manuscript summarizes rather than fully derives the dimension range into which the 45- and 1024-dimensional vectors are said to fall, as well as the Beta-distribution property under random rotation; without the intermediate steps, it is unclear whether the criterion follows from first principles or was calibrated to the target models.

Authors: The dimension-dependent criterion follows directly from the concentration properties of the Beta distribution that arises after a random orthogonal transformation: each squared coordinate of a rotated unit vector follows Beta(1/2, (d-1)/2), with mean 1/d. The criterion is obtained by solving for the dimension range in which the distribution's variance and tail decay permit Lloyd-Max quantization to reach within 2.7× of the rate-distortion bound for the target bit widths; 45 and 1024 are the native dimensions of the 3DGS spherical-harmonic vectors and the DUSt3R key-value vectors, which are then checked against that range. In the revision we will move the full derivation (including the intermediate variance calculations and the closed-form condition on d) from the current summary into the main text of §3.1, making the first-principles origin explicit and removing any appearance of post-hoc calibration. Revision: yes.
Referee: [§4] The reported PSNR and fidelity numbers are given as single scalars (0.02 dB, 39.7 dB). If the norm-separation analysis is to support the “near-optimal” and “guaranteed” claims, the manuscript should report per-scene variance and worst-case deviation from the predicted bound rather than aggregate figures.

Authors: We agree that aggregate scalars alone are insufficient to substantiate the per-scene guarantees. In the revised §4 we will replace the single reported values with tables that list, for every scene in the NeRF Synthetic and DUSt3R evaluation sets: (i) the measured PSNR/pointmap fidelity, (ii) the quantization MSE, (iii) the predicted bound from the norm-separation analysis, and (iv) the ratio of observed to predicted error. We will also report the mean, standard deviation, and maximum ratio across scenes to demonstrate that the worst-case deviation remains within the claimed factor of 2.7. Revision: yes.

Circularity Check

0 steps flagged

Minor self-citation on quantization theory; central derivation uses independent statistical properties and closed-form bounds

The paper's core argument rests on the mathematical fact that random orthogonal transformations of unit-normalized vectors (here of dimension 45 and 1024) produce coordinates whose marginals follow a rescaled Beta distribution, allowing precomputed Lloyd-Max quantizers to be applied without per-scene data. It then derives a dimension criterion, norm-separation inequalities linking quantization MSE to PSNR, and a pruning-quantization pipeline with closed-form ratios. None of these steps reduces reported performance numbers to quantities fitted on the same evaluation scenes, nor do they rely on self-referential definitions or load-bearing self-citations that presuppose the target result. Any references to prior quantization literature are standard and externally verifiable. The derivation is therefore self-contained and can be checked against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the statistical property that random rotation maps the cited high-dimensional vectors to Beta coordinates and on the optimality of Lloyd-Max for that distribution; both are standard results applied here rather than newly postulated.

axioms (1)

domain assumption: High-dimensional parameter vectors in the 45- and 1024-dimensional regimes of 3DGS and DUSt3R can be mapped by a single random rotation to coordinates following a known Beta distribution.
This property is invoked to justify the use of a precomputed, data-independent Lloyd-Max quantizer.

pith-pipeline@v0.9.0 · 5577 in / 1550 out tokens · 83343 ms · 2026-05-10T19:55:26.736560+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "a single random rotation transforms any input into coordinates with a known Beta distribution... precomputed, data-independent Lloyd-Max quantization near-optimal, within a factor of 2.7 of the information-theoretic lower bound"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear
Paper passage: "norm-separation bounds connecting quantization MSE to rendering PSNR"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages · 2 internal anchors

[1] Quantized visual geometry grounded transformer
Weilun Feng et al. QuantVGGT: Quantized visual geometry grounded transformer. arXiv preprint arXiv:2509.21302.

[2] PolarQuant: Quantizing KV caches with polar transformation
Insu Han, Praneeth Kacham, Amin Karbasi, Vahab Mirrokni, and Amir Zandieh. PolarQuant: Quantizing KV caches with polar transformation. arXiv preprint arXiv:2502.02617.

[3] XStreamVGGT: Extremely memory-efficient streaming vision geometry grounded transformer with KV cache compression
Zunhai Su, Weihao Ye, Hansen Feng, Keyu Fan, Jing Zhang, Dahai Yu, Zhengwu Liu, and Ngai Wong. XStreamVGGT: Extremely memory-efficient streaming vision geometry grounded transformer with KV cache compression. arXiv preprint arXiv:2601.01204, 2026.

[4] DUSt3R: Geometric 3D vision made easy
Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. DUSt3R: Geometric 3D vision made easy. In CVPR, 2024a. Yufei Wang, Zhihao Li, Lanqing Guo, Wenhan Yang, Alex C Kot, and Bihan Wen. ContextGS: Compact 3D Gaussian splatting with anchor level context model. In NeurIPS, 2024b. Hao Xu, Xiaolin Wu, and Xi Zhang. Improving 3d gauss...

[5] TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni. TurboQuant: Online vector quantization with near-optimal distortion rate. arXiv preprint arXiv:2504.19874, 2025a. Amir Zandieh, Majid Daliri, and Insu Han. QJL: 1-bit quantized JL transform for KV cache quantization with zero overhead. In AAAI, 2025b. K Zhang, Y Chen, Z Liu, J Yang, and W Liu. Ha...
