DISK: Differentiable Sparse Kernel Complex for Efficient Spatially-Variant Convolution

Yuchi Huo; Zhe Cao; Zhizhen Wu

arxiv: 2512.04556 · v3 · pith:JEVT6QUWnew · submitted 2025-12-04 · 💻 cs.GR · cs.CV

Pith reviewed 2026-05-21 18:46 UTC · model grok-4.3

keywords: sparse kernel decomposition · differentiable optimization · spatially-variant convolution · non-convex kernels · efficient image filtering · mobile imaging · real-time rendering · kernel interpolation

The pith

A differentiable decomposition represents dense complex kernels as sparse samples for efficient spatially-variant convolution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework to represent target dense complex spatially-variant kernels using a smaller set of sparse kernel samples. It enables differentiable optimization of the sparse samples, includes an initialization strategy to handle non-convex shapes, and uses kernel-space interpolation to support spatial variation without retraining or extra runtime cost. A sympathetic reader would care because direct dense convolution is too slow for mobile devices and real-time applications, while existing approximations either lose accuracy on non-convex kernels or remain expensive. Experiments indicate the approach yields higher fidelity than simulated annealing and lower cost than low-rank decompositions for both Gaussian and non-convex cases.

Core claim

The central claim is that any target spatially-variant dense complex kernel can be represented by a set of sparse kernel samples through a differentiable decomposition, supported by a dedicated initialization strategy for non-convex shapes and a kernel-space interpolation scheme that extends single-kernel filtering to spatially varying filtering without retraining or additional runtime overhead.
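
To make the decomposition concrete, here is a minimal sketch of the forward pass that the Figure 1 caption describes: a cascade of sparse layers is applied to an impulse to synthesize a kernel, which is then scored against a target. The tap representation, the integer offsets, the MAE loss, and the function names `apply_layer`, `synthesize`, and `mae` are all illustrative assumptions, not the authors' implementation, and the gradient-based update of the parameters is not shown.

```python
from collections import defaultdict

# A sparse layer is a list of (dy, dx, weight) taps.
# Integer offsets are an assumption made to keep the sketch short.

def apply_layer(kernel, layer):
    """Convolve a sparse kernel (dict: (y, x) -> value) with one sparse layer."""
    out = defaultdict(float)
    for (y, x), value in kernel.items():
        for dy, dx, weight in layer:
            out[(y + dy, x + dx)] += value * weight
    return dict(out)

def synthesize(layers):
    """Push a unit impulse through every layer to get the synthesized kernel."""
    kernel = {(0, 0): 1.0}  # the impulse
    for layer in layers:
        kernel = apply_layer(kernel, layer)
    return kernel

def mae(k_syn, k_tgt):
    """Mean absolute error over the union of both supports."""
    support = set(k_syn) | set(k_tgt)
    return sum(abs(k_syn.get(p, 0.0) - k_tgt.get(p, 0.0)) for p in support) / len(support)

# Two 2-tap layers reproduce the 3-tap binomial blur [0.25, 0.5, 0.25].
layers = [
    [(0, 0, 0.5), (0, 1, 0.5)],
    [(0, 0, 0.5), (0, -1, 0.5)],
]
k_syn = synthesize(layers)
k_tgt = {(0, -1): 0.25, (0, 0): 0.5, (0, 1): 0.25}
loss = mae(k_syn, k_tgt)
```

At this toy size the cascade saves nothing. The appeal is in the scaling: L layers of S taps each cost L·S reads per pixel, while their composition can cover up to S^L distinct positions.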

What carries the argument

The set of sparse kernel samples under differentiable optimization, combined with non-convex initialization and kernel-space interpolation.
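
The summary does not spell out how kernel-space interpolation works, so the following is only one plausible reading, sketched in 1D: the parameters of two already-optimized sparse kernels are blended per output sample according to a spatial map, and no optimization runs at filter time. The tap layout, the linear blend, the clamped sampling, and the names `lerp_taps`, `sample_linear`, and `filter_spatially_varying` are hypothetical.

```python
def lerp_taps(taps_a, taps_b, t):
    """Linearly blend two equal-length tap lists of (offset, weight) pairs."""
    return [
        ((1 - t) * oa + t * ob, (1 - t) * wa + t * wb)
        for (oa, wa), (ob, wb) in zip(taps_a, taps_b)
    ]

def sample_linear(signal, pos):
    """Read a 1D signal at a fractional position, clamping at the borders."""
    pos = min(max(pos, 0.0), len(signal) - 1.0)
    i0 = int(pos)
    i1 = min(i0 + 1, len(signal) - 1)
    frac = pos - i0
    return (1 - frac) * signal[i0] + frac * signal[i1]

def filter_spatially_varying(signal, taps_a, taps_b, t_map):
    """Each output sample uses the two anchor kernels blended by t_map[i]."""
    out = []
    for i, t in enumerate(t_map):
        taps = lerp_taps(taps_a, taps_b, t)
        out.append(sum(w * sample_linear(signal, i + o) for o, w in taps))
    return out

# Hypothetical anchors: an identity kernel (all taps collapsed onto offset 0)
# and a 3-tap box blur. Both use the same number of taps.
sharp = [(0.0, 1 / 3), (0.0, 1 / 3), (0.0, 1 / 3)]
blur = [(-1.0, 1 / 3), (0.0, 1 / 3), (1.0, 1 / 3)]
signal = [0.0, 0.0, 3.0, 0.0, 0.0]
t_map = [0.0, 0.0, 0.0, 1.0, 1.0]  # sharp on the left, blurred on the right
result = filter_spatially_varying(signal, sharp, blur, t_map)
```

Under this reading the per-sample cost stays at the tap count of a single sparse kernel, which is consistent with the claim of no extra runtime overhead. Whether blending parameters stays faithful between distant anchors is exactly the open question raised under the load-bearing premise.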

If this is right

Higher fidelity than simulated annealing on Gaussian and non-convex kernels.
Significantly lower computational cost than low-rank decompositions.
Enables practical high-quality convolution on resource-limited devices for mobile imaging.
Supports real-time rendering with complex spatially-variant effects.
Remains fully differentiable for direct use inside larger learning pipelines.
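
A rough tap-count comparison shows where the cost claim could come from. The sparse configurations (8 layers × 6 samples and 12 layers × 4 samples) are the ones named in the Figure 2 caption; the 63 × 63 dense kernel and the rank of 4 are hypothetical values chosen for illustration, and counting taps ignores memory traffic and intermediate buffers.

```python
def dense_taps(k):
    """Reads per pixel for a direct k x k convolution."""
    return k * k

def low_rank_taps(k, rank):
    """Reads per pixel for a rank-r sum of separable filters:
    each term needs one horizontal and one vertical pass of k taps."""
    return rank * 2 * k

def sparse_taps(layers, samples):
    """Reads per pixel for a cascade of sparse layers."""
    return layers * samples

k = 63  # hypothetical dense kernel width
costs = {
    "dense 63x63": dense_taps(k),         # 3969
    "low-rank r=4": low_rank_taps(k, 4),  # 504
    "sparse 8x6": sparse_taps(8, 6),      # 48
    "sparse 12x4": sparse_taps(12, 4),    # 48
}
```

These figures are arithmetic only. Measured latency, as plotted in Figure 3, depends on hardware and implementation and is not reproduced here.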

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could extend to video or dynamic scenes where kernels vary over time as well as space.
It may allow end-to-end training of the sparse sample positions within neural rendering systems.
Similar sparse decompositions might apply to other dense linear operators in graphics or scientific computing.
Testing on kernels with extreme discontinuities could reveal the practical limits of the interpolation scheme.

Load-bearing premise

A fixed set of sparse kernel samples with the proposed initialization and kernel-space interpolation can faithfully represent arbitrary non-convex dense complex kernels without substantial approximation error.

What would settle it

An experiment applying the method to a highly irregular non-convex kernel and measuring whether the achieved fidelity falls below that of low-rank decompositions or the optimization converges to poor local minima.
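
Such a test needs a non-convex target and a fidelity score. The sketch below builds a normalized ring kernel (the Figure 7 caption names a Ring kernel, but the grid size and radii here are hypothetical) and scores a candidate by mean absolute error. The filled disk stands in for a convex approximation that fails to reproduce the hole.

```python
import math

def ring_kernel(size, r_inner, r_outer):
    """Normalized ring on a size x size grid (size odd); the support is non-convex."""
    c = size // 2
    k = [
        [1.0 if r_inner <= math.hypot(y - c, x - c) <= r_outer else 0.0
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def mae(a, b):
    """Mean absolute error between two equally sized 2D kernels."""
    n = len(a) * len(a[0])
    return sum(abs(va - vb) for ra, rb in zip(a, b) for va, vb in zip(ra, rb)) / n

target = ring_kernel(9, 2.5, 4.0)
disk = ring_kernel(9, 0.0, 4.0)  # convex stand-in that fills the hole
error = mae(target, disk)
```

The same `mae` call could score any candidate, for example a sparse cascade and a low-rank approximation of the same target, which is the comparison the proposed experiment asks for.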

Figures

Figures reproduced from arXiv: 2512.04556 by Yuchi Huo, Zhe Cao, Zhizhen Wu.

**Figure 1.** An overview of our method. We represent a dense filter as a Sparse Kernel Complex, a sequence of sparse layers whose parameters Θ are learned via Differentiable Optimization. We apply our filter F_Θ to an impulse δ to yield a synthesized kernel K_syn, and minimize a loss L against the target K_tgt to learn arbitrary shapes. These optimized kernels serve as a basis for high-performance Spatially Varying Filter…

**Figure 2.** Comparison of Gaussian kernel approximation with varying σ. We compare our method against PST using two sparse configurations (8 layers × 6 samples and 12 layers × 4 samples). LPIPS scores appear in the top-right corner (lower is better).

**Figure 3.** Speed, accuracy, and samples comparison. The figure plots quality against latency (lower is better for both). The size of each bubble represents the total sample count. Baselines: for both single kernel and spatially varying filtering, the comparison includes a low-rank decomposition (LowRank) (McGraw, 2015) and the optimization-based method of Parallel Tempering (PST)…

**Figure 4.** Comparison of single-kernel approximation. Compared to baselines, SVD-based decomposition (LowR.) and Parallel Simulated Tempering (PST), our approach (blue) better preserves sharp features on non-convex targets, resulting in lower LPIPS scores (lower is better).

**Figure 5.** Visual comparison of diverse spatially varying (SV) effects. We evaluate three SV configurations: 1D tilt-shift blur (top), 2D rotational blur (middle), and 2D radial motion blur (bottom). We compare our method against Parallel Simulated Tempering (PST) and Low-Rank Decomposition (LowRank). Our method achieves results that are nearly indistinguishable from the ground truth. As shown in the red and green …

**Figure 6.** Ablation of initialization strategies on the Flower and Dove kernel. We evaluate both our method and Parallel Simulated Annealing (PST) combined with three initialization schemes: Random (Rand), Increasing Radial (IR), and Sparse Sampling (SS).
[Plot residue from the source page: x-axis "Iteration Step" (0 to 3000); y-axis "MAE" (scale label extracted as "x 10 5"); legend entries Ours 12x4, 12x6, 12x8, 24x4, 24x6, 24x8, 32x4, 32x6, …]

**Figure 7.** Ablation results for various configurations of samples and layers on Ring kernel. Configurations with more samples and layers tend to achieve higher quality. Compared with PST, our method delivers more consistent behavior and better quality across all tested configurations. For additional results, please refer to the Appendix, which includes ablations on Gaussian kernels with fewer samples and…

Abstract

Image convolution with complex kernels is a fundamental operation in photography, scientific imaging, and animation effects, yet direct dense convolution is computationally prohibitive on resource-limited devices. Existing approximations, such as simulated annealing or low-rank decompositions, either lack efficiency or fail to capture non-convex kernels. We introduce a differentiable kernel decomposition framework that represents a target spatially-variant, dense, complex kernel using a set of sparse kernel samples. Our approach features (i) a decomposition that enables differentiable optimization of sparse kernels, (ii) a dedicated initialization strategy for non-convex shapes to avoid poor local minima, and (iii) a kernel-space interpolation scheme that extends single-kernel filtering to spatially varying filtering without retraining and additional runtime overhead. Experiments on Gaussian and non-convex kernels show that our method achieves higher fidelity than simulated annealing and significantly lower cost than low-rank decompositions. Our approach provides a practical solution for mobile imaging and real-time rendering, while remaining fully differentiable for integration into broader learning pipelines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

DISK gives a differentiable sparse decomposition plus interpolation for spatially-variant complex kernels that looks practically useful for efficiency but rests on an unproven assumption about faithful approximation of arbitrary non-convex cases.

The main takeaway is that this paper introduces a differentiable way to break down dense complex kernels into sparse samples for faster spatially-variant convolution, with added pieces for non-convex initialization and kernel-space interpolation to handle variation without retraining. That combination is the actual novelty compared to simulated annealing or low-rank baselines mentioned in the abstract. It targets a clear pain point in mobile imaging and real-time rendering where full dense convolution is too slow on limited hardware, and keeping the whole thing differentiable is a solid point for integration into learning pipelines. The reported experiments on Gaussian and non-convex kernels claim higher fidelity than annealing and much lower cost than low-rank methods, which would be a useful practical result if the numbers hold up in the full paper. The approach seems cleanly motivated and the interpolation trick for extending single-kernel filtering to spatial cases without extra overhead is a reasonable engineering move.
On the soft spots, the abstract gives no concrete metrics, error bars, sample counts, dataset details, or ablation results, so the central claim stays plausible but thin. The weakest part is the assumption that a fixed set of sparse samples plus the proposed initialization and interpolation can represent arbitrary non-convex, dense, spatially-variant kernels without substantial error, especially for sharp spatial changes or highly irregular shapes. If the sample budget is limited or the interpolation introduces artifacts, the fidelity edge could disappear and leave just another approximation whose cost savings are unclear. The stress-test concern lands here because the abstract offers no worst-case bounds or analysis for those scenarios.
This is the kind of work that would interest graphics and vision people building efficient differentiable filters for rendering or imaging pipelines. A reader looking for new approximation techniques rather than theoretical guarantees would get the most out of the method description and baseline comparisons. It has enough of a fresh angle and practical framing to deserve serious referee time rather than a desk reject, even if revisions will likely be needed to fill in the quantitative gaps and test the approximation limits more rigorously.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces DISK, a differentiable sparse kernel complex framework for efficient spatially-variant convolution. It represents target dense, complex, spatially-variant kernels via a fixed set of sparse kernel samples using (i) a differentiable decomposition for optimization, (ii) a dedicated initialization strategy to handle non-convex shapes, and (iii) a kernel-space interpolation scheme that enables spatial variation without retraining or extra runtime cost. Experiments on Gaussian and non-convex kernels are reported to demonstrate higher fidelity than simulated annealing and substantially lower computational cost than low-rank decompositions, with applications to mobile imaging and real-time rendering.

Significance. If the quantitative claims hold under detailed scrutiny, the approach could supply a practical, fully differentiable approximation technique that balances fidelity and efficiency for complex kernels, enabling broader use in resource-constrained graphics and imaging pipelines while supporting end-to-end learning.

major comments (2)

[Experiments section] Experiments section: the central claim of higher fidelity than simulated annealing for non-convex kernels rests on comparative results, yet the provided abstract and summary contain no quantitative metrics, error bars, dataset specifications, sample counts, or ablation studies; this absence directly affects verifiability of the fidelity advantage and must be addressed with concrete numbers and controls.
[§3.2 and §3.3] §3.2 (Decomposition and initialization) and §3.3 (Interpolation): the assumption that a fixed set of sparse samples plus the proposed non-convex initialization and kernel-space interpolation can faithfully approximate arbitrary non-convex, rapidly spatially-varying kernels without substantial error is load-bearing; without explicit approximation-error bounds, worst-case analysis for sharp spatial changes, or sensitivity to sample count, the method risks reducing to an uncharacterized approximation whose cost benefit is unclear.

minor comments (2)

[Abstract] Abstract: the phrase 'significantly lower cost' should be accompanied by the precise cost metric (FLOPs, runtime, memory) used in the comparison to low-rank decompositions.
[Notation] Notation: ensure consistent use of symbols for the sparse sample count and the interpolation weights across equations and text to avoid ambiguity.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the thoughtful and constructive review. We address each major comment point by point below. Where the comments identify areas for improved clarity or additional supporting material, we have revised the manuscript accordingly.

Referee: [Experiments section] Experiments section: the central claim of higher fidelity than simulated annealing for non-convex kernels rests on comparative results, yet the provided abstract and summary contain no quantitative metrics, error bars, dataset specifications, sample counts, or ablation studies; this absence directly affects verifiability of the fidelity advantage and must be addressed with concrete numbers and controls.

Authors: We agree that the abstract does not contain specific numerical results. The full manuscript's experiments section reports comparative fidelity results for both Gaussian and non-convex kernels, including dataset details and sample counts. To directly address the concern, we have added a summary table of key quantitative metrics (with error bars from repeated trials) to the experiments section and included a concise statement of the main fidelity improvement in the revised abstract. revision: yes
Referee: [§3.2 and §3.3] §3.2 (Decomposition and initialization) and §3.3 (Interpolation): the assumption that a fixed set of sparse samples plus the proposed non-convex initialization and kernel-space interpolation can faithfully approximate arbitrary non-convex, rapidly spatially-varying kernels without substantial error is load-bearing; without explicit approximation-error bounds, worst-case analysis for sharp spatial changes, or sensitivity to sample count, the method risks reducing to an uncharacterized approximation whose cost benefit is unclear.

Authors: We acknowledge that the manuscript does not supply formal approximation-error bounds or a complete worst-case analysis. We have added a sensitivity study with respect to sample count in the revised experiments section and a new paragraph discussing behavior under rapid spatial variation. However, deriving rigorous bounds for arbitrary non-convex kernels lies outside the current empirical scope; we therefore treat this as a limitation rather than a claim of universal guarantees. revision: partial

standing simulated objections not resolved

Deriving explicit approximation-error bounds and a full worst-case analysis for arbitrary rapidly varying non-convex kernels would require substantial new theoretical work beyond the empirical focus and scope of the present manuscript.

Circularity Check

0 steps flagged

No significant circularity in derivation or claims

The paper proposes a new differentiable sparse kernel decomposition with dedicated initialization and kernel-space interpolation components. These algorithmic elements are presented as novel and are evaluated directly against external baselines (simulated annealing, low-rank decompositions) on Gaussian and non-convex kernels. No equations or claims reduce by construction to fitted inputs, self-citations, or renamed prior results; the fidelity and cost advantages are reported as empirical outcomes from independent experiments rather than tautological redefinitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that sparse samples suffice for non-convex kernel representation; no free parameters or invented entities are explicitly quantified in the abstract, and no independent evidence for the assumption is supplied.

axioms (1)

domain assumption: Sparse kernel samples with dedicated initialization and interpolation can represent arbitrary non-convex dense complex kernels with high fidelity.
This premise underpins the entire decomposition and extension to spatially-variant filtering.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We introduce a differentiable kernel decomposition framework that represents a target spatially-variant, dense, complex kernel using a set of sparse kernel samples... a dedicated initialization strategy for non-convex shapes... kernel-space interpolation scheme"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "Our method achieves higher fidelity than simulated annealing and significantly lower cost than low-rank decompositions"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages · 1 internal anchor

[1] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

[2] Masaki Kawase. Frame buffer postprocessing effects in DOUBLE-S.T.E.A.L (Wreckless). In Game Developers Conference 2003, 3.

[3] Yunsheng Li, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Ye Yu, Lu Yuan, Zicheng Liu, Mei Chen, and Nuno Vasconcelos. Revisiting dynamic convolution via matrix decomposition. arXiv preprint arXiv:2103.08756.

[4] Sam Martin, Andrew Garrard, Andrew Gruber, Marius Bjorge, Renaldas Zioma, Simon Benge, and Niklas Nummelin. Moving mobile graphics. In ACM SIGGRAPH 2015 Courses, SIGGRAPH '15, New York, NY, USA. Association for Computing Machinery. ISBN 9781450336345. doi: 10.1145/2776880.2787664. URL https://doi.org/10.1145/2776880.2787664.

[5] Tim McGraw. Fast bokeh effects using low-rank linear filters. The Visual Computer, 31(5):601–611.
