pith. machine review for the scientific record.

cs.GR

Graphics

Covers all aspects of computer graphics. Roughly includes material in all of ACM Subject Class I.3, except that I.3.5 is likely to have Computational Geometry as the primary subject area.

cs.GR 2026-05-14

BlitzGS is a distributed 3D Gaussian Splatting system that shards Gaussians by index parity across GPUs.

BlitzGS: City-Scale Gaussian Splatting at Lightning Speed

BlitzGS delivers order-of-magnitude faster city-scale 3D Gaussian Splatting training via index-parity GPU sharding, scheduled importance scoring, and view-level culling.

abstract
We present BlitzGS, a distributed 3DGS framework that reduces active Gaussian workload for fast city-scale reconstruction. BlitzGS manages this workload at three coupled levels. At the system level, the framework shards Gaussians across GPUs by index parity rather than spatial blocks. This approach mitigates the cross-block visibility redundancy inherent in spatial partitioning. Furthermore, it distributes each rendering step through a single cross-GPU exchange that routes projected Gaussians to their tile owners. At the model level, scheduled importance-scoring passes shrink the global Gaussian population. During these passes, the framework generates a per-Gaussian visibility weight to bias density-control updates toward contributing primitives and a per-view importance mask for the view-level renderer. At the view level, BlitzGS trims each camera's active set with a distance-based LOD gate to exclude excessively fine primitives for the current frustum and the importance-based culling mask to skip Gaussians with negligible cross-view contribution. On large-scale benchmarks, BlitzGS matches the rendering quality of recent large-scale baselines while delivering an order-of-magnitude speedup, training city-scale scenes in tens of minutes. Our code is available at https://github.com/AkierRaee/BlitzGS.
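
As a concrete, purely illustrative reading of the sharding scheme, the sketch below assigns Gaussians to GPUs by index modulo the GPU count; the function name and sizes are hypothetical, not taken from the released code.

```python
# Hypothetical sketch of index-parity sharding (names and setup are mine, not
# the authors' code): each GPU owns the Gaussians whose index is congruent to
# its rank, so every shard is an interleaved slice of the whole scene rather
# than one spatial block with redundant cross-block visibility.
import numpy as np

def shard_by_index_parity(num_gaussians: int, num_gpus: int) -> list[np.ndarray]:
    """Return, for each GPU rank, the global indices of the Gaussians it owns."""
    indices = np.arange(num_gaussians)
    return [indices[indices % num_gpus == rank] for rank in range(num_gpus)]

shards = shard_by_index_parity(num_gaussians=1_000_000, num_gpus=4)
assert sum(len(s) for s in shards) == 1_000_000   # every Gaussian owned exactly once
```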
cs.GR 2026-05-14 Recognition

Coupled latents produce 3D assets with built-in skeletons and skinning

Rigel3D: Rig-aware Latents for Animation-Ready 3D Asset Generation

Joint surface-skeleton modeling yields rigged meshes that outperform post-hoc auto-rigging on animation metrics.

abstract
Recent 3D generative models can synthesize high-quality assets, but their outputs are typically static: they lack the skeletal rigs, joint hierarchies, and skinning weights required for animation. This limits their use in games, film, simulation, virtual agents, and embodied AI, where assets must not only look plausible but also move plausibly. We introduce Rigel3D, a generative method for animation-ready 3D assets represented as rigged meshes. Unlike post-hoc auto-rigging methods that attach rigs to completed shapes, our method jointly models geometry and rig structure through coupled surface and skeleton structured latent representations. A rig-aware autoencoder decodes these representations into mesh geometry, skeleton topology, joint coordinates, and skinning weights, while a two-stage latent generative model synthesizes both surface and skeleton representations for image-conditioned generation. To support downstream animation workflows, we further introduce an open-vocabulary joint labeling module that embeds generated joints into a shared vision-language space, enabling correspondence to arbitrary retargeting templates. Experiments on large-scale rigged asset datasets demonstrate that our method generates diverse, high-quality animation-ready assets and outperforms existing rigging baselines across multiple metrics.
cs.GR 2026-05-13 1 theorem

Streaming assembly matches full rebuilds exactly in tetrahedral simulations

STA-FEM: Exact Streaming Assembly for Preplanned Dynamic Tetrahedral Topology Edits

Preallocated superset meshes and edit streams turn per-frame assembly into incremental updates with 1.37x-1.61x end-to-end speedups and zero deviation from full rebuilds.

abstract
Dynamic tetrahedral simulation pipelines rebuild topology-dependent solver state after every fracture, refinement, or merge event - discarding structural continuity that survives each edit and spending global work on what are often local changes. We present STA-FEM, a streaming assembly method for simulations with topologically-dynamic tetrahedral meshes operating on a fixed superset mesh: when the candidate element pool is preallocated and the per-frame edit stream is exposed, the surrounding solver, preconditioner, and time-stepping layers stay unchanged while the per-frame assembly step is replaced with persistent incremental updates that match a full-rebuild approach exactly at every frame. Across various three-dimensional examples with up to 460k elements, the method delivers end-to-end speedups of 1.37x to 1.61x over full-rebuild with orders-of-magnitude reductions in matrix update cost, preserving exact matrix parity in all tested frames against a stronger exact local recomputation baseline. We test our algorithm in realistic fracture simulation pipelines and observe up to 76% speedups in fracture frame time with exact equivalence to a ground-truth full-rebuild algorithm. These results establish exact streaming assembly as a potentially practical approach for simulating tetrahedral meshes with dynamic topology.
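
To see why exactness is plausible, note that assembly is a sum of element matrices, so adding or removing an element is a signed update that commutes with the rest of the sum. A minimal SciPy sketch with hypothetical names, assuming a preallocated pattern (not the authors' GPU implementation):

```python
# Rough sketch of streaming assembly: with a preallocated superset mesh, a
# frame's topology edits become signed scatter-adds of element matrices into a
# persistent global matrix, instead of a full per-frame rebuild.
import numpy as np
import scipy.sparse as sp

def apply_edit_stream(K, edits, element_matrices, dof_map):
    """Apply ('add'|'remove', element_id) edits to the global matrix K in place."""
    for op, elem in edits:
        Ke = element_matrices[elem]                 # local element matrix
        sign = 1.0 if op == "add" else -1.0
        dofs = dof_map[elem]
        for i, gi in enumerate(dofs):
            for j, gj in enumerate(dofs):
                K[gi, gj] += sign * Ke[i, j]

K = sp.lil_matrix((6, 6))                           # persistent superset pattern
dof_map = {0: [0, 1, 2, 3], 1: [2, 3, 4, 5]}        # two stand-in "tets"
elems = {e: np.eye(4) for e in dof_map}
apply_edit_stream(K, [("add", 0), ("add", 1)], elems, dof_map)   # initial frame
apply_edit_stream(K, [("remove", 1)], elems, dof_map)            # edit: element 1 deleted
```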
cs.GR 2026-05-13 Recognition

Shift mapping makes ToF reservoir resampling interactive

ToF ReSTIR: Time-of-Flight Rendering with Spatio-temporal Reservoir Resampling

Enforcing path lengths on reused samples supports complex dynamic scenes at real-time rates for gated and transient capture.

abstract
We present a novel spatio-temporal reuse framework for time-resolved light transport, enabling efficient Monte Carlo rendering of time-of-flight (ToF) phenomena such as time-gated imaging and transient light capture. Existing ToF rendering methods are computationally expensive, scale poorly to complex dynamic scenes, and are therefore unsuitable for applications with strict latency constraints. To address this limitation, we draw inspiration from ReSTIR, a reuse-based technique for steady-state real-time rendering, and adapt its core principles to interactive-rate ToF simulation. However, naively applying existing ReSTIR methods to ToF rendering leads to severe inefficiency, as reused paths frequently violate optical path-length constraints and thus contribute little or no signal. We overcome this challenge by introducing a path reuse formulation that explicitly enforces physically valid optical path lengths. The key idea is path-length-aware shift mapping, a geometric transformation based on Newton's method that adjusts reused light paths to satisfy temporal gating constraints, inspired by specular manifold exploration in steady-state caustics rendering. The resulting framework substantially improves the efficiency of ToF rendering across a wide range of scenarios, including complex scenes with glossy or specular materials and dynamic motion. Our method supports both time-gated and transient rendering at interactive frame rates, enabling simulation under practical latency constraints. We demonstrate the effectiveness of our approach through two downstream applications, including shape reconstruction and navigation.
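
The gating constraint boils down to root finding: move a reused vertex until the optical path length equals the gate. A toy 1D analogue of such a shift, using Newton's method on assumed geometry (my illustration, not the paper's manifold-based formulation):

```python
# Slide a reconnection vertex x(t) along a direction d and iterate Newton's
# method until the two-segment path length |a - x| + |x - b| matches the
# target length imposed by the time gate.
import numpy as np

a, b = np.array([0.0, 0.0]), np.array([4.0, 0.0])
x0, d = np.array([2.0, 1.0]), np.array([1.0, 0.0])   # vertex origin + slide direction

def path_len(t):
    x = x0 + t * d
    return np.linalg.norm(x - a) + np.linalg.norm(x - b)

target, t, eps = 5.0, 1.0, 1e-6
for _ in range(20):                                  # Newton with a numeric derivative
    f = path_len(t) - target
    if abs(f) < 1e-10:
        break
    df = (path_len(t + eps) - path_len(t)) / eps
    t -= f / df
print(t, path_len(t))                                # path length now hits the 5.0 gate
```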
cs.GR 2026-05-13 Recognition

Low-resolution 3DGS yields high-res high-frame-rate video

3DGS³: Joint Super Sampling and Frame Interpolation for Real-Time Large-Scale 3DGS Rendering

Gradient-guided super-sampling and temporal interpolation recover detail from cheap base renders for real-time large scenes.

abstract
3D Gaussian Splatting (3DGS) enables high-quality real-time 3D rendering but faces challenges in efficiently scaling to ultra-dense scenes and high resolutions due to computational bottlenecks that limit its use in latency-sensitive applications. Instead of optimizing the splatting pipeline itself, we propose 3DGS³, a unified post-rendering framework that jointly performs super sampling and frame interpolation through differentiable processing of low-resolution outputs to achieve both high-resolution and high-frame-rate rendering. Our Gradient-Aware Super Sampling (GASS) module leverages the continuous differentiability of 3DGS to extract image gradients that guide a GRU-based refinement network to enable high-fidelity super sampling. Furthermore, a Lightweight Temporal Frame Interpolation (LTFI) module based on a compact U-Net-like backbone fuses temporal and differentiable spatial cues from consecutive frames to synthesize temporally coherent intermediate frames. Experiments on public datasets demonstrate that 3DGS³ achieves superior rendering efficiency and visual quality when compared with state-of-the-art methods and remains compatible with existing 3DGS acceleration techniques. The code will be publicly released upon acceptance.
cs.GR 2026-05-12 2 theorems

Inverted culling speeds dynamic LiDAR ray tracing

Geometrically Approximated Modeling for Emitter-Centric Ray-Triangle Filtering in Arbitrarily Dynamic LiDAR Simulation

Modeling emitters as traced cones lets each triangle reject most rays upfront, avoiding per-frame structure rebuilds for moving scenes.

abstract
Real-time Light Detection And Ranging (LiDAR) simulation must find, per emitted ray, the closest intersecting triangle even in dynamic scenes containing large numbers of moving and deformable objects. Dominant acceleration-structure approaches require rebuilding each frame for dynamic geometry -- a cost that compounds directly with scene dynamics and cannot be amortized regardless of how little actually changed. This paper presents the Gajmer Ray-Casting Algorithm (GRCA), which inverts the question: instead of asking "what does each ray hit?" it asks "which rays can each triangle possibly hit?" GRCA geometrically models spinning LiDAR emitters as rotation-traced cones or planes and uses each triangle's emitter-centric apparent area to cull, per triangle, which channels and the rays within those channels can possibly reach it -- without any acceleration structure. GRCA is compute-based and vendor-agnostic by design, targeting highly dynamic, high-resolution simultaneous multi-sensor simulation. At its core, GRCA is a general-purpose ray-casting algorithm: the emitter-centric inversion applies to any setting where rays originate from a known position, not only LiDAR. Benchmarks evaluate 2-8 simultaneous 128x4096-ray LiDARs (360°/180°) over complex dynamic scenes -- with just two sensors casting ~1M rays per frame. With range culling inactive, GRCA reaches speedups of up to 7.97x over hardware-accelerated OptiX (GPU) and 14.55x over Embree (CPU). Two independent extensions further boost performance even in the most complex scene (~22M triangles, ~9M of which are dynamic, 8 LiDARs): range culling at realistic deployment ranges (10-100m) reaches up to 7.02x GPU and 9.33x CPU; a hybrid pipeline -- GRCA for dynamic geometry, OptiX/Embree for static -- reaches up to 10.5x GPU and 19.2x CPU.
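
A simplified version of the emitter-centric test, under assumptions of mine (triangle bounding spheres, uniformly spaced azimuths) rather than GRCA's exact apparent-area formulation:

```python
# Bound a triangle by a sphere, measure its angular radius as seen from the
# emitter, and keep only the LiDAR channels (elevations) and azimuth steps
# whose rays can possibly reach it; everything else is culled up front.
import numpy as np

def candidate_rays(center, radius, emitter, channel_elevs, n_azimuth):
    v = center - emitter
    dist = np.linalg.norm(v)
    half_angle = np.arcsin(min(1.0, radius / dist))      # angular radius of the sphere
    azimuth = np.arctan2(v[1], v[0])
    elevation = np.arcsin(v[2] / dist)
    channels = np.where(np.abs(channel_elevs - elevation) <= half_angle)[0]
    step = 2 * np.pi / n_azimuth
    lo = int(np.floor((azimuth - half_angle) / step))
    hi = int(np.ceil((azimuth + half_angle) / step))
    return channels, [i % n_azimuth for i in range(lo, hi + 1)]

chs, azs = candidate_rays(center=np.array([10.0, 0.0, 0.5]), radius=0.5,
                          emitter=np.zeros(3),
                          channel_elevs=np.linspace(-0.3, 0.3, 128), n_azimuth=4096)
# Only len(chs) * len(azs) rays need exact ray-triangle tests, not all 128*4096.
```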
cs.GR 2026-05-11 Recognition

Anchored Gaussians track wireframe deformations from sparse views

FrameTwin: Curve-Anchored Gaussian Alignment from Sparse Views for Adaptive Wireframe 3D Printing

The curve-constrained representation creates a digital twin to adapt printing paths for observed bends in thin structures.

abstract
We present FrameTwin, a curve-anchored Gaussian alignment framework that uses sparse-view images to close the control loop for adaptive wireframe 3D printing. Our key idea is to capture the deformation of thin wireframe structures from sparse-view images using Gaussian kernels anchored to parametric curves, yielding a compact and geometry-aware encoding that explicitly captures strut topology. Driven by a differentiable rendering pipeline, FrameTwin estimates a neural deformation field that aligns the partially printed target model with the deformed structure observed during fabrication, where the optimized curve-Gaussian representation serves as a digital twin of the evolving wireframe. Unlike general Gaussian-splatting approaches, our formulation constrains kernel placement along parametric curves, substantially reducing the ambiguity inherent in sparse-view observations of thin structures. The resultant deformation-field alignment enforces global consistency across all struts. By using the estimated deformation field to blend the distorted printed geometry with the remaining unprinted geometry, FrameTwin enables adaptive updates to future printing trajectories. We demonstrate that FrameTwin can robustly capture and compensate for deformation in wireframe models fabricated using a robotized 3D printing system.
cs.GR 2026-05-11 2 theorems

Divergence-free kernels advect Gaussian splats to reconstruct fluid velocities

LagrangianSplats: Divergence-Free Transport of Gaussian Primitives for Fluid Reconstruction

Structural enforcement of incompressibility and coherence replaces soft penalties, improving accuracy from 2D video observations.

abstract
Reconstructing 3D fluid velocity fields from sparse 2D video observations is a highly ill-posed inverse problem, demanding both transport consistency with observed motion and physical validity under fluid laws. Existing methods typically impose these constraints through soft penalties, often leading to compromised accuracy and convergence issues. We introduce a reconstruction framework that structurally enforces both constraints. Specifically, we parameterize the reconstructed velocity using a continuous Divergence-Free Kernel representation, driving the advection of a Lagrangian 3D Gaussian Splatting representation. This formulation intrinsically guarantees both flow incompressibility and long-range transport coherence by construction. To enable the efficient optimization of such a constrained system, we introduce a novel Sliding Window scheme that propagates gradients over meaningful temporal horizons while maintaining tractable training costs. Experiments on synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art baselines in both transport consistency and physical accuracy, enabling applications such as high-quality re-simulation and flow analysis.
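
For context, one standard construction of a matrix-valued divergence-free kernel from a scalar kernel phi looks as follows; the paper's specific kernel may differ:

```latex
\mathbf{K}(\mathbf{x},\mathbf{y}) = \bigl(\nabla\nabla^{\top} - \Delta\,\mathbf{I}\bigr)\,
  \phi\bigl(\lVert\mathbf{x}-\mathbf{y}\rVert\bigr),
\qquad
\mathbf{u}(\mathbf{x}) = \sum_{j}\mathbf{K}(\mathbf{x},\mathbf{x}_{j})\,\mathbf{w}_{j}.
```

Each column of K is divergence-free because the divergence of (∇∇ᵀφ)w equals ∇(Δφ)·w, which exactly cancels the divergence of (Δφ I)w; so any velocity field assembled from such kernels is incompressible by construction rather than by penalty, matching the "structural enforcement" the abstract contrasts with soft constraints.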
cs.GR 2026-05-11 2 theorems

Color-adaptive scheme raises 3D Gaussian streaming quality 5-20 dB

CAGS: Color-Adaptive Volumetric Video Streaming with Dynamic 3D Gaussian Splatting

CAGS fixes compression color errors with low-res references and vector-quantized LoDs, running faster than prior scalable methods.

abstract
Volumetric video (VV) streaming enables real-time, immersive access to remote 3D environments, powering telepresence, ecological monitoring, and robotic teleoperation. These applications turn VV streaming into a real-time interface to remote physical environments, imposing new system-level demands for photorealistic scene representation, low-latency interaction, and robust performance under heterogeneous networks. 3D Gaussian Splatting (3DGS) has been widely used for real-time photorealistic rendering, offering superior visual quality and rendering performance, but it faces challenges due to bandwidth consumption. Furthermore, as the foundation of adaptive VV streaming, existing Levels of Detail (LoD) methods based on density are not well-suited to Gaussian representations, leading to visible gaps and severe quality degradation. Recent studies have also explored attribute compression techniques to reduce bandwidth consumption. Our preliminary studies reveal that aggressive attribute compression primarily causes color distortion, which can be effectively corrected in the rendered image using a reference image. Motivated by these findings, we propose a novel Color-Adaptive scheme for adaptive VV streaming that uses vector quantization (VQ) to establish LoDs and correct color distortions with low-resolution reference images. We further present CAGS, an adaptive VV streaming system compatible with diverse Gaussian representations, which integrates the Color-Adaptive scheme by rendering reference images on the streaming server and performing color restoration on the client. Extensive experiments on our prototype system demonstrate that CAGS outperforms the existing adaptive streaming systems in PSNR by 5-20 dB under fluctuating bandwidth, operates significantly faster than existing scalable Gaussian compression methods, and generalizes across different Gaussian representations.
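
A sketch of what VQ-derived LoDs could look like (illustrative only; CAGS's codec, attribute layout, and codebook sizes are assumptions here, not specified by the paper):

```python
# Quantize per-Gaussian color attributes with progressively larger codebooks,
# so a client on a poor link fetches a small codebook plus indices first and
# upgrades to finer LoDs as bandwidth allows.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
colors = rng.random((10_000, 3))                   # stand-in for per-Gaussian color

lods = {}
for k in (16, 64, 256):                            # one codebook per level of detail
    codebook, labels = kmeans2(colors, k, minit="++", seed=0)
    lods[k] = (codebook, labels.astype(np.uint16)) # ship codebook + per-Gaussian index

codebook16, idx16 = lods[16]
decoded = codebook16[idx16]                        # lossy reconstruction at coarse LoD
```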
cs.GR 2026-05-11 2 theorems

Strand sequences turn 3D hair generation into semantic authoring

HairGPT: Strand-as-Language Autoregressive Modeling for Realistic 3D Hairstyle Synthesis

Dual decoupling of scalp regions and strand hierarchies produces controllable, high-fidelity results across realistic and stylized styles.

abstract
Hair is a rich medium of visual and cultural expression, yet its digital modeling remains challenging due to the duality of fluidity and structure. Many existing generative approaches rely primarily on continuous diffusion fields, which entangle global topology with local texture and obscure the semantic and structural organization of hairstyles. To address this, we propose HairGPT, a strand-centric framework that treats strands as generative primitives and formulates realistic 3D hairstyle synthesis as a dual-decoupled autoregressive sequence modeling problem. Our method applies spatial decoupling across semantic scalp regions and structural decoupling along a hierarchical strand representation, progressing from global layout to fine-grained style. We further introduce a geometric tokenizer and region-aware semantic annotations to guide strand-level generation, enabling compositional editing, synthesis of rare and complex hairstyles, and adaptation to stylized domains. By aligning generative modeling with the workflow of digital grooming, HairGPT turns hair generation from opaque texture synthesis into a structured and semantically controllable authoring process, supporting robust semantic conditioning and high-fidelity results across realistic and stylized domains. Project Page: https://haiminluo.github.io/hairgpt/
cs.GR 2026-05-11 2 theorems

Targeted region regeneration fixes low-poly meshes locally

MeshFIM: Local Low-Poly Mesh Editing via Fill-in-the-Middle Autoregressive Generation

MeshFIM conditions autoregressive output on surrounding context to edit or repair only unsatisfactory areas without rebuilding the rest.

abstract
Autoregressive (AR) models can generate high-quality low-poly meshes from point clouds, but they still operate in an all-or-nothing manner: when a local region is unsatisfactory, the entire mesh must be regenerated, wasting computation and destroying satisfactory mesh structure elsewhere. We introduce MeshFIM, a Fill-in-the-Middle (FIM) framework that regenerates a target region of a low-poly mesh conditioned on the surrounding context. MeshFIM addresses three mesh-specific challenges: enforcing exact attachment along the exposed boundary, preserving topological order in the context, and suppressing overflow beyond the intended region. It does so with five complementary design choices: boundary vertex markers, context positional embeddings, expanded context width, context augmentation, and a low-poly geometry encoder whose gated subtraction mechanism focuses generation on the missing region by leveraging the difference between the reference surface and the existing mesh. Detailed ablation studies are presented to show the effectiveness of every introduced component. Based on MeshFIM, we demonstrate two applications: interactive brush-based editing and automatic defect repair on low-poly meshes (see Figure 1). Last but not least, experiments show that MeshFIM outperforms a range of baselines in mesh refinement, mesh repair, and whole-mesh generation with a stitch-back scheme.
cs.GR 2026-05-11 Recognition

Local bone mappings stabilize garment refitting on varied bodies

LoBoFit: Flexible Garment Refitting via Local Bone Mapping Blending

A linear blend of positions in skeleton-local frames creates a smoother, more stable space for adapting clothes while keeping wrinkles and design features.

abstract
Garment refitting, the task of adapting a garment from a source to a target avatar, must preserve the original design features and fine-scale wrinkles, a challenge exacerbated by significant shape variations and varying poses without registration to a shared canonical pose. Existing methods struggle to balance robustness, efficiency, and fidelity of detail: physics-based simulation is costly, data-driven approaches lack generalizability, and geometry optimization in the full vertex space is often ill-conditioned and prone to local minima with unsatisfactory quality. We identify that a fundamental limitation lies in the representation: deforming garments directly in global coordinates couples vertices non-locally, creating a complex and poorly-structured optimization landscape. Therefore, we introduce LoBoFit, a robust refitting method built upon a novel Local Bone Mapping Blending (LoBoMap Blending) representation. Instead of manipulating global vertex positions, LoBoMap Blending expresses garment geometry as a linear blend of its mappings into local bone coordinate frames. This representation is highly expressive and flexible: local bone mappings yield a pose-robust initialization and a well-conditioned parameterization, while blending weights smooth the optimization landscape and broaden the space of plausible solutions for stable convergence with fine-scale detail preservation. The subsequent refinement efficiently resolves collisions and preserves details by optimizing localized residuals, effectively decomposing the complex global deformation into manageable subproblems. Our experiments demonstrate that LoBoFit reliably refits high-resolution, single- and multi-layer garments across avatars with large shape and topological differences, while faithfully preserving intricate wrinkles and the intended fit style, outperforming state-of-the-art methods in robustness and output quality.
cs.GR 2026-05-11 1 theorem

Local 3D edits done inside the velocity sampler

Velocity-Space 3D Asset Editing

Three targeted interventions absorb leakage, amplify edits, and preserve identity without masks or retraining.

abstract
Editing a 3D asset locally, modifying a target region while preserving the rest, is a fundamental requirement of native 3D editing. Existing methods enforce locality through mechanisms external to the generator, such as manual 3D masks, post-hoc voxel merging, or 2D multi-view lifting. None of them intervene where the corruption actually originates: inside the ODE sampler. For a rectified-flow generator to achieve faithful local editing, its velocity field should be strong over the target editing region while vanishing on preserved content. Yet a single velocity field can hardly satisfy both requirements simultaneously, leading to three problems: (i) identity leakage that keeps the edit signal non-zero on preserved regions; (ii) no dedicated edit-amplification channel, so strengthening the edit inevitably perturbs identity; and (iii) an identity drag at the geometry and material stages, where a global condition pulls every token toward the target. We propose VS3D (Velocity-Space 3D Asset Editing), an inversion-free, training-free, and mask-free framework that addresses each problem with a targeted intervention inside the sampler. VS3D integrates three complementary modules, each corresponding to a specific stage of the editing pipeline. Reconstruction-Anchored Source Injection (RASI) absorbs identity leakage by turning the unconditional embedding into a per-step, asset-specific anchor calibrated through source reconstruction. Partial-Mean Guidance (PMG) amplifies the edit signal by contrasting high- and low-quality subsample estimates of the velocity difference, active only where a consistent edit exists. Twin-Agreement Residual injection (TAR) lets the sampler decide token by token what to preserve at the geometry and material stages.
cs.GR 2026-05-11 Recognition

Semantic codebook creates style-matched co-speech gestures

PersonaGest: Personalized Co-Speech Gesture Generation with Semantic-Guided Hierarchical Motion Representation

By organizing motion codes according to gesture semantics and applying reference prompts, the system produces motions faithful to both speech and personal style.

abstract
Co-speech gesture generation aims to synthesize realistic body movements that are semantically coherent with speech and faithful to a user-specified gestural style. Existing VQ-VAE based co-speech gesture generation methods improve generation quality but fail to encode semantic structure into the motion representation or explicitly disentangle content from style, limiting both semantic coherence and personalization fidelity. We present PersonaGest, a two-stage framework addressing both limitations. In the first stage, a semantic-guided RVQ-VAE disentangles motion content and gestural style within the residual quantization structure, where a Semantic-Aware Motion Codebook (SMoC) organizes the content codebook by gesture semantics and contrastive learning further enforces content-style separation. In the second stage, a Masked Generative Transformer generates content tokens via a semantic-aware re-masking strategy, followed by a cascade of Style Residual Transformers conditioned on a reference motion prompt for style control. Extensive experiments demonstrate state-of-the-art performance on objective metrics and perceptual user studies, with strong style consistency to the reference prompt. Our project page with demo videos is available at https://danny-nus.github.io/PersonaGest/
cs.GR 2026-05-08

Avatar and face choice changes gesture quality ratings

Reality Check: How Avatar and Face Representation Affect the Perceptual Evaluation of Synthesized Gestures

Tests with seven different virtual humans reveal that the same synthesized motions get rated differently based on appearance and facial representation.

abstract
The capacity to create realistic virtual humans has progressed significantly, and such characters can be found in many applications across entertainment, education and health. As an essential element of interactive virtual humans, speech-driven 3D gesture generation still depends heavily on perceptual evaluation, yet studies often vary avatar appearance and facial presentation when judging the generated motions. Prior work suggests these visual choices can bias motion judgments, but controlled evidence remains limited. We address this gap with controlled evaluations of co-speech gestures across motion sources, spanning seven representative avatar renderings used in contemporary research and application pipelines. Our results show that avatar and face presentation systematically shift perceptual judgments, and we provide recommendations for benchmarking gesture synthesis as well as for deploying virtual humans in human-facing applications.
cs.GR 2026-05-08 2 theorems

Kernel-derived opacity turns surface splatting differentiable for inverse rendering

3DSS: 3D Surface Splatting for Inverse Rendering

Opacity taken from accumulated reconstruction weights supplies clean edge gradients and anti-aliased silhouettes during optimization of 3D surface splats.

abstract
We present 3D Surface Splatting (3DSS), the first differentiable surface splatting renderer for physically-based inverse rendering from multi-view images. Our central insight is that the surface separation problem at the heart of surface splatting admits a direct formulation in terms of the reconstruction kernels themselves. From this foundation we derive a coverage-based compositing model whose per-layer opacity arises directly from the accumulated Elliptical Weighted Average reconstruction weight, yielding anti-aliased silhouettes and informative visibility gradients at sparsely covered edges. Combined with forward microfacet shading under co-optimized HDR environment lighting and density-aware adaptive refinement, 3DSS jointly recovers shape, spatially-varying BRDF materials, and illumination. Because the optimized representation is a set of oriented surface samples, it bridges natively to mesh-based workflows via surface reconstruction from oriented point cloud methods. We evaluate 3DSS against mesh-based, implicit, and Gaussian-splatting baselines across geometry reconstruction, novel-view synthesis, and novel-illumination relighting.
cs.GR 2026-05-07

Bayesian view selection cuts scans needed for task-specific 3D models

A Bayesian Approach for Task-Specific Next-Best-View Selection with Uncertain Geometry

It optimizes camera poses for semantic classification, segmentation or physics simulation rather than uniform uncertainty reduction.

abstract
We develop a framework for task-specific active next-best-view selection in 3D reconstruction from point clouds, by casting the problem in the language of Bayesian decision theory. Our framework works by (a) placing a prior distribution over the space of implicit surfaces, (b) using recently-developed stochastic surface reconstruction methods to calculate the resulting posterior distribution, then (c) using the posterior distribution to carefully reason about which view to scan next. This enables us to perform camera selection in a manner that is directly optimized for the intended use of the reconstructed data - meaning, we reduce uncertainty only in those regions that make a difference in the task at hand, as opposed to prior approaches that reduce it uniformly across space. We evaluate our method across three distinct downstream tasks: semantic classification, segmentation, and PDE-guided physics simulation. Experimental results demonstrate that our framework achieves superior task performance with fewer views compared to commonly used baselines and prior general uncertainty-reduction techniques.
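
A toy rendition of the selection rule, with synthetic stand-ins for every quantity (the paper's Bayesian posterior over stochastic implicit surfaces is far richer than this):

```python
# Score each candidate view by the task-weighted posterior variance it would
# remove, and greedily scan the best one; only regions that matter to the
# downstream task contribute to the score.
import numpy as np

rng = np.random.default_rng(1)
n_points, n_views = 500, 12
post_var = rng.random(n_points)                  # per-point posterior uncertainty
task_weight = rng.random(n_points)               # how much each point matters to the task
visible = rng.random((n_views, n_points)) < 0.3  # which points each view would observe

def view_score(v):
    return np.sum(task_weight * post_var * visible[v])

best = max(range(n_views), key=view_score)
post_var[visible[best]] *= 0.2                   # crude stand-in for the posterior update
print("next-best view:", best)
```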
cs.GR 2026-05-07

Algebraic coarsening delivers 3x speedup in GPU contact solves

AGIPC: Adaptive In-Solve Algebraic Coarsening for GPU IPC

In-solve vertex aggregation reduces system size without remeshing or loss of contact accuracy.

abstract
Implicit time integration is key to robustly simulating stiff materials and large deformations, but its performance is often dominated by repeatedly solving large linear systems. Adaptive coarsening can reduce this cost by concentrating degrees of freedom (DoF) where they are most needed, yet conventional explicit remeshing changes connectivity (and often vertex ordering), complicating parallel implementations, harming memory locality, and sometimes being disallowed because it may introduce local geometry intersections. Adaptive subspace approaches avoid topological changes, but basis construction and updates incur irregular data access patterns and typically produce dense system matrices, limiting GPU efficiency and keeping many practical systems CPU-centric. We present algebraic adaptive in-solve coarsening, a GPU-oriented method that dynamically reduces DoF within the Newton solve of implicit time integration without explicit topological modification. Starting from a fine mesh, we express adaptivity as a selective edge-collapse process governed by per-edge tags. Collapsible edges are aggregated in parallel using a warp-level hash mapping scheme that groups fine vertices into coarse super-nodes, while protected edges preserve local detail. This defines an implicit coarse mesh whose linear system is assembled algebraically by mapping and reducing fine-scale gradients and Hessians via efficient GPU reduction kernels. We solve the resulting coarse system with a preconditioned conjugate gradient (PCG) method and then prolongate the solution back to the fine mesh. Our approach integrates seamlessly with IPC's barrier energy and exploits GPU parallelism end-to-end. Across a range of challenging scenarios, we achieve up to 3x speedup over a state-of-the-art GPU IPC solver while producing visually indistinguishable results.
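
The algebraic core resembles a Galerkin restriction: aggregate fine vertices into super-nodes through a binary prolongation P and solve PᵀAP on the coarse level. A CPU-side sketch under my own simplifications (no warp-level hashing, no IPC barrier terms):

```python
# Aggregate fine DoFs into super-nodes, assemble the coarse system as P^T A P,
# solve it, then prolongate the coarse solution back to the fine mesh.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n_fine = 8
A = sp.diags([2.0] * n_fine) \
    + sp.diags([-0.5] * (n_fine - 1), 1) \
    + sp.diags([-0.5] * (n_fine - 1), -1)       # stand-in SPD fine-level matrix
b = np.ones(n_fine)

agg = np.array([0, 0, 1, 1, 2, 2, 3, 3])        # fine vertex -> super-node id
P = sp.csr_matrix((np.ones(n_fine), (np.arange(n_fine), agg)), shape=(n_fine, 4))

A_c = (P.T @ A @ P).tocsr()                     # algebraically assembled coarse operator
x_c, _ = spla.cg(A_c, P.T @ b)                  # coarse PCG-style solve
x = P @ x_c                                     # prolongate back to fine DoFs
```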
cs.GR 2026-05-07

CoherentRaster enables real-time light field rendering

CoherentRaster: Efficient 3D Gaussian Splatting for Light Field Displays

Cross-view attribute reuse and subpixel remapping cut redundant work in 3D Gaussian splatting for interlaced displays on standard hardware.

abstract
Light field displays (LFDs) require rendering an interlaced image that encodes many view-dependent observations. This multi-view requirement introduces substantial computational overhead, making real-time rendering difficult to achieve. While 3D Gaussian Splatting (3DGS) is efficient for single-view rendering on 2D displays, directly extending it to LFDs is computationally expensive. Moreover, prior accelerations either suffer from GPU inefficiency under spatially incoherent subpixel layouts or rely on computationally heavy multi-plane intermediates. In this paper, we propose CoherentRaster, a 3DGS-based light field rendering framework that performs subpixel-level rasterization. Our method employs Cross-view Coherent Attribute Reuse to eliminate redundant computation across neighboring viewpoints and applies View-coherent Remapping to restore warp-level memory efficiency degraded by the interlaced subpixel layout. Together, CoherentRaster provides an efficient pipeline for real-time, high-quality light field synthesis on consumer-grade hardware.
cs.GR 2026-05-06 3 theorems

Precomputed maps simulate real lens optics an order of magnitude faster

Precomputed Lens Transport Maps

A factorized model with wavelength inputs and a binary ray mask reproduces flares and aberrations without per-wavelength polynomials or full ray tracing.

abstract
Accurate real-time simulation of lens optics remains challenging due to the computational expense of full ray tracing and the limitations of existing approximations. The commonly used pinhole model and thin-lens model ignore many optical effects seen in real-world lens systems such as distortion and chromatic aberration. Prior polynomial models approximate a mapping between incident rays and exitant rays through a lens system per wavelength. Prior neural models improve the accuracy of this mapping and also capture wavelength-dependent variations (e.g., chromatic aberration) by integrating wavelength as an input to a unified neural network. Common to those prior models is that they omit Fresnel intensity throughput, precluding accurate simulation of internal reflections and lens flares. We introduce a precomputed lens model that combines wavelength-aware inputs with Fresnel intensity outputs. By classifying rays as valid or occluded via a binary mask in a factorized representation, our method focuses regression on unblocked rays, improving accuracy near discontinuities. Our model avoids per-wavelength approximations in polynomial models and explicitly predicts Fresnel coefficients to enable accurate lens simulation. Designed for static, rotationally symmetric systems under geometric optics, our model captures various lens effects such as chromatic aberration, coma, and lens flares. Our method achieves improved accuracy over polynomial baselines and is an order of magnitude faster than brute force ray tracing. Our method serves as a practical and scalable approach for simulating complex lens systems in applications requiring both accuracy and computational efficiency.
cs.GR 2026-05-06 3 theorems

Bidirectional loop boosts spatial intelligence in unified visual model

Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

Coupling understanding with spatial editing and novel views moves the model past general competence toward geometry-aware reasoning.

abstract
We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and instruction-guided image editing. JoyAI-Image couples a spatially enhanced Multimodal Large Language Model (MLLM) with a Multimodal Diffusion Transformer (MMDiT), allowing perception and generation to interact through a shared multimodal interface. Around this architecture, we build a scalable training recipe that combines unified instruction tuning, long-text rendering supervision, spatially grounded data, and both general and spatial editing signals. This design gives the model broad multimodal capability while strengthening geometry-aware reasoning and controllable visual synthesis. Experiments across understanding, generation, long-text rendering, and editing benchmarks show that JoyAI-Image achieves state-of-the-art or highly competitive performance. More importantly, the bidirectional loop between enhanced understanding, controllable spatial editing, and novel-view-assisted reasoning enables the model to move beyond general visual competence toward stronger spatial intelligence. These results suggest a promising path for unified visual models in downstream applications such as vision-language-action systems and world models.
cs.GR 2026-05-06

Delaunay scaffold samples implicit surfaces with 10x fewer evaluations

ADS: Random Sampling of Occupancy Functions using Adaptive Delaunay Scaffolding

It produces both random points on the surface and a connecting mesh for occupancy functions.

abstract
Dense random sampling and surfacing of shapes encoded via implicit occupancy functions (OFs) are critical elements of many applications. Existing methods largely provide one or the other: ray shooting approaches deliver random samples with no connectivity, and grid-based methods deliver mesh surfaces but their sampling is highly biased. We propose a new method which delivers both pseudo-random OF surface samples and an isosurface mesh connecting them. Our method achieves these goals while requiring an order of magnitude fewer function evaluations than prior approaches. Key to our Adaptive Delaunay Sampling (ADS) approach is a progressively computed Delaunay tetrahedralization of points in 3D space, which we use as a sampling and surfacing scaffold. Starting from an initial coarse Delaunay scaffold, we repeatedly refine crossing edges, ones whose end vertices lie on opposite sides of the surface, augmenting the scaffold with points closer and closer to the surface. Each refinement step uses the Delaunay criterion to incorporate the newly added vertices into the scaffold, introducing new crossing edges. We use the intersections of fine crossing edges with the OF surface as the output samples, and use the marching tetrahedra method to surface these samples. We subsequently use normal estimation to densify the sampling near fine features and in areas of high surface curvature. We validate ADS by sampling 150 inputs at different resolutions, and provide extensive comparisons to existing alternatives. Our experiments demonstrate significant improvement in accuracy/function evaluation count trade-off, and showcase downstream applications.
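
A compressed, rebuild-per-round illustration of the refinement loop (assumptions mine; the actual method updates the Delaunay scaffold progressively and adds normal-based densification):

```python
# Find "crossing" edges whose endpoints straddle the occupancy boundary and
# insert their midpoints, pulling scaffold vertices toward the surface.
import numpy as np
from itertools import combinations
from scipy.spatial import Delaunay

def occupancy(p):                              # stand-in OF: the unit sphere
    return np.linalg.norm(p, axis=-1) < 1.0

rng = np.random.default_rng(0)
pts = rng.uniform(-2.0, 2.0, (64, 3))
for _ in range(4):                             # refinement rounds
    tet = Delaunay(pts)
    inside = occupancy(pts)
    edges = {tuple(sorted(e)) for simplex in tet.simplices
             for e in combinations(simplex, 2)}
    crossing = [e for e in edges if inside[e[0]] != inside[e[1]]]
    if not crossing:
        break
    midpoints = np.array([(pts[i] + pts[j]) / 2.0 for i, j in crossing])
    pts = np.vstack([pts, midpoints])
near_surface = pts[64:]                        # later samples cluster near the isosurface
```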
cs.GR 2026-05-05

Zero sets of complex sections produce bijective surface maps

Implicit Minimal Surfaces for Bijective Correspondences

Minimizing Ginzburg-Landau energy on the product space yields low-distortion correspondences without barriers or mesh surgery.

abstract
We introduce an implicit representation of continuous, bijective, orientation-preserving maps between genus zero surfaces with or without boundary. The distortion of these maps can easily be minimized by optimizing the Ginzburg-Landau functional - a ubiquitous model in physics and differential geometry - leading to a simple algorithm for computing bijective correspondences using only standard tools of the tangent vector field toolbox. The method avoids combinatorial mesh modifications and does not require barrier functions to enforce bijectivity, making it more robust to noise and simpler to implement. Moreover, the algorithm does not assume a bijective initialization and can untangle non-bijective correspondences generated by computationally cheaper methods such as functional maps. It supports the use of both landmark points and landmark curves to guide the correspondence. The key idea is that a bijection between surfaces defines a two-dimensional mapping surface sitting inside the four-dimensional product space of the two inputs, and this mapping surface can be stored implicitly as the zero set of a complex section - essentially a complex function defined on the product space. Now the distortion of the map can be optimized by minimizing the area of this mapping surface, which amounts to minimizing the Ginzburg-Landau functional of the complex section. We demonstrate the practical benefits of our method by comparing to state-of-the-art correspondence algorithms and show that our implicit representation offers improved stability and naturally supports constraints that are difficult to enforce with explicit map representations.
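
The functional being minimized, in its standard complex-valued form (the paper's section-valued version on the 4D product space differs in detail):

```latex
E_{\varepsilon}(u) \;=\; \int_{M} \frac{1}{2}\,\lvert\nabla u\rvert^{2}
  \;+\; \frac{1}{4\varepsilon^{2}}\bigl(1 - \lvert u\rvert^{2}\bigr)^{2}\,\mathrm{d}A .
```

As ε → 0, minimizers are pushed toward |u| = 1 away from a thin zero set, so minimizing E_ε effectively minimizes the area of that zero set - here, the mapping surface.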
cs.GR 2026-05-05

Adaptive layer speeds animation in-betweening by 3.5 times

Adaptive Interpolation-Synthesis for Motion In-Betweening on Keyframe-Based Animation

It switches between interpolation and synthesis to match the spacing and style of real production keyframe data.

abstract
Motion in-betweening is one of the most artistically demanding and time consuming stages of 3D animation, where the expressivity and rhythm of motion are defined. The level of creative control it requires makes it a major production bottleneck, underscoring the need for intelligent tools that assist animators in this process. Although recent deep learning approaches have achieved strong results in motion synthesis and in-betweening, they assume data characteristics, motion styles, and problem formulations that diverge from professional animation workflows. To bridge this gap, we propose a method explicitly aligned with the constraints of motion in-betweening for keyframe-based animation in production environments. At its core, the Adaptive Interpolation-Synthesis (AIS) layer mirrors the animator's creative process by dynamically balancing learned interpolation and direct pose synthesis. In addition, a domain-based input keypose schedule reflects the distribution of production data, improving stylistic consistency and alignment between training and real-world usage. Our method achieves state-of-the-art performance on production data; when integrated into Autodesk Maya, it enables animators to complete in-betweening tasks with a 3.5x speedup.
cs.GR 2026-05-05

Surface correspondence tracking yields clean structural MATs

Structural MAT: Clean and Scalable Medial Axis Simplification via Explicit Surface Correspondence

By maintaining explicit links between MAT vertices and surface patches, simplification preserves symmetries and fillet alignments on complex shapes and CAD models.

abstract
The Medial Axis Transform (MAT) is a complete shape descriptor capable of reconstructing the geometry of the original domain. A high-quality MAT should not only facilitate high-fidelity reconstruction but also capture structural features -- for instance, by aligning the MAT boundary with the locus of rolling ball centers within fillet regions. However, computing such an ideal MAT remains a significant challenge, particularly when the input is a discrete triangle mesh. In this paper, we follow the established technical pipeline of initializing the MAT via a 3D Voronoi diagram of surface samples and subsequently simplifying the Voronoi structure through a QEM-like scheme. Our key insight is to explicitly track the correspondence between MAT vertices and surface regions throughout the progressive simplification process, ensuring that the resulting MAT triangles accurately reflect the intrinsic symmetries between surface patches. We translate these geometric requirements into a suite of priority control strategies that govern the sequencing of edge collapses. Through extensive evaluation against state-of-the-art MAT algorithms, we validate the strong performance of our approach regarding runtime efficiency, structural alignment, boundary regularity, triangle quality, and robustness to noise. Our resulting MATs remain highly expressive for both articulated shapes and CAD models, even under extreme simplification -- effectively capturing the global structure of complex geometries with only a few hundred vertices.
cs.GR 2026-05-05

Orbit-space flows match 3D shape SOTA with 5x fewer steps

Generative Modeling with Orbit-Space Particle Flow Matching

Canonicalizing particle permutations and tying terminal velocities to arc length lets models encode normals and reach high accuracy on ShapeNet.

abstract
We present Orbit-Space Geometric Probability Paths (OGPP), a particle-native flow-matching framework for generative modeling of particle systems. OGPP is motivated by two insights: (i) particles are defined up to permutation symmetries, so anonymous indexing inflates per-index target variance and yields curved, hard-to-learn flows; and (ii) particles live in physical space, so the flow terminal velocity has physical meaning and can encode geometric attributes, e.g., surface normals. OGPP instantiates three key components: (1) orbit-space canonicalization of the probability-path terminal endpoint, (2) particle index embeddings for role specialization, and (3) geometric probability paths with arc-length-aware terminal velocities that generate normals as a byproduct of the flow. We evaluate OGPP on minimal-surface benchmarks, where it reduces metric error by up to two orders of magnitude in a single inference step; on ShapeNet, where it matches the state of the art with 5x fewer steps and reaches airplane EMD comparable to DiT-3D with 26x fewer parameters and 5x fewer steps; and on single-shape encoding, where it produces normals and reconstructions competitive with 6D generators while operating entirely in 3D.
cs.GR 2026-05-04 3 theorems

Greedy algorithm keeps signed distance interpolations consistent

Greed for the Spheres: A Signed Distance Interpolation Method

Hard geometric constraints guarantee the augmented data matches a realizable surface, enabling refinement and repair of SDFs.

abstract
We propose a method to interpolate Signed Distance Function (SDF) data from a discrete set of samples. Unlike prior work, our approach ensures that the new SDF data values are fully consistent with the input and each other, such that the augmented data still corresponds to a geometrically realizable surface. We express the theoretical properties of SDFs as hard geometric constraints, and construct an efficient greedy algorithm for consistent SDF interpolation that is made even faster with powerful parallelized GPU preprocessing. We exemplify the usefulness of our method by evaluating it on three practical applications: global SDF refinement, in which the SDF data is upsampled without knowledge of the ground truth; mesh reconstruction, where our method can reconstruct highly detailed surfaces using global information from coarse input SDFs; and repair of pseudo-SDFs, which result from many pipelines such as CSG Boolean operations and must be turned into valid SDFs for downstream processing tasks. Our refined SDFs are guaranteed to be consistent with the input, where previous methods have no such guarantee.
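
One necessary consistency condition is easy to state: an SDF is 1-Lipschitz, so every new value must fall inside an interval dictated by each existing sample. A sketch of that feasibility test (the paper's constraint set is richer than this):

```python
# A value s_new at a new point x must lie in [s_i - |x - p_i|, s_i + |x - p_i|]
# for every existing sample (p_i, s_i); a greedy interpolator picks a value
# inside the intersection of these intervals.
import numpy as np

def feasible_interval(x, points, sdf_vals):
    d = np.linalg.norm(points - x, axis=1)
    lo, hi = np.max(sdf_vals - d), np.min(sdf_vals + d)
    return (lo, hi) if lo <= hi else None       # None => the input was inconsistent

points = np.array([[0.0, 0.0], [2.0, 0.0]])
vals = np.array([1.0, -0.5])
interval = feasible_interval(np.array([1.0, 0.0]), points, vals)
print(interval)    # any value in here keeps the augmented data 1-Lipschitz
```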
cs.GR 2026-05-04

The paper introduces the Antipodal Method

The Antipodal Method: Fast, Accurate, and Robust 3D Generalized Winding Numbers

The Antipodal Method decomposes generalized winding numbers into signed ray intersections plus a spherical boundary integral, delivering full precision at large speedups over prior exact methods.

abstract
Generalized winding numbers provide a robust measure of point insidedness for 3D surfaces - whether open, self-intersecting, or non-manifold - and are central to numerous geometry processing tasks. However, existing methods trade off between accuracy and computational efficiency, limiting their use in interactive and large-scale applications. We introduce a new formulation and algorithm for computing generalized winding numbers that is both fast and accurate to arbitrary precision, applicable to meshes and parametric surfaces. Our approach expresses the winding number as the sum of two intuitive geometric quantities: the signed number of ray-surface intersections and a boundary integral over the surface's projection onto the unit sphere. This insight leads to an efficient discretization that avoids expensive surface integrals and spherical arrangements. For meshes, our method achieves average speedups of 22x on a CPU compared to the fastest precise methods and 3x compared to the fastest approximation method, while maintaining full precision. On a GPU, for moderately complex meshes we reach a throughput of 10^9 queries per second, or 4K generalized winding number slices at 120 FPS (13x faster than a naive GPU method). For parametric surfaces, our method is on average 5.6x faster than the state-of-the-art method, with the same precision. Our method naturally handles complex topologies and non-manifold inputs. We extensively validate its accuracy, robustness, and time performance. Our code is available at https://github.com/MartensCedric/antipodal.
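
For reference, the classic direct evaluation that such methods accelerate sums signed solid angles over all triangles (Van Oosterom and Strackee's formula); this is the baseline, not the Antipodal Method itself:

```python
# The generalized winding number of a query point is the sum of the signed
# solid angles subtended by all triangles, divided by 4*pi.
import numpy as np

def winding_number(query, vertices, triangles):
    total = 0.0
    for tri in triangles:
        a, b, c = (vertices[i] - query for i in tri)
        la, lb, lc = (np.linalg.norm(v) for v in (a, b, c))
        num = np.linalg.det(np.stack([a, b, c]))          # scalar triple product
        den = (la * lb * lc + np.dot(a, b) * lc
               + np.dot(b, c) * la + np.dot(c, a) * lb)
        total += 2.0 * np.arctan2(num, den)               # signed solid angle
    return total / (4.0 * np.pi)

# Unit tetrahedron with outward-oriented faces: ~1 inside, ~0 outside.
V = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
T = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
print(winding_number(np.array([0.1, 0.1, 0.1]), V, T))    # ~= 1
print(winding_number(np.array([2.0, 2.0, 2.0]), V, T))    # ~= 0
```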
cs.GR 2026-05-04

Historians employ visualizations in five distinct roles

How Historians Use Visualization: A Corpus-Backed Taxonomy and Analysis for Cross-Disciplinary Practice

Corpus study of 4,142 articles reveals roles from primary sources to exploration, constrained by uncertainty and provenance demands

abstract
Visualization in historical research is shifting from isolated attempts to systematic practices. However, data-driven evidence about how historians actually use visualization remains scarce. We present a corpus-driven, mixed-methods study that combines analysis of images from 4,142 research articles across history and digital humanities journals with a collaboratively developed visualization taxonomy and a semi-automatic labeling pipeline. We construct a corpus of 14,021 images, classify 4,831 visualization instances using a hierarchical, domain-informed taxonomy, and analyze patterns of visualization adoption across venues, history subfields, and time. To interpret these patterns, we conduct interviews with 11 historians and use the HiFigAtlas system as a boundary object to support joint inspection of the corpus. We identify distinct roles for visualizations in historical research: primary-source, evidence-synthesis, communicative, confirmative, and exploratory. We further find that while historians pursue diverse goals with figures, persistent epistemological and practical barriers, such as uncertainty, provenance, justification burden, and publication constraints, impede the adoption of visualization. This work contributes a grounded account of visualization use in historical scholarship and points to opportunities to better support domain-specific needs.
cs.GR 2026-05-04

P2M++ cuts point-to-mesh preprocessing by 3x-10x

P2M++: Enhanced Solver for Point-to-Mesh Distance Queries

Auxiliary sites and sphere-triangle checks shrink preparation time while raising query speed, with biggest wins on symmetric shapes.

abstract
Point-to-mesh distance queries are fundamental in computer graphics and geometric modeling. While the state-of-the-art P2M method achieves high-speed queries via Voronoi-based localization, it suffers from prohibitive precomputation costs. Its iterative Voronoi sweep for interference detection leads to redundant predicate evaluations and scales poorly on rotationally symmetric structures (e.g., spheres, cones or cylinders), where candidate counts grow quadratically. We propose P2M++ to address these limitations through three key contributions. First, we adaptively augment the set of mesh vertices with auxiliary sites in regions of high Voronoi vertex density to localize complex interference within minimal spatial regions. Second, we reformulate interference detection as a series of sphere-triangle collision tests centered at Voronoi cell corners, which are efficiently resolved using the base mesh's BVH. Finally, we enhance runtime performance by replacing the standard kd-tree search with a faster recursive dynamic programming implementation. Experimental results demonstrate that P2M++ is 3x-10x faster than the original P2M during preprocessing and 1.5x faster in queries, with even more pronounced gains on rotationally symmetric geometries.
cs.GR 2026-05-04

Interactive visuals help non-experts grasp ML

Towards Interactive Multimodal Representation of ML Functions for Human Understanding of ML

Three prototypes with transparent datasets test engagement factors to shift attitudes away from fear of AI.

abstract
Attitudes about artificial intelligence and machine learning are recent victims of endemic misunderstanding; given our increasing reliance on these technologies, the need for widespread understanding and confidence in their use is paramount. To this end, our work seeks to increase understanding of these typically inaccessible topics through interactive visualizations, thereby garnering curiosity in the hopes of kickstarting a cycle of understanding leading to further pursuit of knowledge. We hope this will cyclically shift global attitudes away from the intimidation of the unknown currently plaguing ML. This work explores best practices for supporting curiosity in new technologies, to inspire attitudinal paradigm shifts. Across three distinct visualizations of machine learning data, we created prototypes with carefully selected, highly transparent datasets to examine the success factors of engagement required for more informed attitudes on ML less dictated by the fear of the unknown. By employing interactive visualizations, we can captivate the interest of teenagers and individuals from diverse fields, encouraging them to explore the fascinating world of machine learning.
0
0
cs.GR 2026-05-01

Physics integration adds controllable fire to any 3D scene scan

FieryGS: In-the-Wild Fire Synthesis with Physics-Integrated Gaussian Splatting

A unified pipeline runs material reasoning, combustion simulation, and rendering together so fire respects real geometry without manual tuning.

Figure from the paper full image
abstract click to expand
We consider the problem of synthesizing photorealistic, physically plausible combustion effects in in-the-wild 3D scenes. Traditional CFD and graphics pipelines can produce realistic fire effects but rely on handcrafted geometry, expert-tuned parameters, and labor-intensive workflows, limiting their scalability to the real world. Recent scene modeling advances like 3D Gaussian Splatting (3DGS) enable high-fidelity real-world scene reconstruction, yet lack physical grounding for combustion. To bridge this gap, we propose FieryGS, a physically-based framework that integrates physically-accurate and user-controllable combustion simulation and rendering within the 3DGS pipeline, enabling realistic fire synthesis for real scenes. Our approach tightly couples three key modules: (1) multimodal large-language-model-based physical material reasoning, (2) efficient volumetric combustion simulation, and (3) a unified renderer for fire and 3DGS. By unifying reconstruction, physical reasoning, simulation, and rendering, FieryGS removes manual tuning and automatically generates realistic, controllable fire dynamics consistent with scene geometry and materials. Our framework supports complex combustion phenomena -- including flame propagation, smoke dispersion, and surface carbonization -- with precise user control over fire intensity, airflow, ignition location and other combustion parameters. Evaluated on diverse indoor and outdoor scenes, FieryGS outperforms all comparative baselines in visual realism, physical fidelity, and controllability. Project page can be found at https://pku-vcl-geometry.github.io/FieryGS/.
0
0
cs.GR 2026-05-01

Diffusion model adds relighting to any existing avatar

D-Rex : Diffusion Rendering for Relightable Expressive Avatars

Translates flat-lit renders to HDR illumination while preserving motion and facial details.

Figure from the paper full image
abstract click to expand
We present D-Rex, a person-specific framework for photorealistic, relightable, expressive, and animatable full-body human avatars with free-viewpoint rendering. Existing methods for relightable full-body avatars rely on explicit 3D intrinsic decomposition with analytic reflectance models, which require accurate geometry registration and careful optimization to capture realistic light transport effects. This tight coupling of relighting with avatar modeling has hindered expressiveness: to our knowledge, no existing method demonstrates strong facial animation alongside relighting, limiting applicability in telepresence, gaming, and virtual production. We propose to decouple relighting entirely from avatar modeling by treating it as an image-space post-process: a learned translation from flat-lit, albedo-like renderings to a target HDR illumination. To this end, we leverage the strong generative prior of a pre-trained video diffusion relighting model, fine-tuned via LoRA on paired flat-lit and relit frames captured in a light stage. The flat-lit driving frames are produced by an independent expressive full-body avatar framework trained under white-light conditions, requiring no modification to support relighting, making D-Rex directly applicable to any white-light avatar system. We demonstrate that D-Rex enables view- and temporally consistent relighting while faithfully preserving expressive motion and fine-grained facial detail, outperforming physically-based relightable avatar baselines. Project page is https://vcai.mpi-inf.mpg.de/projects/DRex/
0
0
cs.GR 2026-05-01

Quadrilateral concavity test clips lines without false points

Line Segment Clipping using Quadrilateral Concavity and Convexity

Shape check before division calculation reduces operations when clipping segments to rectangular windows.

abstract click to expand
This paper proposes an algorithm for clipping line segments against an axis-aligned rectangular window. Conventional line-clipping algorithms treat the clipping boundary and/or the segment to be clipped as infinite lines. The present algorithm treats both the clipping boundary and the input as line segments and, using this strategy, avoids computing false intersection points. A quadrilateral is constructed using the end points of a clipping boundary segment and the end points of the line segment to be clipped as its vertices. The concavity or convexity of this quadrilateral dictates whether the segment actually intersects the clipping boundary. If the quadrilateral is concave, the segment is rejected; otherwise the point of intersection of the segment with the clipping boundary is computed. Since a 'test & intersect' approach is used instead of an 'intersect & test' one, the proposed algorithm computes no false intersection points, reducing the number of divisions required to obtain a clipped segment. A single routine can process line segments at any position. Improved performance is observed with respect to the Nicholl-Lee-Nicholl, Liang-Barsky, Cohen-Sutherland, and Skala algorithms in experiments with random line segments, using a metric based on execution time.
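The 'test & intersect' idea can be illustrated with orientation predicates: a cheap sign test plays the role of the paper's quadrilateral concavity check (this is our simplification, not the paper's exact construction), and the division happens only after the test passes:

```python
def cross(o, a, b):
    # 2D orientation: > 0 if o->a->b turns left, < 0 if right, 0 if collinear.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def segments_intersect(p1, p2, q1, q2):
    # Proper intersection iff each segment's endpoints straddle the other.
    d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
    d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

def intersection_point(p1, p2, q1, q2):
    # Called only after the sign test passes, so the division is never wasted.
    d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
    t = d1 / (d1 - d2)
    return (p1[0] + t * (p2[0] - p1[0]), p1[1] + t * (p2[1] - p1[1]))

win_edge = ((0.0, 0.0), (0.0, 1.0))        # left edge of a clipping window
seg      = ((-1.0, 0.5), (1.0, 0.5))
if segments_intersect(*seg, *win_edge):
    print(intersection_point(*seg, *win_edge))   # (0.0, 0.5)
```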
0
0
cs.GR 2026-05-01

Curve-guided Gaussians turn one sand-painting image into a coherent process

SandSim: Curve-Guided Gaussian Splatting for Reconstructing Sand Painting Processes

Modeling strokes as anisotropic primitives along trajectories yields temporally consistent sequences that match the input image.

Figure from the paper full image
abstract click to expand
Sand painting is a process-driven art where visual appearance emerges from granular accumulation. Given a single image, reconstructing a plausible sand painting process requires modeling coherent stroke structures and material-dependent effects. Existing methods, including stroke-based optimization and diffusion-based video synthesis, often lack structural coherence and material consistency, leading to unrealistic drawing sequences. We present SandSim, a framework that reconstructs a sand painting process from a single image. We introduce a curve-guided Gaussian representation that models strokes as sequences of anisotropic primitives along continuous trajectories, whose smooth kernels capture the soft boundaries of sand strokes and enable coherent stroke formation. We further adopt a subtractive compositing scheme to model light attenuation during sand accumulation. We incorporate a semantic-guided planning module for scene decomposition and drawing order inference. Our framework jointly optimizes stroke geometry and appearance and can be integrated with a physics-based simulator for interactive sand dynamics and editing. Experiments show that our method produces temporally coherent and visually realistic results, achieving improved reconstruction quality and perceptual fidelity compared to existing approaches.
0
0
cs.GR 2026-05-01

Diffusion model generates simple quad layouts on 3D shapes

SQuadGen: Generating Simple Quad Layouts via Chart Distance Fields

Chart distance fields convert the discrete connectivity problem into a continuous task that yields clean, editable meshes.

Figure from the paper full image
abstract click to expand
3D shapes from scanning, reconstruction, or AI-generated content often lack simple quad mesh layouts -- critical for efficient editing and modeling. Existing quad-remeshing techniques typically produce complex layouts with irregular loops, leading to tedious manual cleanup and extensive algorithm tuning. We introduce SQuadGen, a diffusion-based generative framework that leverages Chart Distance Fields (CDF) to synthesize simple quad layouts on 3D shapes. Our approach addresses two key challenges: (1) the discrete nature of mesh connectivity, which hinders learning, and (2) the scarcity of large-scale datasets with simple quad meshes. To overcome the first, we propose CDF, a continuous surface-based representation enabling effective learning and synthesis of quad layouts. To address the second, we define loop-aware simplicity metrics and construct a large-scale dataset of high-quality quad layouts recovered from public 3D repositories through a robust quad-recovery pipeline. Extensive evaluations across diverse 3D inputs show that SQuadGen consistently outperforms existing methods, producing robust, artist-friendly simple quad layouts.
0
0
cs.GR 2026-04-30

GMT is a neural solver that restructures a Point Transformer to run across geometric…

GMT: A Geometric Multigrid Transformer Solver for Microstructure Homogenization

GMT combines a restructured Point Transformer V3 with geometric multigrid hierarchies and physics-aware encoding to solve microstructure…

abstract click to expand
Lattice metamaterials enable lightweight, multifunctional structures, yet homogenization-based evaluation of their effective properties remains computationally expensive. Neural surrogates offer speed but often lack the accuracy and stability required for engineering-grade simulations. We introduce GMT, a Geometric Multigrid Transformer -- a neural solver with high numerical fidelity for fast and reliable lattice homogenization. GMT achieves architectural alignment with Geometric Multigrid (GMG) by restructuring Point Transformer V3 to operate across sparse GMG hierarchies, capturing long-range dependencies and cross-level interactions essential for multigrid convergence. To enforce physical consistency, GMT incorporates physics-aware positional encoding for strict enforcement of periodicity and predicts both the finest-level solution and multi-level residual corrections. These predictions deliver a spectrally-aligned initialization, enabling end-to-end training under physics-informed and solver-aware losses and requiring only a single GMG V-cycle refinement to reach convergence. This fusion of neural prediction and numerical rigor achieves relative residual errors of $10^{-5}$ with a $160\times$ speedup over state-of-the-art GPU-based solvers at equivalent accuracy -- particularly at high resolutions (e.g. $512^3$), where traditional methods become most costly. We validate GMT across mechanical and thermal domains, demonstrate robust generalization to unseen geometries and non-periodic settings, and showcase scalability to high resolutions -- enabling real-time design iteration, multi-scale simulations, high-throughput material discovery, and inverse design.
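The abstract does not spell out the physics-aware positional encoding, but one standard construction that strictly enforces periodicity is a Fourier feature built from integer harmonics of the unit cell. A minimal sketch, assuming a cubic cell of size 1:

```python
import numpy as np

def periodic_encoding(xyz, cell=1.0, n_freq=4):
    """Fourier features from sin/cos of integer multiples of 2*pi*x/cell.
    Every feature of this form is exactly periodic over the unit cell, so a
    network consuming it cannot violate periodicity at the boundary."""
    feats = []
    for k in range(1, n_freq + 1):
        ang = 2.0 * np.pi * k * xyz / cell
        feats.append(np.sin(ang))
        feats.append(np.cos(ang))
    return np.concatenate(feats, axis=-1)

# points that differ by a full cell get identical encodings
p = np.array([[0.1, 0.2, 0.3]])
assert np.allclose(periodic_encoding(p), periodic_encoding(p + 1.0))
```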
0
0
cs.GR 2026-04-29

LLM agents generate full 3D game cutscenes automatically

Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation

Director agent coordinates animation, camera, and sound specialists while staying linked to the engine for real-time observation and fixes.

Figure from the paper full image
abstract click to expand
Cutscenes are carefully choreographed cinematic sequences embedded in video games and interactive media, serving as the primary vehicle for narrative delivery, character development, and emotional engagement. Producing cutscenes is inherently complex: it demands seamless coordination across screenwriting, cinematography, character animation, voice acting, and technical direction, often requiring days to weeks of collaborative effort from multidisciplinary teams to produce minutes of polished content. In this work, we present Cutscene Agent, an LLM agent framework for automated end-to-end cutscene generation. The framework makes three contributions: (1) a Cutscene Toolkit built on the Model Context Protocol (MCP) that establishes bidirectional integration between LLM agents and the game engine -- agents not only invoke engine operations but continuously observe real-time scene state, enabling closed-loop generation of editable engine-native cinematic assets; (2) a multi-agent system where a director agent orchestrates specialist subagents for animation, cinematography, and sound design, augmented by a visual reasoning feedback loop for perception-driven refinement; and (3) CutsceneBench, a hierarchical evaluation benchmark for cutscene generation. Unlike typical tool-use benchmarks that evaluate short, isolated function calls, cutscene generation requires long-horizon, multi-step orchestration of dozens of interdependent tool invocations with strict ordering constraints -- a capability dimension that existing benchmarks do not cover. We evaluate a range of LLMs on CutsceneBench and analyze their performance across this challenging task.
0
0
cs.GR 2026-04-29

Neural assets bake full 8D light transport from path samples

8DNA: 8D Neural Asset Light Transport by Distribution Learning

Distribution learning yields near-field global illumination renderings that match path tracing with lower variance and fast inference.

Figure from the paper full image
abstract click to expand
High-fidelity 3D assets exhibit intriguing global illumination effects like subsurface scattering, glossy interreflections, and fine-scale fiber scatterings, which often involve long scattering paths that are expensive to simulate. We introduce 8D neural assets (8DNA) to pre-bake these light transport effects into neural representations. Unlike prior methods that assume far-field lighting and precompute light transport into 6D functions, 8DNA learns the full 8D light transport, enabling accurate rendering under near-field illumination. Our training leverages a distribution-learning formulation that learns light transport from forward path-traced samples, which produces less optimization variance with lower training budget than the prior regression-based approaches. Experiments show our 8DNA rendering closely matches path-traced results under various scene configurations, yet it achieves improved variance reduction and fast inference speeds on challenging assets.
0
0
cs.GR 2026-04-28 Recognition

Open video model surpasses closed ones via smarter distillation

Alice v1: Distillation-Enhanced Video Generation Surpassing Closed-Source Models

Alice v1 hits 91.2 on VBench with 7x faster generation by focusing on quality modes and hard examples.

Figure from the paper full image
abstract click to expand
We present Alice v1, a 14-billion parameter open-source video generation model that achieves state-of-the-art quality through consistency distillation with score regularization (rCM). Contrary to conventional distillation, which trades quality for speed, we demonstrate that rCM-based distillation can exceed teacher model quality. We attribute this to three mechanisms: (1) the score regularization term acts as a mode-seeking objective that concentrates probability mass on high-quality outputs rather than covering the full teacher distribution, (2) our targeted synthetic data pipeline with hard example mining provides training signal specifically for failure modes (physics, hands, faces) that the teacher handles inconsistently, and (3) consistency enforcement acts as implicit regularization, eliminating "lucky path" dependence on specific noise samples. Alice v1 generates 5-second 720p videos at 24fps in 4 denoising steps (~8 seconds on H100), a 7x speedup over the 50-step teacher while improving VBench score from 84.0 (Wan2.2) to 91.2. This surpasses both the teacher and closed-source systems including Veo3 (~90) and Sora2 (~88) on automated benchmarks, with competitive results in human preference studies. We release all model weights, training code, synthetic data pipelines, and evaluation scripts to advance open research in video generation.
0
0
cs.GR 2026-04-28

Power Foam unifies ray tracing and rasterization

Power Foam: Unifying Real-Time Differentiable Ray Tracing and Rasterization

Bounded power diagrams keep constant-time ray traversal while matching 3DGS raster speeds in one differentiable model.

Figure from the paper full image
abstract click to expand
We introduce a differentiable 3D representation that unifies the ray tracing capabilities of foam-based ray tracing with the efficiency of modern rasterization pipelines. While prior foam representations enable constant-time ray traversal through an explicit volumetric partition of space, their potentially unbounded cells hinder efficient tile-based rasterization. We address this limitation by generalizing Voronoi foams to bounded power diagrams with controllable cell extents, enabling spatially bounded primitives without requiring expensive Delaunay triangulations during training. We further introduce an oriented surface formulation that explicitly models interfaces between interior and exterior regions, and decouple geometry from appearance by embedding differentiable texture directly on these surfaces. Together, these contributions yield a representation that preserves state-of-the-art ray tracing efficiency while achieving rasterization performance competitive with current generation 3DGS, providing a practical path toward unified real-time differentiable rendering.
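For readers unfamiliar with power diagrams: each site carries a weight, and a point belongs to the cell whose power distance is smallest. A minimal membership query, with an illustrative extent bound standing in for the paper's bounded-cell formulation:

```python
import numpy as np

def power_distance(x, centers, weights):
    # pow(x; c_i, w_i) = |x - c_i|^2 - w_i ; the cell of site i is the region
    # where this value is minimal over all sites.
    return ((x - centers) ** 2).sum(axis=-1) - weights

def owning_cell(x, centers, weights, max_extent=None):
    d = power_distance(x, centers, weights)
    i = int(np.argmin(d))
    # optional cap on cell extent, loosely in the spirit of bounded cells
    if max_extent is not None and np.linalg.norm(x - centers[i]) > max_extent:
        return -1  # outside every bounded cell
    return i

centers = np.array([[0.0, 0.0], [2.0, 0.0]])
weights = np.array([0.0, 1.5])               # the second site's cell grows
print(owning_cell(np.array([0.9, 0.0]), centers, weights))  # -> 1
```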
0
0
cs.GR 2026-04-28

Single neural net predicts intersections for deforming geometry

Voxel Deformation-Aware Neural Intersection Function

Rays are mapped to a canonical rest space so one compact model works across all animation poses.

Figure from the paper full image
abstract click to expand
We extend the Locally-Subdivided Neural Intersection Function (LSNIF) to support parameterized deformable and animated geometry. Our approach introduces a rest-space and deformed-space formulation inspired by meshless rendering, allowing ray samples to be mapped back to a canonical space where a single neural network represents geometry consistently across poses without retraining. To maintain accuracy under deformation-aware training, we incorporate scale-invariant distance regression, uncertainty-weighted multi-task learning, and a hybrid positional-grid encoding. The resulting method preserves the compactness and efficiency of LSNIF while enabling robust neural intersection prediction for dynamic geometry.
0
0
cs.GR 2026-04-28

Hybrid workflow yields 25-30 billion triangle model of Maltese cathedral

Large-Scale Photogrammetric Documentation of St. John's Co-Cathedral: A Workflow for Cultural Heritage Preservation

Seven-night capture of DSLR, drone, and LIDAR data plus AI cleanup produces usable archive for preservation and research.

Figure from the paper full image
abstract click to expand
We present a comprehensive methodology for the large-scale photogrammetric documentation of St. John's Co-Cathedral in Valletta, Malta, a UNESCO World Heritage site renowned for its ornate Baroque architecture and Caravaggio masterpieces. Over seven nights of evening-only data collection, we captured 99,000 images using DSLR cameras, drone photography, and LIDAR scanning to create a highly detailed 3D reconstruction comprising 25-30 billion triangles. This paper documents our complete workflow for cultural heritage preservation, addressing the unique challenges of digitizing complex baroque architectural spaces with highly reflective metallic surfaces, dark materials, intricate tapestries, and restricted access. We detail our pipeline from multi-modal data acquisition through processing, including strategic image grading and AI-assisted denoising to address low-light grain, extensive LIDAR point cloud cleanup, hybrid photogrammetric reconstruction using RealityCapture, and mesh subdivision strategies for real-time visualization engines. Our methodology combines automated workflows with necessary manual intervention to handle the scale and complexity of the project, with particular attention to reflective surface challenges characteristic of baroque heritage sites. We also present preliminary experiments with Gaussian splatting as a complementary representation technique. The resulting digital archive serves multiple preservation purposes including disaster recovery documentation, conservation analysis, virtual tourism, and scholarly research. This work provides a detailed, replicable workflow for heritage professionals undertaking similar large-scale architectural documentation projects, addressing the practical challenges of applying photogrammetric methods in complex real-world heritage scenarios.
0
0
cs.GR 2026-04-28

Neural swaps lift analytical models closer to real reflectance data

Neural Enhancement of Analytical Appearance Models

Tiny MLPs replace key nodes inside existing BRDF graphs, gaining accuracy while staying compact and pipeline-ready.

Figure from the paper full image
abstract click to expand
Traditional analytical reflectance models, while compact and interpretable, lack the capacity to accurately represent physical measurements. Recent neural models, which closely fit input data, are less generalizable and often more expensive to store and evaluate. To combine the strengths and overcome the limitations of these two classes of models, we present neural enhancement, a novel framework to boost an input analytical appearance model, by identifying and replacing its key computational nodes/operators with small-scale multi-layer perceptrons. This allows us to leverage the computational graph structure of the original model, while improving its expressiveness at a modest cost. To make the enhancement computationally tractable, we propose a hypercube-based search to automatically and efficiently identify the node(s) and/or operator(s) to be replaced towards maximal gain in a differentiable fashion. We enhance a number of common analytical BRDF models. The results are at once accurate, compact, and efficient, and compare favorably with state-of-the-art work on fitting measured reflectance and bidirectional texture functions. Finally, our models are fully compatible with any standard rasterization or ray-tracing pipeline.
0
0
cs.GR 2026-04-27

Distilled muscle policies let hands play new piano music

MUSIC: Learning Muscle-Driven Dexterous Hand Control

A latent space built from reference motions allows high-level controllers to handle unseen scores with accurate bimanual key strikes.

Figure from the paper full image
abstract click to expand
We present a data-driven approach for physics-based, muscle-driven dexterous control that enables musculoskeletal hands to perform precise piano playing for novel pieces of music outside the reference dataset. Our approach combines high-frequency muscle-level control with low-frequency latent-space coordination in a hierarchical architecture. At the low level, general single-hand policies are trained via reinforcement learning to generate dynamic muscle-tendon activations while tracking trajectories from a large reference motion dataset. The resulting tracking policies are then distilled into variational autoencoder (VAE) models, yielding smooth and structured latent spaces that abstract away low-level muscle dynamics. For the high level, we train piece-specific policies to operate in this latent space, coordinating bimanual motions based on specific goals, denoted by note events extracted from given musical scores, to synthesize performances beyond the reference data. In addition, we present an enhanced musculoskeletal hand model that supports fine control of fingers for accurate low-level motion tracking and diverse high-level motion synthesis. We evaluate the control pipeline of our approach on a diverse piano repertoire spanning multiple musical styles and technical demands. Results demonstrate that our approach can synthesize coordinated bimanual motions with accurate key presses, and achieve state-of-the-art piano-playing performance in physics-based dexterous control. We also show that our musculoskeletal hand model demonstrates superior biomechanical stability and tracking precision compared to the existing model, and validate that our musculoskeletal hand model and muscle-driven controller can generate physiologically plausible activation patterns that align with human electromyography (EMG) recordings.
0
0
cs.GR 2026-04-27

Primitives let text specify precise 3D shape edits

Prox-E: Fine-Grained 3D Shape Editing via Primitive-Based Abstractions

Abstracting objects into geometric primitives allows a vision-language model to direct localized changes in a generative model while keeping the rest of the shape intact.

Figure from the paper full image
abstract click to expand
Text-based 2D image editing models have recently reached an impressive level of maturity, motivating a growing body of work that heavily depends on these models to drive 3D edits. While effective for appearance-based modifications, such 2D-centric 3D editing pipelines often struggle with fine-grained 3D editing, where localized structural changes must be applied while strictly preserving an object's overall identity. To address this limitation, we propose Prox-E, a training-free framework that enables fine-grained 3D control through an explicit, primitive-based geometric abstraction. Our framework first abstracts an input 3D shape into a compact set of geometric primitives. A pretrained vision-language model (VLM) then edits this abstraction to specify primitive-level changes. These structural edits are subsequently used to guide a 3D generative model, enabling fine-grained, localized modifications while preserving unchanged regions of the original shape. Through extensive experiments, we demonstrate that our method consistently balances identity preservation, shape quality, and instruction fidelity more effectively than various existing approaches, including 2D-based 3D editors and training-based methods.
0
0
cs.GR 2026-04-27

Causal retrieval model personalizes facial animation from audio

Personalizing Causal Audio-Driven Facial Motion via Dynamic Multi-modal Retrieval

Dynamic multi-modal style lookup from unstructured references removes lookahead latency while improving sync and realism.

Figure from the paper full image
abstract click to expand
Audio-driven facial animation is essential for immersive digital interaction, yet existing frameworks fail to reconcile real-time streaming with high-fidelity personalization. Current methods often rely on latency-inducing audio look-ahead, or require high user compliance to pre-encode static embeddings that fail to capture dynamic idiosyncrasies. We present an end-to-end causal framework for personalizing causal facial motion generation via dynamic multi-modal style retrieval, enabling ultra-low latency while uniquely leveraging unstructured style references. We introduce two key innovations: (1) a temporal hierarchical motion representation that captures global temporal context and high-frequency details while maintaining decoding causality, and (2) a multi-modal style retriever that jointly queries audio and motion to dynamically extract stylistic priors without breaking causality. This mechanism allows for scalable personalization with total flexibility regarding the number and contents of templates. By integrating these components into a causal autoregressive architecture, our method significantly outperforms state-of-the-art approaches in lip-sync accuracy, identity consistency, and perceived realism, supported by extensive quantitative evaluations and user studies.
0
0
cs.GR 2026-04-27 Recognition

3D generators rarely yield assets ready for interactive engines

From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation

Survey maps research by asset type and production stage to show where outputs still fail engine constraints on topology and rigging.

Figure from the paper full image
abstract click to expand
Three-dimensional content generation has progressed from producing isolated, visually plausible shapes to constructing structured assets that can be deployed in real-time interactive environments. This trajectory is driven by converging demands from game development, embodied AI, world simulation, digital twins, and spatial computing, all of which require 3D content that goes beyond surface appearance to satisfy engine-level constraints on topology, UV parameterization, physically based materials, skeletal rigging, and physics-aware scene layout. Despite rapid advances in generative modeling, a persistent gap separates the outputs of current methods from the production-ready standard expected by interactive applications. This survey addresses that gap by organizing the literature around the asset production pipeline rather than algorithmic families. Along the horizontal axis we distinguish three asset tiers, namely general objects, characters, and scenes, while the vertical axis traces each tier through the full production lifecycle from data foundations and geometry synthesis through topology optimization, UV unwrapping, PBR appearance, rigging, and scene assembly. Through this two-dimensional taxonomy we assess not only what current methods can generate but whether their outputs are directly usable in downstream engines and simulation platforms. We further consolidate evaluation metrics and protocols that span geometric fidelity, appearance quality, asset usability, and scene-level physical plausibility. The survey concludes by identifying open challenges in data quality, generation controllability, end-to-end assetization, and physically grounded generation, and by situating production-ready 3D content as foundational infrastructure for emerging interactive world models and embodied intelligent systems.
0
0
cs.GR 2026-04-27

Weighted control points localize fairing of curves and surfaces

Progressive-Iterative Fairing of Curves and Surfaces with Localized Control Point Adjustment

Progressive iterations with auto-selected points enable global smoothing and precise local tweaks without manual input.

abstract click to expand
Curve and surface fairing is crucial in computer-aided geometric design, influencing product quality, physical performance, and aesthetics. Traditional methods often apply global modifications, lacking fine-grained control. This paper introduces a novel progressive-iterative fairing method based on control point adjustment. By assigning independent weights to each control point, our approach enables precise, localized shape adjustments. The method functions both globally and locally, allowing for comprehensive shape fairing and fine control over the fairing effect. Furthermore, this paper provides an automatic control point selection method to adjust shapes, thereby eliminating the reliance on manual interaction. Numerical experiments demonstrate the efficiency and effectiveness of our approach.
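The role of per-point weights is easy to see in a toy progressive iteration: a weight of zero freezes a control point, so fairing stays local. A minimal sketch on a 2D control polygon (the paper's actual update rule and automatic point selection are more sophisticated):

```python
import numpy as np

def faired_step(P, w, lam=0.5):
    """One progressive iteration: move each interior control point toward the
    midpoint of its neighbours, scaled by its own weight w[i]. Points with
    w[i] = 0 never move, which is what localizes the fairing."""
    Q = P.copy()
    Q[1:-1] += lam * w[1:-1, None] * (0.5 * (P[:-2] + P[2:]) - P[1:-1])
    return Q

P = np.array([[0, 0], [1, 2], [2, -1], [3, 3], [4, 0]], float)
w = np.array([0, 1, 1, 0, 0], float)   # only points 1 and 2 are faired
for _ in range(20):
    P = faired_step(P, w)
print(P)
```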
0
0
cs.GR 2026-04-27

Tetrahedral SDF rasterization enables end-to-end mesh reconstruction

Distance Field Rasterization for End-to-End Mesh Reconstruction

Optimizing distances over a Delaunay grid and compositing tetrahedra yields complete surfaces directly, using less memory than prior methods.

Figure from the paper full image
abstract click to expand
Rasterization based methods have recently enabled high-quality novel view synthesis at real-time rates, but their underlying volumetric primitives do not expose a direct, globally consistent surface representation, leaving surface extraction to heuristic post-processing. In contrast, implicit signed distance field (SDF) methods provide well-defined surfaces but are typically optimized with computationally expensive ray marching. We propose SDFRaster, a rasterizable SDF representation that bridges this gap by combining the efficiency of rasterization with signed distance fields for end-to-end mesh reconstruction. Starting from a Delaunay tetrahedralization, we optimize a continuous SDF over a tetrahedral grid and render it efficiently by rasterizing tetrahedra and alpha-compositing their contributions. We further integrate differentiable Marching Tetrahedra into the optimization loop, enabling end-to-end mesh reconstruction without post-processing mesh extraction. Experiments on DTU and Tanks and Temples demonstrate that SDFRaster achieves higher-quality and more complete surface reconstructions with lower storage cost than state-of-the-art approaches. Project page: https://ustc3dv.github.io/SDFRaster/
0
0
cs.GR 2026-04-27

Proxy spheres allow tighter BVHs for varied-radius particle collisions

Rethinking Collision Detection on GPU Ray Tracing Architecture

Mochi reformulates DCD on GPU RT hardware to handle non-uniform spheres efficiently while guaranteeing all contacts are found.

Figure from the paper full image
abstract click to expand
Discrete Collision Detection (DCD) is a fundamental task in several domains including particle-based physics simulations. Efficient DCD uses indexing structures such as Bounding Volume Hierarchy (BVH), but accelerating irregular BVH traversals demands meticulous efforts to achieve performance. Modern GPUs feature Ray Tracing (RT) architecture that provides hardware acceleration for BVH traversal and optimized drivers for BVH construction. Recent work has attempted to exploit RT architecture to accelerate DCD on spherical particles by reducing DCD to fixed-radius neighbor search. However, this reduction breaks down for particles with different radii, necessitating the use of large bounding boxes that result in a higher number of duplicate collisions and poor performance. To address these limitations, we present Mochi, a new reduction that reformulates DCD on RT architecture by exploiting the symmetry of collision relations to support both uniform and non-uniform spherical particles efficiently. Mochi introduces per-object proxy spheres that decouple BVH bounding volumes from the collision search radius, enabling significantly tighter bounding boxes without sacrificing correctness. Mochi is provably sound and guarantees that all true collisions are detected. We integrate Mochi into an end-to-end particle simulation pipeline and evaluate it across large-scale particle workloads, showing consistent speedups over state-of-the-art BVH-based and RT-based DCD implementations. Mochi generalizes prior RT-based neighbor search formulations while avoiding their fundamental limitations for non-uniform spheres.
0
0
cs.GR 2026-04-24

CUDA rasterizer outperforms Vulkan on 100M+ triangle meshes

CuRast: Cuda-Based Software Rasterization for Billions of Triangles

Three-stage compute pipeline skips acceleration structures for 2-12x speedups on dense photogrammetry models.

Figure from the paper full image
abstract click to expand
Previous work shows that small triangles can be rasterized efficiently with compute shaders. Building on this insight, we explore how far this can be pushed for massive triangle datasets without the need to construct acceleration structures in advance. Method: A 3-stage rasterization pipeline first rasterizes small triangles directly in stage 1, using atomicMin to store the closest fragments. Larger triangles are forwarded to stages 2 and 3. Results: CuRast can render models with hundreds of millions of triangles up to 2-5x (unique) or up to 12x (instanced) faster than Vulkan. Vulkan remains an order of magnitude faster for low-poly meshes. Limitations: We currently focus on dense, opaque meshes that you would typically obtain from photogrammetry/3D reconstruction. Blending/Transparency is not yet supported, and scenes with thousands of low-poly meshes are not implemented efficiently. Future Work: To make it suitable for games and a wider range of use cases, future work will need to (1) optimize handling of scenes with tens of thousands of nodes/meshes, (2) add support for hierarchical clustered LODs such as those produced by Meshoptimizer, (3) add support for transparency, likely in its own stage so as to keep opaque rasterization untouched and fast. Source Code: https://github.com/m-schuetz/CuRast
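Stage 1's trick is storing the closest fragment with a single atomicMin on a packed 64-bit key (depth in the high bits, payload in the low bits). A CPU sketch of the packing idea, with one flat depth per triangle and integer vertex coordinates for brevity (CuRast's actual stage runs as CUDA atomics over per-fragment depths):

```python
import numpy as np

def edge(a, b, p):
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def rasterize_small_tris(tris, depths, W, H):
    """Emulates the atomicMin trick: each fragment packs (depth, tri id) into
    one 64-bit key and a per-pixel min keeps the closest fragment. On the GPU
    that min is a single atomicMin; here it is a serial min for clarity."""
    buf = np.full((H, W), np.iinfo(np.uint64).max, dtype=np.uint64)
    for tid, (v0, v1, v2) in enumerate(tris):
        key = (np.uint64(int(depths[tid] * 0xFFFFFFFF)) << np.uint64(32)) \
              | np.uint64(tid)
        xs = [v0[0], v1[0], v2[0]]
        ys = [v0[1], v1[1], v2[1]]
        for y in range(max(0, min(ys)), min(H - 1, max(ys)) + 1):
            for x in range(max(0, min(xs)), min(W - 1, max(xs)) + 1):
                p = (x, y)
                w0, w1, w2 = edge(v1, v2, p), edge(v2, v0, p), edge(v0, v1, p)
                if (w0 >= 0 and w1 >= 0 and w2 >= 0) or \
                   (w0 <= 0 and w1 <= 0 and w2 <= 0):
                    buf[y, x] = min(buf[y, x], key)   # the 'atomicMin'
    return buf  # high 32 bits: depth; low 32 bits: closest triangle id

tris = [((2, 2), (12, 3), (6, 10)), ((4, 4), (14, 5), (8, 12))]
buf = rasterize_small_tris(tris, depths=[0.5, 0.3], W=16, H=16)
```

Because the smaller depth always yields the smaller packed key, the per-pixel minimum resolves visibility without any depth-sorted traversal or prebuilt acceleration structure.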
1 0
0
cs.GR 2026-04-24

Calibrated encoders track face identity across artistic styles

StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition

Psychometric curves from same-different and 2AFC human experiments align model similarities with perception, improving robustness on stylized portraits.

Figure from the paper full image
abstract click to expand
Creative face stylization aims to render portraits in diverse visual idioms such as cartoons, sketches, and paintings while retaining recognizable identity. However, current identity encoders, which are typically trained and calibrated on natural photographs, exhibit severe brittleness under stylization. They often mistake changes in texture or color palette for identity drift or fail to detect geometric exaggerations. This reveals the lack of a style-agnostic framework to evaluate and supervise identity consistency across varying styles and strengths. To address this gap, we introduce StyleID, a human perception-aware dataset and evaluation framework for facial identity under stylization. StyleID comprises two datasets: (i) StyleBench-H, a benchmark that captures human same-different verification judgments across diffusion- and flow-matching-based stylization at multiple style strengths, and (ii) StyleBench-S, a supervision set derived from psychometric recognition-strength curves obtained through controlled two-alternative forced-choice (2AFC) experiments. Leveraging StyleBench-S, we fine-tune existing semantic encoders to align their similarity orderings with human perception across styles and strengths. Experiments demonstrate that our calibrated models yield significantly higher correlation with human judgments and enhanced robustness for out-of-domain, artist drawn portraits. All of our datasets, code, and pretrained models are publicly available at https://kwanyun.github.io/StyleID_page/
0
0
cs.GR 2026-04-23

Framework adds animator control to skeleton generation on complex models

Animator-Centric Skeleton Generation on Objects with Fine-Grained Details

Semantic tokenization and density module on a large rigged dataset meet quality and controllability needs for detailed 3D objects.

Figure from the paper full image
abstract click to expand
Skeleton generation is essential for animating 3D assets, but current deep learning methods remain limited: they cannot handle the growing structural complexity of modern models and offer minimal controllability, creating a major bottleneck for real-world animation workflows. To address this, we propose an animator-centric skeleton generation framework that achieves high-quality skeleton prediction on complex inputs while providing intuitive control handles. Our contributions are threefold. First, we curate a large-scale dataset of 82,633 rigged meshes with diverse and complicated structures. Second, we introduce a novel semantic-aware tokenization scheme for auto-regressive modeling. This scheme effectively complements purely geometric prior methods by subdividing bones into semantically meaningful groups, thereby enhancing robustness to structural complexity and enabling a key control mechanism. Third, we design a learnable density interval module that allows animators to exert soft, direct control over bone density. Extensive experiments demonstrate that our framework not only generates high-quality skeletons for challenging inputs but also successfully fulfills two critical requirements from professional animators.
0
0
cs.GR 2026-04-23

Iteration lets Monte Carlo solvers handle nonlinear radiation boundaries

Monte Carlo PDE Solvers for Nonlinear Radiative Boundary Conditions

A relaxed Picard loop converges to accurate solutions for radiative heat problems where linear approximations lose fidelity.

Figure from the paper full image
abstract click to expand
Monte Carlo PDE solvers have become increasingly popular for solving heat-related partial differential equations in geometry processing and computer graphics due to their robustness in handling complex geometries. While existing methods can handle Dirichlet, Neumann, and linear Robin boundary conditions, nonlinear boundary conditions arising from thermal radiation remain largely unexplored. In this paper, we introduce a Picard-style fixed-point iteration framework that enables Monte Carlo PDE solvers to handle nonlinear radiative boundary conditions. While strict theoretical convergence is not generally guaranteed, our method remains stable and empirically convergent with a properly chosen relaxation coefficient. Even with imprecise initial boundary estimates, it progressively approaches the correct solution. Compared to standard linearization strategies, the proposed approach achieves significantly higher accuracy. To further address the high variance inherent in Monte Carlo estimators, we propose a heteroscedastic regression-based denoising technique specifically designed for on-boundary solution estimates, filling a gap left by prior variance reduction methods that focus solely on interior points. We validate our approach through extensive evaluations on synthetic benchmarks and demonstrate its effectiveness on practical heat radiation simulations with complex geometries.
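The relaxed Picard loop is easiest to see on a scalar radiative balance, where the nonlinear T^4 term is frozen at the previous iterate and the update is blended with a relaxation coefficient. A minimal sketch (in the paper, a Monte Carlo PDE solve plays the role of the closed-form linear step):

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)

def solve_radiative_balance(T_inf=600.0, h=50.0, eps=0.9, omega=0.5, tol=1e-10):
    """Relaxed Picard iteration on h*(T_inf - T) = eps*SIGMA*T^4.
    Each sweep solves the linearized problem with the T^4 term frozen at the
    previous iterate, then blends old and new solutions with relaxation omega."""
    T = T_inf  # crude initial boundary estimate
    for _ in range(200):
        T_new = T_inf - eps * SIGMA * T**4 / h   # linear solve, frozen T^4
        T_next = (1 - omega) * T + omega * T_new
        if abs(T_next - T) < tol:
            break
        T = T_next
    return T

print(solve_radiative_balance())   # converges despite the poor initial guess
```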
0
0
cs.GR 2026-04-22

Incremental Woodbury updates speed contact sims up to 5.66x

An Efficient Multilevel Preconditioned Nonlinear Conjugate Gradient Method for Incremental Potential Contact

MAS-PNCG refreshes hierarchical preconditioners for evolving contacts at near-zero cost and beats Newton-PCG baselines without losing the non-penetration guarantee.

Figure from the paper full image
abstract click to expand
Incremental Potential Contact (IPC) guarantees intersection-free simulation but suffers from high computational costs due to the expensive Hessian assembly and linear solves required by Newton's method. While Preconditioned Nonlinear Conjugate Gradient (PNCG) avoids Hessian assembly, it has historically struggled with poor convergence in stiff, contact-rich scenarios due to the lack of effective preconditioners; simple Jacobi preconditioners fail to capture the global coupling, while advanced hierarchy-based preconditioners like Multilevel Additive Schwarz (MAS) are computationally prohibitive to rebuild at every nonlinear iteration. We present MAS-PNCG, a method that unlocks the power of hierarchical preconditioning for nonlinear optimization. Our key technical innovation is a Sparse-Input Woodbury update algorithm that incrementally adapts the fine-level MAS components to rapidly evolving contact sets. This bypasses the need for full preconditioner rebuilds, reducing maintenance cost to near-zero while capturing the complex spectral properties of the contact system. Furthermore, we replace heuristic PNCG search directions with a Hessian-aware 2D subspace minimization that optimally combines the preconditioned gradient and previous direction. We also apply a fast per-subdomain conservative CCD method that ensures penetration-free trajectories while avoiding overly restrictive global step sizes. Experiments demonstrate that our MAS-PNCG outperforms state-of-the-art Newton-PCG solvers, GIPC and StiffGIPC, both preconditioned with MAS up to 5.66$\times$ and 2.07$\times$ respectively.
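The Sparse-Input Woodbury update rests on the standard Woodbury identity, which turns a low-rank change of a system into a small dense solve instead of a full rebuild of the cached inverse. A generic numpy illustration (the paper's variant additionally exploits the sparsity of contact-set changes, which this sketch does not):

```python
import numpy as np

def woodbury_update(A_inv, U, C, V):
    """(A + U C V)^{-1} from a cached A^{-1} and a rank-k change U C V.
    With k << n the cost is O(n^2 k + k^3) instead of the O(n^3) of a full
    re-inversion, which is what makes incremental maintenance cheap."""
    S = np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U)   # small k x k solve
    return A_inv - A_inv @ U @ S @ V @ A_inv

rng = np.random.default_rng(0)
n, k = 200, 3
A = np.eye(n) * 4 + rng.standard_normal((n, n)) * 0.1
U, V = rng.standard_normal((n, k)), rng.standard_normal((k, n))
C = np.eye(k)
fast = woodbury_update(np.linalg.inv(A), U, C, V)
assert np.allclose(fast, np.linalg.inv(A + U @ C @ V), atol=1e-6)
```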
0
0
cs.GR 2026-04-22 Recognition

Superpower contours reconstruct surfaces from unsigned distances

SpUDD: Superpower Contouring of Unsigned Distance Data

Power diagrams of discrete samples define a converging proxy that yields accurate meshes without needing signs or gradients.

Figure from the paper full image
abstract click to expand
Unsigned distance functions offer a powerful and flexible implicit surface representation that, unlike their signed counterparts, allow for surfaces that are open, non-orientable, or non-manifold. We consider the problem of reconstructing arbitrary surfaces from a finite set of samples of unsigned distance data. Existing methods for mesh reconstruction from distance data rely on sign information, accurate gradients, a corresponding continuous distance function, or extensive data-dependent training. However, they fail when applied to input that is both discrete and unsigned. Inspired by this challenge, we study the power diagram generated by the distance samples and propose a novel theoretical concept, the superpower contour, which we prove converges to the true surface in the limit of sampling density. We use this superpower contour as an initial surface proxy and design an algorithm that leverages it to produce a polygonal mesh approximating the unknown true geometry. Our method vastly outperforms other conceivable strategies for the discrete unsigned distance reconstruction task, and sets the stage for future work on this mathematically rich problem.
0
0
cs.GR 2026-04-22

Reproduction rules extend the arrowhead curve to higher dimensions

Stitching Arrowhead Curves: Extending the Sierpinski Arrowhead Curve to Higher Dimensions

The method enables consistent fractal visualizations usable in applications like sweater patterns.

Figure from the paper full image
abstract click to expand
The Sierpinski triangle and the Sierpinski arrowhead curve are both defined in dimension 2 and can be used to model the same fractal. While a natural extension of the triangular construction to arbitrary dimensions exists, an analogous extension of the curve representation does not. In this article, we analyze the properties of the two-dimensional Sierpinski arrowhead curve to formulate an extension to arbitrary dimensions based on reproduction rules. Building on this formulation, we demonstrate a way to visualize such curves in a comparative manner across levels. Finally, as geometric patterns have a long history in the arts, and especially in fashion, we exemplify this visualization approach in knitwear, specifically in the yoke of a sweater.
0
0
cs.GR 2026-04-22

System turns sketches into real-time 3D head models

SketchFaceGS: Real-Time Sketch-Driven Face Editing and Generation with Gaussian Splatting

A feed-forward network infers consistent Gaussian structures from 2D lines for instant editing and free-viewpoint viewing.

Figure from the paper full image
abstract click to expand
3D Gaussian representations have emerged as a powerful paradigm for digital head modeling, achieving photorealistic quality with real-time rendering. However, intuitive and interactive creation or editing of 3D Gaussian head models remains challenging. Although 2D sketches provide an ideal interaction modality for fast, intuitive conceptual design, they are sparse, depth-ambiguous, and lack high-frequency appearance cues, making it difficult to infer dense, geometrically consistent 3D Gaussian structures from strokes - especially under real-time constraints. To address these challenges, we propose SketchFaceGS, the first sketch-driven framework for real-time generation and editing of photorealistic 3D Gaussian head models from 2D sketches. Our method uses a feed-forward, coarse-to-fine architecture. A Transformer-based UV feature-prediction module first reconstructs a coarse but geometrically consistent UV feature map from the input sketch, and then a 3D UV feature enhancement module refines it with high-frequency, photorealistic detail to produce a high-fidelity 3D head. For editing, we introduce a UV Mask Fusion technique combined with a layer-by-layer feature-fusion strategy, enabling precise, real-time, free-viewpoint modifications. Extensive experiments show that SketchFaceGS outperforms existing methods in both generation fidelity and editing flexibility, producing high-quality, editable 3D heads from sketches in a single forward pass.
0
0
cs.GR 2026-04-22

New Python tool turns SUMO data into 3D traffic videos

sumo3Dviz: A three dimensional traffic visualisation

Lightweight package creates external and driver views from standard outputs, supporting psychology and acceptance studies.

Figure from the paper full image
abstract click to expand
Traffic microsimulation tools such as SUMO generate rich spatio-temporal data describing individual vehicle movements and interactions, and support the development of control strategies. While numerical outputs and 2D visualisations are sufficient for many technical analyses, they are often inadequate for applications that require intuitive interpretation, effective communication, or human-centred evaluation. In particular, user studies in mobility psychology, acceptance research, and virtual experience stated-preference experiments require realistic visualisations that reflect how traffic scenarios are perceived from a human perspective. This paper introduces sumo3Dviz, a lightweight, open-source 3D visualisation pipeline for SUMO traffic simulations. It converts standard SUMO simulation outputs, such as vehicle trajectories and signal states, into high-quality 3D renderings using a Python-based framework. In contrast to heavyweight game-engine-based approaches or tightly coupled co-simulation frameworks, sumo3Dviz is designed to be simple, scriptable, and reproducible. The tool is installable through the pip package manager, runs across operating systems, and works independently of any proprietary software or licenses. sumo3Dviz supports both external camera views and first-person perspectives, enabling cinematic overviews as well as driver-level experiences. The rendering process is optimized for batch video generation, making it suitable for large-scale scenario visualisation, educational demonstrations, and automated experiment pipelines. A key technical challenge addressed by the tool is trajectory interpolation and orientation smoothing, enabling visually coherent motion from discrete simulation outputs. Source code on the project's GitHub page: https://github.com/DerKevinRiehl/sumo3dviz/.
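Trajectory interpolation and orientation smoothing, the key technical challenge named above, amount to resampling positions and filtering unwrapped headings. A minimal sketch of that pattern (our code, not sumo3Dviz's implementation):

```python
import numpy as np

def resample_trajectory(t, xy, fps=30):
    """Linear position interpolation plus heading smoothing with angle
    unwrapping: the two steps needed to turn discrete simulator states into
    visually coherent motion."""
    tq = np.arange(t[0], t[-1], 1.0 / fps)
    x = np.interp(tq, t, xy[:, 0])
    y = np.interp(tq, t, xy[:, 1])
    heading = np.arctan2(np.gradient(y), np.gradient(x))
    heading = np.unwrap(heading)            # remove 2*pi jumps before filtering
    kernel = np.ones(5) / 5.0               # simple moving-average smoother
    heading = np.convolve(heading, kernel, mode="same")
    return tq, np.stack([x, y], axis=1), heading

t  = np.array([0.0, 1.0, 2.0, 3.0])                    # 1 Hz simulator output
xy = np.array([[0, 0], [5, 0], [9, 3], [10, 8]], float)
tq, pos, heading = resample_trajectory(t, xy)          # 30 Hz render frames
```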
0
0
cs.GR 2026-04-22

JSON from panoramic views lets LLMs give NPCs spatial awareness

Empowering NPC Dialogue with Environmental Context Using LLMs and Panoramic Images

Semantic segmentation of surroundings provides context so characters reference objects naturally, an approach users preferred in tests.

Figure from the paper full image
abstract click to expand
We present an approach for enhancing non-playable characters (NPCs) in games by combining large language models (LLMs) with computer vision to provide contextual awareness of their surroundings. Conventional NPCs typically rely on pre-scripted dialogue and lack spatial understanding, which limits their responsiveness to player actions and reduces overall immersion. Our method addresses these limitations by capturing panoramic images of an NPC's environment and applying semantic segmentation to identify objects and their spatial positions. The extracted information is used to generate a structured JSON representation of the environment, combining object locations derived from segmentation with additional scene graph data within the NPC's bounding sphere, encoded as directional vectors. This representation is provided as input to the LLM, enabling NPCs to incorporate spatial knowledge into player interactions. As a result, NPCs can dynamically reference nearby objects, landmarks, and environmental features, leading to more believable and engaging gameplay. We describe the technical implementation of the system and evaluate it in two stages. First, an expert interview was conducted to gather feedback and identify areas for improvement. After integrating these refinements, a user study was performed, showing that participants preferred the context-aware NPCs over a non-context-aware baseline, confirming the effectiveness of the proposed approach.
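The structured environment representation can be pictured as a small JSON document of labels, direction vectors, and distances inside the NPC's bounding sphere. A sketch with an illustrative schema (field names are ours, not the paper's format):

```python
import json
import numpy as np

def build_npc_context(npc_pos, objects, radius=10.0):
    """Builds the kind of structured scene description fed to the LLM: each
    object inside the NPC's bounding sphere is reported with a unit direction
    vector and a distance, so the model can reason about relative placement."""
    npc = np.asarray(npc_pos, float)
    ctx = {"npc_position": npc.tolist(), "visible_objects": []}
    for name, pos in objects.items():
        d = np.asarray(pos, float) - npc
        dist = float(np.linalg.norm(d))
        if 0 < dist <= radius:
            ctx["visible_objects"].append({
                "label": name,
                "direction": (d / dist).round(3).tolist(),
                "distance": round(dist, 2),
            })
    return json.dumps(ctx, indent=2)

print(build_npc_context([0, 0, 0], {"well": [3, 0, 1], "cart": [8, 2, 0]}))
```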
0
0
cs.GR 2026-04-22

Transport mapping fills more UV slots in Gaussian Splatting

OT-UVGS: Revisiting UV Mapping for Gaussian Splatting as a Capacity Allocation Problem

A separable assignment that respects global distribution raises PSNR and cuts wasted capacity on the same fixed budget.

Figure from the paper full image
abstract click to expand
UV-parameterized Gaussian Splatting (UVGS) maps an unstructured set of 3D Gaussians to a regular UV tensor, enabling compact storage and explicit control of representation capacity. Existing UVGS, however, uses a deterministic spherical projection to assign Gaussians to UV locations. Because this mapping ignores the global Gaussian distribution, it often leaves many UV slots empty while causing frequent collisions in dense regions. We reinterpret UV mapping as a capacity-allocation problem under a fixed UV budget and propose OT-UVGS, a lightweight, separable one-dimensional optimal-transport-inspired mapping that globally couples assignments while preserving the original UVGS representation. The method is implemented with rank-based sorting, has O(N log N) complexity for N Gaussians, and can be used as a drop-in replacement for spherical UVGS. Across 184 object-centric scenes and the Mip-NeRF dataset, OT-UVGS consistently improves peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS) under the same UV resolution and per-slot capacity (K=1). These gains are accompanied by substantially better UV utilization, including higher non-empty slot ratios, fewer collisions, and higher Gaussian retention. Our results show that revisiting the mapping alone can unlock a significant fraction of the latent capacity of UVGS.
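Rank-based sorting gives the monotone coupling, which is the optimal transport plan in 1D for convex costs, in O(N log N). A minimal sketch that assigns sorted scalar keys to UV texels (the scalar key here is illustrative; the paper's separable construction may differ):

```python
import numpy as np

def rank_assign_uv(keys, uv_res):
    """1D optimal-transport-style assignment by rank: sorting the per-Gaussian
    keys (e.g. a spherical angle) and taking UV slots in scan order yields the
    monotone, collision-free coupling at O(N log N)."""
    n_slots = uv_res * uv_res
    n = min(len(keys), n_slots)
    order = np.argsort(keys)[:n]            # Gaussians in key order
    slots = np.arange(n)                    # UV slots in scan order
    uv = np.stack([slots % uv_res, slots // uv_res], axis=1)
    mapping = np.full((len(keys), 2), -1)
    mapping[order] = uv                     # Gaussian i -> its UV texel
    return mapping

keys = np.random.default_rng(1).random(1000)
m = rank_assign_uv(keys, 32)                # 32x32 = 1024 slots available
assert len({tuple(r) for r in m if r[0] >= 0}) == 1000   # no collisions
```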
cs.GR 2026-04-17

Dividing space and truncating displacements prevents penetrations

Divide and Truncate: A Penetration and Inversion Free Framework for Coupled Multi-physics Systems

A material- and solver-agnostic method couples rigid bodies, soft bodies, shells, and rods by post-processing any optimizer for stable multi-body contact.

We present Divide and Truncate (DAT), a unified framework for coupling multi-physics systems through penetration-free collision handling, including rigid bodies, volumetric soft bodies, thin shells, rods, and animated objects. By partitioning the ambient space into exclusive regions and truncating displacements to remain within them, DAT guarantees penetration-free contact resolution. Our \emph{Planar-DAT} variant further refines this by restricting only motion toward nearby surfaces, leaving tangential movement unconstrained, which addresses the artificial damping and deadlock problems of previous works. The framework is material-agnostic: each object responds to contact without knowledge of the opposing body's physics. Our method is also solver-agnostic; it can be integrated seamlessly with any iterative optimizer as a post-processing step, enabling robust simulation of complex multi-body interactions.
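As a minimal sketch of the truncation step, assume a point owns the half-space above a separating plane; only its motion toward the plane is clamped, leaving tangential motion free, in the spirit of Planar-DAT. The function name and margin are illustrative, and the paper's actual exclusive regions come from a space partition rather than a single plane.

```python
import numpy as np

def truncate_displacement(x, dx, plane_n, plane_d, margin=1e-4):
    """Clamp a displacement so the point stays inside its exclusive region.

    The region is the half-space {p : n.p - d >= margin} owned by this
    object. Only the component of `dx` moving toward the separating plane
    is scaled back (the Planar-DAT idea); tangential motion is untouched.
    """
    n = plane_n / np.linalg.norm(plane_n)
    normal_step = float(np.dot(dx, n))            # signed motion along n
    if normal_step >= 0.0:
        return dx                                  # moving away: no limit
    gap = float(np.dot(x, n)) - plane_d - margin   # room left before contact
    if -normal_step <= gap:
        return dx                                  # step fits inside region
    # shrink only the normal component so the point lands on the margin
    return dx + (-(normal_step + gap)) * n

x = np.array([0.0, 1.0, 0.0])
dx = np.array([0.3, -2.0, 0.0])                   # would tunnel through y=0
print(truncate_displacement(x, dx, np.array([0.0, 1.0, 0.0]), 0.0))
# -> [ 0.3 -0.9999  0. ]: tangential x-motion preserved, penetration blocked
```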
cs.GR 2026-04-17

B-Rep topology alone yields stable part partitions for any CAD mesh

STEP-Parts: Geometric Partitioning of Boundary Representations for Large-Scale CAD Processing

Merging same-type faces under tangent continuity produces instance labels that transfer unchanged across tessellations, supplying consistent supervision for CAD learning.

Many CAD learning pipelines discretize Boundary Representations (B-Reps) into triangle meshes, discarding analytic surface structure and topological adjacency and thereby weakening consistent instance-level analysis. We present STEP-Parts, a deterministic CAD-to-supervision toolchain that extracts geometric instance partitions directly from raw STEP B-Reps and transfers them to tessellated carriers through retained source-face correspondence, yielding instance labels and metadata for downstream learning and evaluation. The construction merges adjacent B-Rep faces only when they share the same analytic primitive type and satisfy a near-tangent continuity criterion. On ABC, same-primitive dihedral angles are strongly bimodal, yielding a threshold-insensitive low-angle regime for part extraction. Because the partition is defined on intrinsic B-Rep topology rather than on a particular triangulation, the resulting boundaries remain stable under changes in tessellation. Applied to the DeepCAD subset of ABC, the pipeline processes approximately 180,000 models in under six hours on a consumer CPU. We release code and precomputed labels, and show that STEP-Parts serves both as a tessellation-robust geometric reference and as a useful supervision source in two downstream probes: an implicit reconstruction--segmentation network and a dataset-level point-based backbone.
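A hedged sketch of the merge rule with union-find; the 15-degree threshold and the input encoding are placeholders, not the paper's values.

```python
def merge_brep_faces(faces, adjacency, angle_thresh_deg=15.0):
    """Group B-Rep faces into part instances with union-find.

    `faces` maps face_id -> analytic primitive type (e.g. "plane",
    "cylinder"); `adjacency` lists (face_a, face_b, dihedral_deg) for
    edges shared by two faces. Faces merge only when they have the same
    primitive type and meet near-tangentially.
    """
    parent = {f: f for f in faces}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b, dihedral in adjacency:
        if faces[a] == faces[b] and dihedral <= angle_thresh_deg:
            parent[find(a)] = find(b)

    return {f: find(f) for f in faces}  # face_id -> instance label

labels = merge_brep_faces(
    {0: "plane", 1: "plane", 2: "cylinder"},
    [(0, 1, 2.5), (1, 2, 90.0)],
)
print(labels)  # faces 0 and 1 share a label; face 2 stays separate
```

Because the rule uses only face types and shared-edge angles from the B-Rep itself, the labels carry over to any tessellation of the same model.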
cs.GR 2026-04-16

Dual greedy method simplifies convex hulls to any size in O(n log n)

Progressive Convex Hull Simplification

Minimizes added volume or area while guaranteeing the result always contains the original shape for safe collision and distance queries.

Convex hulls are useful as tight bounding proxies for a variety of tasks including collision detection, ray intersection, and distance computation. Unfortunately, the complexity of polyhedral convex hulls grows linearly with their input. We consider the problem of conservatively simplifying a convex hull to a specified number of half-spaces while minimizing added volume or surface area. By working in the dual representation, we propose an efficient $O(n \log n)$ greedy optimization. In comparisons, we show that existing methods either exhibit poor efficiency, tightness or safety. We demonstrate the success of our method on a variety of input shapes and downstream application domains.
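The paper's method operates in the dual with a heap to reach O(n log n); purely as an illustration of the conservative objective, here is a 2D primal analog that repeatedly deletes the polygon edge whose removal adds the least area, replacing its endpoints by the intersection of the two neighboring support lines (quadratic re-scanning for clarity; assumes neighboring edges are not parallel).

```python
import math

def line_intersect(p1, p2, p3, p4):
    """Intersection of the infinite lines p1p2 and p3p4 (not parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    den = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / den,
            (a * (y3 - y4) - (y1 - y2) * b) / den)

def tri_area(a, b, c):
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / 2

def simplify_convex_polygon(pts, target):
    """Greedily drop edges; the result always contains the input polygon."""
    pts = list(pts)
    while len(pts) > target:
        best = None
        for i in range(len(pts)):
            b, c = pts[i], pts[(i + 1) % len(pts)]     # edge to remove
            a, d = pts[i - 1], pts[(i + 2) % len(pts)]  # its neighbors
            q = line_intersect(a, b, c, d)  # extend neighbor support lines
            cost = tri_area(b, c, q)        # area added outside the hull
            if best is None or cost < best[0]:
                best = (cost, i, q)
        _, i, q = best
        j = (i + 1) % len(pts)
        if j == 0:
            pts = [q] + pts[1:i]            # wrap-around removal
        else:
            pts[i:j + 1] = [q]
    return pts

octagon = [(math.cos(k * math.pi / 4), math.sin(k * math.pi / 4))
           for k in range(8)]
print(len(simplify_convex_polygon(octagon, 5)))  # 5
```

Dropping an edge is equivalent to deleting one half-plane from the H-representation, which is exactly why the simplified shape always contains the original.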
cs.GR 2026-04-16

Adaptive depth map accelerates neural implicit surface sampling

SAND: Spatially Adaptive Network Depth for Fast Sampling of Neural Implicit Surfaces

A depth map tells the network how many layers to evaluate at each point, saving work far from the surface and on its simpler parts.

Implicit neural representations are powerful for geometric modeling, but their practical use is often limited by the high computational cost of network evaluations. We observe that implicit representations require progressively lower accuracy as query points move farther from the target surface, and that even within the same iso-surface, representation difficulty varies spatially with local geometric complexity. However, conventional neural implicit models evaluate all query points with the same network depth and computational cost, ignoring this spatial variation and thereby incurring substantial computational waste. Motivated by this observation, we propose an efficient neural implicit geometry representation framework with spatially adaptive network depth (SAND). SAND leverages a volumetric network-depth map together with a tailed multi-layer perceptron (T-MLP) to model the implicit representation. The volumetric depth map records, for each spatial region, the network depth required to achieve sufficient accuracy, while the T-MLP is a modified MLP designed to learn implicit functions such as signed distance functions, where an output branch, referred to as a tail, is attached to each hidden layer. This design allows network evaluation to terminate adaptively without traversing the full network and directs computational resources to geometrically important and complex regions, improving efficiency while preserving high-fidelity representations. Extensive experimental results demonstrate that our approach can significantly improve the inference-time query speed of implicit neural representations.
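A minimal PyTorch sketch of the tailed-MLP early-exit pattern; the per-point depths are assumed to come from the volumetric depth map, and the widths and layer counts are placeholders.

```python
import torch
import torch.nn as nn

class TailedMLP(nn.Module):
    """MLP with an SDF output head ("tail") after every hidden layer.

    Evaluation stops at the depth requested for each query point, so easy
    regions (far from the surface, or geometrically simple) pay for fewer
    layers. A sketch of the idea, not the paper's exact architecture.
    """
    def __init__(self, width=64, depth=4):
        super().__init__()
        self.inp = nn.Linear(3, width)
        self.hidden = nn.ModuleList(nn.Linear(width, width) for _ in range(depth))
        self.tails = nn.ModuleList(nn.Linear(width, 1) for _ in range(depth))

    def forward(self, x, depths):
        """x: (N, 3) query points; depths: (N,) layers to run, in 1..depth."""
        h = torch.relu(self.inp(x))
        out = torch.zeros(x.shape[0], 1)
        active = torch.arange(x.shape[0])        # points still being evaluated
        for i, (layer, tail) in enumerate(zip(self.hidden, self.tails)):
            h = torch.relu(layer(h))
            exit_mask = depths[active] == i + 1  # exit at the requested depth
            if exit_mask.any():
                out[active[exit_mask]] = tail(h[exit_mask])
                active = active[~exit_mask]
                h = h[~exit_mask]
            if active.numel() == 0:
                break
        return out

model = TailedMLP()
pts = torch.randn(5, 3)
depths = torch.tensor([1, 1, 2, 4, 4])  # e.g. looked up from a voxel depth map
print(model(pts, depths).shape)          # torch.Size([5, 1])
```

Slicing `h` down to the surviving points is what converts the per-point depth budget into actual saved computation.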
cs.GR 2026-04-16

One flow model unifies motion generation

A Unified Conditional Flow for Motion Generation, Editing, and Intra-Structural Retargeting

Rectified flow jointly conditioned on text and skeletons performs all three tasks zero-shot by switching which conditioning signal is modulated at inference.

Text-driven motion editing and intra-structural retargeting, where source and target share topology but may differ in bone lengths, are traditionally handled by fragmented pipelines with incompatible inputs and representations: editing relies on specialized generative steering, while retargeting is deferred to geometric post-processing. We present a unifying perspective where both tasks are cast as instances of conditional transport within a single generative framework. By leveraging recent advances in flow matching, we demonstrate that editing and retargeting are fundamentally the same generative task, distinguished only by which conditioning signal, semantic or structural, is modulated during inference. We implement this vision via a rectified-flow motion model jointly conditioned on text prompts and target skeletal structures. Our architecture extends a DiT-style transformer with per-joint tokenization and explicit joint self-attention to strictly enforce kinematic dependencies, while a multi-condition classifier-free guidance strategy balances text adherence with skeletal conformity. Experiments on SnapMoGen and a multi-character Mixamo subset show that a single trained model supports text-to-motion generation, zero-shot editing, and zero-shot intra-structural retargeting. This unified approach simplifies deployment and improves structural consistency compared to task-specific baselines.
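The abstract names a multi-condition classifier-free guidance strategy without the formula; the sketch below uses the standard compositional form with a hypothetical `model(x, t, text, skel)` interface and illustrative guidance weights.

```python
import torch

def multi_condition_cfg(v_uncond, v_text, v_skel, w_text=2.0, w_skel=1.5):
    """Push the unconditional velocity along each condition's direction.

    Standard compositional CFG for two conditions; the weights trade off
    text adherence against skeletal conformity and are illustrative here.
    """
    return (v_uncond
            + w_text * (v_text - v_uncond)
            + w_skel * (v_skel - v_uncond))

def sample_rectified_flow(model, text, skel, shape, steps=20):
    """Euler-integrate dx/dt = v(x, t, c) from noise (t=0) to motion (t=1)."""
    x = torch.randn(shape)
    dt = 1.0 / steps
    for k in range(steps):
        t = torch.full((shape[0],), k * dt)
        v_u = model(x, t, None, None)   # both conditions dropped
        v_t = model(x, t, text, None)   # text only
        v_s = model(x, t, None, skel)   # skeleton only
        x = x + dt * multi_condition_cfg(v_u, v_t, v_s)
    return x
```

Editing versus retargeting then amounts to which of `text` and `skel` is varied between source and target while the other is held fixed.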
cs.GR 2026-04-15

Conformal abstention cuts TCR-pMHC errors from 18.7% to 10.9% at 80% coverage

Calibrated Abstention for Reliable TCR--pMHC Binding Prediction under Epitope Shift

Temperature scaling plus finite-sample coverage rules keep predictions reliable when epitopes are unseen during training.

Predicting T-cell receptor (TCR)--peptide-MHC (pMHC) binding is central to vaccine design and T-cell therapy, yet deployed models frequently encounter epitopes unseen during training, causing silent overconfidence and unreliable prioritization. We address this by framing TCR--pMHC prediction as a \emph{selective prediction} problem: a calibrated model should either output a trustworthy confidence score or explicitly abstain. Concretely, we (1) introduce a dual-encoder architecture encoding both CDR3$\alpha$/CDR3$\beta$ and peptide sequences via a pre-trained protein language model; (2) apply temperature scaling to correct systematic probability miscalibration; and (3) impose a conformal abstention rule that provides finite-sample coverage guarantees at a user-specified target error rate. Evaluated under three split strategies -- random, epitope-held-out, and distance-aware -- our method achieves AUROC 0.813 and ECE 0.043 under the challenging epitope-held-out protocol, reducing ECE by 69.7\% relative to an uncalibrated baseline. At 80\% coverage, the selective model further reduces error rate from 18.7\% to 10.9\%, demonstrating that calibrated abstention enables principled coverage-risk trade-offs aligned with practical screening budgets.
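One plausible instantiation of temperature scaling plus a conformal-style abstention cutoff with a finite-sample correction; this is a hedged sketch, not necessarily the paper's exact rule, and the calibration data below is synthetic.

```python
import numpy as np

def temperature_scale(logits, T):
    """Calibrated binding probability for a binary head (sigmoid with T)."""
    return 1.0 / (1.0 + np.exp(-logits / T))

def abstention_threshold(cal_conf, cal_correct, alpha=0.2):
    """Confidence cutoff whose accepted set keeps error at most alpha.

    cal_conf: calibrated confidences on held-out calibration data;
    cal_correct: whether each calibration prediction was right.
    Uses an (errors + 1) / (accepted + 1) finite-sample correction.
    """
    order = np.argsort(-cal_conf)                 # most confident first
    errors = np.cumsum(~cal_correct[order])
    accepted = np.arange(1, len(order) + 1)
    ok = (errors + 1) / (accepted + 1) <= alpha
    if not ok.any():
        return np.inf                              # abstain on everything
    k = np.where(ok)[0].max()                      # largest accepted prefix
    return cal_conf[order][k]

rng = np.random.default_rng(0)
conf = temperature_scale(rng.normal(size=1000), T=1.5)
correct = rng.random(1000) < conf                  # toy calibration labels
tau = abstention_threshold(conf, correct, alpha=0.2)
print("predict only when confidence >=", round(float(tau), 3))
```

At deployment, predictions with confidence below the cutoff are withheld, which is how the coverage-risk trade-off in the abstract is exercised.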
cs.GR 2026-04-15

Parallel voxelizer plus SGGX clustering improves microgeometry LoD accuracy

Fast Voxelization and Level of Detail for Microgeometry Rendering

The method builds multi-scale voxel data quickly and aggregates anisotropic scattering better than prior level-of-detail techniques for path tracing.

Many materials show anisotropic light scattering patterns due to the shape and local alignment of their underlying microstructures: surfaces with small elements such as fibers, or the ridges of a brushed metal, are very sparse and require a high spatial resolution to be properly represented as a volume. The acquisition of voxel data from such objects is a time- and memory-intensive task, and most rendering approaches require an additional Level-of-Detail (LoD) data structure to aggregate the visual appearance, as observed from multiple distances, in order to reduce the number of samples computed per pixel (e.g., MIP mapping). In this work we introduce, first, an efficient parallel voxelization method designed to facilitate fast data aggregation at multiple resolution levels, and, second, a novel representation based on hierarchical SGGX clustering that provides better accuracy than baseline methods. We validate our approach with a CUDA-based implementation of the voxelizer, tested both on triangle meshes and volumetric fabrics modeled with explicit fibers. Finally, we show the results generated with a path tracer based on the proposed LoD rendering model.
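The SGGX machinery is not spelled out in the abstract; as an assumed sketch, a common construction fits each voxel's microflake orientation distribution with the mean outer product of its normals and builds coarser LoD levels by weighted averaging of child matrices.

```python
import numpy as np

def sggx_from_normals(normals):
    """Fit one SGGX matrix to a set of microflake normals.

    SGGX encodes an orientation distribution as a 3x3 SPD matrix; a
    simple fit averages the outer products of the unit normals. The
    paper's clustering may fit differently; this is a common baseline.
    """
    n = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    return np.einsum('ni,nj->ij', n, n) / len(n)

def downsample_sggx(children, weights):
    """Aggregate child-voxel SGGX matrices into one coarser voxel."""
    w = np.asarray(weights, dtype=float)
    return np.tensordot(w / w.sum(), children, axes=1)

normals = np.array([[0.1, 0.0, 1.0], [-0.1, 0.05, 1.0], [0.0, -0.1, 1.0]])
S = sggx_from_normals(normals)
print(np.round(S, 3))  # strongly aligned with +z: large S[2, 2] entry
```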
cs.GR 2026-04-15

Neural compression trims lightmap storage for dynamic lighting

Neural Dynamic GI: Random-Access Neural Compression for Temporal Lightmaps in Dynamic Lighting Environments

Feature maps and lightweight networks replace full sets of lightmaps while keeping high-quality global illumination and modest runtime cost.

High-quality global illumination (GI) in real-time rendering is commonly achieved using precomputed lighting techniques, with lightmap as the standard choice. To support GI for static objects in dynamic lighting environments, multiple lightmaps at different lighting conditions need to be precomputed, which incurs substantial storage and memory overhead. To overcome this limitation, we propose Neural Dynamic GI (NDGI), a novel compression technique specifically designed for temporal lightmap sets. Our method utilizes multi-dimensional feature maps and lightweight neural networks to integrate the temporal information instead of storing multiple sets explicitly, which significantly reduces the storage size of lightmaps. Additionally, we introduce a block compression (BC) simulation strategy during the training process, which enables BC compression on the final generated feature maps and further improves the compression ratio. To enable efficient real-time decompression, we also integrate a virtual texturing (VT) system with our neural representation. Compared with prior methods, our approach achieves high-quality dynamic GI while maintaining remarkably low storage and memory requirements, with only modest real-time decompression overhead. To facilitate further research in this direction, we will release our temporal lightmap dataset precomputed in multiple scenes featuring diverse temporal variations.
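A hedged sketch of the general pattern, a learned feature map plus a tiny decoder standing in for a stack of per-condition lightmaps; resolutions, channel counts, and the scalar conditioning are guesses, not the paper's configuration.

```python
import torch
import torch.nn as nn

class NeuralLightmap(nn.Module):
    """Decode a temporal lightmap texel from a feature map + tiny MLP.

    Instead of storing one lightmap per lighting condition, store one
    multi-channel feature map and decode any condition t on the fly.
    Shapes and sizes here are illustrative, not the paper's.
    """
    def __init__(self, res=256, feat_ch=8):
        super().__init__()
        self.features = nn.Parameter(torch.randn(1, feat_ch, res, res) * 0.01)
        self.decoder = nn.Sequential(
            nn.Linear(feat_ch + 1, 32), nn.ReLU(), nn.Linear(32, 3))

    def forward(self, uv, t):
        """uv: (N, 2) in [-1, 1]; t: (N,) lighting condition in [0, 1]."""
        grid = uv.view(1, -1, 1, 2)
        f = nn.functional.grid_sample(
            self.features, grid, align_corners=True)     # (1, C, N, 1)
        f = f.squeeze(0).squeeze(-1).transpose(0, 1)      # (N, C)
        return self.decoder(torch.cat([f, t.unsqueeze(1)], dim=1))  # RGB

lm = NeuralLightmap()
print(lm(torch.rand(4, 2) * 2 - 1, torch.rand(4)).shape)  # torch.Size([4, 3])
```

The storage win comes from the feature map replacing many full lightmaps; the paper additionally block-compresses the feature map and streams it through virtual texturing.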
cs.GR 2026-04-15

Transformer converts volumes to Gaussians in one forward pass

VVGT: Visual Volume-Grounded Transformer

VVGT removes per-scene optimization and surface assumptions while delivering interactive speeds and zero-shot performance on new data.

Volumetric visualization has long been dominated by Direct Volume Rendering (DVR), which operates on dense voxel grids and suffers from limited scalability as resolution and interactivity demands increase. Recent advances in 3D Gaussian Splatting (3DGS) offer a representation-centric alternative; however, existing volumetric extensions still depend on costly per-scene optimization, limiting scalability and interactivity. We present VVGT (Visual Volume-Grounded Transformer), a feed-forward, representation-first framework that directly maps volumetric data to a 3D Gaussian Splatting representation, advancing a new paradigm for volumetric visualization beyond DVR. Unlike prior feed-forward 3DGS methods designed for surface-centric reconstruction, VVGT explicitly accounts for volumetric rendering, where each pixel aggregates contributions along a ray. VVGT employs a dual-transformer network and introduces Volume Geometry Forcing, an epipolar cross-attention mechanism that integrates multi-view observations into distributed 3D Gaussian primitives without surface assumptions. This design eliminates per-scene optimization while enabling accurate volumetric representations. Extensive experiments show that VVGT achieves high-quality visualization with orders-of-magnitude faster conversion, improved geometric consistency, and strong zero-shot generalization across diverse datasets, enabling truly interactive and scalable volumetric visualization. The code will be publicly released upon acceptance.
cs.GR 2026-04-14 2 theorems

Integer twists on edges turn meshes into linked knot structures

Twisted Edges: A Unified Framework for Designing Linked Knot (LK) Structures Using Labeled Non-Manifold Surface Meshes

Generalizing from binary to arbitrary integer labels on non-manifold surfaces allows partial connectivity, hinges, and correspondence to 4D knotted surfaces.

We present Twisted Edges, a unified framework for designing Linked Knot (LK) structures using labeled non-manifold surface meshes. While the concept of edge twists, originating in topological graph theory, is foundational to these designs, prior approaches have been strictly limited to binary states. We identify this restriction as a critical barrier; binary twisting fails to capture the full spectrum of topological possibilities, rendering a vast class of structural and dynamic behaviors inaccessible. To overcome this limitation, we generalize the twist formulation to support arbitrary integer twist labels. This expansion reveals that while zero twists may introduce disconnections, applying even twists to 2-manifold meshes robustly preserves connectivity, transforming surfaces into fully connected, chainmail-like structures where faces form consistently linked cycles. Furthermore, we extend this framework to non-manifold meshes, where specific integer assignments prevent cycle merging. This capability, unattainable with binary methods, enables the design of partial connectivity and functional hinges, supporting dynamic folding and articulation. Theoretically, we show that these integer-twisted meshes correspond to knotted surfaces in four dimensions, with LK structures arising as their immersions into $\mathbb{R}^3$. By breaking the binary constraint, this work establishes a coherent paradigm for the systematic exploration of previously unstudied woven and articulated structures.
cs.GR 2026-04-14

Multi-modal LLM predicts MOOC satisfaction more accurately than text alone

Predicting User Satisfaction in Online Education Platforms: A Large Language Model Based Multi-Modal Review Mining Framework

Framework combines topic structures, deep sentiment, and activity logs to forecast learner satisfaction on public course platforms.

Online education platforms have experienced explosive growth over the past decade, generating massive volumes of user-generated content in the form of reviews, ratings, and behavioral logs. These heterogeneous signals provide unprecedented opportunities for understanding learner satisfaction, which is a critical determinant of course retention, engagement, and long-term learning outcomes. However, accurately predicting satisfaction remains challenging due to the short length, noise, contextual dependency, and multi-dimensional nature of online reviews. In this paper, we propose a unified \textbf{Large Language Model (LLM)-based multi-modal framework} for predicting both platform-level and course-level learner satisfaction. The proposed framework integrates three complementary information sources: (1) short-text topic distributions that capture latent thematic structures, (2) contextualized sentiment representations learned from pretrained Transformer-based language models, and (3) behavioral interaction features derived from learner activity logs. These heterogeneous representations are fused within a hybrid regression architecture to produce accurate satisfaction predictions. We conduct extensive experiments on large-scale MOOC review datasets collected from multiple public platforms. The experimental results demonstrate that the proposed LLM-based multi-modal framework consistently outperforms traditional text-only models, shallow sentiment baselines, and single-modality regression approaches. Comprehensive ablation studies further validate the necessity of jointly modeling topic semantics, deep sentiment representations, and behavioral analytics. Our findings highlight the critical role of large-scale contextual language representations in advancing learning analytics and provide actionable insights for platform design, course improvement, and personalized recommendation.
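As a hedged sketch of the late-fusion idea on synthetic placeholder features (the paper does not specify its encoders or regressor at this level of detail), topic, sentiment, and behavior blocks are simply concatenated before regression.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic placeholder features: shapes only, not real MOOC data.
rng = np.random.default_rng(0)
n = 500
topics = rng.dirichlet(np.ones(10), size=n)    # latent topic mixtures
sentiment = rng.normal(size=(n, 16))           # pooled LM sentiment embedding
behavior = rng.normal(size=(n, 4))             # e.g. logins, watch time
y = rng.uniform(1.0, 5.0, size=n)              # satisfaction ratings (toy)

# Simple late fusion: concatenate the three blocks, then regress.
X = np.hstack([topics, sentiment, behavior])
model = Ridge(alpha=1.0).fit(X[:400], y[:400])
print("held-out R^2 (meaningless on random data):",
      round(model.score(X[400:], y[400:]), 3))
```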
cs.GR 2026-04-14

INR features enable accurate ROI classification with sparse labels

NeuVolEx: Implicit Neural Features for Volume Exploration

Representations learned during neural volume training, when augmented, classify regions accurately and cluster viewpoints to reveal distinct regions of interest.

Direct volume rendering (DVR) aims to help users identify and examine regions of interest (ROIs) within volumetric data, and feature representations that support effective ROI classification and clustering play a fundamental role in volume exploration. Existing approaches typically rely on either explicit local feature representations or implicit convolutional feature representations learned from raw volumes. However, explicit local feature representations are limited in capturing broader geometric patterns and spatial correlations, while implicit convolutional feature representations do not necessarily ensure robust performance in practice, where user supervision is typically limited. Meanwhile, implicit neural representations (INRs) have recently shown strong promise in DVR for volume compression, owing to their ability to compactly parameterize continuous volumetric fields. In this work, we propose NeuVolEx, a neural volume exploration approach that extends the role of INRs beyond volume compression. Unlike prior compression methods that focus on INR outputs, NeuVolEx leverages feature representations learned during INR training as a robust basis for volume exploration. To better adapt these feature representations to exploration tasks, we augment a base INR with a structural encoder and a multi-task learning scheme that improve spatial coherence for ROI characterization. We validate NeuVolEx on two fundamental volume exploration tasks: image-based transfer function (TF) design and viewpoint recommendation. NeuVolEx enables accurate ROI classification under sparse user supervision for image-based TF design and supports unsupervised clustering to identify compact complementary viewpoints that reveal different ROI clusters. Experiments on diverse volume datasets with varying modalities and ROI complexities demonstrate that NeuVolEx improves both effectiveness and usability over prior methods.
cs.GR 2026-04-13

Complex-valued method upsamples holograms while keeping linear depth

CV-HoloSR: Hologram to hologram super-resolution through volume-upsampling three-dimensional scenes

CV-HoloSR avoids quadratic distortion during volume scaling, raises perceptual quality 32 percent, and trains on 200 samples in 5.2 hours.

Existing hologram super-resolution (HSR) methods primarily focus on angle-of-view expansion. Adapting them for volumetric spatial up-sampling introduces severe quadratic depth distortion, degrading 3D focal accuracy. We propose CV-HoloSR, a complex-valued HSR framework specifically designed to preserve physically consistent linear depth scaling during volume up-sampling. Built upon a Complex-Valued Residual Dense Network (CV-RDN) and optimized with a novel depth-aware perceptual reconstruction loss, our model effectively suppresses over-smoothing to recover sharp, high-frequency interference patterns. To support this, we introduce a comprehensive large-depth-range dataset with resolutions up to 4K. Furthermore, to overcome the inherent depth bias of pre-trained encoders when scaling to massive target volumes, we integrate a parameter-efficient fine-tuning strategy utilizing complex-valued Low-Rank Adaptation (LoRA). Extensive numerical and physical optical experiments demonstrate our method's superiority. CV-HoloSR achieves a 32% improvement in perceptual realism (LPIPS of 0.2001) over state-of-the-art baselines. Additionally, our tailored LoRA strategy requires merely 200 samples, reducing training time by over 75% (from 22.5 to 5.2 hours) while successfully adapting the pre-trained backbone to unseen depth ranges and novel display configurations.
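A generic complex-valued convolution block of the kind such networks build on, carrying the real and imaginary parts of the field separately; this is a standard construction, not the paper's exact CV-RDN layer.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution via paired real convolutions.

    (a + ib) * (w_r + i w_i) = (a*w_r - b*w_i) + i(a*w_i + b*w_r), so the
    real and imaginary parts of the complex field are processed with two
    shared real-valued kernels.
    """
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.wr = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.wi = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)

    def forward(self, real, imag):
        return (self.wr(real) - self.wi(imag),
                self.wi(real) + self.wr(imag))

conv = ComplexConv2d(1, 8)
re, im = conv(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))
print(re.shape, im.shape)  # both torch.Size([1, 8, 64, 64])
```

Keeping amplitude and phase coupled through the complex product is what lets such networks preserve the interference patterns a hologram encodes.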
