Open3D: A Modern Library for 3D Data Processing
46 Pith papers cite this work.
abstract
Open3D is an open-source library that supports rapid development of software that deals with 3D data. The Open3D frontend exposes a set of carefully selected data structures and algorithms in both C++ and Python. The backend is highly optimized and is set up for parallelization. Open3D was developed from a clean slate with a small and carefully considered set of dependencies. It can be set up on different platforms and compiled from source with minimal effort. The code is clean, consistently styled, and maintained via a clear code review mechanism. Open3D has been used in a number of published research projects and is actively deployed in the cloud. We welcome contributions from the open-source community.
citing papers explorer
- Scalable and Differentiable Point-Cloud Registration Using Maximum Mean Discrepancy
MMD-Reg registers point clouds without correspondences by minimizing an MMD objective approximated via random Fourier features, solved with Levenberg-Marquardt and differentiated via the implicit function theorem for use as a neural network layer.
- CARD: A Multi-Modal Automotive Dataset for Dense 3D Reconstruction in Challenging Road Topography
CARD is a new multi-modal driving dataset delivering ~500K dense depth pixels per frame from challenging road topographies using stereo cameras and fused LiDARs over 110 km.
- Manifold k-NN: Accelerated k-NN Queries for Manifold Point Clouds
Manifold k-NN generalizes DP-NNS to k-NN queries on manifold point clouds via a recursive successor-list property, delivering 1-10x speedups and full dynamic support.
- Paired-CSLiDAR: Height-Stratified Registration for Cross-Source Aerial-Ground LiDAR Pose Refinement
Introduces the Paired-CSLiDAR benchmark and Residual-Guided Stratified Registration, which achieves 86% success at 0.75 m RMSE on 9,012 cross-source pairs using height-stratified ICP and confidence-gated selection.
- PC2Model: ISPRS benchmark on 3D point cloud to model registration
PC2Model is a new public benchmark dataset combining simulated and real-world 3D point clouds with corresponding models to train and test registration methods.
- ClipGStream: Clip-Stream Gaussian Splatting for Any Length and Any Motion Multi-View Dynamic Scene Reconstruction
ClipGStream enables scalable flicker-free reconstruction of long dynamic multi-view videos by performing stream optimization at the clip level with clip-independent spatio-temporal fields, residual anchor compensation, and inter-clip inherited anchors.
- SEM-ROVER: Semantic Voxel-Guided Diffusion for Large-Scale Driving Scene Generation
SEM-ROVER generates large multiview-consistent 3D urban driving scenes via semantic-conditioned diffusion on Σ-Voxfield voxel grids with progressive outpainting and deferred rendering.
- 2D Triangle Splatting for Direct Differentiable Mesh Training
2D Triangle Splatting uses 2D triangles instead of 3D Gaussians to enable differentiable optimization that yields opaque mesh-like reconstructions with competitive visual quality.
- ViVo: A Dataset for Volumetric Video Reconstruction and Compression
ViVo introduces a diverse multi-view volumetric video dataset with raw multi-camera RGB-depth data, calibration, masks, and point clouds to support reconstruction and compression research, with benchmarks highlighting limitations of current methods.
- Curve Skeletonization in Continuous domain for Meshes and Point Clouds
CSCD generalizes LS to the continuous domain, with CSCD-M using intrinsic triangulation for meshes and CSCD-PC using tufted Laplacians for point clouds, and claims to match or outperform prior methods on benchmarks.
- LAPS: Improving Incremental LiDAR Mapping using Active Pooling and Sampling for Neural Distance Fields
LAPS improves incremental neural LiDAR mapping by combining reliability-based active pooling for sample retention with uncertainty-guided active sampling for optimization focus.
- Towards Virtual Qualification in Nuclear Fusion: Demonstrating Probabilistic Model Validation on a High Heat Flux Component
A probabilistic validation framework with a novel modified area validation metric quantifies finite element model error for fusion heat sinks while separating it from aleatoric and epistemic experimental uncertainties.
- Learning Point Cloud Geometry as a Statistical Manifold: Theory and Practice
Point cloud geometry is cast as a statistical manifold of per-point Gaussians, with POLI learning the mapping in a self-supervised manner to improve perception without labeled data.
- Globally adaptive and locally regular point discretization of curved surfaces
A gradient-descent algorithm with level-set surface representation and dynamic point adjustment generates curvature-adaptive, locally regular point distributions on curved surfaces with low deviation from target spacing.
- Structural MAT: Clean and Scalable Medial Axis Simplification via Explicit Surface Correspondence
A new MAT simplification algorithm uses explicit surface correspondence tracking and priority-controlled edge collapses to preserve structural features like fillet alignments on discrete meshes.
- PRIME: Protein Representation via Physics-Informed Multiscale Equivariant Hierarchies
PRIME is a five-level hierarchical equivariant graph model for proteins that uses physics-informed deterministic operators to exchange information across scales and achieves state-of-the-art results on fold classification and reaction class prediction.
- From Stealthy Data Fabrication to Unsafe Driving: Realistic Scenario Attacks on Collaborative Perception
A new online attack framework manipulates object poses in shared CAV perception data below detection thresholds, propagating errors to cause unsafe trajectory predictions and behaviors in up to 50% of tested scenarios while evading defenses.
- Leveraging Previous-Traversal Point Cloud Map Priors for Camera-Based 3D Object Detection and Tracking
DualViewMapDet fuses prior-traversal point cloud maps into camera features via dual perspective-view and bird's-eye-view encoding to improve 3D detection and tracking without LiDAR.
- TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches
TouchAnything reconstructs accurate 3D object geometries from only a few tactile contacts by optimizing for consistency with a pretrained visual diffusion prior.
- Visually-grounded Humanoid Agents
A coupled world-agent framework uses 3D Gaussian reconstruction and first-person RGB-D perception with iterative planning to enable goal-directed, collision-avoiding humanoid behavior in novel reconstructed scenes.
- HandDreamer: Zero-Shot Text to 3D Hand Model Generation using Corrective Hand Shape Guidance
HandDreamer is the first zero-shot text-to-3D method for hands that uses MANO initialization, skeleton-guided diffusion, and corrective shape guidance to produce view-consistent models.
- Multimodal-NF: A Wireless Dataset for Near-Field Low-Altitude Sensing and Communications
Introduces Multimodal-NF, a synchronized dataset of near-field CSI with RGB, LiDAR, and GPS data plus an open generator for low-altitude XL-MIMO research.
- SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding
SPEAR-1 combines a 3D-enriched VLM with embodied control to match or exceed existing robotic foundation models using 20 times fewer robot demonstrations.
- Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Spatial-MLLM adds a 3D spatial encoder initialized from a visual geometry model and space-aware frame sampling to MLLMs to improve spatial understanding and reasoning from purely 2D visual inputs.
- Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
MLLMs achieve competitive but subhuman performance on the new VSI-Bench for visual-spatial intelligence from videos, with spatial reasoning as the main bottleneck and explicit cognitive map generation improving distance estimation.
- GEM: Generative Supervision Helps Embodied Intelligence
GEM adds generative depth supervision to VLM pre-training and reports improved results on embodied benchmarks plus real-world robot execution.
- PQDT: Pseudo-Query Dual Transformer for Robust Point Cloud Restoration
PQDT is a unified Transformer-based network using a Pseudo-Query module to restore high-quality point cloud geometry from diverse degradations, claiming to surpass prior methods on combined completion, denoising, and deformation tasks.
- From Full and Partial Intraoral Scans to Crown Proposal: A Classification-Guided Restoration Assistance Pipeline
A classification-routed pipeline segments partial and full intraoral scans then retrieves and fits crown proposals from neighboring teeth embeddings, reporting macro DSC 0.9249 on 1958 partial scans.
- Real-Scale Island Area and Coastline Estimation using Only its Place Name or Coordinates
A monocular vision system estimates real-scale island area and coastline length with around 10% error using only place name or coordinates input via automated image capture, point cloud generation, and trajectory alignment.
- Point Cloud Registration via Probabilistic Self-Update Local Correspondence and Line Vector Sets
A new PCR algorithm using probabilistic self-update local correspondence and line vector sets achieves superior time efficiency and at least 10% RMSE improvement over state-of-the-art methods.
- MyoVision: A Mobile Research Tool and NEATBoost-Attention Ensemble Framework for Real Time Chicken Breast Myopathy Detection
Smartphone transillumination imaging paired with a neuroevolution-tuned ensemble model classifies chicken breast myopathies at 82.4% accuracy on 336 fillets, matching costly hyperspectral systems.
- MeshOn: Intersection-Free Mesh-to-Mesh Composition
MeshOn composes two input meshes realistically without intersections by using VLM-based rigid initialization, attractive geometric losses, a barrier loss, and a diffusion prior for final deformation.
- R3PM-Net: Real-time, Robust, Real-world Point Matching Network
R3PM-Net delivers real-time point cloud registration with high accuracy on synthetic and real-world datasets through a global-aware lightweight architecture and new evaluation benchmarks.
- Real-to-Sim for Highly Cluttered Environments via Physics-Consistent Inter-Object Reasoning
A differentiable optimization pipeline uses a contact graph and rigid-body simulation to jointly refine object poses and physical properties, producing physically valid 3D scene reconstructions from single-view RGB-D observations for cluttered environments.
- REACT3D: Recovering Articulations for Interactive Physical 3D Scenes
A zero-shot framework that recovers part articulations and produces simulation-compatible interactive 3D scene replicas from static inputs.
- Geometry-Aware Scene Configurations for Novel View Synthesis
Geometry-guided adaptive placement of bases and virtual viewpoints improves rendering quality and memory use over uniform arrangements in scalable NeRF for large indoor scenes.
- DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation
LGAA is a modular adapter framework that lifts multi-view diffusion models to produce 2D Gaussian Splats with PBR channels for high-quality relightable 3D mesh extraction using data-efficient finetuning on 69k instances.
- Hestia: Voxel-Face-Aware Hierarchical Next-Best-View Acquisition for Efficient 3D Reconstruction
Hestia improves generalizable next-best-view planning for 3D reconstruction via hierarchical action search, diverse data, close-greedy strategy, and face-aware voxel design, yielding higher coverage and lower Chamfer distance than prior RL-based methods.
- 3D Densification for Multi-Map Monocular VSLAM in Endoscopy
A densification pipeline for multi-map monocular endoscopic VSLAM that aligns NN LightDepth predictions to CudaSIFT sparse submaps via LMedS, reporting 4.15 mm RMS accuracy on the C3VD phantom dataset.
- EnforceNet: Monocular Camera Localization in Large Scale Indoor Sparse LiDAR Point Cloud
EnforceNet achieves centimeter-level monocular camera localization in sparse LiDAR maps of indoor parking garages via a novel resistor module that improves generalization, accuracy, and training speed.
- Towards Affordance Prediction with Vision via Task Oriented Grasp Quality Metrics
The work defines task-oriented grasp metrics from basic ones and trains vision models to infer them in both known-model simulation and partial-information range-image settings.
- Efficiently Closing Loops in LiDAR-Based SLAM Using Point Cloud Density Maps
Introduces a sensor-agnostic loop closure pipeline for LiDAR SLAM using density maps, ground alignment, ORB on BEV projections, BST retrieval, and pruning to handle perceptual aliasing.
- Contactless 3D Human Body Measurement Using Depth Cameras for Smart Health Monitoring
A framework for contactless anthropometric measurement from depth-camera point clouds that uses standard libraries to segment the body and compute linear dimensions, volume, and area.
- A Survey on Deep Learning Architectures for Point Cloud Classification and Segmentation
A survey that categorizes deep learning models for point cloud tasks by backbone architecture, evaluates benchmark performance, and outlines challenges and future research directions.
- Bimanual Robot Manipulation via Multi-Agent In-Context Learning
- GSDeformer: Direct, Real-time and Extensible Cage-based Deformation for 3D Gaussian Splatting