The work creates the first dataset and baseline for generating emission textures on 3D objects to reproduce glowing materials from input images.
ShapeNet: An Information-Rich 3D Model Repository
Mixed citation behavior. Most common role is background (57%).
abstract
We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects. ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy. It is a collection of datasets providing many semantic annotations for each 3D model such as consistent rigid alignments, parts and bilateral symmetry planes, physical sizes, keywords, as well as other planned annotations. Annotations are made available through a public web-based interface to enable data visualization of object attributes, promote data-driven geometric analysis, and provide a large-scale quantitative benchmark for research in computer graphics and vision. At the time of this technical report, ShapeNet has indexed more than 3,000,000 models, of which 220,000 are classified into 3,135 categories (WordNet synsets). In this report we describe the ShapeNet effort as a whole, provide details for all currently available datasets, and summarize future plans.
representative citing papers
ARKitScenes is the largest real-world indoor RGB-D dataset captured with mobile LiDAR, including high-resolution depth maps and 3D furniture bounding box annotations for advancing object detection and depth upsampling.
WarpHammer densifies scene warps with 3D object priors from generative models and fuses pose-unknown auxiliary views via multi-view geometry to enable stable extreme novel view synthesis.
BIM-Edit benchmark finds best LLM scores only 49.5% average across geometric, semantic, and topological metrics on 324 IFC editing tasks, with no model fully solving more than 3.4%.
FllumaOne releases 100,000 kernel-validated CAD models as executable Python programs with aligned multimodal data including feature histories and geometry exports.
3D-CoS represents 3D objects as Blender code generated by VLMs, with workflows for planning, RAG, and agents, showing better edit fidelity than point-cloud baselines.
Diffusion for 3D shapes is moved from dense geometry to compact superquadric parameter sets, cutting state size to roughly 7 KB per shape and enabling faster generation plus new editing capabilities.
VLMs exhibit consistent vertical-distance entanglement in embeddings from perspective bias in natural images, producing accuracy gaps that a new synthetic benchmark SpatialTunnel exposes as model-intrinsic.
Morpheus learns morphable category-level shape priors to produce implicit 3D correspondences in camera space without explicit supervision and releases the HouseCorr3D benchmark with amodal and symmetry annotations.
Metric-Phase Fields decouple unsigned metric proximity from a smooth phase field with learnable sharpness to enable faithful reconstruction of thin and open structures from point clouds.
ArtSplat is the first feed-forward framework for articulated 3D Gaussian Splatting that reconstructs geometry and joints from sparse multi-state uncalibrated views in one pass.
MAPS provides 2618 validated 3D meshes and a controllable rendering pipeline to attribute vision model recognition failures to specific scene parameters, finding camera distance and elevation as the dominant failure factors across 20 tested models.
OffsetAxis reconstructs meshes from unsigned distance fields by extracting the medial axis of the alpha-offset volume using ray casting and variational medial ball optimization.
min-GSGW learns coupled nonlinear slicers to produce a rigid-motion-invariant, scalable approximation to the Gromov-Wasserstein distance and its transport plans.
Img2CADSeq generates standard CAD sequences from images via a multi-stage pipeline with three-level hierarchical codebook encoding, importance-guided compression, and contrastive point-cloud conditioning of a VQ-Diffusion model, outperforming prior methods on new CAD-220K and PrintCAD datasets.
Multi-grained counting is introduced with five granularity levels, supported by the new KubriCount dataset generated via 3D synthesis and editing, and HieraCount model that combines text and visual exemplars for improved accuracy.
Language representations serve as the asymptotic attractor for convergence in independently trained multimodal neural networks due to feature density asymmetry.
MeshFIM enables local low-poly mesh editing by autoregressively filling target regions conditioned on context, using boundary markers, positional embeddings, and a gated geometry encoder to enforce attachment, topology, and region limits.
Reinforcement learning internalizes physical stability rules for brick structures, enabling the first rollback-free generation with orders-of-magnitude faster inference.
Consistency learning reformulates 3D point cloud anomaly detection to predict clean geometry directly in one or two steps, yielding up to 80 times faster inference while matching state-of-the-art accuracy.
ADS adaptively refines a Delaunay scaffold to produce unbiased random samples on occupancy function surfaces together with a connecting mesh, using far fewer evaluations than existing approaches.
OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
AirZoo is a new dataset covering 378 regions across 22 countries with pixel-level metric depth and 6-DoF poses, shown via benchmarks to improve SoTA models on aerial image retrieval, cross-view matching, and multi-view 3D reconstruction.
Topo-ADV uses differentiable persistent homology to create topology-altering perturbations that achieve up to 100% attack success on point cloud classifiers like PointNet while remaining geometrically imperceptible.
citing papers explorer
- Restore3D: Breathing Life into Broken Objects with Shape and Texture Restoration
  Restore3D restores shape and texture of broken 3D objects via multi-view image refinement with a Mask Self-Perceiver and coarse-to-fine mesh reconstruction, outperforming baselines on synthetic and real benchmarks.
- AC3S: Adaptive Conditioning for 3D-Aware Synthetic Data Generation
  AC3S adds a self-supervised visual prompt modulator to ControlNet diffusion and a multi-agent VLM prompt composer to generate photorealistic images with accurate 2D/3D annotations while avoiding over-conditioning.
- ReScene: Structured Indoor Scene Reconstruction from Multi-View Captures
  ReScene introduces HierView for view prioritization and Relation-Aware Assembly for scene graph fusion, reporting 17% lower Chamfer Distance and 26% lower LPIPS than prior baselines on ScanNet while running faster.
- 3D-DLP: Self-Supervised 3D Object-Centric Scene Representation Learning
  3D-DLP decomposes 3D scenes into controllable latent particles via self-supervised reconstruction for improved robotic tasks.
- Efficient RWKV-based Representation Learning for 3D Point Clouds
  Introduces P-RWKV block and PointER self-supervised framework to adapt RWKV for efficient 3D point cloud representation learning.
- Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers
  ViSAE supplies a 64K-image probing suite with 16K concepts, top-down/bottom-up circuit algorithms, and editing methods that raise WaterBirds worst-group accuracy by 48.2% over baselines.
- MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation
  MeshWeaver uses sparse-voxel guidance for autoregressive surface weaving to achieve 18% compression and generate up to 16K-face meshes with improved fidelity.
- Artiverse: A Diverse and Physically Grounded Dataset for Articulated Objects
  Artiverse is a new dataset of 5.4K human-authored articulated 3D objects with detailed annotations for parts, multi-DoF joints, interior structures, and physical attributes to enable functional modeling and physics-based interaction.
- Unified 3D Scene Understanding Through Physical World Modeling
  A probabilistic graphical model called 3WM unifies 3D vision tasks into one system that performs them zero-shot by selecting different inference pathways through multimodal scene nodes.
- EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers
  EVA01 introduces a Mixture-of-Transformers model that natively adds 3D mesh understanding, generation, and multi-turn editing to MLLMs by decoupling understanding and generation experts with shared global self-attention.
- Parameter Efficient Multi-Class Intelligent Scheduling for Multimodal Online Distributed Industrial Anomaly Detection
  Proposes MODIAD framework with MIS scheduling solved via SMG algorithm and REC-LoRA adaptation for efficient multimodal online distributed industrial anomaly detection, reporting superior performance on MVTec 3D-AD and Eyecandies datasets.
- EvObj: Learning Evolving Object-centric Representations for 3D Instance Segmentation without Scene Supervision
  EvObj learns evolving object-centric representations for unsupervised 3D instance segmentation by dynamically refining object candidates and completing partial geometries to bridge the synthetic-to-real domain gap, outperforming baselines on real and synthetic datasets.
- Symmetry in the Wild: The Role of Equivariance in Neural Fluid Surrogates
  Explicit E(3)-equivariance in neural CFD surrogates improves generalization on diverse-geometry hemodynamics benchmarks but degrades in-distribution performance on strongly aligned aerodynamics data, consistently beating data augmentation.
- Syn4D: A Multiview Synthetic 4D Dataset
  Syn4D is a new multiview synthetic 4D dataset supplying dense ground-truth annotations for dynamic scene reconstruction, tracking, and human pose estimation.
- Channel-Level Relation to Attentive Aggregation with Neighborhood-Homogeneity Constraint for Point Cloud Analysis
  PointCRA reduces information loss in deep point cloud networks by treating temporal trend variation as an extra evaluation dimension alongside spatial and channel attention, guided by a neighborhood homogeneity constraint.
- From Visual Synthesis to Interactive Worlds: Toward Production-Ready 3D Asset Generation
  The paper surveys 3D asset generation methods and organizes them around the full production pipeline to assess which outputs meet engine-level requirements for interactive applications.
- AmaraSpatial-10K: A Spatially and Semantically Aligned 3D Dataset for Spatial Computing and Embodied AI
  AmaraSpatial-10K supplies 10K deployment-ready 3D assets with metric scaling and metadata, delivering 3.4x higher CLIP Recall@5 than Objaverse and 99.1% physics stability in Habitat-Sim.
- Unposed-to-3D: Learning Simulation-Ready Vehicles from Real-World Images
  Unposed-to-3D learns simulation-ready 3D vehicle models from unposed real images by predicting camera parameters for photometric self-supervision, then adding scale prediction and harmonization.
- Neural Distribution Prior for LiDAR Out-of-Distribution Detection
  NDP models prediction distributions and uses Perlin noise OOD synthesis to reach 61.31% point-level AP on STU LiDAR benchmark, over 10x prior best.
- Hierarchical Feature Learning for Medical Point Clouds via State Space Model
  Presents an SSM-based hierarchical feature learning method for medical point clouds that reports superior performance on classification, completion, and segmentation using a new dataset MedPointS.
- Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
  Zero123++ produces high-quality 3D-consistent multi-view images from a single input by fine-tuning Stable Diffusion with targeted conditioning and training methods.
- DetailCLIP: Injecting Image Details into CLIP's Feature Space
  A patch-based fusion method extends CLIP to high-resolution images by retaining multi-scale details for improved class-prompted retrieval.
- SynthCity: A large scale synthetic point cloud
  SynthCity is a 367.9M point synthetic full-colour Mobile Laser Scanning point cloud with per-point labels from nine categories, generated in Blender for an urban environment.
- Linkify: Learning from Interface-Augmented Assembly Graphs
  Linkify augments assembly graphs with corrected interface point clouds and trains GATv2 for masked part prediction, outperforming non-graph baselines on Fusion 360 data.
- Mitigating Positional Leakage in 3D Masked Autoencoders for Robust Representation Learning
  MPL-MAE introduces recalibrated positional embedding and gated positional interface modules to reduce positional over-reliance in 3D masked autoencoders and improve semantic representation quality.
- Domain Generalizable Adaptation of 3D Vision-Language Models via Regularized Fine-Tuning
  ReFine3D uses selective layer tuning, multi-view consistency regularization, LLM-generated text diversity, point-rendered supervision, and confidence-weighted test-time augmentation to improve domain generalization in 3D LMMs by 1-3% on benchmarks.
- Kwai Keye-VL-2.0 Technical Report
  Kwai Keye-VL-2.0-30B-A3B is a 30B MoE model with 3B active parameters using DSA adaptation and MOPD distillation that reports SOTA results on video understanding and agent benchmarks.
- Learning Representations from 3D Gaussian Splats
  Comparative benchmark of geometric deep learning models on 3D Gaussian Splatting representations for scene classification via end-to-end training, linear probing, and clustering.
- Uni-RCM: Unified Reference-guided Cross-modal Mapping for Multi-Class Anomaly Detection
  Uni-RCM achieves state-of-the-art multi-class anomaly detection on MVTec-3D AD via a reference guide block and offline residual quantizer.
- RETO: A Rotary-Enhanced Transformer Operator for High-Fidelity Prediction of Automotive Aerodynamics
  RETO achieves relative L2 errors of 0.063 on ShapeNet and 0.089/0.097 on DrivAerML surface pressure/velocity, outperforming Transolver and other baselines.
- Reinforcing 3D Understanding in Point-VLMs via Geometric Reward Credit Assignment
  Geometric Reward Credit Assignment disentangles rewards to geometric tokens and adds reprojection consistency to boost 3D keypoint accuracy from 0.64 to 0.93 and bounding box IoU to 0.686 on a ShapeNetCore benchmark while preserving 2D performance.
- Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
  Hunyuan3D 2.0 scales flow-based diffusion transformers and texture synthesis models to generate high-resolution textured 3D assets that outperform prior state-of-the-art in geometry, alignment, and texture quality.
- AI+CAD Data Representation Architecture: From DeepCAD Solid Modeling to WHUCAD Industrial-Level Parametric Feature Modeling
  The paper classifies AI+CAD data representations and argues WHUCAD's three-level architecture provides better foundational support for industrial parametric feature modeling than DeepCAD.
- Benchmarking stereo reconstruction for 3D printable Martian terrain models
  RAFT-Stereo outperforms SGBM on Middlebury but shows weaker edge alignment and higher reprojection error on Curiosity imagery, while geometry completion trades local accuracy for mesh connectivity in printable models.
- Principles and Practice of Deep Representation Learning: or a Mathematical Theory of Memory
  The book presents principles from optimization and information theory to explain deep network architectures and enable new interpretable models.
- Attention Is not Everything: Efficient Alternatives for Vision
  A survey that taxonomizes non-Transformer vision models and evaluates their practical trade-offs across efficiency, scalability, and robustness.
- A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation
  A survey that categorizes and summarizes methods applying 3D Gaussian Splatting to segmentation, editing, generation, and related tasks, including datasets and evaluation protocols.
- Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
  Hunyuan3D 2.1 is a two-part system with DiT for shape generation and Paint for texture synthesis that produces high-fidelity 3D assets with PBR materials.
- Advances in Neural 3D Mesh Texturing: A Survey
  A literature survey that organizes neural 3D mesh texturing methods into a taxonomy spanning early GAN-based approaches to modern diffusion pipelines, while reviewing architectures, datasets, evaluation, and open challenges.
- 3D Generation for Embodied AI and Robotic Simulation: A Survey
  The paper surveys 3D generation techniques for embodied AI and robotics, categorizing them into data generation, simulation environments, and sim-to-real bridging while identifying bottlenecks in physical validity and transfer.
- NeRF: Neural Radiance Field in 3D Vision: A Comprehensive Review (Updated Post-Gaussian Splatting)
  A literature survey of NeRF and neural field methods from 2020-2025, organized by architecture and application taxonomies with benchmarks and dataset overviews, covering both pre- and post-Gaussian Splatting periods.
- A review on deep learning techniques for 3D sensed data classification
  A survey of deep learning architectures for 3D sensed data classification covering RGB-D, multi-view, volumetric and end-to-end methods along with datasets and future directions.
- L-PCN: A Point Cloud Accelerator Exploiting Spatial Locality through Octree-based Islandization
- Picasso: Holistic Scene Reconstruction with Physics-Constrained Sampling
- Efficient Transferable Optimal Transport via Min-Sliced Transport Plans