The work creates the first dataset and baseline for generating emission textures on 3D objects to reproduce glowing materials from input images.
ShapeNet: An Information-Rich 3D Model Repository
Mixed citation behavior. Most common role is background (57%).
abstract
We present ShapeNet: a richly-annotated, large-scale repository of shapes represented by 3D CAD models of objects. ShapeNet contains 3D models from a multitude of semantic categories and organizes them under the WordNet taxonomy. It is a collection of datasets providing many semantic annotations for each 3D model such as consistent rigid alignments, parts and bilateral symmetry planes, physical sizes, keywords, as well as other planned annotations. Annotations are made available through a public web-based interface to enable data visualization of object attributes, promote data-driven geometric analysis, and provide a large-scale quantitative benchmark for research in computer graphics and vision. At the time of this technical report, ShapeNet has indexed more than 3,000,000 models, 220,000 models out of which are classified into 3,135 categories (WordNet synsets). In this report we describe the ShapeNet effort as a whole, provide details for all currently available datasets, and summarize future plans.
representative citing papers
ARKitScenes is the largest real-world indoor RGB-D dataset captured with mobile LiDAR, including high-resolution depth maps and 3D furniture bounding box annotations for advancing object detection and depth upsampling.
WarpHammer densifies scene warps with 3D object priors from generative models and fuses pose-unknown auxiliary views via multi-view geometry to enable stable extreme novel view synthesis.
VLMs exhibit consistent vertical-distance entanglement in embeddings from perspective bias in natural images, producing accuracy gaps that a new synthetic benchmark SpatialTunnel exposes as model-intrinsic.
Morpheus learns morphable category-level shape priors to produce implicit 3D correspondences in camera space without explicit supervision and releases the HouseCorr3D benchmark with amodal and symmetry annotations.
Metric-Phase Fields decouple unsigned metric proximity from a smooth phase field with learnable sharpness to enable faithful reconstruction of thin and open structures from point clouds.
ArtSplat is the first feed-forward framework for articulated 3D Gaussian Splatting that reconstructs geometry and joints from sparse multi-state uncalibrated views in one pass.
MAPS provides 2618 validated 3D meshes and a controllable rendering pipeline to attribute vision model recognition failures to specific scene parameters, finding camera distance and elevation as the dominant failure factors across 20 tested models.
OffsetAxis reconstructs meshes from unsigned distance fields by extracting the medial axis of the alpha-offset volume using ray casting and variational medial ball optimization.
min-GSGW learns coupled nonlinear slicers to produce a rigid-motion-invariant, scalable approximation to the Gromov-Wasserstein distance and its transport plans.
Img2CADSeq generates standard CAD sequences from images via a multi-stage pipeline with three-level hierarchical codebook encoding, importance-guided compression, and contrastive point-cloud conditioning of a VQ-Diffusion model, outperforming prior methods on new CAD-220K and PrintCAD datasets.
Multi-grained counting is introduced with five granularity levels, supported by the new KubriCount dataset, generated via 3D synthesis and editing, and by the HieraCount model, which combines text and visual exemplars for improved accuracy.
Language representations serve as the asymptotic attractor for convergence in independently trained multimodal neural networks due to feature density asymmetry.
MeshFIM enables local low-poly mesh editing by autoregressively filling target regions conditioned on context, using boundary markers, positional embeddings, and a gated geometry encoder to enforce attachment, topology, and region limits.
Reinforcement learning internalizes physical stability rules for brick structures, enabling the first rollback-free generation with orders-of-magnitude faster inference.
Consistency learning reformulates 3D point cloud anomaly detection to predict clean geometry directly in one or two steps, yielding up to 80 times faster inference while matching state-of-the-art accuracy.
ADS adaptively refines a Delaunay scaffold to produce unbiased random samples on occupancy function surfaces together with a connecting mesh, using far fewer evaluations than existing approaches.
OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
AirZoo is a new dataset covering 378 regions across 22 countries with pixel-level metric depth and 6-DoF poses, shown via benchmarks to improve SoTA models on aerial image retrieval, cross-view matching, and multi-view 3D reconstruction.
Topo-ADV uses differentiable persistent homology to create topology-altering perturbations that achieve up to 100% attack success on point cloud classifiers like PointNet while remaining geometrically imperceptible.
XShapeEnc encodes arbitrary 2D spatially grounded shapes into compact invertible representations by decomposing them into unit-disk geometry and harmonic pose fields then applying Zernike bases with frequency propagation.
3D-Fixer performs in-place 3D asset completion from single-view partial point clouds via coarse-to-fine generation with ORFA conditioning, plus a new ARSG-110K dataset, to achieve higher geometric accuracy than MIDI and Gen3DSR while keeping diffusion efficiency.
DeformPIC deforms query point clouds under prompt guidance for in-context learning, outperforming prior methods with lower Chamfer Distance on reconstruction, denoising, and registration tasks.
PointATA is a parameter-efficient transfer learning method that aligns 3D-4D modality gaps via optimal transport before adapting a frozen 3D model with video-specific modules to achieve strong 4D perception results.
citing papers explorer
-
Beyond Spatial Compression: Interface-Centric Generative States for Open-World 3D Structure
C2LT-3D factorizes 3D tokenization into canonical local geometry, partition-conditioned context, and relational seam variables to make latent states operational for assembly-level validation and repair in open-world multi-component assets.
-
Minimax Optimal Estimation of Transport-Growth Pairs in Unbalanced Optimal Transport
Estimators for transport-growth pairs in unbalanced OT achieve minimax optimal rates, supported by a value-based stability reduction through a UOT gap condition.
-
Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
VISER is a new visually realistic simulation benchmark for robot manipulation tasks that uses PBR materials and MLLM-assisted asset generation, achieving 0.92 Pearson correlation with real-world policy performance.
-
Prop-Chromeleon: Adaptive Haptic Props in Mixed Reality through Generative Artificial Intelligence
A generative-AI pipeline dynamically generates and anchors virtual assets to match the shape of physical props, enabling adaptive passive haptics in MR that users rate higher in realism, immersion, and enjoyment than static baselines.
-
TAFA-GSGC: Group-wise Scalable Point Cloud Geometry Compression with Progressive Residual Refinement
TAFA-GSGC is a scalable point cloud geometry compression codec using progressive residual refinement and group-wise entropy coding that achieves average BD-rate reductions of 4.99% (D1-PSNR) and 5.92% (D2-PSNR) over PCGCv2 while supporting monotonic multi-quality decoding from a single bitstream.
-
ShapeY: A Principled Framework for Measuring Shape Recognition Capacity via Nearest-Neighbor Matching
ShapeY is a benchmark dataset and nearest-neighbor protocol that measures shape-based recognition in vision models, revealing that even state-of-the-art networks fail to generalize consistently across 3D viewpoints and non-shape appearance changes.
-
Point-MF: One-step Point Cloud Generation from a Single Image via Mean Flows
Point-MF performs one-step point cloud reconstruction from single images by learning a mean velocity field in point space with a tailored Diffusion Transformer and a new auxiliary loss.
-
Text-Guided Multimodal Unified Industrial Anomaly Detection
A text-semantics-guided multimodal framework with geometry-aware mapping and object-conditioned text adaptation achieves state-of-the-art unsupervised anomaly detection and localization on RGB-3D industrial datasets while enabling a single model for multiple classes.
-
FILTR: Extracting Topological Features from Pretrained 3D Models
FILTR predicts persistence diagrams from pretrained 3D encoders on the new DONUT benchmark, showing limited topological signals in encoders but successful approximation via learnable feed-forward.
-
FurnSet: Exploiting Repeats for 3D Scene Reconstruction
FurnSet improves single-view 3D scene reconstruction by using per-object CLS tokens and set-aware self-attention to group and jointly reconstruct repeated object instances, with added scene-object conditioning and layout optimization.
-
Volume Transformer: Revisiting Vanilla Transformers for 3D Scene Understanding
A minimally modified vanilla Transformer called Volt achieves state-of-the-art 3D semantic and instance segmentation by using volumetric tokens, 3D rotary embeddings, and a data-efficient training recipe that scales better than domain-specific backbones.
-
One-Shot Cross-Geometry Skill Transfer through Part Decomposition
Part decomposition with generative shape models allows one-shot robot skill transfer across unfamiliar object geometries in simulation and real settings.
-
Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
-
ReplicateAnyScene: Zero-Shot Video-to-3D Composition via Textual-Visual-Spatial Alignment
ReplicateAnyScene performs fully automated zero-shot video-to-compositional-3D reconstruction by cascading alignments of generic priors from vision foundation models across textual, visual, and spatial dimensions.
-
TouchAnything: Diffusion-Guided 3D Reconstruction from Sparse Robot Touches
TouchAnything reconstructs accurate 3D object geometries from only a few tactile contacts by optimizing for consistency with a pretrained visual diffusion prior.
-
Part-Level 3D Gaussian Vehicle Generation with Joint and Hinge Axis Estimation
A new framework generates part-level animatable 3D Gaussian vehicles from images by adding modules for exclusive part ownership and kinematic joint/axis prediction.
-
FusionBERT: Multi-View Image-3D Retrieval via Cross-Attention Visual Fusion and Normal-Aware 3D Encoder
FusionBERT uses cross-attention to fuse multi-view images and a normal-aware encoder for 3D models, achieving higher image-3D retrieval accuracy than prior multimodal models in both single- and multi-view settings.
-
PhysSkin: Real-Time and Generalizable Physics-Based Animation via Self-Supervised Neural Skinning
PhysSkin uses a neural skinning autoencoder and physics-informed self-supervised training to create mesh-free, generalizable skinning fields for real-time animation.
-
GeoPT: Scaling Physics Simulation via Lifted Geometric Pre-Training
GeoPT pre-trains on over one million geometry samples augmented with synthetic dynamics to improve neural physics simulators on fluid and solid mechanics benchmarks while reducing labeled data needs by 20-60% and accelerating convergence by 2x.
-
DM3D: Deformable Mamba via Offset-Guided Differentiable Scanning for Point Cloud Understanding
DM3D introduces offset-guided differentiable scanning and continuity-aware state updates in a Mamba-based model to achieve state-of-the-art or competitive performance on point cloud classification, few-shot learning, and part segmentation.
-
SAM 3D: 3Dfy Anything in Images
SAM 3D reconstructs 3D objects from single images with geometry, texture, and pose using human-model annotated data at scale and synthetic-to-real training, achieving 5:1 human preference wins.
-
A solution to generalized learning from small training sets found in infant repeated visual experiences of individual objects
Infant daily visual experiences of objects are dominated by repeated instances of few exemplars in lumpy similarity clusters, enabling category generalization from small training sets in computational models.
-
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
InternScenes is a new dataset of approximately 40,000 simulatable indoor scenes that combines real scans, procedural, and designer sources, preserves small objects for realistic layouts, and includes processing for simulation and interaction.
-
The Less You Depend, The More You Learn: Synthesizing Novel Views from Sparse, Unposed Images with Minimal 3D Knowledge
Data-centric novel view synthesis models with minimal 3D knowledge and no pose annotations scale better with data volume and outperform traditional bias-driven methods.
-
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models
TripoSG generates high-fidelity 3D meshes from input images via a large-scale rectified flow transformer and hybrid-trained 3D VAE on a custom 2-million-sample dataset, claiming state-of-the-art fidelity and generalization.
-
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis
ViewCrafter tames video diffusion models with point-based 3D guidance and iterative trajectory planning to produce high-fidelity novel views from single or sparse images.
-
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
DROID is a new 76k-trajectory in-the-wild robot manipulation dataset spanning 564 scenes and 84 tasks that improves policy performance and generalization when used for training.
-
Transolver: A Fast Transformer Solver for PDEs on General Geometries
Transolver learns intrinsic physical states from discretized meshes by adaptively splitting domains into flexible learnable slices and computing attention over physics-aware tokens, achieving state-of-the-art PDE solving on general geometries.
-
SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
SyncDreamer produces multiview-consistent images from a single input image by jointly modeling their distribution and synchronizing intermediate diffusion states via 3D-aware attention.
-
PointCaM: Cut-and-Mix for Open-Set Point Cloud Learning
PointCaM proposes a cut-and-mix mechanism with an Unknown-Point Simulator and Estimator to improve open-set recognition on point clouds by simulating out-of-distribution data and using multi-level features.
-
TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis
TetraSphere integrates a TetraTransform based on steerable spherical neurons into VN-DGCNN to produce an O(3)-equivariant descriptor that reports new SOTA results on rotated ScanObjectNN, ModelNet40 classification, and ShapeNet segmentation.
-
RECALL: Rehearsal-free Continual Learning for Object Classification
RECALL achieves rehearsal-free continual learning for object classification by logit recall before new training, regression regularization, Mahalanobis loss on known categories, and new heads per sequence, outperforming prior methods on CORe50, iCIFAR-100, and the introduced HOWS-CL-25 dataset.
-
Rectified Flow: A Marginal Preserving Approach to Optimal Transport
A single-objective rectified flow variant uses neural ODEs trained by regression to monotonically decrease a fixed convex transport cost while preserving marginal distributions.
-
Learning Embedding of 3D models with Quadric Loss
Quadric loss combined with Chamfer loss yields better sharp-feature reconstruction in 3D models than either loss alone or other point-surface alternatives.
-
Visualizing the Invisible: Occluded Vehicle Segmentation and Recovery
An iterative multi-task GAN-based framework completes occluded vehicle segmentation masks and recovers invisible appearance using coupled discriminators, a 3D silhouette pool, and a shared two-path network, outperforming prior methods on a new synthetic-plus-real dataset.
-
DeepOrganNet: On-the-Fly Reconstruction and Visualization of 3D / 4D Lung Models from Single-View Projections by Deep Deformation Network
DeepOrganNet reconstructs 3D/4D lung meshes from single-view 2D projections by learning smooth deformation fields from multiple templates via a deep network and trivariate tensor-product deformation.
-
A Convolutional Decoder for Point Clouds using Adaptive Instance Normalization
A point cloud decoder using Adaptive Instance Normalization outperforms prior methods in auto-encoding, upsampling, and single-view reconstruction tasks.
-
Restore3D: Breathing Life into Broken Objects with Shape and Texture Restoration
Restore3D restores shape and texture of broken 3D objects via multi-view image refinement with a Mask Self-Perceiver and coarse-to-fine mesh reconstruction, outperforming baselines on synthetic and real benchmarks.
-
AC3S: Adaptive Conditioning for 3D-Aware Synthetic Data Generation
AC3S adds a self-supervised visual prompt modulator to ControlNet diffusion and a multi-agent VLM prompt composer to generate photorealistic images with accurate 2D/3D annotations while avoiding over-conditioning.
-
ReScene: Structured Indoor Scene Reconstruction from Multi-View Captures
ReScene introduces HierView for view prioritization and Relation-Aware Assembly for scene graph fusion, reporting 17% lower Chamfer Distance and 26% lower LPIPS than prior baselines on ScanNet while running faster.
-
Inside the Visual Mind: Neuroscience-Motivated Concept Circuits for Interpreting and Steering Vision Transformers
ViSAE supplies a 64K-image probing suite with 16K concepts, top-down/bottom-up circuit algorithms, and editing methods that raise WaterBirds worst-group accuracy by 48.2% over baselines.
-
MeshWeaver: Sparse-Voxel-Guided Surface Weaving for Autoregressive Mesh Generation
MeshWeaver uses sparse-voxel guidance for autoregressive surface weaving to achieve 18% compression and generate up to 16K-face meshes with improved fidelity.
-
Artiverse: A Diverse and Physically Grounded Dataset for Articulated Objects
Artiverse is a new dataset of 5.4K human-authored articulated 3D objects with detailed annotations for parts, multi-DoF joints, interior structures, and physical attributes to enable functional modeling and physics-based interaction.
-
Unified 3D Scene Understanding Through Physical World Modeling
A probabilistic graphical model called 3WM unifies 3D vision tasks into one system that performs them zero-shot by selecting different inference pathways through multimodal scene nodes.
-
EVA01: Unified Native 3D Understanding and Generation via Mixture-of-Transformers
EVA01 introduces a Mixture-of-Transformers model that natively adds 3D mesh understanding, generation, and multi-turn editing to MLLMs by decoupling understanding and generation experts with shared global self-attention.
-
Parameter Efficient Multi-Class Intelligent Scheduling for Multimodal Online Distributed Industrial Anomaly Detection
The MODIAD framework combines MIS scheduling, solved via the SMG algorithm, with REC-LoRA adaptation for efficient multimodal online distributed industrial anomaly detection, reporting superior performance on the MVTec 3D-AD and Eyecandies datasets.
-
EvObj: Learning Evolving Object-centric Representations for 3D Instance Segmentation without Scene Supervision
EvObj learns evolving object-centric representations for unsupervised 3D instance segmentation by dynamically refining object candidates and completing partial geometries to bridge the synthetic-to-real domain gap, outperforming baselines on real and synthetic datasets.
-
Symmetry in the Wild: The Role of Equivariance in Neural Fluid Surrogates
Explicit E(3)-equivariance in neural CFD surrogates improves generalization on diverse-geometry hemodynamics benchmarks but degrades in-distribution performance on strongly aligned aerodynamics data, consistently beating data augmentation.
-
Syn4D: A Multiview Synthetic 4D Dataset
Syn4D is a new multiview synthetic 4D dataset supplying dense ground-truth annotations for dynamic scene reconstruction, tracking, and human pose estimation.
-
Channel-Level Relation to Attentive Aggregation with Neighborhood-Homogeneity Constraint for Point Cloud Analysis
PointCRA reduces information loss in deep point cloud networks by treating temporal trend variation as an extra evaluation dimension alongside spatial and channel attention, guided by a neighborhood homogeneity constraint.