Advances in Neural Information Processing Systems
14 Pith papers cite this work.
citing papers explorer
- Functionalization via Structure Completion and Motion Rectification
  Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
- Generating HDR Video from SDR Video
  A multi-exposure video model predicts bracketed linear SDR sequences from single nonlinear SDR input, which a merging model combines into HDR video preserving shadow and highlight detail.
- Img2CADSeq: Image-to-CAD Generation via Sequence-Based Diffusion
  Img2CADSeq generates standard CAD sequences from images via a multi-stage pipeline with three-level hierarchical codebook encoding, importance-guided compression, and contrastive point-cloud conditioning of a VQ-Diffusion model, outperforming prior methods on new CAD-220K and PrintCAD datasets.
- AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling
  AffectCodec is an emotion-guided neural speech codec that preserves emotional cues during quantization while maintaining semantic fidelity and prosodic naturalness.
- Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost
  Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
- Learning to Theorize the World from Observation
  NEO is a probabilistic neural model that induces compositional programs as a learned Language of Thought from non-textual observations and executes them via a shared transition model to enable explanation-driven generalization.
- TimeTok: Granularity-Controllable Time-Series Generation via Hierarchical Tokenization
  TimeTok is a unified framework using hierarchical tokenization for granularity-controllable time-series generation that achieves state-of-the-art performance in standard tasks and shows transferability across heterogeneous datasets.
- Diffusion Posterior Sampling for General Noisy Inverse Problems
  Diffusion models solve noisy (non)linear inverse problems via approximated posterior sampling that blends diffusion steps with manifold gradients without strict consistency projection.
- MUSE: Resolving Manifold Misalignment in Visual Tokenization via Topological Orthogonality
  MUSE decouples reconstruction and semantic learning in visual tokenization via topological orthogonality, yielding SOTA generation quality and improved semantic performance over its teacher model.
- From Pixels to Tokens: A Systematic Study of Latent Action Supervision for Vision-Language-Action Models
  A unified comparison of latent action supervision strategies for VLA models reveals task-specific benefits, with image-based approaches aiding reasoning and generalization, action-based aiding motor control, and discrete tokens proving most effective.
- DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization
  DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.
- AnimeAdapter: A Modular Adapter for Appearance-Consistent Anime Character Generation
  AnimeAdapter is a modular adapter for Stable Diffusion that enables appearance-consistent anime character generation from a single reference image using semantic-selective local attention and pose-aware conditioning, plus a new Danbooru-derived dataset.
- Mutual Enhancement Between Global Tokens and Patch Tokens: From Theory to Practice
  TaTok is a theoretically grounded adaptive tokenization method that uses global tokens and cumulative conditional entropy filtering to reduce redundancy while improving reconstruction quality over fixed-rate patch tokenization.
- PnP-Corrector: A Universal Correction Framework for Coupled Spatiotemporal Forecasting
  PnP-Corrector decouples pre-trained physics engines from a correction agent to mitigate reciprocal error amplification in coupled spatiotemporal forecasting, cutting error by 28% on a 300-day ocean-atmosphere task.