GLENS uses diffusion models on solver iterates to generate high-quality and diverse initial guesses for multimodal non-convex optimization, leading to faster solver convergence.
Canonical reference
In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp
73% of citing Pith papers cite this work as background.
representative citing papers
MemPoison enables stealthy memory poisoning in LLM agents via dialogue by using semantic relational bridges, entity masquerading, and joint embedding optimization to bypass selective extraction and rewriting, achieving up to 0.95 attack success rate.
MDrive benchmark shows multi-agent cooperative driving systems generally outperform single-agent ones in closed-loop settings but perception sharing does not always improve planning and negotiation can harm performance in complex traffic.
BehaviorBench reveals that self-play RL policies for autonomous driving overfit to their training traffic agents and do not generalize to other behaviors, motivating a hybrid rule-based plus learned planner.
The virtual object MPC framework enables stable shared teleoperation for transporting up to nine objects, cutting sliding distance by 72.45% and eliminating tip-overs compared to baseline.
A video foundation model trained on human demonstrations generates zero-shot plans that convert to executable robot actions on novel scenes and tasks.
X-Morph retargets human motions to kinematically plausible references for multiple legged morphologies, trains privileged RL trackers, and distills them into deployable policies that generalize and enable teleoperation and text-conditioned generation.
ReGuide is a self-improving framework that uses phase-conditioned guidance to generate corrective rollouts and absorbs successful ones back into diffusion policy training, yielding 1.3-7.7x success gains on Robomimic tasks.
LAFM adapts the source distribution in flow matching policies via a latent action model to better match fragmented robotic action spaces, claiming 23.4% higher real-world success and 10.4% on LIBERO-90 while beating larger pre-trained models.
HilDA pre-trains LiDAR backbones via multi-layer and global distillation from vision models plus temporal occupancy diffusion, yielding SOTA results on detection, flow, and occupancy tasks.
A VLA policy using view-selective visual routing and interaction-aware action MoE improves average success by 27.7% in simulation and 43.3% in real-world bimanual tasks over monolithic baselines.
SpaceVLN proposes a stagewise closed-loop framework using Spatial Cognitive Memory and Spatial-CoT for zero-shot vision-and-language navigation and object-goal navigation, reporting SOTA results on R2R-CE, RxR-CE, GN-Bench, and HM3D-OVON plus real-robot tests.
RGB-S projects tactile contacts onto images as force-modulated Gaussian saliency maps via kinematics and zero-initialized conditioning, raising real-world occluded dexterous manipulation success by 26.7 percentage points over implicit baselines.
Perceptive BFM grounds human motion priors in robot terrain perception via terrain-conformal reference synthesis and teacher-student transfer from adapted to raw-reference tracking.
HORIZON is a recoverability-governed checkpointed frontier curriculum for on-policy physical-domain scaling on quadruped locomotion that identifies three regularities: uneven widening, non-monotonic composition, and the necessity of joint on-policy interaction.
AFUN predicts task-conditional functional masks and 3D post-contact motion curves from RGB-D and language, trained via a standardized multi-source data pipeline, and reports large gains over baselines on segmentation, contact prediction, and motion tasks.
POIROT protocol repurposes agents in LLM multi-agent systems as an internal diagnostic layer for failure detection, outperforming single-LLM evaluators with gains that increase with complexity, agent count, and fault types.
Per-Frame Deep Sets enables scaling single-sphere to five-sphere transport on a quadruped by performing permutation-invariant pooling within each history frame, reaching 100% no-drop success in simulation where standard encoders plateau.
A hierarchical multi-robot motion planner that refines workspace decompositions to enable scalable coordination through discrete search over smaller decoupled subproblems.
Combines LTL formal methods with LLMs for auditing, predictive monitoring, and runtime intervention on temporally extended behavioral constraints, outperforming LLM baselines and reducing violations.
MAVIC corrects Bellman backups at instruction boundaries by adjusting the incoming objective and restoring continuation value, enabling consistent estimation under stochastic instruction switching in cooperative MARL.
On a fully actuated hexarotor, sensor-based INDI outperforms model-based geometric NDI under mismatches, gusts, and sensor degradation with lower position errors, but NDI tracks attitude better at reduced control rates, providing the first experimental full-pose INDI validation with decoupled axes.
VISOR is a VLM-based automated test oracle that evaluates robot task correctness and quality from videos while reporting its own uncertainty, tested on GPT and Gemini across four tasks and over 1000 videos with Gemini showing higher recall and GPT higher precision but low uncertainty-correctness tie
Autonomous excavator controller achieves 1.8 cm RMSE in heavy-duty grading across different hydraulic architectures, outperforming commercial solutions by a factor of 2.6 in precision while better utilizing machine pressure.
citing papers explorer
DigiForest: Digital Analytics and Robotics for Sustainable Forestry
DigiForest integrates heterogeneous autonomous robots for data collection, automated tree trait extraction, a decision support system for growth forecasting, and autonomous harvesters for selective logging, with real-world tests in European forests.