The MDrive benchmark shows that multi-agent cooperative driving systems generally outperform single-agent ones in closed-loop settings, but perception sharing does not always improve planning, and negotiation can harm performance in complex traffic.
Canonical reference. 73% of citing Pith papers cite this work as background.
representative citing papers
BehaviorBench reveals that self-play RL policies for autonomous driving overfit to their training traffic agents and do not generalize to other behaviors, motivating a hybrid rule-based plus learned planner.
Rule-VLN is the first large-scale benchmark injecting 177 regulatory categories into an urban environment, and the proposed SNRM module equips pre-trained VLN agents with zero-shot semantic reasoning and detour planning to reduce constraint violations by 19.26% and improve task completion.
The virtual object MPC framework enables stable shared teleoperation for transporting up to nine objects, cutting sliding distance by 72.45% and eliminating tip-overs compared to baseline.
A video foundation model trained on human demonstrations generates zero-shot plans that convert to executable robot actions on novel scenes and tasks.
A hierarchical multi-robot motion planner that refines workspace decompositions to enable scalable coordination through discrete search over smaller decoupled subproblems.
Combines LTL formal methods with LLMs for auditing, predictive monitoring, and runtime intervention on temporally extended behavioral constraints, outperforming LLM baselines and reducing violations.
On a fully actuated hexarotor, sensor-based INDI outperforms model-based geometric NDI under mismatches, gusts, and sensor degradation with lower position errors, but NDI tracks attitude better at reduced control rates, providing the first experimental full-pose INDI validation with decoupled axes.
VISOR is a VLM-based automated test oracle that evaluates robot task correctness and quality from videos while reporting its own uncertainty, tested on GPT and Gemini across four tasks and over 1000 videos with Gemini showing higher recall and GPT higher precision but low uncertainty-correctness tie
Autonomous excavator controller achieves 1.8 cm RMSE in heavy-duty grading across different hydraulic architectures, outperforming commercial solutions by a factor of 2.6 in precision while better utilizing machine pressure.
MAG-VLAQ fuses multi-modal ground and aerial data via ODE-conditioned vector-of-locally-aggregated-queries to nearly double recall@1 on aerial-ground place recognition benchmarks.
A neuron-astrocyte network with dual-timescale memory reduces median path lengths up to sixfold in partially observable grid-world navigation tasks.
A framework using 3D Gaussian Splatting for visual domain randomization enables robust monocular RGB-based dexterous in-hand reorientation on real hardware for multiple objects under varied lighting.
WARPED synthesizes realistic wrist-view observations from monocular egocentric human videos via foundation models, hand-object tracking, retargeting, and Gaussian Splatting to train visuomotor policies that match teleoperation success rates on five tabletop tasks with 5-8x less collection effort.
SynFlow creates a 34-times larger synthetic LiDAR scene flow dataset that lets models trained only on simulation match or beat supervised real-data baselines on multiple benchmarks.
A co-design framework learns task-specific hand shapes and complementary control policies, supporting design, training, fabrication, and deployment of new dexterous hands in under 24 hours.
A 2084-parameter recurrent policy trained by distilling 1000 RL teacher policies enables zero-shot control across 10 real quadrotors differing in mass, motors, frames, propellers, and flight controllers.
ViTacFormer learns a cross-modal visuo-tactile latent space with autoregressive tactile prediction and an easy-to-hard curriculum, then uses the representation for imitation learning that yields ~50% higher success and the first reported 11-stage, 2.5-minute autonomous dexterous tasks.
AutoVLA unifies semantic reasoning and trajectory planning in one autoregressive VLA model for end-to-end autonomous driving by tokenizing trajectories into discrete actions and using GRPO reinforcement fine-tuning to adaptively reduce unnecessary reasoning.
KAPPS is a knowledge-based CPPS architecture that uses an ontology-grounded knowledge graph as the unifying data backbone and authoritative write-time state for handling uncertainty in circular manufacturing, demonstrated via anomaly detection and constraint enforcement use cases.
In a discrete pulse-coupled oscillator model, synchronization is bimodal near a critical quorum-pulse balance, with noise and sparse connectivity suppressing multi-cluster states to favor global timing.
Systematic grasping strategies for paper-like materials are developed and tested with a soft gripper by exploiting environmental constraints to improve force control and success rates.
Single-agent LLM frameworks outperform naive multi-agent systems in multimodal clinical risk prediction tasks and are better calibrated.
WLDS applies large models with factual and logical calibration to produce diverse text-and-image deductions of emergency scenarios beyond what traditional fixed simulations can generate.