{"total":15,"items":[{"citing_arxiv_id":"2606.04840","ref_index":32,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Reinforcement Learning-Enabled Agent for Transmitter Optimization in Digital-Analog Radio-over-Fiber Fronthaul","primary_cat":"physics.optics","submitted_at":"2026-06-03T13:09:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Reinforcement learning agent optimizes DA-RoF transmitter parameters to achieve up to 2.7 dB SNR gains for high-order QAM modulations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19029","ref_index":34,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Distributionally Robust Control via Stein Variational Inference for Contact-Rich Manipulation","primary_cat":"cs.RO","submitted_at":"2026-05-18T18:54:29+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces a Stein variational inference-based deterministic formulation for distributionally robust control in contact-rich robotic manipulation, reporting up to 3x improved robustness under parametric uncertainty.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25050","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DiscreteRTC: Discrete Diffusion Policies are Natural Asynchronous Executors","primary_cat":"cs.RO","submitted_at":"2026-04-27T23:04:03+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Castañeda, N. Cherniadev, X. Da, R. Ding, L. Fan, Y. Fang, D. Fox, F. Hu, S. Huang, et al. Gr00t n1: An open foundation model for generalist humanoid robots. arXiv preprint arXiv:2503.14734, 2025. [7] P. Wang, Q. Liu, H. Lin, Y. Li, G. Zhan, M. Tomizuka, and Y. Wang. Dadp: Domain adaptive diffusion policy. arXiv preprint arXiv:2602.04037, 2026. [8] OpenAI, M. Andrychowicz, B. Baker, M. Chociej, R. Józefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba. Learning dexterous in-hand manipulation. CoRR, 2018. URL http://arxiv.org/abs/1808.00177. [9] S. An, Z. Meng, C. Tang, Y. Zhou, T. Liu, F. Ding, S."},{"citing_arxiv_id":"2604.06067","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"HiPolicy: Hierarchical Multi-Frequency Action Chunking for Policy Learning","primary_cat":"cs.RO","submitted_at":"2026-04-07T16:47:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"HiPolicy is a new hierarchical multi-frequency action chunking method for imitation learning that jointly generates coarse and fine action sequences with entropy-guided execution to improve performance and efficiency in robotic manipulation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2412.02818","ref_index":36,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"RoboMD: Uncovering Robot Vulnerabilities through Semantic Potential Fields","primary_cat":"cs.RO","submitted_at":"2024-12-03T20:34:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A deep RL vulnerability-prediction policy trained in semantic embedding space finds up to 23% more unique robot manipulation failures than vision-language baselines and enables more efficient fine-tuning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2405.14093","ref_index":230,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Survey on Vision-Language-Action Models for Embodied AI","primary_cat":"cs.RO","submitted_at":"2024-05-23T01:43:54+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba, \"Learning dexterous in-hand manipulation,\" CoRR, vol. abs/1808.00177, 2018. [229] R. Rafailov, A. Sharma, E. Mitchell, C. D. Manning, S. Ermon, and C. Finn, \"Direct preference optimization: Your language model is secretly a reward model,\" in NeurIPS, 2023. [230] X. Chen, H. Fang, T. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick, \"Microsoft COCO captions: Data collection and evaluation server,\" CoRR, vol. abs/1504.00325, 2015. [231] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh, \"VQA: visual question answering,\" in ICCV. IEEE Computer Society, 2015, pp. 2425-2433."},{"citing_arxiv_id":"2108.10470","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning","primary_cat":"cs.RO","submitted_at":"2021-08-24T01:38:11+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Isaac Gym achieves 2-3 orders of magnitude faster robot policy training by keeping physics simulation and PyTorch-based RL entirely on GPU with direct buffer sharing.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"problems. Deep reinforcement learning (Deep RL) has achieved superhuman performance in very challenging tasks, ranging from classic strategy games such as Go and Chess [1], to real-time computer games like StarCraft [2] and DOTA [3]. It has also shown impressive results in robotic settings, including legged locomotion [4] and dexterous manipulation [5]. Simulators play a key role in training robots improving both the safety and iteration speed in the learning process. Training a humanoid robot that walks up and down stairs in the real world can lead to damage to its machinery and the environment, including humans that are working on the robot. An alternative is to train inside simulators that offer an efficient and scalable platform via"},{"citing_arxiv_id":"1910.11215","ref_index":2,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"RoboNet: Large-Scale Multi-Robot Learning","primary_cat":"cs.RO","submitted_at":"2019-10-24T15:20:03+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RoboNet is a multi-robot video dataset that enables pre-training of vision-based manipulation models which, after fine-tuning on a new robot, outperform robot-specific training that uses 4-20 times more data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.11388","ref_index":3,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Learning to Solve a Rubik's Cube with a Dexterous Hand","primary_cat":"cs.RO","submitted_at":"2019-07-26T06:09:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Hierarchical RL combines a model-based cube solver with a model-free hand controller to solve Rubik's cubes in simulation, achieving 90.3% success on 1400 random scrambles.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.04796","ref_index":1,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Bayesian Optimization in Variational Latent Spaces with Dynamic Compression","primary_cat":"cs.RO","submitted_at":"2019-07-10T15:34:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Sequential VAE embeds simulated trajectories into latent paths for Bayesian optimization with dynamic compression to enable data-efficient high-dimensional controller tuning on robots.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.02057","ref_index":1,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Benchmarking Model-Based Reinforcement Learning","primary_cat":"cs.LG","submitted_at":"2019-07-03T17:53:02+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Introduces a benchmark suite of over 18 MBRL environments, evaluates multiple algorithms under consistent settings, and identifies three core challenges: dynamics bottleneck, planning horizon dilemma, and early-termination dilemma.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.01475","ref_index":2,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Generalizing from a few environments in safety-critical reinforcement learning","primary_cat":"cs.LG","submitted_at":"2019-07-02T16:12:34+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RL agents fail dangerously on unseen environments; ensembles reduce catastrophes in gridworld but not CoinRun, with uncertainty enabling intervention prediction.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.11633","ref_index":10,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ORRB -- OpenAI Remote Rendering Backend","primary_cat":"cs.GR","submitted_at":"2019-06-26T16:58:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"ORRB is an open-source remote rendering backend that pairs Unity3d with MuJoCo for high-throughput, customizable visual domain randomization in robotics environments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.10124","ref_index":2,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"On Multi-Agent Learning in Team Sports Games","primary_cat":"cs.MA","submitted_at":"2019-06-25T15:18:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"Describes a hierarchical RL method for multi-agent learning in team sports games aiming for human-like agents, reporting preliminary results that show promise.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1906.09868","ref_index":43,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Pose Estimation for Non-Cooperative Rendezvous Using Neural Networks","primary_cat":"cs.CV","submitted_at":"2019-06-24T11:51:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SPN is a CNN that detects a spacecraft bounding box, classifies then regresses attitude, and optimizes position via Gauss-Newton, achieving degree-level attitude and cm-level position errors on real images after training only on synthetic data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}