{"total":32,"items":[{"citing_arxiv_id":"2605.22123","ref_index":15,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Beyond Pixels: Learning Invariant Rewards for Real-World Robotics From a Few Demonstrations","primary_cat":"cs.RO","submitted_at":"2026-05-21T07:55:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A framework learns invariant symbolic reward functions from few demonstrations that generalize zero-shot to variations in robotic manipulation tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12421","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Formalize, Don't Optimize: The Heuristic Trap in LLM-Generated Combinatorial Solvers","primary_cat":"cs.AI","submitted_at":"2026-05-12T17:15:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"LLM-generated combinatorial solvers achieve highest correctness when the model formalizes problems for verified backends rather than attempting to optimize search, which often causes regressions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.11859","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"EvoNav: Evolutionary Reward Function Design for Robot Navigation with Large Language Models","primary_cat":"cs.RO","submitted_at":"2026-05-12T09:43:21+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EvoNav automates the design of reward functions for RL robot navigation by evolving LLM proposals through a three-stage cheap-to-expensive evaluation process and claims better policies than hand-crafted or prior automated 
rewards.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"biases that are difficult to detect [10, 38], and does not scale across environments where priorities shift or constraints tighten. Additionally, reward evaluation requires full policy training and evaluation, creating a slow, expensive feedback loop [29, 12, 6]. In settings like crowd navigation, where simulation is already costly and safety metrics require meaningful experience, exhaustive search over reward designs is impractical. As shown in Figure 1, large language models (LLMs) offer a new opportunity to address this challenge: by encoding broad world knowledge, LLMs can decompose natural language descriptions"},{"citing_arxiv_id":"2605.11359","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"CVEvolve: Autonomous Algorithm Discovery for Unstructured Scientific Data Processing","primary_cat":"cs.AI","submitted_at":"2026-05-12T00:24:30+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10118","ref_index":50,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Plan in Sandbox, Navigate in Open Worlds: Learning Physics-Grounded Abstracted Experience for Embodied Navigation","primary_cat":"cs.RO","submitted_at":"2026-05-11T07:34:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SAGE trains agents in physics-grounded semantic abstractions via RL with asymmetric clipping, achieving 53.21% LLM-Match Success on A-EQA (+9.7% over baseline) and encouraging physical robot 
transfer.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09423","ref_index":54,"ref_count":4,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning","primary_cat":"cs.AI","submitted_at":"2026-05-10T08:51:50+00:00","verdict":"ACCEPT","verdict_confidence":"MODERATE","novelty_score":8.0,"formal_verification":"none","one_line_summary":"SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.","context_count":2,"top_context_role":"background","top_context_polarity":"background","context_text":"Generative agents: Interactive simulacra of human behavior, 2023. URL https://arxiv.org/abs/2304.03442. [53] Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, and Tim Rocktäschel. Evolving curricula with regret-based environment design. In International Conference on Machine Learning, pages 17473-17498. PMLR, 2022. [54] Despoina Paschalidou, Amlan Kar, Maria Shugrina, Karsten Kreis, Andreas Geiger, and Sanja Fidler. Atiss: Autoregressive transformers for indoor scene synthesis. Advances in neural information processing systems, 34:12013-12026, 2021. [55] Nicholas Pfaff, Thomas Cohn, Sergey Zakharov, Rick Cory, and Russ Tedrake. 
Scenesmith: Agentic generation of simulation-ready indoor scenes."},{"citing_arxiv_id":"2605.07358","ref_index":50,"ref_count":4,"confidence":0.98,"is_internal_anchor":true,"paper_title":"A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications","primary_cat":"cs.IR","submitted_at":"2026-05-08T07:10:26+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":2,"top_context_role":"background","top_context_polarity":"background","context_text":"Generative Agents [30], GITM [31], RAP [32], Retroformer [33], MemGPT [34], TiM [35], Self-Discover [36], TextGrad [37], FINCON [38], M+ [39], Learned Memory Bank [40], Nemori [41], Intrinsic Memory [42], SkillForge [43] Code-Backed Voyager [12], SkillCraft [44], PolySkill [45], ASI [46], CUA-Skill [47], MetaGPT [6], Eureka [48], DS-Agent [49], LDB [50], CodeAct [51], SWE-agent [52], ToolCoder [53], PSN [54] Hybrid-Based JARVIS-1 [55], Synapse [56], SkillWeaver [57], AgentSkillOS [58], TPTU [59], talker-reasoner [60], DAMCS [61], GraphSkill [62], Alita [63] Skill Acquisition (§IV) Human-Derived SkillNet [64], AgentSkillOS [58], Agentic Skills [65], SkillOS [66], Agent Hospital [67] Experience-Derived"},{"citing_arxiv_id":"2605.06869","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Agentick: A Unified Benchmark for General Sequential Decision-Making Agents","primary_cat":"cs.AI","submitted_at":"2026-05-07T19:12:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Agentick is a new benchmark for sequential decision-making agents that evaluates RL, LLM, VLM, hybrid, and human approaches across 37 tasks and finds no single method 
dominates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.02600","ref_index":20,"ref_count":2,"confidence":0.9,"is_internal_anchor":true,"paper_title":"CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation","primary_cat":"cs.RO","submitted_at":"2026-05-04T13:49:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"CoRAL lets LLMs act as adaptive cost designers for motion planners while using VLM priors and online identification to handle unknown physics, achieving over 50% higher success rates than baselines in unseen contact-rich robotic scenarios.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Integrating Foundation Models with Motion Planners and Controllers. Alternatively, foundation models can guide traditional motion planners using semantic understanding. VoxPoser [10] and IMPACT [17] use VLMs to generate static 3D cost maps for planners like MPPI or RRT*, while VLMPC [33] embeds a VLM within an MPC [6] loop. In code generation, Eureka [20] and DrEureka [21] synthesize RL reward functions but operate offline. Closest to our approach, Language-to-Rewards (L2R) [31] generates rewards for real-time MPC. However, L2R relies on static physical assumptions and lacks mechanisms to correct model mismatches during execution. 
CoRAL advances this by elevating the LLM to a high-level strategist capable of online adaptation."},{"citing_arxiv_id":"2605.02073","ref_index":19,"ref_count":2,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Enhanced LLM Reasoning by Optimizing Reward Functions with Search-Driven Reinforcement Learning","primary_cat":"cs.CL","submitted_at":"2026-05-03T22:01:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Iterative search over reward functions with ranked feedback in GRPO training improves LLM math reasoning, achieving F1 of 0.795 on GSM8K versus 0.609 for baseline.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Modern language models can synthesize, critique, and revise executable code, and they can participate directly in reward construction. Prior work has used LLMs as proxy reward functions, transformed language instructions into reward parameters, generated dense reward code from task descriptions, and iteratively improved reward programs through performance feedback [16], [17], [18], [19], [20], [21]. The common lesson is that reward design can be partially outsourced to a generative model when it is embedded in an iterative feedback loop. In parallel, reasoning-focused post-training has shown that RL can materially improve mathematical and logical reasoning. 
Group Relative Policy Optimization (GRPO), introduced for DeepSeekMath, replaces the sepa-"},{"citing_arxiv_id":"2604.24043","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"A2DEPT: Large Language Model-Driven Automated Algorithm Design via Evolutionary Program Trees","primary_cat":"cs.AI","submitted_at":"2026-04-27T05:07:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A2DEPT generates complete algorithms for COPs using LLM-driven evolutionary program trees with hybrid selection and repair, reducing mean normalized optimality gap by 9.8% versus strongest AHD baselines on standard benchmarks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.26969","ref_index":15,"ref_count":2,"confidence":0.98,"is_internal_anchor":true,"paper_title":"AgenticRecTune: Multi-Agent with Self-Evolving Skillhub for Recommendation System Optimization","primary_cat":"cs.IR","submitted_at":"2026-04-21T23:15:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"AgenticRecTune deploys five LLM agents (Actor, Critic, Insight, Skill, Online) and a self-evolving Skillhub to handle end-to-end configuration optimization for multi-stage recommendation systems.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.17191","ref_index":52,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Do LLM-derived graph priors improve multi-agent coordination?","primary_cat":"cs.LG","submitted_at":"2026-04-19T01:40:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLM-generated coordination graph priors improve multi-agent reinforcement learning performance on MPE benchmarks, with models as 
small as 1.5B parameters proving effective.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10517","ref_index":14,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning","primary_cat":"cs.AI","submitted_at":"2026-04-12T08:14:49+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"EgoTSR applies a three-stage curriculum on a 46-million-sample dataset to build egocentric spatiotemporal reasoning, reaching 92.4% accuracy on long-horizon tasks and reducing chronological biases.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08664","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Generative Simulation for Policy Learning in Physical Human-Robot Interaction","primary_cat":"cs.RO","submitted_at":"2026-04-09T18:00:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A text-to-simulation pipeline using LLMs and VLMs generates synthetic pHRI data to train vision-based imitation learning policies that achieve over 80% success in zero-shot sim-to-real transfer on real assistive tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08508","ref_index":30,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Sumo: Dynamic and Generalizable Whole-Body Loco-Manipulation","primary_cat":"cs.RO","submitted_at":"2026-04-09T17:49:40+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Test-time steering of pre-trained whole-body policies via sample-based planning lets legged 
robots generalize dynamic loco-manipulation to varied heavy objects and tasks without additional training or tuning.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"cies [50, 24, 26] that typically interface with task-space teleoperation or follow motion references. By using MPC to steer such a policy online, we can synthesize dynamic behaviors from task objectives at deployment rather than depending on teleoperated or planner-generated references. Planning over pretrained robot policies has also been explored in tabletop manipulation [30, 16] where the base policy is trained via imitation learning, but to our knowledge has not been studied in dynamic, contact-rich loco-manipulation settings where the base policy is trained via RL. III. SUMO In this section, we provide a detailed description of our hierarchical control framework, which we call Sumo, that uses a high-level sample-based controller to steer a pre-"},{"citing_arxiv_id":"2604.05550","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"AutoSOTA: An End-to-End Automated Research System for State-of-the-Art AI Model Discovery","primary_cat":"cs.CL","submitted_at":"2026-04-07T07:52:01+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.05226","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains","primary_cat":"cs.RO","submitted_at":"2026-04-06T22:42:05+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied 
yet reproducible tasks that reveal policy generalization failures.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"model-authored instructions without sacrificing comparability or scientific rigor. II. RELATED WORKS A. Evaluation and Benchmarking for Robotic Manipulation Robotic manipulation systems are most commonly evaluated using fixed benchmark suites composed of predefined task instances, environments, and success criteria, such as RLBench [11], LIBERO [21], RoboCasa [24], Behaviour-1k [17], ManiSkill [23], Colosseum [27], Simpler [18], and RoboEval [31]. While these benchmarks have been instrumental in standardizing evaluation, they define evaluation over a fixed and finite set of expert-authored tasks, with task structure, constraints, and success criteria encoded procedurally and not exposed for user modification."},{"citing_arxiv_id":"2604.02911","ref_index":26,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"Learning Task-Invariant Properties via Dreamer: Enabling Efficient Policy Transfer for Quadruped Robots","primary_cat":"cs.RO","submitted_at":"2026-04-03T09:27:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DreamTIP adds LLM-identified task-invariant properties as auxiliary targets in Dreamer's world model plus a mixed-replay adaptation step, delivering 28.1% average simulated transfer gains and 100% real-world climb success versus 10% for baselines.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"explicitly learning Task-Invariant Properties to enhance the model's robustness to dynamic variations, thereby better facilitating simulation-to-real transfer. C. LLM-driven robot skill learning Large language models are applied in quadruped robots across three main areas: reward modeling, motion control, and representation learning. 
In reward design, works like Eureka [26] show LLMs can automatically generate reward functions. For motion control, some studies use LLMs to convert language into intermediate commands (e.g., foot contact patterns) executed by reinforcement learning controllers [27], [28], or even directly output joint trajectories [29]. In representation learning, methods such as LESR [30] employ LLMs to improve state representations and intrinsic rewards,"},{"citing_arxiv_id":"2603.12145","ref_index":13,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Automatic Generation of High-Performance RL Environments","primary_cat":"cs.LG","submitted_at":"2026-03-12T16:45:47+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Closed-loop prompt-based translation with hierarchical verification and iterative repair produces equivalent high-performance RL environments across five cases including new TCGJax.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.20867","ref_index":47,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SoK: Agentic Skills -- Beyond Tool Use in LLM Agents","primary_cat":"cs.CR","submitted_at":"2026-02-24T13:11:38+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2602.18142","ref_index":17,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Toward Automated Virtual Electronic Control Unit (ECU) Twins for Shift-Left Automotive Software 
Testing","primary_cat":"cs.SE","submitted_at":"2026-02-20T11:03:46+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Prototype automates creation of virtual ECU twins via agentic feedback-driven modeling in SystemC to enable early shift-left software testing in automotive development.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.16615","ref_index":24,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"LLM-Guided Task- and Affordance-Level Exploration in Reinforcement Learning","primary_cat":"cs.RO","submitted_at":"2025-09-20T10:37:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"LLM-TALE steers RL exploration using LLM-generated plans at task and affordance levels with online suboptimality correction, improving sample efficiency and success rates on pick-and-place tasks without human supervision.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.19349","ref_index":238,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"ShinkaEvolve: Towards Open-Ended And Sample-Efficient Program Evolution","primary_cat":"cs.CL","submitted_at":"2025-09-17T17:49:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"ShinkaEvolve improves sample efficiency in LLM-driven program evolution via parent sampling, code novelty rejection-sampling, and bandit LLM ensemble selection, achieving new SOTA circle packing with 150 samples and gains on math reasoning and competitive programming 
tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.09674","ref_index":8,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning","primary_cat":"cs.RO","submitted_at":"2025-09-11T17:59:17+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SimpleVLA-RL applies tailored reinforcement learning to VLA models, reaching SoTA on LIBERO, outperforming π₀ on RoboTwin, and surpassing SFT in real-world tasks while reducing data needs and identifying a 'pushcut' phenomenon.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2509.00338","ref_index":37,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Scalable Option Learning in High-Throughput Environments","primary_cat":"cs.LG","submitted_at":"2025-08-30T03:42:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SOL is a new hierarchical RL algorithm that reaches 35x higher throughput and outperforms flat agents when trained on 30 billion frames in NetHack while showing positive scaling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2506.04565","ref_index":118,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"From Standalone LLMs to Integrated Intelligence: A Survey of Compound AI Systems","primary_cat":"cs.MA","submitted_at":"2025-06-05T02:34:43+00:00","verdict":"ACCEPT","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key 
challenges for future research.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Vision encoder extracts high-level feature representations from raw visual inputs, which then transform pixel data into compact embeddings suitable for downstream multimodal processing. For instance, LLaVA [98] is a multimodal model combining a CLIP-based vision encoder with the Vicuna LLM, fine-tuned on GPT-4-generated multimodal instruction-following data. Additionally, MM1 [118] is a family of multimodal models trained using a carefully optimized mix of image-caption, interleaved image-text, and text-only data. MM1's model architecture includes the image encoder, vision-language connector, and LLM. Audio encoder converts raw audio signals into compact feature representations, capturing temporal and spectral characteristics, then processes waveform or spectrogram inputs for downstream multimodal tasks."},{"citing_arxiv_id":"2503.22020","ref_index":41,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models","primary_cat":"cs.CV","submitted_at":"2025-03-27T22:23:04+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"CoT-VLA is a 7B VLA that generates future visual frames autoregressively as planning goals before actions, outperforming prior VLAs by 17% on real-world tasks and 6% in simulation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2411.04983","ref_index":35,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot 
Planning","primary_cat":"cs.RO","submitted_at":"2024-11-07T18:54:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.06292","ref_index":71,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery","primary_cat":"cs.AI","submitted_at":"2024-08-12T16:58:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"The AI Scientist framework enables LLMs to independently conduct the full scientific process from idea generation to paper writing and review, demonstrated across three ML subfields with papers costing under $15 each.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2401.03568","ref_index":149,"ref_count":1,"confidence":0.98,"is_internal_anchor":true,"paper_title":"Agent AI: Surveying the Horizons of Multimodal Interaction","primary_cat":"cs.AI","submitted_at":"2024-01-07T19:11:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2307.05973","ref_index":84,"ref_count":1,"confidence":0.9,"is_internal_anchor":true,"paper_title":"VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language 
Models","primary_cat":"cs.RO","submitted_at":"2023-07-12T07:40:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"VoxPoser uses LLMs to compose 3D value maps via VLM interaction for model-based synthesis of robust robot trajectories on open-set language-specified manipulation tasks.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"Gileadi, C. Fu, S. Kirmani, K.-H. Lee, M. G. Arenas, H.-T. L. Chiang, T. Erez, L. Hasenclever, J. Humplik, et al. Language to rewards for robotic skill synthesis. arXiv preprint arXiv:2306.08647, 2023. [83] H. Ha, P. Florence, and S. Song. Scaling up and distilling down: Language-guided robot skill acquisition. arXiv preprint arXiv:2307.14535, 2023. [84] Y. J. Ma, W. Liang, G. Wang, D.-A. Huang, O. Bastani, D. Jayaraman, Y. Zhu, L. Fan, and A. Anandkumar. Eureka: Human-level reward design via coding large language models. arXiv preprint arXiv:2310.12931, 2023. [85] T. Xie, S. Zhao, C. H. Wu, Y. Liu, Q. Luo, V. Zhong, Y. Yang, and T. Yu. Text2reward: Automated dense reward function generation for reinforcement learning."}],"limit":50,"offset":0}