{"work":{"id":"af60de99-112e-4dda-bc6b-5ec7381cfaad","openalex_id":null,"doi":null,"arxiv_id":"2408.07199","raw_key":null,"title":"Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents","authors":null,"authors_text":"Pranav Putta, Edmund Mills, Naman Garg, Sumeet Motwani, Chelsea Finn, Divyansh Garg","year":2024,"venue":"cs.AI","abstract":"Large Language Models (LLMs) have shown remarkable capabilities in natural language tasks requiring complex reasoning, yet their application in agentic, multi-step reasoning within interactive environments remains a difficult challenge. Traditional supervised pre-training on static datasets falls short in enabling autonomous agent capabilities needed to perform complex decision-making in dynamic settings like web navigation. Previous attempts to bridge this gap, through supervised fine-tuning on curated expert demonstrations, often suffer from compounding errors and limited exploration data, resulting in sub-optimal policy outcomes. To overcome these challenges, we propose a framework that combines guided Monte Carlo Tree Search (MCTS) with a self-critique mechanism and iterative fine-tuning on agent interactions using an off-policy variant of the Direct Preference Optimization (DPO) algorithm. Our method allows LLM agents to learn effectively from both successful and unsuccessful trajectories, thereby improving their generalization in complex, multi-step reasoning tasks. We validate our approach in the WebShop environment, a simulated e-commerce platform, where it consistently outperforms behavior cloning and reinforced fine-tuning baselines, and beats average human performance when equipped with the capability to do online search. In real-world booking scenarios, our methodology boosts Llama-3 70B model's zero-shot performance from 18.6% to 81.7% success rate (a 340% relative increase) after a single day of data collection and further to 95.4% with online search. We believe this represents a substantial leap forward in the capabilities of autonomous agents, paving the way for more sophisticated and reliable decision-making in real-world settings.","external_url":"https://arxiv.org/abs/2408.07199","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-23T17:35:44.065778+00:00","pith_arxiv_id":"2408.07199","created_at":"2026-05-10T08:58:13.226568+00:00","updated_at":"2026-05-23T17:35:44.065778+00:00","title_quality_ok":true,"display_title":"Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents","render_title":"Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents"},"hub":{"state":{"work_id":"af60de99-112e-4dda-bc6b-5ec7381cfaad","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":24,"external_cited_by_count":null,"distinct_field_count":6,"first_pith_cited_at":"2024-11-23T16:03:35+00:00","last_pith_cited_at":"2026-05-20T12:17:43+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-04T05:47:19.967387+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":10}],"polarity_counts":[{"context_polarity":"background","n":10}],"runs":{},"summary":{},"graph":{},"authors":[]}}