{"paper":{"title":"HARPO: Hierarchical Agentic Reasoning for User-Aligned Conversational Recommendation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"HARPO uses hierarchical preference learning and value-guided tree search to optimize conversational recommendations for multi-dimensional user quality.","cross_cats":[],"primary_cat":"cs.IR","authors_text":"Aman Vaibhav Jha, Mayank Anand, Sriparna Saha, Subham Raj","submitted_at":"2026-04-11T06:07:15Z","abstract_excerpt":"Conversational recommender systems (CRSs) operate under incremental preference revelation, requiring recommendation decisions under uncertainty. While recent LLM-based approaches achieve strong performance on proxy metrics such as Recall@K and BLEU, they often fail to deliver high-quality, user-aligned recommendations in practice, as they optimize intermediate objectives like retrieval accuracy or fluent generation rather than recommendation quality itself. We propose HARPO (Hierarchical Agentic Reasoning with Preference Optimization), an agentic framework that reframes conversational recommen"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"HARPO integrates hierarchical preference learning that decomposes recommendation quality into interpretable dimensions (relevance, diversity, predicted user satisfaction, and engagement) and learns context-dependent weights over these dimensions; (ii) deliberative tree-search reasoning guided by a learned value network that evaluates candidate reasoning paths based on predicted recommendation quality rather than task completion; and (iii) domain-agnostic reasoning abstractions through Virtual Tool Operations and multi-agent refinement, enabling transferable recommendation reasoning across domains. We evaluate HARPO on ReDial, INSPIRED, and MUSE, demonstrating consistent improvements over strong baselines on recommendation-centric metrics while maintaining competitive response quality.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the learned value network and context-dependent weights over the four quality dimensions accurately capture and optimize for actual user-aligned recommendation quality in real interactions, rather than merely correlating with the chosen proxy metrics on the evaluation datasets.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"HARPO reframes conversational recommendation as hierarchical agentic reasoning with learned weights over quality dimensions and value-guided tree search, yielding better recommendation metrics on ReDial, INSPIRED, and MUSE.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"HARPO uses hierarchical preference learning and value-guided tree search to optimize conversational recommendations for multi-dimensional user quality.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"7b95fa06806c5fc5c32e0ded2bd606d4569f4b8a0ca4e78c5b7fa1b103363110"},"source":{"id":"2604.10048","kind":"arxiv","version":2},"verdict":{"id":"684cce00-5cdc-4589-b699-65bb7b0cd35a","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-10T16:13:38.526345Z","strongest_claim":"HARPO integrates hierarchical preference learning that decomposes recommendation quality into interpretable dimensions (relevance, diversity, predicted user satisfaction, and engagement) and learns context-dependent weights over these dimensions; (ii) deliberative tree-search reasoning guided by a learned value network that evaluates candidate reasoning paths based on predicted recommendation quality rather than task completion; and (iii) domain-agnostic reasoning abstractions through Virtual Tool Operations and multi-agent refinement, enabling transferable recommendation reasoning across domains. We evaluate HARPO on ReDial, INSPIRED, and MUSE, demonstrating consistent improvements over strong baselines on recommendation-centric metrics while maintaining competitive response quality.","one_line_summary":"HARPO reframes conversational recommendation as hierarchical agentic reasoning with learned weights over quality dimensions and value-guided tree search, yielding better recommendation metrics on ReDial, INSPIRED, and MUSE.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the learned value network and context-dependent weights over the four quality dimensions accurately capture and optimize for actual user-aligned recommendation quality in real interactions, rather than merely correlating with the chosen proxy metrics on the evaluation datasets.","pith_extraction_headline":"HARPO uses hierarchical preference learning and value-guided tree search to optimize conversational recommendations for multi-dimensional user quality."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.10048/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}