{"paper":{"title":"Neural Variance-aware Dueling Bandits with Deep Representation and Shallow Exploration","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Variance-aware neural algorithms for contextual dueling bandits achieve sublinear regret of order O(d sqrt(sum sigma_t^2) + sqrt(dT)).","cross_cats":["stat.ML"],"primary_cat":"cs.LG","authors_text":"Jaemin Park, Jinje Park, Taejin Paik, Youngmin Oh","submitted_at":"2025-06-02T01:58:48Z","abstract_excerpt":"We introduce the first variance-aware algorithms for contextual dueling bandits that leverage shallow exploration strategies with neural networks for nonlinear utility approximation. A key theoretical challenge is the absence of a closed-form estimator, which led prior work to require an extremely large network width $m$ (i.e., $m = \\widetilde{\\Omega}(T^{14})$). We address this constraint with a novel analytical approach that combines iterative self-improvement with spectral analysis. Our analysis significantly reduces the network width requirement to $m = \\widetilde{\\Omega}(T^{6})$, and shows"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Under standard assumptions, our algorithms achieve sublinear cumulative average regret of order O(d sqrt(sum_{t=1}^T sigma_t^2) + sqrt(d T)) for sufficiently wide neural networks.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The neural networks must be sufficiently wide to approximate the unknown nonlinear utility functions, and the variance-aware exploration strategy must be effective when computed solely from last-layer gradients without requiring deeper network information.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Variance-aware neural dueling bandit algorithms achieve sublinear regret of order O(d sqrt(sum sigma_t^2) + sqrt(d T)) for wide networks on nonlinear utilities.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Variance-aware neural algorithms for contextual dueling bandits achieve sublinear regret of order O(d sqrt(sum sigma_t^2) + sqrt(dT)).","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"c104f98bf241e905f2687818ee7de2e43023026b31b600f38ec46d3306a09bdb"},"source":{"id":"2506.01250","kind":"arxiv","version":3},"verdict":{"id":"5d7dd0d9-1dd4-48f6-9834-7aab0d65d1c9","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T11:26:03.213117Z","strongest_claim":"Under standard assumptions, our algorithms achieve sublinear cumulative average regret of order O(d sqrt(sum_{t=1}^T sigma_t^2) + sqrt(d T)) for sufficiently wide neural networks.","one_line_summary":"Variance-aware neural dueling bandit algorithms achieve sublinear regret of order O(d sqrt(sum sigma_t^2) + sqrt(d T)) for wide networks on nonlinear utilities.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The neural networks must be sufficiently wide to approximate the unknown nonlinear utility functions, and the variance-aware exploration strategy must be effective when computed solely from last-layer gradients without requiring deeper network information.","pith_extraction_headline":"Variance-aware neural algorithms for contextual dueling bandits achieve sublinear regret of order O(d sqrt(sum sigma_t^2) + sqrt(dT))."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2506.01250/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}