{"paper":{"title":"Contextual Bandits for Resource-Constrained Devices using Probabilistic Learning","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A probabilistic update rule lets hyperdimensional contextual bandits run on low-precision hardware while outperforming binarized versions and approaching full performance with 3 bits per component.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Amy Loutfi, Antonello Rosato, Denis Kleyko, Kevin Johansson, Marco Angioli","submitted_at":"2026-05-13T11:04:47Z","abstract_excerpt":"Contextual bandits (CB) are online sequential decision-making problems under partial feedback that underpin many adaptive services. There is a growing demand to deploy CB agents directly on-device, under strict constraints on memory, compute, and energy. However, standard linear CB algorithms are often impractical for resource-constrained devices with their unfavorable scaling in computational and memory costs. Recently, HD-CB, a CB approach based on hyperdimensional computing principles, has been proposed to model and solve CB problems by moving into high-dimensional spaces. HD-CB offers fast"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Off-policy evaluation on standardized synthetic CB benchmarks using the Open Bandit Pipeline shows that probabilistic HD-CB consistently outperforms binarized HD-CB at equal precision, while approaching the performance of HD-CB with as few as 3 bits per component.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the probabilistic update rule with random subset selection and time-decaying probability preserves enough learning information to match or exceed binarized HD-CB without introducing bias or instability that would appear only in full real-world deployments beyond the synthetic benchmarks.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Probabilistic HD-CB outperforms binarized HD-CB and approaches full HD-CB performance on synthetic benchmarks using as few as 3 bits per component via random partial updates with time-decaying probability.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A probabilistic update rule lets hyperdimensional contextual bandits run on low-precision hardware while outperforming binarized versions and approaching full performance with 3 bits per component.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"7e2b5e86eb374cdc60d0dd88b26fe6814e1e0b8242c5d97e4449a0c521edbeb3"},"source":{"id":"2605.13346","kind":"arxiv","version":1},"verdict":{"id":"75e0efb6-aa86-465d-9fed-26113c3adc81","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:05:43.806442Z","strongest_claim":"Off-policy evaluation on standardized synthetic CB benchmarks using the Open Bandit Pipeline shows that probabilistic HD-CB consistently outperforms binarized HD-CB at equal precision, while approaching the performance of HD-CB with as few as 3 bits per component.","one_line_summary":"Probabilistic HD-CB outperforms binarized HD-CB and approaches full HD-CB performance on synthetic benchmarks using as few as 3 bits per component via random partial updates with time-decaying probability.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the probabilistic update rule with random subset selection and time-decaying probability preserves enough learning information to match or exceed binarized HD-CB without introducing bias or instability that would appear only in full real-world deployments beyond the synthetic benchmarks.","pith_extraction_headline":"A probabilistic update rule lets hyperdimensional contextual bandits run on low-precision hardware while outperforming binarized versions and approaching full performance with 3 bits per component."},"references":{"count":30,"sample":[{"doi":"","year":2010,"title":"A contextual-bandit approach to personalized news article recommendation,","work_id":"c3de964a-43b8-4ade-81c2-c55cf2589ebd","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"Artwork personalization at Netflix,","work_id":"018e34db-411d-4c9d-a3c2-34ff555d4beb","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2016,"title":"Multiworld testing decision service: A system for experimentation, learning, and decision-making,","work_id":"a19299dc-6d84-43f3-8a77-7039a24e48b9","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"How The New York Times is experimenting with recommendation algorithms,","work_id":"f463b9ce-39f7-4714-8f3c-291f68524648","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"How we boosted app revenue by 10% with real-time personalization,","work_id":"3579cbbf-34c8-42c7-a0fc-49dbd861fbcd","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":30,"snapshot_sha256":"2cfe35fbffa60073ec65bdb48742df4c319d74dc29262bfd96fb5f3200028b9f","internal_anchors":0},"formal_canon":{"evidence_count":1,"snapshot_sha256":"38958f8a23de0f028d5e31a3ab760284923b34f5562b53a904dd25ac85838ebc"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}