{"paper":{"title":"InvEvolve: Evolving White-Box Inventory Policies via Large Language Models with Performance Guarantees","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"InvEvolve uses large language models to evolve white-box inventory policies with statistical safety guarantees and a lower bound on success probability.","cross_cats":["cs.AI"],"primary_cat":"cs.LG","authors_text":"Benyou Wang, Bo Jiang, Chenyu Huang, Jianghao Lin, Lai Wei, Ruoqing Jiang, Zhengyang Tang","submitted_at":"2026-05-01T03:12:16Z","abstract_excerpt":"We study how large language models can be used to generate inventory policies in online settings with non-stationary demand. Our work is motivated by recent advances in LLM-based evolutionary search, such as AlphaEvolve, which demonstrates strong performance on static and highly structured problems such as mathematical discovery, but is not directly suited to dynamic inventory settings with online updates. We propose InvEvolve, an end-to-end inventory policy evolution and inference framework grounded in confidence-interval-based certification. Built on a large language model trained via reinfo"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"InvEvolve evolves new policies that improve upon existing benchmarks and provides a lower bound on the probability that it evolves a statistically safe and improved policy, with outperformance shown on both synthetic data and real-world retail data.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The unified theoretical model correctly connects training, inference, and deployment to deliver a valid lower bound on the probability of a safe improved policy and an accurate characterization of the multi-period performance gap relative to the oracle-safe benchmark.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"InvEvolve evolves white-box inventory policies from LLMs with statistical safety guarantees and outperforms classical and deep learning methods on synthetic and real retail data.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"InvEvolve uses large language models to evolve white-box inventory policies with statistical safety guarantees and a lower bound on success probability.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"85e3569805337103d300d55919b8eb47db95e478ecf142f32c1cb0a598e15145"},"source":{"id":"2605.00369","kind":"arxiv","version":4},"verdict":{"id":"a4e0c4ed-7f37-48a7-9727-488a20997162","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-12T02:31:00.648612Z","strongest_claim":"InvEvolve evolves new policies that improve upon existing benchmarks and provides a lower bound on the probability that it evolves a statistically safe and improved policy, with outperformance shown on both synthetic data and real-world retail data.","one_line_summary":"InvEvolve evolves white-box inventory policies from LLMs with statistical safety guarantees and outperforms classical and deep learning methods on synthetic and real retail data.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The unified theoretical model correctly connects training, inference, and deployment to deliver a valid lower bound on the probability of a safe improved policy and an accurate characterization of the multi-period performance gap relative to the oracle-safe benchmark.","pith_extraction_headline":"InvEvolve uses large language models to evolve white-box inventory policies with statistical safety guarantees and a lower bound on success probability."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.00369/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"ai_meta_artifact","ran_at":"2026-05-20T20:33:39.722081Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T18:13:34.828293Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"07bbfa8089677d96b99a150e48df8fab58d50ec230b3eae99949509c2809afa9"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"26a06c37db4a49d33d2f4b27056a288654b03689f3e701f0806c8fd3f3c80b14"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}