pith. machine review for the scientific record.

arxiv: 2605.05833 · v1 · submitted 2026-05-07 · 💻 cs.AI

Recognition: unknown

On the Role of Language Representations in Auto-Bidding: Findings and Implications

Ersheng Ni, Guanyu Zhu, Hanwen Du, Hongji Li, Huacan Wang, Jincheng Fang, Jining Luan, Ronghao Chen, Sibo Xu, Xinyu Fang, Xuanqi Lan, Yiqi Sun, Yongxin Ni, Youhua Li

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 11:13 UTC · model grok-4.3

classification 💻 cs.AI
keywords auto-bidding · language model embeddings · semantic-numeric integration · offline reinforcement learning · constraint satisfaction · trajectory token injection · real-time advertising

The pith

Injecting LLM-encoded semantics as tokens into bidding trajectories improves performance and constraint satisfaction in auto-bidding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Auto-bidding policies must maximize long-horizon value while satisfying delivery constraints such as budgets and CPA targets. Existing numerical state representations capture dynamics implicitly but offer little explicit control over high-level intent or strategic guidance. Preliminary studies establish that language model embeddings carry bidding-relevant cues yet cannot substitute for numerical features, and that performance gains appear only when semantics are integrated carefully rather than through simple concatenation. The paper therefore proposes a framework that encodes three semantic inputs—Task, History, and Strategy—from language models and injects them as tokens alongside numerical trajectory tokens. Self-attention then fuses the two streams, producing policies that outperform offline RL and generative sequence baselines with more stable results across scenarios and budget levels.

Core claim

The paper claims that language representations from LLMs contain useful bidding cues that become effective only when injected at the token level into offline trajectories and fused via self-attention with numerical features. The resulting framework, which supplies Task, History, and Strategy semantics as additional tokens, yields higher overall performance, better constraint satisfaction, and greater robustness than competitive baselines from offline reinforcement learning and generative sequence modeling across varied scenarios and budget regimes.

What carries the argument

SemBid, which encodes Task, History, and Strategy semantics from LLMs as tokens and injects them into numerical bidding trajectories for self-attention integration.
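The injection pattern described above can be sketched in miniature. This is not the paper's implementation: the dimensions, the unweighted single-head attention, and the choice of one token per semantic stream are all illustrative assumptions; the only thing the sketch commits to is that semantics enter as extra sequence positions and self-attention does the fusion.

```python
import numpy as np

rng = np.random.default_rng(0)

def self_attention(tokens: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention with identity
    projections -- just enough to show the fusion step."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ tokens

d_model = 8
# hypothetical projections of the three semantic embeddings (Task, History, Strategy)
sem_tokens = rng.normal(size=(3, d_model))
# hypothetical numerical trajectory tokens (T = 5 decision steps)
num_tokens = rng.normal(size=(5, d_model))

# token-level injection: semantic tokens are prepended as their own
# sequence positions rather than concatenated onto numerical features
tokens = np.concatenate([sem_tokens, num_tokens], axis=0)  # (3 + 5, d_model)
fused = self_attention(tokens)

print(fused.shape)  # (8, 8)
```

Each output position is then a learned mixture over both streams, which is the mechanism the paper credits for careful semantic-numeric integration.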

If this is right

  • SemBid produces more consistent gains than offline RL and generative sequence baselines across scenarios and budget regimes.
  • Constraint satisfaction and robustness both increase while numerical precision is preserved.
  • Explicit semantic inputs enable greater controllability and generalization across different campaign objectives.
  • Token-level injection avoids the loss of precision that occurs with naive concatenation of embeddings.
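The last bullet's distinction is easiest to see at the level of array shapes. In this sketch the dimensions are made up (a 384-dim sentence embedding against 16 numerical features) and the zero-filled projection matrices are placeholders for learned weights; it is not the paper's architecture, only the contrast it draws.

```python
import numpy as np

T, num_dim, sem_dim = 5, 16, 384
num_traj = np.zeros((T, num_dim))   # numerical trajectory features
sem = np.zeros(sem_dim)             # one pooled semantic embedding

# naive concatenation: the large semantic vector is glued onto every step,
# so 384 semantic dims sit beside only 16 numerical dims in each vector
concat = np.concatenate([num_traj, np.tile(sem, (T, 1))], axis=1)
print(concat.shape)   # (5, 400)

# token-level injection: the semantic embedding becomes its own sequence
# position; each numerical token keeps its feature space, and attention
# decides how much semantic signal each step absorbs
d_model = 64
proj_num = np.zeros((num_dim, d_model))   # placeholder learned projection
proj_sem = np.zeros((sem_dim, d_model))   # placeholder learned projection
tokens = np.concatenate([(sem @ proj_sem)[None, :], num_traj @ proj_num], axis=0)
print(tokens.shape)   # (6, 64)
```

Under concatenation the numerical features can be swamped by the much wider embedding; under injection they remain an intact stream, which is one plausible reading of why precision is preserved.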

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same token-level fusion pattern could be tested in other long-horizon control tasks that combine high-level instructions with precise numeric feedback, such as inventory management or energy bidding.
  • If the pattern holds, bidding systems might shift from hand-crafted numerical features toward hybrid semantic-numeric trajectories, reducing the cost of manual state engineering.
  • The finding that pure language representations are insufficient implies that any deployment must retain a numerical backbone rather than attempting end-to-end language-based bidding.

Load-bearing premise

That language model embeddings supply bidding-relevant cues which improve outcomes only when fused carefully with numerical features rather than used alone or concatenated naively.

What would settle it

Running SemBid on a fresh advertising dataset with previously unseen budget distributions and campaign objectives, and finding no improvement over the strongest numerical baseline in either value or constraint metrics, would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.05833 by Ersheng Ni, Guanyu Zhu, Hanwen Du, Hongji Li, Huacan Wang, Jincheng Fang, Jining Luan, Ronghao Chen, Sibo Xu, Xinyu Fang, Xuanqi Lan, Yiqi Sun, Yongxin Ni, Youhua Li.

Figure 1: Preliminary results. Left: Text embeddings predict …
Figure 2: Overview of the SemBid framework. The model augments Decision Transformer with three complementary semantic …
Figure 3: Average relative gains of SemBid under different …
Figure 4: Impact of prompt formulation on AuctionNet-High …
Figure 5: High-conversion template pool used in our experiments.
Figure 6: Low-conversion template pool used in our experiments.
Figure 7: Prompt templates for Directive style (high-conversion prompt-variant study): imperative commands with minimal …
Figure 8: Prompt templates for Concise style (high-conversion prompt-variant study): minimal tokens with essential information …
Figure 9: Prompt templates for Verbose style (high-conversion prompt-variant study): detailed explanations with full context …
Figure 10: Prompt templates for Structured style (high-conversion prompt-variant study): label-value pairs with explicit field …
read the original abstract

Auto-bidding is a crucial task in real-time advertising markets, where policies must optimize long-horizon value under delivery constraints (e.g., budget and CPA). Existing methods for auto-bidding rely on compact numerical state representations: while they can implicitly capture delivery dynamics, they offer limited support for explicitly representing and controlling high-level intent, evolving feedback, and operator-style strategic guidance in real campaigns. Meanwhile, although Large Language Models (LLMs) offer a powerful method for encoding semantic information, it remains unclear when LLMs help and how to integrate them without sacrificing numerical precision. Through systematic preliminary studies, we find that (1) LLM embeddings contain bidding-relevant cues yet cannot replace numerical features, and (2) gains emerge only with careful semantic-numeric integration rather than naive concatenation. Motivated by these findings, we propose SemBid, a novel auto-bidding framework that injects LLM-encoded semantics into offline bidding trajectories at the token level. SemBid introduces three semantic inputs: Task, History, and Strategy. It injects these semantics as tokens alongside numerical trajectory tokens and uses self-attention to integrate them, improving controllability and generalization across objectives. Across diverse scenarios and budget regimes, SemBid outperforms competitive baselines from offline RL and generative sequence modeling, with more consistent gains in overall performance, constraint satisfaction, and robustness. Our code is available at: https://github.com/AlanYu04/SemBid-KDD2026

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper investigates the role of LLM-based language representations in auto-bidding for real-time advertising auctions. Preliminary studies establish that LLM embeddings encode bidding-relevant cues but cannot substitute for numerical state features, and that performance gains require careful semantic-numeric integration rather than naive concatenation. Motivated by these findings, the authors propose SemBid, which augments offline bidding trajectories with three semantic token types (Task, History, Strategy) and fuses them via self-attention alongside numerical tokens. Experiments across diverse scenarios and budget regimes are reported to show that SemBid yields more consistent improvements than offline RL and generative sequence-modeling baselines in overall performance, constraint satisfaction, and robustness. Code is released.

Significance. If the empirical claims hold, the work offers actionable guidance on integrating semantic signals into numerical control policies without sacrificing precision, which could extend to other long-horizon constrained RL settings. The emphasis on preliminary ablation-style findings before proposing the fusion architecture is a constructive contribution, and the public code release supports reproducibility.

minor comments (3)
  1. [Abstract] Abstract: the claim of 'consistent gains' across 'diverse scenarios and budget regimes' would benefit from a brief quantitative summary (e.g., number of scenarios, average relative improvement, or mention of statistical testing) to orient readers before the detailed experiments.
  2. [Methods] Methods section: the precise tokenization and self-attention integration of the three semantic inputs (Task/History/Strategy) with numerical trajectory tokens should be illustrated with a diagram or pseudocode, as the current description leaves the exact fusion architecture ambiguous.
  3. [Experiments] Experiments: baseline implementations (offline RL and generative sequence models) require explicit hyperparameter ranges, training budgets, and any data-filtering rules to allow fair replication; without these, the reported outperformance is harder to interpret.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive and accurate summary of our work, including the value placed on our preliminary studies and the SemBid framework. The assessment correctly notes the empirical claims and code release. As no specific major comments were raised in the report, we interpret the minor-revision recommendation as guidance to polish presentation and ensure full reproducibility details are explicit in the final version.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's argument proceeds from preliminary empirical studies (finding that LLM embeddings carry bidding signals but cannot replace numerical features, and that naive concatenation fails) to the design of SemBid (token-level semantic-numeric fusion via self-attention on Task/History/Strategy tokens) and then to comparative evaluation against offline-RL and generative baselines. All load-bearing claims are experimental performance deltas across scenarios and budgets; no equations, fitted parameters, or self-citations are invoked to derive the reported gains by construction. The central result is an empirical comparison of trained policies on offline trajectories, which remains independent of any internal definitional reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so the ledger is minimally populated. The central claim rests on the empirical utility of LLM semantics when fused via self-attention; no explicit free parameters, background axioms, or newly postulated entities are described.

pith-pipeline@v0.9.0 · 5624 in / 1251 out tokens · 57328 ms · 2026-05-08T11:13:27.103946+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

45 extracted references · 13 canonical work pages · 7 internal anchors

  1. [1] Gagan Aggarwal, Ashwinkumar Badanidiyuru, Santiago R Balseiro, Kshipra Bhawalkar, Yuan Deng, Zhe Feng, Gagan Goel, Christopher Liaw, Haihao Lu, Mohammad Mahdian, et al. 2024. Auto-bidding and auctions in online advertising: A survey. ACM SIGecom Exchanges 22, 1 (2024), 159–183.

  2. [2] Cameron Allen, Neev Parikh, Omer Gottesman, and George Konidaris. 2021. Learning Markov state abstractions for deep reinforcement learning. Advances in Neural Information Processing Systems 34 (2021), 8229–8241.

  3. [3] David Brandfonbrener, Alberto Bietti, Jacob Buckman, Romain Laroche, and Joan Bruna. 2022. When does return-conditioned supervised learning work for offline reinforcement learning? Advances in Neural Information Processing Systems 35 (2022), 1542–1553.

  4. [4] Andrei Broder, Marcus Fontoura, Vanja Josifovski, and Lance Riedel. 2007. A semantic approach to contextual advertising. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 559–566.

  5. [5] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.

  6. [6] Han Cai, Kan Ren, Weinan Zhang, Kleanthis Malialis, Jun Wang, Yong Yu, and Defeng Guo. 2017. Real-time bidding by reinforcement learning in display advertising. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. 661–670.

  7. [7] Leng Cai, Junxuan He, Yikai Li, Junjie Liang, Yuanping Lin, Ziming Quan, Yawen Zeng, and Jin Xu. 2025. RTBAgent: A LLM-based Agent System for Real-Time Bidding. In Companion Proceedings of the ACM on Web Conference 2025. 104–113.

  8. [8] Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Misha Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. 2021. Decision Transformer: Reinforcement learning via sequence modeling. Advances in Neural Information Processing Systems 34 (2021), 15084–15097.

  9. [9–10] Jinren Ding, Xuejian Xu, Shen Jiang, Zhitong Hao, Jinhui Yang, and Peng Jiang. 2026. C2: Cross learning module enhanced decision transformer with constraint-aware loss for auto-bidding. arXiv preprint arXiv:2601.20257 (2026).

  11. [11] Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. 2007. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American Economic Review 97, 1 (2007), 242–259.

  12. [12–13] Dylan J Foster, Akshay Krishnamurthy, David Simchi-Levi, and Yunzong Xu. 2021. Offline reinforcement learning: Fundamental barriers for value function approximation. arXiv preprint arXiv:2111.10919 (2021).

  14. [14] Scott Fujimoto and Shixiang Shane Gu. 2021. A minimalist approach to offline reinforcement learning. Advances in Neural Information Processing Systems 34 (2021), 20132–20145.

  15. [15] Scott Fujimoto, David Meger, and Doina Precup. 2019. Off-policy deep reinforcement learning without exploration. In International Conference on Machine Learning. PMLR, 2052–2062.

  16. [16] Jingtong Gao, Bo Chen, Menghui Zhu, Xiangyu Zhao, Xiaopeng Li, Yuhao Wang, Yichao Wang, Huifeng Guo, and Ruiming Tang. 2024. HierRec: Scenario-aware hierarchical modeling for multi-scenario recommendations. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 653–662.

  17. [17] Jingtong Gao, Yewen Li, Shuai Mao, Peng Jiang, Nan Jiang, Yejing Wang, Qingpeng Cai, Fei Pan, Peng Jiang, Kun Gai, et al. 2025. Generative auto-bidding with value-guided explorations. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. 244–254.

  18. [18] Team GLM, Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, et al. 2024. ChatGLM: A family of large language models from GLM-130B to GLM-4 All Tools. arXiv preprint arXiv:2406.12793 (2024).

  19. [19] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. 2025. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948 (2025).

  20. [20] Jiayan Guo, Yusen Huo, Zhilin Zhang, Tianyu Wang, Chuan Yu, Jian Xu, Yan Zhang, and Bo Zheng. 2024. AIGB: Generative auto-bidding via conditional diffusion modeling. arXiv:2405.16141 [cs.LG]. https://arxiv.org/abs/2405.16141

  21. [21] Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, et al. 2024. Qwen2.5-Coder technical report. arXiv preprint arXiv:2409.12186 (2024).

  22. [22] Michael Janner, Qiyang Li, and Sergey Levine. 2021. Offline reinforcement learning as one big sequence modeling problem. Advances in Neural Information Processing Systems 34 (2021), 1273–1286.

  23. [23] Hao Jiang, Yongxiang Tang, Yanxiang Zeng, Pengjia Yuan, Yanhua Cheng, Teng Sha, Xialong Liu, and Peng Jiang. 2025. Optimal return-to-go guided decision transformer for auto-bidding in advertisement. In Companion Proceedings of the ACM on Web Conference 2025. 1033–1037.

  24. [24] Junqi Jin, Chengru Song, Han Li, Kun Gai, Jun Wang, and Weinan Zhang. 2018. Real-time bidding with multi-agent reinforcement learning in display advertising. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 2193–2201.

  25. [25] Ilya Kostrikov, Ashvin Nair, and Sergey Levine. 2021. Offline reinforcement learning with implicit Q-learning. arXiv preprint arXiv:2110.06169 (2021).

  26. [26] Aviral Kumar, Aurick Zhou, George Tucker, and Sergey Levine. 2020. Conservative Q-learning for offline reinforcement learning. Advances in Neural Information Processing Systems 33 (2020), 1179–1191.

  27. [27] Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. 2020. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643 (2020).

  28. [28] Haoming Li, Yusen Huo, Shuai Dou, Zhenzhe Zheng, Zhilin Zhang, Chuan Yu, Jian Xu, and Fan Wu. 2024. Trajectory-wise iterative reinforcement learning framework for auto-bidding. In Proceedings of the ACM Web Conference 2024. 4193–4203.

  29. [29] Yewen Li, Shuai Mao, Jingtong Gao, Nan Jiang, Yunjian Xu, Qingpeng Cai, Fei Pan, Peng Jiang, and Bo An. 2025. GAS: Generative auto-bidding with post-training search. In Companion Proceedings of the ACM on Web Conference 2025. 315–324.

  30. [30] Ollie Liu, Deqing Fu, Dani Yogatama, and Willie Neiswanger. 2024. DeLLMa: A framework for decision making under uncertainty with large language models. arXiv preprint arXiv:2402.02392 (2024).

  31. [31] Zhiyu Mou, Yusen Huo, Rongquan Bai, Mingzhou Xie, Chuan Yu, Jian Xu, and Bo Zheng. 2022. Sustainable online reinforcement learning for auto-bidding. Advances in Neural Information Processing Systems 35 (2022), 2651–2663.

  32. [32] Cuong V Nguyen, Sanjiv R Das, John He, Shenghua Yue, Vinay Hanumaiah, Xavier Ragot, and Li Zhang. 2021. Multimodal machine learning for credit modeling. In 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE, 1754–1759.

  33. [33] Machel Reid, Yutaro Yamada, and Shixiang Shane Gu. 2022. Can Wikipedia help offline reinforcement learning? arXiv preprint arXiv:2201.12122 (2022).

  34. [34] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084 (2019).

  35. [35] Kefan Su, Yusen Huo, Zhilin Zhang, Shuai Dou, Chuan Yu, Jian Xu, Zongqing Lu, and Bo Zheng. 2024. AuctionNet: A novel benchmark for decision-making in large-scale games. Advances in Neural Information Processing Systems 37 (2024), 94428–94452.

  36. [36] Faraz Torabi, Garrett Warnell, and Peter Stone. 2018. Behavioral cloning from observation. arXiv preprint arXiv:1805.01954 (2018).

  37. [37] Jun Wang and Shuai Yuan. 2015. Real-time bidding: A new frontier of computational advertising research. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining. 415–416.

  38. [38] Di Wu, Xiujun Chen, Xun Yang, Hao Wang, Qing Tan, Xiaoxun Zhang, Jian Xu, and Kun Gai. 2018. Budget constrained bidding by model-free reinforcement learning in display advertising. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 1443–1451.

  39. [39] Jian Xu, Kuang-chih Lee, Wentong Li, Hang Qi, and Quan Lu. 2015. Smart pacing for effective online ad campaign optimization. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2217–2226.

  40. [40] Yong Yuan, Feiyue Wang, Juanjuan Li, and Rui Qin. 2014. A survey on real time bidding advertising. In Proceedings of 2014 IEEE International Conference on Service Operations and Logistics, and Informatics. IEEE, 418–423.

  41. [41] Weinan Zhang, Yifei Rong, Jun Wang, Tianchi Zhu, and Xiaofan Wang. 2016. Feedback control of real-time display advertising. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. 407–416.

  42. [42] Chujie Zhao, Qun Hu, Shiping Song, Dagui Chen, Han Zhu, Jian Xu, and Bo Zheng. 2025. LLM-Auction: Generative Auction towards LLM-Native Advertising. arXiv preprint arXiv:2512.10551 (2025).

  43. [43] Appendix excerpt (Value Analysis, pValue): evaluates the conversion probability p of the current impression batch. High opportunity (p > 0.01) suggests "High pValue indicates good opportunity." Low opportunity (p < 0.001) suggests "Low pValue suggests lower conversion potential."

  44. [44] Appendix excerpt (Budget Health): evaluates the remaining budget ratio R_b = B_left / B_total. Scarcity (R_b < 0.2): "Remaining budget is low. Bid conservatively." Abundance (R_b > 0.7): "Remaining budget is sufficient. You can bid more aggressively."

  45. [45] Appendix excerpt (Reference-based Guidance): to facilitate stable learning, bidding suggestions are keyed to the magnitude of the suggested bid b_t. If b_t > 50: "Consider increasing the bid..." (aggressive); if b_t < 10: "Consider bidding conservatively..." (conservative); otherwise: "Consider a balanced bidding approach." (moderate). Note: these thresholds a...
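The threshold rules excerpted in the last three entries can be collected into a small sketch. The thresholds and hint strings come from the excerpts themselves; the function name, signature, and the idea of returning the hints as a list are invented here for illustration and are not the paper's code.

```python
def strategy_hints(p_value: float, budget_left: float, budget_total: float,
                   suggested_bid: float) -> list[str]:
    """Maps campaign state to the natural-language strategy hints the
    appendix excerpts describe (thresholds as stated there)."""
    hints = []

    # Value analysis: conversion probability p of the current impression batch
    if p_value > 0.01:
        hints.append("High pValue indicates good opportunity.")
    elif p_value < 0.001:
        hints.append("Low pValue suggests lower conversion potential.")

    # Budget health: remaining budget ratio R_b = B_left / B_total
    r_b = budget_left / budget_total
    if r_b < 0.2:
        hints.append("Remaining budget is low. Bid conservatively.")
    elif r_b > 0.7:
        hints.append("Remaining budget is sufficient. You can bid more aggressively.")

    # Reference-based guidance keyed to the magnitude of the suggested bid
    if suggested_bid > 50:
        hints.append("Consider increasing the bid...")        # aggressive
    elif suggested_bid < 10:
        hints.append("Consider bidding conservatively...")    # conservative
    else:
        hints.append("Consider a balanced bidding approach.") # moderate

    return hints

print(strategy_hints(p_value=0.02, budget_left=100, budget_total=1000, suggested_bid=60))
```

Sentences produced this way would then be encoded by the language model as the Strategy input, so the policy sees them only through their embeddings.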