Pith · machine review for the scientific record

arxiv: 2605.08526 · v1 · submitted 2026-05-08 · 💻 cs.LG

Recognition: 2 theorem links · Lean theorem

Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck

Julian McAuley, Junda Wu, Lina Yao, Qianqi Yan, Rohan Surana, Tong Yu, Uttaran Bhattacharya, Xin Eric Wang, Zihan Huang

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:21 UTC · model grok-4.3

classification 💻 cs.LG
keywords: multimodal skills · information bottleneck · agent consistency · conditional compression · LLM agents · skill distillation · variational objective · execution stability

The pith

Conditional Multimodal Information Bottleneck distills reusable skills from vision and language to stabilize agent actions without extra sampling cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops CMIB to turn inconsistent trial-and-error trajectories into reusable multimodal skills for LLM-based agents. It first compresses information into interpretable text skill cards, then applies a second bottleneck that keeps only the perceptual details still useful once text is known. The sequential split is realized through a variational objective that avoids naive redundancy between modalities. If this works, agents can internalize stable behavior patterns and run them reliably on single passes rather than relying on repeated sampling at decision time. A reader would care because current agents often vary wildly across identical tasks, and this offers a way to bake in consistency during training.

Core claim

CMIB begins with a joint bottleneck over multimodal skills and derives an exact sequential decomposition: a text-stage bottleneck distilling interpretable skill cards, followed by a conditional multimodal bottleneck that compresses only residual information in perception that remains predictive beyond text. Unlike naive two-stream approaches, the multimodal latent is explicitly conditioned on the text skill, which structurally reduces cross-modal redundancy. The decomposition is made tractable by a variational objective, yielding reusable multimodal skills that improve execution stability without incurring multi-sample inference overhead.
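Read alongside the Figure 1 caption reproduced below, the objective being described appears to take the standard bottleneck trade-off form. A plausible reconstruction, with β₁ and β₂ as assumed compression weights (the paper's exact Equation (4) is not reproduced here):

    max   I(c; Y) + I(z; Y | c)  −  β₁ I((X, M); c)  −  β₂ I((X, M); z | c)

Here c is the text skill card, z the multimodal latent conditioned on c, (X, M) the text and perceptual inputs, and Y the target action. The first two terms are the sufficiency (relevance) side; the last two are the minimality (compression) side, with the conditional term I((X, M); z | c) charging z only for information not already carried by c.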

What carries the argument

Conditional Multimodal Information Bottleneck, which conditions the multimodal latent variable on the output of a preceding text-stage bottleneck to isolate residual perceptual information.
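To make the two-stage structure concrete, here is a minimal sketch, not the authors' code: a conditional bottleneck loss with diagonal-Gaussian latents and linear stand-in encoders, showing only the compression side of the objective (the relevance terms would be lower-bounded by decoder log-likelihoods, omitted here). All names and shapes are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation) of a two-stage
# conditional bottleneck: a text latent c, then a multimodal latent z
# conditioned on c so z only encodes residual perceptual information.
import numpy as np

rng = np.random.default_rng(0)

def kl_diag_gauss(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dims.
    Used as the variational upper bound on the compression terms."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def cmib_compression_loss(x_text, x_vision, beta_text=1e-2, beta_vision=1e-2):
    # Stage 1: text bottleneck -> skill-card latent c (random linear
    # "encoders" stand in for the paper's learned networks).
    W_c = rng.normal(size=(x_text.size, 4)) / np.sqrt(x_text.size)
    mu_c, logvar_c = x_text @ W_c, np.zeros(4)
    c = mu_c + np.exp(0.5 * logvar_c) * rng.normal(size=4)

    # Stage 2: multimodal latent z takes the vision input AND c, so it
    # is structurally conditioned on the text skill.
    inp = np.concatenate([x_vision, c])
    W_z = rng.normal(size=(inp.size, 4)) / np.sqrt(inp.size)
    mu_z, logvar_z = inp @ W_z, np.zeros(4)

    # Variational-IB style bounds on I((X,M); c) and I((X,M); z | c),
    # with independently tunable weights per modality stage.
    return beta_text * kl_diag_gauss(mu_c, logvar_c) \
         + beta_vision * kl_diag_gauss(mu_z, logvar_z)

loss = cmib_compression_loss(rng.normal(size=8), rng.normal(size=16))
print(float(loss))
```

The point of the sketch is the dataflow: because z's encoder receives c as input, and its compression term is weighted separately from c's, the two compression ratios can be tuned independently, which is the "independent control" the abstract claims.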

If this is right

  • Agents can store and reuse skills across repeated trials without retraining or extra decoding passes.
  • Textual skill cards stay human-interpretable while perceptual residuals are kept only when they add predictive value.
  • Compression ratios for language and vision can be tuned independently rather than traded off in a single shared latent.
  • Training produces skills that run at single-sample inference speed while matching the reliability of multi-sample methods.
  • The same skill construction can be applied to any sequence of multimodal trajectories collected from agent rollouts.
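The cost claim in these bullets can be made concrete with a toy comparison (not from the paper): K-sample self-consistency pays K decoding passes for a majority vote, while an internalized skill answers in one pass. The agents and success rates below are illustrative assumptions.

```python
# Toy cost comparison: majority-vote self-consistency (K passes)
# vs. a single pass from a hypothetical internalized skill.
from collections import Counter
import random

random.seed(1)

def noisy_agent(p_correct=0.6):
    """Stand-in for one stochastic decoding pass of an agent."""
    return "correct" if random.random() < p_correct else "wrong"

def self_consistency(k=5):
    """Decode k times, return the majority answer and the pass count."""
    votes = Counter(noisy_agent() for _ in range(k))
    return votes.most_common(1)[0][0], k

def single_pass_skill():
    """A more reliable per-pass agent, decoded once."""
    return noisy_agent(p_correct=0.8), 1

answer_sc, cost_sc = self_consistency(k=5)
answer_sp, cost_sp = single_pass_skill()
print(cost_sc, cost_sp)  # 5 decoding passes vs 1
```

This is exactly the trade-off Figure 2 plots: self-consistency buys reliability with latency that scales in K, whereas the skill-based agent holds latency fixed at one pass.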

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same conditional split might reduce redundancy in other multimodal tasks such as video captioning or robotic planning where text instructions accompany visual observations.
  • If the text bottleneck proves too lossy in new domains, the framework could be extended by adding a second verbalization stage after the perceptual residual is extracted.
  • Measuring how much perceptual information survives the conditional step would give a practical diagnostic for whether a given task truly needs vision beyond language.
  • The method opens a route to skills that are partially editable: a user could revise the text card and expect the perceptual component to adapt accordingly.

Load-bearing premise

A variational objective can enforce the exact sequential text-then-conditional-multimodal decomposition while still retaining every task-relevant perceptual cue not already captured in the text.

What would settle it

An ablation that trains the same model with a non-conditional joint multimodal bottleneck and shows no gain in action consistency over text-only skills, or a drop in performance when perceptual compression is applied.

Figures

Figures reproduced from arXiv: 2605.08526 by Julian McAuley, Junda Wu, Lina Yao, Qianqi Yan, Rohan Surana, Tong Yu, Uttaran Bhattacharya, Xin Eric Wang, Zihan Huang.

Figure 1
Figure 1: Skill-CMIB illustration. Equation (4) directly encodes the three design requirements of multimodal skill construction: Sufficiency is captured by the relevance terms I(c;Y) and I(z;Y | c), which preserve the information needed to predict Y, with the conditional term measuring the residual predictive value beyond the text card. Minimality is imposed by the compression terms I((X, M); c) and I((X, M); z | … view at source ↗
Figure 2
Figure 2: Efficiency analysis. Skill-CMIB achieves higher task success rate with significantly lower inference latency, demonstrating a favorable performance–cost trade-off. [Plot: step success rate (%) vs. latency (ms/step) for Qwen2.5-VL-7B, CMIB (Ours), Self-Consistency (K=3), and Self-Consistency (K=5).] view at source ↗
Original abstract

While LLM-based agents excel at planning and executing long action sequences, their execution often remains inconsistent across trials, limiting reliability. Consolidating agent consistency requires distilling trial-error trajectories into reusable skills that preserve task-relevant invariants while discarding trajectory-specific noise. However, in multimodal settings, the key challenge is not only that useful invariants are distributed across vision and language information, but that different modalities support different kinds of reusable skill content: while some skills are verbalizable and interpretable, others reside in perceptual evidence beyond text. Text-only skills may lose perceptual cues, whereas storing text and perception naively introduces redundancy and noise. Existing inference-time methods, such as self-consistency, improve reliability through costly multi-sample decoding, while internalization strategies lack a way to separate verbalizable skill content from residual perceptual information. To address this, we introduce Conditional Multimodal Information Bottleneck (CMIB), a method for multimodal skill construction. CMIB begins with a joint bottleneck over multimodal skills and derives an exact sequential decomposition: (1) a text-stage bottleneck distilling interpretable skill cards, and (2) a conditional multimodal bottleneck compressing only residual information in perception that remains predictive beyond text. Unlike naive two-stream formulations, CMIB explicitly conditions the multimodal latent on the text skill, thus structurally reducing cross-modal redundancy and enabling independent control over textual and perceptual compression. We instantiate CMIB with a variational objective that makes its conditional decomposition tractable to optimize, yielding reusable multimodal skills that improve execution stability without incurring multi-sample inference overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes Conditional Multimodal Information Bottleneck (CMIB) to construct reusable multimodal skills for LLM-based agents. Starting from a joint multimodal bottleneck over skills, it derives an exact sequential decomposition into (1) a text-stage bottleneck that produces interpretable skill cards and (2) a conditional multimodal bottleneck that compresses only residual perceptual information predictive beyond text. The decomposition is instantiated via a variational objective for tractability, with the goal of reducing cross-modal redundancy while improving execution stability without multi-sample inference overhead.

Significance. If the variational realization of the claimed exact decomposition can be shown to preserve task-relevant perceptual invariants not captured by text, the approach would provide a principled, efficient alternative to inference-time consistency methods such as self-consistency. This could meaningfully advance reliable long-horizon execution in multimodal agent settings by internalizing stable skills.

major comments (1)
  1. [Abstract] The central claim rests on deriving an 'exact sequential decomposition' from a joint multimodal bottleneck into a text-stage bottleneck followed by a conditional multimodal bottleneck. However, the method is instantiated with a variational objective that relies on amortized inference and KL bounds, which are not guaranteed to be tight. It is therefore unclear whether the decomposition remains exact or whether task-critical perceptual cues are lost when conditioning the multimodal latent on the text latent, directly undermining the asserted improvement in execution stability while preserving full multimodal utility.
minor comments (1)
  1. The abstract would benefit from explicit mention of the experimental domains, baselines, and quantitative metrics used to validate improved stability, as these are essential to assess whether the variational construction delivers the claimed benefits.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address the single major comment below, clarifying the information-theoretic status of the decomposition versus its variational optimization.

Point-by-point responses
  1. Referee: [Abstract] The central claim rests on deriving an 'exact sequential decomposition' from a joint multimodal bottleneck into a text-stage bottleneck followed by a conditional multimodal bottleneck. However, the method is instantiated with a variational objective that relies on amortized inference and KL bounds, which are not guaranteed to be tight. It is therefore unclear whether the decomposition remains exact or whether task-critical perceptual cues are lost when conditioning the multimodal latent on the text latent, directly undermining the asserted improvement in execution stability while preserving full multimodal utility.

    Authors: The sequential decomposition is exact at the level of the information-bottleneck objective: by the chain rule for mutual information, the joint multimodal IB objective decomposes without approximation into a text-stage bottleneck followed by a conditional multimodal bottleneck that compresses only residual perceptual information. This structural property holds independently of how the objective is optimized. The variational formulation (amortized inference and KL bounds) is introduced solely to make the decomposed objective tractable, exactly as in the original variational information bottleneck. While the bounds are not guaranteed to be tight, they do not alter the conditioning mechanism or the exact decomposition itself. Empirical results in the paper demonstrate that task-relevant perceptual invariants are retained, yielding improved execution stability. We acknowledge that the manuscript could more explicitly separate the exact decomposition from the variational approximation; we will revise the abstract and Section 3 to state this distinction clearly and add a short paragraph on approximation gaps. revision: partial
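For the record, the identity the rebuttal invokes is the chain rule for mutual information, which holds exactly regardless of how the objective is later optimized:

    I(c, z; Y) = I(c; Y) + I(z; Y | c)

The approximation gap enters only when each term on the right is replaced by its variational bound, which is the distinction the authors promise to make explicit in revision.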

Circularity Check

0 steps flagged

No significant circularity; derivation remains independent of fitted inputs or self-citations

Full rationale

The paper claims an exact sequential decomposition of a joint multimodal bottleneck into a text-stage bottleneck followed by a conditional multimodal bottleneck, then instantiates this via a variational objective for tractability. No equations, fitted parameters, or self-citations are exhibited in the abstract or described structure that would make the claimed skill reusability, stability gains, or decomposition reduce by construction to the training inputs or prior author results. The variational instantiation is presented as an approximation step rather than a re-expression of the target outcome, and the central empirical claim (improved execution stability without multi-sample overhead) is positioned as a downstream consequence rather than a definitional identity. This satisfies the default expectation of a non-circular derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no explicit free parameters, axioms, or invented entities; the variational objective is mentioned at a conceptual level without equations or assumptions listed.

pith-pipeline@v0.9.0 · 5601 in / 1091 out tokens · 35523 ms · 2026-05-12T01:21:17.576386+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

100 extracted references · 100 canonical work pages · 9 internal anchors
