Pith · machine review for the scientific record

arxiv: 2605.08526 · v1 · submitted 2026-05-08 · 💻 cs.LG

Recognition: 2 theorem links · Lean theorem

Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck

Julian McAuley, Junda Wu, Lina Yao, Qianqi Yan, Rohan Surana, Tong Yu, Uttaran Bhattacharya, Xin Eric Wang, Zihan Huang

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 01:21 UTC · model grok-4.3

classification 💻 cs.LG
keywords: multimodal skills · information bottleneck · agent consistency · conditional compression · LLM agents · skill distillation · variational objective · execution stability

The pith

Conditional Multimodal Information Bottleneck distills reusable skills from vision and language to stabilize agent actions without extra sampling cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops CMIB to turn inconsistent trial-and-error trajectories into reusable multimodal skills for LLM-based agents. It first compresses information into interpretable text skill cards, then applies a second bottleneck that keeps only the perceptual details still useful once text is known. The sequential split is realized through a variational objective that avoids naive redundancy between modalities. If this works, agents can internalize stable behavior patterns and run them reliably on single passes rather than relying on repeated sampling at decision time. A reader would care because current agents often vary wildly across identical tasks, and this offers a way to bake in consistency during training.

Core claim

CMIB begins with a joint bottleneck over multimodal skills and derives an exact sequential decomposition: a text-stage bottleneck distilling interpretable skill cards, followed by a conditional multimodal bottleneck that compresses only residual information in perception that remains predictive beyond text. Unlike naive two-stream approaches, the multimodal latent is explicitly conditioned on the text skill, which structurally reduces cross-modal redundancy. The decomposition is made tractable by a variational objective, yielding reusable multimodal skills that improve execution stability without incurring multi-sample inference overhead.
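Read alongside the Figure 1 caption reproduced below, the objective being described appears to take the standard bottleneck trade-off form. A plausible reconstruction, with β₁ and β₂ as assumed compression weights (the paper's exact Equation (4) is not reproduced here):

    max   I(c; Y) + I(z; Y | c)  −  β₁ I((X, M); c)  −  β₂ I((X, M); z | c)

Here c is the text skill card, z the multimodal latent conditioned on c, (X, M) the text and perceptual inputs, and Y the target action. The first two terms are the sufficiency (relevance) side; the last two are the minimality (compression) side, with the conditional term I((X, M); z | c) charging z only for information not already carried by c.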

What carries the argument

Conditional Multimodal Information Bottleneck, which conditions the multimodal latent variable on the output of a preceding text-stage bottleneck to isolate residual perceptual information.
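To make the two-stage structure concrete, here is a minimal sketch, not the authors' code: a conditional bottleneck loss with diagonal-Gaussian latents and linear stand-in encoders, showing only the compression side of the objective (the relevance terms would be lower-bounded by decoder log-likelihoods, omitted here). All names and shapes are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation) of a two-stage
# conditional bottleneck: a text latent c, then a multimodal latent z
# conditioned on c so z only encodes residual perceptual information.
import numpy as np

rng = np.random.default_rng(0)

def kl_diag_gauss(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dims.
    Used as the variational upper bound on the compression terms."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def cmib_compression_loss(x_text, x_vision, beta_text=1e-2, beta_vision=1e-2):
    # Stage 1: text bottleneck -> skill-card latent c (random linear
    # "encoders" stand in for the paper's learned networks).
    W_c = rng.normal(size=(x_text.size, 4)) / np.sqrt(x_text.size)
    mu_c, logvar_c = x_text @ W_c, np.zeros(4)
    c = mu_c + np.exp(0.5 * logvar_c) * rng.normal(size=4)

    # Stage 2: multimodal latent z takes the vision input AND c, so it
    # is structurally conditioned on the text skill.
    inp = np.concatenate([x_vision, c])
    W_z = rng.normal(size=(inp.size, 4)) / np.sqrt(inp.size)
    mu_z, logvar_z = inp @ W_z, np.zeros(4)

    # Variational-IB style bounds on I((X,M); c) and I((X,M); z | c),
    # with independently tunable weights per modality stage.
    return beta_text * kl_diag_gauss(mu_c, logvar_c) \
         + beta_vision * kl_diag_gauss(mu_z, logvar_z)

loss = cmib_compression_loss(rng.normal(size=8), rng.normal(size=16))
print(float(loss))
```

The point of the sketch is the dataflow: because z's encoder receives c as input, and its compression term is weighted separately from c's, the two compression ratios can be tuned independently, which is the "independent control" the abstract claims.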

If this is right

  • Agents can store and reuse skills across repeated trials without retraining or extra decoding passes.
  • Textual skill cards stay human-interpretable while perceptual residuals are kept only when they add predictive value.
  • Compression ratios for language and vision can be tuned independently rather than traded off in a single shared latent.
  • Training produces skills that run at single-sample inference speed while matching the reliability of multi-sample methods.
  • The same skill construction can be applied to any sequence of multimodal trajectories collected from agent rollouts.
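The cost claim in these bullets can be made concrete with a toy comparison (not from the paper): K-sample self-consistency pays K decoding passes for a majority vote, while an internalized skill answers in one pass. The agents and success rates below are illustrative assumptions.

```python
# Toy cost comparison: majority-vote self-consistency (K passes)
# vs. a single pass from a hypothetical internalized skill.
from collections import Counter
import random

random.seed(1)

def noisy_agent(p_correct=0.6):
    """Stand-in for one stochastic decoding pass of an agent."""
    return "correct" if random.random() < p_correct else "wrong"

def self_consistency(k=5):
    """Decode k times, return the majority answer and the pass count."""
    votes = Counter(noisy_agent() for _ in range(k))
    return votes.most_common(1)[0][0], k

def single_pass_skill():
    """A more reliable per-pass agent, decoded once."""
    return noisy_agent(p_correct=0.8), 1

answer_sc, cost_sc = self_consistency(k=5)
answer_sp, cost_sp = single_pass_skill()
print(cost_sc, cost_sp)  # 5 decoding passes vs 1
```

This is exactly the trade-off Figure 2 plots: self-consistency buys reliability with latency that scales in K, whereas the skill-based agent holds latency fixed at one pass.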

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same conditional split might reduce redundancy in other multimodal tasks such as video captioning or robotic planning where text instructions accompany visual observations.
  • If the text bottleneck proves too lossy in new domains, the framework could be extended by adding a second verbalization stage after the perceptual residual is extracted.
  • Measuring how much perceptual information survives the conditional step would give a practical diagnostic for whether a given task truly needs vision beyond language.
  • The method opens a route to skills that are partially editable: a user could revise the text card and expect the perceptual component to adapt accordingly.

Load-bearing premise

A variational objective can enforce the exact sequential text-then-conditional-multimodal decomposition while still retaining every task-relevant perceptual cue not already captured in the text.

What would settle it

An ablation that trains the same model with a non-conditional joint multimodal bottleneck and shows no gain in action consistency over text-only skills, or a drop in performance when perceptual compression is applied.

Figures

Figures reproduced from arXiv: 2605.08526 by Julian McAuley, Junda Wu, Lina Yao, Qianqi Yan, Rohan Surana, Tong Yu, Uttaran Bhattacharya, Xin Eric Wang, Zihan Huang.

Figure 1
Figure 1: Skill-CMIB illustration. Equation (4) directly encodes the three design requirements of multimodal skill construction: Sufficiency is captured by the relevance terms I(c;Y) and I(z;Y | c), which preserve the information needed to predict Y, with the conditional term measuring the residual predictive value beyond the text card. Minimality is imposed by the compression terms I((X, M); c) and I((X, M); z | … view at source ↗
Figure 2
Figure 2: Efficiency analysis. Skill-CMIB achieves higher task success rate with significantly lower inference latency, demonstrating a favorable performance–cost trade-off. [Plot: step success rate (%) vs. latency (ms/step) for Qwen2.5-VL-7B, CMIB (Ours), Self-Consistency (K=3), and Self-Consistency (K=5).] view at source ↗
Original abstract

While LLM-based agents excel at planning and executing long action sequences, their execution often remains inconsistent across trials, limiting reliability. Consolidating agent consistency requires distilling trial-error trajectories into reusable skills that preserve task-relevant invariants while discarding trajectory-specific noise. However, in multimodal settings, the key challenge is not only that useful invariants are distributed across vision and language information, but that different modalities support different kinds of reusable skill content: while some skills are verbalizable and interpretable, others reside in perceptual evidence beyond text. Text-only skills may lose perceptual cues, whereas storing text and perception naively introduces redundancy and noise. Existing inference-time methods, such as self-consistency, improve reliability through costly multi-sample decoding, while internalization strategies lack a way to separate verbalizable skill content from residual perceptual information. To address this, we introduce Conditional Multimodal Information Bottleneck (CMIB), a method for multimodal skill construction. CMIB begins with a joint bottleneck over multimodal skills and derives an exact sequential decomposition: (1) a text-stage bottleneck distilling interpretable skill cards, and (2) a conditional multimodal bottleneck compressing only residual information in perception that remains predictive beyond text. Unlike naive two-stream formulations, CMIB explicitly conditions the multimodal latent on the text skill, thus structurally reducing cross-modal redundancy and enabling independent control over textual and perceptual compression. We instantiate CMIB with a variational objective that makes its conditional decomposition tractable to optimize, yielding reusable multimodal skills that improve execution stability without incurring multi-sample inference overhead.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes Conditional Multimodal Information Bottleneck (CMIB) to construct reusable multimodal skills for LLM-based agents. Starting from a joint multimodal bottleneck over skills, it derives an exact sequential decomposition into (1) a text-stage bottleneck that produces interpretable skill cards and (2) a conditional multimodal bottleneck that compresses only residual perceptual information predictive beyond text. The decomposition is instantiated via a variational objective for tractability, with the goal of reducing cross-modal redundancy while improving execution stability without multi-sample inference overhead.

Significance. If the variational realization of the claimed exact decomposition can be shown to preserve task-relevant perceptual invariants not captured by text, the approach would provide a principled, efficient alternative to inference-time consistency methods such as self-consistency. This could meaningfully advance reliable long-horizon execution in multimodal agent settings by internalizing stable skills.

major comments (1)
  1. [Abstract] The central claim rests on deriving an 'exact sequential decomposition' from a joint multimodal bottleneck into a text-stage bottleneck followed by a conditional multimodal bottleneck. However, the method is instantiated with a variational objective that relies on amortized inference and KL bounds, which are not guaranteed to be tight. It is therefore unclear whether the decomposition remains exact or whether task-critical perceptual cues are lost when conditioning the multimodal latent on the text latent, directly undermining the asserted improvement in execution stability while preserving full multimodal utility.
minor comments (1)
  1. The abstract would benefit from explicit mention of the experimental domains, baselines, and quantitative metrics used to validate improved stability, as these are essential to assess whether the variational construction delivers the claimed benefits.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address the single major comment below, clarifying the information-theoretic status of the decomposition versus its variational optimization.

Point-by-point responses
  1. Referee: [Abstract] The central claim rests on deriving an 'exact sequential decomposition' from a joint multimodal bottleneck into a text-stage bottleneck followed by a conditional multimodal bottleneck. However, the method is instantiated with a variational objective that relies on amortized inference and KL bounds, which are not guaranteed to be tight. It is therefore unclear whether the decomposition remains exact or whether task-critical perceptual cues are lost when conditioning the multimodal latent on the text latent, directly undermining the asserted improvement in execution stability while preserving full multimodal utility.

    Authors: The sequential decomposition is exact at the level of the information-bottleneck objective: by the chain rule for mutual information, the joint multimodal IB objective decomposes without approximation into a text-stage bottleneck followed by a conditional multimodal bottleneck that compresses only residual perceptual information. This structural property holds independently of how the objective is optimized. The variational formulation (amortized inference and KL bounds) is introduced solely to make the decomposed objective tractable, exactly as in the original variational information bottleneck. While the bounds are not guaranteed to be tight, they do not alter the conditioning mechanism or the exact decomposition itself. Empirical results in the paper demonstrate that task-relevant perceptual invariants are retained, yielding improved execution stability. We acknowledge that the manuscript could more explicitly separate the exact decomposition from the variational approximation; we will revise the abstract and Section 3 to state this distinction clearly and add a short paragraph on approximation gaps. revision: partial
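For the record, the identity the rebuttal invokes is the chain rule for mutual information, which holds exactly regardless of how the objective is later optimized:

    I(c, z; Y) = I(c; Y) + I(z; Y | c)

The approximation gap enters only when each term on the right is replaced by its variational bound, which is the distinction the authors promise to make explicit in revision.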

Circularity Check

0 steps flagged

No significant circularity; derivation remains independent of fitted inputs or self-citations

Full rationale

The paper claims an exact sequential decomposition of a joint multimodal bottleneck into a text-stage bottleneck followed by a conditional multimodal bottleneck, then instantiates this via a variational objective for tractability. No equations, fitted parameters, or self-citations are exhibited in the abstract or described structure that would make the claimed skill reusability, stability gains, or decomposition reduce by construction to the training inputs or prior author results. The variational instantiation is presented as an approximation step rather than a re-expression of the target outcome, and the central empirical claim (improved execution stability without multi-sample overhead) is positioned as a downstream consequence rather than a definitional identity. This satisfies the default expectation of a non-circular derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no explicit free parameters, axioms, or invented entities; the variational objective is mentioned at a conceptual level without equations or assumptions listed.

pith-pipeline@v0.9.0 · 5601 in / 1091 out tokens · 35523 ms · 2026-05-12T01:21:17.576386+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

100 extracted references · 100 canonical work pages · 9 internal anchors
