Recognition: 2 theorem links
· Lean Theorem · Skill-CMIB: Multimodal Agent Skill for Consistent Action via Conditional Multimodal Information Bottleneck
Pith reviewed 2026-05-12 01:21 UTC · model grok-4.3
The pith
Conditional Multimodal Information Bottleneck distills reusable skills from vision and language to stabilize agent actions without extra sampling cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CMIB begins with a joint bottleneck over multimodal skills and derives an exact sequential decomposition: a text-stage bottleneck distilling interpretable skill cards, followed by a conditional multimodal bottleneck that compresses only residual information in perception that remains predictive beyond text. Unlike naive two-stream approaches, the multimodal latent is explicitly conditioned on the text skill, which structurally reduces cross-modal redundancy. The decomposition is made tractable by a variational objective, yielding reusable multimodal skills that improve execution stability without incurring multi-sample inference overhead.
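In the notation of the paper's Lemma 3.2, with c the text-stage skill latent, z the conditional multimodal latent, (X, M) the text and perceptual inputs, and Y the action target, the decomposed objective reads (β_c and β_z are the independently tunable compression weights):

```latex
\mathcal{L}_{\mathrm{CMIB}}
  \;=\; \underbrace{\bigl[\, I\bigl((X,M);c\bigr) \;-\; \beta_c\, I(c;Y) \,\bigr]}_{\text{text-stage bottleneck}}
  \;+\; \underbrace{\bigl[\, I\bigl((X,M);z \mid c\bigr) \;-\; \beta_z\, I(z;Y \mid c) \,\bigr]}_{\text{conditional multimodal bottleneck}}
```

Because the second compression term conditions on c, information already carried by the text skill costs nothing to discard from z; that is the structural reduction of cross-modal redundancy the claim refers to.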
What carries the argument
Conditional Multimodal Information Bottleneck, which conditions the multimodal latent variable on the output of a preceding text-stage bottleneck to isolate residual perceptual information.
If this is right
- Agents can store and reuse skills across repeated trials without retraining or extra decoding passes.
- Textual skill cards stay human-interpretable while perceptual residuals are kept only when they add predictive value.
- Compression ratios for language and vision can be tuned independently rather than traded off in a single shared latent.
- Training produces skills that run at single-sample inference speed while matching the reliability of multi-sample methods.
- The same skill construction can be applied to any sequence of multimodal trajectories collected from agent rollouts.
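As a concrete sketch of how the variational form of such an objective might be assembled, the toy code below computes the two KL compression rates for diagonal-Gaussian latents: a text latent c with a standard-normal prior, and a multimodal latent z whose prior is conditioned on c. All names and the Gaussian parameterization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians, summed over dims."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def cmib_compression_terms(text_post, vis_post, cond_prior, beta_c=1.0, beta_z=1.0):
    """Toy compression cost of a CMIB-style two-stage bottleneck.

    text_post:  (mu, logvar) of q(c | text)       -- text-stage latent posterior
    vis_post:   (mu, logvar) of q(z | vision, c)  -- residual multimodal latent posterior
    cond_prior: (mu, logvar) of p(z | c)          -- prior conditioned on the text skill
    The text prior is standard normal; the betas weight the two rates independently.
    """
    rate_c = kl_diag_gaussians(*text_post,
                               np.zeros_like(text_post[0]), np.zeros_like(text_post[1]))
    rate_z = kl_diag_gaussians(*vis_post, *cond_prior)  # pays only for info beyond c
    return beta_c * rate_c + beta_z * rate_z

rng = np.random.default_rng(0)
d = 4
c_post = (rng.normal(size=d), np.zeros(d))
z_post = (rng.normal(size=d), np.zeros(d))
z_prior = (z_post[0], np.zeros(d))  # conditional prior matching q(z|...) => residual rate 0
print(cmib_compression_terms(c_post, z_post, z_prior, beta_z=2.0))
```

The independent betas make concrete the bullet above: the language rate and the residual perceptual rate are separate knobs rather than one shared trade-off.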
Where Pith is reading between the lines
- The same conditional split might reduce redundancy in other multimodal tasks such as video captioning or robotic planning where text instructions accompany visual observations.
- If the text bottleneck proves too lossy in new domains, the framework could be extended by adding a second verbalization stage after the perceptual residual is extracted.
- Measuring how much perceptual information survives the conditional step would give a practical diagnostic for whether a given task truly needs vision beyond language.
- The method opens a route to skills that are partially editable: a user could revise the text card and expect the perceptual component to adapt accordingly.
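The diagnostic suggested above can be made concrete with a toy plug-in estimator (the setup and names are illustrative, not from the paper): estimate the conditional mutual information I(V; Y | C) between perception V and target Y given the text skill C from empirical counts. A value near zero suggests the task needs little vision beyond language.

```python
import math
from collections import Counter

def conditional_mutual_information(triples):
    """Plug-in estimate of I(V; Y | C) in bits from samples of (v, y, c)."""
    n = len(triples)
    p_vyc = Counter(triples)
    p_vc = Counter((v, c) for v, y, c in triples)
    p_yc = Counter((y, c) for v, y, c in triples)
    p_c = Counter(c for _, _, c in triples)
    cmi = 0.0
    for (v, y, c), cnt in p_vyc.items():
        p_joint = cnt / n
        cmi += p_joint * math.log2(
            (p_joint * (p_c[c] / n)) / ((p_vc[(v, c)] / n) * (p_yc[(y, c)] / n))
        )
    return cmi

# If Y is fully determined by C, perception adds nothing: I(V; Y | C) = 0.
redundant = [(v, c, c) for v in "ab" for c in "xy" for _ in range(5)]
print(conditional_mutual_information(redundant))
# If Y copies V within each C, perception carries residual predictive information.
residual = [(v, v, c) for v in "ab" for c in "xy" for _ in range(5)]
print(conditional_mutual_information(residual))
```

Real latents are continuous, so a practical version would need a variational or neural estimator; the discrete case only illustrates what the quantity measures.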
Load-bearing premise
A variational objective can enforce the exact sequential text-then-conditional-multimodal decomposition while still retaining every task-relevant perceptual cue not already captured in the text.
What would settle it
An ablation that trains the same model with a non-conditional joint multimodal bottleneck and shows no gain in action consistency over text-only skills, or a drop in performance when perceptual compression is applied.
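A minimal way to score the action consistency such an ablation would compare, assuming consistency is measured as agreement between repeated rollouts of the same task (a common choice; the paper's exact metric is not specified here):

```python
from itertools import combinations

def action_consistency(rollouts):
    """Fraction of rollout pairs (same task) whose action sequences match exactly."""
    pairs = list(combinations(rollouts, 2))
    if not pairs:
        return 1.0  # a single rollout is trivially self-consistent
    return sum(a == b for a, b in pairs) / len(pairs)

trials = [["click:search", "type:query", "submit"],
          ["click:search", "type:query", "submit"],
          ["click:menu", "type:query", "submit"]]
print(action_consistency(trials))  # 1 of 3 pairs agree
```

The ablation would report this score for text-only skills, a non-conditional joint bottleneck, and the conditional variant, each at single-sample decoding.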
read the original abstract
While LLM-based agents excel at planning and executing long action sequences, their execution often remains inconsistent across trials, limiting reliability. Consolidating agent consistency requires distilling trial-error trajectories into reusable skills that preserve task-relevant invariants while discarding trajectory-specific noise. However, in multimodal settings, the key challenge is not only that useful invariants are distributed across vision and language information, but that different modalities support different kinds of reusable skill content: while some skills are verbalizable and interpretable, others reside in perceptual evidence beyond text. Text-only skills may lose perceptual cues, whereas storing text and perception naively introduces redundancy and noise. Existing inference-time methods, such as self-consistency, improve reliability through costly multi-sample decoding, while internalization strategies lack a way to separate verbalizable skill content from residual perceptual information. To address this, we introduce Conditional Multimodal Information Bottleneck (CMIB), a method for multimodal skill construction. CMIB begins with a joint bottleneck over multimodal skills and derives an exact sequential decomposition: (1) a text-stage bottleneck distilling interpretable skill cards, and (2) a conditional multimodal bottleneck compressing only residual information in perception that remains predictive beyond text. Unlike naive two-stream formulations, CMIB explicitly conditions the multimodal latent on the text skill, thus structurally reducing cross-modal redundancy and enabling independent control over textual and perceptual compression. We instantiate CMIB with a variational objective that makes its conditional decomposition tractable to optimize, yielding reusable multimodal skills that improve execution stability without incurring multi-sample inference overhead.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Conditional Multimodal Information Bottleneck (CMIB) to construct reusable multimodal skills for LLM-based agents. Starting from a joint multimodal bottleneck over skills, it derives an exact sequential decomposition into (1) a text-stage bottleneck that produces interpretable skill cards and (2) a conditional multimodal bottleneck that compresses only residual perceptual information predictive beyond text. The decomposition is instantiated via a variational objective for tractability, with the goal of reducing cross-modal redundancy while improving execution stability without multi-sample inference overhead.
Significance. If the variational realization of the claimed exact decomposition can be shown to preserve task-relevant perceptual invariants not captured by text, the approach would provide a principled, efficient alternative to inference-time consistency methods such as self-consistency. This could meaningfully advance reliable long-horizon execution in multimodal agent settings by internalizing stable skills.
major comments (1)
- [Abstract] The central claim rests on deriving an 'exact sequential decomposition' from a joint multimodal bottleneck into a text-stage bottleneck followed by a conditional multimodal bottleneck. However, the method is instantiated with a variational objective that relies on amortized inference and KL bounds, which are not guaranteed to be tight. It is therefore unclear whether the decomposition remains exact in practice, or whether task-critical perceptual cues are lost when the multimodal latent is conditioned on the text latent; either outcome would undermine the asserted improvement in execution stability while preserving full multimodal utility.
minor comments (1)
- The abstract would benefit from explicit mention of the experimental domains, baselines, and quantitative metrics used to validate improved stability, as these are essential to assess whether the variational construction delivers the claimed benefits.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. We address the single major comment below, clarifying the information-theoretic status of the decomposition versus its variational optimization.
read point-by-point responses
-
Referee: [Abstract] The central claim rests on deriving an 'exact sequential decomposition' from a joint multimodal bottleneck into a text-stage bottleneck followed by a conditional multimodal bottleneck. However, the method is instantiated with a variational objective that relies on amortized inference and KL bounds, which are not guaranteed to be tight. It is therefore unclear whether the decomposition remains exact in practice, or whether task-critical perceptual cues are lost when the multimodal latent is conditioned on the text latent; either outcome would undermine the asserted improvement in execution stability while preserving full multimodal utility.
Authors: The sequential decomposition is exact at the level of the information-bottleneck objective: by the chain rule for mutual information, the joint multimodal IB objective decomposes without approximation into a text-stage bottleneck followed by a conditional multimodal bottleneck that compresses only residual perceptual information. This structural property holds independently of how the objective is optimized. The variational formulation (amortized inference and KL bounds) is introduced solely to make the decomposed objective tractable, exactly as in the original variational information bottleneck. While the bounds are not guaranteed to be tight, they do not alter the conditioning mechanism or the exact decomposition itself. Empirical results in the paper demonstrate that task-relevant perceptual invariants are retained, yielding improved execution stability. We acknowledge that the manuscript could more explicitly separate the exact decomposition from the variational approximation; we will revise the abstract and Section 3 to state this distinction clearly and add a short paragraph on approximation gaps.
Revision: partial
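The chain rule the rebuttal invokes can be stated explicitly. For any latents c and z, mutual information with the inputs (X, M) and with the target Y decomposes exactly:

```latex
I\bigl((X,M);(c,z)\bigr) \;=\; I\bigl((X,M);c\bigr) \;+\; I\bigl((X,M);z \mid c\bigr),
\qquad
I\bigl((c,z);Y\bigr) \;=\; I(c;Y) \;+\; I(z;Y \mid c).
```

These identities hold without approximation, so the joint bottleneck splits exactly into the text-stage and conditional terms; slack enters only through the variational bounds used to optimize each term, which is precisely the distinction the authors promise to make explicit.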
Circularity Check
No significant circularity; the derivation remains independent of fitted inputs and self-citations.
full rationale
The paper claims an exact sequential decomposition of a joint multimodal bottleneck into a text-stage bottleneck followed by a conditional multimodal bottleneck, then instantiates this via a variational objective for tractability. No equations, fitted parameters, or self-citations are exhibited in the abstract or described structure that would make the claimed skill reusability, stability gains, or decomposition reduce by construction to the training inputs or prior author results. The variational instantiation is presented as an approximation step rather than a re-expression of the target outcome, and the central empirical claim (improved execution stability without multi-sample overhead) is positioned as a downstream consequence rather than a definitional identity. This satisfies the default expectation of a non-circular derivation chain.
Lean theorems connected to this paper
- `IndisputableMonolith/Cost/FunctionalEquation.lean` : `washburn_uniqueness_aczel` (match: unclear). Matched passage: "Lemma 3.2 (Factorization underlying CMIB). The objective in Equation (3) admits an exact decomposition into a text-stage term involving c and a conditional multimodal term involving z given c... L_CMIB = [I((X,M); c) − β_c I(c; Y)] + [I((X,M); z | c) − β_z I(z; Y | c)]"
- `IndisputableMonolith/Foundation/RealityFromDistinction.lean` : `reality_from_one_distinction` (match: unclear). Matched passage: "We instantiate CMIB with a variational objective that makes its conditional decomposition tractable to optimize, yielding reusable multimodal skills..."
Reference graph
Works this paper leans on
- [1] When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents. 2026.
- [2] Soft Self-Consistency Improves Language Model Agents. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
- [3] Self-Consistency Improves Chain of Thought Reasoning in Language Models. arXiv:2203.11171.
- [4] Enhancing Belief Consistency of Large Language Model Agents in Decision-Making Process Based on Attribution Theory. Expert Systems with Applications, 2025.
- [5] On the Soundness and Consistency of LLM Agents for Executing Test Cases Written in Natural Language. arXiv:2509.19136.
- [6] Aggarwal, Pranjal; Madaan, Aman; Yang, Yiming; Mausam. Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023. doi:10.18653/v1/2023.emnlp-main.761
- [7] Reevaluating Self-Consistency Scaling in Multi-Agent Systems. arXiv:2511.00751.
- [8] Self-Improvement of Language Models by Post-Training on Multi-Agent Debate. 2026.
- [9] The Information Bottleneck Method. arXiv:physics/0004057.
- [10] Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward. arXiv:2602.12430.
- [11] SoK: Agentic Skills -- Beyond Tool Use in LLM Agents. arXiv:2602.20867.
- [12] Agent Skill Framework: Perspectives on the Potential of Small Language Models in Industrial Environments. arXiv:2602.16653.
- [13] Agent Skills: A Data-Driven Analysis of Claude Skills for Extending Large Language Model Functionality. arXiv:2602.08004.
- [14] SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks. arXiv:2602.12670.
- [15] Organizing, Orchestrating, and Benchmarking Agent Skills at Ecosystem Scale. arXiv:2603.02176.
- [16] Memp: Exploring Agent Procedural Memory. arXiv:2508.06433.
- [17] Learning Hierarchical Procedural Memory for LLM Agents through Bayesian Selection and Contrastive Refinement. arXiv:2512.18950.
- [18] SkillRouter: Retrieve-and-Rerank Skill Selection for LLM Agents at Scale. 2026.
- [19] Memento-Skills: Let Agents Design Agents. arXiv:2603.18743.
- [20] Reinforcement Learning for Self-Improving Agent with Skill Library. arXiv:2512.17102.
- [21] ProcMEM: Learning Reusable Procedural Memory from Experience via Non-Parametric PPO for LLM Agents. arXiv:2602.01869.
- [22] XSkill: Continual Learning from Experience and Skills in Multimodal Agents. arXiv:2603.12056.
- [23] On Variational Bounds of Mutual Information. International Conference on Machine Learning, 2019.
- [24] Variational Information Bottleneck for Effective Low-Resource Fine-Tuning. arXiv:2106.05469.
- [25] InfoBot: Transfer and Exploration via the Information Bottleneck. arXiv:1901.10902.
- [26] Deng, Xiang; Gu, Yu; Zheng, Boyuan; Chen, Shijie; Stevens, Samual; Wang, Boshi; Sun, Huan; Su, Yu. Mind2Web: Towards a Generalist Agent for the Web. 2023.
- [27] Zhou, Shuyan; Xu, Frank F.; Zhu, Hao; Zhou, Xuhui; Lo, Robert; Sridhar, Abishek; Cheng, Xianyi; Ou, Tianyue; Bisk, Yonatan; Fried, Daniel; Alon, Uri; Neubig, Graham. The Twelfth International Conference on Learning Representations, 2024.
- [28] Rawles, Christopher; Li, Alice; Rodriguez, Daniel; Riva, Oriana; Lillicrap, Timothy P. Android in the Wild: A Large-Scale Dataset for Android Device Control. CoRR, 2023. doi:10.48550/ARXIV.2307.10088
- [29] Mobile-Env: Building Qualified Evaluation Benchmarks for LLM-GUI Interaction. 2024.
- [30] Driess, Danny; Xia, Fei; Sajjadi, Mehdi S. M.; Lynch, Corey; Chowdhery, Aakanksha; Ichter, Brian; Wahid, Ayzaan; Tompson, Jonathan; Vuong, Quan; Yu, Tianhe; Huang, Wenlong; Chebotar, Yevgen; Sermanet, Pierre; Duckworth, Daniel; Levine, Sergey; Vanhoucke, Vincent; Hausman, Karol; Toussaint, Marc; Greff, Klaus; Zen... 2023.
- [31] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control. 2023.
- [32] Reflexion: Language Agents with Verbal Reinforcement Learning. 2023.
- [33] Wang, Xuezhi; Wei, Jason; Schuurmans, Dale; Le, Quoc V.; Chi, Ed H.; Narang, Sharan; Chowdhery, Aakanksha; Zhou, Denny. The Eleventh International Conference on Learning Representations, 2023.
- [34] Equipping Agents for the Real World with Agent Skills. Anthropic Engineering Blog.
- [35] Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward. 2026.
- [36] Agent Skills from the Perspective of Procedural Memory: A Survey. Authorea Preprints.
- [37] Stepputtis, Simon; Campbell, Joseph; Phielipp, Mariano J.; Lee, Stefan; Baral, Chitta; Ben Amor, Heni. Language-Conditioned Imitation Learning for Robot Manipulation Tasks. 2020.
- [38] Radford, Alec; Kim, Jong Wook; Hallacy, Chris; Ramesh, Aditya; Goh, Gabriel; Agarwal, Sandhini; Sastry, Girish; Askell, Amanda; Mishkin, Pamela; Clark, Jack; Krueger, Gretchen; Sutskever, Ilya. Learning Transferable Visual Models From Natural Language Supervision. 2021.
- [39] Goyal, Anirudh; Islam, Riashat; Strouse, Daniel; Ahmed, Zafarali; Larochelle, Hugo; Botvinick, Matthew M.; Bengio, Yoshua; Levine, Sergey. 7th International Conference on Learning Representations, 2019.
- [40] MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents. arXiv:2602.02474.
- [41] Zheng, Boyuan; Gou, Boyu; Kil, Jihyung; Sun, Huan; Su, Yu. GPT-4V(ision) is a Generalist Web Agent, if Grounded. 2024.
- [42] Cheng, Kanzhi; Sun, Qiushi; Chu, Yougang; Xu, Fangzhi; Li, Yantao; Zhang, Jianbing; Wu, Zhiyong. SeeClick: Harnessing... Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024. doi:10.18653/v1/2024.acl-long.505
- [43] EDGE: Enhanced Grounded GUI Understanding with Enriched Multi-Granularity Synthetic Data. 2024.
- [44] GUICourse: From General Vision Language Models to Versatile GUI Agents. 2025.
- [45] ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data. 2024.
- [46] AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials. 2025.
- [47] Pahuja, Vardaan; Lu, Yadong; Rosset, Corby; Gou, Boyu; Mitra, Arindam; Whitehead, Spencer; Su, Yu; Awadallah, Ahmed Hassan. Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents. 2025.
- [48]
- [49] Zhang, Qiming; Zhang, Jing; Xu, Yufei; Tao, Dacheng. 2024. doi:10.1109/TPAMI.2023.3347693
- [50] Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization. arXiv:2509.25717.
- [51] A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models. Second Conference on Language Modeling.
- [52] Direct Preference Optimization: Your Language Model is Secretly a Reward Model. Advances in Neural Information Processing Systems.
- [53] Listwise Preference Diffusion Optimization for User Behavior Trajectories Prediction. The Thirty-ninth Annual Conference on Neural Information Processing Systems.
- [54] Weakly-supervised VLM-guided Partial Contrastive Learning for Visual Language Navigation. arXiv:2506.15757.
- [55] From Reviews to Dialogues: Active Synthesis for Zero-Shot LLM-based Conversational Recommender System. arXiv:2504.15476.
- [56] SceneAlign: Aligning Multimodal Reasoning to Scene Graphs in Complex Visual Scenes. arXiv:2601.05600.
- [57] Yu, Sheldon; Xiong, Yuxin; Wu, Junda; Li, Xintong; Yu, Tong; Chen, Xiang; Sinha, Ritwik; Shang, Jingbo; McAuley, Julian. Explainable Chain-of-Thought Reasoning: An Empirical Analysis on State-Aware Reasoning Dynamics. Findings of the Association for Computational Linguistics: EMNLP 2025, 2025. doi:10.18653/v1/2025.findings-emnlp.904
- [58] CTRLS: Chain-of-Thought Reasoning via Latent State-Transition.
- [59] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Advances in Neural Information Processing Systems.
- [60] DeCoT: Debiasing Chain-of-Thought for Knowledge-Intensive Tasks in Large Language Models via Causal Intervention. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
- [61] OCEAN: Offline Chain-of-Thought Evaluation and Alignment in Large Language Models. International Conference on Learning Representations.
- [62] Doc-React: Multi-Page Heterogeneous Document Question-Answering. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
- [63] SAND: Boosting LLM Agents with Self-Taught Action Deliberation. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.
- [64] DICE: Dynamic In-Context Example Selection in LLM Agents via Efficient Knowledge Transfer. arXiv:2507.23554.
- [65] Active Learning for Direct Preference Optimization. arXiv:2503.01076.
- [66] Huang, Zihan; Wu, Junda; Surana, Rohan; Yu, Tong; Arbour, David; Sinha, Ritwik; McAuley, Julian. Image Difference Captioning via Adversarial Preference Optimization. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025. doi:10.18653/v1/2025.emnlp-main.1713
- [67] Pluralistic Off-Policy Evaluation and Alignment. arXiv:2509.19333.
- [68] Traceable and Explainable Multimodal Large Language Models: An Information-Theoretic View. Second Conference on Language Modeling.
- [69] InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding. Thirty-seventh Conference on Neural Information Processing Systems.
- [70] Wu, Junda; Wang, Rui; Yu, Tong; Zhang, Ruiyi; Zhao, Handong; Li, Shuai; Henao, Ricardo; Nenkova, Ani. Context-aware Information-theoretic Causal De-biasing for Interactive Sequence Labeling. Findings of the Association for Computational Linguistics: EMNLP 2022, 2022. doi:10.18653/v1/2022.findings-emnlp.251
- [71] Xia, Yu; Mukherjee, Subhojyoti; Xie, Zhouhang; Wu, Junda; Li, Xintong; Aponte, Ryan; Lyu, Hanjia; Barrow, Joe; Chen, Hongjie; Dernoncourt, Franck; Kveton, Branislav; Yu, Tong; Zhang, Ruiyi; Gu, Jiuxiang; Ahmed, Nesreen K.; Wang, Yu; Chen, Xiang; Deilamsalehy, Hanieh; Kim, Sungchul; Hu, Zhengmian...
- [72] Dynamics-Aware Adaptation for Reinforcement Learning Based Cross-Domain Interactive Recommendation. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval.
- [73] Personalized Multimodal Large Language Models: A Survey. arXiv:2412.02142.
- [74] Personalization of Large Language Models: A Survey. arXiv:2411.00027.
- [75] Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning. 2026.
- [76] Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey. Findings of the Association for Computational Linguistics: NAACL 2025.
- [77] Federated Large Language Models: Current Progress and Future Directions. arXiv:2409.15723.
- [78] CoMMIT: Coordinated Multimodal Instruction Tuning. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.
- [79] AMPS: Adaptive Modality Preference Steering via Functional Entropy. arXiv:2602.12533.
- [80] Image Difference Captioning via Adversarial Preference Optimization. Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.