Discrete Preference Learning for Personalized Multimodal Generation
Pith reviewed 2026-05-09 23:25 UTC · model grok-4.3
The pith
A two-stage model learns discrete modal-specific preferences from user interactions and injects them into generators to produce personalized text and images with enforced cross-modal consistency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce DPPMG, a two-stage framework in which a modal-specific graph neural network learns users' modal-specific preferences from multimodal interactions and quantizes them into discrete preference tokens. These tokens are injected into downstream text and image generators, which are then fine-tuned with a cross-modal consistent and personalized reward that maintains alignment without eroding individual tailoring. Experiments on two real-world datasets show the resulting outputs are both personalized and consistent.
What carries the argument
Modal-specific graph neural network that learns and quantizes continuous preferences into discrete tokens, combined with a cross-modal consistency reward applied during generator fine-tuning.
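The quantization operator itself is not specified in this summary; as a minimal sketch of the kind of nearest-codebook lookup the abstract implies (the codebook size, vector dimensions, and VQ-style assignment are assumptions, not the authors' design):

```python
import numpy as np

def quantize_preferences(pref_vectors, codebook):
    """Map continuous preference vectors to discrete token ids via
    nearest-neighbor lookup in a learned codebook (VQ-style sketch).

    pref_vectors: (n_users, d) continuous modal-specific preferences
    codebook:     (n_tokens, d) learned token embeddings
    returns:      (n_users,) integer token ids
    """
    # Squared Euclidean distance from every preference vector to every codeword.
    dists = ((pref_vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # 8 hypothetical discrete preference tokens
prefs = rng.normal(size=(3, 4))      # 3 users' continuous preferences
token_ids = quantize_preferences(prefs, codebook)
print(token_ids.shape)  # prints (3,)
```

The resulting integer ids are what a token-based generator can consume directly, which is the input-format gap the framework claims to close.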
If this is right
- Generators receive preference signals already formatted as the discrete tokens they expect, closing the input mismatch.
- A single reward term can enforce consistency between text and image outputs while the tokens keep the personalization intact.
- Preference modeling is handled upstream by a dedicated graph network rather than inside the generators themselves.
- The same discrete tokens can be reused across multiple generator architectures without retraining the preference stage.
Where Pith is reading between the lines
- The discretization step may allow preference signals to transfer more easily to new generators or modalities than raw continuous vectors would.
- If the quantization proves robust, similar token-based preference models could be tested on video or audio generation tasks by adding further modal-specific graph components.
- The separation of preference learning from generation opens a path to plug the same tokens into recommendation systems that already use discrete item embeddings.
Load-bearing premise
Turning continuous user preferences into discrete tokens still carries enough information to guide accurate personalization in the generators.
What would settle it
On the same datasets, a version that injects continuous rather than quantized preferences yields measurably higher personalization scores or lower cross-modal inconsistency rates.
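That ablation can be sketched directly: compare the preference signal a generator would receive under quantized versus continuous injection, using cosine similarity to the user's true preference vector as a stand-in personalization score (the toy data, the codebook construction, and the score itself are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_users, n_tokens = 16, 100, 32
prefs = rng.normal(size=(n_users, d))

# Toy codebook: centroids drawn from the users themselves (a crude
# stand-in for a learned quantizer).
codebook = prefs[rng.choice(n_users, n_tokens, replace=False)]

dists = ((prefs[:, None] - codebook[None]) ** 2).sum(-1)
quantized = codebook[dists.argmin(1)]

def cos(a, b):
    """Cosine similarity along the last axis."""
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))

# Proxy personalization score: how well the injected signal matches
# the user's true preference vector.
continuous_score = cos(prefs, prefs).mean()       # upper bound: 1.0
quantized_score = cos(quantized, prefs).mean()
print(float(continuous_score), float(quantized_score))
```

The gap between the two scores is exactly the quantity the proposed ablation would have to bound: if it is small on real data, the discrete tokens carry enough signal; if large, the load-bearing premise fails.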
Original abstract
The emergence of generative models enables the creation of texts and images tailored to users' preferences. Existing personalized generative models have two critical limitations: lacking a dedicated paradigm for accurate preference modeling, and generating unimodal content despite real-world multimodal-driven user interactions. Therefore, we propose personalized multimodal generation, which captures modal-specific preferences via a dedicated preference model from multimodal interactions, and then feeds them into downstream generators for personalized multimodal content. However, this task presents two challenges: (1) Gap between continuous preferences from dedicated modeling and discrete token inputs intrinsic to generator architectures; (2) Potential inconsistency between generated images and texts. To tackle these, we present a two-stage framework called Discrete Preference learning for Personalized Multimodal Generation (DPPMG). In the first stage, to accurately learn discrete modal-specific preferences, we introduce a modal-specific graph neural network (a dedicated preference model) to learn users' modal-specific preferences, which preferences are then quantized into discrete preference tokens. In the second stage, the discrete modal-specific preference tokens are injected into downstream text and image generators. To further enhance cross-modal consistency while preserving personalization, we design a cross-modal consistent and personalized reward to fine-tune token-associated parameters. Extensive experiments on two real-world datasets demonstrate the effectiveness of our model in generating personalized and consistent multimodal content.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Discrete Preference learning for Personalized Multimodal Generation (DPPMG), a two-stage framework for personalized multimodal content generation. Stage 1 employs a modal-specific graph neural network to learn continuous modal-specific user preferences from multimodal interactions and quantizes these into discrete preference tokens. Stage 2 injects the tokens into downstream text and image generators and fine-tunes token-associated parameters via a cross-modal consistent and personalized reward to improve consistency while retaining personalization. The authors assert that extensive experiments on two real-world datasets confirm the framework's effectiveness in producing personalized and consistent multimodal outputs.
Significance. If the empirical results and the information-preservation properties of the quantization step hold, the work would offer a concrete paradigm for bridging continuous preference modeling with discrete generator inputs in multimodal settings, addressing a recognized gap in personalized generation. The modular separation of preference learning from generation and the explicit cross-modal reward are potentially reusable contributions. The explicit framing of the two challenges (continuous-discrete gap and cross-modal inconsistency) and the two-stage design provide a clear structure that could be extended to other generators.
Major comments (3)
- [Abstract / Experiments] The central claim that the model generates 'personalized and consistent multimodal content' rests on 'extensive experiments on two real-world datasets,' yet the manuscript provides no quantitative metrics, baseline comparisons, ablation results, or statistical tests. Without these, the effectiveness assertions cannot be evaluated and the load-bearing role of the discrete tokens remains unverified.
- [§3, Preference Modeling; Challenge (1)] The quantization of continuous modal-specific preferences into discrete tokens is presented as the solution to the continuous-to-discrete gap, but no reconstruction error, cosine similarity, or ablation comparing the original continuous vectors to the resulting tokens is reported. Because every downstream generator result flows through these tokens, any information loss at this step is unrecoverable by the later reward stage and directly undermines the personalization claim.
- [§3.2, Reward Design] The cross-modal consistent and personalized reward is introduced to balance the two objectives, yet no analysis is given of how the reward weights are chosen or whether optimizing for consistency degrades the personalization signal already encoded in the discrete tokens. This interaction is load-bearing for the second-stage fine-tuning.
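The interaction flagged in the last comment can be probed with a toy version of the reward. Reading 'cross-modal consistent and personalized reward' as a weighted sum is an assumption, as is the concave trade-off curve used here:

```python
import numpy as np

def combined_reward(consistency, personalization, lam=0.5):
    """Weighted sum of a cross-modal consistency score and a
    personalization score: one plausible reading of the paper's
    'cross-modal consistent and personalized reward'."""
    return lam * consistency + (1.0 - lam) * personalization

# Sweep lam to see how the optimum trades one objective for the other.
# Assumed frontier: pushing consistency up pulls personalization down.
cons = np.linspace(0.0, 1.0, 101)
pers = 1.0 - cons ** 2
for lam in (0.1, 0.5, 0.9):
    best = cons[np.argmax(combined_reward(cons, pers, lam))]
    print(f"lam={lam}: consistency at optimum = {best:.2f}")
```

Even this toy sweep shows the optimum sliding monotonically toward pure consistency as the weight grows, which is precisely why a sensitivity analysis on the reward weights is needed before the personalization claim can stand.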
Minor comments (2)
- [§3] Notation for the modal-specific GNN and the quantization operator is introduced without a clear equation or diagram showing the exact mapping from continuous preference vectors to discrete tokens.
- [Experiments] The two real-world datasets are mentioned but not named or characterized (size, modality distribution, user-item interaction statistics), which hinders reproducibility and comparison with prior work.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment point by point below. Where gaps in experimental reporting and analysis are identified, we will revise the manuscript to include the requested evidence and clarifications.
Point-by-point responses
- Referee: [Abstract / Experiments] The central claim that the model generates 'personalized and consistent multimodal content' rests on 'extensive experiments on two real-world datasets,' yet the manuscript provides no quantitative metrics, baseline comparisons, ablation results, or statistical tests. Without these, the effectiveness assertions cannot be evaluated and the load-bearing role of the discrete tokens remains unverified.
  Authors: We agree that the current manuscript does not present the quantitative metrics, baseline comparisons, ablation results, or statistical tests needed to substantiate the claims, even though the abstract references extensive experiments. This omission prevents proper evaluation of the framework. In the revised version, we will expand the Experiments section with tables reporting personalization and consistency metrics, comparisons to relevant baselines, ablations on key components including the discrete tokens, and statistical significance tests. The abstract will be updated to summarize these specific results. (Revision: yes)
- Referee: [§3, Preference Modeling; Challenge (1)] The quantization of continuous modal-specific preferences into discrete tokens is presented as the solution to the continuous-to-discrete gap, but no reconstruction error, cosine similarity, or ablation comparing the original continuous vectors to the resulting tokens is reported. Because every downstream generator result flows through these tokens, any information loss at this step is unrecoverable by the later reward stage and directly undermines the personalization claim.
  Authors: The referee is correct that direct evidence of information preservation through quantization is essential, given its central role in the pipeline. The manuscript currently lacks reconstruction error, cosine similarity, or targeted ablations comparing continuous preferences to the quantized tokens. We will add these analyses in the revision, including similarity metrics between original and quantized representations and an ablation isolating the quantization step to quantify any impact on downstream personalization performance. (Revision: yes)
- Referee: [§3.2, Reward Design] The cross-modal consistent and personalized reward is introduced to balance the two objectives, yet no analysis is given of how the reward weights are chosen or whether optimizing for consistency degrades the personalization signal already encoded in the discrete tokens. This interaction is load-bearing for the second-stage fine-tuning.
  Authors: We acknowledge the absence of analysis on reward weight selection and potential trade-offs between consistency and personalization. The manuscript introduces the reward formulation but does not report hyperparameter sensitivity or experiments measuring whether consistency optimization affects the personalization encoded in the tokens. In the revision, we will expand §3.2 with a sensitivity study on the weights and additional results showing the effect on both consistency and personalization metrics. (Revision: yes)
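Mechanically, 'fine-tune token-associated parameters' suggests updating only the embedding rows behind the injected preference tokens while the rest of the generator stays frozen. A toy numpy sketch, in which the token ids, the dot-product reward, and the update rule are all assumptions rather than the authors' method:

```python
import numpy as np

rng = np.random.default_rng(2)
vocab, d = 100, 8
emb = rng.normal(size=(vocab, d))          # generator's full embedding table
emb0 = emb.copy()                          # snapshot to verify what moved
pref_token_ids = np.array([97, 98, 99])    # hypothetical ids of injected tokens
target = rng.normal(size=d)                # stand-in for the reward's gradient direction

# Gradient ascent on a dot-product reward r = emb[t] @ target,
# applied ONLY to the token-associated rows; all other rows stay frozen.
for _ in range(50):
    emb[pref_token_ids] += 0.1 * target

moved = np.linalg.norm(emb - emb0, axis=1) > 0
print(int(moved.sum()))  # prints 3: only the token-associated rows changed
```

Restricting updates this way is what lets the reward reshape consistency without touching the generator's pretrained weights; the open question the referee raises is whether it also overwrites the personalization signal stored in those same rows.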
Circularity Check
No circularity: new framework components do not reduce to fitted inputs or self-citations by construction
Full rationale
The derivation introduces a modal-specific GNN to extract continuous preferences from multimodal interactions, followed by explicit quantization into discrete tokens, injection into separate text/image generators, and a downstream cross-modal reward for fine-tuning. These steps are presented as sequential architectural choices to address stated challenges (continuous-to-discrete gap and consistency), without any equation that equates a claimed prediction back to a fitted parameter or renames an input quantity. No self-citation is invoked as a uniqueness theorem or load-bearing premise, and no 'prediction' is statistically forced by prior fitting on the same data. The framework therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circular patterns.
Axiom & Free-Parameter Ledger
Invented entities (2)
- discrete preference tokens: no independent evidence
- modal-specific graph neural network: no independent evidence