{"work":{"id":"40702548-f094-4c67-a5db-a62f426f852e","openalex_id":null,"doi":null,"arxiv_id":"2306.09341","raw_key":null,"title":"Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis","authors":null,"authors_text":"Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao","year":2023,"venue":"cs.CV","abstract":"Recent text-to-image generative models can generate high-fidelity images from text inputs, but the quality of these generated images cannot be accurately evaluated by existing evaluation metrics. To address this issue, we introduce Human Preference Dataset v2 (HPD v2), a large-scale dataset that captures human preferences on images from a wide range of sources. HPD v2 comprises 798,090 human preference choices on 433,760 pairs of images, making it the largest dataset of its kind. The text prompts and images are deliberately collected to eliminate potential bias, which is a common issue in previous datasets. By fine-tuning CLIP on HPD v2, we obtain Human Preference Score v2 (HPS v2), a scoring model that can more accurately predict human preferences on generated images. Our experiments demonstrate that HPS v2 generalizes better than previous metrics across various image distributions and is responsive to algorithmic improvements of text-to-image generative models, making it a preferable evaluation metric for these models. We also investigate the design of the evaluation prompts for text-to-image generative models, to make the evaluation stable, fair and easy-to-use. Finally, we establish a benchmark for text-to-image generative models using HPS v2, which includes a set of recent text-to-image models from the academic, community and industry. The code and dataset is available at https://github.com/tgxs002/HPSv2 .","external_url":"https://arxiv.org/abs/2306.09341","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-07-04T03:39:29.311928+00:00","pith_arxiv_id":"2306.09341","created_at":"2026-05-09T06:30:44.292343+00:00","updated_at":"2026-07-04T03:39:29.311928+00:00","title_quality_ok":true,"display_title":"Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis","render_title":"Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis"},"hub":{"state":{"work_id":"40702548-f094-4c67-a5db-a62f426f852e","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":152,"external_cited_by_count":null,"distinct_field_count":5,"first_pith_cited_at":"2023-09-29T17:01:02+00:00","last_pith_cited_at":"2026-07-02T15:08:56+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-07-04T04:56:30.579594+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":17},{"context_role":"dataset","n":13},{"context_role":"method","n":4},{"context_role":"baseline","n":3},{"context_role":"other","n":3}],"polarity_counts":[{"context_polarity":"background","n":17},{"context_polarity":"use_dataset","n":12},{"context_polarity":"baseline","n":4},{"context_polarity":"use_method","n":4},{"context_polarity":"unclear","n":2},{"context_polarity":"support","n":1}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis","claims":[{"claim_text":"Recent text-to-image generative models can generate high-fidelity images from text inputs, but the quality of these generated images cannot be accurately evaluated by existing evaluation metrics. To address this issue, we introduce Human Preference Dataset v2 (HPD v2), a large-scale dataset that captures human preferences on images from a wide range of sources. HPD v2 comprises 798,090 human preference choices on 433,760 pairs of images, making it the largest dataset of its kind. The text prompts and images are deliberately collected to eliminate potential bias, which is a common issue in prev","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"[42] Chenyuan Wu, Pengfei Zheng, Ruiran Yan, Shitao Xiao, Xin Luo, Yueze Wang, Wanli Li, Xiyan Jiang, Yexin Liu, Junjie Zhou, et al. Omnigen2: Exploration to advanced multimodal generation.arXiv preprint arXiv:2506.18871, 2025. [43] Keming Wu, Sicong Jiang, Max Ku, Ping Nie, Minghao Liu, and Wenhu Chen. Editre- ward: A human-aligned reward model for instruction-guided image editing.arXiv preprint arXiv:2509.26346, 2025. [44] Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, ","claim_type":"other","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"by excessively large advantages as well as inefficient learning under excessively small scales. Extensive experiments are conducted to evaluate the effectiveness of SLAS. Two backbones of different scales, SD1.4 (0.9B) [ 11] and FLUX.1 Dev (12B) [ 3] are utilized, with DanceGRPO [7] serving as the baseline under identical reward configurations and training settings, and both methods are trained on the publicly available HPD-v2 training set [12]. From the training dynamics, SLAS consistently outp","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"between low-rank pixel outputs and full-rank pixel targets. To our knowledge, this is the first practical path for turning existing large-scale latent flow models themselves into strong pixel generators. We evaluate AsymFlow in two settings. On ImageNet 256×256 [12], AsymFlow reaches 1.76 FID with the JiT-H/16 network [ 35] and 1.57 FID with an additional REPA loss [ 70], outperforming prior DiT/JiT-like pixel diffusion models by a large margin. For text-to-image generation, our pixel AsymFlow m","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"individual benchmarks, it exhibits clearly superior overall performance across multiple benchmarks, indicating higher data quality. Text-to-image Preference Reformulation.To validate the superiority of our reformulation method over the base- line method adopted in Omni-Reward [16], we first ran- domly sample 50k raw text-to-image preference data from EvalMuse [13] and HPDv2 [54]. We then reconstruct these preferences using the two methods, respectively, and train MRMs on the resulting datasets. ","claim_type":"dataset","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Further analysis in the fine-grand dimension shows that, compared to Seedream 2.0, Seedream 3.0 has improvements in most dimensions, especially in terms of objects, activities, locations, food, and space. To align with the previous reported results, Ideogram 2.0 is included in the assessment here and subsequent chapters. For image quality evaluation, we reuse two external metrics, HPSv2 [24] and MPS [26], and two internal evaluation models, Internal-Align and Internal-Aes. Seedream 3.0 ranks fir","claim_type":"dataset","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Together, LeapAlign makes early-step fine-tuning practical and stable. We fine-tune Flux [18] with LeapAlign and show the performance gains in Fig. 1. Moreover, compared with the state-of-the-art GRPO-based methods [22, 55] and direct-gradient methods [3, 52, 53], LeapAlign consistently performs better in image generation, reflected by better scores in HPSv2.1 [51], HPSv3 [32], PickScore [17], UnifiedReward [48], ImageReward [53], and image-text alignment on GenEval [10]. In summary, this paper ","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (16 contexts).","role_counts":[{"n":16,"context_role":"background"},{"n":13,"context_role":"dataset"},{"n":4,"context_role":"method"},{"n":3,"context_role":"baseline"},{"n":3,"context_role":"other"}]},"error":null,"updated_at":"2026-05-23T07:04:16.875431+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"b2ebe93a-2133-4dcf-858e-dcb2a33fe2c1","orcid":null,"display_name":"Xiaoshi Wu"},{"id":"0155e1bb-6d4a-4831-9ead-081b8f1be231","orcid":null,"display_name":"Yiming Hao"},{"id":"2f1f493e-c401-45dd-9d2f-ae7e55e497fc","orcid":null,"display_name":"Keqiang Sun"},{"id":"46feace2-279d-43a8-8200-3076f1fd6905","orcid":null,"display_name":"Yixiong Chen"},{"id":"f594eee8-e9d2-41c5-9e75-8e58323846de","orcid":null,"display_name":"Feng Zhu"},{"id":"ffcd7d23-d2f6-4ae8-b9ac-f42695a19a5e","orcid":null,"display_name":"Rui Zhao"}]},"error":null,"updated_at":"2026-05-23T07:04:16.869525+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T12:20:48.379963+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"DanceGRPO: Unleashing GRPO on Visual Generation","work_id":"7404dd36-8f9c-478f-b089-ef9f8189c711","shared_citers":19},{"title":"Flow Matching for Generative Modeling","work_id":"6edb71c4-5d64-40af-a394-9757ea051a36","shared_citers":19},{"title":"SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis","work_id":"8034c587-fba6-4941-87ba-c98f2ac962cb","shared_citers":19},{"title":"Flow-GRPO: Training Flow Matching Models via Online RL","work_id":"bf1e8e81-ff31-401a-a5dc-d9c49df168ab","shared_citers":17},{"title":"MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE","work_id":"8b0ab84a-b7ea-46ea-a6ba-bf490a84d251","shared_citers":15},{"title":"Training Diffusion Models with Reinforcement Learning","work_id":"67684dda-3930-452a-b91a-36cbb8e2e219","shared_citers":15},{"title":"Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow","work_id":"a1989e1b-d66d-4533-be3a-fb9c5fd62290","shared_citers":14},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":14},{"title":"Score-Based Generative Modeling through Stochastic Differential Equations","work_id":"d9110e53-a5d4-4794-a4c5-a575e91c31ad","shared_citers":14},{"title":"Classifier-Free Diffusion Guidance","work_id":"acf2c588-c088-4a6c-938e-150ad7c666d7","shared_citers":11},{"title":"DiffusionNFT: Online Diffusion Reinforcement with Forward Process","work_id":"0ed3cf57-36ba-4962-847e-7a8f5f99901d","shared_citers":11},{"title":"Denoising Diffusion Implicit Models","work_id":"8fa2128b-d18c-405c-ac92-0e669cf89ac0","shared_citers":10},{"title":"Unified Reward Model for Multimodal Understanding and Generation","work_id":"bf9fcf9a-1781-4008-960e-2bec1a717e4e","shared_citers":10},{"title":"FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space","work_id":"5dfe19d5-3541-4803-8fe9-3c8b9e29b281","shared_citers":9},{"title":"Qwen-Image Technical Report","work_id":"d06d7ecc-7579-4f89-a60b-4278a0f3c562","shared_citers":9},{"title":"Tempflow-grpo: When timing matters for grpo in flow models.arXiv preprint arXiv:2508.04324","work_id":"fecc731c-f8a2-4f00-ab1b-51b87a133726","shared_citers":9},{"title":"Aligning text-to-image models using human feedback.arXiv preprint arXiv:2302.12192","work_id":"39a08cdc-3986-4994-baf0-91e8ddf1b855","shared_citers":8},{"title":"arXiv preprint arXiv:2509.06040 (2025) 2, 3","work_id":"4b5ca02b-b12f-4ca1-88b1-99007b8f9528","shared_citers":8},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":8},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":8},{"title":"arXiv preprint arXiv:2509.25050 , year=","work_id":"b71cd776-fb9d-4e2f-b936-7105099de5b7","shared_citers":7},{"title":"arXiv preprint arXiv:2510.22319 (2025) 3","work_id":"ca09b893-7353-413f-b231-16f132d92aee","shared_citers":7},{"title":"Directly fine-tuning diffusion models on differentiable rewards","work_id":"cd2b3c6d-1ce2-434e-b114-5567026b9b82","shared_citers":7},{"title":"ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment","work_id":"94248955-4bc5-4517-98a0-66224a36d865","shared_citers":7}],"time_series":[{"n":1,"year":2024},{"n":4,"year":2025},{"n":50,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T12:20:30.632014+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T12:20:44.869178+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis","claims":[{"claim_text":"Recent text-to-image generative models can generate high-fidelity images from text inputs, but the quality of these generated images cannot be accurately evaluated by existing evaluation metrics. To address this issue, we introduce Human Preference Dataset v2 (HPD v2), a large-scale dataset that captures human preferences on images from a wide range of sources. HPD v2 comprises 798,090 human preference choices on 433,760 pairs of images, making it the largest dataset of its kind. The text prompts and images are deliberately collected to eliminate potential bias, which is a common issue in prev","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"[42] Chenyuan Wu, Pengfei Zheng, Ruiran Yan, Shitao Xiao, Xin Luo, Yueze Wang, Wanli Li, Xiyan Jiang, Yexin Liu, Junjie Zhou, et al. Omnigen2: Exploration to advanced multimodal generation.arXiv preprint arXiv:2506.18871, 2025. [43] Keming Wu, Sicong Jiang, Max Ku, Ping Nie, Minghao Liu, and Wenhu Chen. Editre- ward: A human-aligned reward model for instruction-guided image editing.arXiv preprint arXiv:2509.26346, 2025. [44] Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, ","claim_type":"other","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"by excessively large advantages as well as inefficient learning under excessively small scales. Extensive experiments are conducted to evaluate the effectiveness of SLAS. Two backbones of different scales, SD1.4 (0.9B) [ 11] and FLUX.1 Dev (12B) [ 3] are utilized, with DanceGRPO [7] serving as the baseline under identical reward configurations and training settings, and both methods are trained on the publicly available HPD-v2 training set [12]. From the training dynamics, SLAS consistently outp","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"between low-rank pixel outputs and full-rank pixel targets. To our knowledge, this is the first practical path for turning existing large-scale latent flow models themselves into strong pixel generators. We evaluate AsymFlow in two settings. On ImageNet 256×256 [12], AsymFlow reaches 1.76 FID with the JiT-H/16 network [ 35] and 1.57 FID with an additional REPA loss [ 70], outperforming prior DiT/JiT-like pixel diffusion models by a large margin. For text-to-image generation, our pixel AsymFlow m","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"individual benchmarks, it exhibits clearly superior overall performance across multiple benchmarks, indicating higher data quality. Text-to-image Preference Reformulation.To validate the superiority of our reformulation method over the base- line method adopted in Omni-Reward [16], we first ran- domly sample 50k raw text-to-image preference data from EvalMuse [13] and HPDv2 [54]. We then reconstruct these preferences using the two methods, respectively, and train MRMs on the resulting datasets. ","claim_type":"dataset","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Further analysis in the fine-grand dimension shows that, compared to Seedream 2.0, Seedream 3.0 has improvements in most dimensions, especially in terms of objects, activities, locations, food, and space. To align with the previous reported results, Ideogram 2.0 is included in the assessment here and subsequent chapters. For image quality evaluation, we reuse two external metrics, HPSv2 [24] and MPS [26], and two internal evaluation models, Internal-Align and Internal-Aes. Seedream 3.0 ranks fir","claim_type":"dataset","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Together, LeapAlign makes early-step fine-tuning practical and stable. We fine-tune Flux [18] with LeapAlign and show the performance gains in Fig. 1. Moreover, compared with the state-of-the-art GRPO-based methods [22, 55] and direct-gradient methods [3, 52, 53], LeapAlign consistently performs better in image generation, reflected by better scores in HPSv2.1 [51], HPSv3 [32], PickScore [17], UnifiedReward [48], ImageReward [53], and image-text alignment on GenEval [10]. In summary, this paper ","claim_type":"baseline","confidence":0.9,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (16 contexts).","role_counts":[{"n":16,"context_role":"background"},{"n":13,"context_role":"dataset"},{"n":4,"context_role":"method"},{"n":3,"context_role":"baseline"},{"n":3,"context_role":"other"}]},"error":null,"updated_at":"2026-05-23T07:04:16.879664+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis","claims":[{"claim_text":"Recent text-to-image generative models can generate high-fidelity images from text inputs, but the quality of these generated images cannot be accurately evaluated by existing evaluation metrics. To address this issue, we introduce Human Preference Dataset v2 (HPD v2), a large-scale dataset that captures human preferences on images from a wide range of sources. HPD v2 comprises 798,090 human preference choices on 433,760 pairs of images, making it the largest dataset of its kind. The text prompts and images are deliberately collected to eliminate potential bias, which is a common issue in prev","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T12:20:44.873378+00:00"}},"summary":{"title":"Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis","claims":[{"claim_text":"Recent text-to-image generative models can generate high-fidelity images from text inputs, but the quality of these generated images cannot be accurately evaluated by existing evaluation metrics. To address this issue, we introduce Human Preference Dataset v2 (HPD v2), a large-scale dataset that captures human preferences on images from a wide range of sources. HPD v2 comprises 798,090 human preference choices on 433,760 pairs of images, making it the largest dataset of its kind. The text prompts and images are deliberately collected to eliminate potential bias, which is a common issue in prev","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"DanceGRPO: Unleashing GRPO on Visual Generation","work_id":"7404dd36-8f9c-478f-b089-ef9f8189c711","shared_citers":19},{"title":"Flow Matching for Generative Modeling","work_id":"6edb71c4-5d64-40af-a394-9757ea051a36","shared_citers":19},{"title":"SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis","work_id":"8034c587-fba6-4941-87ba-c98f2ac962cb","shared_citers":19},{"title":"Flow-GRPO: Training Flow Matching Models via Online RL","work_id":"bf1e8e81-ff31-401a-a5dc-d9c49df168ab","shared_citers":17},{"title":"MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE","work_id":"8b0ab84a-b7ea-46ea-a6ba-bf490a84d251","shared_citers":15},{"title":"Training Diffusion Models with Reinforcement Learning","work_id":"67684dda-3930-452a-b91a-36cbb8e2e219","shared_citers":15},{"title":"Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow","work_id":"a1989e1b-d66d-4533-be3a-fb9c5fd62290","shared_citers":14},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":14},{"title":"Score-Based Generative Modeling through Stochastic Differential Equations","work_id":"d9110e53-a5d4-4794-a4c5-a575e91c31ad","shared_citers":14},{"title":"Classifier-Free Diffusion Guidance","work_id":"acf2c588-c088-4a6c-938e-150ad7c666d7","shared_citers":11},{"title":"DiffusionNFT: Online Diffusion Reinforcement with Forward Process","work_id":"0ed3cf57-36ba-4962-847e-7a8f5f99901d","shared_citers":11},{"title":"Denoising Diffusion Implicit Models","work_id":"8fa2128b-d18c-405c-ac92-0e669cf89ac0","shared_citers":10},{"title":"Unified Reward Model for Multimodal Understanding and Generation","work_id":"bf9fcf9a-1781-4008-960e-2bec1a717e4e","shared_citers":10},{"title":"FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space","work_id":"5dfe19d5-3541-4803-8fe9-3c8b9e29b281","shared_citers":9},{"title":"Qwen-Image Technical Report","work_id":"d06d7ecc-7579-4f89-a60b-4278a0f3c562","shared_citers":9},{"title":"Tempflow-grpo: When timing matters for grpo in flow models.arXiv preprint arXiv:2508.04324","work_id":"fecc731c-f8a2-4f00-ab1b-51b87a133726","shared_citers":9},{"title":"Aligning text-to-image models using human feedback.arXiv preprint arXiv:2302.12192","work_id":"39a08cdc-3986-4994-baf0-91e8ddf1b855","shared_citers":8},{"title":"arXiv preprint arXiv:2509.06040 (2025) 2, 3","work_id":"4b5ca02b-b12f-4ca1-88b1-99007b8f9528","shared_citers":8},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":8},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":8},{"title":"arXiv preprint arXiv:2509.25050 , year=","work_id":"b71cd776-fb9d-4e2f-b936-7105099de5b7","shared_citers":7},{"title":"arXiv preprint arXiv:2510.22319 (2025) 3","work_id":"ca09b893-7353-413f-b231-16f132d92aee","shared_citers":7},{"title":"Directly fine-tuning diffusion models on differentiable rewards","work_id":"cd2b3c6d-1ce2-434e-b114-5567026b9b82","shared_citers":7},{"title":"ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment","work_id":"94248955-4bc5-4517-98a0-66224a36d865","shared_citers":7}],"time_series":[{"n":1,"year":2024},{"n":4,"year":2025},{"n":50,"year":2026}],"dependency_candidates":[]},"authors":[{"id":"f594eee8-e9d2-41c5-9e75-8e58323846de","orcid":null,"display_name":"Feng Zhu","source":"manual","import_confidence":0.72},{"id":"2f1f493e-c401-45dd-9d2f-ae7e55e497fc","orcid":null,"display_name":"Keqiang Sun","source":"manual","import_confidence":0.72},{"id":"ffcd7d23-d2f6-4ae8-b9ac-f42695a19a5e","orcid":null,"display_name":"Rui Zhao","source":"manual","import_confidence":0.72},{"id":"b2ebe93a-2133-4dcf-858e-dcb2a33fe2c1","orcid":null,"display_name":"Xiaoshi Wu","source":"manual","import_confidence":0.72},{"id":"0155e1bb-6d4a-4831-9ead-081b8f1be231","orcid":null,"display_name":"Yiming Hao","source":"manual","import_confidence":0.72},{"id":"46feace2-279d-43a8-8200-3076f1fd6905","orcid":null,"display_name":"Yixiong Chen","source":"manual","import_confidence":0.72}]}}