pith. machine review for the scientific record.

arxiv: 2605.07861 · v1 · submitted 2026-05-08 · 💻 cs.CV

Recognition: no theorem link

From Synthetic to Real: Toward Identity-Consistent Makeup Transfer with Synthetic and Real Data

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 02:19 UTC · model grok-4.3

classification 💻 cs.CV
keywords makeup transfer · identity preservation · synthetic data curation · reinforcement learning · real-world adaptation · image-to-image translation · computer vision

The pith

Curated synthetic data combined with reinforcement learning on real images improves identity preservation in makeup transfer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to fix two problems in makeup transfer: synthetic training data often changes the source person's identity, and models trained only on synthetics fail on complex real photos. It introduces ConsistentBeauty, a curation pipeline that builds synthetic pairs with strict identity matching and accurate makeup application. It then adds RealBeauty, a post-training step that uses reinforcement learning with task-specific verifiable rewards to adapt the model to real data patterns. A new benchmark with varied skin tones, ages, poses, and styles is released for testing. If the approach holds, transfer systems could apply makeup styles reliably to everyday photos without distorting faces or requiring heavy manual fixes.

Core claim

ConsistentBeauty creates synthetic training data that enforces both makeup fidelity and identity consistency, while RealBeauty performs supervised learning on this data followed by reinforcement learning with novel verifiable rewards that measure identity preservation and makeup accuracy on real images, allowing the model to generalize beyond synthetic supervision.
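
The reward formulation itself is not given in the material above (the referee flags this below), but a minimal sketch of one plausible composite reward is easy to state: score identity against the source face, score makeup against the reference, and mix the two with weighting coefficients. Everything in this sketch, including the function names and the 0.5/0.5 weights, is a reader's assumption rather than the paper's formulation.

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two embedding vectors."""
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def composite_reward(
        id_emb_src: np.ndarray,   # face-recognition embedding of the source portrait
        id_emb_out: np.ndarray,   # face-recognition embedding of the transferred output
        mk_emb_ref: np.ndarray,   # makeup-region embedding of the reference portrait
        mk_emb_out: np.ndarray,   # makeup-region embedding of the output
        w_id: float = 0.5,        # hypothetical weighting coefficients;
        w_mk: float = 0.5,        # the paper leaves these unspecified
    ) -> float:
        """One plausible verifiable reward: identity consistency plus makeup fidelity.

        Identity is scored against the SOURCE face (it should be unchanged);
        makeup is scored against the REFERENCE makeup layer (it should be copied).
        """
        r_identity = cosine(id_emb_src, id_emb_out)
        r_makeup = cosine(mk_emb_ref, mk_emb_out)
        return w_id * r_identity + w_mk * r_makeup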

What carries the argument

RealBeauty post-training framework, which adapts a model from supervised training on identity-consistent synthetic data to real scenarios via reinforcement learning driven by makeup-transfer-specific verifiable rewards.
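
The RL algorithm is likewise unspecified here. As one illustration of how verifiable scalar rewards typically drive a policy update in recent post-training work, a group-relative advantage computation could look like the sketch below; this is an assumed GRPO-style mechanism, not a confirmed detail of RealBeauty.

    import numpy as np

    def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
        """Normalize a group of scalar rewards (one per sampled output for the
        same source/reference pair) into zero-mean, unit-variance advantages.

        Outputs scoring above the group mean get positive weight, and the
        policy update pushes probability mass toward them. The update rule
        actually used by RealBeauty is not given in the material above.
        """
        return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    # e.g., four candidate outputs for one (source, reference) pair:
    advantages = group_relative_advantages(np.array([0.81, 0.64, 0.92, 0.70]))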

If this is right

  • State-of-the-art performance on existing makeup transfer benchmarks.
  • Stronger identity preservation across diverse skin tones, ages, genders, and poses.
  • Improved handling of complex real-world makeup styles and backgrounds.
  • The new diverse benchmark supports more thorough testing of real-world robustness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The curation-plus-RL pattern may transfer to other image-to-image tasks that suffer from synthetic-to-real gaps, such as face editing or style transfer.
  • Verifiable rewards could reduce reliance on paired real-world annotations in beauty-related computer vision applications.
  • Consumer tools for virtual makeup try-on might become more accurate for varied user demographics without additional manual calibration.

Load-bearing premise

The verifiable rewards used in reinforcement learning reliably quantify and improve both identity consistency and makeup fidelity on real data without creating new distortions or overfitting to the new benchmark.
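
That premise is empirically checkable: reward scores on a held-out set should correlate with human ratings of the same outputs. A minimal sketch of such a sanity check, with hypothetical numbers, using SciPy's spearmanr:

    import numpy as np
    from scipy.stats import spearmanr

    # Hypothetical: verifiable-reward scores and human identity-consistency
    # ratings (1-5 scale) for the same five held-out outputs.
    reward_scores = np.array([0.91, 0.72, 0.55, 0.88, 0.63])
    human_ratings = np.array([4.5, 3.0, 2.5, 4.0, 3.5])

    rho, p = spearmanr(reward_scores, human_ratings)
    print(f"Spearman rho={rho:.2f} (p={p:.3f})")
    # A low or negative rho would undercut the premise that the rewards
    # proxy human-perceived identity consistency.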

What would settle it

A head-to-head comparison on the new benchmark between the reinforcement-learning-adapted model and the version trained only on the curated synthetic data. If the RL-adapted model produces lower identity-preservation scores or more visible artifacts, the synthetic-to-real claim fails; if it scores higher without new artifacts, the claim stands.
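
Concretely, the comparison reduces to scoring both checkpoints with the same identity metric on the benchmark. A toy sketch with hypothetical placeholder scores:

    import numpy as np

    # Hypothetical per-image face-similarity scores (source vs. output)
    # on the new benchmark, one array per checkpoint.
    sft_only = np.array([0.84, 0.79, 0.88, 0.81])      # curated synthetics only
    sft_plus_rl = np.array([0.87, 0.83, 0.90, 0.85])   # after RL adaptation

    delta = sft_plus_rl.mean() - sft_only.mean()
    print(f"mean face-similarity delta (RL - SFT): {delta:+.3f}")
    # A negative delta, or more visible artifacts in the RL model's outputs,
    # would falsify the synthetic-to-real claim.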

Figures

Figures reproduced from arXiv: 2605.07861 by Jiajia Shi, Jiayu Wang, Jingjing Chen, Yue Yu, Yu-Gang Jiang.

Figure 1: Existing methods for makeup transfer usually suffer from two …
Figure 2: Overview of the proposed ConsistentBeauty data curation pipeline. By constructing makeup layers on a standard face and transferring them to other …
Figure 3: Examples from our ConsistentBeauty dataset. Different non-makeup …
Figure 4: Overview of the proposed Makeup Perceptual Verifier for extracting a makeup-exclusive layer from the input image. By removing background regions …
Figure 5: Qualitative comparison with representative makeup transfer methods. Existing methods mainly exhibit two limitations: (1) insufficient identity …
Figure 6: The zoomed-in regions highlight fine-grained details, such as decorative elements and intricate makeup patterns. The close-up comparisons show that …
Figure 7: Visualization of different makeup transfer methods in the two-dimensional space of min-max normalized CLIP-I and Face Similarity. For each metric, …
Figure 8: The results of the user study comparing our method with SHMT, …
Figure 9: Qualitative comparison between the SFT-only and SFT+RL models.
Original abstract

Makeup transfer aims to apply the makeup style of a reference portrait to a source portrait while preserving identity and background. Early methods formulate this task as unsupervised image-to-image translation, relying on surrogate objectives and often yielding limited performance. Recent diffusion- and flow-based approaches instead exploit synthetic data for supervised training, leading to significant improvements. However, these methods still face two critical challenges: synthetic supervision frequently fails to faithfully preserve identity, and the domain gap between synthetic and real data limits generalization, resulting in degraded performance in complex real-world scenarios. To address these issues, this paper first proposes ConsistentBeauty, a novel data curation pipeline that ensures makeup fidelity and strict identity consistency within the synthesized data. Second, we propose RealBeauty, a synthetic-to-real post-training framework. Beyond supervised learning on curated synthetic data, we further adapt the model to real-world scenarios through reinforcement learning and design novel verifiable rewards tailored to the makeup transfer task. It allows the model to further benefit from real makeup patterns beyond synthetic supervision. In addition, we establish a new diverse benchmark for makeup transfer, covering a wide range of skin tones, ages, genders, poses, and makeup styles, thereby enabling a more comprehensive evaluation of model performance under diverse real-world conditions. Extensive experiments show that our method achieves state-of-the-art performance on multiple benchmarks and demonstrates clear advantages in identity preservation and performance on complex real-world cases.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated authors' rebuttal, a circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper argues that prior synthetic-supervised makeup transfer methods suffer from poor identity preservation and a domain gap to real data. It addresses both via ConsistentBeauty, a curation pipeline that produces identity-consistent synthetic pairs, followed by RealBeauty, a supervised training stage plus RL post-training that uses novel task-specific verifiable rewards on real images to adapt the model. It also contributes a new diverse benchmark spanning skin tones, ages, genders, poses, and styles, and reports SOTA performance with clear gains in identity preservation on complex real-world cases.

Significance. If the empirical claims hold, the work would be significant for the makeup-transfer and broader synthetic-to-real adaptation literature by demonstrating a practical pipeline that leverages curated synthetic supervision and then refines with RL on real data without full labels. The new benchmark is a clear positive contribution that enables more representative evaluation across demographics. The verifiable-reward RL idea is a strength that could transfer to other image-editing tasks if the rewards prove robust.

major comments (1)
  1. [RealBeauty framework] RealBeauty section (RL adaptation stage): the central claim, that the novel verifiable rewards reliably improve identity consistency and makeup fidelity on real data without new artifacts or extensive tuning, is load-bearing for the synthetic-to-real contribution. Yet the manuscript provides insufficient detail on their exact formulation, weighting coefficients, or verification procedure. This directly relates to the reader's noted weakest assumption and must be clarified with equations or pseudocode before the SOTA and generalization claims can be fully assessed.
minor comments (2)
  1. Table captions and figure legends should explicitly state which metrics correspond to identity preservation versus makeup fidelity so readers can directly map quantitative gains to the two stated advantages.
  2. The abstract and introduction mention 'multiple benchmarks'; a consolidated table listing all evaluated datasets, metrics, and prior methods would improve clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. The single major comment identifies a valid need for greater transparency in the RealBeauty framework, which we will address directly in the revision to strengthen the synthetic-to-real claims.

read point-by-point responses
  1. Referee: [RealBeauty framework] RealBeauty section (RL adaptation stage): the central claim, that the novel verifiable rewards reliably improve identity consistency and makeup fidelity on real data without new artifacts or extensive tuning, is load-bearing for the synthetic-to-real contribution. Yet the manuscript provides insufficient detail on their exact formulation, weighting coefficients, or verification procedure. This directly relates to the reader's noted weakest assumption and must be clarified with equations or pseudocode before the SOTA and generalization claims can be fully assessed.

    Authors: We agree that the verifiable rewards are central to RealBeauty and that the current manuscript description is insufficient for full evaluation. In the revised manuscript we will add the exact mathematical formulations of the identity-consistency and makeup-fidelity reward terms, the specific weighting coefficients used in the composite reward, and pseudocode for the verification step and RL update loop. These additions will clarify how the rewards are computed from verifiable signals on real images and why they improve performance without introducing artifacts or requiring extensive retuning. revision: yes
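
Until the revision lands, the shape of that pseudocode is a guess. One hedged sketch of the promised verification step, in which a reward is trusted only when its inputs are well-formed so the RL stage cannot reward-hack by degrading the face region; every check and threshold here is hypothetical:

    from typing import Optional

    def verified_reward(
        face_detected: bool,        # did a face detector fire on the output?
        mask_coverage: float,       # fraction of the face covered by the makeup mask
        raw_reward: float,          # composite identity + makeup reward
        min_coverage: float = 0.2,  # hypothetical threshold
    ) -> Optional[float]:
        """Gate the reward on verifiable preconditions.

        If the output has no detectable face or a degenerate makeup mask,
        the reward is rejected (None) rather than back-propagated. This is
        a reader's sketch of the promised pseudocode, not the paper's.
        """
        if not face_detected or mask_coverage < min_coverage:
            return None
        return max(0.0, min(1.0, raw_reward))  # clamp to a bounded range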

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper describes an empirical pipeline: ConsistentBeauty for curating identity-consistent synthetic makeup data, followed by RealBeauty which applies supervised training on that data and then RL adaptation to real images using task-specific verifiable rewards. Performance is evaluated on external benchmarks plus a newly introduced diverse real-world benchmark. No mathematical derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear; all claims reduce to experimental results measured against independent metrics rather than quantities defined in terms of the method's own outputs.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The central claims rest on standard computer vision assumptions about diffusion/flow models being trainable with paired supervision and on the existence of verifiable scalar rewards that can proxy human judgments of identity and makeup fidelity. No new physical entities are postulated. A small number of reward weighting hyperparameters are expected but not detailed in the abstract.

free parameters (1)
  • reward weighting coefficients
    Weights balancing the verifiable makeup, identity, and realism rewards in the RL stage; these are typically tuned on validation data and directly affect the final adaptation performance.
axioms (2)
  • domain assumption: Paired synthetic data with strict identity consistency can be generated at scale without introducing artifacts that harm downstream real-world generalization.
    Invoked when claiming ConsistentBeauty solves the identity preservation problem of prior synthetic supervision.
  • domain assumption: Verifiable rewards can be defined that accurately reflect human-perceived identity consistency and makeup fidelity on real images.
    Central to the RealBeauty RL stage; if false, the reinforcement learning adaptation cannot be guaranteed to improve real-world performance.
invented entities (1)
  • verifiable rewards for makeup transfer (no independent evidence)
    purpose: Provide scalar feedback signals during RL post-training that measure identity preservation, makeup fidelity, and realism on real data.
    Newly designed reward functions introduced to enable the synthetic-to-real adaptation step.

pith-pipeline@v0.9.0 · 5558 in / 1779 out tokens · 40938 ms · 2026-05-11T02:19:49.421361+00:00 · methodology

