pith. machine review for the scientific record.

arxiv: 2604.08526 · v1 · submitted 2026-04-09 · 💻 cs.CV · cs.GR

Recognition: 2 theorem links · Lean Theorem

FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On

Ira Kemelmacher-Shlizerman, Johanna Karras, Yingwei Li, Yuanhao Wang

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:42 UTC · model grok-4.3

classification 💻 cs.CV cs.GR
keywords virtual try-on · fit-aware · dataset · synthetic data · garment simulation · re-texturing · physics simulation · ill-fit

The pith

The FIT dataset supplies 1.13 million try-on image triplets with exact body and garment measurements so virtual try-on models can depict accurate fit instead of defaulting to well-fitted results.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces FIT, a large-scale dataset of over 1.13 million virtual try-on image triplets, each accompanied by precise measurements of the wearer's body and the garment itself. Existing virtual try-on methods excel at transferring garment appearance but ignore fit, so they produce well-fitted images even when the garment is far too large or too small for the person. To create realistic ill-fit examples at scale, the authors generate 3D garments, drape them with physics simulation, and apply a re-texturing pipeline that converts the renders into photorealistic images while keeping the underlying geometry and fit information intact. They also add person-identity preservation so the dataset contains paired images of the same person in different garments. A baseline model trained on FIT achieves new state-of-the-art performance on fit-aware virtual try-on and supplies a public benchmark for future work.
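As a concrete sketch of what one supervised record in such a dataset might look like, the triplet-plus-measurements structure can be modeled as a small data class. The field names and units below are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass

# Hypothetical sketch of one FIT-style triplet record.
# Field names are illustrative, not the dataset's real schema.
@dataclass
class FitTriplet:
    person_image: str           # path to conditioning person image
    garment_image: str          # path to conditioning garment image
    target_image: str           # path to target try-on image
    body_measurements: dict     # e.g. {"chest_cm": 88.0}
    garment_measurements: dict  # e.g. {"chest_cm": 110.0}

    def size_delta(self, key: str) -> float:
        """Garment minus body measurement; positive means slack."""
        return self.garment_measurements[key] - self.body_measurements[key]

sample = FitTriplet(
    person_image="person_0001.png",
    garment_image="garment_0001.png",
    target_image="tryon_0001.png",
    body_measurements={"chest_cm": 88.0},
    garment_measurements={"chest_cm": 110.0},
)
print(sample.size_delta("chest_cm"))  # 22.0 -> 22 cm of slack, a loose fit
```

The point of pairing exact numeric measurements with each triplet is that the supervision signal (how loose or tight the garment should look) becomes a computable quantity rather than an implicit property of the photo.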

Core claim

The central contribution is the FIT dataset, built by a fully synthetic pipeline: GarmentCode first creates 3D garments, physics simulation drapes them on 3D bodies to produce realistic drape and wrinkles for both good and ill fit, and a re-texturing framework then turns the synthetic renderings into photorealistic images while strictly preserving the original geometry and fit details. Person identity is maintained across garment changes to enable supervised training. This yields more than 1.13 million triplets of person image, garment image, and target try-on image, together with exact numeric body and garment measurements. Models trained on the data can therefore learn to generate try-on results whose depicted fit tracks the actual body and garment measurements.
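A minimal sketch of how explicit measurements make fit supervision possible: the signed gap between a garment measurement and the corresponding body measurement determines a coarse fit label. The thresholds and function below are illustrative, not the paper's size classification (which its Table 5 defines).

```python
def fit_label(body_cm: float, garment_cm: float,
              tight_thresh: float = -2.0, loose_thresh: float = 8.0) -> str:
    """Classify fit from one circumference pair.

    Thresholds are illustrative assumptions, not the paper's scheme.
    """
    slack = garment_cm - body_cm  # positive slack means extra room
    if slack < tight_thresh:
        return "tight"
    if slack > loose_thresh:
        return "loose"
    return "fitted"

print(fit_label(88.0, 110.0))  # loose
print(fit_label(88.0, 90.0))   # fitted
print(fit_label(96.0, 88.0))   # tight
```

A binned label of this kind is also what Figure 12's (body size, garment size) frequency plot tabulates over the dataset.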

What carries the argument

The FIT dataset of 1.13M triplets with precise body and garment measurements, generated by physics-based garment draping followed by a geometry-preserving re-texturing framework that also maintains person identity across pairs.

If this is right

  • Virtual try-on outputs will vary visibly with body size and garment size instead of always showing a perfect fit.
  • Models can be trained to handle extreme size mismatches without hallucinating incorrect drape or fabric behavior.
  • The dataset serves as a public benchmark that future fit-aware methods can be measured against.
  • Paired person images enable supervised learning of identity-preserving transformations across different garments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Retail platforms could integrate fit-aware try-on to reduce sizing-related returns by letting customers see how a garment actually sits on their measured body.
  • The same synthetic pipeline could be extended to other geometry-sensitive synthesis tasks such as fitting furniture or medical devices.
  • Adding real 3D body scans or garment scans to the pipeline would further close the gap between synthetic training data and real-world deployment.

Load-bearing premise

Physics simulation of garment drape plus the re-texturing step produce realistic wrinkle and fold patterns for ill-fit cases that are close enough to real photographs for models to generalize.

What would settle it

A side-by-side visual comparison or a quantitative metric on a held-out set of real photographs of people wearing garments that are deliberately too large or too small, checking whether the synthesized fit matches the actual drape and wrinkles observed in the photos.

Figures

Figures reproduced from arXiv: 2604.08526 by Ira Kemelmacher-Shlizerman, Johanna Karras, Yingwei Li, Yuanhao Wang.

Figure 1: The FIT Dataset. We present FIT, a dataset and benchmark designed for fit-aware virtual try-on, featuring diverse garment fits (e.g., tight, loose) and precise size annotations. Left: Sample dataset triplets showing the conditioning garment image (top), the conditioning person image (middle), and the target try-on image (bottom). Right: Visualization of the corresponding person and garment measurement anno…

Figure 2: FIT dataset generation. (a) Overall pipeline: For each sample, we first simulate garment draping in 3D via GarmentCode, rendering a synthetic try-on image 𝐼𝑠 (see (b)). Then, we generate a text prompt 𝑝 (via VLM) describing the person and garment appearance, as well as a composite normal map 𝐼𝑛 based on 𝐼𝑠. We use 𝑝 and 𝐼𝑛 to condition our re-texturing model 𝑓texture to generate the photorealistic try-on …

Figure 3: Fit-VTO architecture. Our architecture is a flow-based diffusion model based on Flux.1-dev [Black Forest Labs 2024] and finetuned with LoRA [Hu et al. …

Figure 4: Paired Image Generation Comparison. VLM methods struggle with …

Figure 5: Qualitative results. We show examples of Fit-VTO results on the synthetic FIT test dataset. Fit-VTO respects person and garment inputs, while also …

Figure 6: Qualitative comparisons. We compare Fit-VTO to related and ablated methods using synthetic FIT test data (top two rows) and real-world VITON-HD …

Figure 7: Independent size control. Fit-VTO realistically visualizes garment fit across various sizes on a fixed person size.

Figure 8: During cross-body draping, the initial box meshes are often mis…

Figure 9: Additional dataset examples. In each example, we show the paired person image (top left), garment image (lower left), and the target try-on image (right).

Figure 10: Qualitative resizing results. Fit-VTO adapts garment fit according to individual garment measurements. We show results of independently shrinking …

Figure 11: Failure cases. (a) GarmentCode [Korosteleva and Sorkine-Hornung 2023] simulation does not model varying degrees of tightness well, leading to …

Figure 12: Garment fit distribution. We plot the frequency of each (body size, garment size) pairing in our dataset according to the size classification in Table 5.

Figure 13: Real-world resizing results. We show Fit-VTO try-on performance on real-world person images using varying garment sizes. Fit-VTO realistically …
read the original abstract

Given a person and a garment image, virtual try-on (VTO) aims to synthesize a realistic image of the person wearing the garment, while preserving their original pose and identity. Although recent VTO methods excel at visualizing garment appearance, they largely overlook a crucial aspect of the try-on experience: the accuracy of garment fit -- for example, depicting how an extra-large shirt looks on an extra-small person. A key obstacle is the absence of datasets that provide precise garment and body size information, particularly for "ill-fit" cases, where garments are significantly too large or too small. Consequently, current VTO methods default to generating well-fitted results regardless of the garment or person size. In this paper, we take the first steps towards solving this open problem. We introduce FIT (Fit-Inclusive Try-on), a large-scale VTO dataset comprising over 1.13M try-on image triplets accompanied by precise body and garment measurements. We overcome the challenges of data collection via a scalable synthetic strategy: (1) We programmatically generate 3D garments using GarmentCode and drape them via physics simulation to capture realistic garment fit. (2) We employ a novel re-texturing framework to transform synthetic renderings into photorealistic images while strictly preserving geometry. (3) We introduce person identity preservation into our re-texturing model to generate paired person images (same person, different garments) for supervised training. Finally, we leverage our FIT dataset to train a baseline fit-aware virtual try-on model. Our data and results set the new state-of-the-art for fit-aware virtual try-on, as well as offer a robust benchmark for future research. We will make all data and code publicly available on our project page: https://johannakarras.github.io/FIT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces FIT, a large-scale dataset of over 1.13M virtual try-on image triplets paired with precise body and garment measurements to support fit-aware VTO. It describes a synthetic pipeline: GarmentCode for 3D garment generation, physics simulation for draping (including ill-fit), a re-texturing framework claimed to convert renders to photorealistic images while strictly preserving geometry, and an identity-preservation mechanism for generating paired person images. A baseline fit-aware VTO model is trained on the data and reported to achieve SOTA results, with plans to release data and code publicly.

Significance. A dataset providing explicit fit measurements at this scale would be a meaningful contribution to VTO, where existing methods typically default to well-fitted outputs due to lack of ill-fit supervision. The synthetic approach enables controlled generation of size-mismatch cases and the public release supports reproducibility. Credit is due for the scale and the attempt to include identity-preserving pairs for supervised training. However, the significance is conditional on empirical confirmation that the re-texturing step preserves the fit and drape information from the physics simulation.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (synthetic pipeline): The central claim that the 1.13M triplets accurately encode real fit behavior (including ill-fit drape and wrinkles) rests on the re-texturing framework 'strictly preserving geometry.' No quantitative metrics (e.g., garment mask IoU, surface normal error, or wrinkle pattern correlation) are reported comparing the physics-simulated renders to the final photorealistic outputs for size-mismatch cases; this is load-bearing for supervised fit-aware training.
  2. [§4] §4 (baseline model and evaluation): The SOTA claim for fit-aware VTO is supported only by the new dataset; without reported comparisons on real photographs (e.g., fit accuracy metrics or user studies on perceived tightness/looseness), it is unclear whether models trained on FIT generalize beyond the synthetic distribution.
minor comments (2)
  1. [§2] The exact breakdown of the 1.13M triplets (train/val/test splits, number of unique identities vs. garments) should be stated explicitly in the main text rather than only in the abstract.
  2. [Figure 2] Figure captions for the pipeline overview should include the specific GarmentCode parameters and physics solver settings used for ill-fit generation to aid reproducibility.
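The geometry-preservation metrics named in the first major comment can be sketched in a few lines. The function names and toy inputs below are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean garment masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 1.0

def mean_normal_angle_deg(n1: np.ndarray, n2: np.ndarray) -> float:
    """Mean angular error in degrees between two HxWx3 unit-normal maps."""
    dot = np.clip(np.sum(n1 * n2, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(dot)).mean())

# Toy check: identical inputs give IoU 1.0 and 0 degrees of normal error.
m = np.zeros((4, 4), dtype=bool)
m[1:3, 1:3] = True
n = np.zeros((4, 4, 3))
n[..., 2] = 1.0  # all normals pointing along +z
print(mask_iou(m, m), mean_normal_angle_deg(n, n))  # 1.0 0.0
```

Comparing the physics-simulated render's mask and normal map against those re-estimated from the re-textured output (stratified by size mismatch) would quantify how strictly the re-texturing step preserves geometry.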

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the corresponding revisions.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (synthetic pipeline): The central claim that the 1.13M triplets accurately encode real fit behavior (including ill-fit drape and wrinkles) rests on the re-texturing framework 'strictly preserving geometry.' No quantitative metrics (e.g., garment mask IoU, surface normal error, or wrinkle pattern correlation) are reported comparing the physics-simulated renders to the final photorealistic outputs for size-mismatch cases; this is load-bearing for supervised fit-aware training.

    Authors: We agree that quantitative validation of geometry preservation is important for supporting the use of the dataset in fit-aware training. The re-texturing framework conditions on the physics-simulated depth, normal, and segmentation maps to enforce geometric fidelity, but we did not report explicit metrics in the original submission. In the revision we will add quantitative comparisons (garment mask IoU, surface normal error, and wrinkle pattern correlation) between the physics-simulated renders and the final re-textured images, with separate analysis for size-mismatch cases. revision: yes

  2. Referee: [§4] §4 (baseline model and evaluation): The SOTA claim for fit-aware VTO is supported only by the new dataset; without reported comparisons on real photographs (e.g., fit accuracy metrics or user studies on perceived tightness/looseness), it is unclear whether models trained on FIT generalize beyond the synthetic distribution.

    Authors: We acknowledge that all reported results are on the synthetic FIT benchmark. Because no prior real-world VTO dataset provides precise body/garment size labels at scale, direct quantitative comparison on real photographs is not currently possible. The baseline establishes a new reference point on the first large-scale fit-supervised dataset. We will add an expanded limitations section discussing the synthetic-to-real gap and the value of the public release for enabling future real-world studies and user evaluations. revision: partial

Circularity Check

0 steps flagged

No circularity: dataset construction via external simulation pipeline

full rationale

The paper describes an empirical data-generation pipeline (GarmentCode for 3D garments, physics simulation for draping, re-texturing to photorealism while preserving geometry, and identity preservation for paired images). No mathematical derivations, equations, predictions, or first-principles results are claimed that reduce to fitted inputs or self-referential definitions. No load-bearing self-cited uniqueness theorems, ansatzes smuggled in via citation, or renamings of known results appear. The central contribution is the scale and utility of the 1.13M triplets as a benchmark, which stands independently of any internal circular reduction. This is a standard, non-circular data-construction effort.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the fidelity of the synthetic generation pipeline; the paper introduces no new physical entities but depends on domain assumptions about simulation accuracy and re-texturing invariance.

free parameters (1)
  • re-texturing model hyperparameters
    Parameters in the novel re-texturing framework are likely tuned to balance photorealism against geometry preservation, though specific values are not stated in the abstract.
axioms (2)
  • domain assumption Physics simulation via GarmentCode produces realistic garment drape and wrinkles for a wide range of body-garment size mismatches
    Invoked as the mechanism to capture ill-fit cases in the data generation step.
  • ad hoc to paper The re-texturing process strictly preserves 3D geometry and fit cues when converting synthetic renderings to photorealistic images
    Central premise of the scalable data collection strategy.

pith-pipeline@v0.9.0 · 5638 in / 1606 out tokens · 147024 ms · 2026-05-10T17:42:53.800727+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

21 extracted references · 14 canonical work pages · 1 internal anchor

  1. [1] Demystifying MMD GANs. arXiv:1801.01401 (2018). https://api.semanticscholar.org/CorpusID:3531856
  2. [2] Zheng Chong, Yanwei Lei, Shiyue Zhang, Zhuandi He, Zhen Wang, Xujie Zhang, Xiao Dong, Yiling Wu, Dongmei Jiang, and Xiaodan Liang. CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models. arXiv:2407.15886 [cs.CV]. https://arxiv.org/abs/2407.15886
  3. [3] Aiyu Cui, Jay Mahajan, Viraj Shah, Preeti Gomathinayagam, Chang Liu, and Svetlana Lazebnik. Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images. arXiv:2311.16094 (2023).
  4. [4] Chenghu Du, Shengwu Xiong, Junyin Wang, Yi Rong, and Shili Xiong. Greatness in simplicity: Unified self-cycle consistency for parser-free virtual try-on. Advances in Neural Information Processing Systems 36 (2023), 20287–20298.
  5. [5] Advances in Neural Information Processing Systems (2017). Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2017/file/8a1d694707eb0fefe65871369074926d-Paper.pdf
  6. [6] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, et al. LoRA: Low-Rank Adaptation of Large Language Models; LoRA-FA: Efficient and Effective Low Rank Representation Fine-tuning, arXiv:2308.03303 (2023).
  7. [7] Maria Korosteleva and Olga Sorkine-Hornung. GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns. In European Conference on Computer Vision (ECCV). https://arxiv.org/abs/2405.17609
  8. [8] Maria Korosteleva and Olga Sorkine-Hornung. GarmentCode: Programming Parametric Sewing Patterns. ACM Transactions on Graphics (TOG) 42, 6, Article 197 (2023). doi:10.1145/3618351
  9. [9] Minoru Kuribayashi, Koki Nakai, and Nobuo Funabiki. Image-based virtual try-on system with clothing-size adjustment. arXiv:2302.14197 (2023).
  10. [10] Siran Li, Ruiyang Liu, Chen Liu, Zhendong Wang, Gaofeng He, Yong-Lu Li, Xiaogang Jin, and Huamin Wang. GarmageNet: A Multimodal Generative Framework for Sewing Pattern Design and Generic Garment Modeling. ACM Trans. Graph. 44, 6, Article 216 (Dec. 2025), 23 pages. doi:10.1145/3763271
  11. [11] Lijuan Liu, Xiangyu Xu, Zhijie Lin, Jiabin Liang, and Shuicheng Yan. Towards Garment Sewing Pattern Reconstruction from a Single Image. ACM Trans. Graph. 42, 6, Article 200 (Dec. 2023), 15 pages. doi:10.1145/3618319
  12. [12] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 21, 140 (2020), 1–67.
  13. [13] Lu Yang, Yicheng Liu, Yanan Li, Xiang Bai, and Hao Lu. Size-Variable Virtual Try-On with Physical Clothes Size. arXiv:2412.06201 (2024).
  14. [14] FitControler: Toward Fit-Aware Virtual Try-On. arXiv:2512.24016.
  15. [15] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. arXiv:1801.03924.
  16. [16] Deep Fashion3D: A Dataset and Benchmark for 3D Garment Reconstruction from Single Images. arXiv:2003.12753.
  17. [17] Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, and Ira Kemelmacher-Shlizerman. M&M VTO: Multi-Garment Virtual Try-On and Editing. CoRR abs/2406.04542 (2024). doi:10.48550/ARXIV.2406.04542
  18. [18] Internal excerpt (dataset sources): "…scraped online images featuring models posing in front of the camera with a studio background. We use Sapiens [Khirodkar et al. 2024] to estimate normal and segmentation maps and Gemini [Google 2025a] to generate prompts describing the garment textures and designs. We enforce a structured prompt format containing two sentences (one per garment piece)…"
  19. [19] Internal excerpt (limitations): "These include a limited ability to represent varying degrees of garment tightness in GarmentCode [Korosteleva and Sorkine-Hornung 2023], which we leave to future work. Another limitation is that garment measurements are often correlated in our FIT data (e.g. larger width correlates positively with larger length). As a result, with our Fit-VTO model, chang…"
  20. [20] Internal excerpt (failure cases): "Failure cases. (a) GarmentCode [Korosteleva and Sorkine-Hornung 2023] simulation does not model varying degrees of tightness well, leading to similar-looking fit for all garment sizes smaller than the body size. (b) As a result of (a), it is difficult to tell the level of tightness in FIT try-on images. (c) Due to the correlations in measurements across s…"
  21. [21] Internal excerpt (VLM filtering prompts): "Does the garment cover the person's chest? If so, return 'pass'. If not, return 'fail'. Does the image contain a bottom garment (skirt, pants, underwear, boxers, leggings, or shorts) that covers the person's groin area? If so, return 'pass'. If not, return 'fail'. …"