pith. machine review for the scientific record. sign in

arxiv: 2604.07409 · v1 · submitted 2026-04-08 · 💻 cs.LG · eess.IV

Recognition: 1 theorem link

· Lean Theorem

GAN-based Domain Adaptation for Image-aware Layout Generation in Advertising Poster Design

Authors on Pith no claims yet

Pith reviewed 2026-05-10 17:52 UTC · model grok-4.3

classification 💻 cs.LG eess.IV
keywords layout generationGANdomain adaptationadvertising postersimage-aware layoutscontent-aware metricsposter designCGL-Dataset
0
0 comments X

The pith

A GAN with pixel-level discriminator adapts to clean product images and generates image-aware layouts for advertising posters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a dataset pairing inpainted advertising posters with clean product images to train models that place graphic elements around image content. It shows that standard training fails because inpainting leaves artifacts, so it builds PDA-GAN, an unsupervised domain-adaptation GAN whose discriminator judges each pixel in early feature maps. This lets the generator learn from the flawed posters yet produce layouts that respect the textures and shapes of untouched input photos. The work also defines three new metrics that score how well layouts interact with image content rather than just measuring box positions. A reader would care because the method turns an imperfect training source into usable output for real-world poster design without needing perfectly paired clean data.

Core claim

PDA-GAN introduces a pixel-level discriminator attached to shallow feature maps that computes per-pixel GAN loss, allowing the layout generator trained on inpainted posters to produce high-quality image-aware layouts when given clean product images; quantitative and qualitative results on the CGL-Dataset show this outperforms prior models.

What carries the argument

The pixel-level discriminator (PD) in PDA-GAN, which applies the adversarial loss to individual pixels of shallow feature maps to close the domain gap between inpainted training posters and clean input images.

If this is right

  • Layouts respect fine visual details such as product shape and texture instead of treating the image as a uniform background.
  • The three new content-aware metrics give a more relevant score for how graphic elements interact with image content than position-only measures.
  • The same unsupervised adaptation step can be added to other conditional layout generators that must train on imperfect or synthetic data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The pixel discriminator may help any image-conditioned generative model when the training images contain localized artifacts.
  • The new metrics could be reused to benchmark layout quality in mobile UI or magazine design tasks.
  • If the inpainting method improves, the remaining domain gap shrinks and the adaptation network may become simpler or unnecessary.

Load-bearing premise

The visual artifacts left by inpainting create a domain gap that unsupervised pixel-level adaptation can close without any paired clean training examples.

What would settle it

Run PDA-GAN and the Gaussian-blur baseline on a held-out set of real clean product images never seen during training; if human raters or the new content-aware metrics show no improvement in layout quality or content alignment, the adaptation claim is false.

Figures

Figures reproduced from arXiv: 2604.07409 by Chenchen Xu, Min Zhou, Tiezheng Ge, Weiwei Xu.

Figure 1
Figure 1. Figure 1: Examples of image-aware graphic layout generation for advertising posters. Our model generates graphic layouts (middle) with multiple elements conditioned on product im￾ages (left). Designers or even automatic rendering programs can use these layouts to render advertising posters (right). The core challenge lies in modeling the relationship between the image content and the layout elements, enabling the ne… view at source ↗
Figure 2
Figure 2. Figure 2: Examples of CGL-Dataset. The left part represents the six components of information contained in each sample of the dataset. In the upper right corner are examples of different product types, and in the lower right corner are examples with different layout positions. siderable challenge. This is primarily because clean product images typically require manual design and post-processing by professional desig… view at source ↗
Figure 3
Figure 3. Figure 3: Domain gap visualization. To illustrate the domain gap, we manually design several posters based on clean product images (target data), and then inpaint the graphic elements to generate the corresponding source data. The visual content in the inpainted areas (highlighted with yellow boxes) appears distorted and blurred compared to the original content (highlighted with red boxes). updating the discriminato… view at source ↗
Figure 4
Figure 4. Figure 4: The architecture of our network. Annotated posters (source domain data) must be inpainted before input to the model. The model has both reconstruction and GAN loss when training with source domain data, while only a GAN loss is used when training with target domain data. Please refer to Sec. IV for the definition of each loss term: LPD, L G PD, and Lrec. During the discriminator or generator pass, both inp… view at source ↗
Figure 5
Figure 5. Figure 5: Qualitative evaluation for different models. Layouts in each column are conditioned on the same image, while those in each row are generated by the same model. This figure provides a qualitative comparison and analysis of different models from three perspectives: background complexity of text elements, subject overlap, and product overlap attention maps. TABLE V: Comprehensive comparison between CGL-GAN an… view at source ↗
Figure 6
Figure 6. Figure 6: More qualitative comparisons between CGL-GAN and PDA-GAN. [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Layouts generated by different models using source and target domain data. Inpainted images (source domain) and clean images (target domain) are both fed into PDA-GAN and CGL-GAN. The results produced by PDA-GAN are relatively consistent, indicating that our method achieves better feature alignment between the two domains. 0.1289, and 0.1263. The corresponding mean values calculated by PDA-GAN are 0.0293, … view at source ↗
Figure 8
Figure 8. Figure 8: Impact of Gaussian blur. Layouts in each row are generated using the same input image, while those in each column correspond to different input samples. “PDA-GAN with Gaussian blur” indicates that the input data is preprocessed with Gaussian blurring. The blue-numbered boxes in the middle show enlarged views of the regions marked with yellow numbers. The left part of the vertical dotted line displays the i… view at source ↗
Figure 9
Figure 9. Figure 9: Generalization ability of PDA-GAN on PKU-Dataset [33]. Gaussian blur tends to generate layout bounding boxes that occlude subject or product regions. Such layouts diminish the visibility of subjects and the clarity of layout elements in advertising posters. These quantitative and qualitative results demonstrate that the loss of image details caused by Gaussian blur degrades the quality of the generated ima… view at source ↗
Figure 10
Figure 10. Figure 10: Language-guided layout generation with varying element counts and types. Each row corresponds to the same background image. The left part shows layouts generated from prompts specifying the number of elements (e.g., “generate three elements”, 2 to 6 from left to right), while the right part shows layouts guided by prompts specifying element types (e.g., “generate text and underlay”). G D P a t c h D P D … view at source ↗
Figure 11
Figure 11. Figure 11: Failure cases. The first to third rows show layouts generated by models with the global discriminator (GD), patch discriminator (PatchD), and our PD module, respectively. ACKNOWLEDGMENTS We sincerely thank the anonymous reviewers for their professional and constructive comments, which have helped us improve the quality and clarity of this paper. Weiwei Xu is partially supported by “Pioneer” and “Leading G… view at source ↗
read the original abstract

Layout plays a crucial role in graphic design and poster generation. Recently, the application of deep learning models for layout generation has gained significant attention. This paper focuses on using a GAN-based model conditioned on images to generate advertising poster graphic layouts, requiring a dataset of paired product images and layouts. To address this task, we introduce the Content-aware Graphic Layout Dataset (CGL-Dataset), consisting of 60,548 paired inpainted posters with annotations and 121,000 clean product images. The inpainting artifacts introduce a domain gap between the inpainted posters and clean images. To bridge this gap, we design two GAN-based models. The first model, CGL-GAN, uses Gaussian blur on the inpainted regions to generate layouts. The second model combines unsupervised domain adaptation by introducing a GAN with a pixel-level discriminator (PD), abbreviated as PDA-GAN, to generate image-aware layouts based on the visual texture of input images. The PD is connected to shallow-level feature maps and computes the GAN loss for each input-image pixel. Additionally, we propose three novel content-aware metrics to assess the model's ability to capture the intricate relationships between graphic elements and image content. Quantitative and qualitative evaluations demonstrate that PDA-GAN achieves state-of-the-art performance and generates high-quality image-aware layouts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces the Content-aware Graphic Layout Dataset (CGL-Dataset) with 60,548 paired inpainted posters and 121,000 clean product images to support image-conditioned layout generation for advertising posters. It proposes CGL-GAN (using Gaussian blur on inpainted regions) and PDA-GAN (an unsupervised domain-adaptation GAN with a pixel-level discriminator attached to shallow feature maps that computes per-pixel GAN loss). Three new content-aware metrics are defined to evaluate how well generated layouts respect image content, and the authors claim that PDA-GAN achieves state-of-the-art quantitative and qualitative results on clean product images.

Significance. If the domain adaptation demonstrably aligns the inpainted training distribution with clean test images and the new metrics are shown to be non-redundant with existing layout metrics, the work would provide a practical advance for automated poster design and a useful public dataset. The explicit handling of the inpainting artifact gap is a relevant engineering contribution in conditional layout generation.

major comments (3)
  1. [Method / PDA-GAN architecture description] The central claim that PDA-GAN produces image-aware layouts on clean inputs rests on the pixel-level discriminator successfully bridging the inpainting domain gap. No domain-discrepancy metrics (e.g., MMD, Fréchet distance on feature distributions, or discriminator accuracy on held-out clean vs. inpainted pairs), t-SNE visualizations, or ablation removing the PD are reported. Without such evidence the performance gains on clean images could be attributable to the base conditional generator rather than adaptation.
  2. [Evaluation / Metrics section] The three proposed content-aware metrics are introduced to quantify relationships between graphic elements and image content, yet the manuscript provides neither their exact mathematical definitions nor a correlation analysis showing they capture information orthogonal to standard layout metrics (e.g., IoU, overlap, or alignment scores). This weakens the assertion that PDA-GAN is superior specifically in content awareness.
  3. [Experiments / Quantitative comparison] Table of quantitative results (presumably in §5) reports SOTA numbers for PDA-GAN, but the baselines listed do not include any other domain-adaptation or inpainting-robust layout models. Consequently it is impossible to isolate whether the reported improvement stems from the pixel discriminator or from other architectural choices.
minor comments (2)
  1. [Abstract] The abstract introduces the abbreviation PDA-GAN without first spelling out “Pixel-level Discriminator Adaptation GAN.”
  2. [Method] Notation for the per-pixel GAN loss (how the discriminator output is aggregated over pixels) is not defined in the provided description of the PD attachment to shallow feature maps.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. We address each major point below, providing clarifications and committing to revisions where appropriate to strengthen the evidence for our domain adaptation approach and metrics.

read point-by-point responses
  1. Referee: The central claim that PDA-GAN produces image-aware layouts on clean inputs rests on the pixel-level discriminator successfully bridging the inpainting domain gap. No domain-discrepancy metrics (e.g., MMD, Fréchet distance on feature distributions, or discriminator accuracy on held-out clean vs. inpainted pairs), t-SNE visualizations, or ablation removing the PD are reported. Without such evidence the performance gains on clean images could be attributable to the base conditional generator rather than adaptation.

    Authors: We appreciate this observation. The superior quantitative and qualitative results of PDA-GAN over CGL-GAN on clean images provide supporting evidence that the pixel discriminator contributes to bridging the domain gap. To directly address the concern and strengthen the claim, we will add an ablation study comparing PDA-GAN with and without the pixel discriminator, along with domain discrepancy metrics such as MMD on feature distributions and t-SNE visualizations of inpainted vs. clean image features in the revised manuscript. revision: yes

  2. Referee: The three proposed content-aware metrics are introduced to quantify relationships between graphic elements and image content, yet the manuscript provides neither their exact mathematical definitions nor a correlation analysis showing they capture information orthogonal to standard layout metrics (e.g., IoU, overlap, or alignment scores). This weakens the assertion that PDA-GAN is superior specifically in content awareness.

    Authors: The exact mathematical definitions of the three content-aware metrics are detailed in Section 4.3 of the manuscript. We agree that an explicit correlation analysis would better demonstrate their value. In the revised version, we will add a correlation study (including Pearson or Spearman coefficients) between our metrics and standard layout metrics such as IoU, overlap, and alignment scores to show they capture orthogonal information related to content awareness. revision: yes

  3. Referee: Table of quantitative results (presumably in §5) reports SOTA numbers for PDA-GAN, but the baselines listed do not include any other domain-adaptation or inpainting-robust layout models. Consequently it is impossible to isolate whether the reported improvement stems from the pixel discriminator or from other architectural choices.

    Authors: We note that, to the best of our knowledge at submission time, no prior domain-adaptation methods existed specifically for image-conditioned layout generation on inpainted advertising posters. The comparison between CGL-GAN and PDA-GAN is designed to isolate the effect of the pixel discriminator. We will expand the related work and discussion sections to cover relevant domain adaptation techniques from other vision tasks and clarify the rationale for our baseline selection in the revision. revision: partial

Circularity Check

0 steps flagged

No significant circularity in empirical GAN training and evaluation chain

full rationale

The paper presents a standard empirical pipeline: creation of a paired inpainted/clean image dataset, training of CGL-GAN and PDA-GAN variants (with pixel-level discriminator attached to shallow features for unsupervised domain adaptation), and evaluation via three proposed content-aware metrics plus quantitative/qualitative results. No derivation step reduces a claimed prediction or result to its inputs by construction, no fitted parameters are relabeled as independent predictions, and no load-bearing self-citations or uniqueness theorems are invoked to force the architecture or metrics. The central claims rest on experimental outcomes rather than tautological redefinitions.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Abstract-only view limits visibility; main unstated elements are standard GAN training assumptions and the effectiveness of pixel discriminator for domain shift in this domain.

free parameters (1)
  • GAN training hyperparameters
    Learning rates, loss weights, and architecture details for CGL-GAN and PDA-GAN are not specified.
axioms (1)
  • domain assumption GANs with pixel-level discriminators can adapt layouts to image visual texture
    Core modeling choice for bridging inpainting artifacts to clean images.

pith-pipeline@v0.9.0 · 5532 in / 1190 out tokens · 34813 ms · 2026-05-10T17:52:36.592166+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

62 extracted references · 6 canonical work pages

  1. [1]

    Generative adversarial nets,

    I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y . Bengio, “Generative adversarial nets,” inAnnual Conference on Neural Information Processing Systems, 2014, pp. 2672–2680

  2. [2]

    Layoutgan: Generating graphic layouts with wireframe discriminators

    J. Li, J. Yang, A. Hertzmann, J. Zhang, and T. Xu, “Layoutgan: Generating graphic layouts with wireframe discriminators.” ICLR, 2019

  3. [3]

    Attribute- conditioned layout GAN for automatic graphic design,

    J. Li, J. Yang, J. Zhang, C. Liu, C. Wang, and T. Xu, “Attribute- conditioned layout GAN for automatic graphic design,”IEEE Trans. Vis. Comput. Graph., vol. 27, no. 10, pp. 4039–4048, 2021

  4. [4]

    Content-aware generative modeling of graphic design layouts,

    X. Zheng, X. Qiao, Y . Cao, and R. W. H. Lau, “Content-aware generative modeling of graphic design layouts,”ACM Trans. Graph., vol. 38, no. 4, pp. 133:1–133:15, 2019

  5. [5]

    End-to-end object detection with transformers,

    N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” inECCV (1), ser. Lecture Notes in Computer Science, vol. 12346. Springer, 2020, pp. 213–229

  6. [6]

    Resolution-robust large mask inpainting with fourier convolutions,

    R. Suvorov, E. Logacheva, A. Mashikhin, A. Remizova, A. Ashukha, A. Silvestrov, N. Kong, H. Goka, K. Park, and V . Lempitsky, “Resolution-robust large mask inpainting with fourier convolutions,” inIEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022, Waikoloa, HI, USA, January 3-8, 2022. IEEE, 2022, pp. 3172–3182

  7. [7]

    Unsupervised domain adaptation for zero-shot learning,

    E. Kodirov, T. Xiang, Z. Fu, and S. Gong, “Unsupervised domain adaptation for zero-shot learning,” inICCV. IEEE Computer Society, 2015, pp. 2452–2460

  8. [8]

    Simultaneous deep transfer across domains and tasks,

    E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko, “Simultaneous deep transfer across domains and tasks,” inICCV. IEEE Computer Society, 2015, pp. 4068–4076

  9. [9]

    Adversarial multiple source domain adaptation,

    H. Zhao, S. Zhang, G. Wu, J. M. F. Moura, J. P. Costeira, and G. J. Gordon, “Adversarial multiple source domain adaptation,” inNeurIPS, 2018, pp. 8568–8579

  10. [10]

    A brief review of domain adaptation,

    A. Farahani, S. V oghoei, K. Rasheed, and H. R. Arabnia, “A brief review of domain adaptation,”CoRR, vol. abs/2010.03978, 2020

  11. [11]

    Cross-modal learning for domain adaptation in 3d semantic segmentation,

    M. Jaritz, T. Vu, R. de Charette, ´E. Wirbel, and P. P ´erez, “Cross-modal learning for domain adaptation in 3d semantic segmentation,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 2, pp. 1533–1544, 2023

  12. [12]

    Image-to-image translation with conditional adversarial networks,

    P. Isola, J. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” inCVPR. IEEE Computer Society, 2017, pp. 5967–5976

  13. [13]

    Layoutgan: Generating graphic layouts with wireframe discriminators,

    J. Li, J. Yang, A. Hertzmann, J. Zhang, and T. Xu, “Layoutgan: Generating graphic layouts with wireframe discriminators,”CoRR, vol. abs/1901.06767, 2019

  14. [14]

    Layoutvae: Stochastic scene layout generation from a label set

    A. A. Jyothi, T. Durand, J. He, L. Sigal, and G. Mori, “Layoutvae: Stochastic scene layout generation from a label set.” ICCV , 2019, pp. 9894–9903

  15. [15]

    Variational transformer networks for layout generation,

    D. M. Arroyo, J. Postels, and F. Tombari, “Variational transformer networks for layout generation,” inCVPR, 2021, pp. 13 642–13 652

  16. [16]

    Gans trained by a two time-scale update rule converge to a local nash equilibrium,

    M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,”Advances in neural information processing systems, vol. 30, 2017

  17. [17]

    Retrieval-augmented layout transformer for content-aware layout gen- eration,

    D. Horita, N. Inoue, K. Kikuchi, K. Yamaguchi, and K. Aizawa, “Retrieval-augmented layout transformer for content-aware layout gen- eration,” inCVPR, 2024, pp. 67–76

  18. [18]

    Composition- aware graphic layout GAN for visual-textual presentation designs,

    M. Zhou, C. Xu, Y . Ma, T. Ge, Y . Jiang, and W. Xu, “Composition- aware graphic layout GAN for visual-textual presentation designs,” in IJCAI. ijcai.org, 2022, pp. 4995–5001

  19. [19]

    Unsupervised domain adaption with pixel-level discriminator for image-aware layout genera- tion,

    C. Xu, M. Zhou, T. Ge, Y . Jiang, and W. Xu, “Unsupervised domain adaption with pixel-level discriminator for image-aware layout genera- tion,” inCVPR. IEEE, 2023, pp. 10 114–10 123

  20. [20]

    Adaptive grid-based document layout,

    C. E. Jacobs, W. Li, E. Schrier, D. Bargeron, and D. Salesin, “Adaptive grid-based document layout,”ACM Trans. Graph., vol. 22, no. 3, pp. 838–847, 2003

  21. [21]

    Stochastic language models for style-directed layout analysis of document images,

    T. Kanungo and S. Mao, “Stochastic language models for style-directed layout analysis of document images,”IEEE Trans. Image Process., vol. 12, no. 5, pp. 583–596, 2003

  22. [22]

    Bricolage: example-based retargeting for web design,

    R. Kumar, J. O. Talton, S. Ahmad, and S. R. Klemmer, “Bricolage: example-based retargeting for web design,” inProceedings of the Inter- national Conference on Human Factors in Computing Systems, 2011, pp. 2197–2206

  23. [23]

    Automatic stylistic manga layout,

    Y . Cao, A. B. Chan, and R. W. H. Lau, “Automatic stylistic manga layout,”ACM Trans. Graph., vol. 31, no. 6, pp. 141:1–141:10, 2012

  24. [24]

    Learning layouts for single-pagegraphic designs,

    P. O’Donovan, A. Agarwala, and A. Hertzmann, “Learning layouts for single-pagegraphic designs,”IEEE Trans. Vis. Comput. Graph., vol. 20, no. 8, pp. 1200–1213, 2014

  25. [25]

    Influence of color-to-gray conversion on the performance of document image binarization: Toward a novel optimization problem,

    R. Hedjam, H. Z. Nafchi, M. Kalacska, and M. Cheriet, “Influence of color-to-gray conversion on the performance of document image binarization: Toward a novel optimization problem,”IEEE Trans. Image Process., vol. 24, no. 11, pp. 3637–3651, 2015

  26. [26]

    Layoutdm: Discrete diffusion model for controllable layout generation,

    N. Inoue, K. Kikuchi, E. Simo-Serra, M. Otani, and K. Yamaguchi, “Layoutdm: Discrete diffusion model for controllable layout generation,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 10 167–10 176

  27. [27]

    Layoutdiffusion: Improving graphic layout generation by discrete diffusion probabilistic models,

    J. Zhang, J. Guo, S. Sun, J.-G. Lou, and D. Zhang, “Layoutdiffusion: Improving graphic layout generation by discrete diffusion probabilistic models,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 7226–7236

  28. [28]

    Neural design network: Graphic layout generation with constraints

    H. Lee, L. Jiang, I. Essa, P. B. Le, H. Gong, M. Yang, and W. Yang, “Neural design network: Graphic layout generation with constraints.” ECCV , 2020, pp. 491–506

  29. [29]

    Layouttransformer: Scene layout generation with conceptual and spatial diversity

    C. Yang, W. Fan, F. Yang, and Y . F. Wang, “Layouttransformer: Scene layout generation with conceptual and spatial diversity.” CVPR, 2021, pp. 3732–3741

  30. [30]

    Constrained graphic layout generation via latent optimization

    K. Kikuchi, E. Simo-Serra, M. Otani, and K. Yamaguchi, “Constrained graphic layout generation via latent optimization.” ACM Multimedia Conference, 2021, pp. 88–96

  31. [31]

    Layouttransformer: Layout generation and completion with self-attention,

    K. Gupta, J. Lazarow, A. Achille, L. Davis, V . Mahadevan, and A. Shri- vastava, “Layouttransformer: Layout generation and completion with self-attention,” inICCV. IEEE, 2021, pp. 984–994

  32. [32]

    Layout- prompter: awaken the design ability of large language models,

    J. Lin, J. Guo, S. Sun, Z. Yang, J.-G. Lou, and D. Zhang, “Layout- prompter: awaken the design ability of large language models,”Advances in Neural Information Processing Systems, vol. 36, pp. 43 852–43 879, 2023

  33. [33]

    Posterlayout: A new benchmark and approach for content-aware visual-textual presentation layout,

    H. Hsu, X. He, Y . Peng, H. Kong, and Q. Zhang, “Posterlayout: A new benchmark and approach for content-aware visual-textual presentation layout,” inCVPR. IEEE, 2023, pp. 6018–6026

  34. [34]

    Unsupervised domain adaptation of deep object detectors,

    D. Majumdar and V . P. Namboodiri, “Unsupervised domain adaptation of deep object detectors,” inESANN, 2018

  35. [35]

    Domain adaptation for object detection using SE adaptors and center loss,

    S. Nagesh, S. Rajesh, A. Baig, and S. Srinivasan, “Domain adaptation for object detection using SE adaptors and center loss,”CoRR, vol. abs/2205.12923, 2022

  36. [36]

    Domain adaptation for object recognition using subspace sampling demons,

    Y . Zhang and B. D. Davison, “Domain adaptation for object recognition using subspace sampling demons,”Multim. Tools Appl., vol. 80, no. 15, pp. 23 255–23 274, 2021

  37. [37]

    Spectral unsupervised domain adaptation for visual recognition,

    J. Zhang, J. Huang, Z. Tian, and S. Lu, “Spectral unsupervised domain adaptation for visual recognition,” inCVPR. IEEE, 2022, pp. 9819– 9830

  38. [38]

    Unsupervised pixel-level domain adaptation with generative adversarial networks,

    K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan, “Unsupervised pixel-level domain adaptation with generative adversarial networks,” inCVPR. IEEE Computer Society, 2017, pp. 95–104

  39. [39]

    Multi-adversarial domain adaptation,

    Z. Pei, Z. Cao, M. Long, and J. Wang, “Multi-adversarial domain adaptation,” inAAAI. AAAI Press, 2018, pp. 3934–3941

  40. [40]

    Learning to model relationships for zero-shot video classification,

    J. Gao, T. Zhang, and C. Xu, “Learning to model relationships for zero-shot video classification,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 10, pp. 3476–3491, 2021. [Online]. Available: https://doi.org/10.1109/TPAMI.2020.2985708

  41. [41]

    Multi-source unsupervised domain adaptation via pseudo target domain,

    C. Ren, Y . H. Liu, X. Zhang, and K. Huang, “Multi-source unsupervised domain adaptation via pseudo target domain,”IEEE Trans. Image Process., vol. 31, pp. 2122–2135, 2022

  42. [42]

    Madav2: Advanced multi-anchor based active domain adaptation segmentation,

    M. Ning, D. Lu, Y . Xie, D. Chen, D. Wei, Y . Zheng, Y . Tian, S. Yan, and L. Yuan, “Madav2: Advanced multi-anchor based active domain adaptation segmentation,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 11, pp. 13 553–13 566, 2023

  43. [43]

    Domain-adversarial training of neural networks,

    Y . Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V . S. Lempitsky, “Domain-adversarial training of neural networks,” inDomain Adaptation in Computer Vision Applications, ser. Advances in Computer Vision and Pattern Recognition, G. Csurka, Ed. Springer, 2017, pp. 189–209. [Online]. Available: https://doi.org/10....

  44. [44]

    Geometry aligned variational transformer for image-conditioned layout generation,

    Y . Cao, Y . Ma, M. Zhou, C. Liu, H. Xie, T. Ge, and Y . Jiang, “Geometry aligned variational transformer for image-conditioned layout generation,” inACM Multimedia. ACM, 2022, pp. 1561–1571

  45. [45]

    Microsoft COCO: common objects in context,

    T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, P. Doll ´ar, and C. L. Zitnick, “Microsoft COCO: common objects in context,” inECCV (5), ser. Lecture Notes in Computer Science, vol

  46. [46]

    Springer, 2014, pp. 740–755

  47. [47]

    Rico: A mobile app dataset for building data-driven design applications,

    B. Deka, Z. Huang, C. Franzen, J. Hibschman, D. Afergan, Y . Li, J. Nichols, and R. Kumar, “Rico: A mobile app dataset for building data-driven design applications,” inUIST. ACM, 2017, pp. 845–854. 18

  48. [48]

    Learning design semantics for mobile apps,

    T. F. Liu, M. Craft, J. Situ, E. Yumer, R. Mech, and R. Kumar, “Learning design semantics for mobile apps,” inUIST. ACM, 2018, pp. 569–579

  49. [49]

    Publaynet: Largest dataset ever for document layout analysis,

    X. Zhong, J. Tang, and A. Jimeno-Yepes, “Publaynet: Largest dataset ever for document layout analysis,” inICDAR, 2019, pp. 1015–1022

  50. [50]

    Progressive feature polishing network for salient object detection,

    B. Wang, Q. Chen, M. Zhou, Z. Zhang, X. Jin, and K. Gai, “Progressive feature polishing network for salient object detection,” inAAAI. AAAI Press, 2020, pp. 12 128–12 135

  51. [51]

    Rethinking the inception architecture for computer vision,

    C. Szegedy, V . Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” inCVPR. IEEE Computer Society, 2016, pp. 2818–2826

  52. [52]

    NIPS 2016 Tutorial: Generative Adversarial Networks

    I. J. Goodfellow, “NIPS 2016 tutorial: Generative adversarial networks,” CoRR, vol. abs/1701.00160, 2017

  53. [53]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” inCVPR. IEEE Computer Society, 2016, pp. 770–778

  54. [54]

    Feature pyramid networks for object detection,

    T. Lin, P. Doll ´ar, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, “Feature pyramid networks for object detection,” inCVPR. IEEE Computer Society, 2017, pp. 936–944

  55. [55]

    Attention is all you need,

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” inNIPS, 2017, pp. 5998–6008

  56. [56]

    Feature pyramid networks for object detection,

    T. Lin, P. Doll ´ar, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, “Feature pyramid networks for object detection,” inCVPR, 2017, pp. 936–944

  57. [57]

    Very deep convolutional networks for large-scale image recognition,

    K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” inICLR, 2015

  58. [58]

    Learning transferable visual models from natural language supervi- sion,

    A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervi- sion,” inICML, vol. 139, 2021, pp. 8748–8763

  59. [59]

    Generic attention-model explainability for interpreting bi-modal and encoder-decoder transformers,

    H. Chefer, S. Gur, and L. Wolf, “Generic attention-model explainability for interpreting bi-modal and encoder-decoder transformers,” inICCV. IEEE, 2021, pp. 387–396

  60. [60]

    Layoutgan: Syn- thesizing graphic layouts with vector-wireframe adversarial networks,

    J. Li, J. Yang, A. Hertzmann, J. Zhang, and T. Xu, “Layoutgan: Syn- thesizing graphic layouts with vector-wireframe adversarial networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 7, pp. 2388–2399, 2021

  61. [61]

    Adam: A method for stochastic optimization,

    D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” inICLR, San Diego, CA, USA, May 7-9, 2015, Y . Bengio and Y . LeCun, Eds., 2015

  62. [62]

    Language mod- els are few-shot learners,

    T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askellet al., “Language mod- els are few-shot learners,”Advances in neural information processing systems, vol. 33, pp. 1877–1901, 2020. Chenchen Xureceived the B.Sc. and M.Sc. degrees from Anhui Normal University, Wuhu, China, in 2016 and 2020, res...