arxiv: 2604.04859 · v1 · submitted 2026-04-06 · 💻 cs.CV

Recognition: no theorem link

Unified Vector Floorplan Generation via Markup Representation

Kaede Shiohara , Toshihiko Yamasaki

Authors on Pith no claims yet

Pith reviewed 2026-05-10 19:56 UTC · model grok-4.3

classification 💻 cs.CV

keywords floorplan generationtransformer modelmarkup languagevector representationconditional generationRPLAN datasetgenerative modelunified architecture

0 comments

The pith

A markup language encodes floorplans as token sequences so one transformer model can handle every conditional generation task.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Floorplan Markup Language (FML), a structured grammar that represents all floorplan information in a single sequence format. This representation converts the generation problem into next-token prediction, allowing a single transformer-based model called FMLM to accept inputs such as site boundaries, adjacency graphs, or partial layouts and produce complete vector floorplans. Experiments on the RPLAN dataset show that this unified model exceeds the performance of prior methods built for individual tasks. A sympathetic reader would care because it removes the need for separate specialized systems while preserving functionality and geometric precision.

Core claim

Encoding floorplan data in FML casts generation as autoregressive token prediction, so a single transformer produces high-fidelity, functional vector floorplans from heterogeneous conditions without task-specific retraining or architectures.

What carries the argument

Floorplan Markup Language (FML), a grammar that serializes rooms, walls, doors, and constraints into a token sequence for transformer prediction.

If this is right

One model replaces multiple task-specific generators while maintaining or improving output quality on each task.
Floorplans remain vector-based and editable rather than raster approximations.
New conditional inputs can be incorporated by extending the grammar without redesigning the model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same token-sequence approach could be tested on 3D building layouts or non-residential spaces if an extended grammar is defined.
Training cost drops because a single model serves all conditions instead of one per task.
Interactive design tools become simpler since users can supply mixed constraints in the same format.

Load-bearing premise

The FML grammar can represent every spatial relationship, functional constraint, and geometric detail of any valid floorplan without ambiguity or information loss.

What would settle it

Any valid residential floorplan that cannot be losslessly encoded in FML, or any input condition for which the trained model produces invalid or non-functional outputs.

Figures

Figures reproduced from arXiv: 2604.04859 by Kaede Shiohara, Toshihiko Yamasaki.

**Figure 1.** Figure 1: Our Floorplan Markup Language Model directly generates vector floorplans under the wide range of situations. [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗

**Figure 2.** Figure 2: Floorplan Markup Language (FML). We represent floorplans, boundaries, and graphs in a markup manner, which unifies the various floorplan generation tasks into a single task of FML sequence generation. in advance specifying the number of rooms and the number of vertices each room should have [11, 26], or to require additional networks to infer the underlying structures including edge extraction and room t… view at source ↗

**Figure 3.** Figure 3: Overview of FMLM. Similarly to LLMs, FMLM is based on a simple transformer model that is trained by next token prediction and performs autoregressive inference. spond to the beginning and ending of condition information, respectively. FML can use <boundary> and <graph> to start to describe the information about boundary and graph condition, respectively. FML always describes in the order of boundary → grap… view at source ↗

**Figure 4.** Figure 4: Floorplan completion and editing. (a) Our model complements incomplete floorplans by just starting with an incomplete sequence. (b) With this capability, our model can be incorporated into interactive editing with users. a length of L and f be the transformer that takes {xi} l i=1 and outputs embedding f({xi} l i=1) ∈ R C in the teacherforcing manner to predict the next token xl+1. The embedding f({xi} … view at source ↗

**Figure 5.** Figure 5: Qualitative comparison with Graph2Plan and HouseDiffusion. Our model generates more consistent and realistic examples from various types of conditions, (a) boundary, (b) graph, (c) boundary and graph, than task-specific models such as Graph2Plan and HouseDiffusion. Note that Graph2Plan generates only a single floorplan for each condition. Best viewed in zoom. distribution distance between real floorplan im… view at source ↗

**Figure 6.** Figure 6: Floorplan completion. Condition Setting FID (↓) GED (↓) IoU (↑) B & G w/o Permutation 24.36 2.35 95.82 Ours 14.17 1.24 97.59 [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

**Figure 8.** Figure 8: Failure cases. Dataset Sample FMLM (Ours) FMLM (Ours) 8 Rooms 7 Rooms Dataset Sample FMLM (Ours) Dataset Sample FMLM (Ours) 6 Rooms 5 Rooms Dataset Sample [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗

**Figure 9.** Figure 9: Number-conditional generation. 12 [PITH_FULL_IMAGE:figures/full_fig_p012_9.png] view at source ↗

**Figure 10.** Figure 10: Additional generated examples. 13 [PITH_FULL_IMAGE:figures/full_fig_p013_10.png] view at source ↗

read the original abstract

Automatic residential floorplan generation has long been a central challenge bridging architecture and computer graphics, aiming to make spatial design more efficient and accessible. While early methods based on constraint satisfaction or combinatorial optimization ensure feasibility, they lack diversity and flexibility. Recent generative models achieve promising results but struggle to generalize across heterogeneous conditional tasks, such as generation from site boundaries, room adjacency graphs, or partial layouts, due to their suboptimal representations. To address this gap, we introduce Floorplan Markup Language (FML), a general representation that encodes floorplan information within a single structured grammar, which casts the entire floorplan generation problem into a next token prediction task. Leveraging FML, we develop a transformer-based generative model, FMLM, capable of producing high-fidelity and functional floorplans under diverse conditions. Comprehensive experiments on the RPLAN dataset demonstrate that FMLM, despite being a single model, surpasses the previous task-specific state-of-the-art methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

FML turns floorplan tasks into one sequence model but the paper needs to show the markup actually preserves geometry and constraints without hidden losses.

read the letter

The punchline is that this paper casts several conditional floorplan problems as next-token prediction over a custom markup language called FML, then trains a single transformer (FMLM) that reportedly beats prior task-specific models on RPLAN. That unification step is the real move; most earlier work built separate pipelines for boundary inputs, adjacency graphs, or partial layouts, so one model that ingests any of them is practically useful if it holds up.

Referee Report

2 major / 2 minor

Summary. The paper introduces Floorplan Markup Language (FML), a structured grammar that encodes floorplan information (including geometry, topology, and conditioning signals such as site boundaries or adjacency graphs) as a sequence, thereby reducing diverse floorplan generation tasks to autoregressive next-token prediction. It then presents FMLM, a transformer model trained on this representation, and claims that a single FMLM instance produces high-fidelity, functional floorplans under heterogeneous conditions and outperforms prior task-specific state-of-the-art methods on the RPLAN dataset.

Significance. If the FML representation is shown to be lossless and unambiguous and the quantitative superiority is demonstrated with proper controls, the work would offer a genuinely unifying framework that replaces multiple specialized pipelines with one sequence model. This could simplify research and deployment in architectural generative modeling.

major comments (2)

[Abstract and §5] Abstract and §5 (Experiments): The central claim that FMLM 'surpasses the previous task-specific state-of-the-art methods' is unsupported by any reported metrics, baselines, ablation studies, or error analysis. Without these data the performance assertion cannot be evaluated and may rest on post-hoc tuning or dataset-specific advantages.
[§3] §3 (Floorplan Markup Language): The claim that FML provides a lossless, unambiguous encoding of all spatial relationships, functional constraints, and geometric details is load-bearing for the unification argument, yet the manuscript supplies no reconstruction-fidelity metrics, failure-case analysis on non-Manhattan geometries, or verification that every valid RPLAN floorplan can be round-tripped without discretization or ordering ambiguity.

minor comments (2)

[§4] Clarify the exact token vocabulary size, maximum sequence length, and any special tokens used for conditioning signals; these details are needed to reproduce the next-token prediction setup.
[§3] Ensure that all conditioning inputs (site boundaries, graphs, partial layouts) are illustrated with concrete FML examples in the same figure or table for direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that additional quantitative evidence is needed to support our claims and will revise the manuscript to include the requested metrics, baselines, and verification results.

read point-by-point responses

Referee: [Abstract and §5] Abstract and §5 (Experiments): The central claim that FMLM 'surpasses the previous task-specific state-of-the-art methods' is unsupported by any reported metrics, baselines, ablation studies, or error analysis. Without these data the performance assertion cannot be evaluated and may rest on post-hoc tuning or dataset-specific advantages.

Authors: We acknowledge that the current manuscript does not report sufficient quantitative metrics, baselines, ablation studies, or error analysis to fully substantiate the performance claim. In the revised version we will add comprehensive tables comparing FMLM against the cited task-specific methods on the RPLAN dataset, including standard metrics, ablation studies on conditioning signals, and error analysis to demonstrate that the gains are not due to post-hoc tuning. revision: yes
Referee: [§3] §3 (Floorplan Markup Language): The claim that FML provides a lossless, unambiguous encoding of all spatial relationships, functional constraints, and geometric details is load-bearing for the unification argument, yet the manuscript supplies no reconstruction-fidelity metrics, failure-case analysis on non-Manhattan geometries, or verification that every valid RPLAN floorplan can be round-tripped without discretization or ordering ambiguity.

Authors: We agree that explicit verification of FML's lossless and unambiguous properties is required. The revised manuscript will include reconstruction-fidelity metrics (e.g., exact geometry and topology recovery rates), failure-case analysis covering non-Manhattan layouts, and round-trip experiments confirming that every valid RPLAN floorplan can be serialized and deserialized without discretization or ordering ambiguity. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical proposal of representation and model

full rationale

The paper introduces FML as an external structured grammar that encodes floorplan data, then trains a transformer (FMLM) via next-token prediction on RPLAN data to generate outputs under varied conditions. No step derives a result from parameters fitted to the target metric, renames a known result, or reduces a central claim to a self-citation chain or self-definitional loop. The unification and performance claims rest on experimental comparison rather than construction from the inputs themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are described. The FML representation itself functions as a new encoding scheme whose completeness is an unstated assumption.

pith-pipeline@v0.9.0 · 5455 in / 1139 out tokens · 39618 ms · 2026-05-10T19:56:01.376072+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 5 canonical work pages · 4 internal anchors

[1]

An exact graph edit distance algorithm for solving pattern recognition problems

Zeina Abu-Aisheh, Romain Raveaux, Jean-Yves Ramel, and Patrick Martineau. An exact graph edit distance algorithm for solving pattern recognition problems. InICPRAM, 2015. 2, 6

2015
[2]

Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization.arXiv:1607.06450, 2016. 4

work page internal anchor Pith review Pith/arXiv arXiv 2016
[3]

Lan- guage models are few-shot learners.NeurIPS, 2020

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Sub- biah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakan- tan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Lan- guage models are few-shot learners.NeurIPS, 2020. 3

2020
[4]

Taming transformers for high-resolution image synthesis

Patrick Esser, Robin Rombach, and Bjorn Ommer. Taming transformers for high-resolution image synthesis. InCVPR,
[5]

Generative adversarial nets

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. InNeurIPS,
[6]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Ab- hinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv:2407.21783, 2024. 3, 4

work page internal anchor Pith review Pith/arXiv arXiv 2024
[7]

arXiv preprint arXiv:1308.0850 (2013) 4, 5

Alex Graves. Generating sequences with recurrent neural networks.arXiv:1308.0850, 2013. 3

work page arXiv 2013
[8]

Gans trained by a two time-scale update rule converge to a local nash equilib- rium.NeurIPS, 2017

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilib- rium.NeurIPS, 2017. 2, 5

2017
[9]

Denoising diffu- sion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffu- sion probabilistic models. InNeurIPS, 2020. 2

2020
[10]

Long short-term memory.Neural Computation, 1997

Sepp Hochreiter and J ¨urgen Schmidhuber. Long short-term memory.Neural Computation, 1997. 3

1997
[11]

Cons2plan: Vector floorplan genera- tion from various conditions via a learning framework based on conditional diffusion models

Shibo Hong, Xuhong Zhang, Tianyu Du, Sheng Cheng, Xun Wang, and Jianwei Yin. Cons2plan: Vector floorplan genera- tion from various conditions via a learning framework based on conditional diffusion models. InMM, 2024. 2, 3, 5

2024
[12]

Floorplan restoration by structure hallucinating transformer cascades

Sepidehsadat Hosseini and Yasutaka Furukawa. Floorplan restoration by structure hallucinating transformer cascades. InBMVC, 2023. 12

2023
[13]

Puzzlefusion: Unleash- ing the power of diffusion models for spatial puzzle solving

Sepidehsadat Hosseini, Mohammad Amin Shabani, Saghar Irandoust, and Yasutaka Furukawa. Puzzlefusion: Unleash- ing the power of diffusion models for spatial puzzle solving. InNeurIPS, 2023. 12

2023
[14]

Graph2plan: Learning floorplan generation from layout graphs.TOG, 2020

Ruizhen Hu, Zeyu Huang, Yuhan Tang, Oliver Van Kaick, Hao Zhang, and Hui Huang. Graph2plan: Learning floorplan generation from layout graphs.TOG, 2020. 2, 7

2020
[15]

Gsdiff: Synthesizing vector floorplans via geometry-enhanced structural graph generation

Sizhe Hu, Wenming Wu, Yuntao Wang, Benzhu Xu, and Liping Zheng. Gsdiff: Synthesizing vector floorplans via geometry-enhanced structural graph generation. InAAAI,
[16]

Adam: A method for stochastic optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InICLR, 2015. 7

2015
[17]

Auto-Encoding Variational Bayes

Diederik P Kingma and Max Welling. Auto-encoding varia- tional bayes.arXiv:1312.6114, 2013. 2

work page internal anchor Pith review Pith/arXiv arXiv 2013
[18]

Floor plan gen- eration through a mixed constraint programming-genetic op- timization approach.Automation in Construction, 2021

Graziella Laignel, Nicolas Pozin, Xavier Geffrier, Loukas Delevaux, Florian Brun, and Bastien Dolla. Floor plan gen- eration through a mixed constraint programming-genetic op- timization approach.Automation in Construction, 2021. 2

2021
[19]

Tell2design: A dataset for language-guided floor plan generation

Sicong Leng, Yang Zhou, Mohammed Haroon Dupty, Wee Sun Lee, Sam Joyce, and Wei Lu. Tell2design: A dataset for language-guided floor plan generation. InACL,
[20]

Han Liu, Yong-Liang Yang, Sawsan Alhalawani, and Niloy J. Mitra. Constraint-aware interior layout exploration for pre-cast concrete-based buildings.Vis. Comput., 2013. 2

2013
[21]

Floorplangan: Vector resi- dential floorplan adversarial generation.Automation in Con- struction, 2022

Ziniu Luo and Weixin Huang. Floorplangan: Vector resi- dential floorplan adversarial generation.Automation in Con- struction, 2022. 2

2022
[22]

House-gan: Relational gener- ative adversarial networks for graph-constrained house lay- out generation

Nelson Nauata, Kai-Hung Chang, Chin-Yi Cheng, Greg Mori, and Yasutaka Furukawa. House-gan: Relational gener- ative adversarial networks for graph-constrained house lay- out generation. InECCV, 2020. 2, 3

2020
[23]

House- gan++: Generative adversarial layout refinement network to- wards intelligent computational agent for professional archi- tects

Nelson Nauata, Sepidehsadat Hosseini, Kai-Hung Chang, Hang Chu, Chin-Yi Cheng, and Yasutaka Furukawa. House- gan++: Generative adversarial layout refinement network to- wards intelligent computational agent for professional archi- tects. InCVPR, 2021. 2, 5, 6, 7

2021
[24]

Pytorch: An imperative style, high-performance deep learning library

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zem- ing Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. NeurIPS, 2019. 7

2019
[25]

Ex- treme structure from motion for indoor panoramas without visual overlaps

Mohammad Amin Shabani, Weilian Song, Makoto Odamaki, Hirochika Fujiki, and Yasutaka Furukawa. Ex- treme structure from motion for indoor panoramas without visual overlaps. InICCV, 2021. 12

2021
[26]

Housediffusion: Vector floorplan genera- tion via a diffusion model with discrete and continuous de- noising

Mohammad Amin Shabani, Sepidehsadat Hosseini, and Ya- sutaka Furukawa. Housediffusion: Vector floorplan genera- tion via a diffusion model with discrete and continuous de- noising. InCVPR, 2023. 2, 3, 5, 7

2023
[27]

Sequence to sequence learning with neural networks.NeurIPS, 2014

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks.NeurIPS, 2014. 3

2014
[28]

MAGI-1: Autoregressive Video Generation at Scale

Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, WQ Zhang, Weifeng Luo, et al. Magi-1: Autoregressive video genera- tion at scale.arXiv:2505.13211, 2025. 3

work page internal anchor Pith review arXiv 2025
[29]

Visual autoregressive modeling: Scalable image generation via next-scale prediction.NeurIPS, 2024

Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, and Li- wei Wang. Visual autoregressive modeling: Scalable image generation via next-scale prediction.NeurIPS, 2024. 3

2024
[30]

Conditional image gen- eration with pixelcnn decoders.NeurIPS, 2016

Aaron Van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. Conditional image gen- eration with pixelcnn decoders.NeurIPS, 2016. 3

2016
[31]

Attention is all you need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszko- reit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. InNeurIPS, 2017. 3, 4

2017
[32]

Continuous 3d per- ception model with persistent state

Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A Efros, and Angjoo Kanazawa. Continuous 3d per- ception model with persistent state. InCVPR, 2025. 3 9

2025
[33]

MIQP-based Layout Design for Building Interiors.Com- puter Graphics Forum, 2018

Wenming Wu, Lubin Fan, Ligang Liu, and Peter Wonka. MIQP-based Layout Design for Building Interiors.Com- puter Graphics Forum, 2018. 2

2018
[34]

Miqp-based layout design for building interiors.TOG, 2019

Wenming Wu, Lubin Fan, Ligang Liu, and Peter Wonka. Miqp-based layout design for building interiors.TOG, 2019. 2, 5

2019
[35]

Conv-mpn: Convolutional message passing neural network for structured outdoor architecture reconstruction

Fuyang Zhang, Nelson Nauata, and Yasutaka Furukawa. Conv-mpn: Convolutional message passing neural network for structured outdoor architecture reconstruction. InCVPR,
[36]

More Implementation Details Hyper-parameters.We give the additional informa- tion about the hyper-parameters of our model in Table 8

2 10 Unified Vector Floorplan Generation via Markup Representation Supplementary Material A. More Implementation Details Hyper-parameters.We give the additional informa- tion about the hyper-parameters of our model in Table 8. Hyperparameter Value Transformer Dimension512 MLP Dimension 2048 Num Heads 32 Num Layers 24 Temperature 0.6 Top P 0.8 Table 8.Hype...

2048
[37]

An interior and front door should have just two ver- tices
[38]

Room vertices should be placed outside the previously generated rooms
[39]

Interior door vertices should be placed on a edge be- tween two different rooms
[40]

Front door vertices should be placed on a edge be- tween a room and the outside region
[41]

select the most functional and natu- ral floorplan

A room should have four or more vertices. Table 11.Constrains used in decoding. Pre-processing.We follow the pre-processing code pro- vided by HouseGAN++. Also, since FML requires that ev- ery two rooms supposed to be adjacent must share an edge for a door, we inflated the room polygons so that they have the shared edge. We also re-computed the adjacency ...