Recognition: no theorem link
Unified Vector Floorplan Generation via Markup Representation
Pith reviewed 2026-05-10 19:56 UTC · model grok-4.3
The pith
A markup language encodes floorplans as token sequences so one transformer model can handle every conditional generation task.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Encoding floorplan data in FML casts generation as autoregressive token prediction, so a single transformer produces high-fidelity, functional vector floorplans from heterogeneous conditions without task-specific retraining or architectures.
What carries the argument
Floorplan Markup Language (FML), a grammar that serializes rooms, walls, doors, and constraints into a token sequence for transformer prediction.
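The review does not reproduce FML's actual grammar, so the serialization idea can only be sketched. A toy serializer in the same spirit — all token names and the input schema here are assumptions for illustration, not the paper's format:

```python
# Toy sketch of an FML-style serialization (assumed format; the paper's
# actual grammar is not reproduced in this review). Each room becomes a
# typed token followed by its vertex coordinates, and each door becomes
# an edge token, so the whole floorplan is one flat token sequence that
# a transformer can model left to right.

def serialize(floorplan):
    """Flatten a floorplan dict into a token sequence."""
    tokens = ["<BOS>"]
    for room in floorplan["rooms"]:
        tokens.append(f"<ROOM:{room['type']}>")
        for x, y in room["vertices"]:
            tokens.extend([f"X{x}", f"Y{y}"])
    for a, b in floorplan["doors"]:
        tokens.append(f"<DOOR:{a}-{b}>")
    tokens.append("<EOS>")
    return tokens

plan = {
    "rooms": [
        {"type": "living", "vertices": [(0, 0), (120, 0), (120, 80), (0, 80)]},
        {"type": "bed", "vertices": [(120, 0), (200, 0), (200, 80), (120, 80)]},
    ],
    "doors": [(0, 1)],  # interior door between rooms 0 and 1
}
tokens = serialize(plan)
print(tokens[:4])  # ['<BOS>', '<ROOM:living>', 'X0', 'Y0']
```

The payoff of a representation like this is that conditioning inputs and outputs live in one sequence space, so "generation from condition C" is just next-token prediction given a prefix.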
If this is right
- One model replaces multiple task-specific generators while maintaining or improving output quality on each task.
- Floorplans remain vector-based and editable rather than raster approximations.
- New conditional inputs can be incorporated by extending the grammar without redesigning the model.
Where Pith is reading between the lines
- The same token-sequence approach could be tested on 3D building layouts or non-residential spaces if an extended grammar is defined.
- Training cost drops because a single model serves all conditions instead of one per task.
- Interactive design tools become simpler since users can supply mixed constraints in the same format.
Load-bearing premise
The FML grammar can represent every spatial relationship, functional constraint, and geometric detail of any valid floorplan without ambiguity or information loss.
What would settle it
Any valid residential floorplan that cannot be losslessly encoded in FML, or any input condition for which the trained model produces invalid or non-functional outputs.
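The criterion above is a round-trip property: decode(encode(p)) must equal p for every valid plan p. A minimal sketch — assuming a grid quantizer that is not from the paper — shows how discretization alone could supply the kind of counterexample the criterion asks for:

```python
# Round-trip sketch: an encoding is lossless only if decode(encode(p)) == p
# for every valid plan p. With coordinate quantization (assumed here:
# snapping to an integer grid), exactness can fail for off-grid vertices,
# which is precisely the falsifier described above.

GRID = 5  # assumed quantization step, in centimetres

def encode(vertices):
    return [(round(x / GRID), round(y / GRID)) for x, y in vertices]

def decode(tokens):
    return [(x * GRID, y * GRID) for x, y in tokens]

def round_trips(vertices):
    return decode(encode(vertices)) == list(vertices)

print(round_trips([(0, 0), (120, 0), (120, 80)]))  # True: every vertex is on-grid
print(round_trips([(0, 0), (123, 0), (123, 80)]))  # False: 123 snaps to 125
```

A systematic version of this check over all of RPLAN is exactly the round-trip experiment the referee report requests.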
read the original abstract
Automatic residential floorplan generation has long been a central challenge bridging architecture and computer graphics, aiming to make spatial design more efficient and accessible. While early methods based on constraint satisfaction or combinatorial optimization ensure feasibility, they lack diversity and flexibility. Recent generative models achieve promising results but struggle to generalize across heterogeneous conditional tasks, such as generation from site boundaries, room adjacency graphs, or partial layouts, due to their suboptimal representations. To address this gap, we introduce Floorplan Markup Language (FML), a general representation that encodes floorplan information within a single structured grammar, which casts the entire floorplan generation problem into a next token prediction task. Leveraging FML, we develop a transformer-based generative model, FMLM, capable of producing high-fidelity and functional floorplans under diverse conditions. Comprehensive experiments on the RPLAN dataset demonstrate that FMLM, despite being a single model, surpasses the previous task-specific state-of-the-art methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Floorplan Markup Language (FML), a structured grammar that encodes floorplan information (including geometry, topology, and conditioning signals such as site boundaries or adjacency graphs) as a sequence, thereby reducing diverse floorplan generation tasks to autoregressive next-token prediction. It then presents FMLM, a transformer model trained on this representation, and claims that a single FMLM instance produces high-fidelity, functional floorplans under heterogeneous conditions and outperforms prior task-specific state-of-the-art methods on the RPLAN dataset.
Significance. If the FML representation is shown to be lossless and unambiguous and the quantitative superiority is demonstrated with proper controls, the work would offer a genuinely unifying framework that replaces multiple specialized pipelines with one sequence model. This could simplify research and deployment in architectural generative modeling.
major comments (2)
- [Abstract and §5] Abstract and §5 (Experiments): The central claim that FMLM 'surpasses the previous task-specific state-of-the-art methods' is unsupported by any reported metrics, baselines, ablation studies, or error analysis. Without these data the performance assertion cannot be evaluated and may rest on post-hoc tuning or dataset-specific advantages.
- [§3] §3 (Floorplan Markup Language): The claim that FML provides a lossless, unambiguous encoding of all spatial relationships, functional constraints, and geometric details is load-bearing for the unification argument, yet the manuscript supplies no reconstruction-fidelity metrics, failure-case analysis on non-Manhattan geometries, or verification that every valid RPLAN floorplan can be round-tripped without discretization or ordering ambiguity.
minor comments (2)
- [§4] Clarify the exact token vocabulary size, maximum sequence length, and any special tokens used for conditioning signals; these details are needed to reproduce the next-token prediction setup.
- [§3] Ensure that all conditioning inputs (site boundaries, graphs, partial layouts) are illustrated with concrete FML examples in the same figure or table for direct comparison.
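To make the first minor comment concrete, here is one plausible way such a vocabulary could be organized. The token names (`<BOUNDARY>`, `<ADJ>`, `<PARTIAL>`), coordinate range, and room types are hypothetical, not taken from the paper:

```python
# Sketch of a shared vocabulary with task-tagging special tokens.
# Everything here is an assumption for illustration: the paper does not
# publish its token list, coordinate resolution, or room-type inventory.

SPECIAL = ["<PAD>", "<BOS>", "<EOS>", "<BOUNDARY>", "<ADJ>", "<PARTIAL>"]
COORDS = [f"X{i}" for i in range(256)] + [f"Y{i}" for i in range(256)]
ROOMS = [f"<ROOM:{t}>" for t in ["living", "bed", "kitchen", "bath", "balcony"]]

VOCAB = {tok: i for i, tok in enumerate(SPECIAL + COORDS + ROOMS)}

def build_prompt(condition_type, condition_tokens):
    """Prefix a tagged condition segment so one model serves every task."""
    return [VOCAB[condition_type]] + [VOCAB[t] for t in condition_tokens] + [VOCAB["<BOS>"]]

# A site-boundary-conditioned prompt: tag, boundary polygon, then <BOS>.
prompt = build_prompt("<BOUNDARY>", ["X0", "Y0", "X200", "Y0", "X200", "Y150", "X0", "Y150"])
print(len(VOCAB))  # 523 tokens in this toy vocabulary
```

Under a scheme like this, adding a new conditioning task means adding one tag token and a serialization rule for its payload, with no architectural change — which is the flexibility claim the review attributes to FML.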
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We agree that additional quantitative evidence is needed to support our claims and will revise the manuscript to include the requested metrics, baselines, and verification results.
read point-by-point responses
-
Referee: [Abstract and §5] Abstract and §5 (Experiments): The central claim that FMLM 'surpasses the previous task-specific state-of-the-art methods' is unsupported by any reported metrics, baselines, ablation studies, or error analysis. Without these data the performance assertion cannot be evaluated and may rest on post-hoc tuning or dataset-specific advantages.
Authors: We acknowledge that the current manuscript does not report sufficient quantitative metrics, baselines, ablation studies, or error analysis to fully substantiate the performance claim. In the revised version we will add comprehensive tables comparing FMLM against the cited task-specific methods on the RPLAN dataset, including standard metrics, ablation studies on conditioning signals, and error analysis to demonstrate that the gains are not due to post-hoc tuning. revision: yes
-
Referee: [§3] §3 (Floorplan Markup Language): The claim that FML provides a lossless, unambiguous encoding of all spatial relationships, functional constraints, and geometric details is load-bearing for the unification argument, yet the manuscript supplies no reconstruction-fidelity metrics, failure-case analysis on non-Manhattan geometries, or verification that every valid RPLAN floorplan can be round-tripped without discretization or ordering ambiguity.
Authors: We agree that explicit verification of FML's lossless and unambiguous properties is required. The revised manuscript will include reconstruction-fidelity metrics (e.g., exact geometry and topology recovery rates), failure-case analysis covering non-Manhattan layouts, and round-trip experiments confirming that every valid RPLAN floorplan can be serialized and deserialized without discretization or ordering ambiguity. revision: yes
Circularity Check
No circularity; empirical proposal of representation and model
full rationale
The paper introduces FML as an external structured grammar that encodes floorplan data, then trains a transformer (FMLM) via next-token prediction on RPLAN data to generate outputs under varied conditions. No step derives a result from parameters fitted to the target metric, renames a known result, or reduces a central claim to a self-citation chain or self-definitional loop. The unification and performance claims rest on experimental comparison rather than construction from the inputs themselves.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
- [1] Zeina Abu-Aisheh, Romain Raveaux, Jean-Yves Ramel, and Patrick Martineau. An exact graph edit distance algorithm for solving pattern recognition problems. In ICPRAM, 2015.
- [2] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization. arXiv:1607.06450, 2016.
- [3] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. NeurIPS, 2020.
- [4] Patrick Esser, Robin Rombach, and Bjorn Ommer. Taming transformers for high-resolution image synthesis. In CVPR, 2021.
- [5] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In NeurIPS, 2014.
- [6] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The Llama 3 herd of models. arXiv:2407.21783, 2024.
- [7] Alex Graves. Generating sequences with recurrent neural networks. arXiv:1308.0850, 2013.
- [8] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NeurIPS, 2017.
- [9] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.
- [10] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 1997.
- [11] Shibo Hong, Xuhong Zhang, Tianyu Du, Sheng Cheng, Xun Wang, and Jianwei Yin. Cons2Plan: Vector floorplan generation from various conditions via a learning framework based on conditional diffusion models. In MM, 2024.
- [12] Sepidehsadat Hosseini and Yasutaka Furukawa. Floorplan restoration by structure hallucinating transformer cascades. In BMVC, 2023.
- [13] Sepidehsadat Hosseini, Mohammad Amin Shabani, Saghar Irandoust, and Yasutaka Furukawa. PuzzleFusion: Unleashing the power of diffusion models for spatial puzzle solving. In NeurIPS, 2023.
- [14] Ruizhen Hu, Zeyu Huang, Yuhan Tang, Oliver Van Kaick, Hao Zhang, and Hui Huang. Graph2Plan: Learning floorplan generation from layout graphs. TOG, 2020.
- [15] Sizhe Hu, Wenming Wu, Yuntao Wang, Benzhu Xu, and Liping Zheng. GSDiff: Synthesizing vector floorplans via geometry-enhanced structural graph generation. In AAAI.
- [16] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
- [17] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv:1312.6114, 2013.
- [18] Graziella Laignel, Nicolas Pozin, Xavier Geffrier, Loukas Delevaux, Florian Brun, and Bastien Dolla. Floor plan generation through a mixed constraint programming-genetic optimization approach. Automation in Construction, 2021.
- [19] Sicong Leng, Yang Zhou, Mohammed Haroon Dupty, Wee Sun Lee, Sam Joyce, and Wei Lu. Tell2Design: A dataset for language-guided floor plan generation. In ACL, 2023.
- [20] Han Liu, Yong-Liang Yang, Sawsan Alhalawani, and Niloy J. Mitra. Constraint-aware interior layout exploration for pre-cast concrete-based buildings. Vis. Comput., 2013.
- [21] Ziniu Luo and Weixin Huang. FloorplanGAN: Vector residential floorplan adversarial generation. Automation in Construction, 2022.
- [22] Nelson Nauata, Kai-Hung Chang, Chin-Yi Cheng, Greg Mori, and Yasutaka Furukawa. House-GAN: Relational generative adversarial networks for graph-constrained house layout generation. In ECCV, 2020.
- [23] Nelson Nauata, Sepidehsadat Hosseini, Kai-Hung Chang, Hang Chu, Chin-Yi Cheng, and Yasutaka Furukawa. House-GAN++: Generative adversarial layout refinement network towards intelligent computational agent for professional architects. In CVPR, 2021.
- [24] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. NeurIPS, 2019.
- [25] Mohammad Amin Shabani, Weilian Song, Makoto Odamaki, Hirochika Fujiki, and Yasutaka Furukawa. Extreme structure from motion for indoor panoramas without visual overlaps. In ICCV, 2021.
- [26] Mohammad Amin Shabani, Sepidehsadat Hosseini, and Yasutaka Furukawa. HouseDiffusion: Vector floorplan generation via a diffusion model with discrete and continuous denoising. In CVPR, 2023.
- [27] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. NeurIPS, 2014.
- [28] Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, Shuai Han, Tianning Zhang, WQ Zhang, Weifeng Luo, et al. MAGI-1: Autoregressive video generation at scale. arXiv:2505.13211, 2025.
- [29] Keyu Tian, Yi Jiang, Zehuan Yuan, Bingyue Peng, and Liwei Wang. Visual autoregressive modeling: Scalable image generation via next-scale prediction. NeurIPS, 2024.
- [30] Aaron Van den Oord, Nal Kalchbrenner, Lasse Espeholt, Oriol Vinyals, Alex Graves, et al. Conditional image generation with PixelCNN decoders. NeurIPS, 2016.
- [31] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NeurIPS, 2017.
- [32] Qianqian Wang, Yifei Zhang, Aleksander Holynski, Alexei A Efros, and Angjoo Kanazawa. Continuous 3D perception model with persistent state. In CVPR, 2025.
- [33] Wenming Wu, Lubin Fan, Ligang Liu, and Peter Wonka. MIQP-based layout design for building interiors. Computer Graphics Forum, 2018.
- [34] Wenming Wu, Lubin Fan, Ligang Liu, and Peter Wonka. MIQP-based layout design for building interiors. TOG, 2019.
- [35] Fuyang Zhang, Nelson Nauata, and Yasutaka Furukawa. Conv-MPN: Convolutional message passing neural network for structured outdoor architecture reconstruction. In CVPR, 2020.
Supplementary excerpts (captured alongside the reference graph)
- Hyper-parameters (Table 8): transformer dimension 512, MLP dimension 2048, 32 attention heads, 24 layers, sampling temperature 0.6, top-p 0.8.
- Decoding constraints (Table 11): an interior or front door has exactly two vertices; room vertices must be placed outside previously generated rooms; interior-door vertices must lie on an edge between two different rooms; front-door vertices must lie on an edge between a room and the outside region; a room must have four or more vertices; among candidates, the most functional and natural floorplan is selected.
- Pre-processing: follows the HouseGAN++ pipeline; because FML requires that every pair of rooms meant to be adjacent share an edge for a door, room polygons are inflated to create the shared edge and adjacencies are re-computed.
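The supplementary hyper-parameters list a sampling temperature of 0.6 and top-p of 0.8, i.e. temperature-scaled nucleus sampling. A minimal reference sketch of that decoding step, independent of any FML specifics:

```python
import math
import random

def nucleus_sample(logits, temperature=0.6, top_p=0.8, rng=random):
    """Sample a token index using temperature scaling and top-p truncation."""
    # Temperature-scale, then softmax with the usual max-subtraction for stability.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Keep the smallest prefix of tokens whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        mass += p
        if mass >= top_p:
            break
    # Sample within the kept nucleus, renormalized to its own mass.
    r = rng.random() * mass
    for p, i in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][1]
```

With a dominant logit, e.g. `nucleus_sample([10.0, 0.0, 0.0])`, the nucleus collapses to a single token and the call is effectively deterministic; the low temperature (0.6 < 1) sharpens the distribution before truncation, which is consistent with favoring "functional" over diverse outputs at decode time.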