Recognition: no theorem link
Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards
Pith reviewed 2026-05-15 04:58 UTC · model grok-4.3
The pith
Fine-tuning an LLM and then applying reinforcement learning with verifiable rewards yields floor plans that satisfy both room-connectivity requirements and exact numerical constraints on dimensions and areas.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce a text-based floor plan generation approach that fine-tunes a large language model on real plans and then applies reinforcement learning with verifiable rewards to improve adherence to topological and numerical constraints while discouraging invalid or overlapping outputs. The resulting model generates floor plans that satisfy user-defined connectivity and numerical constraints, outperforms existing methods on Realism, Compatibility, and Diversity metrics, and achieves at least a 94% relative reduction in Compatibility error (connectivity mistakes; lower is better).
What carries the argument
Reinforcement learning with verifiable rewards (RLVR) applied after fine-tuning an LLM, where automatically checkable rewards enforce room connectivity, dimensions, and areas while penalizing overlaps and invalid outputs.
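The reward machinery can be pictured concretely. A minimal sketch in Python, assuming axis-aligned rectangular rooms; the room schema, helper names, tolerances, and scoring weights are illustrative assumptions, not the paper's implementation:

```python
def shares_wall(r1, r2, eps=1e-6):
    """True if two axis-aligned rooms (x, y, w, h, target_area) touch
    along a wall segment of positive length."""
    x1, y1, w1, h1, _ = r1
    x2, y2, w2, h2, _ = r2
    touch_x = abs(x1 + w1 - x2) < eps or abs(x2 + w2 - x1) < eps
    touch_y = abs(y1 + h1 - y2) < eps or abs(y2 + h2 - y1) < eps
    overlap_y = min(y1 + h1, y2 + h2) - max(y1, y2) > eps
    overlap_x = min(x1 + w1, x2 + w2) - max(x1, x2) > eps
    return (touch_x and overlap_y) or (touch_y and overlap_x)

def verifiable_reward(rooms, required_adjacency, tol=0.05):
    """Score a parsed plan; rooms maps name -> (x, y, w, h, target_area)."""
    reward = 0.0

    # Numerical constraints: realized area must match the target area.
    for name, (x, y, w, h, target_area) in rooms.items():
        if w <= 0 or h <= 0:          # invalid dimensions: hard failure
            return -1.0
        if abs(w * h - target_area) <= tol * target_area:
            reward += 1.0

    # Overlap penalty: subtract the intersection area of every room pair.
    names = list(rooms)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            x1, y1, w1, h1, _ = rooms[names[i]]
            x2, y2, w2, h2, _ = rooms[names[j]]
            ix = max(0.0, min(x1 + w1, x2 + w2) - max(x1, x2))
            iy = max(0.0, min(y1 + h1, y2 + h2) - max(y1, y2))
            reward -= ix * iy

    # Topological constraint: required room pairs must share a wall.
    for a, b in required_adjacency:
        if shares_wall(rooms[a], rooms[b]):
            reward += 1.0
    return reward
```

Because every term is computed from the generated text alone, the reward is automatically checkable, which is what makes it usable as an RLVR signal.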
Load-bearing premise
Verifiable rewards based on connectivity and numerical constraints are sufficient to ensure the functional validity of floor plans without missing buildability, safety, or aesthetic requirements that matter in real use.
What would settle it
A side-by-side evaluation in which licensed architects attempt to construct or obtain permits for the generated plans and report any violations of building codes or safety rules not encoded in the reward functions.
Figures
Original abstract
An AI system for professional floor plan design must precisely control room dimensions and areas while respecting the desired connectivity between rooms and maintaining functional and aesthetic quality. Existing generative approaches focus primarily on respecting the requested connectivity between rooms, but do not support generating floor plans that respect numerical constraints. We introduce a text-based floor plan generation approach that fine-tunes a large language model (LLM) on real plans and then applies reinforcement learning with verifiable rewards (RLVR) to improve adherence to topological and numerical constraints while discouraging invalid or overlapping outputs. Furthermore, we design a set of constraint adherence metrics to systematically measure how generated floor plans align with user-defined constraints. Our model generates floor plans that satisfy user-defined connectivity and numerical constraints and outperforms existing methods on Realism, Compatibility, and Diversity metrics. Across all tasks, our approach achieves at least a 94% relative reduction in Compatibility compared with existing methods. Our results demonstrate that LLMs can effectively handle constraints in this setting, suggesting broader applications for text-based generative modeling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a text-based floor plan generation system that fine-tunes an LLM on real plans and then applies reinforcement learning with verifiable rewards (RLVR) to enforce topological connectivity and numerical constraints while discouraging invalid or overlapping outputs. It defines constraint adherence metrics and reports that the resulting model outperforms existing methods on Realism, Compatibility, and Diversity, achieving at least a 94% relative reduction in Compatibility errors across tasks.
Significance. If the empirical claims hold under rigorous verification, the work shows that LLMs augmented with RLVR can handle mixed topological and numerical constraints in structured generative tasks, filling a gap left by prior connectivity-focused methods. The verifiable-reward framework and accompanying metrics offer a reusable template for evaluating constraint satisfaction in text-to-structured-output settings. The approach is notable for its explicit handling of numerical constraints (room dimensions/areas) alongside connectivity, which prior generative floor-plan systems largely omit.
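Compatibility, the metric the headline claim is stated in, counts connectivity mistakes between the requested bubble diagram and the one recovered from the generated plan, with 0 meaning a perfect match. A minimal sketch, assuming both graphs arrive as undirected edge lists; the symmetric-difference formulation is an assumption about how the count is implemented:

```python
def compatibility(requested_edges, generated_edges):
    """Count connectivity mistakes: requested adjacencies missing from the
    generated plan plus spurious adjacencies the plan introduces."""
    norm = lambda edges: {frozenset(e) for e in edges}  # undirected edges
    return len(norm(requested_edges) ^ norm(generated_edges))

# Requested: living-kitchen and living-bedroom; the generated plan drops
# one requested edge and adds a spurious one, so two mistakes.
compatibility([("living", "kitchen"), ("living", "bedroom")],
              [("living", "kitchen"), ("kitchen", "bath")])  # → 2
```

A 94% relative reduction on this count therefore means the model makes roughly one connectivity mistake for every sixteen or more made by the baselines.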
Major comments (3)
- [§4] §4 (RLVR reward formulation): the description of verifiable rewards appears to operate primarily on parsed textual constraints and simple area summations; it is unclear whether the reward explicitly penalizes 2D polygon intersections, non-aligned walls, or negative dimensions after coordinate placement. Without such geometric enforcement, high metric scores could coexist with physically invalid layouts, weakening the functional-validity claim.
- [§5] §5 (Experiments): the reported 94% relative Compatibility reduction lacks error bars, details on train/validation/test splits, baseline hyper-parameters, and exact reward implementations, preventing independent verification of the outperformance. The abstract and results tables must include these elements for the central empirical claim to be load-bearing.
- [§3.2] §3.2 (constraint adherence metrics): the metrics are defined on parsed text and area sums but do not appear to include topological checks (e.g., planarity of the dual graph or wall-adjacency consistency) in coordinate space; this risks overestimating validity relative to professional buildability criteria.
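The coordinate-space checks the first and third comments call for can be sketched as a standalone validator. A hypothetical example for axis-aligned rectangles, using a grid-snap test as a crude stand-in for wall alignment; the function name, grid size, and message formats are all illustrative assumptions:

```python
def geometric_violations(rooms, grid=0.5, eps=1e-9):
    """Collect physical-validity violations that text-level checks can miss:
    non-positive dimensions, pairwise overlaps, and walls off a grid.
    rooms maps name -> (x, y, w, h) rectangle."""
    issues = []
    for name, (x, y, w, h) in rooms.items():
        if w <= 0 or h <= 0:
            issues.append(f"{name}: non-positive dimension")
        for v in (x, y, x + w, y + h):   # every wall coordinate
            if abs(v / grid - round(v / grid)) > eps:
                issues.append(f"{name}: wall at {v} off the {grid} grid")
    names = list(rooms)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            x1, y1, w1, h1 = rooms[names[i]]
            x2, y2, w2, h2 = rooms[names[j]]
            ix = min(x1 + w1, x2 + w2) - max(x1, x2)
            iy = min(y1 + h1, y2 + h2) - max(y1, y2)
            if ix > eps and iy > eps:    # positive intersection area
                issues.append(f"{names[i]}/{names[j]}: overlap area {ix * iy:.2f}")
    return issues
```

If the paper's reward omits checks of this kind, a plan can score well on parsed-text metrics while being physically invalid, which is exactly the gap the major comments identify.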
Minor comments (2)
- [Figure 2] Figure 2 caption should explicitly state the coordinate system origin and scale used for the generated plans.
- [§2] The related-work section omits recent LLM-based layout papers that also incorporate numerical constraints; add citations for completeness.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback. We address each major comment point by point below and have prepared revisions to strengthen the manuscript where appropriate.
Point-by-point responses
-
Referee: [§4] §4 (RLVR reward formulation): the description of verifiable rewards appears to operate primarily on parsed textual constraints and simple area summations; it is unclear whether the reward explicitly penalizes 2D polygon intersections, non-aligned walls, or negative dimensions after coordinate placement. Without such geometric enforcement, high metric scores could coexist with physically invalid layouts, weakening the functional-validity claim.
Authors: We appreciate this observation. The RLVR reward does incorporate explicit geometric penalties: after coordinate placement, we compute polygon intersection areas to penalize overlaps, enforce non-negative dimensions, and apply alignment checks for walls. These components are part of the verifiable reward signal. To address the concern, we will expand §4 with a dedicated subsection detailing the geometric enforcement steps, including the intersection computation formula and alignment heuristics. revision: yes
-
Referee: [§5] §5 (Experiments): the reported 94% relative Compatibility reduction lacks error bars, details on train/validation/test splits, baseline hyper-parameters, and exact reward implementations, preventing independent verification of the outperformance. The abstract and results tables must include these elements for the central empirical claim to be load-bearing.
Authors: We agree that these details are required for reproducibility. In the revised manuscript we will: (1) add standard error bars to all metrics in Table 2 and the abstract; (2) explicitly state the 70/15/15 train/validation/test split on the RPLAN dataset; (3) report hyper-parameters for all baselines (including learning rates, batch sizes, and RL-specific settings); and (4) provide the exact reward equations and implementation pseudocode in §5 and the appendix. revision: yes
-
Referee: [§3.2] §3.2 (constraint adherence metrics): the metrics are defined on parsed text and area sums but do not appear to include topological checks (e.g., planarity of the dual graph or wall-adjacency consistency) in coordinate space; this risks overestimating validity relative to professional buildability criteria.
Authors: The current metrics already derive a dual graph from the textual adjacency list and verify connectivity and planarity at the graph level. However, we acknowledge that coordinate-space wall-adjacency consistency checks would provide a stronger link to buildability. We will partially revise §3.2 to add these coordinate-based checks as supplementary metrics and include a brief discussion of their relation to professional criteria. revision: partial
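The dual-graph checks discussed in this exchange can be sketched at the graph level. A minimal version, assuming rooms and adjacencies arrive as plain lists; the planarity test here is only the necessary edge-count condition, not the full algorithm (e.g. Boyer-Myrvold) a real implementation would use:

```python
from collections import deque

def dual_graph_checks(rooms, edges):
    """Graph-level checks on a floor plan's dual graph (rooms = nodes,
    adjacencies = edges): connectivity via BFS, plus the necessary
    planarity condition |E| <= 3|V| - 6."""
    adj = {r: set() for r in rooms}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # BFS from an arbitrary room; connected iff every room is reached.
    seen, queue = {rooms[0]}, deque([rooms[0]])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    connected = len(seen) == len(rooms)

    n, m = len(rooms), len({frozenset(e) for e in edges})
    maybe_planar = n < 3 or m <= 3 * n - 6
    return connected, maybe_planar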
Circularity Check
No significant circularity; claims rest on empirical comparisons
Full rationale
The paper fine-tunes an LLM on real plans then applies RL with explicitly designed verifiable rewards to enforce topological and numerical constraints, followed by custom adherence metrics and direct comparison to existing methods on Realism, Compatibility, and Diversity. No derivation step reduces a claimed prediction or result to its own inputs by construction, no fitted parameter is relabeled as a prediction, and no load-bearing uniqueness theorem or ansatz is imported via self-citation. The 94% relative reduction is a comparative empirical outcome against baselines rather than a tautological re-expression of the reward definition. The approach is self-contained through external benchmarks and verifiable reward functions.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: fine-tuning an LLM on real floor plans produces a useful base model for subsequent constraint-guided generation.
Reference graph
Works this paper leans on
- [1] Treasure hunt: Real-time targeting of the long tail using training-time markers. arXiv preprint arXiv:2506.14702.
- [2] Architext: Language-driven generative architecture design. arXiv preprint arXiv:2303.07519. Theodoros Galanos, Antonios Liapis, and Georgios N. Yannakakis.
- [3] The Llama 3 herd of models. arXiv preprint arXiv:2407.21783. Aaron Grattafiori, Abhimanyu Dubey, et al.
- [4] LoRA: Low-rank adaptation of large language models. Preprint, arXiv:2106.09685. Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen.
- [5] Efficient memory management for large language model serving with PagedAttention. arXiv preprint arXiv:2309.06180. Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica.
- [6] A constraint based generative system for floor layouts. In Proceedings of the Fifth Conference on Computer Aided Architectural Design Research in Asia (CAADRIA 2000), pages 441–450, Singapore. Centre for Advanced Studies in Architecture (CASA).
- [7] A constrained growth method for procedural floor plan generation. In Proceedings of GAME-ON 2010, Leicester, United Kingdom. Ricardo Lopes, Tim Tutenel, Ruben M. Smelik, Klaas Jan de Kraker, and Rafael Bidarra.
- [8] House-GAN: Relational generative adversarial networks for graph-constrained house layout generation. In Computer Vision – ECCV 2020, Proceedings, Part I, pages 162–177. Springer. Nelson Nauata, Sepidehsadat Hosseini, Kai-Hung Chang, Hang Chu, Chin-Yi Cheng, and Yasutaka Furukawa.
- [9] PICARD: Parsing incrementally for constrained auto-regressive decoding from language models. In Proceedings of EMNLP 2021, pages 9895–9901. Association for Computational Linguistics. Torsten Scholak, Nathan Schucher, and Dzmitry Bahdanau.
- [10] Proximal policy optimization algorithms. Preprint, arXiv:1707.06347. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov.
- [11] DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. Preprint, arXiv:2402.03300. Zhihong Shao et al.
- [12] AnyHome: Open-vocabulary generation of structured and textured 3D homes. arXiv preprint arXiv:2312.06644.
- [13] TRANX: A transition-based neural abstract syntax parser for semantic parsing and code generation. In Proceedings of EMNLP 2018 (System Demonstrations), pages 7–12, Brussels, Belgium. Association for Computational Linguistics. Pengcheng Yin and Graham Neubig.