Recognition: no theorem link
Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards
Pith reviewed 2026-05-15 04:58 UTC · model grok-4.3
The pith
Fine-tuning an LLM and then applying reinforcement learning with verifiable rewards yields floor plans that satisfy both room-connectivity requirements and exact numerical constraints on dimensions and areas.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce a text-based floor plan generation approach that fine-tunes a large language model on real plans and then applies reinforcement learning with verifiable rewards to improve adherence to topological and numerical constraints while discouraging invalid or overlapping outputs. The resulting model generates floor plans that satisfy user-defined connectivity and numerical constraints, outperforms existing methods on Realism, Compatibility, and Diversity metrics, and achieves at least a 94% relative reduction in Compatibility error (connectivity mistakes; lower is better).
What carries the argument
Reinforcement learning with verifiable rewards (RLVR) applied after fine-tuning an LLM, where automatically checkable rewards enforce room connectivity, dimensions, and areas while penalizing overlaps and invalid outputs.
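The reward machinery can be pictured concretely. A minimal sketch in Python, assuming axis-aligned rectangular rooms; the room schema, helper names, tolerances, and scoring weights are illustrative assumptions, not the paper's implementation:

```python
def shares_wall(r1, r2, eps=1e-6):
    """True if two axis-aligned rooms (x, y, w, h, target_area) touch
    along a wall segment of positive length."""
    x1, y1, w1, h1, _ = r1
    x2, y2, w2, h2, _ = r2
    touch_x = abs(x1 + w1 - x2) < eps or abs(x2 + w2 - x1) < eps
    touch_y = abs(y1 + h1 - y2) < eps or abs(y2 + h2 - y1) < eps
    overlap_y = min(y1 + h1, y2 + h2) - max(y1, y2) > eps
    overlap_x = min(x1 + w1, x2 + w2) - max(x1, x2) > eps
    return (touch_x and overlap_y) or (touch_y and overlap_x)

def verifiable_reward(rooms, required_adjacency, tol=0.05):
    """Score a parsed plan; rooms maps name -> (x, y, w, h, target_area)."""
    reward = 0.0

    # Numerical constraints: realized area must match the target area.
    for name, (x, y, w, h, target_area) in rooms.items():
        if w <= 0 or h <= 0:          # invalid dimensions: hard failure
            return -1.0
        if abs(w * h - target_area) <= tol * target_area:
            reward += 1.0

    # Overlap penalty: subtract the intersection area of every room pair.
    names = list(rooms)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            x1, y1, w1, h1, _ = rooms[names[i]]
            x2, y2, w2, h2, _ = rooms[names[j]]
            ix = max(0.0, min(x1 + w1, x2 + w2) - max(x1, x2))
            iy = max(0.0, min(y1 + h1, y2 + h2) - max(y1, y2))
            reward -= ix * iy

    # Topological constraint: required room pairs must share a wall.
    for a, b in required_adjacency:
        if shares_wall(rooms[a], rooms[b]):
            reward += 1.0
    return reward
```

Because every term is computed from the generated text alone, the reward is automatically checkable, which is what makes it usable as an RLVR signal.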
Load-bearing premise
Verifiable rewards based on connectivity and numerical constraints are sufficient to ensure the functional validity of floor plans without missing buildability, safety, or aesthetic requirements that matter in real use.
What would settle it
A side-by-side evaluation in which licensed architects attempt to construct or obtain permits for the generated plans and report any violations of building codes or safety rules not encoded in the reward functions.
Figures
Original abstract
An AI system for professional floor plan design must precisely control room dimensions and areas while respecting the desired connectivity between rooms and maintaining functional and aesthetic quality. Existing generative approaches focus primarily on respecting the requested connectivity between rooms, but do not support generating floor plans that respect numerical constraints. We introduce a text-based floor plan generation approach that fine-tunes a large language model (LLM) on real plans and then applies reinforcement learning with verifiable rewards (RLVR) to improve adherence to topological and numerical constraints while discouraging invalid or overlapping outputs. Furthermore, we design a set of constraint adherence metrics to systematically measure how generated floor plans align with user-defined constraints. Our model generates floor plans that satisfy user-defined connectivity and numerical constraints and outperforms existing methods on Realism, Compatibility, and Diversity metrics. Across all tasks, our approach achieves at least a 94% relative reduction in Compatibility compared with existing methods. Our results demonstrate that LLMs can effectively handle constraints in this setting, suggesting broader applications for text-based generative modeling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a text-based floor plan generation system that fine-tunes an LLM on real plans and then applies reinforcement learning with verifiable rewards (RLVR) to enforce topological connectivity and numerical constraints while discouraging invalid or overlapping outputs. It defines constraint adherence metrics and reports that the resulting model outperforms existing methods on Realism, Compatibility, and Diversity, achieving at least a 94% relative reduction in Compatibility errors across tasks.
Significance. If the empirical claims hold under rigorous verification, the work shows that LLMs augmented with RLVR can handle mixed topological and numerical constraints in structured generative tasks, filling a gap left by prior connectivity-focused methods. The verifiable-reward framework and accompanying metrics offer a reusable template for evaluating constraint satisfaction in text-to-structured-output settings. The approach is notable for its explicit handling of numerical constraints (room dimensions/areas) alongside connectivity, which prior generative floor-plan systems largely omit.
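Compatibility, the metric the headline claim is stated in, counts connectivity mistakes between the requested bubble diagram and the one recovered from the generated plan, with 0 meaning a perfect match. A minimal sketch, assuming both graphs arrive as undirected edge lists; the symmetric-difference formulation is an assumption about how the count is implemented:

```python
def compatibility(requested_edges, generated_edges):
    """Count connectivity mistakes: requested adjacencies missing from the
    generated plan plus spurious adjacencies the plan introduces."""
    norm = lambda edges: {frozenset(e) for e in edges}  # undirected edges
    return len(norm(requested_edges) ^ norm(generated_edges))

# Requested: living-kitchen and living-bedroom; the generated plan drops
# one requested edge and adds a spurious one, so two mistakes.
compatibility([("living", "kitchen"), ("living", "bedroom")],
              [("living", "kitchen"), ("kitchen", "bath")])  # → 2
```

A 94% relative reduction on this count therefore means the model makes roughly one connectivity mistake for every sixteen or more made by the baselines.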
Major comments (3)
- [§4] §4 (RLVR reward formulation): the description of verifiable rewards appears to operate primarily on parsed textual constraints and simple area summations; it is unclear whether the reward explicitly penalizes 2D polygon intersections, non-aligned walls, or negative dimensions after coordinate placement. Without such geometric enforcement, high metric scores could coexist with physically invalid layouts, weakening the functional-validity claim.
- [§5] §5 (Experiments): the reported 94% relative Compatibility reduction lacks error bars, details on train/validation/test splits, baseline hyper-parameters, and exact reward implementations, preventing independent verification of the outperformance. The abstract and results tables must include these elements for the central empirical claim to be load-bearing.
- [§3.2] §3.2 (constraint adherence metrics): the metrics are defined on parsed text and area sums but do not appear to include topological checks (e.g., planarity of the dual graph or wall-adjacency consistency) in coordinate space; this risks overestimating validity relative to professional buildability criteria.
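The coordinate-space checks the first and third comments call for can be sketched as a standalone validator. A hypothetical example for axis-aligned rectangles, using a grid-snap test as a crude stand-in for wall alignment; the function name, grid size, and message formats are all illustrative assumptions:

```python
def geometric_violations(rooms, grid=0.5, eps=1e-9):
    """Collect physical-validity violations that text-level checks can miss:
    non-positive dimensions, pairwise overlaps, and walls off a grid.
    rooms maps name -> (x, y, w, h) rectangle."""
    issues = []
    for name, (x, y, w, h) in rooms.items():
        if w <= 0 or h <= 0:
            issues.append(f"{name}: non-positive dimension")
        for v in (x, y, x + w, y + h):   # every wall coordinate
            if abs(v / grid - round(v / grid)) > eps:
                issues.append(f"{name}: wall at {v} off the {grid} grid")
    names = list(rooms)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            x1, y1, w1, h1 = rooms[names[i]]
            x2, y2, w2, h2 = rooms[names[j]]
            ix = min(x1 + w1, x2 + w2) - max(x1, x2)
            iy = min(y1 + h1, y2 + h2) - max(y1, y2)
            if ix > eps and iy > eps:    # positive intersection area
                issues.append(f"{names[i]}/{names[j]}: overlap area {ix * iy:.2f}")
    return issues
```

If the paper's reward omits checks of this kind, a plan can score well on parsed-text metrics while being physically invalid, which is exactly the gap the major comments identify.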
Minor comments (2)
- [Figure 2] Figure 2 caption should explicitly state the coordinate system origin and scale used for the generated plans.
- [§2] The related-work section omits recent LLM-based layout papers that also incorporate numerical constraints; add citations for completeness.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback. We address each major comment point by point below and have prepared revisions to strengthen the manuscript where appropriate.
Point-by-point responses
-
Referee: [§4] §4 (RLVR reward formulation): the description of verifiable rewards appears to operate primarily on parsed textual constraints and simple area summations; it is unclear whether the reward explicitly penalizes 2D polygon intersections, non-aligned walls, or negative dimensions after coordinate placement. Without such geometric enforcement, high metric scores could coexist with physically invalid layouts, weakening the functional-validity claim.
Authors: We appreciate this observation. The RLVR reward does incorporate explicit geometric penalties: after coordinate placement, we compute polygon intersection areas to penalize overlaps, enforce non-negative dimensions, and apply alignment checks for walls. These components are part of the verifiable reward signal. To address the concern, we will expand §4 with a dedicated subsection detailing the geometric enforcement steps, including the intersection computation formula and alignment heuristics. revision: yes
-
Referee: [§5] §5 (Experiments): the reported 94% relative Compatibility reduction lacks error bars, details on train/validation/test splits, baseline hyper-parameters, and exact reward implementations, preventing independent verification of the outperformance. The abstract and results tables must include these elements for the central empirical claim to be load-bearing.
Authors: We agree that these details are required for reproducibility. In the revised manuscript we will: (1) add standard error bars to all metrics in Table 2 and the abstract; (2) explicitly state the 70/15/15 train/validation/test split on the RPLAN dataset; (3) report hyper-parameters for all baselines (including learning rates, batch sizes, and RL-specific settings); and (4) provide the exact reward equations and implementation pseudocode in §5 and the appendix. revision: yes
-
Referee: [§3.2] §3.2 (constraint adherence metrics): the metrics are defined on parsed text and area sums but do not appear to include topological checks (e.g., planarity of the dual graph or wall-adjacency consistency) in coordinate space; this risks overestimating validity relative to professional buildability criteria.
Authors: The current metrics already derive a dual graph from the textual adjacency list and verify connectivity and planarity at the graph level. However, we acknowledge that coordinate-space wall-adjacency consistency checks would provide a stronger link to buildability. We will partially revise §3.2 to add these coordinate-based checks as supplementary metrics and include a brief discussion of their relation to professional criteria. revision: partial
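The dual-graph checks discussed in this exchange can be sketched at the graph level. A minimal version, assuming rooms and adjacencies arrive as plain lists; the planarity test here is only the necessary edge-count condition, not the full algorithm (e.g. Boyer-Myrvold) a real implementation would use:

```python
from collections import deque

def dual_graph_checks(rooms, edges):
    """Graph-level checks on a floor plan's dual graph (rooms = nodes,
    adjacencies = edges): connectivity via BFS, plus the necessary
    planarity condition |E| <= 3|V| - 6."""
    adj = {r: set() for r in rooms}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # BFS from an arbitrary room; connected iff every room is reached.
    seen, queue = {rooms[0]}, deque([rooms[0]])
    while queue:
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    connected = len(seen) == len(rooms)

    n, m = len(rooms), len({frozenset(e) for e in edges})
    maybe_planar = n < 3 or m <= 3 * n - 6
    return connected, maybe_planar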
Circularity Check
No significant circularity; claims rest on empirical comparisons
Full rationale
The paper fine-tunes an LLM on real plans then applies RL with explicitly designed verifiable rewards to enforce topological and numerical constraints, followed by custom adherence metrics and direct comparison to existing methods on Realism, Compatibility, and Diversity. No derivation step reduces a claimed prediction or result to its own inputs by construction, no fitted parameter is relabeled as a prediction, and no load-bearing uniqueness theorem or ansatz is imported via self-citation. The 94% relative reduction is a comparative empirical outcome against baselines rather than a tautological re-expression of the reward definition. The approach is self-contained through external benchmarks and verifiable reward functions.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: fine-tuning an LLM on real floor plans produces a useful base model for subsequent constraint-guided generation.
Reference graph
Works this paper leans on
- [1] Treasure hunt: Real-time targeting of the long tail using training-time markers. arXiv preprint arXiv:2506.14702.
- [2] Architext: Language-driven generative architecture design. arXiv preprint arXiv:2303.07519. Theodoros Galanos, Antonios Liapis, and Georgios N. Yannakakis.
- [3] The Llama 3 herd of models. arXiv preprint arXiv:2407.21783. Aaron Grattafiori, Abhimanyu Dubey, et al.
- [4] LoRA: Low-rank adaptation of large language models. Preprint, arXiv:2106.09685. Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen.
- [5] Efficient memory management for large language model serving with PagedAttention. arXiv preprint arXiv:2309.06180. Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica.
- [6] A constraint based generative system for floor layouts. In Proceedings of the Fifth Conference on Computer Aided Architectural Design Research in Asia (CAADRIA 2000), pages 441–450, Singapore. Centre for Advanced Studies in Architecture (CASA).
- [7] A constrained growth method for procedural floor plan generation. In Proceedings of GAME-ON 2010, Leicester, United Kingdom. Ricardo Lopes, Tim Tutenel, Ruben M. Smelik, Klaas Jan de Kraker, and Rafael Bidarra.
- [8] House-GAN: Relational generative adversarial networks for graph-constrained house layout generation. In Computer Vision – ECCV 2020, Proceedings, Part I, pages 162–177. Springer. Nelson Nauata, Sepidehsadat Hosseini, Kai-Hung Chang, Hang Chu, Chin-Yi Cheng, and Yasutaka Furukawa.
- [9] PICARD: Parsing incrementally for constrained auto-regressive decoding from language models. In Proceedings of EMNLP 2021, pages 9895–9901. Association for Computational Linguistics. Torsten Scholak, Nathan Schucher, and Dzmitry Bahdanau.
- [10] Proximal policy optimization algorithms. Preprint, arXiv:1707.06347. John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov.
- [11] DeepSeekMath: Pushing the limits of mathematical reasoning in open language models. Preprint, arXiv:2402.03300. Zhihong Shao et al.
- [12] AnyHome: Open-vocabulary generation of structured and textured 3D homes. arXiv preprint arXiv:2312.06644.
- [13] TRANX: A transition-based neural abstract syntax parser for semantic parsing and code generation. In Proceedings of EMNLP 2018 (System Demonstrations), pages 7–12, Brussels, Belgium. Association for Computational Linguistics. Pengcheng Yin and Graham Neubig.