Architect-Ant: Editable Automatic Furnishing of Architectural Floor Plans

Aleksandar Cvejic; Fedor Rodionov; John Femiani; Michael Birsak; Peter Wonka

arxiv: 2606.10953 · v1 · pith:HDCOYNMVnew · submitted 2026-06-09 · 💻 cs.AI · cs.CV


Fedor Rodionov, Aleksandar Cvejic, Michael Birsak, John Femiani, Peter Wonka

Pith reviewed 2026-06-27 12:55 UTC · model grok-4.3

classification 💻 cs.AI cs.CV

keywords: automatic floor plan furnishing · vision-language model · architectural floor plans · domain-specific language · procedural reasoning traces · preference optimization · interior layout generation · AntPlan-270 dataset


The pith

A fine-tuned vision-language model with a coordinate DSL and procedural reasoning traces generates editable, constraint-respecting furniture layouts for architectural floor plans.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies the absence of large annotated datasets as the main barrier to automatic furniture arrangement and responds by releasing AntPlan-270, a set of 270 professionally annotated residential floor plans. It then presents Architect-Ant, which encodes layouts in a compact coordinate-based DSL and trains a vision-language model on procedurally generated reasoning traces that encode wall alignment, clearance, circulation, and fixture rules. Preference optimization is applied to candidate placements to raise layout quality, after which the symbolic output can be edited or rasterized to condition an image renderer. The authors report that the resulting layouts are geometrically valid and functionally plausible, and suggest that the approach offers a route to furnishing much larger structure-only plan collections.

Core claim

Architect-Ant is an editable automatic furnishing framework powered by a fine-tuned vision-language model. Furniture layouts are represented using a compact, coordinate-based domain-specific language that encodes object categories and placements relative to room geometry. Procedural reasoning traces that capture architectural constraints such as wall alignment, door and window clearance, circulation, fixture compatibility, and room-specific inventories are generated to supervise fine-tuning, after which preference optimization further refines placement quality. The DSL output can be rasterized into semantic masks to condition a Flux-based LoRA renderer while remaining directly editable.
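The summary does not give the DSL's concrete syntax. As a minimal sketch only, assume one object per line, written as a category name followed by an axis-aligned box `x y w h` normalized to the room's bounding box. The format, the `Placement` class, and both function names are hypothetical and are not Architect-Ant's grammar. What the sketch illustrates is the editability claim: moving an object is a one-number change to text the model emitted.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """One furniture object: a category plus an axis-aligned box.

    Coordinates are assumed normalized to the room's bounding box:
    (0, 0) is the top-left corner and 1.0 spans the full width or depth.
    """
    category: str
    x: float
    y: float
    w: float
    h: float


def parse_layout(text: str) -> list[Placement]:
    """Parse the hypothetical one-object-per-line format 'category x y w h'."""
    placements = []
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        category, *coords = line.split()
        if len(coords) != 4:
            raise ValueError(f"expected 4 coordinates, got {len(coords)}: {line!r}")
        x, y, w, h = map(float, coords)
        placements.append(Placement(category, x, y, w, h))
    return placements


def serialize_layout(placements: list[Placement]) -> str:
    """Inverse of parse_layout; two decimals keep the text compact."""
    return "\n".join(
        f"{p.category} {p.x:.2f} {p.y:.2f} {p.w:.2f} {p.h:.2f}" for p in placements
    )


if __name__ == "__main__":
    layout = parse_layout("bed 0.10 0.05 0.45 0.55\nwardrobe 0.70 0.00 0.30 0.15")
    layout[0].x = 0.20  # editing the layout is a plain coordinate change
    print(serialize_layout(layout))
```

A real grammar would also need to reference walls, doors, and windows; this sketch keeps only the box-per-object core.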

What carries the argument

The compact coordinate-based domain-specific language (DSL) that encodes object categories and placements relative to room geometry, together with procedural reasoning traces that encode architectural constraints for model supervision.
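The summary does not show what a reasoning trace looks like. A minimal sketch, assuming traces are plain-text steps derived from geometry by fixed rules: for each object, report the nearest wall and whether it is aligned, then check it against a door clearance zone. `Box`, `make_trace`, the metre units, the rectangular room, the rectangular door zone, and the 5 cm snap tolerance are all hypothetical choices, not the paper's trace generator.

```python
from dataclasses import dataclass


@dataclass
class Box:
    name: str
    x: float  # left edge in metres
    y: float  # top edge in metres
    w: float
    h: float


def wall_gaps(b: Box, room_w: float, room_d: float) -> dict[str, float]:
    """Distance from each side of a box to the matching wall of a rectangular room."""
    return {
        "west": b.x,
        "east": room_w - (b.x + b.w),
        "north": b.y,
        "south": room_d - (b.y + b.h),
    }


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap with positive area."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h


def make_trace(room_w: float, room_d: float, objects: list[Box],
               door_zone: Box, snap_tol: float = 0.05) -> list[str]:
    """Emit rule-based reasoning steps: nearest wall, alignment, door clearance.

    snap_tol (metres) is an assumed tolerance for calling an object wall-aligned.
    """
    steps = []
    for obj in objects:
        wall, gap = min(wall_gaps(obj, room_w, room_d).items(), key=lambda kv: kv[1])
        status = "aligned" if gap <= snap_tol else "not aligned"
        steps.append(f"{obj.name}: nearest wall {wall} (gap {gap:.2f} m), {status}")
        if intersects(obj, door_zone):
            steps.append(f"{obj.name}: intrudes on door clearance zone -> violation")
        else:
            steps.append(f"{obj.name}: door clearance kept")
    return steps
```

Because every step is computed from the annotated geometry, the trace is cheap to generate at scale; the catch, raised later in the referee report, is that it can only encode the rules someone wrote down.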

If this is right

The symbolic DSL layouts remain directly editable after generation, supporting iterative design workflows.
Rasterization of the DSL into semantic masks allows conditioning of a separate image model to produce realistic blueprint-style furnished plans.
The approach provides a scalable route to annotate and furnish large existing collections of structure-only floor plans.
Preference optimization over candidate placements improves functional plausibility beyond the initial supervised outputs.
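Figure 9's caption identifies the preference step as DPO ("strict-pair and synthetic-pair DPO"). For reference, the standard DPO objective scores a preferred and a rejected output by how much the trained policy raises each one's log-likelihood relative to a frozen reference model. The per-pair sketch below is the generic loss, not the paper's training code; the inputs are summed log-probabilities of whole serialized layouts, and `beta = 0.1` is an assumed default.

```python
import math


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a complete output
    (here, a serialized layout) under the trained policy or the frozen
    reference model. beta = 0.1 is an assumed default.
    """
    # How much more the policy favours each output than the reference does.
    z = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # loss = -log(sigmoid(z)) = softplus(-z), computed without overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When policy and reference agree the loss is log 2; it falls as the policy shifts probability toward the preferred layout and away from the rejected one.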

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same trace-generation and preference steps could be adapted to non-residential building types if new room-category inventories are supplied.
Because the DSL is coordinate-based and relative to walls, it could serve as an intermediate representation for exporting layouts into CAD or 3D modeling tools.
The separation between the symbolic layout stage and the image renderer stage allows independent improvement of either component without retraining the other.
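The hand-off between the two stages can be pictured as a rasterization step: the symbolic layout is painted into a class-id grid, and only that grid reaches the renderer. A minimal sketch, assuming normalized `(category, x, y, w, h)` boxes, a caller-supplied category-to-id mapping, 0 as background, and later objects overwriting earlier ones; the paper's actual mask format is not described in this summary.

```python
def rasterize(placements, class_ids, width, height):
    """Paint normalized boxes into a height x width grid of class ids (0 = background).

    placements: iterable of (category, x, y, w, h) with coordinates in [0, 1].
    class_ids: mapping from category name to a positive integer id (assumed).
    Later objects overwrite earlier ones where they overlap.
    """
    mask = [[0] * width for _ in range(height)]
    for category, x, y, w, h in placements:
        cid = class_ids[category]
        # Convert normalized edges to pixel indices, clamped to the grid.
        c0 = max(0, int(round(x * width)))
        c1 = min(width, int(round((x + w) * width)))
        r0 = max(0, int(round(y * height)))
        r1 = min(height, int(round((y + h) * height)))
        for r in range(r0, r1):
            for c in range(c0, c1):
                mask[r][c] = cid
    return mask
```

Since the renderer sees only the mask, either side can change independently as long as the mask convention stays fixed.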

Load-bearing premise

The procedurally generated reasoning traces accurately encode every relevant architectural constraint without introducing biases absent from real professional designs.

What would settle it

A side-by-side evaluation of Architect-Ant outputs against a held-out collection of human-designed professional floor plans, measuring the rate of violations in wall alignment, circulation clearance, and fixture compatibility.
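A minimal sketch of such a violation-rate measurement, under stated assumptions: rooms are axis-aligned rectangles in metres, layouts are lists of `(category, (x, y, w, h))` boxes, and three geometric rules stand in for the full check. The set of wall-hugging categories and the 5 cm tolerance are hypothetical. Circulation clearance and fixture compatibility need door and fixture geometry and would be added as further rules in the same shape. Running the same function on held-out human layouts gives the reference rate to compare against.

```python
from itertools import combinations


def overlap_area(a, b):
    """Intersection area of two (x, y, w, h) boxes."""
    ix = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    iy = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(0.0, ix) * max(0.0, iy)


def min_wall_gap(box, room_w, room_d):
    """Smallest distance from a box to any wall of a rectangular room."""
    x, y, w, h = box
    return min(x, y, room_w - (x + w), room_d - (y + h))


def violations(layout, room_w, room_d,
               wall_cats=frozenset({"bed", "wardrobe", "sofa"}), tol=0.05):
    """Return the set of rule names a layout violates (wall_cats, tol assumed)."""
    broken = set()
    for (_, a), (_, b) in combinations(layout, 2):
        if overlap_area(a, b) > 1e-9:
            broken.add("overlap")
    for cat, box in layout:
        if cat in wall_cats and min_wall_gap(box, room_w, room_d) > tol:
            broken.add("wall_alignment")
        x, y, w, h = box
        if x < 0 or y < 0 or x + w > room_w or y + h > room_d:
            broken.add("out_of_room")
    return broken


def violation_rates(layouts):
    """layouts: list of (layout, room_w, room_d). Fraction violating each rule."""
    counts = {"overlap": 0, "wall_alignment": 0, "out_of_room": 0}
    for layout, room_w, room_d in layouts:
        for rule in violations(layout, room_w, room_d):
            counts[rule] += 1
    n = max(1, len(layouts))
    return {rule: c / n for rule, c in counts.items()}
```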

Figures

Figures reproduced from arXiv: 2606.10953 by Aleksandar Cvejic, Fedor Rodionov, John Femiani, Michael Birsak, Peter Wonka.

**Figure 1.** Architect-Ant turns empty structured floor plans (left) into multiple plausible furnished, blueprint-style renderings (right, …

**Figure 2.** Build-time pipeline (data preparation and training). Raw floor plans are processed by RT-DETR-X into per-room structural primitives and furniture …

**Figure 3.** Run-time pipeline (inference and rendering). Using the per-room …

**Figure 4.** Rule-score examples for variants of the same bedroom layout. Scores start from a base value of …

**Figure 5.** Representative per-room qualitative comparison in the schematic DSL view. Each row corresponds to a room type: bedroom, kitchen, bathroom, and …

**Figure 6.** Examples of six original architectural floor plan drawings from the AntPlan-270 dataset.

**Figure 7.** Representative full-floor-plan qualitative comparison on CubiCasa5K. Each example shows the input floor plan, the extracted structural input, and …

**Figure 8.** Variations within Architect-Ant produced by our final model. Four examples of different floor plans, each showing the structural input, the generated …

**Figure 9.** Bedroom examples comparing strict-pair and synthetic-pair DPO on CubiCasa rooms. The examples illustrate that higher rule score does not always …

**Figure 10.** Representative kitchen judge-calibration cases. The examples illustrate both a likely judge failure on dense but valid kitchen pairings and a correct …
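Figures 4 and 9 tie two ideas together: layouts receive a rule score that starts from a base value and loses points for violations, and DPO training pairs are built from such scores. A minimal sketch of one way to do that, assuming each candidate layout for a room comes with a dictionary of rule penalties; the base value (truncated in the Figure 4 caption) is left as a required parameter, and `min_margin` is a hypothetical filter for near-ties. This is not the paper's strict-pair or synthetic-pair procedure.

```python
def rule_score(penalties: dict[str, float], base: float) -> float:
    """Score = base value minus the summed rule penalties."""
    return base - sum(penalties.values())


def make_pair(candidates, base, min_margin):
    """Return (chosen, rejected) layout texts, or None if there is no clear winner.

    candidates: list of (layout_text, penalties) for one room.
    min_margin: assumed threshold that drops near-ties, whose labels are noisy.
    """
    if len(candidates) < 2:
        return None
    scored = sorted(
        ((rule_score(pen, base), text) for text, pen in candidates), reverse=True
    )
    (best_score, best), (worst_score, worst) = scored[0], scored[-1]
    if best_score - worst_score < min_margin:
        return None
    return best, worst
```

Figure 9's caption cautions that a higher rule score does not always mean a better layout, so a margin filter reduces label noise but cannot remove it.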

Original abstract

Furnished floor plans are fundamental to real estate visualization, interior design, and architectural workflows. However, progress in automatic furniture arrangement has been limited by the lack of real, professionally designed floor-plan datasets with object-level furniture annotations. To address this gap, we introduce AntPlan-270, a curated dataset of 270 architectural floor plans with per-room furniture bounding box annotations across ten residential room categories. Building on this dataset, we present Architect-Ant, an editable automatic furnishing framework powered by a fine-tuned vision-language model. Furniture layouts are represented using a compact, coordinate-based domain-specific language (DSL) that encodes object categories and placements relative to the room geometry. To improve spatial reasoning, we generate procedural reasoning traces that capture architectural constraints such as wall alignment, door and window clearance, circulation, fixture compatibility, and room-specific furniture inventories, and use them to supervise fine-tuning of the model. We then apply preference optimization over candidate object placements to further refine layout quality. The generated DSL can be rasterized into semantic masks and used to condition a Flux-based LoRA renderer, producing realistic blueprint-style furnished floor-plan images while preserving the editable symbolic layout. Experiments on layout furnishing show that Architect-Ant produces geometrically valid and functionally plausible layouts, and suggest a scalable path for furnishing larger structure-only floor-plan datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

New dataset of 270 annotated plans plus a DSL-based VLM pipeline for editable furnishing, but the evaluation section gives no metrics or external checks.


The paper's main point is a new dataset, AntPlan-270, with per-room furniture bounding boxes on 270 real architectural plans, plus the Architect-Ant system that fine-tunes a vision-language model to output layouts in a compact coordinate DSL.

They represent furniture placements relative to walls and openings, generate procedural reasoning traces that list constraints like clearances and circulation, use those traces for supervision, run preference optimization on candidates, and finally rasterize the DSL into semantic masks that condition a Flux LoRA renderer. The DSL keeps the output editable, and the pipeline covers every stage from structural input to rendered image.

The dataset itself is useful. Annotated floor plans with object-level furniture are still rare, so releasing 270 curated examples across ten room types is a concrete step that others can build on. The DSL choice is practical for editing and avoids pixel-level messiness. Generating synthetic traces to teach spatial rules is a reasonable way to create training signal without needing thousands of human annotations.

The soft spot is the lack of evidence on whether the outputs actually work. The abstract states that the layouts are geometrically valid and functionally plausible, yet supplies no quantitative scores, no baseline comparisons, and no description of how validity was tested. If the procedural traces omit real constraints such as code minima or ergonomic preferences that are not purely geometric, the model can satisfy the synthetic objective while producing layouts that professionals would reject. There is no mention of expert review or held-out real designs to check this.

This is for groups working on interior visualization tools or automatic furnishing pipelines. Readers who need a starting dataset or an editable symbolic representation will find something usable here.

It should go to peer review. The data release and the DSL-plus-traces approach are concrete enough to merit referee time, even if the current results section needs substantial strengthening on measurement and validation.

Referee Report

2 major / 2 minor

Summary. The paper introduces AntPlan-270, a dataset of 270 annotated residential floor plans, and Architect-Ant, a framework that represents layouts via a coordinate-based DSL, generates procedural reasoning traces encoding constraints such as wall alignment and circulation to supervise VLM fine-tuning, applies preference optimization, and renders outputs with a Flux LoRA. It claims that the resulting layouts are geometrically valid and functionally plausible, and that the approach suggests a scalable path to furnishing larger structure-only datasets.

Significance. If the central claims hold, the work supplies a missing annotated dataset and a practical pipeline for automatic furnishing that preserves editability through the DSL while producing renderable outputs. The procedural-trace supervision and preference-optimization steps are concrete technical contributions that could be adopted by others working on spatial reasoning for floor plans.

major comments (2)

[Abstract / Experiments] Abstract and Experiments section: the headline claim that Architect-Ant 'produces geometrically valid and functionally plausible layouts' is unsupported by any quantitative metrics, baseline comparisons, or description of how validity/plausibility were measured or scored. Without these, it is impossible to determine whether the outputs improve on prior methods or merely satisfy the authors' own synthetic objective.
[Method (procedural reasoning traces)] Method section on procedural reasoning traces: the central assumption that the generated traces faithfully encode all relevant architectural constraints (wall alignment, circulation minima, fixture compatibility, room-specific inventories) is not externally validated against real professional designs, code-compliance checkers, or held-out expert layouts. This leaves the functional-plausibility claim dependent on an untested mapping from synthetic supervision to real-world acceptability.

minor comments (2)

[Dataset] The paper should clarify the exact size and split of AntPlan-270 (training/validation/test) and whether any overlap exists with the procedural-trace generation process.
[DSL definition] Notation for the DSL should be formalized with a grammar or BNF in an appendix so that reproducibility of the coordinate-based representation is unambiguous.
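On the split question: rooms cut from the same floor plan share a drafting style and often a furniture vocabulary, so splitting at the room level can leak information into the test set. A minimal sketch of a plan-level split, assuming each room record carries its parent plan ID; the 80/10/10 fractions, the ID format, and the function names are hypothetical. Hashing the ID makes the assignment deterministic without storing a split file.

```python
import hashlib


def split_of(plan_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically assign a whole floor plan to train/val/test by hashing its ID."""
    digest = hashlib.sha256(plan_id.encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


def split_rooms(rooms):
    """rooms: iterable of (plan_id, room_id). Every room inherits its plan's split."""
    out = {"train": [], "val": [], "test": []}
    for plan_id, room_id in rooms:
        out[split_of(plan_id)].append((plan_id, room_id))
    return out
```

With only 270 plans the realized fractions will drift a few points from the targets; a sorted-hash cut at exact counts fixes that if exact sizes matter.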

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on evaluation and validation. We address each major comment below and will revise the manuscript accordingly where feasible.


Referee: [Abstract / Experiments] Abstract and Experiments section: the headline claim that Architect-Ant 'produces geometrically valid and functionally plausible layouts' is unsupported by any quantitative metrics, baseline comparisons, or description of how validity/plausibility were measured or scored. Without these, it is impossible to determine whether the outputs improve on prior methods or merely satisfy the authors' own synthetic objective.

Authors: We agree the current manuscript does not report quantitative metrics, baseline comparisons, or explicit scoring procedures for geometric validity and functional plausibility. The experiments section focuses on qualitative demonstration of the DSL outputs and rendering pipeline. In revision we will add a dedicated evaluation subsection with concrete metrics (e.g., wall-alignment error, minimum circulation clearance, object-overlap ratios) computed on held-out AntPlan-270 rooms, plus comparison against a simple rule-based baseline. This will directly support or qualify the headline claim. revision: yes
Referee: [Method (procedural reasoning traces)] Method section on procedural reasoning traces: the central assumption that the generated traces faithfully encode all relevant architectural constraints (wall alignment, circulation minima, fixture compatibility, room-specific inventories) is not externally validated against real professional designs, code-compliance checkers, or held-out expert layouts. This leaves the functional-plausibility claim dependent on an untested mapping from synthetic supervision to real-world acceptability.

Authors: The traces are procedurally derived from the AntPlan-270 annotations and standard architectural heuristics (wall proximity, clearance rules, room-type inventories). We will expand the method section to state these sources explicitly and add a limitations paragraph acknowledging the absence of external validation against professional code checkers or expert layouts. Full external validation would require new data collection and is noted as future work rather than a claim of the present study. revision: partial

Circularity Check

0 steps flagged

No circularity; new dataset and experimental evaluation are self-contained.


The paper introduces AntPlan-270 as a new curated dataset and generates procedural reasoning traces to supervise VLM fine-tuning, followed by preference optimization and rendering. The central claim of geometrically valid and functionally plausible layouts rests on experiments performed on this newly introduced dataset rather than on any fitted parameter renamed as a prediction or on a self-citation chain. No load-bearing step reduces by construction to the authors' prior outputs or definitions, so the derivation chain shows none of the patterns that would trigger a positive circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that a vision-language model can be reliably supervised by procedurally generated reasoning traces that capture domain constraints; the DSL representation and preference optimization step are additional modeling choices whose correctness is not independently verified in the provided text.

axioms (1)

domain assumption: Procedural reasoning traces accurately encode architectural constraints such as wall alignment, door clearance, and room-specific furniture inventories.
Invoked in the abstract as the supervision signal for fine-tuning.

invented entities (1)

Coordinate-based domain-specific language (DSL) for furniture layouts · no independent evidence
purpose: Compact symbolic representation of object categories and placements relative to room geometry
New representation introduced to make layouts editable and rasterizable; no independent evidence of prior use in the cited literature.

pith-pipeline@v0.9.1-grok · 5773 in / 1323 out tokens · 17998 ms · 2026-06-27T12:55:00.945573+00:00 · methodology
