arxiv: 2605.02537 · v1 · submitted 2026-05-04 · 💻 cs.RO · cs.AI

Orchestrating Spatial Semantics via a Zone-Graph Paradigm for Intricate Indoor Scene Generation

Meisheng Zhang, Shizhao Sun, Yang Zhao, Ziyuan Liu, Zhijun Gao, Jiang Bian

Pith reviewed 2026-05-08 18:25 UTC · model grok-4.3

classification 💻 cs.RO cs.AI

keywords indoor scene generation · zone graph · spatial semantics · 3D synthesis · reinforcement learning · scene orchestration · topological constraints

The pith

ZoneMaestro converts high-level semantic intent into functional zones and topological constraints to generate coherent intricate indoor scenes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Autonomous 3D indoor scene synthesis struggles in non-convex rooms because data-driven methods lack topological planning and iterative agents fragment semantics or violate geometry. ZoneMaestro shifts the approach to zone-graph orchestration, translating user intent directly into a graph of functional zones and constraints that adapt to diverse room shapes. The method pairs this with a new annotated dataset and an alternating alignment process using Zone-Aware Group Relative Policy Optimization to balance semantic detail against geometric validity without external simulators. Experiments show it improves structural coherence and intent adherence over prior baselines on complex layouts.

Core claim

By internalizing zone-based logic, ZoneMaestro translates high-level semantic intent into functional zones and topological constraints via Zone-Graph Orchestration. This is supported by the Zone-Scene-10K dataset with explicit annotations and an Alternating Alignment Strategy that alternates reasoning internalization with Z-GRPO to reconcile semantic richness and geometric validity without physics engines or post-hoc fixes.

What carries the argument

The Zone-Graph, a representation of functional zones connected by topological constraints that guides the synthesis process from intent to layout.

If this is right

Generation succeeds in non-convex rooms with tight spatial relations where prior methods fragment or collide.
Semantic adherence and structural coherence both improve without needing external physics validation.
A new evaluation task of Intricate Spatial Orchestration is defined along with the SCALE benchmark for irregular scenarios.
The framework supports adaptation to varied architectural forms through internalized zone logic.
No reliance on post-processing fixes is required to achieve valid outputs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The zone abstraction could extend to other spatial tasks such as robot path planning in cluttered environments.
It may reduce the need for full simulation loops in generative models by encoding constraints at the zone level.
Human designers could provide intent at the zone level rather than object-by-object for more controllable outputs.

Load-bearing premise

High-level semantic intent can be reliably converted into functional zones and topological constraints through zone-graph logic while preserving geometric validity.

What would settle it

A scene generation test on an irregular non-convex room with dense overlapping functional requirements where the output layout violates physical placement rules or ignores stated user intent.
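One way to make this test concrete is a purely geometric check on the generated layout. The sketch below is illustrative and not taken from the paper: it assumes a 2D floor plan given as a simple polygon and furniture reduced to axis-aligned boxes, and the names `point_in_polygon`, `box_inside_room`, and `layout_violations` are hypothetical. It flags boxes that leave the room, including boxes that bridge a notch in a non-convex boundary, and pairs of boxes that overlap.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y), an assumed simplification


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Ray-casting test; valid for non-convex simple polygons."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def _orient(a: Point, b: Point, c: Point) -> float:
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segments ab and cd properly cross (touching is not counted)."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def box_inside_room(box: Box, room: List[Point]) -> bool:
    x0, y0, x1, y1 = box
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    if not all(point_in_polygon(c, room) for c in corners):
        return False
    # All corners inside is not enough in a non-convex room: also reject
    # boxes whose sides cross a wall.
    n = len(room)
    for i in range(4):
        for j in range(n):
            if segments_cross(corners[i], corners[(i + 1) % 4],
                              room[j], room[(j + 1) % n]):
                return False
    return True


def boxes_overlap(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def layout_violations(room: List[Point], boxes: List[Box]) -> list:
    """Return ("outside", i) and ("overlap", i, j) records for a layout."""
    out = [("outside", i) for i, b in enumerate(boxes)
           if not box_inside_room(b, room)]
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes_overlap(boxes[i], boxes[j]):
                out.append(("overlap", i, j))
    return out


# L-shaped room with the top-right quadrant cut away.
l_room = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
bed, desk, sofa = (0.5, 0.5, 1.5, 1.5), (2.5, 2.5, 3.5, 3.5), (1.0, 1.0, 1.8, 1.8)
violations = layout_violations(l_room, [bed, desk, sofa])
# violations == [("outside", 1), ("overlap", 0, 2)]
```

The desk sits in the cut-out corner, so it is reported as outside even though it lies within the room's bounding rectangle, which is the case a convex-only check would miss. Boxes that touch a wall exactly are a boundary case this sketch does not handle.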

Figures

Figures reproduced from arXiv: 2605.02537 by Jiang Bian, Meisheng Zhang, Shizhao Sun, Yang Zhao, Zhijun Gao, Ziyuan Liu.

**Figure 1.** Overview of ZoneMaestro. Left: Existing methods suffer from geometric inflexibility and spatial fragmentation. Middle: Our Zone-Graph framework decomposes the task into compositional reasoning and spatial orchestration. Right: ZoneMaestro achieves superior coherence and density on the SCALE benchmark.

**Figure 2.** The Zone-Graph Paradigm. Our framework decomposes complex scene generation into four stages: (a) Zone Inventory, (b) Intra-Zone Spatial Graph, (c) Global Topology, and (d) Architecture Derivation.

**Figure 3.** Qualitative comparisons. Rows 1–2: Zone-Scene-10K Test Set. Rows 3–5: SCALE Benchmark featuring complex non-convex geometries and dense asset arrangements. Highlighted instruction phrases often correspond to regions where layouts fail. Agentic frameworks that fail to manage high-density intersections are marked by ✗. ZoneMaestro follows fine-grained constraints while maintaining valid physical arrangements…

**Figure 4.** The SCALE Benchmark Construction Pipeline. We employ a parametric topology sampler to generate diverse non-convex boundaries, followed by a persona-driven inverse semantics engine to synthesize multi-granular instructions.

**Figure 5.** Additional qualitative comparisons with baselines on the Zone-Scene-10K Test Set (not shown in …

**Figure 6.** Additional qualitative comparisons with baselines on the SCALE Benchmark (not shown in …

Original abstract

Autonomous 3D indoor scene synthesis breaks down in non-convex rooms with tightly coupled spatial constraints. Data-driven generators lack topological priors for long-horizon planning, while iterative agents fragment semantics and become geometrically brittle. We present ZoneMaestro, a unified framework that shifts the paradigm from object-centric synthesis to Zone-Graph Orchestration. By internalizing a novel zone-based logic, ZoneMaestro translates high-level semantic intent into functional zones and topological constraints, enabling robust adaptation to diverse architectural forms. To support this, we construct Zone-Scene-10K, a large-scale dataset enriched with explicit Zone-Graph annotations. We further introduce an Alternating Alignment Strategy that cycles between reasoning internalization and Zone-Aware Group Relative Policy Optimization (Z-GRPO), effectively reconciling the tension between semantic richness and geometric validity without relying on external physics engines. To rigorously evaluate spatial intelligence beyond convex primitives, we formally define the task of Intricate Spatial Orchestration and release SCALE, a stress-test benchmark for irregular indoor scenarios with complex, dense spatial relations. Extensive experiments demonstrate that ZoneMaestro resolves the density-safety dichotomy, significantly outperforming state-of-the-art baselines in both structural coherence and intent adherence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

ZoneMaestro's zone-graph shift targets a real gap in handling non-convex indoor scenes, but the outperformance claims rest on details about constraint construction and optimization that the abstract does not show.

The paper's core move is to replace object-centric placement with a zone-graph that encodes functional areas and topological relations upfront. This directly tackles the breakdown in non-convex rooms where prior generators either ignore long-range constraints or fragment semantics during iteration. That framing is useful and matches known pain points in robotics navigation and virtual scene building.

The new pieces are the Zone-Scene-10K dataset carrying explicit zone annotations, the Alternating Alignment Strategy that interleaves reasoning with Z-GRPO, the SCALE benchmark for dense irregular layouts, and the formal task definition of Intricate Spatial Orchestration. These are distinct from the object-centric baselines cited and give the work something concrete to stand on. The approach also tries to reconcile semantic richness with geometric validity without external physics engines, which is a practical direction worth testing.

The soft spot is that the abstract supplies no equations for zone-graph construction, overlap resolution, or the Z-GRPO objective, and no ablations or error breakdowns appear in the provided description. Without those, it is hard to tell whether the reported gains in structural coherence and intent adherence come from the paradigm itself or from careful curation of the training scenes and implicit validity checks. The central assumption, that high-level intent maps reliably into non-conflicting topological constraints, needs explicit verification on failure cases in non-convex rooms.

This work is for researchers in 3D scene synthesis and spatial planning who already know the limits of current generators. A reader looking for new datasets and benchmarks would find value even before the method is fully stress-tested. It deserves a serious referee because the problem is well-posed and the proposed paradigm differs from prior lines of work. I would send it to review with a request for the missing formal definitions and targeted ablations on the alternating strategy.

Referee Report

1 major / 2 minor

Summary. The manuscript presents ZoneMaestro, a framework for 3D indoor scene synthesis that replaces object-centric generation with Zone-Graph Orchestration. High-level semantic intent is translated into functional zones and topological constraints via a novel zone-based logic; this is supported by the new Zone-Scene-10K dataset containing explicit Zone-Graph annotations. An Alternating Alignment Strategy alternates between reasoning internalization and Zone-Aware Group Relative Policy Optimization (Z-GRPO) to balance semantic richness against geometric validity without external physics engines. The authors define the Intricate Spatial Orchestration task and release the SCALE benchmark for irregular, dense indoor scenarios. Experiments claim that ZoneMaestro resolves the density-safety dichotomy and significantly outperforms state-of-the-art baselines on structural coherence and intent adherence.

Significance. If the central claims hold, the work would advance autonomous 3D scene synthesis by supplying an explicit topological prior that handles non-convex rooms and dense spatial relations. The release of Zone-Scene-10K and the SCALE benchmark constitutes a concrete contribution that could enable more rigorous future evaluation beyond convex primitives. The absence of reliance on post-hoc physics or fixes is a notable design choice if empirically validated.

major comments (1)

The central claim that the zone-graph logic reliably encodes semantic intent into valid topological constraints (without hidden assumptions or post-hoc fixes) is load-bearing for the reported outperformance on SCALE. The abstract provides no formal definition of zone-graph construction, overlap resolution, or the precise Z-GRPO objective; if non-convex boundaries or zone overlaps introduce unhandled conflicts, the gains in structural coherence could arise from dataset curation rather than the paradigm itself.

minor comments (2)

The abstract introduces the phrase 'density-safety dichotomy' without a concise definition or citation; a one-sentence clarification would improve accessibility for readers outside the immediate sub-area.
The description of the Alternating Alignment Strategy would benefit from an explicit high-level pseudocode or diagram showing the cycle between reasoning internalization and Z-GRPO, even if full algorithmic details appear later.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on the clarity of our formalisms. We address the concern point-by-point below and will revise the manuscript to strengthen the presentation of definitions and robustness arguments.

Referee: The central claim that the zone-graph logic reliably encodes semantic intent into valid topological constraints (without hidden assumptions or post-hoc fixes) is load-bearing for the reported outperformance on SCALE. The abstract provides no formal definition of zone-graph construction, overlap resolution, or the precise Z-GRPO objective; if non-convex boundaries or zone overlaps introduce unhandled conflicts, the gains in structural coherence could arise from dataset curation rather than the paradigm itself.

Authors: We agree that the abstract is concise and omits explicit formal definitions. The full manuscript details zone-graph construction in Section 3 as a mapping from semantic intent to a graph where nodes are functional zones (with attributes for geometry and semantics) and edges encode topological relations (adjacency, containment, separation); non-convex rooms are handled by recursive decomposition into convex sub-zones with explicit boundary constraints.

Overlap resolution occurs via a constraint propagation step in the zone-based logic that enforces non-overlap by adjusting zone boundaries during internalization, without external physics. The Z-GRPO objective is defined in Section 4.2 as a policy optimization maximizing a zone-aware reward combining semantic adherence (intent matching score) and geometric validity (collision-free and boundary-respecting metrics), using group-relative baselines.

We acknowledge the value of a more formal presentation and will add a new subsection (3.1) with mathematical definitions of the zone-graph, overlap resolution rules (including pseudocode for conflict detection), and the Z-GRPO loss in the revision.

On the source of gains, SCALE explicitly includes irregular non-convex and dense scenarios; ablations isolating the zone-graph and alternating strategy (vs. baselines on identical data) show the paradigm's contribution to coherence, indicating it is not solely from curation.

revision: yes
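To make the rebuttal's description easier to picture, here is a minimal sketch of a zone graph with the three relation types it names, plus a simple conflict detector. It is an illustration under stated assumptions, not the authors' Section 3 definition: zone footprints are reduced to axis-aligned boxes, and `Zone`, `ZoneGraph`, `relate`, and `conflicts` are hypothetical names. A conflict is taken here to mean two zones whose footprints overlap without a containment edge between them.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y), an assumed simplification
RELATIONS = {"adjacency", "containment", "separation"}  # relation types named in the rebuttal


def _overlap(a: Box, b: Box) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


@dataclass
class Zone:
    name: str
    semantics: str  # e.g. "sleeping", "study"
    footprint: Box


@dataclass
class ZoneGraph:
    zones: Dict[str, Zone] = field(default_factory=dict)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_zone(self, zone: Zone) -> None:
        self.zones[zone.name] = zone

    def relate(self, a: str, relation: str, b: str) -> None:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if a not in self.zones or b not in self.zones:
            raise KeyError("edge references a zone that is not in the graph")
        self.edges.append((a, relation, b))

    def conflicts(self) -> List[Tuple[str, str]]:
        """Zone pairs that overlap without a containment edge (assumed rule)."""
        nested = {frozenset((a, b)) for a, rel, b in self.edges
                  if rel == "containment"}
        found = []
        for a, b in combinations(self.zones, 2):
            if frozenset((a, b)) in nested:
                continue
            if _overlap(self.zones[a].footprint, self.zones[b].footprint):
                found.append((a, b))
        return found


g = ZoneGraph()
g.add_zone(Zone("sleeping", "sleeping", (0.0, 0.0, 3.0, 3.0)))
g.add_zone(Zone("study", "study", (2.5, 0.0, 5.0, 2.0)))
g.add_zone(Zone("reading", "reading nook", (0.2, 0.2, 1.0, 1.0)))
g.relate("sleeping", "containment", "reading")
g.relate("sleeping", "adjacency", "study")
conflict_pairs = g.conflicts()
# conflict_pairs == [("sleeping", "study")]
```

The reading nook overlaps the sleeping zone but is declared as contained, so only the sleeping and study pair is reported. How the paper's constraint propagation then adjusts boundaries is not specified in the material reviewed here, so that step is left out.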

Circularity Check

0 steps flagged

No circularity detected; framework, dataset, and strategy are independently constructed and evaluated.

The provided abstract and description introduce ZoneMaestro as a new paradigm with Zone-Graph orchestration, a custom Zone-Scene-10K dataset with annotations, Alternating Alignment Strategy, Z-GRPO optimization, and SCALE benchmark. No equations, derivations, or load-bearing steps are described that reduce by construction to fitted inputs, self-definitions, or self-citations. Claims of outperformance rest on new components and external baselines rather than internal loops or renamed priors. The derivation chain is self-contained against the stated benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

Abstract-only review limits visibility into parameters and axioms; the approach appears to rest on domain assumptions about zone semantics and graph topology being sufficient proxies for spatial constraints, plus invented zone-graph representation.

axioms (1)

domain assumption: High-level semantic intent can be decomposed into functional zones and topological constraints without loss of essential spatial information.
Invoked in the description of translating intent into zones for robust adaptation to architectural forms.

invented entities (2)

Zone-Graph (no independent evidence)
purpose: Internal representation that encodes functional zones and topological constraints to guide scene generation
Novel zone-based logic introduced as the core paradigm shift from object-centric synthesis.

Z-GRPO (no independent evidence)
purpose: Zone-Aware Group Relative Policy Optimization for reconciling semantic and geometric objectives
Introduced as part of the Alternating Alignment Strategy.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Cost.FunctionalEquation / Foundation.AlphaCoordinateFixation · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "We use AdamW with learning rates 1e-5 for SFT and 5e-6 for Z-GRPO, with KL coefficient β=0.04. ... λ1 (Boundary) 1.0, λ2 (Zone) 0.5, λ3 (Collision) 2.0"

Foundation.AlexanderDuality · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "ZoneMaestro... reformulates layout synthesis via the Zone-Graph Paradigm. This approach enables semantic encapsulation and geometric adaptation to non-convex boundaries"
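The first quoted passage lists three weights (Boundary 1.0, Zone 0.5, Collision 2.0) but not how they enter the reward. As a hedged illustration only, the sketch below assumes the weights scale penalty terms subtracted from a semantic score, and pairs that with the standard group-relative advantage used in GRPO, where each sample's reward is normalized by the mean and standard deviation of its sampling group. The function names and the penalty form are assumptions, not the paper's Z-GRPO objective.

```python
from statistics import mean, pstdev
from typing import Dict, List

# Weights as quoted in the passage above; their role as penalty
# coefficients is an assumption made for this sketch.
LAMBDA = {"boundary": 1.0, "zone": 0.5, "collision": 2.0}


def zone_aware_reward(semantic_score: float, penalties: Dict[str, float]) -> float:
    """Assumed form: semantic score minus weighted geometric penalties."""
    weighted = sum(LAMBDA[k] * penalties.get(k, 0.0) for k in LAMBDA)
    return semantic_score - weighted


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations vary on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two hypothetical layouts sampled for the same instruction.
r_dense = zone_aware_reward(0.9, {"boundary": 0.1, "collision": 0.2})
r_sparse = zone_aware_reward(0.6, {})
advantages = group_relative_advantages([r_dense, r_sparse])
```

Under these assumed numbers the semantically richer layout ends up with the lower reward because its collision penalty is weighted most heavily, which is one way to read the tension between density and validity that the paper describes.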

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
