arxiv: 2604.20746 · v1 · submitted 2026-04-22 · 💻 cs.MM


Realistic Virtual Flood Experience System Using 360° Videos and 3D City Models Constructed from Building Footprints

Kiyoharu Aizawa, Koki Kawada, Masatoshi Denda, Mizuki Takenawa, Tatsuro Banno

Pith reviewed 2026-05-09 22:08 UTC · model grok-4.3

classification 💻 cs.MM

keywords virtual reality · flood simulation · 360 video · 3D modeling · disaster education · building footprints · risk communication


The pith

A virtual flood system using 360 videos and simple 3D models from building footprints helps users visualize location-specific evacuations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces a framework for creating realistic virtual flood experiences by combining 360-degree videos of real locations with 3D building models automatically generated from 2D footprints. Existing systems often use generic virtual cities that lack connection to actual places, reducing their effectiveness for personal risk awareness. The method extrudes building footprints to estimated heights and aligns the models with the videos to simulate flooding in photorealistic settings. Demonstrated in a flood-prone town in Japan, a study with local residents found the system improved their ability to imagine evacuation scenarios specific to their area.

Core claim

The proposed framework enables 3D flood visualization in photorealistic real-world environments by constructing 3D city models from widely available 2D building footprints and aligning them with 360° videos, without needing pre-existing detailed models like CityGML. In the Memuro demonstration, this allowed users to experience flooding in their actual surroundings.
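
As an illustration of the extrusion step, here is a minimal Python sketch, not the authors' implementation. It assumes a footprint given as a simple polygon in metric coordinates and a single assumed height (the Figure 3 caption notes that heights are set uniform). `extrude_footprint` and its parameters are hypothetical names introduced for this example.

```python
from typing import List, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]
Quad = Tuple[int, int, int, int]


def extrude_footprint(
    footprint: List[Point2D], height: float, ground_z: float = 0.0
) -> Tuple[List[Point3D], List[Quad]]:
    """Extrude a 2D footprint polygon into a prism (hypothetical helper).

    Returns the vertices (bottom ring followed by top ring) and one quad per
    wall, each given as four vertex indices. Roof and floor caps are omitted.
    """
    if len(footprint) < 3:
        raise ValueError("a footprint needs at least three vertices")
    if height <= 0:
        raise ValueError("height must be positive")

    n = len(footprint)
    bottom = [(x, y, ground_z) for x, y in footprint]
    top = [(x, y, ground_z + height) for x, y in footprint]
    vertices = bottom + top

    walls = []
    for i in range(n):
        j = (i + 1) % n  # next vertex, wrapping around to close the ring
        walls.append((i, j, n + j, n + i))
    return vertices, walls


# Assumed example: a 10 m x 8 m footprint with an assumed height of 6 m.
verts, walls = extrude_footprint([(0, 0), (10, 0), (10, 8), (0, 8)], height=6.0)
print(len(verts), len(walls))  # prints: 8 4
```

A real pipeline would also need roof caps and a per-building height rule; both are left out here because this page does not describe them.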

What carries the argument

The spatial alignment of extruded building footprint models with 360° video frames to support 3D flood water level visualization in real locations.
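
To make the alignment idea concrete, the sketch below projects a 3D model point into an equirectangular 360° frame. It is an illustration under stated assumptions rather than the paper's method: the world frame is z-up, the camera pose is reduced to a position plus a yaw angle, and `project_equirectangular` is a hypothetical helper.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def project_equirectangular(
    point: Vec3, camera_pos: Vec3, yaw_rad: float, width: int, height: int
) -> Tuple[float, float]:
    """Map a world point to (u, v) pixel coordinates in an equirectangular image.

    Assumed convention: z is up, yaw rotates the camera about z, the image
    centre looks along the camera's forward axis, and u grows to the right.
    """
    dx = point[0] - camera_pos[0]
    dy = point[1] - camera_pos[1]
    dz = point[2] - camera_pos[2]
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    if dist == 0.0:
        raise ValueError("point coincides with the camera position")

    # Rotate the horizontal offset into the camera frame (inverse yaw).
    cx = math.cos(yaw_rad) * dx + math.sin(yaw_rad) * dy
    cy = -math.sin(yaw_rad) * dx + math.cos(yaw_rad) * dy

    lon = math.atan2(cy, cx)  # positive to the left of the forward axis
    lat = math.asin(max(-1.0, min(1.0, dz / dist)))  # clamp against rounding

    u = (0.5 - lon / (2.0 * math.pi)) * width
    v = (0.5 - lat / math.pi) * height
    return u, v


# A point straight ahead of the camera lands at the image centre.
print(project_equirectangular((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), 0.0, 2000, 1000))
```

Projecting every corner of an extruded building this way gives the kind of overlay shown in red in Figure 3. A full pose would add pitch and roll, which this sketch omits.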

If this is right

It allows creation of location-specific virtual flood experiences in regions lacking detailed 3D city data.
Users gain better understanding of personal flood evacuation routes and situations.
The system serves as an effective tool for disaster risk communication and education.
Data requirements are minimal, relying on common 2D footprints and 360 videos.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar techniques could apply to visualizing other hazards like landslides or storm surges using the same data sources.
Communities without access to professional 3D modeling resources could adopt this for local preparedness training.
Future work might test if repeated exposure improves actual evacuation behavior in real events.

Load-bearing premise

The extruded 2D building footprints aligned with 360 videos provide sufficient visual accuracy and realism for users to meaningfully improve their location-specific flood evacuation understanding.

What would settle it

The claim would be undermined if a controlled user study with residents of the target area showed no measurable increase, relative to a control group, in their ability to describe accurate location-specific evacuation steps after using the system.
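
One way such a comparison could be run on small groups is an exact permutation test on the difference in group means. The sketch below is illustrative only: the paper's actual study design is not described on this page, the scores are made-up placeholder values, and `permutation_p_value` is a hypothetical name.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def permutation_p_value(treated: Sequence[float], control: Sequence[float]) -> float:
    """One-sided exact permutation test for mean(treated) > mean(control).

    Enumerates every way of splitting the pooled scores into groups of the
    original sizes and counts the splits whose mean difference is at least
    as large as the observed one. Only practical for small samples.
    """
    if not treated or not control:
        raise ValueError("both groups must be non-empty")

    pooled = list(treated) + list(control)
    k = len(treated)
    m = len(control)
    total = sum(pooled)
    observed = mean(treated) - mean(control)

    extreme = 0
    n_splits = 0
    for idx in combinations(range(len(pooled)), k):
        t_sum = sum(pooled[i] for i in idx)
        diff = t_sum / k - (total - t_sum) / m
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
        n_splits += 1
    return extreme / n_splits


# Placeholder scores (NOT data from the paper): system users vs. control group.
system_group = [5, 5, 5]
control_group = [1, 1, 1]
print(permutation_p_value(system_group, control_group))  # prints: 0.05
```

A large p-value under a design like this would be the kind of null result that counts against the claim. Ordinal Likert data may call for a rank-based statistic instead of the mean.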

Figures

Figures reproduced from arXiv: 2604.20746 by Kiyoharu Aizawa, Koki Kawada, Masatoshi Denda, Mizuki Takenawa, Tatsuro Banno.

**Figure 3.** Comparison of discrepancies between 360° video frames and the 3D city model: (a) before optimization and (b) after optimization. Red-highlighted areas indicate the projections of the 3D building models onto the 360° camera view, based on the estimated camera poses. Note that the building heights are set uniform and inaccurate; pay attention to the boundaries between the buildings and the ground. 1. Ground…

**Figure 5.** Example views of our virtual flood experience sys…

Original abstract

Virtual flood experience systems, which enable users to vividly experience flooding, are attracting increasing attention as effective tools for communicating flood risks. However, existing systems typically rely on virtual cities that do not correspond to real locations and often lack sufficient photorealism, limiting users' ability to relate scenarios to their own surroundings. Although 360° video-based virtual environments offer a simple and scalable way to visually replicate real-world scenes, effective 3D flood visualization in these environments typically requires 3D building geometry of the target area, which is not readily available in many regions. To address this limitation, we propose a new virtual flood experience framework that integrates 360° videos with 3D models automatically constructed from widely available 2D building footprints. By extruding footprints to plausible heights and spatially aligning the constructed models with 360° videos, our framework enables 3D flood visualization in photorealistic environments without relying on pre-existing city models such as CityGML. We demonstrate the framework in Memuro, Hokkaido, Japan, an area vulnerable to river flooding. A user study with local residents showed that the proposed system enhances users' ability to envision location-specific flood evacuation situations, demonstrating its potential as an effective tool for disaster risk communication and education.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a framework for realistic virtual flood experience systems that combines 360° videos of real locations with 3D building models automatically constructed by extruding widely available 2D building footprints to plausible heights and spatially aligning them with the videos. This enables 3D flood visualization in photorealistic environments without pre-existing detailed city models such as CityGML. The approach is demonstrated in Memuro, Hokkaido, Japan, and a user study with local residents is reported to show that the system enhances users' ability to envision location-specific flood evacuation situations, positioning it as a tool for disaster risk communication and education.

Significance. If the supporting evidence holds, the work has clear significance for disaster preparedness: it provides a scalable method to create location-specific, photorealistic virtual flood experiences in areas lacking detailed 3D data, potentially improving risk communication and education by helping users relate scenarios to their actual surroundings.

major comments (2)

[Abstract and Evaluation/User Study section] The abstract claims that 'a user study with local residents showed that the proposed system enhances users' ability to envision location-specific flood evacuation situations,' but supplies no details on study design, sample size, quantitative measures, controls, or statistical results. This is load-bearing for the central claim and must be expanded with full methodological information and results in the evaluation section.
[3D Model Construction / Framework Description] § on 3D model construction: the framework extrudes 2D footprints to 'plausible heights' and aligns them with 360° videos, but reports no validation of height accuracy, alignment precision, or comparison against real building data, detailed models, or actual flood observations. This assumption is load-bearing for the claim that the visualized flooding supports meaningful location-specific understanding, as systematic errors in water levels relative to floors or routes could undermine the reported benefits.

minor comments (1)

[Abstract and Introduction] Clarify in the abstract and introduction whether the 3D models are used only for flood visualization overlays or also for occlusion and interaction, as this affects the claimed photorealism.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. These have prompted us to strengthen the presentation of the user study and to clarify the assumptions underlying the 3D model construction. We respond to each major comment below and indicate the corresponding revisions.


Referee: [Abstract and Evaluation/User Study section] The abstract claims that 'a user study with local residents showed that the proposed system enhances users' ability to envision location-specific flood evacuation situations,' but supplies no details on study design, sample size, quantitative measures, controls, or statistical results. This is load-bearing for the central claim and must be expanded with full methodological information and results in the evaluation section.

Authors: We agree that the user-study details are essential to substantiate the central claim. The original manuscript contains a dedicated evaluation section describing the study with local residents of Memuro, but we acknowledge that the methodological information was insufficiently detailed. In the revised manuscript we have substantially expanded this section to report the study design (including participant recruitment from the target community, procedure, and questionnaire instruments), sample size, quantitative measures (pre/post Likert-scale ratings of ability to envision location-specific evacuation routes and situations), and the statistical results. We have also updated the abstract to include a concise reference to the study scale and outcome. These additions directly address the load-bearing nature of the claim. revision: yes
Referee: [3D Model Construction / Framework Description] § on 3D model construction: the framework extrudes 2D footprints to 'plausible heights' and aligns them with 360° videos, but reports no validation of height accuracy, alignment precision, or comparison against real building data, detailed models, or actual flood observations. This assumption is load-bearing for the claim that the visualized flooding supports meaningful location-specific understanding, as systematic errors in water levels relative to floors or routes could undermine the reported benefits.

Authors: We appreciate the referee highlighting this point. The 3D models are generated by extruding publicly available 2D building footprints to heights chosen according to typical building typologies in the region and then aligned to the 360° video frames via landmark-based registration. The original manuscript does not contain quantitative validation (e.g., RMSE against surveyed heights or comparison with CityGML). In the revision we have added an explicit discussion subsection that (i) states the rationale for the plausible-height heuristic, (ii) acknowledges the absence of formal accuracy metrics, and (iii) explains why the approach remains useful for experiential risk communication even if minor geometric discrepancies exist. We maintain that the user-study findings are still informative because participants evaluated the system in locations they know personally; however, we now clearly flag the limitation for readers. revision: partial
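
The RMSE check the authors mention as missing could look like the following sketch. It assumes surveyed heights exist for a sample of buildings. The numbers are placeholder values, not data from the paper, and `height_rmse` is a hypothetical helper.

```python
import math
from typing import Sequence


def height_rmse(assumed: Sequence[float], surveyed: Sequence[float]) -> float:
    """Root-mean-square error between assumed and surveyed heights (metres)."""
    if len(assumed) != len(surveyed):
        raise ValueError("both sequences must have the same length")
    if not assumed:
        raise ValueError("at least one building is required")
    squared = [(a - s) ** 2 for a, s in zip(assumed, surveyed)]
    return math.sqrt(sum(squared) / len(squared))


# Placeholder values: a uniform assumed height of 6 m against four
# hypothetical surveyed heights.
assumed_heights = [6.0, 6.0, 6.0, 6.0]
surveyed_heights = [6.0, 6.0, 9.0, 10.0]
print(height_rmse(assumed_heights, surveyed_heights))  # prints: 2.5
```

Reporting a figure like this alongside the user study would let readers judge how much the uniform-height assumption distorts water levels relative to building floors.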

Circularity Check

0 steps flagged

No circularity: applied systems paper with no derivations or fitted predictions


The paper describes a construction pipeline (extrude 2D footprints to plausible heights, align to 360° video) and reports a user study on evacuation visualization. No equations, parameters, or predictions appear anywhere in the abstract or described content. The central claim rests on the user-study outcomes rather than any self-referential reduction. No self-citations are invoked as uniqueness theorems or load-bearing premises. This is the expected non-finding for a non-mathematical systems paper.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach depends on common domain assumptions in 3D reconstruction and video registration rather than new fitted parameters or invented physical entities.

axioms (2)

domain assumption: 2D building footprints can be extruded to plausible heights to produce usable 3D geometry for visualization purposes.
This is the core step for constructing 3D models as described in the framework.
domain assumption: The extruded 3D models can be spatially aligned with 360° videos with sufficient accuracy for integrated flood visualization.
Required to combine the two data sources into a coherent virtual environment.
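
A simple way to score how well projected models line up with a frame is intersection-over-union between two binary masks: the model projection and a building mask taken from the image. This page does not state which objective the paper optimizes, so the sketch below is only an assumed illustration. `mask_iou` is a hypothetical name and the tiny masks are toy data.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # rows of 0/1 (or False/True) values


def mask_iou(projected: Mask, observed: Mask) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    if len(projected) != len(observed):
        raise ValueError("masks must have the same number of rows")

    intersection = 0
    union = 0
    for row_p, row_o in zip(projected, observed):
        if len(row_p) != len(row_o):
            raise ValueError("mask rows must have the same length")
        for p, o in zip(row_p, row_o):
            p_on, o_on = bool(p), bool(o)
            intersection += int(p_on and o_on)
            union += int(p_on or o_on)

    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


# Toy 2 x 3 masks: the projection is shifted one pixel from the observation.
projected_mask = [[1, 1, 0], [0, 0, 0]]
observed_mask = [[0, 1, 1], [0, 0, 0]]
print(mask_iou(projected_mask, observed_mask))
```

A score near 1 would correspond to the "after optimization" panel of Figure 3, a low score to the "before" panel. The assumption about alignment accuracy could be checked by reporting such a score across frames.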


