Generative AI for Safe and Photorealistic Drone Light Shows
Pith reviewed 2026-06-25 21:31 UTC · model grok-4.3
The pith
SWAN converts text prompts into photorealistic collision-free drone trajectories via video generation and adaptive tracking.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SWAN is an end-to-end pipeline that synthesizes photorealistic, large-scale, and collision-free drone choreographies directly from text prompts. SWAN converts text into realistic reference videos and translates these pixel-space dynamics into physical swarm kinematics using a novel adaptive point-tracking algorithm. Unlike existing trackers, this method maintains spatial coherence through severe occlusions and rapid topological shifts. A dedicated planner then allocates these trajectories to individual drones, while a subsequent safety filter ensures collision-free execution. The system demonstrates scalability by safely orchestrating simulated 2,000-drone formations and validates physical feasibility on a dense real-world swarm of 49 quadcopters, all on standard consumer hardware.
What carries the argument
Adaptive point-tracking algorithm that maintains spatial coherence to translate pixel dynamics from generated videos into physical swarm kinematics despite occlusions and topological shifts.
If this is right
- Drone light shows can be created directly from text prompts without manual keyframing or animation.
- The pipeline scales to formations of 2,000 drones while remaining collision-free in simulation.
- Physical feasibility holds for dense swarms of 49 quadcopters in real-world tests.
- All computation runs on standard consumer hardware without specialized equipment.
- Multi-robot choreography design becomes automated and accessible through generative AI.
Where Pith is reading between the lines
- The video-to-kinematics translation step could extend to other robot teams where motion is first visualized in 2D.
- Better text-to-video models would directly raise the visual quality of the generated drone patterns.
- The safety filter combined with trajectory allocation might apply to coordinating ground robot swarms or mixed aerial-ground teams.
- Lowering the design effort could enable drone displays for smaller events, education, or temporary installations.
Load-bearing premise
The adaptive point-tracking algorithm can maintain spatial coherence and accurately translate pixel-space dynamics from generated videos into physical swarm kinematics despite severe occlusions and rapid topological shifts.
What would settle it
A generated video containing many overlapping and crossing motions is fed through the full pipeline, after which the output trajectories are executed in a physics simulator to check for any collisions or loss of intended visual patterns.
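The check described above can be sketched as a minimum-separation test over sampled trajectories. This is only an illustration under stated assumptions: the function names, the sample data, and the `min_dist` safety radius are hypothetical and do not come from the paper, and a real test would run inside a physics simulator rather than on hand-written waypoints.

```python
import itertools
import math


def min_separation(trajectories):
    """Return the smallest pairwise distance over all sampled timesteps.

    trajectories: one path per drone; each path is a list of (x, y, z)
    tuples, and all paths share the same timesteps.
    """
    n_steps = len(trajectories[0])
    best = math.inf
    for t in range(n_steps):
        for a, b in itertools.combinations(trajectories, 2):
            best = min(best, math.dist(a[t], b[t]))
    return best


def is_collision_free(trajectories, min_dist):
    # min_dist is a hypothetical safety radius, not a value from the paper.
    return min_separation(trajectories) >= min_dist


# Two drones swapping sides on separate altitudes (safe) ...
safe = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)],
    [(2.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 0.0, 2.0)],
]
# ... and the same swap at a shared altitude (they meet in the middle).
crossing = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)],
    [(2.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, 1.0)],
]

print(is_collision_free(safe, 0.5))      # True
print(is_collision_free(crossing, 0.5))  # False
```

A sampled check like this can miss a collision that happens between two timesteps, so the sampling interval has to be small relative to drone speed and the safety radius. The loop is quadratic in the number of drones, which is acceptable for a sketch but would need spatial indexing at the 2,000-drone scale.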
Original abstract
Drone light shows are redefining aerial entertainment, yet their widespread adoption is bottlenecked by labor-intensive, manual animation. While generative AI promises an automated alternative, current frameworks fail to provide photorealism with fluid, dynamic motion. To address this limitation, we introduce SWAN, an end-to-end pipeline that synthesizes photorealistic, large-scale, and collision-free drone choreographies directly from text prompts. SWAN converts text into realistic reference videos and translates these pixel-space dynamics into physical swarm kinematics using a novel, adaptive point-tracking algorithm. Unlike existing trackers, this method maintains spatial coherence through severe occlusions and rapid topological shifts. A dedicated planner then allocates these trajectories to individual drones, while a subsequent safety filter ensures collision-free execution. We demonstrate scalability by safely orchestrating simulated 2,000-drone formations and validate physical feasibility on a dense real-world swarm of 49 quadcopters, operating everything entirely on standard consumer hardware. Combined, this work demonstrates how generative AI can be leveraged to automate multi-robot choreography design, providing an accessible new framework for drone light shows.
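The abstract does not say how the planner allocates trajectories to drones. As a toy illustration of what such an allocation step has to solve, the sketch below matches drones to target points by brute force, minimizing total squared distance. The function name, the cost function, and the sample coordinates are all assumptions made for illustration, and this is not presented as SWAN's planner.

```python
import itertools
import math


def assign_drones(starts, targets):
    """Brute-force the drone-to-target matching with the least total squared distance.

    Tries every permutation, so it is only practical for a handful of drones.
    Returns (assignment, cost), where assignment[i] is the target index for drone i.
    """
    best_cost = math.inf
    best_perm = None
    for perm in itertools.permutations(range(len(targets))):
        cost = sum(math.dist(starts[i], targets[j]) ** 2 for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(best_perm), best_cost


# Hypothetical 2D layout: three drones on a line, three targets one unit above.
starts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
targets = [(2.0, 1.0), (0.0, 1.0), (1.0, 1.0)]

assignment, cost = assign_drones(starts, targets)
print(assignment, cost)  # [1, 2, 0] 3.0
```

Each drone ends up with the target directly above it. Enumerating permutations grows factorially, so a swarm of realistic size would need a polynomial-time assignment solver or a heuristic instead.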
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SWAN, an end-to-end pipeline that generates photorealistic drone light shows from text prompts by first creating reference videos via generative AI, then using a novel adaptive point-tracking algorithm to map pixel dynamics to physical swarm trajectories (claimed to handle severe occlusions and rapid topological shifts), followed by trajectory planning and a safety filter for collision avoidance. It reports successful demonstrations scaling to 2,000 drones in simulation and 49 quadcopters in the real world, all on consumer hardware.
Significance. If the central claims hold, the work could meaningfully advance automated multi-robot choreography design for entertainment applications by reducing reliance on manual animation. The reported scale (2,000 simulated drones and 49 real ones) and the end-to-end text-to-execution framing are potentially impactful for the robotics and graphics communities. However, the absence of any quantitative metrics, ablations, or error analysis in the provided manuscript text makes it impossible to evaluate whether the novel tracker or the overall pipeline delivers on its robustness and photorealism promises.
major comments (2)
- [Abstract] Abstract (method description): The central claim that the adaptive point-tracking algorithm 'maintains spatial coherence through severe occlusions and rapid topological shifts' is load-bearing for translating generated videos into coherent 3D swarm kinematics, yet the manuscript supplies no quantitative error metrics, occlusion-specific ablations, failure-case analysis, or comparisons against standard trackers on the cited failure modes.
- [Abstract] Abstract (demonstrations): The scalability claims rest on 'safely orchestrating simulated 2,000-drone formations' and 'dense real-world swarm of 49 quadcopters,' but no success rates, collision counts, trajectory error statistics, or baseline comparisons are reported to substantiate collision-free execution or photorealism at these scales.
minor comments (1)
- [Abstract] The abstract is concise but would benefit from one or two key quantitative highlights (e.g., tracking accuracy or collision rate) to allow readers to gauge the strength of the empirical support.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and for recognizing the potential impact of the SWAN pipeline. We agree that the current manuscript lacks the quantitative evaluations needed to fully substantiate the central claims. We will revise the manuscript to address these gaps while maintaining the focus on the end-to-end text-to-trajectory framework.
Point-by-point responses
- Referee: [Abstract] Abstract (method description): The central claim that the adaptive point-tracking algorithm 'maintains spatial coherence through severe occlusions and rapid topological shifts' is load-bearing for translating generated videos into coherent 3D swarm kinematics, yet the manuscript supplies no quantitative error metrics, occlusion-specific ablations, failure-case analysis, or comparisons against standard trackers on the cited failure modes.
  Authors: We agree that quantitative support is required to validate the adaptive point-tracking algorithm's performance under the described conditions. In the revised manuscript we will add tracking error metrics (e.g., average pixel displacement and 3D trajectory deviation), occlusion-specific ablations, a dedicated failure-case analysis, and direct comparisons against standard trackers such as KLT, SORT, and DeepSORT on sequences exhibiting severe occlusions and topological changes. These additions will be placed in a new experimental subsection.
  Revision: yes
- Referee: [Abstract] Abstract (demonstrations): The scalability claims rest on 'safely orchestrating simulated 2,000-drone formations' and 'dense real-world swarm of 49 quadcopters,' but no success rates, collision counts, trajectory error statistics, or baseline comparisons are reported to substantiate collision-free execution or photorealism at these scales.
  Authors: We acknowledge that the scalability and safety claims would be strengthened by supporting statistics. The revised manuscript will report success rates, collision counts (zero in the presented runs), trajectory error statistics (position and velocity RMSE), and comparisons against baseline planners for both the 2,000-drone simulation and the 49-drone real-world experiments. Photorealism will be supported by additional perceptual metrics where feasible.
  Revision: yes
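The metrics named in the two responses above (average pixel displacement, and position and velocity RMSE) can be computed along the following lines. This is a minimal sketch under stated assumptions: the function names, the visibility mask, the forward-difference velocity estimate, and all sample values are hypothetical, and the paper may define these quantities differently.

```python
import math


def mean_pixel_error(pred, truth, visible=None):
    """Mean Euclidean distance in pixels between predicted and reference tracks.

    pred, truth: lists of frames, each a list of (u, v) points.
    visible: optional matching list of booleans; occluded points are skipped.
    """
    total, count = 0.0, 0
    for f, (pred_frame, truth_frame) in enumerate(zip(pred, truth)):
        for k, (p, q) in enumerate(zip(pred_frame, truth_frame)):
            if visible is not None and not visible[f][k]:
                continue
            total += math.dist(p, q)
            count += 1
    return total / count if count else float("nan")


def rmse(actual, desired):
    """Root-mean-square Euclidean error between two equally sampled paths."""
    squared = [math.dist(a, d) ** 2 for a, d in zip(actual, desired)]
    return math.sqrt(sum(squared) / len(squared))


def velocities(path, dt):
    """Forward-difference velocities for a path sampled every dt seconds."""
    return [
        tuple((b - a) / dt for a, b in zip(p0, p1))
        for p0, p1 in zip(path, path[1:])
    ]


# Hypothetical tracker output: one frame, two points, the first 5 px off.
pred = [[(10.0, 10.0), (20.0, 20.0)]]
truth = [[(13.0, 14.0), (20.0, 20.0)]]
print(mean_pixel_error(pred, truth))  # 2.5

# Hypothetical flight log: the flown path is offset from the plan by a constant
# 5 m, so the position error is large while the velocity error is zero.
desired = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
actual = [(0.0, 3.0, 5.0), (1.0, 3.0, 5.0), (2.0, 3.0, 5.0)]
dt = 0.5
print(rmse(actual, desired))                                  # 5.0
print(rmse(velocities(actual, dt), velocities(desired, dt)))  # 0.0
```

The last example shows why reporting both statistics is useful: a constant offset leaves the motion pattern intact (zero velocity error) even though every drone is far from its planned position.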
Circularity Check
No circularity: sequential pipeline of independent modules
Full rationale
The described SWAN derivation is a linear sequence of distinct engineering stages—text-to-video generation, adaptive point-tracking for kinematics translation, trajectory allocation by planner, and safety filtering—none of which are defined in terms of their own outputs or reduced to fitted parameters by construction. The abstract and reader's summary provide no equations, self-citations, or uniqueness theorems that would make any step tautological. The central claim therefore rests on the empirical performance of these components rather than any self-referential reduction, qualifying as self-contained.