FaSTA*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing
Pith reviewed 2026-05-19 08:13 UTC · model grok-4.3
The pith
FaSTA* mines reusable subroutines from past toolpaths so LLMs can handle most multi-turn image edits with fast planning before falling back to A* search.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By continuously mining and refining symbolic subroutines from successful toolpaths, FaSTA* lets LLMs cover the majority of editing subtasks through fast rule-based selection, activating slow A* search only for novel cases, which reduces overall exploration cost on similar subtasks applied to new images.
What carries the argument
Adaptive fast-slow planning that first tries LLM-selected or generated subroutines mined from prior successes and falls back to per-subtask A* search only on failure.
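The fallback logic described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `fast_slow_plan`, `recolor_routine`, and `toy_search` are hypothetical names, and a "solver" is assumed to return a toolpath (a list of tool names) on success or `None` on failure.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed interface: a solver maps a subtask name to a toolpath, or None on failure.
Toolpath = List[str]
Solver = Callable[[str], Optional[Toolpath]]

def fast_slow_plan(subtask: str,
                   subroutines: Sequence[Solver],
                   slow_search: Solver) -> Tuple[Optional[Toolpath], str]:
    """Try each candidate subroutine first; use the slow search only if all fail."""
    for routine in subroutines:
        path = routine(subtask)
        if path is not None:
            return path, "fast"
    return slow_search(subtask), "slow"

# Toy stand-ins (hypothetical, not the paper's tools).
def recolor_routine(subtask: str) -> Optional[Toolpath]:
    return ["detect", "segment", "recolor"] if subtask == "recolor" else None

def toy_search(subtask: str) -> Optional[Toolpath]:
    return ["search", subtask]

known = fast_slow_plan("recolor", [recolor_routine], toy_search)
novel = fast_slow_plan("replace", [recolor_routine], toy_search)
```

In this toy setup the known subtask is resolved on the fast path, while the unseen one triggers the slow search.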
Load-bearing premise
Large language models can reliably extract and refine subroutines from successful toolpaths that stay correct and reusable across similar images and tasks without introducing errors or losing coverage.
What would settle it
Measure whether disabling subroutine mining on a new collection of multi-turn editing tasks causes computation time to rise sharply or success rate to fall compared with the full FaSTA* system.
Original abstract
We develop a cost-efficient neurosymbolic agent to address challenging multi-turn image editing tasks such as "Detect the bench in the image while recoloring it to pink. Also, remove the cat for a clearer view and recolor the wall to yellow." It combines the fast, high-level subtask planning by large language models (LLMs) with the slow, accurate, tool-use, and local A* search per subtask to find a cost-efficient toolpath, i.e., a sequence of calls to AI tools. To save the cost of A* on similar subtasks, we perform inductive reasoning on previously successful toolpaths via LLMs to continuously extract/refine frequently used subroutines and reuse them as new tools for future tasks in an adaptive fast-slow planning, where the higher-level subroutines are explored first, and only when they fail, the low-level A* search is activated. The reusable symbolic subroutines considerably save exploration cost on the same types of subtasks applied to similar images, yielding a human-like fast-slow toolpath agent "FaSTA*": fast subtask planning followed by rule-based subroutine selection per subtask is attempted by LLMs at first, which is expected to cover most tasks, while slow A* search is only triggered for novel and challenging subtasks. By comparing with recent image editing approaches, we demonstrate FaSTA* is significantly more computationally efficient while remaining competitive with the state-of-the-art baseline in terms of success rate. Our code and data can be accessed at https://github.com/tianyi-lab/FaSTAR.
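To make the idea of "frequently used subroutines" concrete, the sketch below counts contiguous tool sequences that recur across successful toolpaths. This is an assumption-laden stand-in: the abstract describes LLM-based inductive reasoning, whereas this example swaps in plain frequency counting, and `mine_subroutines`, `min_len`, `min_support`, and the tool names are all hypothetical.

```python
from collections import Counter

def mine_subroutines(toolpaths, min_len=2, min_support=2):
    """Return contiguous tool sequences found in at least `min_support` toolpaths."""
    counts = Counter()
    for path in toolpaths:
        seen = set()
        for n in range(min_len, len(path) + 1):
            for i in range(len(path) - n + 1):
                seen.add(tuple(path[i:i + n]))
        counts.update(seen)  # count each sequence at most once per toolpath
    return {seq: c for seq, c in counts.items() if c >= min_support}

# Made-up toolpaths for illustration only.
paths = [
    ["detect", "segment", "recolor"],
    ["detect", "segment", "inpaint"],
    ["caption", "detect", "segment", "recolor"],
]
mined = mine_subroutines(paths)
```

Here the sequence `("detect", "segment")` appears in all three paths, so it would be the strongest candidate for reuse as a new higher-level tool.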
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces FaSTA*, a neurosymbolic agent for multi-turn image editing tasks that combines LLM-based high-level subtask planning with per-subtask A* search to generate cost-efficient toolpaths. It adds an inductive subroutine mining step in which LLMs extract and refine reusable symbolic subroutines from previously successful toolpaths, enabling a fast-slow adaptive planner that prefers mined subroutines and falls back to A* only for novel cases. The central claim is that this yields significantly lower computational cost than recent baselines while remaining competitive in success rate.
Significance. If the subroutine mining step produces correct, generalizable subroutines that cover most recurring subtasks, the work would demonstrate a practical way to reduce the expense of repeated A* searches in iterative vision-language tool use, advancing hybrid fast-slow neurosymbolic agents. The public release of code and data at the cited GitHub repository is a clear strength that supports reproducibility.
major comments (2)
- [Method (subroutine mining procedure)] The efficiency claim rests on the assumption that LLM inductive reasoning on successful toolpaths yields subroutines that are both correct and sufficiently broad to avoid frequent fallback to A*; no formal verification, manual inspection protocol, or error-rate analysis of the mined subroutines is described, leaving open the possibility that over-generalization or invalid sequences would erode the reported savings.
- [Abstract and Experiments section] Abstract and experimental claims state that FaSTA* is 'significantly more computationally efficient' while 'competitive with the state-of-the-art baseline in terms of success rate,' yet the provided text contains no quantitative tables, ablation isolating mining quality, error bars, or per-subtask fallback frequencies; without these data the central efficiency result cannot be verified.
minor comments (2)
- [Introduction] The notation distinguishing 'subroutine' from 'toolpath' and 'subtask' should be defined once at first use and used consistently thereafter.
- [Method] A brief discussion of how mined subroutines are stored, indexed, and retrieved (e.g., as new callable tools) would improve clarity of the adaptive planner.
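One way the storage and retrieval question raised in the last comment could be answered is a small library keyed by subtask type. The sketch below is purely illustrative and makes no claim about the paper's actual data structures: `Subroutine`, `SubroutineLibrary`, and the example entries are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Subroutine:
    name: str
    steps: tuple          # ordered tool names
    successes: int = 0
    attempts: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0

class SubroutineLibrary:
    """Index mined subroutines by the subtask type they solve."""

    def __init__(self):
        self._by_subtask = defaultdict(list)

    def add(self, subtask: str, routine: Subroutine) -> None:
        self._by_subtask[subtask].append(routine)

    def candidates(self, subtask: str):
        """Routines for a subtask, highest observed success rate first."""
        return sorted(self._by_subtask.get(subtask, []),
                      key=lambda r: r.success_rate, reverse=True)

library = SubroutineLibrary()
library.add("recolor", Subroutine("segment_inpaint", ("segment", "inpaint"), 3, 10))
library.add("recolor", Subroutine("detect_recolor", ("detect", "recolor"), 8, 10))
names = [r.name for r in library.candidates("recolor")]
```

Ordering candidates by observed success rate is only one possible retrieval policy; a planner could equally rank by cost or let an LLM choose among the candidates.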
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to incorporate additional details and data where appropriate.
-
Referee: [Method (subroutine mining procedure)] The efficiency claim rests on the assumption that LLM inductive reasoning on successful toolpaths yields subroutines that are both correct and sufficiently broad to avoid frequent fallback to A*; no formal verification, manual inspection protocol, or error-rate analysis of the mined subroutines is described, leaving open the possibility that over-generalization or invalid sequences would erode the reported savings.
Authors: We agree that the manuscript would benefit from explicit validation of the mined subroutines. The current description relies on end-to-end task success to imply subroutine quality. In the revised version, we will add a dedicated subsection describing our manual inspection protocol (sampling 50 mined subroutines and checking for syntactic validity, semantic correctness on held-out images, and generality), report the observed error rate, and include per-task fallback frequencies to A* to quantify subroutine coverage. revision: yes
-
Referee: [Abstract and Experiments section] Abstract and experimental claims state that FaSTA* is 'significantly more computationally efficient' while 'competitive with the state-of-the-art baseline in terms of success rate,' yet the provided text contains no quantitative tables, ablation isolating mining quality, error bars, or per-subtask fallback frequencies; without these data the central efficiency result cannot be verified.
Authors: The full manuscript contains comparative results, but we acknowledge the presentation lacks the requested granularity. We will revise the Experiments section to add explicit tables reporting success rates and wall-clock / token costs versus baselines, include error bars from multiple runs, provide an ablation isolating the subroutine mining component, and report per-subtask fallback frequencies. These additions will make the efficiency claims directly verifiable. revision: yes
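The statistics requested in this exchange (per-subtask fallback frequencies and error bars over repeated runs) are simple to compute from an execution log. The sketch below assumes a hypothetical log format of `(subtask, used_slow_search)` pairs and uses made-up values; it is not derived from the paper's data.

```python
from collections import defaultdict
from statistics import mean, stdev

def fallback_frequency(log):
    """Map each subtask to the fraction of its runs that fell back to slow search."""
    totals, slow = defaultdict(int), defaultdict(int)
    for subtask, used_slow in log:
        totals[subtask] += 1
        slow[subtask] += int(used_slow)
    return {s: slow[s] / totals[s] for s in totals}

def mean_and_error(values):
    """Mean and sample standard deviation across repeated runs."""
    return mean(values), (stdev(values) if len(values) > 1 else 0.0)

# Made-up log entries for illustration only.
log = [("recolor", False), ("recolor", False), ("recolor", True), ("recolor", False),
       ("replace", True), ("replace", False)]
freq = fallback_frequency(log)
avg, err = mean_and_error([0.5, 0.5, 0.5])
```

A low fallback frequency for a subtask would indicate that mined subroutines cover it well; a high one would point to where A* is still doing most of the work.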
Circularity Check
No circularity detected; method is empirically validated against external baselines
The paper presents an architectural combination of LLM high-level planning, A* local search, and inductive subroutine mining from prior successful toolpaths. Efficiency and success-rate claims rest on direct comparisons to recent image-editing baselines rather than any mathematical derivation, fitted parameter renamed as prediction, or self-referential definition. No equations, uniqueness theorems, or ansatzes are invoked; subroutine extraction is described as an independent LLM inductive step whose correctness is assessed externally via overall task performance. The derivation chain therefore remains self-contained and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: LLMs can perform reliable inductive reasoning to extract reusable subroutines from successful toolpaths
Reference graph
Works this paper leans on
-
[1]
Character region awareness for text detection
Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, and Hwalsuk Lee. Character region awareness for text detection. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 9365–9374. Computer Vision Foundation / IEEE, 2019. doi: 10.1109/CVPR.2019.00959
-
[2]
Tim Brooks, Aleksander Holynski, and Alexei A. Efros. Instructpix2pix: Learning to follow image editing instructions, 2023
-
[3]
Pixart-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis, 2023
Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li. Pixart-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis, 2023
-
[4]
Training-free layout control with cross-attention guidance, 2023
Minghao Chen, Iro Laina, and Andrea Vedaldi. Training-free layout control with cross-attention guidance, 2023
-
[5]
Imprompter: Tricking llm agents into improper tool use, 2024
Xiaohan Fu, Shuheng Li, Zihan Wang, Yihao Liu, Rajesh K. Gupta, Taylor Berg-Kirkpatrick, and Earlence Fernandes. Imprompter: Tricking llm agents into improper tool use, 2024. URL https://arxiv.org/abs/2410.14923
-
[6]
CLOVA: A closed-loop visual assistant with tool usage and update
Zhi Gao, Yuntao Du, Xintong Zhang, Xiaojian Ma, Wenjuan Han, Song-Chun Zhu, and Qing Li. CLOVA: A closed-loop visual assistant with tool usage and update. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, June 16-22, 2024, pages 13258–13268. IEEE, 2024. doi: 10.1109/CVPR52733.2024.01259
-
[7]
Google Cloud. Google Cloud Vision API, 2024. URL https://cloud.google.com/vision. Accessed: January 29, 2025
-
[8]
Tora: A tool-integrated reasoning agent for mathematical problem solving,
Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, and Weizhu Chen. Tora: A tool-integrated reasoning agent for mathematical problem solving,
-
[9]
URL https://arxiv.org/abs/2309.17452
-
[11]
CoSTA*: Cost-sensitive toolpath agent for multi-turn image editing, 2025
Advait Gupta, NandaKiran Velaga, Dang Nguyen, and Tianyi Zhou. CoSTA*: Cost-sensitive toolpath agent for multi-turn image editing, 2025. URL https://arxiv.org/abs/2503.10613
-
[12]
Visual programming: Compositional visual reasoning without training
Tanmay Gupta and Aniruddha Kembhavi. Visual programming: Compositional visual reasoning without training. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17-24, 2023, pages 14953–14962. IEEE, 2023. doi: 10.1109/CVPR52729.2023.01436
-
[13]
Prompt-to-Prompt Image Editing with Cross Attention Control
Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Prompt-to-prompt image editing with cross attention control, 2022. URL https://arxiv.org/abs/2208.01626
-
[14]
Dialoggen: Multi-modal interactive dialogue system for multi-turn text-to-image generation, 2024
Minbin Huang, Yanxin Long, Xinchi Deng, Ruihang Chu, Jiangfeng Xiong, Xiaodan Liang, Hong Cheng, Qinglin Lu, and Wei Liu. Dialoggen: Multi-modal interactive dialogue system for multi-turn text-to-image generation, 2024
-
[15]
Language models as zero-shot planners: Extracting actionable knowledge for embodied agents, 2022
Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents, 2022. URL https://arxiv.org/abs/2201.07207
-
[16]
Understanding the planning of LLM agents: A survey
Xu Huang, Weiwen Liu, Xiaolong Chen, Xingmei Wang, Hao Wang, Defu Lian, Yasheng Wang, Ruiming Tang, and Enhong Chen. Understanding the planning of llm agents: A survey, 2024. URL https://arxiv.org/abs/2402.02716
-
[17]
Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, and Ying Shan. Smartedit: Exploring complex instruction-based image editing with multimodal large language models, 2023.
-
[18]
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. Image-to-image translation with conditional adversarial networks, 2018
-
[19]
Segment anything
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloé Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross B. Girshick. Segment anything. In IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, October 1-6, 2023, pages 3992–4003. IEEE, 2023. doi: 10.1109/ ICCV510...
-
[20]
Segment anything, 2023
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment anything, 2023
-
[21]
cwittwer/easyocr: Easyocr, July 2022
Rakpong Kittinaradorn, Wisuttida Wichitwong, Nart Tlisha, Sumitkumar Sarda, Jeff Potter, Sam_S, Arkya Bagchi, ronaldaug, Nina, Vijayabhaskar, DaeJeong Mun, Mejans, Amit Agarwal, Mijoo Kim, A2va, Abderrahim Mama, Korakot Chaovavanich, Loay, Karol Kucza, Vladimir Gurevich, Márton Tim, Abduroid, Bereket Abraham, Giovani Moutinho, milosjovac, Mo- hamed Rashad...
-
[22]
Deblurgan: Blind motion deblurring using conditional adversarial networks, 2018
Orest Kupyn, Volodymyr Budzan, Mykola Mykhailych, Dmytro Mishkin, and Jiri Matas. Deblurgan: Blind motion deblurring using conditional adversarial networks, 2018
-
[23]
Xinzhe Li. A review of prominent paradigms for llm-based agents: Tool use (including rag), planning, and feedback learning, 2024. URL https://arxiv.org/abs/2406.05804
-
[24]
Gligen: Open-set grounded text-to-image generation, 2023
Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, and Yong Jae Lee. Gligen: Open-set grounded text-to-image generation, 2023
-
[25]
Grounding dino: Marrying dino with grounded pre-training for open-set object detection, 2024
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, and Lei Zhang. Grounding dino: Marrying dino with grounded pre-training for open-set object detection, 2024
-
[26]
Llms are in-context bandit reinforcement learners, 2025
Giovanni Monea, Antoine Bosselut, Kianté Brantley, and Yoav Artzi. Llms are in-context bandit reinforcement learners, 2025. URL https://arxiv.org/abs/2410.05362
-
[27]
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models, 2022. URL https://arxiv.org/abs/2112.10741
-
[28]
Tool learning with large language models: a survey
Changle Qu, Sunhao Dai, Xiaochi Wei, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Jun Xu, and Ji-rong Wen. Tool learning with large language models: a survey. Frontiers of Computer Science, 19(8), January 2025. ISSN 2095-2236. doi: 10.1007/s11704-024-40678-2. URL http://dx.doi.org/10.1007/s11704-024-40678-2
-
[29]
Learning transferable visual models from natural language supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machi...
-
[30]
Zero-Shot Text-to-Image Generation
Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. CoRR, abs/2102.12092, 2021
-
[31]
René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, and Vladlen Koltun. Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer, 2020
-
[32]
High-resolution Image Synthesis with Latent Diffusion Models,
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022, pages 10674–10685. IEEE, 2022. doi: 10.1109/CVPR52688.2022.01042.
-
[33]
High-resolution image synthesis with latent diffusion models, 2022
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2022
-
[34]
Photorealistic text-to-image diffusion models with deep language understanding, 2022
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding, 2022
-
[35]
Small llms are weak tool learners: A multi-llm agent, 2024
Weizhou Shen, Chenliang Li, Hongzhan Chen, Ming Yan, Xiaojun Quan, Hehong Chen, Ji Zhang, and Fei Huang. Small llms are weak tool learners: A multi-llm agent, 2024. URL https://arxiv.org/abs/2401.07324
-
[36]
Zineb Sordo, Eric Chagnon, and Daniela Ushizima. A review on generative ai for text-to- image and image-to-image generation and implications to scientific images, 2025. URL https://arxiv.org/abs/2502.21151
-
[37]
Sketch-guided text-to-image diffusion models, 2022
Andrey Voynov, Kfir Aberman, and Daniel Cohen-Or. Sketch-guided text-to-image diffusion models, 2022
-
[38]
Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, 2022
Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. Yolov7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, 2022
-
[39]
Real-esrgan: Training real-world blind super-resolution with pure synthetic data
Xintao Wang, Liangbin Xie, Chao Dong, and Ying Shan. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021, Montreal, QC, Canada, October 11-17, 2021, pages 1905–1914. IEEE, 2021. doi: 10.1109/ICCVW54120.2021.00217
-
[40]
Zhangyang Wang, Jianchao Yang, Hailin Jin, Eli Shechtman, Aseem Agarwala, Jonathan Brandt, and Thomas S. Huang. Deepfont: Identify your font from an image, 2015
-
[41]
Genartist: Multimodal llm as an agent for unified image generation and editing, 2024
Zhenyu Wang, Aoxue Li, Zhenguo Li, and Xihui Liu. Genartist: Multimodal llm as an agent for unified image generation and editing, 2024
-
[42]
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models, 2023. URL https://arxiv.org/abs/2201.11903
-
[43]
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, and Nan Duan. Visual chatgpt: Talking, drawing and editing with visual foundation models, 2023. URL https://arxiv.org/abs/2303.04671
-
[44]
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, and Lijuan Wang. Mm-react: Prompting chatgpt for multimodal reasoning and action, 2023. URL https://arxiv.org/abs/2303.11381
-
[45]
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models, 2023. URL https://arxiv.org/abs/2210.03629
-
[46]
Magicbrush: A manually annotated dataset for instruction-guided image editing, 2024
Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, and Yu Su. Magicbrush: A manually annotated dataset for instruction-guided image editing, 2024
-
[47]
Kechi Zhang, Jia Li, Ge Li, Xianjie Shi, and Zhi Jin. Codeagent: Enhancing code generation with tool-integrated agent systems for real-world repo-level coding challenges, 2024. URL https://arxiv.org/abs/2401.07339
-
[48]
Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models, 2023.
Figure 6 (panels: Instruction "Replace the cat with rabbit"; High Level Only; FaSTA*): Failure case for “High-Level Only” execution versus FaSTA*. For the task “Replace the cat with rabbit”, the initially selected high-level subroutine fails to produce a ...
-
[49]
Task Decomposition and Subtask-Tree Generation
Given an input image x and a natural language instruction u, CoSTA* employs an LLM to decompose the complex request into a sequence of more manageable subtasks. This decomposition results in a subtask tree, G_ss = (V_ss, E_ss).
- Each node v_i ∈ V_ss corresponds to a specific subtask s_i (e.g., "remove car," "recol...
-
[50]
Tool Subgraph Construction
The abstract subtask tree G_ss is then translated into a concrete Tool Subgraph G_ts = (V_ts, E_ts), which is the actual graph the A* search will operate on.
- For each subtask node s_i in G_ss, the MDT is consulted to find all tools M(s_i) capable of performing s_i.
- The TDG is then used to backtrack from these tools to include all nece...
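The two steps listed above (look up the tools for each subtask, then backtrack through tool dependencies) can be sketched as a simple graph traversal. The source text is truncated, so this is only a guess at the general shape: `build_tool_subgraph`, the `tools_for` mapping (standing in for the MDT), the `depends_on` mapping (standing in for the TDG), and every tool name are hypothetical.

```python
def build_tool_subgraph(subtasks, tools_for, depends_on):
    """Collect tools able to perform the subtasks plus all transitive prerequisites."""
    needed, stack = set(), []
    for s in subtasks:
        stack.extend(tools_for.get(s, []))
    while stack:
        tool = stack.pop()
        if tool in needed:
            continue
        needed.add(tool)
        stack.extend(depends_on.get(tool, []))  # backtrack to prerequisites
    # Edge (dep, tool) means `dep` must run before `tool`.
    edges = {(dep, tool) for tool in needed for dep in depends_on.get(tool, [])}
    return needed, edges

# Made-up tables for illustration only.
tools_for = {"object removal": ["inpaint"]}
depends_on = {"inpaint": ["segment"], "segment": ["detect"], "upscale": []}
nodes, edges = build_tool_subgraph(["object removal"], tools_for, depends_on)
```

Tools that no requested subtask needs, such as `upscale` in this toy table, never enter the subgraph, which keeps the later search space small.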
-
[51]
Cost-Sensitive A* Search for Optimal Toolpath
CoSTA* employs an A* search algorithm on the Tool Subgraph G_ts to find an optimal toolpath that balances execution cost and output quality, according to a user-defined trade-off parameter α.
- Priority Function: The A* search prioritizes nodes (representing tool executions) based on the function f(x) = g(x) + ...
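For readers unfamiliar with A*, the sketch below shows the textbook algorithm on a tiny tool graph, where nodes are ordered by accumulated cost g plus a heuristic estimate h. It does not reproduce the cost and quality formulation or the α trade-off referred to above, since the source text is truncated; the graph, edge costs, and the default zero heuristic are assumptions for illustration.

```python
import heapq

def a_star(graph, start, goal, h=lambda node: 0.0):
    """Textbook A*: graph maps node -> {neighbor: edge_cost}; h must be admissible."""
    frontier = [(h(start), 0.0, start, [start])]   # entries are (f, g, node, path)
    best_g = {start: 0.0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for nxt, cost in graph.get(node, {}).items():
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), new_g, nxt, path + [nxt]))
    return None, float("inf")

# Made-up tool graph: two detectors with different costs feed one segmenter.
graph = {
    "input": {"detect_a": 1.0, "detect_b": 3.0},
    "detect_a": {"segment": 2.0},
    "detect_b": {"segment": 0.5},
    "segment": {"inpaint": 4.0},
}
path, cost = a_star(graph, "input", "inpaint")
```

With the zero heuristic this reduces to uniform-cost search; a cost-sensitive agent would supply a heuristic that estimates the remaining cost of reaching the final tool.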
-
[52]
A text prompt describing the editing task
-
[53]
A predefined list of subtasks the model supports (provided below).
N.5 Supported Subtasks
Here is the complete list of subtasks available for constructing the subtask chain: Object Detection, Object Segmentation, Object Addition, Object Removal, Background Removal, Landmark Detection, Object Replacement, Image Upscaling, Image Captioning, Changing Scenery...