pith · machine review for the scientific record

arxiv: 2604.00458 · v2 · submitted 2026-04-01 · 💻 cs.SE

Recognition: 1 Lean theorem link

LDMDroid: Leveraging LLMs for Detecting Data Manipulation Errors in Android Apps

Huaxun Huang, Rongxin Wu, Xiangyang Xiao

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 22:50 UTC · model grok-4.3

classification 💻 cs.SE
keywords Android apps · data manipulation errors · large language models · UI testing · automated bug detection · logic bugs · data state changes

The pith

Large language models detect data manipulation errors in Android apps by guiding state-aware UI sequences and using visual checks for data changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Android apps rely on data manipulation functionalities to handle data through create, read, update, and delete operations, so errors in these operations directly undermine app reliability. The paper shows that large language models, guided through a state-aware process, can generate UI interaction sequences that trigger these functionalities more effectively than existing automated tools. By also using visual features to detect changes in data states, the method verifies whether the expected data manipulations occurred, reducing reliance on manually written test scripts. Tested on 24 real-world apps, the approach found 17 unique bugs, most of them confirmed and fixed by developers, demonstrating its practical value in catching hard-to-find logic bugs.

Core claim

LDMDroid improves the success rate of triggering data manipulation functionalities (DMFs) by guiding LLMs through a state-aware process for generating UI event sequences. It also uses visual features to identify changes in data states, improving the accuracy of data manipulation error (DME) verification. Evaluated on 24 real-world Android apps, it achieves higher DMF triggering success rates than baselines and discovered 17 unique bugs, 14 of which developers confirmed and 11 fixed.

What carries the argument

The state-aware process for LLM-guided UI event sequence generation, paired with visual feature analysis for data state change detection, which together enable automated detection of data manipulation errors.
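The paper does not spell out this loop at code level here; as a rough sketch of the kind of state-aware exploration the pith describes, the following stubs out both load-bearing pieces. Every name is hypothetical: `propose_event` stands in for the prompted LLM, and `visual_diff` reduces the paper's visual-feature comparison to a set difference over rendered items.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UIState:
    """Coarse, hashable abstraction of a screen: the widgets it exposes."""
    widgets: tuple


def propose_event(state, history):
    """Stand-in for the LLM call: pick the first widget not yet exercised
    in this state. The real system would prompt a model with a state summary
    and ask for the event most likely to reach a data manipulation feature."""
    tried = {e for (s, e) in history if s == state}
    for w in state.widgets:
        if w not in tried:
            return w
    return None


def visual_diff(before, after):
    """Stand-in for visual-feature comparison: flag any change in the
    rendered data items. The paper compares visual features of screenshots."""
    return set(before) != set(after)


def explore(app, max_steps=20):
    """State-aware loop: remember (state, event) pairs already exercised so
    the guide only proposes new behavior, and record events whose execution
    visibly changed the displayed data state."""
    history, changes = [], []
    for _ in range(max_steps):
        state = app.current_state()
        event = propose_event(state, history)
        if event is None:
            break  # every event in this state has been exercised
        before = app.rendered_items()
        app.perform(event)
        after = app.rendered_items()
        history.append((state, event))
        if visual_diff(before, after):
            changes.append(event)
    return changes


class FakeNotesApp:
    """Toy stand-in app: 'add' creates a note, 'about' changes no data."""
    def __init__(self):
        self.notes = []

    def current_state(self):
        return UIState(widgets=("add", "about"))

    def rendered_items(self):
        return list(self.notes)

    def perform(self, event):
        if event == "add":
            self.notes.append(f"note{len(self.notes)}")
```

On the toy app, `explore(FakeNotesApp())` exercises both events once and reports only `"add"` as a data-state change, which is the shape of signal the DME oracle would then check against the expected CRUD outcome.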

Load-bearing premise

That LLMs guided in a state-aware manner will generate UI sequences capable of reliably triggering data manipulation functionalities, and that visual features will accurately reflect data state changes without significant error.

What would settle it

Independently re-running the tool on the evaluated apps: the claim would fail if the reported bugs proved non-reproducible, or if developers rejected them as non-issues on review.

Figures

Figures reproduced from arXiv: 2604.00458 by Huaxun Huang, Rongxin Wu, Xiangyang Xiao.

Figure 1. A DME in Material Files (v1.7.4) related to the “Create File” DMF. The small red boxes indicate UI events. The expected outcome (f) shows the newly created file appearing in the file list, while the actual outcome (e) reveals its absence, demonstrating the DME.
Figure 2. The overview of LDMDroid.
Figure 3. Illustrative examples of LDMDroid detections. (a, b) show a true positive: a deleted habit item remains visible in the list.
Original abstract

Android apps rely heavily on Data Manipulation Functionalities (DMFs) for handling app-specific data through CRUDS operations, making their correctness vital for reliability. However, detecting Data Manipulation Errors (DMEs) is challenging due to their dependence on specific UI interaction sequences and manifestation as logic bugs. Existing automated UI testing tools face two primary challenges: insufficient UI path coverage for adequate DMF triggering and reliance on manually written test scripts. To address these issues, we propose an automated approach using Large Language Models (LLMs) for DME detection. We developed LDMDroid, an automated UI testing framework for Android apps. LDMDroid enhances DMF triggering success by guiding LLMs through a state-aware process for generating UI event sequences. It also uses visual features to identify changes in data states, improving DME verification accuracy. We evaluated LDMDroid on 24 real-world Android apps, demonstrating improved DMF triggering success rates compared to baselines. LDMDroid discovered 17 unique bugs, with 14 confirmed by developers and 11 fixed. The tool is publicly available at https://github.com/runnnnnner200/LDMDroid.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces LDMDroid, an automated UI testing framework that leverages LLMs in a state-aware process to generate UI event sequences for triggering Data Manipulation Functionalities (DMFs) in Android apps, employs visual features to detect data state changes for verifying Data Manipulation Errors (DMEs), and evaluates the approach on 24 real-world apps, reporting improved DMF triggering rates over baselines along with discovery of 17 unique bugs (14 developer-confirmed, 11 fixed).

Significance. If the central claims hold once the validation gaps are addressed, the work contributes a practical LLM-guided approach to automatically detecting logic bugs in mobile apps that depend on specific UI sequences. Its strengths are the real-world evaluation, the developer confirmations, and the public tool release, which support reproducibility and potential adoption in software engineering practice.

major comments (2)
  1. [Evaluation] Evaluation section: the visual feature-based method for identifying data state changes (e.g., screen differences signaling CRUDS outcomes) lacks any reported precision/recall metrics or comparison against ground-truth techniques such as direct SQLite queries or file diffs. This assumption is load-bearing for the reliability of the 17 reported DMEs and their confirmation counts, as non-data UI updates or animations could produce false signals.
  2. [Abstract and Evaluation] Abstract and §4 (Evaluation): no quantitative details are supplied on the exact baselines, triggering success rate metrics, number of runs per app, or controls for LLM output variability (e.g., temperature, prompt sensitivity). These omissions leave the effectiveness claims only moderately supported despite the reported bug discoveries.
minor comments (2)
  1. [Methodology] Methodology: provide more explicit description of the state representation passed to the LLM and the exact prompting template used for sequence generation to improve reproducibility.
  2. [Evaluation] Ensure all tables reporting bug counts and confirmation status include clear definitions of 'unique' bugs and the criteria used for developer confirmation.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important opportunities to strengthen the quantitative rigor of our evaluation, and we have revised the manuscript accordingly to address them directly. Below we respond point by point to the major comments.

Point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the visual feature-based method for identifying data state changes (e.g., screen differences signaling CRUDS outcomes) lacks any reported precision/recall metrics or comparison against ground-truth techniques such as direct SQLite queries or file diffs. This assumption is load-bearing for the reliability of the 17 reported DMEs and their confirmation counts, as non-data UI updates or animations could produce false signals.

    Authors: We agree that explicit validation metrics would further strengthen confidence in the visual detection component. In the revised manuscript we have added a dedicated validation paragraph in §4 that describes a post-hoc manual verification process: for a representative sample of test executions drawn from multiple apps we compared the visual signals against ground-truth SQLite queries and file-system diffs. The results of this comparison are reported in the revised version and show strong alignment, with discrepancies primarily attributable to transient UI animations that we now explicitly filter. All 17 reported DMEs were additionally validated through direct developer confirmation, providing an independent check against false positives. We have also clarified the visual feature extraction logic to better distinguish data-related state changes from other UI updates. revision: yes
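The validation the rebuttal describes, comparing visual signals against ground-truth SQLite queries, can be sketched as a small harness. Everything below is illustrative rather than taken from the paper: the flag lists, table name, and helper names are hypothetical, and the real system would derive the visual flags from screenshots rather than receiving booleans.

```python
import sqlite3


def precision_recall(visual_flags, truth_flags):
    """Compare per-step visual change signals against ground-truth flags
    (e.g. from a SQLite row-count diff or a file-system diff per step)."""
    tp = sum(v and t for v, t in zip(visual_flags, truth_flags))
    fp = sum(v and not t for v, t in zip(visual_flags, truth_flags))
    fn = sum(t and not v for v, t in zip(visual_flags, truth_flags))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def db_row_count(con, table):
    """Ground-truth probe: count rows in the app's backing table before and
    after an event; a changed count implies a real data-state change."""
    (n,) = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    return n
```

Sampling `db_row_count` around each executed event yields the `truth_flags` sequence, so the visual detector's precision and recall can be reported directly, which is exactly the metric the referee asked for.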

  2. Referee: [Abstract and Evaluation] Abstract and §4 (Evaluation): no quantitative details are supplied on the exact baselines, triggering success rate metrics, number of runs per app, or controls for LLM output variability (e.g., temperature, prompt sensitivity). These omissions leave the effectiveness claims only moderately supported despite the reported bug discoveries.

    Authors: We accept that additional quantitative detail improves transparency. The revised manuscript expands §4 (and updates the abstract) to specify: the exact baseline tools and their configurations, triggering success rates together with measures of variability across runs, the number of independent runs executed per app, and the LLM hyperparameters (including temperature and prompt sensitivity controls) with the full prompt templates now provided in the appendix. These additions make the reported improvements in DMF triggering and the 17 discovered bugs more quantitatively grounded while preserving the focus on real-world developer-confirmed results. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical tool evaluation

full rationale

The paper proposes LDMDroid as an LLM-guided UI testing framework for detecting data manipulation errors, evaluated empirically on 24 real-world Android apps. It reports improved triggering rates, the discovery of 17 unique bugs, and external developer confirmations (14 confirmed, 11 fixed), with no mathematical derivations, equations, fitted parameters renamed as predictions, or self-citations forming the load-bearing justification for the central claims. The approach relies on external benchmarks and real-app validation rather than reducing results to self-defined inputs or internal loops, so the evaluation is externally grounded rather than circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on assumptions about LLM reliability for sequence generation and visual accuracy for state detection, which are treated as domain assumptions without independent formal proof or external benchmarks beyond the reported evaluation.

axioms (2)
  • domain assumption LLMs guided by state-aware processes can generate effective UI event sequences to trigger DMFs
    Central to enhancing triggering success as stated in the approach description
  • domain assumption Visual features extracted from screen changes can accurately identify data state modifications for DME verification
    Used to improve verification accuracy without manual scripts

pith-pipeline@v0.9.0 · 5501 in / 1245 out tokens · 31847 ms · 2026-05-13T22:50:04.183634+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 1 internal anchor

[1] Android. 2025. Monkey. https://developer.android.google.cn/studio/test/other-testing-tools/monkey
[2] David Curry. 2025. Android Statistics (2025). https://www.businessofapps.com/data/android-statistics
[3] F-Droid. 2025. F-Droid. https://f-droid.org
[4] GitHub. 2025. Another Notes. https://github.com/maltaisn/another-notes-app
[5] GitHub. 2025. [Bug] Settings Entry Mistakenly Treated as TO-DO List After Deleting Last TO-DO. https://github.com/SecUSo/privacy-friendly-todo-list/issues/158
[6] GitHub. 2025. CPU Info. https://github.com/kamgurgul/cpu-info
[7] GitHub. 2025. CycleStreets. https://github.com/cyclestreets/android
[8] GitHub. 2025. Easy Notes. https://github.com/Kin69/EasyNotes
[9] GitHub. 2025. Fridgey. https://github.com/NielsLee/FoodRecords
[10] GitHub. 2025. Home Medkit. https://github.com/pewaru-333/HomeMedkit-App
[11] GitHub. 2025. LinkHub. https://github.com/AmrDeveloper/LinkHub
[12] GitHub. 2025. Material Notes. https://github.com/maelchiotti/LocalMaterialNotes
[13] GitHub. 2025. MaterialFiles. https://github.com/zhanghai/MaterialFiles
[14] GitHub. 2025. NoNonsense Notes. https://github.com/spacecowboy/NotePad
[15] GitHub. 2025. Notally. https://github.com/OmGodse/Notally
[16] GitHub. 2025. NotallyX. https://github.com/PhilKes/NotallyX
[17] GitHub. 2025. OsmAnd. https://github.com/osmandapp/Osmand
[18] GitHub. 2025. PFA Todo List. https://github.com/SecUSo/privacy-friendly-todo-list
[19] GitHub. 2025. Photo Editor. https://github.com/burhanrashid52/PhotoEditor
[20] GitHub. 2025. Play NotePad. https://github.com/mshdabiola/NotePad
[21] GitHub. 2025. Print Notes. https://github.com/RoBoT095/printnotes
[22] GitHub. 2025. Quillpad. https://github.com/quillpad/quillpad
[23] GitHub. 2025. Rank My Favs. https://github.com/dessalines/rank-my-favs
[24] GitHub. 2025. Read You. https://github.com/Ashinch/ReadYou
[25] GitHub. 2025. Recurring Expense Tracker. https://github.com/DennisBauer/RecurringExpenseTracker
[26] GitHub. 2025. Table Habit. https://github.com/FriesI23/mhabit
[27] GitHub. 2025. Tasky. https://github.com/thatsmanmeet/Tasky
[28] GitHub. 2025. To Don't. https://github.com/Crazy-Marvin/ToDont
[29] Google. 2025. Prompt Template in LDMDroid. https://docs.google.com/document/d/1asyf25458YNXkfM_4pQOaCEPBZj78lcCBQi7lcyxko8
[30] Siyi Gu, Xiaoqiang Liu, Hui Guo, Bochun Cao, Baiyan Li, Lizhi Cai, and Hu Yun. 2024. Finding Deep-Hidden Bugs in Android Apps via Functional Semantics Guided Exploration. In Theoretical Aspects of Software Engineering: 18th International Symposium, TASE 2024, Guiyang, China, July 29 – ...
[31] Tianxiao Gu, Chengnian Sun, Xiaoxing Ma, Chun Cao, Chang Xu, Yuan Yao, Qirun Zhang, Jian Lu, and Zhendong Su. 2019. Practical GUI Testing of Android Applications via Model Abstraction and Refinement. In Proceedings of the 41st International Conference on Software Engineering (ICSE '19). IEEE Press, Montreal, Quebec, Canada, 269–280. doi:10.1109/ICSE.2019.00042
[32] Pingfan Kong, Li Li, Jun Gao, Kui Liu, Tegawendé F. Bissyandé, and Jacques Klein. 2019. Automated Testing of Android Apps: A Systematic Literature Review. IEEE Transactions on Reliability 68, 1 (March 2019), 45–66. doi:10.1109/TR.2018.2865733
[33] Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. 2017. DroidBot: A Lightweight UI-guided Test Input Generator for Android. In Proceedings of the 39th International Conference on Software Engineering Companion (ICSE-C '17). IEEE Press, Buenos Aires, Argentina, 23–26. doi:10.1109/ICSE-C.2017.8
[34] Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. 2020. Humanoid: A Deep Learning-Based Approach to Automated Black-Box Android App Testing. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE '19). IEEE Press, San Diego, California, 1070–1073. doi:10.1109/ASE.2019.00104
[35] Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Xing Che, Dandan Wang, and Qing Wang. 2024. Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering. ACM, Lisbon, Portugal, 1–13. arXiv:2310.15780 doi:10...
[36] Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Zhilin Tian, Yuekai Huang, Jun Hu, and Qing Wang. 2024. Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE '24). Association for Computing Machinery...
[37] Zhengwei Lv, Chao Peng, Zhao Zhang, Ting Su, Kai Liu, and Ping Yang. 2023. Fastbot2: Reusable Automated Model-based GUI Testing for Android Enhanced by Reinforcement Learning. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE '22). Association for Computing Machinery, New York, NY, USA, 1–5. doi:10.1145/355...
[38] Leonardo Mariani, Mauro Pezzè, and Daniele Zuddas. 2018. Augusto: Exploiting Popular Functionalities for the Generation of Semantic GUI Tests with Oracles. In Proceedings of the 40th International Conference on Software Engineering (ICSE '18). Association for Computing Machinery, New York, NY, USA, 280–290. doi:10.1145/3180155.3180162
[39] Minxue Pan, An Huang, Guoxin Wang, Tian Zhang, and Xuandong Li. 2020. Reinforcement Learning Based Curiosity-Driven Testing of Android Applications. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2020). Association for Computing Machinery, New York, NY, USA, 153–164. doi:10.1145/3395363.3397354
[40] Dezhi Ran, Hao Wang, Zihe Song, Mengzhou Wu, Yuan Cao, Ying Zhang, Wei Yang, and Tao Xie. 2024. Guardian: A Runtime Framework for LLM-Based UI Exploration. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2024). Association for Computing Machinery, New York, NY, USA, 958–970. doi:10.1145/3650212.3680334
[41] Iflaah Salman, Ayse Tosun Misirli, and Natalia Juristo. 2015. Are Students Representatives of Professionals in Software Engineering Experiments? In Proceedings of the 37th International Conference on Software Engineering - Volume 1 (ICSE '15). IEEE Press, Florence, Italy, 666–676.
[42] Yunpeng Song, Yiheng Bian, Yongtao Tang, Guiyu Ma, and Zhongmin Cai. 2024. VisionTasker: Mobile Task Automation Using Vision Based UI Understanding and LLM Task Planning. In Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology. ACM, Pittsburgh, PA, USA, 1–17. arXiv:2312.11190 doi:10.1145/3654777.3676386
[43] Ting Su, Guozhu Meng, Yuting Chen, Ke Wu, Weiming Yang, Yao Yao, Geguang Pu, Yang Liu, and Zhendong Su. 2017. Guided, Stochastic Model-Based GUI Testing of Android Apps. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2017). Association for Computing Machinery, New York, NY, USA, 245–256. doi:10.1145/3106237.3106298
[44] Ting Su, Yichen Yan, Jue Wang, Jingling Sun, Yiheng Xiong, Geguang Pu, Ke Wang, and Zhendong Su. 2021. Fully Automated Functional Fuzzing of Android Apps for Detecting Non-Crashing Logic Bugs. Proc. ACM Program. Lang. 5, OOPSLA (Oct. 2021), 156:1–156:31. doi:10.1145/3485533
[45] Jingling Sun, Ting Su, Jiayi Jiang, Jue Wang, Geguang Pu, and Zhendong Su. 2023. Property-Based Fuzzing for Finding Data Manipulation Errors in Android Apps. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023). Association for Computing Machinery, New York, N...
[46] Bryan Wang, Gang Li, and Yang Li. 2023. Enabling Conversational Interaction with Mobile UI Using Large Language Models. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23). Association for Computing Machinery, New York, NY, USA, 1–17. doi:10.1145/3544548.3580895
[47] Jue Wang, Yanyan Jiang, Ting Su, Shaohua Li, Chang Xu, Jian Lu, and Zhendong Su. 2022. Detecting Non-Crashing Functional Bugs in Android Apps via Deep-State Differential Analysis. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022). Association for Computing ...
[48] Jue Wang, Yanyan Jiang, Chang Xu, Chun Cao, Xiaoxing Ma, and Jian Lu. 2020. ComboDroid: Generating High-Quality Test Inputs for Android Apps via Use Case Combinations. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (ICSE '20). Association for Computing Machinery, New York, NY, USA, 469–480. doi:10.1145/3377811.3380382
[49] Zhenhailong Wang, Haiyang Xu, Junyang Wang, Xi Zhang, Ming Yan, Ji Zhang, Fei Huang, and Heng Ji. 2025. Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks. arXiv:2501.11733 [cs] doi:10.48550/arXiv.2501.11733
[50] Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, and Yunxin Liu. 2024. AutoDroid: LLM-powered Task Automation in Android. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking. ACM, ...
[51] Hao Wen, Hongming Wang, Jiaxuan Liu, and Yuanchun Li. 2024. DroidBot-GPT: GPT-powered UI Automation for Android. arXiv:2304.07061 doi:10.48550/arXiv.2304.07061
[52] Yiheng Xiong, Ting Su, Jue Wang, Jingling Sun, Geguang Pu, and Zhendong Su. 2024. General and Practical Property-based Testing for Android Apps. In Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering (ASE '24). Association for Computing Machinery, New York, NY, USA, 53–64. doi:10.1145/3691620.3694986
[53] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A Survey of Large Language Models. arXiv:2303.18223 [cs] doi:10.48550/arXiv.2303.18223
[54] Yu Zhao, Tingting Yu, Ting Su, Yang Liu, Wei Zheng, Jingzhi Zhang, and William G. J. Halfond. 2019. ReCDroid: Automatically Reproducing Android Application Crashes from Bug Reports. In Proceedings of the 41st International Conference on Software Engineering (ICSE '19). IEEE Press, Montreal, Quebec, Canada, 128–139. doi:10.1109/ICSE.2019.00030
[55] Zhipu. 2025. Zhipu AI. https://www.zhipuai.cn/en