{"total":13,"items":[{"citing_arxiv_id":"2606.19644","ref_index":12,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Prompt Quality and Pull Request Outcomes: A Stage-Based Empirical Study of LLM-Assisted Development","primary_cat":"cs.SE","submitted_at":"2026-06-17T22:50:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Specificity and Context predict actionable code generation while Verification predicts adoption and Context predicts integration depth in LLM-assisted PR workflows.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.17094","ref_index":60,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LogCopilot: Automating Log Aggregation Analysis through Large Language Models","primary_cat":"cs.SE","submitted_at":"2026-06-13T15:47:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LogCopilot is an LLM framework that builds a hierarchical knowledge base from logs and generates/executes LogQL queries from natural language instructions, reporting 76.8% average accuracy across four datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.05986","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AttackPathGNN: Cross-function vulnerability detection in smart contracts using state interference graphs and conjunction pooling","primary_cat":"cs.CR","submitted_at":"2026-06-04T10:30:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"AttackPathGNN introduces a State Interference Graph and conjunction pooling inside a GNN to detect cross-function vulnerabilities in Solidity contracts, reporting 92.3% F1 on SmartBugs 
Wild.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.03387","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Bastet: A Fine-Grained Expert-Labeled Dataset for DeFi Smart Contract Vulnerability Detection","primary_cat":"cs.CR","submitted_at":"2026-06-02T09:31:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Bastet is an expert-labeled dataset for DeFi smart contract vulnerabilities drawn from 2021-2024 Code4rena audits, using consensus annotation and a fine-grained two-layer taxonomy to address gaps in prior datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.20473","ref_index":78,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Code Generation by Differential Test Time Scaling","primary_cat":"cs.SE","submitted_at":"2026-05-19T20:39:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"DiffCodeGen clusters code candidates by behavioral similarity from fuzzing-synthesized inputs and selects the largest cluster's medoid, matching or exceeding prior test-time scaling methods with far less token and time cost.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.09573","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ConCovUp: Effective Agent-Based Test Driver Generation for Concurrency Testing","primary_cat":"cs.SE","submitted_at":"2026-05-10T14:37:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"ConCovUp uses static analysis to ground LLM test generation and backward tracing to produce concurrent test drivers that raise average 
shared-memory access pair coverage from 36.6% to 68.1% on nine real-world libraries.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"tests. Instead, it leverages the reasoning capabilities of LLMs, guided by static concurrency analysis, to autonomously synthesize concurrent tests from scratch. Finally, unlike techniques focused on exploring the vast space of thread schedules, such as stateless model checking [10], controlled concurrency testing [19, 36], and predictive analysis [8, 40, 41], our work does not aim to maximize interleaving diversity directly. Instead, our approach is orthogonal to these methods. By generating semantically valid drivers that reach critical synchronization points, ConCovUp provides the necessary entry points for fine-grained schedule exploration tools to operate effectively. Neural-Enhanced Symbolic Execution."},{"citing_arxiv_id":"2604.18395","ref_index":22,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Capturing Monetarily Exploitable Vulnerability in Smart Contracts via Auditor Knowledge-Learning Fuzzing","primary_cat":"cs.CR","submitted_at":"2026-04-20T15:17:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"FAUDITOR is a specialized fuzzer that detected 220 zero-day monetarily exploitable vulnerabilities in smart contracts by combining finance-interface targeting, NLP from auditor reports, and self-learning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.14038","ref_index":19,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"KindHML: formal verification of smart contracts based on Hennessy-Milner 
logic","primary_cat":"cs.CR","submitted_at":"2026-04-15T16:18:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"An encoding of Solidity contracts and first-order Hennessy-Milner logic into Lustre enables Kind 2 model checking of complex temporal properties in smart contracts.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"Summing up, the main contributions of the paper include: -CHML, a novel temporal logic that can express complex temporal properties of smart contracts (encompassing, e.g., liquidity and front-running); -a fully automated toolchain that encodes CHML properties and an expressive fragment of Solidity into Lustre [27], and verifies the resulting specification using Kind2 [19]; -an empirical evaluation of the effectiveness of the proposed approach. Our toolchain and experimental data are publicly available on github. Structure. The paper is organized as follows. In Section 2 we describe the system model. In Section 3 we introduce some use cases, that we use to present CHML (Section 4) and the encoding of CHML to first-order logic Section 5."},{"citing_arxiv_id":"2604.02398","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Improving MPI Error Detection and Repair with Large Language Models and Bug References","primary_cat":"cs.SE","submitted_at":"2026-04-02T14:00:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Augmenting LLMs with bug references, few-shot learning, chain-of-thought, and RAG improves MPI error detection accuracy from 44% to 77% and generalizes across models.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"ingful patches [23, 24]. 
Despite these advances, LLM-based repair remains sensitive to prompt design, computationally expensive, and often requires domain-specific knowledge to produce reliable results [25, 26]. LLMs trained on large code corpora have shown promise in detecting defects and suggesting refinements by jointly reasoning over syntax and context [27, 28, 29]. Prompt engineering has further improved performance across defect detection and repair tasks [30, 31]. However, MPI-specific challenges, such as deadlocks, race conditions, and collective mismatches, remain underexplored, suggesting that MPI-aware fine-tuning or domain-informed prompting may be necessary to fully realize LLM potential in this domain."},{"citing_arxiv_id":"2604.16359","ref_index":105,"ref_count":4,"confidence":0.9,"is_internal_anchor":false,"paper_title":"LLM4Log: A Systematic Review of Large Language Model-based Log Analysis","primary_cat":"cs.SE","submitted_at":"2026-03-18T20:34:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Systematic review of 145 papers on LLM-based log analysis, providing a unified taxonomy, common design patterns, evaluation practices, and challenges for deployment under drift and limited labels.","context_count":2,"top_context_role":"background","top_context_polarity":"background","context_text":"Supervised parsers and log-specific models (fine-tuning / sequence labeling / pretraining) GPT-2C [145] Offline Sup. Fine-tuned GPT-style parser (Q&A formulation) - GPT-2 F1 LogStamp [165] Online Sup. Sequence labeling for template/parameter tagging - BERT-base RI LogPPT [75] Offline Sup. Prompt-based few-shot token labeling - RoBERTa PA, GA, ED LLMParser [105] Offline Sup. Few-shot fine-tuning for template generation - LLaMA-7B PA, GA Mehrabi et al. [112] Offline Sup. Compact fine-tuning (effectiveness study) - Mistral-7B MLA, ED, F1 OWL [44] Offline Sup. 
Log foundation model training (IT operations) - OWL RI, F1 PreLog [77] Online Sup. Log foundation model pretraining (log analytics) - PreLog-140M GA, OA, ED, uPA"},{"citing_arxiv_id":"2601.11299","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Automation and Reuse Practices in GitHub Actions Workflows: A Practitioner's Perspective","primary_cat":"cs.SE","submitted_at":"2026-01-16T13:54:54+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A survey of 419 practitioners shows strong reliance on reusable GitHub Actions for core CI/CD tasks but limited adoption of reusable workflows, with copy-pasting remaining common due to versioning and trust issues.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.07700","ref_index":34,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes","primary_cat":"cs.SE","submitted_at":"2025-05-12T16:09:33+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Empirical analysis of 338 PRs with self-admitted ChatGPT usage shows low full integration (median 25%), selective adaptation patterns, and broader influence on developer reasoning during reviews.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.15815","ref_index":29,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing","primary_cat":"cs.SE","submitted_at":"2024-08-28T14:24:48+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"MR-Adopt deduces input transformations from hard-coded MR test 
cases using LLMs, data-flow refinement, and output-relation selection to enable reuse with new source inputs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}