First empirical study of correctness bugs in torch.compile characterizes their patterns and proposes AlignGuard, which found 23 confirmed new bugs via LLM-guided test mutation.
Title resolution pending
13 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 4polarities
background 4representative citing papers
MCTM applies method-level change-proneness from version history and call-graph analysis to minimize black-box test suites, reporting 0.93 accuracy and 0.94 fault detection rate on 15 Java projects with 635 buggy versions.
dille detects silent semantic faults in random forest ML pipelines with 91% precision via data-informed static analysis on Kaggle notebooks, finding 12-18% of scripts affected.
Analysis of SATD in Dockerfiles shows 27% of admissions and 40% of repayments are coupled to non-Dockerfile artifacts, with coupled events repaid faster overall and external dependencies as a key trigger.
SecureForge audits LLM code for vulnerabilities, builds a synthetic prompt corpus via Markovian sampling, and optimizes system prompts to cut security issues by up to 48% while preserving unit test performance, with zero-shot transfer to real prompts.
DynamicsLLM uses LLMs to generate execution traces that cover three times more code smell-related events than the prior Dynamics tool on 333 F-Droid Android apps, with a hybrid method adding 25.9% coverage for low-activity apps.
QTyBERT matches or exceeds BERT-based log anomaly detection effectiveness while reducing embedding generation time to near static word embedding levels.
TreeRanker ranks static code completions by organizing candidates in a prefix tree and collecting token scores via a single greedy language-model decoding pass.
XOXO is a cross-origin context poisoning attack on AI coding assistants that uses a Cayley Graph search algorithm (GCGS) to find stealthy perturbations, achieving 75.72% average success rate across five tasks and eleven models.
Developers most frequently reference the full Log4j migration guide in pull request descriptions (82.81% of cases) and continue consulting it during post-update maintenance tasks.
STAF applies sentence embeddings from transformers to classify SCA findings, reaching 89% F1 and beating prior filters by 11% within projects and 6% across projects.
Position paper proposing Model Science as a discipline to systematically analyze AI model behavior beyond benchmarks, drawing analogies from cognitive science, neuroscience, medicine, and agriculture.
Empirical evaluation shows that code generated by all seven tested LLMs contains vulnerabilities, the majority of critical or high severity.
citing papers explorer
-
Demystifying the Silence of Correctness Bugs in PyTorch Compiler
First empirical study of correctness bugs in torch.compile characterizes their patterns and proposes AlignGuard, which found 23 confirmed new bugs via LLM-guided test mutation.
-
Method-level Change-proneness: A Better Metric for Black-box Test Suite Minimization
MCTM applies method-level change-proneness from version history and call-graph analysis to minimize black-box test suites, reporting 0.93 accuracy and 0.94 fault detection rate on 15 Java projects with 635 buggy versions.
-
Are We Lost in the Woods? Detecting Silent Semantic Faults for Random Forest Classifiers with Data-informed Static Analysis
dille detects silent semantic faults in random forest ML pipelines with 91% precision via data-informed static analysis on Kaggle notebooks, finding 12-18% of scripts affected.
-
Beyond the Tip of the Iceberg: Understanding SATD in Dockerfiles through the Lens of Co-evolution
Analysis of SATD in Dockerfiles shows 27% of admissions and 40% of repayments are coupled to non-Dockerfile artifacts, with coupled events repaid faster overall and external dependencies as a key trigger.
-
SecureForge: Finding and Preventing Vulnerabilities in LLM-Generated Code via Prompt Optimization
SecureForge audits LLM code for vulnerabilities, builds a synthetic prompt corpus via Markovian sampling, and optimizes system prompts to cut security issues by up to 48% while preserving unit test performance, with zero-shot transfer to real prompts.
-
DynamicsLLM: a Dynamic Analysis-based Tool for Generating Intelligent Execution Traces Using LLMs to Detect Android Behavioural Code Smells
DynamicsLLM uses LLMs to generate execution traces that cover three times more code smell-related events than the prior Dynamics tool on 333 F-Droid Android apps, with a hybrid method adding 25.9% coverage for low-activity apps.
-
A Comparative Study of Semantic Log Representations for Software Log-based Anomaly Detection
QTyBERT matches or exceeds BERT-based log anomaly detection effectiveness while reducing embedding generation time to near static word embedding levels.
-
TreeRanker: Fast and Model-agnostic Ranking System for Code Suggestions in IDEs
TreeRanker ranks static code completions by organizing candidates in a prefix tree and collecting token scores via a single greedy language-model decoding pass.
-
XOXO: Stealthy Cross-Origin Context Poisoning Attacks against AI Coding Assistants
XOXO is a cross-origin context poisoning attack on AI coding assistants that uses a Cayley Graph search algorithm (GCGS) to find stealthy perturbations, achieving 75.72% average success rate across five tasks and eleven models.
-
How Do Developers Use Migration Guides? A Case Study of Log4j
Developers most frequently reference the full Log4j migration guide in pull request descriptions (82.81% of cases) and continue consulting it during post-update maintenance tasks.
-
Towards Better Static Code Analysis Reports: Sentence Transformer-based Filtering of Non-Actionable Alerts
STAF applies sentence embeddings from transformers to classify SCA findings, reaching 89% F1 and beating prior filters by 11% within projects and 6% across projects.
-
The Case for Model Science: Verify, Explore, Steer, Refine
Position paper proposing Model Science as a discipline to systematically analyze AI model behavior beyond benchmarks, drawing analogies from cognitive science, neuroscience, medicine, and agriculture.
-
Security of LLM-generated Code: A Comparative Analysis
Empirical evaluation shows that code generated by all seven tested LLMs contains vulnerabilities, the majority of critical or high severity.