First empirical study of correctness bugs in torch.compile characterizes their patterns and proposes AlignGuard, which found 23 confirmed new bugs via LLM-guided test mutation.
In: 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
14 Pith papers cite this work.
citation roles: background 4
citation polarities: background 4
representative citing papers
MCTM applies method-level change-proneness from version history and call-graph analysis to minimize black-box test suites, reporting 0.93 accuracy and 0.94 fault detection rate on 15 Java projects with 635 buggy versions.
dille detects silent semantic faults in random forest ML pipelines with 91% precision via data-informed static analysis on Kaggle notebooks, finding 12-18% of scripts affected.
Analysis of SATD in Dockerfiles shows 27% of admissions and 40% of repayments are coupled to non-Dockerfile artifacts, with coupled events repaid faster overall and external dependencies as a key trigger.
SecureForge audits LLM code for vulnerabilities, builds a synthetic prompt corpus via Markovian sampling, and optimizes system prompts to cut security issues by up to 48% while preserving unit test performance, with zero-shot transfer to real prompts.
DynamicsLLM uses LLMs to generate execution traces that cover three times more code smell-related events than the prior Dynamics tool on 333 F-Droid Android apps, with a hybrid method adding 25.9% coverage for low-activity apps.
QTyBERT matches or exceeds BERT-based log anomaly detection effectiveness while reducing embedding generation time to near static word embedding levels.
TreeRanker ranks static code completions by organizing candidates in a prefix tree and collecting token scores via a single greedy language-model decoding pass.
XOXO is a cross-origin context poisoning attack on AI coding assistants that uses a Cayley Graph search algorithm (GCGS) to find stealthy perturbations, achieving 75.72% average success rate across five tasks and eleven models.
Developers most frequently reference the full Log4j migration guide in pull request descriptions (82.81% of cases) and continue consulting it during post-update maintenance tasks.
STAF applies sentence embeddings from transformers to classify SCA findings, reaching 89% F1 and beating prior filters by 11% within projects and 6% across projects.
A PRISMA-guided review of 21 papers shows RL work on C/C++ vulnerabilities focuses on fuzzing rather than detection or localization, proposes a taxonomy, and flags the lack of CFG-based state representations for vulnerable node identification.
Position paper proposing Model Science as a discipline to systematically analyze AI model behavior beyond benchmarks, drawing analogies from cognitive science, neuroscience, medicine, and agriculture.
Empirical evaluation shows that code generated by all seven tested LLMs contains vulnerabilities, most of them of critical or high severity.
citing papers explorer
The Case for Model Science: Verify, Explore, Steer, Refine