Reproducibility of Build Environments through Space and Time
Canonical reference. 100% of citing Pith papers cite this work as background.
Citation roles: background (4)
Citation polarities: background (4)
citing papers explorer
- Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review
  LLM-based security code review is vulnerable to framing bias, with a novel iterative refinement attack achieving 100% success in reintroducing vulnerabilities across real projects.
- A Methodological Analysis of Empirical Studies in Quantum Software Testing
  A systematic analysis of 59 quantum software testing empirical studies reveals highly diverse designs, inconsistent reporting, and open methodological challenges, leading to recommendations for future work.
- Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries
  A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.
- Direction for Detection: A Survey of Automated Vulnerability Detection and all of its Pain Points
  ML4AVD research remains locked into binary function-level classification of C/C++ vulnerabilities because twelve pain points in the pipeline reinforce each other through feedback loops.
- Vulnerability Detection with Interprocedural Context in Multiple Languages: Assessing Effectiveness and Cost of Modern LLMs
  Adding interprocedural context from callers or callees enables LLMs to detect vulnerabilities more effectively, with Gemini 3 Flash achieving F1 scores of at least 0.978 for C at low cost and Claude Haiku 4.5 excelling at explanations.
- Who's Who? LLM-assisted Software Traceability with Architecture Entity Recognition
  LLM approaches ExArch and ArTEMiS reach F1 scores of 0.86 and 0.81 for architecture entity recognition and traceability, matching or approaching baselines that require manual models.
- A Study of LLMs' Preferences for Libraries and Programming Languages
  Empirical study of eight LLMs finds overuse of popular libraries like NumPy in up to 45% of unnecessary cases and strong default preference for Python even when suboptimal.
- "Like Taking the Path of Least Resistance": Exploring the Impact of LLM Interaction on the Creative Process of Programming
  LLM assistance shortens idea-generation periods and reduces creative moments during programming tasks while yielding solutions with comparable idea counts and greater functional correctness.
- Revisiting Sentiment Analysis for Software Engineering in the Era of Large Language Models
  bLLMs achieve state-of-the-art results on limited and imbalanced SE sentiment datasets even in zero-shot settings, but fine-tuned sLLMs outperform when ample balanced training data is available.
- Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap
  A research roadmap analyzing the current state of search-based software engineering with foundation models, outlining challenges and directions across three integration aspects.
- Nix: A Solution With Problems
  A literature review of Nix's functional package management solutions to software deployment problems alongside the new and unsolved issues it introduces.