arXiv preprint arXiv:2103.06333 (2021)

Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang · 2021 · arXiv 2103.06333

15 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair

cs.SE · 2026-05-04 · accept · novelty 7.0 · 2 refs

LLM-based Java program repair models lose over 50% of their bug-fixing success rate when presented with equivalent but syntactically varied buggy code.

From Task to Tutorial: An Automated GUI Framework for Excel Tutorial Document and Video Creation

cs.SE · 2025-09-26 · unverdicted · novelty 7.0

An AI framework automates Excel tutorial and video creation from task descriptions via an Execution Agent, achieving 8.5% higher task success and 1/20th the authoring time of experts.

AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

cs.CL · 2023-12-20 · accept · novelty 7.0

A three-agent loop of code generation, test creation, and execution feedback lifts pass@1 to 96.3% on HumanEval and 91.8% on MBPP for GPT-4 while using roughly half the tokens of prior state-of-the-art.

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

cs.SE · 2023-05-20 · unverdicted · novelty 7.0

LLMs achieve strong results on syntax parsing tasks but show limited and variable performance on dynamic reasoning, with a clear performance hierarchy across model scales.

Large Language Models for Multi-Lingual Equivalent Mutant Detection: An Extended Empirical Study

cs.SE · 2026-07-01 · unverdicted · novelty 6.0

LLM-based methods achieve higher F1-scores than traditional approaches for equivalent mutant detection in Java and C, with fine-tuned code embeddings performing best and showing cross-lingual generalization.

A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?

cs.SE · 2025-11-07 · unverdicted · novelty 6.0

Student models distilled from code language models often fail to deeply mimic teachers, showing up to 62% behavioral discrepancies and 285% worse drops under attacks that accuracy metrics miss.

Multi Language Models for On-the-Fly Syntax Highlighting

cs.SE · 2025-10-05 · unverdicted · novelty 6.0

A unified multi-language deep learning model performs on-the-fly syntax highlighting, using normalization and few-shot learning to support six languages at lower deployment cost.

Prompt Optimization for LLM Code Generation via Reinforcement Learning

cs.SE · 2026-05-18 · unverdicted · novelty 5.0

A PPO agent with hybrid actions and test-driven rewards optimizes prompts for code LLMs, raising strict Pass@1 scores on MBPP+, HumanEval+, and APPS over prior methods.

PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection

cs.SE · 2026-04-28 · unverdicted · novelty 5.0

Controlled experiments show PLM-GNN hybrids improve code tasks over GNN-only baselines, with PLM source having larger impact than GNN backbone.

Model Compression vs. Adversarial Robustness: An Empirical Study on Language Models for Code

cs.SE · 2025-08-05 · unverdicted · novelty 5.0

Empirical tests show compressed code language models retain task performance but suffer markedly lower robustness under four standard adversarial attacks.

DeepFWI: Identifying Bug-Sensitive Warnings with Multi-Modal Code-Warning Semantics

cs.SE · 2024-03-24 · conditional · novelty 5.0

DeepFWI is a multi-modal LSTM model with cross-attention that identifies bug-sensitive warnings at warning granularity, reaching 67.06% F1 on a 280k-warning dataset and surfacing 25 confirmed bugs in four open-source projects.

Prompt-Driven Code Summarization: A Systematic Literature Review

cs.SE · 2026-04-16 · unverdicted · novelty 4.0

A systematic review that categorizes prompting strategies for LLM-based code summarization, assesses their effectiveness, and identifies gaps in research and evaluation practices.

Are Decoder-Only Large Language Models the Silver Bullet for Code Search?

cs.SE · 2024-10-29 · unverdicted · novelty 4.0

Fine-tuned decoder-only LLMs achieve up to 40.4% higher MAP than UniXcoder on CoSQA+ for code search, with non-monotonic size scaling and data composition sensitivity.

Can Coding Agents Be General Agents?

cs.SE · 2026-04-10 · unverdicted · novelty 3.0

Coding agents reliably finish simple business tasks in an ERP system but show characteristic failures on complex tasks, with bridging domain logic and code execution as the main bottleneck.

A Survey on Large Language Models for Code Generation

cs.CL · 2024-06-01 · unverdicted · novelty 3.0

A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.

citing papers explorer

HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair

cs.SE · 2026-05-04 · accept · none · ref 2 · 2 links

LLM-based Java program repair models lose over 50% of their bug-fixing success rate when presented with equivalent but syntactically varied buggy code.

From Task to Tutorial: An Automated GUI Framework for Excel Tutorial Document and Video Creation

cs.SE · 2025-09-26 · unverdicted · none · ref 3

An AI framework automates Excel tutorial and video creation from task descriptions via an Execution Agent, achieving 8.5% higher task success and 1/20th the authoring time of experts.

AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

cs.CL · 2023-12-20 · accept · none · ref 1

A three-agent loop of code generation, test creation, and execution feedback lifts pass@1 to 96.3% on HumanEval and 91.8% on MBPP for GPT-4 while using roughly half the tokens of prior state-of-the-art.

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

cs.SE · 2023-05-20 · unverdicted · none · ref 16

LLMs achieve strong results on syntax parsing tasks but show limited and variable performance on dynamic reasoning, with a clear performance hierarchy across model scales.

Large Language Models for Multi-Lingual Equivalent Mutant Detection: An Extended Empirical Study

cs.SE · 2026-07-01 · unverdicted · none · ref 3

LLM-based methods achieve higher F1-scores than traditional approaches for equivalent mutant detection in Java and C, with fine-tuned code embeddings performing best and showing cross-lingual generalization.

A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?

cs.SE · 2025-11-07 · unverdicted · none · ref 24

Student models distilled from code language models often fail to deeply mimic teachers, showing up to 62% behavioral discrepancies and 285% worse drops under attacks that accuracy metrics miss.

Multi Language Models for On-the-Fly Syntax Highlighting

cs.SE · 2025-10-05 · unverdicted · none · ref 11

A unified multi-language deep learning model performs on-the-fly syntax highlighting, using normalization and few-shot learning to support six languages at lower deployment cost.

Prompt Optimization for LLM Code Generation via Reinforcement Learning

cs.SE · 2026-05-18 · unverdicted · none · ref 1

A PPO agent with hybrid actions and test-driven rewards optimizes prompts for code LLMs, raising strict Pass@1 scores on MBPP+, HumanEval+, and APPS over prior methods.

PLMGH: What Matters in PLM-GNN Hybrids for Code Classification and Vulnerability Detection

cs.SE · 2026-04-28 · unverdicted · none · ref 2

Controlled experiments show PLM-GNN hybrids improve code tasks over GNN-only baselines, with PLM source having larger impact than GNN backbone.

Model Compression vs. Adversarial Robustness: An Empirical Study on Language Models for Code

cs.SE · 2025-08-05 · unverdicted · none · ref 6

Empirical tests show compressed code language models retain task performance but suffer markedly lower robustness under four standard adversarial attacks.

DeepFWI: Identifying Bug-Sensitive Warnings with Multi-Modal Code-Warning Semantics

cs.SE · 2024-03-24 · conditional · none · ref 3

DeepFWI is a multi-modal LSTM model with cross-attention that identifies bug-sensitive warnings at warning granularity, reaching 67.06% F1 on a 280k-warning dataset and surfacing 25 confirmed bugs in four open-source projects.

Prompt-Driven Code Summarization: A Systematic Literature Review

cs.SE · 2026-04-16 · unverdicted · none · ref 33

A systematic review that categorizes prompting strategies for LLM-based code summarization, assesses their effectiveness, and identifies gaps in research and evaluation practices.

Are Decoder-Only Large Language Models the Silver Bullet for Code Search?

cs.SE · 2024-10-29 · unverdicted · none · ref 11

Fine-tuned decoder-only LLMs achieve up to 40.4% higher MAP than UniXcoder on CoSQA+ for code search, with non-monotonic size scaling and data composition sensitivity.

Can Coding Agents Be General Agents?

cs.SE · 2026-04-10 · unverdicted · none · ref 1

Coding agents reliably finish simple business tasks in an ERP system but show characteristic failures on complex tasks, with bridging domain logic and code execution as the main bottleneck.

A Survey on Large Language Models for Code Generation

cs.CL · 2024-06-01 · unverdicted · none · ref 7

A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.
