DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Daya Guo, Dejian Yang, Kai Dong, Qihao Zhu, Wentao Zhang, Zhenda Xie · 2024 · cs.SE · arXiv 2401.14196

Canonical reference. 74% of citing Pith papers cite this work as background.

140 Pith papers citing it

Background: 74% of classified citations

abstract

The rapid development of large language models has revolutionized code intelligence in software development. However, the predominance of closed-source models has restricted extensive research and development. To address this, we introduce the DeepSeek-Coder series, a range of open-source code models with sizes from 1.3B to 33B, trained from scratch on 2 trillion tokens. These models are pre-trained on a high-quality project-level code corpus and employ a fill-in-the-blank task with a 16K window to enhance code generation and infilling. Our extensive evaluations demonstrate that DeepSeek-Coder not only achieves state-of-the-art performance among open-source code models across multiple benchmarks but also surpasses existing closed-source models like Codex and GPT-3.5. Furthermore, DeepSeek-Coder models are under a permissive license that allows for both research and unrestricted commercial use.

citation-role summary

background: 30 · method: 6 · dataset: 2 · baseline: 1

citation-polarity summary

background: 29 · use method: 6 · use dataset: 2 · baseline: 1 · support: 1

representative citing papers

AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot

cs.AI · 2026-04-15 · conditional · novelty 9.0

AI reviews for all 22,977 AAAI-26 papers were preferred by authors and PC members over human reviews on accuracy and suggestions and outperformed baselines at spotting weaknesses.

Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing

cs.CR · 2026-04-07 · unverdicted · novelty 8.0

The first SoK on LLM-based AutoPT frameworks provides a six-dimension taxonomy of agent designs and a unified empirical benchmark evaluating 15 frameworks via over 10 billion tokens and 1,500 manually reviewed logs.

The Alignment Problem in Constrained Code Generation

cs.SE · 2026-06-19 · unverdicted · novelty 7.0

Incomplete constrainers in constrained decoding push LLMs into low-probability program regions, making unconstrained decoding outperform constrained decoding on functional correctness across seven models and three benchmarks.

Synthetic Hallucinations, Real Gains: Hard Negatives from Frontier Models for FIM Hallucination Mitigation

cs.LG · 2026-06-02 · unverdicted · novelty 7.0

Using frontier models to synthesize plausible-but-wrong FIM completions as hard negatives for SFT improves Delulu exact match by +18.8 and edit similarity by +0.22 on Qwen2.5-Coder-7B while also lifting HumanEval-Infilling and SAFIM.

CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning

cs.CL · 2026-05-19 · unverdicted · novelty 7.0

CopT reverses CoT by eliciting a draft answer first then using continuous-embedding contrastive verification and on-policy thinking to reflect and correct, yielding up to 23% higher accuracy and 57% fewer tokens without training.

Constrained Code Generation with Discrete Diffusion

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.

Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench

cs.AR · 2026-05-13 · unverdicted · novelty 7.0

Phoenix-bench shows agentic AI systems lose 37-58% resolved rate when moving from SWE-bench Verified to hardware tasks because bugs spread across parallel modules via signal flow, with testbench feedback lifting performance by 42-45% while file-level oracles add only 1.4%.

StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning

cs.SE · 2026-05-12 · unverdicted · novelty 7.0

StepCodeReasoner aligns code reasoning with verifiable stepwise execution traces via print anchors and bi-level GRPO reinforcement learning, reaching SOTA results on CRUXEval (91.1%) and LiveCodeBench (86.5%) for a 7B model.

MeshFIM: Local Low-Poly Mesh Editing via Fill-in-the-Middle Autoregressive Generation

cs.GR · 2026-05-09 · unverdicted · novelty 7.0

MeshFIM enables local low-poly mesh editing by autoregressively filling target regions conditioned on context, using boundary markers, positional embeddings, and a gated geometry encoder to enforce attachment, topology, and region limits.

Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

POPO uses bounded importance sampling on positive rollouts and a siamese policy network to achieve implicit negative gradients and stable optimization, matching or exceeding GRPO on math benchmarks such as 36.67% on AIME 2025.

Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Fine-tuned 7B LLMs generating unified diffs for neural architecture refinement achieve 66-75% valid rates and 64-66% mean first-epoch accuracy, outperforming full-generation baselines by large margins while cutting output length by 75-85%.

PuzzleMark: Implicit Jigsaw Learning for Robust Code Dataset Watermarking in Neural Code Completion Models

cs.SE · 2026-04-30 · unverdicted · novelty 7.0

PuzzleMark provides a robust and imperceptible watermarking method for code datasets using adaptive variable name concatenation and statistical verification, achieving perfect detection rates with minimal performance impact.

RepoDoc: A Knowledge Graph-Based Framework to Automatic Documentation Generation and Incremental Updates

cs.SE · 2026-04-29 · unverdicted · novelty 7.0

RepoDoc uses a repository knowledge graph with module clustering and semantic impact propagation to generate more complete documentation 3x faster with 85% fewer tokens and handle incremental updates 73% faster than prior LLM-based tools.

When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation

cs.SE · 2026-04-27 · unverdicted · novelty 7.0

Structurally rich task descriptions make LLMs robust to prompt under-specification, and under-specification can enhance code correctness by disrupting misleading lexical or structural cues.

Aligned Multi-View Scripts for Universal Chart-to-Code Generation

cs.CL · 2026-04-27 · unverdicted · novelty 7.0

Introduces an aligned multi-language dataset and a language-conditioned low-rank adapter for generating executable plotting code in Python, R, and LaTeX from chart images.

PlayCoder: Making LLM-Generated GUI Code Playable

cs.SE · 2026-04-21 · conditional · novelty 7.0

PlayCoder raises the rate of LLM-generated GUI apps that can be played end-to-end without logic errors from near zero to 20.3% Play@3 by adding repository-aware generation, agent-driven testing, and iterative repair.

Cascaded Code Editing: Large-Small Model Collaboration for Effective and Efficient Code Editing

cs.SE · 2026-04-21 · unverdicted · novelty 7.0

A cascaded large-small model system generates edit sketches with the large model and applies them with the small model to make code editing both accurate and token-efficient.

IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning

cs.AI · 2026-04-16 · unverdicted · novelty 7.0

IG-Search computes step-level information gain rewards from policy probabilities to improve credit assignment in RL training for search-augmented QA, yielding 1.6-point gains over trajectory-level baselines on multi-hop tasks.

Evaluating LLMs Code Reasoning Under Real-World Context

cs.SE · 2026-04-14 · unverdicted · novelty 7.0

R2Eval is a new benchmark with 135 real-world code reasoning problems from Python projects that preserves complex data structures for more realistic LLM evaluation.

Structural Anchors and Reasoning Fragility: Understanding CoT Robustness in LLM4Code

cs.SE · 2026-04-14 · unverdicted · novelty 7.0

CoT prompting in LLM4Code shows mixed robustness that depends on model family, task structure, and perturbations destabilizing structural anchors, leading to trajectory deformations like lengthening, branching, and simplification.

An Iterative Test-and-Repair Framework for Competitive Code Generation

cs.SE · 2026-04-07 · unverdicted · novelty 7.0

FixAudit improves LLM code generation on competitive programming benchmarks by training a shared model for iterative code-aware test generation and repair, achieving 35%+ gains in Pass@1 over baselines on the same 7B model.

Think Anywhere in Code Generation

cs.SE · 2026-03-31 · unverdicted · novelty 7.0

Think-Anywhere lets LLMs invoke on-demand reasoning at any token during code generation via cold-start imitation followed by outcome-based RL, reaching state-of-the-art results on LeetCode, LiveCodeBench, HumanEval, and MBPP.

Steerable Instruction Following Coding Data Synthesis with Actor-Parametric Schema Co-Evolution

cs.SE · 2026-02-27 · unverdicted · novelty 7.0

IFCodeEvolve synthesizes coding data via actor-schema co-evolution with MCTS, boosting a 32B model's performance to match proprietary SOTA on instruction following.

Efficient Remote KV Cache Reuse with GPU-native Video Codec

cs.DC · 2026-02-10 · conditional · novelty 7.0

KVCodec uses GPU-native video codecs and pipelined fetching to compress and transmit KV caches, delivering up to 3.51x faster TTFT than prior methods while preserving accuracy.

citing papers explorer

Showing 50 of 140 citing papers.

AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot cs.AI · 2026-04-15 · conditional · none · ref 6 · internal anchor
AI reviews for all 22,977 AAAI-26 papers were preferred by authors and PC members over human reviews on accuracy and suggestions and outperformed baselines at spotting weaknesses.
Hackers or Hallucinators? A Comprehensive Analysis of LLM-Based Automated Penetration Testing cs.CR · 2026-04-07 · unverdicted · none · ref 40 · internal anchor
The first SoK on LLM-based AutoPT frameworks provides a six-dimension taxonomy of agent designs and a unified empirical benchmark evaluating 15 frameworks via over 10 billion tokens and 1,500 manually reviewed logs.
The Alignment Problem in Constrained Code Generation cs.SE · 2026-06-19 · unverdicted · none · ref 17 · internal anchor
Incomplete constrainers in constrained decoding push LLMs into low-probability program regions, making unconstrained decoding outperform constrained decoding on functional correctness across seven models and three benchmarks.
Synthetic Hallucinations, Real Gains: Hard Negatives from Frontier Models for FIM Hallucination Mitigation cs.LG · 2026-06-02 · unverdicted · none · ref 7 · internal anchor
Using frontier models to synthesize plausible-but-wrong FIM completions as hard negatives for SFT improves Delulu exact match by +18.8 and edit similarity by +0.22 on Qwen2.5-Coder-7B while also lifting HumanEval-Infilling and SAFIM.
CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning cs.CL · 2026-05-19 · unverdicted · none · ref 9 · internal anchor
CopT reverses CoT by eliciting a draft answer first then using continuous-embedding contrastive verification and on-policy thinking to reflect and correct, yielding up to 23% higher accuracy and 57% fewer tokens without training.
Constrained Code Generation with Discrete Diffusion cs.CL · 2026-05-16 · unverdicted · none · ref 24 · internal anchor
Constrained Diffusion for Code (CDC) integrates constraint satisfaction into the reverse denoising process of discrete diffusion models via constraint-aware operators that use optimization and program analysis to steer generation toward feasible programs.
Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench cs.AR · 2026-05-13 · unverdicted · none · ref 30 · internal anchor
Phoenix-bench shows agentic AI systems lose 37-58% resolved rate when moving from SWE-bench Verified to hardware tasks because bugs spread across parallel modules via signal flow, with testbench feedback lifting performance by 42-45% while file-level oracles add only 1.4%.
StepCodeReasoner: Aligning Code Reasoning with Stepwise Execution Traces via Reinforcement Learning cs.SE · 2026-05-12 · unverdicted · none · ref 23 · internal anchor
StepCodeReasoner aligns code reasoning with verifiable stepwise execution traces via print anchors and bi-level GRPO reinforcement learning, reaching SOTA results on CRUXEval (91.1%) and LiveCodeBench (86.5%) for a 7B model.
MeshFIM: Local Low-Poly Mesh Editing via Fill-in-the-Middle Autoregressive Generation cs.GR · 2026-05-09 · unverdicted · none · ref 28 · internal anchor
MeshFIM enables local low-poly mesh editing by autoregressively filling target regions conditioned on context, using boundary markers, positional embeddings, and a gated geometry encoder to enforce attachment, topology, and region limits.
Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients cs.CL · 2026-05-07 · unverdicted · none · ref 66 · internal anchor
POPO uses bounded importance sampling on positive rollouts and a siamese policy network to achieve implicit negative gradients and stable optimization, matching or exceeding GRPO on math benchmarks such as 36.67% on AIME 2025.
Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs cs.LG · 2026-05-06 · unverdicted · none · ref 14 · internal anchor
Fine-tuned 7B LLMs generating unified diffs for neural architecture refinement achieve 66-75% valid rates and 64-66% mean first-epoch accuracy, outperforming full-generation baselines by large margins while cutting output length by 75-85%.
PuzzleMark: Implicit Jigsaw Learning for Robust Code Dataset Watermarking in Neural Code Completion Models cs.SE · 2026-04-30 · unverdicted · none · ref 9 · internal anchor
PuzzleMark provides a robust and imperceptible watermarking method for code datasets using adaptive variable name concatenation and statistical verification, achieving perfect detection rates with minimal performance impact.
RepoDoc: A Knowledge Graph-Based Framework to Automatic Documentation Generation and Incremental Updates cs.SE · 2026-04-29 · unverdicted · none · ref 14 · internal anchor
RepoDoc uses a repository knowledge graph with module clustering and semantic impact propagation to generate more complete documentation 3x faster with 85% fewer tokens and handle incremental updates 73% faster than prior LLM-based tools.
When Prompt Under-Specification Improves Code Correctness: An Exploratory Study of Prompt Wording and Structure Effects on LLM-Based Code Generation cs.SE · 2026-04-27 · unverdicted · none · ref 14 · internal anchor
Structurally rich task descriptions make LLMs robust to prompt under-specification, and under-specification can enhance code correctness by disrupting misleading lexical or structural cues.
Aligned Multi-View Scripts for Universal Chart-to-Code Generation cs.CL · 2026-04-27 · unverdicted · none · ref 1 · internal anchor
Introduces an aligned multi-language dataset and a language-conditioned low-rank adapter for generating executable plotting code in Python, R, and LaTeX from chart images.
PlayCoder: Making LLM-Generated GUI Code Playable cs.SE · 2026-04-21 · conditional · none · ref 14 · internal anchor
PlayCoder raises the rate of LLM-generated GUI apps that can be played end-to-end without logic errors from near zero to 20.3% Play@3 by adding repository-aware generation, agent-driven testing, and iterative repair.
Cascaded Code Editing: Large-Small Model Collaboration for Effective and Efficient Code Editing cs.SE · 2026-04-21 · unverdicted · none · ref 22 · internal anchor
A cascaded large-small model system generates edit sketches with the large model and applies them with the small model to make code editing both accurate and token-efficient.
IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning cs.AI · 2026-04-16 · unverdicted · none · ref 5 · internal anchor
IG-Search computes step-level information gain rewards from policy probabilities to improve credit assignment in RL training for search-augmented QA, yielding 1.6-point gains over trajectory-level baselines on multi-hop tasks.
Evaluating LLMs Code Reasoning Under Real-World Context cs.SE · 2026-04-14 · unverdicted · none · ref 10 · internal anchor
R2Eval is a new benchmark with 135 real-world code reasoning problems from Python projects that preserves complex data structures for more realistic LLM evaluation.
Structural Anchors and Reasoning Fragility: Understanding CoT Robustness in LLM4Code cs.SE · 2026-04-14 · unverdicted · none · ref 2 · internal anchor
CoT prompting in LLM4Code shows mixed robustness that depends on model family, task structure, and perturbations destabilizing structural anchors, leading to trajectory deformations like lengthening, branching, and simplification.
An Iterative Test-and-Repair Framework for Competitive Code Generation cs.SE · 2026-04-07 · unverdicted · none · ref 17 · internal anchor
FixAudit improves LLM code generation on competitive programming benchmarks by training a shared model for iterative code-aware test generation and repair, achieving 35%+ gains in Pass@1 over baselines on the same 7B model.
Think Anywhere in Code Generation cs.SE · 2026-03-31 · unverdicted · none · ref 7 · internal anchor
Think-Anywhere lets LLMs invoke on-demand reasoning at any token during code generation via cold-start imitation followed by outcome-based RL, reaching state-of-the-art results on LeetCode, LiveCodeBench, HumanEval, and MBPP.
Steerable Instruction Following Coding Data Synthesis with Actor-Parametric Schema Co-Evolution cs.SE · 2026-02-27 · unverdicted · none · ref 9 · internal anchor
IFCodeEvolve synthesizes coding data via actor-schema co-evolution with MCTS, boosting a 32B model's performance to match proprietary SOTA on instruction following.
Efficient Remote KV Cache Reuse with GPU-native Video Codec cs.DC · 2026-02-10 · conditional · none · ref 32 · internal anchor
KVCodec uses GPU-native video codecs and pipelined fetching to compress and transmit KV caches, delivering up to 3.51x faster TTFT than prior methods while preserving accuracy.
"Tab, Tab, Bug": Security Pitfalls of Next Edit Suggestions in AI-Integrated IDEs cs.CR · 2026-02-06 · conditional · none · ref 18 · internal anchor
NES systems in AI IDEs expand attack surfaces via context poisoning from imperceptible actions and global codebase retrieval, with professional developers largely unaware of the risks.
RACC: Representation-Aware Coverage Criteria for LLM Safety Testing cs.SE · 2026-02-02 · unverdicted · none · ref 21 · internal anchor
RACC defines six representation-aware coverage criteria that score jailbreak test suites by measuring activation of safety concepts extracted from LLM hidden states on a calibration set.
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding cs.CL · 2026-02-02 · unverdicted · none · ref 41 · internal anchor
Multimodal LLMs process code as images to achieve up to 8x token compression, with visual cues like syntax highlighting aiding tasks and clone detection remaining resilient or even improving under compression.
Can LLMs Compress (and Decompress)? Evaluating Code Understanding and Execution via Invertibility cs.LG · 2026-01-19 · unverdicted · none · ref 1 · internal anchor
LLMs lack internal coherence for reliable bidirectional code reasoning, as they fail round-trip consistency on compression tasks even after training.
In Line with Context: Repository-Level Code Generation via Context Inlining cs.SE · 2026-01-01 · unverdicted · none · ref 16 · internal anchor
InlineCoder reframes repository-level code generation as function-level coding by using a draft anchor to inline the target function into its call graph for upstream usage and downstream dependency context.
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning cs.CL · 2025-11-04 · unverdicted · none · ref 5 · internal anchor
MemSearcher trains LLMs to manage compact memory in multi-turn searches via multi-context GRPO for end-to-end RL, outperforming ReAct-style baselines with stable token counts.
EyeMulator: Improving Code Language Models by Mimicking Human Visual Attention cs.SE · 2025-08-22 · unverdicted · none · ref 17 · internal anchor
EyeMulator augments CodeLLM fine-tuning loss with token weights derived from human eye-tracking scan paths, producing large gains on code translation and summarization across StarCoder, Llama-3.2 and DeepSeek-Coder.
Guidelines for Empirical Studies in Software Engineering involving Large Language Models cs.SE · 2025-08-21 · accept · none · ref 49 · 2 links · internal anchor
The paper delivers a taxonomy of seven LLM study types in software engineering along with eight guidelines that separate mandatory requirements from recommended practices to address reproducibility challenges.
VocabTailor: Dynamic Vocabulary Selection for Downstream Tasks in Small Language Models cs.CL · 2025-08-21 · unverdicted · none · ref 8 · internal anchor
VocabTailor introduces a decoupled dynamic vocabulary selection framework that reduces vocabulary-related memory in SLMs by up to 99% with minimal task performance loss.
Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training cs.LG · 2025-07-21 · unverdicted · none · ref 9 · internal anchor
An RL agent learns domain re-weighting policies from evaluation feedback to improve balanced performance in continual pre-training of LLMs across source and target domains.
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering cs.SE · 2024-05-06 · unverdicted · none · ref 14 · internal anchor
SWE-agent introduces a custom agent-computer interface that lets LM agents solve software engineering tasks, reaching 12.5% pass@1 on SWE-bench and 87.7% on HumanEvalFix, exceeding prior non-interactive approaches.
Functional Cache Grafting: Robust and Rapid Code-Policy Synthesis for Embodied Agents cs.PL · 2026-06-11 · unverdicted · none · ref 6 · internal anchor
FCGraft synthesizes code policies for embodied agents by grafting KV caches from a library of validated functions, claiming 18.31% higher success rate and 2.3x faster synthesis than prompt-level caching.
Lost in the Flow with Code Talkers: Unveiling the Instruction-Tuning Tax of Large Language Models in Code Tasks cs.SE · 2026-06-07 · unverdicted · none · ref 20 · internal anchor
Empirical study finds instruction tuning on CodeLLMs improves instruction following at the expense of infilling performance, termed the Instruction-Tuning Tax.
MailoHLS: Multi-Adapter Structure-Aware Learning for Pareto-Driven HLS Pragma Optimization cs.AR · 2026-06-05 · unverdicted · none · ref 32 · internal anchor
MailoHLS combines LLM semantic reasoning and GNN structural modeling with multi-adapter PEFT and Pareto optimization to produce near-Pareto-optimal HLS pragma configurations, reporting up to 12.42x latency speedup on seen kernels and 10.2x on unseen ones.
UniRTL: Unifying Code and Graph for Robust RTL Representation Learning cs.LG · 2026-05-29 · unverdicted · none · ref 6 · internal anchor
UniRTL unifies RTL code and CDFG through mutual masked modeling and hierarchical training with a graph-aware tokenizer, outperforming prior single-modality methods on performance prediction and code retrieval.
Physics-in-the-Loop: A Hybrid Agentic Architecture for Validated CAD Engineering Design cs.CV · 2026-05-19 · unverdicted · none · ref 66 · internal anchor
A hybrid agentic architecture integrates knowledge-based physical verification tools into LLM-driven CAD design loops, producing more complex and functionally valid designs than prior agentic baselines.
Enhancing the Code Reasoning Capabilities of LLMs via Consistency-based Reinforcement Learning cs.LG · 2026-05-18 · unverdicted · none · ref 15 · internal anchor
CodeThinker improves LLM code reasoning via consistency-based RL with stepwise training data, dynamic beam sampling, and consistency rewards, reaching SOTA on benchmarks with 4.3% gains on Qwen2.5-Coder-7B.
Contextualized Code Pretraining for Code Generation cs.SE · 2026-05-18 · unverdicted · none · ref 19 · internal anchor
Introduces contextualized code pretraining with caller-callee pairs from static analysis to train CallerGen models that outperform baselines on the new CallerEval benchmark.
MemRepair: Hierarchical Memory for Agentic Repository-Level Vulnerability Repair cs.SE · 2026-05-17 · conditional · none · ref 15 · internal anchor
MemRepair is a hierarchical memory-augmented agent framework that raises repository-level vulnerability repair rates to 58.0-58.2% on Python/Go/JS benchmarks and 30.58% on C++ by combining history, pattern, and refinement memories with iterative feedback.
Task Abstention for Large Language Models in Code Generation cs.SE · 2026-05-16 · unverdicted · none · ref 7 · internal anchor
A distribution-free abstention rule grounded in multiple hypothesis testing uses execution consistency to let code LLMs avoid hallucination-prone tasks with theoretical guarantees.
XSearch: Explainable Code Search via Concept-to-Code Alignment cs.SE · 2026-05-15 · unverdicted · none · ref 24 · internal anchor
XSearch achieves explainable code search by breaking queries into functional concepts and matching them directly to code statements, delivering large gains on out-of-distribution benchmarks.
See Before You Code: Learning Visual Priors for Spatially Aware Educational Animation Generation cs.AI · 2026-05-15 · unverdicted · none · ref 28 · internal anchor
OmniManim improves render quality in educational animation code generation by using a Vision Agent with coarse-to-fine bounding-box denoising and interpolation-aware optimization on new datasets.
Revisiting DAgger in the Era of LLM-Agents cs.LG · 2026-05-13 · conditional · none · ref 15 · internal anchor
DAgger-style training with turn-level policy interpolation raises 4B and 8B LLM agents to 27.3% and 29.8% on SWE-bench Verified, beating several larger published systems.
Uncertainty Quantification for LLM-based Code Generation cs.SE · 2026-05-12 · unverdicted · none · ref 13 · internal anchor
RisCoSet applies multiple hypothesis testing to construct risk-controlling partial-program prediction sets for LLM code generation, achieving up to 24.5% less code removal than prior methods at equivalent risk levels.
Securing the Dark Matter: A Semantic-Enhanced Neuro-Symbolic Framework for Supply Chain Analysis of Opaque Industrial Software cs.SE · 2026-05-08 · unverdicted · none · ref 14 · internal anchor
A neuro-symbolic framework reconstructs semantics from opaque binaries via abstract interpretation, reflexive LLM prompting, typed knowledge graphs, and Graphormer reasoning to outperform baselines in vulnerability detection and APT matching for industrial control systems.
SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs cs.SE · 2026-05-06 · unverdicted · none · ref 18 · internal anchor
SynConfRoute routes code completions using syntax validation and token confidence, improving pass@1 by up to 31% on hard tasks and reducing accelerator usage by 58% versus always using the largest model.
