CodeGemma: Open code models based on Gemma

CodeGemma Team, Heri Zhao, Jeffrey Hui, et al · 2024 · arXiv 2406.11409

18 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

PrivCode++: Latent-Conditioned Differentially Private Code Generation for Comprehensive Guarantees

cs.CR · 2026-06-08 · unverdicted · novelty 7.0

PrivCode++ introduces the first DP code generation method protecting both prompts and code via latent-conditioned two-stage training, claiming higher utility and stronger privacy than prior baselines.

PromptCOS: Towards Content-only System Prompt Copyright Auditing for LLMs

cs.CR · 2025-09-03 · unverdicted · novelty 7.0

PromptCOS is a content-only watermarking method for LLM system prompts that embeds detectable cyclic signals via auxiliary tokens while preserving fidelity and resisting removal attacks.

SrDetection: A Self-Referential Framework for Data Leakage Detection in Code Large Language Models

cs.CL · 2026-06-29 · unverdicted · novelty 6.0

SrDetection detects data leakage in Code LLMs via contrast between original benchmark samples and their semantic variants, reporting F1 gains of 21.52 (gray-box) and 14.46 (black-box) over baselines in a controlled testbed.

Acoda: Adversarial Code Obfuscation for Defending against LLM-based Analysis

cs.SE · 2026-06-10 · unverdicted · novelty 6.0

Acoda uses a genetic algorithm to optimize eight obfuscation methods that reduce LLM code analysis success rates to as low as 30% while preserving original semantics.

Efficient Skill Grounding via Code Refactoring with Small Language Models

cs.AI · 2026-06-06 · unverdicted · novelty 6.0

RECENT decouples skill semantics from embodiment-specific bindings via code refactoring to let small language models achieve skill grounding performance matching large language model baselines.

Subjective Code Preferences in Experts and Large Language Models

cs.HC · 2026-05-24 · unverdicted · novelty 6.0

LLMs frequently reverse their stated coding preferences when shown actual code instead of descriptions, show positional bias, and produce more polarized ratings than human experts on complexity, commenting, modularity, and readability.

SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs

cs.SE · 2026-05-06 · unverdicted · novelty 6.0

SynConfRoute routes code completions using syntax validation and token confidence, improving pass@1 by up to 31% on hard tasks and reducing accelerator usage by 58% versus always using the largest model.

Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation

cs.SE · 2026-04-20 · unverdicted · novelty 6.0

Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.

RefineStat: Efficient Exploration for Probabilistic Program Synthesis

cs.LG · 2025-09-01 · unverdicted · novelty 6.0

RefineStat improves small language model performance on probabilistic program synthesis by adding semantic constraint enforcement and diagnostic-aware refinement, producing syntactically and statistically reliable code that often matches larger models.

Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?

cs.SE · 2025-05-15 · conditional · novelty 6.0

LLMs achieve strong initial accuracy on code output prediction but frequently alter their answers under semantics-preserving mutations, with drops up to 70% and flawed reasoning detected in 10-50% of correct cases via human review.

MultiFileTest: A Multi-File-Level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms

cs.SE · 2025-02-10 · unverdicted · novelty 6.0

Frontier LLMs achieve only moderate performance on multi-file unit test generation, with basic executability and cascade errors common, but manual and self-error-fixing mechanisms yield measurable gains.

Training Language Models to Self-Correct via Reinforcement Learning

cs.LG · 2024-09-19 · unverdicted · novelty 6.0

SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.

LLM Evolution as an Industry-Scale Ecosystem: A Lifecycle Perspective on Continual Learning

cs.LG · 2026-06-12 · unverdicted · novelty 5.0

The paper reformulates industrial continual learning for LLMs as a closed-loop ecosystem problem, identifies three core challenges, and organizes solutions around five lifecycle design principles.

Decoupled Smart Contract Audits: Lightweight LLM Framework via Distillation and Aggregation

cs.CR · 2026-06-02 · unverdicted · novelty 5.0

A decoupled four-stage LLM pipeline with rsLoRA, distillation, and CoVe aggregation outperforms larger models on smart contract vulnerability detection and explanation using only 0.6B-4B parameter models.

UA-ChatDev: Uncertainty-Aware Multi-Agent Collaboration for Reliable Software Development

cs.AI · 2026-07-02 · unverdicted · novelty 4.0

UA-ChatDev integrates token-level uncertainty estimation and phase-aware verification into multi-agent software development and reports better benchmark scores than prior frameworks.

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

cs.CL · 2025-07-07 · unverdicted · novelty 4.0

Gemini 2.5 Pro and Flash models are presented as achieving frontier performance in reasoning, coding, and long-context multimodal tasks while spanning a cost-capability Pareto curve.

Are Decoder-Only Large Language Models the Silver Bullet for Code Search?

cs.SE · 2024-10-29 · unverdicted · novelty 4.0

Fine-tuned decoder-only LLMs achieve up to 40.4% higher MAP than UniXcoder on CoSQA+ for code search, with non-monotonic size scaling and data composition sensitivity.

mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code

cs.LG · 2026-04-23 · unverdicted · novelty 2.0

Fine-tuning LLMs by adapting the mdok approach produces competitive results on binary detection, source attribution, and hybrid/adversarial code identification in SemEval-2026 Task 13.

citing papers explorer

Showing 17 of 17 citing papers after filters.

PrivCode++: Latent-Conditioned Differentially Private Code Generation for Comprehensive Guarantees cs.CR · 2026-06-08 · unverdicted · none · ref 52
PrivCode++ introduces the first DP code generation method protecting both prompts and code via latent-conditioned two-stage training, claiming higher utility and stronger privacy than prior baselines.
PromptCOS: Towards Content-only System Prompt Copyright Auditing for LLMs cs.CR · 2025-09-03 · unverdicted · none · ref 50
PromptCOS is a content-only watermarking method for LLM system prompts that embeds detectable cyclic signals via auxiliary tokens while preserving fidelity and resisting removal attacks.
SrDetection: A Self-Referential Framework for Data Leakage Detection in Code Large Language Models cs.CL · 2026-06-29 · unverdicted · none · ref 19
SrDetection detects data leakage in Code LLMs via contrast between original benchmark samples and their semantic variants, reporting F1 gains of 21.52 (gray-box) and 14.46 (black-box) over baselines in a controlled testbed.
Acoda: Adversarial Code Obfuscation for Defending against LLM-based Analysis cs.SE · 2026-06-10 · unverdicted · none · ref 34
Acoda uses a genetic algorithm to optimize eight obfuscation methods that reduce LLM code analysis success rates to as low as 30% while preserving original semantics.
Efficient Skill Grounding via Code Refactoring with Small Language Models cs.AI · 2026-06-06 · unverdicted · none · ref 75
RECENT decouples skill semantics from embodiment-specific bindings via code refactoring to let small language models achieve skill grounding performance matching large language model baselines.
Subjective Code Preferences in Experts and Large Language Models cs.HC · 2026-05-24 · unverdicted · none · ref 9
LLMs frequently reverse their stated coding preferences when shown actual code instead of descriptions, show positional bias, and produce more polarized ratings than human experts on complexity, commenting, modularity, and readability.
SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs cs.SE · 2026-05-06 · unverdicted · none · ref 9
SynConfRoute routes code completions using syntax validation and token confidence, improving pass@1 by up to 31% on hard tasks and reducing accelerator usage by 58% versus always using the largest model.
Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation cs.SE · 2026-04-20 · unverdicted · none · ref 8
Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.
RefineStat: Efficient Exploration for Probabilistic Program Synthesis cs.LG · 2025-09-01 · unverdicted · none · ref 46
RefineStat improves small language model performance on probabilistic program synthesis by adding semantic constraint enforcement and diagnostic-aware refinement, producing syntactically and statistically reliable code that often matches larger models.
MultiFileTest: A Multi-File-Level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms cs.SE · 2025-02-10 · unverdicted · none · ref 25
Frontier LLMs achieve only moderate performance on multi-file unit test generation, with basic executability and cascade errors common, but manual and self-error-fixing mechanisms yield measurable gains.
Training Language Models to Self-Correct via Reinforcement Learning cs.LG · 2024-09-19 · unverdicted · none · ref 32
SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.
LLM Evolution as an Industry-Scale Ecosystem: A Lifecycle Perspective on Continual Learning cs.LG · 2026-06-12 · unverdicted · none · ref 96
The paper reformulates industrial continual learning for LLMs as a closed-loop ecosystem problem, identifies three core challenges, and organizes solutions around five lifecycle design principles.
Decoupled Smart Contract Audits: Lightweight LLM Framework via Distillation and Aggregation cs.CR · 2026-06-02 · unverdicted · none · ref 32
A decoupled four-stage LLM pipeline with rsLoRA, distillation, and CoVe aggregation outperforms larger models on smart contract vulnerability detection and explanation using only 0.6B-4B parameter models.
UA-ChatDev: Uncertainty-Aware Multi-Agent Collaboration for Reliable Software Development cs.AI · 2026-07-02 · unverdicted · none · ref 23
UA-ChatDev integrates token-level uncertainty estimation and phase-aware verification into multi-agent software development and reports better benchmark scores than prior frameworks.
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities cs.CL · 2025-07-07 · unverdicted · none · ref 15
Gemini 2.5 Pro and Flash models are presented as achieving frontier performance in reasoning, coding, and long-context multimodal tasks while spanning a cost-capability Pareto curve.
Are Decoder-Only Large Language Models the Silver Bullet for Code Search? cs.SE · 2024-10-29 · unverdicted · none · ref 64
Fine-tuned decoder-only LLMs achieve up to 40.4% higher MAP than UniXcoder on CoSQA+ for code search, with non-monotonic size scaling and data composition sensitivity.
mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code cs.LG · 2026-04-23 · unverdicted · none · ref 7
Fine-tuning LLMs by adapting the mdok approach produces competitive results on binary detection, source attribution, and hybrid/adversarial code identification in SemEval-2026 Task 13.
