CodeGemma: Open code models based on Gemma
18 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 2
citation-polarity summary: background 2
representative citing papers
LLMs achieve strong initial accuracy on code output prediction but frequently alter their answers under semantics-preserving mutations, with drops up to 70% and flawed reasoning detected in 10-50% of correct cases via human review.
citing papers explorer
- PrivCode++: Latent-Conditioned Differentially Private Code Generation for Comprehensive Guarantees
PrivCode++ introduces the first DP code generation method protecting both prompts and code via latent-conditioned two-stage training, claiming higher utility and stronger privacy than prior baselines.
- PromptCOS: Towards Content-only System Prompt Copyright Auditing for LLMs
PromptCOS is a content-only watermarking method for LLM system prompts that embeds detectable cyclic signals via auxiliary tokens while preserving fidelity and resisting removal attacks.
- SrDetection: A Self-Referential Framework for Data Leakage Detection in Code Large Language Models
SrDetection detects data leakage in Code LLMs via contrast between original benchmark samples and their semantic variants, reporting F1 gains of 21.52 (gray-box) and 14.46 (black-box) over baselines in a controlled testbed.
- Acoda: Adversarial Code Obfuscation for Defending against LLM-based Analysis
Acoda uses a genetic algorithm to optimize eight obfuscation methods that reduce LLM code analysis success rates to as low as 30% while preserving original semantics.
- Efficient Skill Grounding via Code Refactoring with Small Language Models
RECENT decouples skill semantics from embodiment-specific bindings via code refactoring to let small language models achieve skill grounding performance matching large language model baselines.
- Subjective Code Preferences in Experts and Large Language Models
LLMs frequently reverse their stated coding preferences when shown actual code instead of descriptions, show positional bias, and produce more polarized ratings than human experts on complexity, commenting, modularity, and readability.
- SynConfRoute: Syntax-Aware Routing for Efficient Code Completion with Small CodeLLMs
SynConfRoute routes code completions using syntax validation and token confidence, improving pass@1 by up to 31% on hard tasks and reducing accelerator usage by 58% versus always using the largest model.
- Co-Located Tests, Better AI Code: How Test Syntax Structure Affects Foundation Model Code Generation
Co-locating tests with implementation code yields substantially higher preservation and correctness in foundation-model-generated programs than separated test syntax.
- RefineStat: Efficient Exploration for Probabilistic Program Synthesis
RefineStat improves small language model performance on probabilistic program synthesis by adding semantic constraint enforcement and diagnostic-aware refinement, producing syntactically and statistically reliable code that often matches larger models.
- MultiFileTest: A Multi-File-Level LLM Unit Test Generation Benchmark and Impact of Error Fixing Mechanisms
Frontier LLMs achieve only moderate performance on multi-file unit test generation, with basic executability and cascade errors common, but manual and self-error-fixing mechanisms yield measurable gains.
- Training Language Models to Self-Correct via Reinforcement Learning
SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.
- LLM Evolution as an Industry-Scale Ecosystem: A Lifecycle Perspective on Continual Learning
The paper reformulates industrial continual learning for LLMs as a closed-loop ecosystem problem, identifies three core challenges, and organizes solutions around five lifecycle design principles.
- Decoupled Smart Contract Audits: Lightweight LLM Framework via Distillation and Aggregation
A decoupled four-stage LLM pipeline with rsLoRA, distillation, and CoVe aggregation outperforms larger models on smart contract vulnerability detection and explanation using only 0.6B-4B parameter models.
- UA-ChatDev: Uncertainty-Aware Multi-Agent Collaboration for Reliable Software Development
UA-ChatDev integrates token-level uncertainty estimation and phase-aware verification into multi-agent software development and reports better benchmark scores than prior frameworks.
- Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Gemini 2.5 Pro and Flash models are presented as achieving frontier performance in reasoning, coding, and long-context multimodal tasks while spanning a cost-capability Pareto curve.
- Are Decoder-Only Large Language Models the Silver Bullet for Code Search?
Fine-tuned decoder-only LLMs achieve up to 40.4% higher MAP than UniXcoder on CoSQA+ for code search, with non-monotonic size scaling and data composition sensitivity.
- mcdok at SemEval-2026 Task 13: Finetuning LLMs for Detection of Machine-Generated Code
Fine-tuning LLMs by adapting the mdok approach produces competitive results on binary detection, source attribution, and hybrid/adversarial code identification in SemEval-2026 Task 13.