hub

Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation

Yue Wang, Weishi Wang, Shafiq Joty, Steven C · 2021 · DOI 10.18653/v1/2021.emnlp-main.685

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

open at publisher browse 10 citing papers

hub tools

JSON dossier citing papers JSON publisher DOI

representative citing papers

Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution

cs.NE · 2026-05-10 · unverdicted · novelty 7.0

QD-LLM evolves prompt embeddings via neuroevolution in a quality-diversity framework, delivering 46% higher coverage and 41% higher QD-score than prior methods on coding and writing benchmarks.

VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns

cs.CR · 2026-05-03 · unverdicted · novelty 7.0

VulKey reaches 31.5% repair accuracy on real C/C++ vulnerabilities by matching hierarchical expert patterns to guide LLM patch generation, beating prior baselines by 7.6%.

Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL

cs.CL · 2026-04-22 · unverdicted · novelty 7.0

Parallel-SFT mixes parallel programs across languages during SFT to produce more transferable RL initializations, yielding better zero-shot generalization to unseen programming languages.

CASCADE: Detecting Inconsistencies between Code and Documentation with Automatic Test Generation

cs.SE · 2026-04-21 · unverdicted · novelty 7.0

CASCADE finds code-documentation mismatches by running LLM-generated tests from docs and confirming failure only when documentation-derived code succeeds on the same test.

BloombergGPT: A Large Language Model for Finance

cs.LG · 2023-03-30 · conditional · novelty 6.0

BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.

MARGIN: Margin-Aware Regularized Geometry for Imbalanced Vulnerability Detection

cs.SE · 2026-05-11 · unverdicted · novelty 5.0

MARGIN reduces geometric distortions in imbalanced vulnerability embeddings by dynamically regularizing margins with von Mises-Fisher concentration estimates and hyperspherical prototypes.

Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross--Language Code Clone Detection

cs.AI · 2026-05-04 · unverdicted · novelty 5.0

Reasoning-oriented knowledge distillation from DeepSeek-R1 plus response stabilization improves reliability and often performance of compact models for cross-language code clone detection on pairs like Python-Java and Rust-Java.

Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective

cs.CR · 2026-04-20 · unverdicted · novelty 5.0

BPE tokenization creates gibberish bias in CLLMs, causing secrets with high character entropy but low token entropy to be preferentially memorized due to training data distribution shifts.

StarCoder: may the source be with you!

cs.CL · 2023-05-09 · accept · novelty 5.0

StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.

LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning

cs.SE · 2026-04-17 · unverdicted · novelty 4.0

LLMSniffer improves detection of LLM-generated code on GPTSniffer and Whodunit benchmarks by fine-tuning GraphCodeBERT via two-stage supervised contrastive learning plus preprocessing and MLP classification.

citing papers explorer

Showing 10 of 10 citing papers.

Parameter-Efficient Neuroevolution for Diverse LLM Generation: Quality-Diversity Optimization via Prompt Embedding Evolution cs.NE · 2026-05-10 · unverdicted · none · ref 69
QD-LLM evolves prompt embeddings via neuroevolution in a quality-diversity framework, delivering 46% higher coverage and 41% higher QD-score than prior methods on coding and writing benchmarks.
VulKey: Automated Vulnerability Repair Guided by Domain-Specific Repair Patterns cs.CR · 2026-05-03 · unverdicted · none · ref 51
VulKey reaches 31.5% repair accuracy on real C/C++ vulnerabilities by matching hierarchical expert patterns to guide LLM patch generation, beating prior baselines by 7.6%.
Parallel-SFT: Improving Zero-Shot Cross-Programming-Language Transfer for Code RL cs.CL · 2026-04-22 · unverdicted · none · ref 69
Parallel-SFT mixes parallel programs across languages during SFT to produce more transferable RL initializations, yielding better zero-shot generalization to unseen programming languages.
CASCADE: Detecting Inconsistencies between Code and Documentation with Automatic Test Generation cs.SE · 2026-04-21 · unverdicted · none · ref 44
CASCADE finds code-documentation mismatches by running LLM-generated tests from docs and confirming failure only when documentation-derived code succeeds on the same test.
BloombergGPT: A Large Language Model for Finance cs.LG · 2023-03-30 · conditional · none · ref 124
BloombergGPT is a 50B parameter LLM trained on a 708B token mixed financial and general dataset that outperforms prior models on financial benchmarks while preserving general LLM performance.
MARGIN: Margin-Aware Regularized Geometry for Imbalanced Vulnerability Detection cs.SE · 2026-05-11 · unverdicted · none · ref 30
MARGIN reduces geometric distortions in imbalanced vulnerability embeddings by dynamically regularizing margins with von Mises-Fisher concentration estimates and hyperspherical prototypes.
Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross--Language Code Clone Detection cs.AI · 2026-05-04 · unverdicted · none · ref 50
Reasoning-oriented knowledge distillation from DeepSeek-R1 plus response stabilization improves reliability and often performance of compact models for cross-language code clone detection on pairs like Python-Java and Rust-Java.
Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective cs.CR · 2026-04-20 · unverdicted · none · ref 44
BPE tokenization creates gibberish bias in CLLMs, causing secrets with high character entropy but low token entropy to be preferentially memorized due to training data distribution shifts.
StarCoder: may the source be with you! cs.CL · 2023-05-09 · accept · none · ref 142
StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.
LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning cs.SE · 2026-04-17 · unverdicted · none · ref 24
LLMSniffer improves detection of LLM-generated code on GPTSniffer and Whodunit benchmarks by fine-tuning GraphCodeBERT via two-stage supervised contrastive learning plus preprocessing and MLP classification.

Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation

hub tools

fields

years

verdicts

representative citing papers

citing papers explorer