hub Canonical reference

Code generation with AlphaCodium : From prompt engineering to flow engineering

Tal Ridnik, Dedy Kredo, Itamar Friedman · 2024 · arXiv 2401.08500

Canonical reference. 80% of citing Pith papers cite this work as background.

14 Pith papers citing it

Background 80% of classified citations

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4 method 1

citation-polarity summary

background 4 use method 1

representative citing papers

Code Generation by Differential Test Time Scaling

cs.SE · 2026-05-19 · unverdicted · novelty 7.0

DiffCodeGen clusters code candidates by behavioral similarity from fuzzing-synthesized inputs and selects the largest cluster's medoid, matching or exceeding prior test-time scaling methods with far less token and time cost.

Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Rule2DRC is a benchmark for LLM agents synthesizing DRC scripts from natural language rules, paired with SplitTester that improves Best-of-N selection via execution-guided discriminative test generation.

RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

cs.LG · 2026-05-10 · unverdicted · novelty 7.0 · 2 refs

RubricRefine is a training-free pre-execution method that creates rubrics to score and fix inter-tool contract violations in agent code, reaching 0.86 average on M3ToolEval across seven models with zero executions and lower latency.

Social Bias in LLM-Generated Code: Benchmark and Mitigation

cs.SE · 2026-05-01 · unverdicted · novelty 7.0

LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.

An Iterative Test-and-Repair Framework for Competitive Code Generation

cs.SE · 2026-04-07 · unverdicted · novelty 7.0

FixAudit improves LLM code generation on competitive programming benchmarks by training a shared model for iterative code-aware test generation and repair, achieving 35%+ gains in Pass@1 over baselines on the same 7B model.

Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

Solvita is an agentic evolution system using Planner, Solver, Oracle, and Hacker agents with trainable graph knowledge networks updated by reinforcement learning on pass/fail and vulnerability signals to achieve SOTA code generation performance.

AgentSlimming: Towards Efficient and Cost-Aware Multi-Agent Systems

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

AgentSlimming compresses graph-structured multi-agent systems by estimating agent importance and removing or replacing low-value agents, cutting token costs by up to 78.9% with negligible performance loss.

You Don't Need Public Tests to Generate Correct Code

cs.SE · 2026-04-23 · unverdicted · novelty 6.0

DryRUN lets LLMs create their own test inputs and run internal simulations for self-correcting code generation, matching the performance of test-dependent methods like CodeSIM on LiveCodeBench without public tests or external signals.

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

cs.SE · 2024-03-12 · unverdicted · novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation

cs.SE · 2026-04-06 · unverdicted · novelty 4.0

Compiled AI generates deterministic code artifacts from LLMs in a one-time compilation step, enabling reliable workflow execution with zero runtime tokens after break-even.

Large Language Model-Based Agents for Software Engineering: A Survey

cs.SE · 2024-09-04 · unverdicted · novelty 4.0

A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering from SE and agent perspectives.

A Survey on Large Language Models for Code Generation

cs.CL · 2024-06-01 · unverdicted · novelty 3.0

A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.

ProtoMedAgent: Multimodal Clinical Interpretability via Privacy-Aware Agentic Workflows

cs.CV · 2026-05-13

Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning

cs.CL · 2026-04-19

citing papers explorer

Showing 14 of 14 citing papers.

Code Generation by Differential Test Time Scaling cs.SE · 2026-05-19 · unverdicted · none · ref 92
DiffCodeGen clusters code candidates by behavioral similarity from fuzzing-synthesized inputs and selects the largest cluster's medoid, matching or exceeding prior test-time scaling methods with far less token and time cost.
Rule2DRC: Benchmarking LLM Agents for DRC Script Synthesis with Execution-Guided Test Generation cs.LG · 2026-05-15 · unverdicted · none · ref 16
Rule2DRC is a benchmark for LLM agents synthesizing DRC scripts from natural language rules, paired with SplitTester that improves Best-of-N selection via execution-guided discriminative test generation.
RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement cs.LG · 2026-05-10 · unverdicted · none · ref 41 · 2 links
RubricRefine is a training-free pre-execution method that creates rubrics to score and fix inter-tool contract violations in agent code, reaching 0.86 average on M3ToolEval across seven models with zero executions and lower latency.
Social Bias in LLM-Generated Code: Benchmark and Mitigation cs.SE · 2026-05-01 · unverdicted · none · ref 158
LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.
An Iterative Test-and-Repair Framework for Competitive Code Generation cs.SE · 2026-04-07 · unverdicted · none · ref 42
FixAudit improves LLM code generation on competitive programming benchmarks by training a shared model for iterative code-aware test generation and repair, achieving 35%+ gains in Pass@1 over baselines on the same 7B model.
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution cs.AI · 2026-05-14 · unverdicted · none · ref 6
Solvita is an agentic evolution system using Planner, Solver, Oracle, and Hacker agents with trainable graph knowledge networks updated by reinforcement learning on pass/fail and vulnerability signals to achieve SOTA code generation performance.
AgentSlimming: Towards Efficient and Cost-Aware Multi-Agent Systems cs.LG · 2026-05-09 · unverdicted · none · ref 4
AgentSlimming compresses graph-structured multi-agent systems by estimating agent importance and removing or replacing low-value agents, cutting token costs by up to 78.9% with negligible performance loss.
You Don't Need Public Tests to Generate Correct Code cs.SE · 2026-04-23 · unverdicted · none · ref 17
DryRUN lets LLMs create their own test inputs and run internal simulations for self-correcting code generation, matching the performance of test-dependent methods like CodeSIM on LiveCodeBench without public tests or external signals.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code cs.SE · 2024-03-12 · unverdicted · none · ref 297
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
Compiled AI: Deterministic Code Generation for LLM-Based Workflow Automation cs.SE · 2026-04-06 · unverdicted · none · ref 15
Compiled AI generates deterministic code artifacts from LLMs in a one-time compilation step, enabling reliable workflow execution with zero runtime tokens after break-even.
Large Language Model-Based Agents for Software Engineering: A Survey cs.SE · 2024-09-04 · unverdicted · none · ref 89
A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering from SE and agent perspectives.
A Survey on Large Language Models for Code Generation cs.CL · 2024-06-01 · unverdicted · none · ref 229
A systematic literature review that organizes recent work on LLMs for code generation into a taxonomy covering data curation, model advances, evaluations, ethics, environmental impact, and applications, with benchmark comparisons.
ProtoMedAgent: Multimodal Clinical Interpretability via Privacy-Aware Agentic Workflows cs.CV · 2026-05-13 · unreviewed · ref 32
Self-Consistency from Only Two Samples: CoT-PoT Ensembling for Efficient LLM Reasoning cs.CL · 2026-04-19 · unreviewed · ref 69

Code generation with AlphaCodium : From prompt engineering to flow engineering

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer