Title resolution pending

zhang2023algo APACrefauthors Zhang, K · 2023 · arXiv 2305.14591

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Falsification, Not Exposure: An Internally Preregistered Placebo-Controlled Decomposition of Self-Repair Feedback in Frozen Small Code Models

cs.SE · 2026-06-30 · unverdicted · novelty 7.0

Preregistered placebo-controlled decomposition shows external executable counterexamples drive self-repair gains in small code models more than re-exposure or self-critique.

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

cs.SE · 2024-03-12 · unverdicted · novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models

cs.LG · 2025-10-16 · unverdicted · novelty 5.0

GenCluster scales test-time compute via large-scale generation, behavioral clustering, ranking, and round-robin submission to achieve IOI gold medal performance with the open-weight gpt-oss-120b model.

citing papers explorer

Showing 3 of 3 citing papers.

Falsification, Not Exposure: An Internally Preregistered Placebo-Controlled Decomposition of Self-Repair Feedback in Frozen Small Code Models cs.SE · 2026-06-30 · unverdicted · none · ref 49
Preregistered placebo-controlled decomposition shows external executable counterexamples drive self-repair gains in small code models more than re-exposure or self-critique.
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code cs.SE · 2024-03-12 · unverdicted · none · ref 110
LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.
Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models cs.LG · 2025-10-16 · unverdicted · none · ref 23
GenCluster scales test-time compute via large-scale generation, behavioral clustering, ranking, and round-robin submission to achieve IOI gold medal performance with the open-weight gpt-oss-120b model.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer