International Conference on Learning Representations , year=

Self-consistency improves chain of thought reasoning in language models , author=

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

representative citing papers

Sketch-and-Verify: Structured Inference-Time Scaling via Program Sketching

cs.LG · 2026-05-09 · conditional · novelty 7.0

Sketch-and-Verify improves small-LLM code generation on HumanEval+ by factorizing search into K algorithmic sketches and M fillings each, outperforming flat sampling by up to 32 percentage points at matched budget while remaining cheaper than upgrading model tier.

Hidden Error Awareness in Chain-of-Thought Reasoning: The Signal Is Diagnostic, Not Causal

cs.CL · 2026-05-10 · unverdicted · novelty 6.0

LLMs detect CoT reasoning errors in hidden states with 0.95 AUROC but cannot use this awareness to correct them via steering, patching, or self-correction, indicating the signal is diagnostic not causal.

Semantic Voting: Execution-Grounded Consensus for LLM Code Generation

cs.SE · 2026-05-09 · unverdicted · novelty 6.0

Execution-based selectors for LLM code candidates outperform textual voting by large margins across configurations, with input generation quality mattering more than the specific aggregation rule.

citing papers explorer

Showing 3 of 3 citing papers.

Sketch-and-Verify: Structured Inference-Time Scaling via Program Sketching cs.LG · 2026-05-09 · conditional · none · ref 5
Sketch-and-Verify improves small-LLM code generation on HumanEval+ by factorizing search into K algorithmic sketches and M fillings each, outperforming flat sampling by up to 32 percentage points at matched budget while remaining cheaper than upgrading model tier.
Hidden Error Awareness in Chain-of-Thought Reasoning: The Signal Is Diagnostic, Not Causal cs.CL · 2026-05-10 · unverdicted · none · ref 9
LLMs detect CoT reasoning errors in hidden states with 0.95 AUROC but cannot use this awareness to correct them via steering, patching, or self-correction, indicating the signal is diagnostic not causal.
Semantic Voting: Execution-Grounded Consensus for LLM Code Generation cs.SE · 2026-05-09 · unverdicted · none · ref 1
Execution-based selectors for LLM code candidates outperform textual voting by large margins across configurations, with input generation quality mattering more than the specific aggregation rule.

International Conference on Learning Representations , year=

fields

years

verdicts

representative citing papers

citing papers explorer