Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning
Process-driven image generation decomposes text-to-image synthesis into interleaved cycles of textual planning, visual drafting, textual reflection, and visual refinement with dense consistency supervision.
arXiv preprint arXiv:2205.09712
7 Pith papers cite this work.
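The interleaved plan-draft-reflect-refine cycle named in the abstract can be sketched as a simple loop. Everything below is a toy stand-in assumed for illustration, not the paper's actual models: the "image" is reduced to a list of stroke strings and the four stages to string operations.

```python
# Minimal sketch of an interleaved plan -> draft -> reflect -> refine loop.
# All functions are toy stand-ins (assumptions), not the paper's models.

def plan(prompt):
    # Textual planning: split the prompt into stroke-level sub-goals.
    return [w for w in prompt.split() if w.isalpha()]

def draft(plan_steps):
    # Visual drafting: produce one coarse "stroke" per sub-goal.
    return [f"stroke({s})" for s in plan_steps]

def reflect(prompt, strokes):
    # Textual reflection: flag sub-goals missing from the current canvas.
    return [s for s in plan(prompt) if f"stroke({s})" not in strokes]

def refine(strokes, missing):
    # Visual refinement: add strokes for every flagged sub-goal.
    return strokes + [f"stroke({s})" for s in missing]

def generate(prompt, max_cycles=3):
    strokes = draft(plan(prompt))
    for _ in range(max_cycles):
        missing = reflect(prompt, strokes)
        if not missing:          # dense consistency check passed
            break
        strokes = refine(strokes, missing)
    return strokes

print(generate("a red fox under a pine tree"))
```

The point of the structure, per the abstract, is that reflection runs between drafting passes rather than once at the end, so consistency errors are caught per cycle.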
Citing papers
-
Let's Verify Step by Step
Process supervision significantly outperforms outcome supervision for training models on the MATH dataset, solving 78% of problems on a representative test subset; the approach uses active learning, and an 800k step-level label dataset is released.
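The process-vs-outcome distinction can be illustrated in a few lines. The per-step probabilities below are invented; in the paper they come from a process reward model trained on the human step labels.

```python
import math

# Sketch of how a process reward model (PRM) scores a solution, versus an
# outcome reward model (ORM). Step probabilities are toy numbers (assumptions).

def prm_score(step_probs):
    # PRM: solution score = joint probability that every step is correct.
    return math.prod(step_probs)

def orm_score(final_answer_prob):
    # ORM: a single probability that the final answer is correct.
    return final_answer_prob

# Two candidate solutions to the same problem:
sound      = [0.99, 0.97, 0.98, 0.99]   # every step checks out
lucky_slip = [0.99, 0.30, 0.98, 0.99]   # bad step 2, right final answer

assert prm_score(sound) > prm_score(lucky_slip)
# An ORM can rank these equally when both final answers match the reference,
# which is why process supervision catches "right answer, wrong reasoning".
```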
-
OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces
OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.
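The propose-feedback-iterate loop being evaluated can be reduced to a toy version. The search problem and the proposal rule here are assumptions for illustration; OPT-Agent drives an LLM with structured feedback rather than random local search.

```python
import random

# Toy optimize-with-feedback loop: an agent proposes a candidate, receives
# scalar feedback from the environment, and keeps only improvements.

def feedback(x):
    # Environment feedback: higher is better (maximum at x = 3).
    return -(x - 3.0) ** 2

def optimize(steps=200, seed=0):
    rng = random.Random(seed)
    best, best_f = 0.0, feedback(0.0)
    for _ in range(steps):
        cand = best + rng.uniform(-0.5, 0.5)   # propose near the incumbent
        f = feedback(cand)
        if f > best_f:                          # iterate on positive feedback
            best, best_f = cand, f
    return best

print(round(optimize(), 2))
```

The benchmark's finding maps onto this sketch directly: a better proposal rule (a stronger base model) converges faster, but no amount of feedback lets the agent propose candidates outside what its base policy can generate.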
-
Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA
Temporal reasoning is not the core bottleneck for LLMs on time-based QA; the real issue is mapping unstructured text to structured event representations. A neuro-symbolic system with PIS addresses this and reaches 100% accuracy on benchmarks when the event representations are correct.
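The claim is that once events are structured, temporal questions become deterministic symbolic checks. The event table and relations below are invented for illustration and are not the paper's system:

```python
# Once free text is mapped to (start, end) intervals, temporal QA reduces to
# exact interval comparisons. Events are assumed already extracted from text.

events = {
    "phd":     (2015, 2020),
    "postdoc": (2020, 2022),
    "faculty": (2022, 2025),
}

def before(a, b):
    # Symbolic temporal relation: a ends no later than b starts.
    return events[a][1] <= events[b][0]

def overlaps(a, b):
    sa, ea = events[a]
    sb, eb = events[b]
    return max(sa, sb) < min(ea, eb)

# With correct representations, the symbolic layer is exact by construction:
assert before("phd", "postdoc")
assert not before("faculty", "phd")
assert not overlaps("phd", "faculty")
```

This is why errors concentrate in the extraction step: the checks above cannot be wrong, only their inputs can.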
-
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Chain-of-thought prompting enables large language models to surpass average human performance on 17 of 23 challenging BIG-Bench tasks.
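Mechanically, chain-of-thought prompting is just few-shot exemplars that include a worked rationale before the answer. The exemplar text below is invented for illustration:

```python
# Minimal CoT prompt assembly: each exemplar shows a rationale, then the
# answer, nudging the model to produce its own rationale for the last Q.

def cot_prompt(exemplars, question):
    parts = []
    for q, rationale, a in exemplars:
        parts.append(f"Q: {q}\nA: {rationale} The answer is {a}.")
    parts.append(f"Q: {question}\nA:")   # model continues from here
    return "\n\n".join(parts)

demo = [(
    "A juggler has 16 balls and drops half. How many remain?",
    "Half of 16 is 8, so 8 balls are dropped, leaving 16 - 8 = 8.",
    "8",
)]
print(cot_prompt(demo, "A tray holds 12 eggs and 5 break. How many are left?"))
```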
-
Where Reasoning Breaks: Logic-Aware Path Selection by Controlling Logical Connectives in LLMs Reasoning Chains
Targeting logical connectives with gradient steering, localized branching, and transition optimization improves LLM reasoning chain stability and efficiency.
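Of the three techniques named, localized branching is the easiest to sketch: at each connective position, branch over alternatives and keep the one a transition score prefers. The scorer below is a hand-written stand-in (an assumption); the paper operates on an actual LLM's generation process, not on strings.

```python
# Toy "localized branching" at logical connectives: branch over candidate
# connectives between steps and pick the best-scoring transition.

CONNECTIVES = ["therefore", "however", "because"]

def transition_score(prev_step, connective, next_step):
    # Crude illustrative heuristic: "however" fits contrast, "therefore"
    # fits consequence; everything else scores lower.
    if "not" in next_step and connective == "however":
        return 2.0
    if next_step.startswith("it follows") and connective == "therefore":
        return 2.0
    return 1.0 if connective == "therefore" else 0.5

def stabilize(steps):
    out = [steps[0]]
    for nxt in steps[1:]:
        best = max(CONNECTIVES, key=lambda c: transition_score(out[-1], c, nxt))
        out.append(best)
        out.append(nxt)
    return " ".join(out)

print(stabilize(["x is even", "x is not odd", "it follows x % 2 == 0"]))
```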
-
Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision
A supervision construction procedure generates explicit support and controlled non-support examples (counterfactual and topic-related negatives) without manual annotation, producing verifiers that demonstrate genuine evidence dependence in radiology tasks.
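The construction procedure can be sketched as a data-generation function: from one grounded (finding, evidence) pair, emit a positive plus the two negative types the summary names. The string edit and the radiology snippets are invented for illustration:

```python
# Sketch: derive a support example plus a counterfactual negative (evidence
# edited to contradict the finding) and a topic-related negative (on-topic
# evidence from a different case), with no manual annotation.

def build_examples(finding, evidence, same_topic_other_case):
    positive = {"finding": finding, "evidence": evidence, "label": 1}
    counterfactual = {
        "finding": finding,
        # Flip the key assertion so the evidence no longer supports it.
        "evidence": evidence.replace("is present", "is absent"),
        "label": 0,
    }
    topic_negative = {
        "finding": finding,
        "evidence": same_topic_other_case,   # related topic, wrong case
        "label": 0,
    }
    return [positive, counterfactual, topic_negative]

examples = build_examples(
    finding="pleural effusion",
    evidence="blunting of the costophrenic angle is present",
    same_topic_other_case="prior study discusses effusion management guidelines",
)
assert [e["label"] for e in examples] == [1, 0, 0]
```

A verifier trained only on positives and random negatives can ignore the evidence entirely; the counterfactual negatives force it to read the evidence, which is what "genuine evidence dependence" tests.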