arxiv: 2604.15529 · v3 · submitted 2026-04-16 · 💻 cs.AI

Recognition: no theorem link

LACE: Lattice Attention for Cross-thread Exploration

Yang Li , Zirui Zhang , Yang Liu , Chengzhi Mao

Authors on Pith no claims yet

Pith reviewed 2026-05-12 04:01 UTC · model grok-4.3

classification 💻 cs.AI

keywords large language modelsreasoningparallel searchattention mechanismssynthetic datacross-thread interactioninference collaboration

0 comments

The pith

LACE enables parallel reasoning paths in LLMs to interact via cross-thread attention, raising accuracy by over 7 points compared to isolated sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models typically generate multiple reasoning paths in parallel, yet these paths operate independently and often repeat the same mistakes. LACE repurposes the model to add lattice attention that lets concurrent threads share intermediate results and correct one another in real time during inference. Because no natural data shows such collaboration, the authors build a synthetic pipeline that trains models to communicate across threads and fix errors jointly. When tested, the interacting setup beats standard non-communicating parallel search by more than seven accuracy points on reasoning tasks. The work therefore treats parallel trajectories not as separate trials but as a single coordinated exploration process.

Core claim

By adding cross-thread attention to allow reasoning paths to exchange insights and perform mutual error correction, and by training this capability on synthetic collaborative data, large language models achieve substantially higher reasoning accuracy than when the same number of paths are run in isolation.

What carries the argument

Lattice attention mechanism that lets concurrent reasoning threads attend to one another's intermediate states so they can share insights and correct errors during a single forward pass.

If this is right

Parallel search stops wasting effort on identical failure modes because threads can detect and avoid them together.
Inference-time collaboration becomes trainable without requiring naturally occurring multi-agent dialogue data.
Reasoning performance scales with the ability of paths to exchange information rather than only with the number of independent samples.
The same lattice structure can be applied to other multi-path generation tasks such as planning or program synthesis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Architectures that treat multiple generations as an interacting lattice rather than a bag of independent samples may become a standard inference primitive.
Training objectives could be extended to reward not only final answer correctness but also the quality of intermediate cross-thread corrections.
If the synthetic data pipeline generalizes, similar methods could reduce reliance on very large numbers of samples by making each path more informative.

Load-bearing premise

Synthetic data that explicitly teaches cross-thread communication and error correction will produce behavior that transfers to real reasoning problems.

What would settle it

A controlled experiment on standard reasoning benchmarks in which the LACE model shows no accuracy gain or a loss relative to ordinary parallel sampling with the same total compute.

Figures

Figures reproduced from arXiv: 2604.15529 by Chengzhi Mao, Yang Li, Yang Liu, Zirui Zhang.

**Figure 1.** Figure 1: Comparison between standard isolated sampling and our LACE framework. Blue lines represent common causal attention, while purple lines denote our lattice attention. Top: Traditional LLMs perform reasoning in isolation, where these independent threads lack communication, often leading them fail in identical ways. Bottom: LACE introduces Lattice Attention, which enables cross-thread interaction. This archite… view at source ↗

**Figure 2.** Figure 2: Overview of Lattice Attention Layers. t denotes thread index, and l denotes token position. Lattice attention enables cross-thread communication by attending over the outputs of standard attention across different threads at aligned token positions. The resulting cross-thread context is fused back into the main path via a gated residual connection. This design allows each thread to benefit from the reasoni… view at source ↗

**Figure 4.** Figure 4: Efficacy of Learned Self-Selection and Cross-Thread Evaluation. Comparing accuracy across Mean, Voting, and selfpicked Best settings. The significant superiority of the self-selected output over majority voting confirms that the model learns effective internal verification and cross-thread comparison, outperforming simple post-hoc selection. naturally fail to satisfy the two criteria. Overly simple probl… view at source ↗

**Figure 6.** Figure 6: Information flow in the lattice. We visualize how much information from other threads flow to the given thread measuring gate scores at the final layer. When these scores peak in the middle, the model is effectively integrating collective insights during reasoning before narrowing its focus to deliver an independent final answer. , we provide an in-depth analysis of its emergent crossthread mechanisms (W… view at source ↗

**Figure 7.** Figure 7: Cross-Thread Intensity Visualization of LACE Generation. We show an example BEST thread, where token background color visualizes the gate score, reflecting the information absorbed from other threads. High cross-thread intensity is observed at the onset of exploration steps and the final self-assessement process, demonstrating that LACE actively engages in cross-thread collaboration. AIME25 LiveBench Train… view at source ↗

**Figure 8.** Figure 8: Average gate scores of top-20 tokens ranked by mean gate scores across all threads and samples in AIME 25 by Lattice-1.7B. Tokens related to self-assessment and cross-thread evaluation dominate the rankings, indicating that the model increases cross-thread attention during critical reasoning steps. shown in [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗

**Figure 10.** Figure 10: Self-assessment summaries from four parallel threads on an AIME 25 problem. Thread 1 correctly identifies itself as the best solution, while other threads acknowledge their relative positions [PITH_FULL_IMAGE:figures/full_fig_p015_10.png] view at source ↗

**Figure 11.** Figure 11: Prompt template for diverse solution generation. The {solution summary} placeholder is populated with cached summaries of previously generated solutions to encourage distinct reasoning paths. Solution Summary Prompt System: You are an expert at abstracting mathematical problem-solving strategies. Your goal is to describe the high-level methodology used in a solution without revealing specific numbers or c… view at source ↗

**Figure 12.** Figure 12: Prompt template for solution summarization. The generated summaries are cached and used to guide diverse solution generation in subsequent iterations. Original prompt is very long and here is a shortened version. 19 [PITH_FULL_IMAGE:figures/full_fig_p019_12.png] view at source ↗

**Figure 13.** Figure 13: Prompt template for step decomposition. Verbose or erroneous draft reasoning is rewritten into a concise, structured format with 4–8 steps. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_13.png] view at source ↗

**Figure 14.** Figure 14: Prompt template for solution evaluation. The judge assesses multiple candidate solutions based on correctness, insight, and efficiency, assigning hierarchical labels for training data curation. Original prompt is very long and here is a shortened version. 21 [PITH_FULL_IMAGE:figures/full_fig_p021_14.png] view at source ↗

**Figure 15.** Figure 15: Inference prompt for parallel reasoning. Each thread is aware of its position among parallel paths and is instructed to leverage cross-thread insights while generating its solution, followed by self-assessment and labeling. 22 [PITH_FULL_IMAGE:figures/full_fig_p022_15.png] view at source ↗

read the original abstract

Current large language models reason in isolation. Although it is common to sample multiple reasoning paths in parallel, these trajectories do not interact, and often fail in the same redundant ways. We introduce LACE, a framework that transforms reasoning from a collection of independent trials into a coordinated, parallel process. By repurposing the model architecture to enable cross-thread attention, LACE allows concurrent reasoning paths to share intermediate insights and correct one another during inference. A central challenge is the absence of natural training data that exhibits such collaborative behavior. We address this gap with a synthetic data pipeline that explicitly teaches models to communicate and error-correct across threads. Experiments show that this unified exploration substantially outperforms standard parallel search, improving reasoning accuracy by over 7 points. Our results suggest that large language models can be more effective when parallel reasoning paths are allowed to interact.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces LACE, a framework that repurposes LLM architectures with lattice attention to enable cross-thread interactions among parallel reasoning paths, allowing them to share insights and correct errors during inference. It addresses the lack of natural collaborative training data via a synthetic data pipeline designed to teach communication and error-correction across threads, and reports that this approach yields over 7 points higher reasoning accuracy than standard parallel search.

Significance. If the reported gains hold under scrutiny, the work would be significant for LLM reasoning research by demonstrating that enabling interaction among parallel trajectories can outperform independent sampling. The synthetic data pipeline represents a practical attempt to bootstrap the missing collaborative signal, and the lattice attention mechanism offers a concrete architectural change that could be adopted more broadly if shown to be robust.

major comments (2)

[Abstract] Abstract: The central quantitative claim of 'improving reasoning accuracy by over 7 points' is stated without any accompanying experimental details, including the benchmarks or datasets used, the exact baselines (e.g., standard parallel search variants), number of runs, error bars, or ablation results. This information is load-bearing for evaluating whether the improvement is reliable or an artifact of the evaluation setup.
[Methods / Synthetic Data Pipeline] Synthetic data pipeline description: The claim that the pipeline 'explicitly teaches models to communicate and error-correct across threads' and that this generalizes to real tasks lacks specifics on task construction, the distribution of injected errors, the communication protocol or loss terms used to incentivize cross-thread behavior, and any transfer experiments. This directly impacts the weakest assumption that synthetic collaboration will transfer beyond the training distribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of LACE's significance and for the constructive feedback on the abstract and synthetic data pipeline. We address each major comment point by point below and have prepared revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract] Abstract: The central quantitative claim of 'improving reasoning accuracy by over 7 points' is stated without any accompanying experimental details, including the benchmarks or datasets used, the exact baselines (e.g., standard parallel search variants), number of runs, error bars, or ablation results. This information is load-bearing for evaluating whether the improvement is reliable or an artifact of the evaluation setup.

Authors: We agree that the abstract would be strengthened by including key experimental context for the central claim. In the revised manuscript, we have updated the abstract to specify the benchmarks (GSM8K and MATH), the baseline of independent parallel sampling with majority voting, and that results are reported as averages over 5 runs with standard deviations. The full details on error bars, ablation studies, and statistical significance are already contained in Section 4 and the appendix; we have added an explicit cross-reference in the abstract to direct readers to these sections. revision: yes
Referee: [Methods / Synthetic Data Pipeline] Synthetic data pipeline description: The claim that the pipeline 'explicitly teaches models to communicate and error-correct across threads' and that this generalizes to real tasks lacks specifics on task construction, the distribution of injected errors, the communication protocol or loss terms used to incentivize cross-thread behavior, and any transfer experiments. This directly impacts the weakest assumption that synthetic collaboration will transfer beyond the training distribution.

Authors: We acknowledge that greater specificity on the synthetic pipeline would improve clarity and address concerns about transfer. In the revised manuscript, we have added a dedicated subsection detailing the pipeline: tasks are constructed by generating parallel reasoning threads on arithmetic and logical problems with errors injected uniformly at 20-50% of steps; the lattice attention mechanism serves as the communication protocol; and training incorporates an auxiliary cross-thread correction loss alongside the standard next-token prediction objective. We have also included new transfer experiments showing that models trained solely on the synthetic collaborative data improve accuracy on held-out real benchmarks (without additional fine-tuning). While exhaustive testing across all possible distributions is not feasible within the scope of this work, the reported results provide direct evidence of positive transfer to the evaluated tasks. revision: yes

Circularity Check

0 steps flagged

No circularity detected; architectural proposal and synthetic data pipeline are independent of target metrics.

full rationale

The paper introduces lattice attention for cross-thread interaction and a synthetic data pipeline to induce collaborative reasoning. No equations, derivations, or parameter-fitting steps are described that reduce the claimed accuracy gains to the inputs by construction. The synthetic data is presented as addressing an external absence of collaborative examples rather than redefining or tautologically generating the evaluation targets. No self-citations, uniqueness theorems, or ansatzes are invoked to justify core components. The result is an empirical claim resting on external benchmarks, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Only the abstract is available, so the ledger records the high-level assumptions stated there. The central unverified premise is that synthetic collaborative examples will produce transferable cross-thread behavior on real tasks.

axioms (1)

domain assumption Synthetic data can be constructed that teaches models to communicate and error-correct across reasoning threads in a way that generalizes beyond the synthetic distribution.
Explicitly invoked to solve the 'absence of natural training data' problem described in the abstract.

invented entities (1)

Lattice attention mechanism no independent evidence
purpose: To enable concurrent reasoning paths to share intermediate insights during inference.
Introduced as the core architectural change in the LACE framework.

pith-pipeline@v0.9.0 · 5435 in / 1159 out tokens · 64814 ms · 2026-05-12T04:01:13.040353+00:00 · methodology

Review history (3 revisions) →

discussion (0)

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages · 17 internal anchors

[1]

De RL : Diverse-exploration reinforcement learning for large language models improves mathematical reasoning

An, C., Qin, Z., Kong, S., and Flamant, C. De RL : Diverse-exploration reinforcement learning for large language models improves mathematical reasoning. Under review at ICLR 2025, 2024. URL https://openreview.net/forum?id=ZIYYeTkZQ4

work page 2025
[2]

Curriculum learning

Bengio, Y., Louradour, J., Collobert, R., and Weston, J. Curriculum learning. In Proceedings of the 26th annual international conference on machine learning, pp.\ 41--48, 2009

work page 2009
[3]

Evaluating Large Language Models Trained on Code

Chen, M. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[4]

Training Verifiers to Solve Math Word Problems

Cobbe, K., Kosaraju, V., Bavarian, M., Chen, M., Jun, H., Kaiser, L., Plappert, M., Tworek, J., Hilton, J., Nakano, R., et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[5]

Textworld: A learning environment for text-based games, 2019

C \^o t \'e , M.-A., K \'a d \'a r, \'A ., Yuan, X., Kybartas, B., Barnes, T., Fine, E., Moore, J., Tao, R. Y., Hausknecht, M., El Asri, L., Adada, M., Tay, W., and Trischler, A. Textworld: A learning environment for text-based games. CoRR, abs/1806.11532, 2018

work page arXiv 2018
[6]

Tales: Text-adventure learning environment suite

Cui, C., Yuan, X., Xiao, Z., Ammanabrolu, P., and C \^o t \'e , M.-A. Tales: Text-adventure learning environment suite. arXiv preprint arXiv:2504.14128, 2025. URL https://arxiv.org/abs/2504.14128

work page arXiv 2025
[7]

N., Fan, A., Auli, M., and Grangier, D

Dauphin, Y. N., Fan, A., Auli, M., and Grangier, D. Language modeling with gated convolutional networks. In International conference on machine learning, pp.\ 933--941. PMLR, 2017

work page 2017
[8]

Guilford, J. P. The nature of human intelligence. 1967

work page 1967
[9]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., Bi, X., et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[10]

and Page, S

Hong, L. and Page, S. E. Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences, 101 0 (46): 0 16385--16389, 2004

work page 2004
[11]

Group think: Multiple concurrent reasoning agents collaborating at token level granularity

Hsu, C.-J., Buffelli, D., McGowan, J., Liao, F.-T., Chen, Y.-C., Vakili, S., and Shiu, D.-s. Group think: Multiple concurrent reasoning agents collaborating at token level granularity. arXiv preprint arXiv:2505.11107, 2025

work page arXiv 2025
[12]

Large Language Models Cannot Self-Correct Reasoning Yet

Huang, J., Chen, X., Mishra, S., Zheng, H. S., Yu, A. W., Song, X., and Zhou, D. Large language models cannot self-correct reasoning yet. arXiv preprint arXiv:2310.01798, 2023

work page internal anchor Pith review arXiv 2023
[13]

Evaluation of best-of-n sampling strategies for language model alignment

Ichihara, Y., Jinnai, Y., Morimura, T., Ariu, K., Abe, K., Sakamoto, M., and Uchibe, E. Evaluation of best-of-n sampling strategies for language model alignment. arXiv preprint arXiv:2502.12668, 2025

work page arXiv 2025
[14]

Johnson-Laird, P. N. Mental models and human reasoning. Proceedings of the National Academy of Sciences, 107 0 (43): 0 18243--18250, 2010

work page 2010
[15]

Thinking, fast and slow

Kahneman, D. Thinking, fast and slow. macmillan, 2011

work page 2011
[16]

E., Sosa, M., Schor, J

Kay, K., Chung, J. E., Sosa, M., Schor, J. S., Karlsson, M. P., Larkin, M. C., Liu, D. F., and Frank, L. M. Constant sub-second cycling between representations of possible futures in the hippocampus. Cell, 180 0 (3): 0 552--567, 2020

work page 2020
[17]

Correlated errors in large language models

Kim, E., Garg, A., Peng, K., and Garg, N. Correlated errors in large language models. arXiv preprint arXiv:2506.07962, 2025

work page arXiv 2025
[18]

S., Reid, M., Matsuo, Y., and Iwasawa, Y

Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., and Iwasawa, Y. Large language models are zero-shot reasoners. Advances in neural information processing systems, 35: 0 22199--22213, 2022

work page 2022
[19]

From System 1 to System 2: A Survey of Reasoning Large Language Models

Li, Z.-Z., Zhang, D., Zhang, M.-L., Zhang, J., Liu, Z., Yao, Y., Xu, H., Zheng, J., Wang, P.-J., Chen, X., et al. From system 1 to system 2: A survey of reasoning large language models. arXiv preprint arXiv:2502.17419, 2025

work page internal anchor Pith review arXiv 2025
[20]

3d-rpe: Enhancing long-context modeling through 3d rotary position encoding

Ma, X., Liu, W., Zhang, P., and Xu, N. 3d-rpe: Enhancing long-context modeling through 3d rotary position encoding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pp.\ 24804--24811, 2025

work page 2025
[21]

Self-refine: Iterative refinement with self-feedback

Madaan, A., Tandon, N., Gupta, P., Hallinan, S., Gao, L., Wiegreffe, S., Alon, U., Dziri, N., Prabhumoye, S., Yang, Y., et al. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 36: 0 46534--46594, 2023

work page 2023
[22]

L., Fei-Fei, L., Hajishirzi, H., Zettlemoyer, L., Liang, P., Cand \`e s, E., and Hashimoto, T

Muennighoff, N., Yang, Z., Shi, W., Li, X. L., Fei-Fei, L., Hajishirzi, H., Zettlemoyer, L., Liang, P., Cand \`e s, E., and Hashimoto, T. B. s1: Simple test-time scaling. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp.\ 20286--20332, 2025

work page 2025
[23]

Training language models to follow instructions with human feedback

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al. Training language models to follow instructions with human feedback. Advances in neural information processing systems, 35: 0 27730--27744, 2022

work page 2022
[24]

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

Qiu, Z., Wang, Z., Zheng, B., Huang, Z., Wen, K., Yang, S., Men, R., Yu, L., Huang, F., Huang, S., et al. Gated attention for large language models: Non-linearity, sparsity, and attention-sink-free. arXiv preprint arXiv:2505.06708, 2025

work page internal anchor Pith review arXiv 2025
[25]

Language models are unsupervised multitask learners

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al. Language models are unsupervised multitask learners. OpenAI blog, 1 0 (8): 0 9, 2019

work page 2019
[26]

Majority of the bests: Improving best-of-n via bootstrapping

Rakhsha, A., Madan, K., Zhang, T., Farahmand, A.-m., and Khasahmadi, A. Majority of the bests: Improving best-of-n via bootstrapping. arXiv preprint arXiv:2511.18630, 2025

work page arXiv 2025
[27]

14 Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn

Rodionov, G., Garipov, R., Shutova, A., Yakushev, G., Schultheis, E., Egiazarian, V., Sinitsin, A., Kuznedelev, D., and Alistarh, D. Hogwild! inference: Parallel llm generation via concurrent attention. arXiv preprint arXiv:2504.06261, 2025

work page arXiv 2025
[28]

Proximal Policy Optimization Algorithms

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[29]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Shao, Z., Wang, P., Zhu, Q., Xu, R., Song, J., Bi, X., Zhang, H., Zhang, M., Li, Y., Wu, Y., et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[30]

GLU Variants Improve Transformer

Shazeer, N. Glu variants improve transformer. arXiv preprint arXiv:2002.05202, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2002
[31]

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Snell, C., Lee, J., Xu, K., and Kumar, A. Scaling llm test-time compute optimally can be more effective than scaling model parameters. arXiv preprint arXiv:2408.03314, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[32]

Stiennon, N., Ouyang, L., Wu, J., Ziegler, D., Lowe, R., Voss, C., Radford, A., Amodei, D., and Christiano, P. F. Learning to summarize with human feedback. Advances in neural information processing systems, 33: 0 3008--3021, 2020

work page 2020
[33]

Roformer: Enhanced transformer with rotary position embedding

Su, J., Ahmed, M., Lu, Y., Pan, S., Bo, W., and Liu, Y. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, 568: 0 127063, 2024

work page 2024
[34]

Openmathinstruct-2: Accelerating ai for math with massive open-source instruction data

Toshniwal, S., Du, W., Moshkov, I., Kisacanin, B., Ayrapetyan, A., and Gitman, I. Openmathinstruct-2: Accelerating ai for math with massive open-source instruction data. arXiv preprint arXiv:2410.01560, 2024 a

work page arXiv 2024
[35]

Openmathinstruct-1: A 1.8 million math instruction tuning dataset

Toshniwal, S., Moshkov, I., Narenthiran, S., Gitman, D., Jia, F., and Gitman, I. Openmathinstruct-1: A 1.8 million math instruction tuning dataset. arXiv preprint arXiv: Arxiv-2402.10176, 2024 b

work page arXiv 2024
[36]

N., Kaiser, ., and Polosukhin, I

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, ., and Polosukhin, I. Attention is all you need. Advances in neural information processing systems, 30, 2017

work page 2017
[37]

Self-Consistency Improves Chain of Thought Reasoning in Language Models

Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., Narang, S., Chowdhery, A., and Zhou, D. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[38]

Emergent Abilities of Large Language Models

Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D., et al. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682, 2022 a

work page internal anchor Pith review Pith/arXiv arXiv 2022
[39]

V., Zhou, D., et al

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E., Le, Q. V., Zhou, D., et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35: 0 24824--24837, 2022 b

work page 2022
[40]

Para- thinker: Native parallel thinking as a new paradigm to scale llm test-time compute.arXiv preprint arXiv:2509.04475,

Wen, H., Su, Y., Zhang, F., Liu, Y., Liu, Y., Zhang, Y.-Q., and Li, Y. Parathinker: Native parallel thinking as a new paradigm to scale llm test-time compute. arXiv preprint arXiv:2509.04475, 2025

work page arXiv 2025
[41]

S., Naidu, S

White, C., Dooley, S., Roberts, M., Pal, A., Feuer, B., Jain, S., Shwartz-Ziv, R., Jain, N., Saifullah, K., Dey, S., Shubh-Agrawal, Sandha, S. S., Naidu, S. V., Hegde, C., LeCun, Y., Goldstein, T., Neiswanger, W., and Goldblum, M. Livebench: A challenging, contamination-free LLM benchmark. In The Thirteenth International Conference on Learning Representat...

work page 2025
[42]

W., Chabris, C

Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N., and Malone, T. W. Evidence for a collective intelligence factor in the performance of human groups. science, 330 0 (6004): 0 686--688, 2010

work page 2010
[43]

Native parallel reasoner: Reasoning in parallelism via self-distilled reinforcement learning

Wu, T., Liu, Y., Bai, J., Jia, Z., Zhang, S., Lin, Z., Wang, Y., Zhu, S.-C., and Zheng, Z. Native parallel reasoner: Reasoning in parallelism via self-distilled reinforcement learning. arXiv preprint arXiv:2512.07461, 2025

work page internal anchor Pith review arXiv 2025
[44]

Inference scaling laws: An empirical analysis of compute-optimal inference for problem-solving with language models.arXiv preprint arXiv:2408.00724,

Wu, Y., Sun, Z., Li, S., Welleck, S., and Yang, Y. Inference scaling laws: An empirical analysis of compute-optimal inference for problem-solving with language models. arXiv preprint arXiv:2408.00724, 2024

work page arXiv 2024
[45]

Qwen3 Technical Report

Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[46]

arXiv preprint arXiv:2402.16837 , year=

Yang, S., Gribovskaya, E., Kassner, N., Geva, M., and Riedel, S. Do large language models latently perform multi-hop reasoning? arXiv preprint arXiv:2402.16837, 2024

work page arXiv 2024
[47]

Tree of thoughts: Deliberate problem solving with large language models

Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T., Cao, Y., and Narasimhan, K. Tree of thoughts: Deliberate problem solving with large language models. Advances in neural information processing systems, 36: 0 11809--11822, 2023

work page 2023
[48]

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Yu, Q., Zhang, Z., Zhu, R., Yuan, Y., Zuo, X., Yue, Y., Dai, W., Fan, T., Liu, G., Liu, L., et al. Dapo: An open-source llm reinforcement learning system at scale. arXiv preprint arXiv:2503.14476, 2025 a

work page internal anchor Pith review Pith/arXiv arXiv 2025
[49]

Accelerate parallelizable reasoning via parallel decoding within one sequence

Yu, Y., Wang, W., Chen, R., and Pei, J. Accelerate parallelizable reasoning via parallel decoding within one sequence. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp.\ 9018--9025, 2025 b

work page 2025
[50]

Adding conditional control to text-to-image diffusion models

Zhang, L., Rao, A., and Agrawala, M. Adding conditional control to text-to-image diffusion models

work page
[51]

and Math-AI, T

Zhang, Y. and Math-AI, T. American invitational mathematics examination (aime) 2024, 2024

work page 2024
[52]

and Math-AI, T

Zhang, Y. and Math-AI, T. American invitational mathematics examination (aime) 2025, 2025

work page 2025
[53]

Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

Zhang, Y., Li, M., Long, D., Zhang, X., Lin, H., Yang, B., Xie, P., Yang, A., Liu, D., Lin, J., et al. Qwen3 embedding: Advancing text embedding and reranking through foundation models. arXiv preprint arXiv:2506.05176, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[54]

Judging llm-as-a-judge with mt-bench and chatbot arena

Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al. Judging llm-as-a-judge with mt-bench and chatbot arena. Advances in neural information processing systems, 36: 0 46595--46623, 2023

work page 2023
[55]

Parallel-r1: Towards parallel thinking via reinforcement learning

Zheng, T., Zhang, H., Yu, W., Wang, X., Yang, X., Dai, R., Liu, R., Bao, H., Huang, C., Huang, H., et al. Parallel-r1: Towards parallel thinking via reinforcement learning. arXiv preprint arXiv:2509.07980, 2025

work page arXiv 2025
[56]

Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

Zhou, D., Sch \"a rli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., et al. Least-to-most prompting enables complex reasoning in large language models. arXiv preprint arXiv:2205.10625, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[57]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION format.date year duplicate empty "emp...

work page