Recognition: 2 theorem links
Dystruct: Dynamically Structured Diffusion Language Model Decoding via Bayesian Inference
Pith reviewed 2026-05-12 02:00 UTC · model grok-4.3
The pith
A training-free Bayesian framework enables diffusion language models to generate variable-length text by jointly inferring lengths, blocks, and schedules during decoding.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that flexible-length generation in diffusion language models can be cast as a dynamic structural inference problem solved through Bayesian methods. At each window expansion step the framework integrates local uncertainty with structural signals in a single mechanism to compute the expansion length, the block boundaries, and the decoding schedule, thereby supporting both flexible block expansion and block organization while preserving coherence across the full output.
What carries the argument
Dystruct, a training-free Bayesian structured decoding framework that jointly infers expansion length, block boundaries, and decoding schedule by unifying local uncertainty signals with structural information at each expansion step.
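To make the claimed mechanism concrete, here is a minimal, hedged sketch of the kind of joint step it describes: an unnormalized posterior over candidate expansion lengths that multiplies per-position confidence (the local uncertainty signal) against a change-point structure (a stand-in for the structural signal). The function, numbers, and the change-point form are illustrative assumptions, not the paper's implementation.

```python
# Toy change-point sketch (not the authors' method): score candidate
# window-expansion lengths by combining per-position confidence (the local
# uncertainty signal) with a change-point structure standing in for the
# structural signal. All numbers are fabricated.

def expansion_posterior(conf):
    """Unnormalized posterior over expansion lengths 1..len(conf).

    Length L scores highly when positions 1..L look confident and the
    position just after L looks uncertain, i.e. a change-point at L.
    A uniform prior over L is assumed and therefore omitted.
    """
    n = len(conf)
    post = {}
    for L in range(1, n + 1):
        keep = 1.0
        for c in conf[:L]:
            keep *= c                                # confident interior tokens
        stop = (1.0 - conf[L]) if L < n else 1.0     # uncertainty right after
        post[L] = keep * stop
    return post

conf = [0.95, 0.90, 0.85, 0.40, 0.30]                # fabricated confidences
post = expansion_posterior(conf)
print(max(post, key=post.get))                       # -> 3
```

The MAP length lands at the confidence drop-off, which is the qualitative behavior the claim attributes to the unified mechanism.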
If this is right
- Generation quality and flexibility improve over both fixed-length diffusion models and prior flexible-length methods across multiple benchmarks.
- The model can dynamically expand and organize blocks while keeping overall coherence without post-hoc tuning.
- No retraining or architectural changes to the base diffusion language model are required.
- The same Bayesian update step determines length, boundaries, and schedule in one pass.
Where Pith is reading between the lines
- Similar Bayesian structural inference could be tested on other non-autoregressive generation settings where length uncertainty also appears.
- The method may help address coherence issues in long-form parallel decoding tasks that current local-signal approaches struggle with.
- Applying the framework to specialized domains such as code or dialogue could reveal whether the structural signals capture domain-specific patterns automatically.
Load-bearing premise
Local uncertainty signals combined with structural signals through a unified Bayesian mechanism are sufficient to infer global sequence structure and maintain coherence in variable-length outputs without any model retraining.
What would settle it
If side-by-side experiments on standard benchmarks show that Dystruct outputs receive lower quality scores or exhibit more coherence failures than fixed-length diffusion baselines once length is left free to vary, the central claim would be refuted.
Original abstract
Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive models, primarily due to their ability to enable parallel decoding. Despite this advantage, most existing DLMs rely on a fixed generation length specified prior to decoding, which restricts their flexibility in real-world applications. While a few recent works attempt to support flexible-length generation, they typically suffer from notable limitations: some require costly retraining to accommodate variable-length outputs, while others depend solely on local confidence signals during decoding. Such local criteria fail to capture the evolving structure of the sequence, often resulting in suboptimal generation quality. In this paper, we propose a training-free, Bayesian structured decoding framework that formulates flexible-length generation as a dynamic structural inference problem. Our approach formulates flexible-length generation as a dynamic structural inference problem, jointly computing the expansion length, the block boundaries, and the decoding schedule. At each window expansion step, the method integrates local uncertainty with structural signals via a unified mechanism that supports dynamic structured generation, including both flexible block expansion and block organization, while maintaining coherence. Extensive experiments across multiple benchmarks demonstrate that our approach significantly improves generation quality and flexibility over existing fixed-length and flexible-length baselines. These results highlight the advantage of Bayesian structured decoding for diffusion language model, providing a principled and efficient solution for structured text generation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Dystruct, a training-free Bayesian structured decoding framework for diffusion language models. It formulates flexible-length generation as a dynamic structural inference problem that jointly computes expansion length, block boundaries, and decoding schedule by integrating local uncertainty signals with structural signals at each window expansion step, with the goal of maintaining coherence across variable-length outputs. Extensive experiments on multiple benchmarks are claimed to show quality and flexibility gains over fixed-length and flexible-length baselines.
Significance. If the empirical results hold under rigorous validation, the work would be significant because it offers a principled, training-free mechanism to overcome the fixed-length restriction that limits most diffusion language models, potentially improving their practicality for real-world applications without the cost of retraining. The unified Bayesian treatment of local uncertainty and global structure is a conceptually clean contribution that could inform future non-autoregressive decoding methods.
major comments (2)
- [Experiments section] The central empirical claim (abstract and Experiments section) rests on reported quality and flexibility gains, yet the manuscript provides no error bars, statistical significance tests, number of random seeds, or ablation studies isolating the contribution of the Bayesian structural inference versus simpler local-confidence baselines. This makes it impossible to determine whether the gains are robust or attributable to the proposed mechanism.
- [Method section] The method description (Method section) states that local uncertainty and structural signals are combined via a 'unified Bayesian mechanism' to jointly infer expansion length, block boundaries, and schedule, but no explicit update equations, prior definitions, or likelihood formulations are supplied. Without these, the claim of a 'principled' inference procedure cannot be verified or reproduced.
minor comments (1)
- [Abstract] The abstract contains a redundant sentence repeating the phrase 'formulates flexible-length generation as a dynamic structural inference problem.'
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight important areas for strengthening the manuscript. We address each major comment below and will revise the paper accordingly to improve clarity and rigor.
Point-by-point responses
Referee: [Experiments section] The central empirical claim (abstract and Experiments section) rests on reported quality and flexibility gains, yet the manuscript provides no error bars, statistical significance tests, number of random seeds, or ablation studies isolating the contribution of the Bayesian structural inference versus simpler local-confidence baselines. This makes it impossible to determine whether the gains are robust or attributable to the proposed mechanism.
Authors: We acknowledge the validity of this observation. The current version reports aggregate performance metrics without accompanying statistical details or targeted ablations. In the revised manuscript, we will rerun all experiments with multiple random seeds, include error bars and standard deviations, conduct statistical significance tests (e.g., paired t-tests), and add an ablation study that isolates the Bayesian structural inference component against a local-confidence-only baseline. These additions will allow readers to assess the robustness and specific contribution of the proposed mechanism.
revision: yes
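As a hedged illustration of the kind of test the authors promise, a paired t-test over per-benchmark scores could look like the sketch below; scipy's ttest_rel is the assumed tool, and every score is a fabricated placeholder, not a number from the paper.

```python
# Illustrative only: a paired t-test comparing per-benchmark scores of the
# proposed method against a local-confidence baseline. All scores below are
# fabricated placeholders, not numbers from the paper.
from scipy.stats import ttest_rel

dystruct = [0.71, 0.64, 0.58, 0.83, 0.77]   # one (made-up) score per benchmark
baseline = [0.66, 0.61, 0.57, 0.79, 0.70]

t_stat, p_value = ttest_rel(dystruct, baseline)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # report alongside means and stds
```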
Referee: [Method section] The method description (Method section) states that local uncertainty and structural signals are combined via a 'unified Bayesian mechanism' to jointly infer expansion length, block boundaries, and schedule, but no explicit update equations, prior definitions, or likelihood formulations are supplied. Without these, the claim of a 'principled' inference procedure cannot be verified or reproduced.
Authors: We agree that the Method section requires greater mathematical precision to substantiate the claim of a principled Bayesian procedure. In the revision, we will expand the description to include the explicit posterior update equations, the prior distribution over dynamic structural configurations (expansion lengths and block boundaries), and the likelihood model that incorporates local uncertainty signals at each window step. This will render the inference process fully specified and reproducible.
revision: yes
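One plausible shape for the promised specification, assembled only from quantities this review already names (expansion length L_t, block partition P^(t), schedule τ^(t), observations O^(t)): the prior factorization is quoted later in this review, but the likelihood form and MAP objective below are illustrative assumptions, not the paper's equations.

```latex
% Hedged sketch of what the promised specification might look like; the
% prior factorization is quoted in this review, but the likelihood form and
% MAP objective are illustrative assumptions, not the paper's equations.
\begin{align*}
  p\big(Z^{(t)}\big) &= p(L_t)\,
      p\big(P^{(t)} \mid L_t, \alpha\big)\,
      p\big(\tau^{(t)} \mid P^{(t)}\big)
      && \text{(structured prior, as quoted)}\\
  p\big(O^{(t)} \mid Z^{(t)}\big) &= \prod_{i=1}^{L_t} p\big(o_i \mid Z^{(t)}\big)
      && \text{(local uncertainty signals as observations)}\\
  \hat{Z}^{(t)} &= \operatorname*{arg\,max}_{Z^{(t)}}\;
      \log p\big(O^{(t)} \mid Z^{(t)}\big) + \log p\big(Z^{(t)}\big)
      && \text{(joint MAP over length, boundaries, schedule)}
\end{align*}
```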
Circularity Check
No significant circularity in the proposed framework
full rationale
The paper introduces a training-free Bayesian structured decoding method that formulates flexible-length generation as a dynamic structural inference problem, integrating local uncertainty signals with structural signals to jointly determine expansion length, block boundaries, and decoding schedule. Nothing in the provided abstract or description (no equations, fitted parameters, or self-referential definitions) would reduce the central claim to its own inputs by construction. The approach relies on external uncertainty and structural signals rather than on internal fitting or prior self-citations to carry the uniqueness or correctness of the inference mechanism, and benchmark experiments are invoked as independent validation, so none of the enumerated circularity patterns apply.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Local uncertainty and structural signals can be integrated into a unified Bayesian mechanism that infers global sequence structure
invented entities (1)
- Dynamic structural inference problem (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "We model the prior over these latent variables as a structured factorization: p(Z^(t)) = p(L_t) p(P^(t) | L_t, α) p(τ^(t) | P^(t)). ... CRP prior over block partitions ... p(b_g = 1 | m_g, α_g^(t)) = α_g^(t) / (m_g + α_g^(t))" (a numeric reading of this prior follows the theorem list)
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "Posterior over block partitions ... MAP inference: arg max log p(P^(t) | O^(t), L_t, α_g^(t))"
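To ground the CRP-style boundary prior quoted in the first passage above, here is a tiny numeric check; the concentration value α_g = 2.0 is an arbitrary illustrative choice, not a parameter from the paper.

```python
# Numeric reading of the quoted CRP-style boundary prior:
#   p(b_g = 1 | m_g, alpha_g) = alpha_g / (m_g + alpha_g)
# The chance of opening a new block decays as the current block grows,
# which is what lets the prior act as a structural signal.

def p_new_block(m_g: int, alpha_g: float) -> float:
    return alpha_g / (m_g + alpha_g)

for m in (1, 4, 16, 64):                 # current block size m_g
    print(m, round(p_new_block(m, 2.0), 3))
# 1 0.667, 4 0.333, 16 0.111, 64 0.03 -> longer blocks resist splitting
```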
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
- [2] Subham S. Sahoo, Marianne Arriola, Yair Schiff, Aaron Gokaslan, Edgar Marroquin, Justin T. Chiu, Alexander Rush, and Volodymyr Kuleshov. Simple and effective masked diffusion language models. Advances in Neural Information Processing Systems, 37:130136–130184, 2024.
- [3] Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, and Chongxuan Li. Large language diffusion models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [4] Jaeyeon Kim, Lee Cheuk Kit, Carles Domingo-Enrich, Yilun Du, Sham M. Kakade, Timothy Ngotiaoco, Sitan Chen, and Michael Samuel Albergo. Any-order flexible length masked diffusion. In The Fourteenth International Conference on Learning Representations, 2026.
- [5] Fangyu Ding, Ding Ding, Sijin Chen, Kaibo Wang, Peng Xu, Zijin Feng, Haoli Bai, Kai Han, Youliang Yan, Binhang Yuan, and Jiacheng Sun. Beyond masks: Efficient, flexible diffusion language models via deletion-insertion processes. In The Fourteenth International Conference on Learning Representations, 2026.
- [6] Jinsong Li, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Jiaqi Wang, and Dahua Lin. Beyond fixed: Training-free variable-length denoising for diffusion large language models. In The Fourteenth International Conference on Learning Representations, 2026.
- [7] David M. Blei, Thomas L. Griffiths, and Michael I. Jordan. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. Journal of the ACM, 57(2):1–30, 2010.
- [8] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- [9] Emiel Hoogeboom, Didrik Nielsen, Priyank Jaini, Patrick Forré, and Max Welling. Argmax flows and multinomial diffusion: Learning categorical distributions. Advances in Neural Information Processing Systems, 34:12454–12465, 2021.
- [10] Xiang Li, John Thickstun, Ishaan Gulrajani, Percy S. Liang, and Tatsunori B. Hashimoto. Diffusion-LM improves controllable text generation. Advances in Neural Information Processing Systems, 35:4328–4343, 2022.
- [11] Peiyu Yu, Sirui Xie, Xiaojian Ma, Baoxiong Jia, Bo Pang, Ruiqi Gao, Yixin Zhu, Song-Chun Zhu, and Ying Nian Wu. Latent diffusion energy-based model for interpretable text modelling. In International Conference on Machine Learning, pages 25702–25720. PMLR, 2022.
- [12] Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, and Aaron van den Oord. Step-unrolled denoising autoencoders for text generation. In International Conference on Learning Representations, 2022.
- [13] Machel Reid, Vincent Josua Hellendoorn, and Graham Neubig. DiffusER: Diffusion via edit-based reconstruction. In The Eleventh International Conference on Learning Representations, 2023.
- [14] Ishaan Gulrajani and Tatsunori Hashimoto. Likelihood-based diffusion language models. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- [15] Zhengfu He, Tianxiang Sun, Qiong Tang, Kuanning Wang, Xuanjing Huang, and Xipeng Qiu. DiffusionBERT: Improving generative masked language models with diffusion models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4521–4534, 2023.
- [16] Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, and Lingpeng Kong. DiffuSeq: Sequence to sequence text generation with diffusion models. In The Eleventh International Conference on Learning Representations, 2023.
- [17] Justin Lovelace, Varsha Kishore, Chao Wan, Eliot Shekhtman, and Kilian Q. Weinberger. Latent diffusion for language generation. Advances in Neural Information Processing Systems, 36:56998–57025, 2023.
- [18] Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, and Yaron Lipman. Discrete flow matching. Advances in Neural Information Processing Systems, 37:133345–133385, 2024.
- [19] Aaron Lou, Chenlin Meng, and Stefano Ermon. Discrete diffusion modeling by estimating the ratios of the data distribution. In Forty-first International Conference on Machine Learning, 2024.
- [20] Guangyi Liu, Yu Wang, Zeyu Feng, Qiyu Wu, Liping Tang, Yuan Gao, Zhen Li, Shuguang Cui, Julian McAuley, Zichao Yang, Eric P. Xing, and Zhiting Hu. Unified generation, reconstruction, and representation: Generalized diffusion with adaptive latent encoding-decoding. In Forty-first International Conference on Machine Learning, 2024.
- [21] Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, and Michalis Titsias. Simplified and generalized masked diffusion for discrete data. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
- [22] Shen Nie, Fengqi Zhu, Chao Du, Tianyu Pang, Qian Liu, Guangtao Zeng, Min Lin, and Chongxuan Li. Scaling up masked diffusion models on text. In The Thirteenth International Conference on Learning Representations, 2025.
- [23] Anji Liu, Oliver Broadrick, Mathias Niepert, and Guy Van den Broeck. Discrete copula diffusion. In The Thirteenth International Conference on Learning Representations, 2025.
- [24] Jiacheng Ye, Jiahui Gao, Shansan Gong, Lin Zheng, Xin Jiang, Zhenguo Li, and Lingpeng Kong. Beyond autoregression: Discrete diffusion for complex reasoning and planning. In The Thirteenth International Conference on Learning Representations, 2025.
- [25] Minkai Xu, Tomas Geffner, Karsten Kreis, Weili Nie, Yilun Xu, Jure Leskovec, Stefano Ermon, and Arash Vahdat. Energy-based diffusion language models for text generation. In The Thirteenth International Conference on Learning Representations, 2025.
- [26] Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, and Lingpeng Kong. Scaling diffusion language models via adaptation from autoregressive models. In The Thirteenth International Conference on Learning Representations, 2025.
- [27] Justin Deschenaux and Caglar Gulcehre. Beyond autoregression: Fast LLMs via self-distillation through time. In The Thirteenth International Conference on Learning Representations, 2025.
- [28] Dimitri von Rütte, Janis Fluri, Yuhui Ding, Antonio Orvieto, Bernhard Schölkopf, and Thomas Hofmann. Generalized interpolating discrete diffusion. In Forty-second International Conference on Machine Learning, 2025.
- [29] Sulin Liu, Juno Nam, Andrew Campbell, Hannes Stark, Yilun Xu, Tommi Jaakkola, and Rafael Gomez-Bombarelli. Think while you generate: Discrete diffusion with planned denoising. In The Thirteenth International Conference on Learning Representations, 2025.
- [30] Marianne Arriola, Subham Sekhar Sahoo, Aaron Gokaslan, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Justin T. Chiu, and Volodymyr Kuleshov. Block diffusion: Interpolating between autoregressive and diffusion language models. In The Thirteenth International Conference on Learning Representations, 2025.
- [31] Kaiwen Zheng, Yongxin Chen, Hanzi Mao, Ming-Yu Liu, Jun Zhu, and Qinsheng Zhang. Masked diffusion models are secretly time-agnostic masked models and exploit inaccurate categorical sampling. In The Thirteenth International Conference on Learning Representations, 2025.
- [32] Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin T. Chiu, and Volodymyr Kuleshov. The diffusion duality. In Forty-second International Conference on Machine Learning, 2025.
- [33] Ruixiang Zhang, Shuangfei Zhai, Yizhe Zhang, James Thornton, Zijing Ou, Joshua M. Susskind, and Navdeep Jaitly. Target concrete score matching: A holistic framework for discrete diffusion. In Forty-second International Conference on Machine Learning, 2025.
- [34] Jaeyeon Kim, Kulin Shah, Vasilis Kontonis, Sham M. Kakade, and Sitan Chen. Train for the worst, plan for the best: Understanding token ordering in masked diffusions. In Forty-second International Conference on Machine Learning, 2025.
- [35] Litu Rout, Constantine Caramanis, and Sanjay Shakkottai. Anchored diffusion language model. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [36] Yeongbin Seo, Dongha Lee, Jaehyung Kim, and Jinyoung Yeo. Fast and fluent diffusion language models via convolutional decoding and rejective fine-tuning. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025.
- [37] Gabe Guo and Stefano Ermon. Self-speculative decoding accelerates lossless inference in any-order and any-subset autoregressive models. In The Fourteenth International Conference on Learning Representations, 2026.
- [38] Xiaojing Qi, Lun Du, Xinyuan Zhang, Lanning Wei, Tao Jin, and Da Zheng. Hierarchy decoding: A training-free parallel decoding strategy for diffusion large language models. In The Fourteenth International Conference on Learning Representations, 2026.
- [39] Guanxi Lu, Hao Mark Chen, Yuto Karashima, Zhican Wang, Daichi Fujiki, and Hongxiang Fan. AdaBlock-dLLM: Semantic-aware diffusion LLM inference via adaptive block size. In The Fourteenth International Conference on Learning Representations, 2026.
- [40] Danny Wang, Ruihong Qiu, and Zi Huang. When to commit? Towards variable-size self-contained blocks for discrete diffusion language models. arXiv preprint arXiv:2604.23994, 2026.
- [41] Jiacheng Ye, Zhihui Xie, Lin Zheng, Jiahui Gao, Zirui Wu, Xin Jiang, Zhenguo Li, and Lingpeng Kong. Dream 7B: Diffusion large language models, 2025.
- [42] Leo Gao, Jonathan Tow, Baber Abbasi, Stella Biderman, Sid Black, Anthony DiPofi, Charles Foster, Laurence Golding, Jeffrey Hsu, Alain Le Noac'h, Haonan Li, Kyle McDonell, Niklas Muennighoff, Chris Ociepa, Jason Phang, Laria Reynolds, Hailey Schoelkopf, Aviya Skowron, Lintang Sutawika, Eric Tang, Anish Thite, Ben Wang, Kevin Wang, and Andy Zou. A framework for few-shot language model evaluation, 12 2023.
- [43] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to solve math word problems, 2021.
- [44] Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the MATH dataset. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
- [45] Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, and Charles Sutton. Program synthesis with large language models, 2021.
- [46] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, et al. Evaluating large language models trained on code, 2021.
- [47] Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc Le, Ed Chi, Denny Zhou, and Jason Wei. Challenging BIG-bench tasks and whether chain-of-thought can solve them. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Findings of the Association for Computational Linguistics: ACL 2023, 2023.
discussion (0)