pith. machine review for the scientific record.

arxiv: 2605.12940 · v1 · submitted 2026-05-13 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links · Lean Theorem

The Expressivity Boundary of Probabilistic Circuits: A Comparison with Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 20:44 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords probabilistic circuits · expressivity · transformers · autoregressive language modeling · separation rank · vtree structure · decomposable circuits · dependency topologies

The pith

Probabilistic circuits match transformer separation rank only on data partitions aligned with their fixed vtree structure and degrade on heterogeneous dependency topologies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies two bottlenecks that limit probabilistic circuits (PCs) in autoregressive language modeling relative to transformers. An output bottleneck arises because PCs form predictions as convex combinations in probability space, a parameterization that struggles to represent sharp token distributions; a logit-space reparameterization reduces this gap. A deeper context-encoding limit is proven for structured-decomposable PCs: they achieve separation rank equivalent to transformers only on partitions matching their fixed vtree routing, and suffer sharp performance loss when data dependency patterns vary across topologies. The work further establishes that the broader class of decomposable PCs is strictly more expressive than the structured-decomposable subclass, although effective optimization of the more general form remains unresolved.
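The probability-space half of the output bottleneck can be checked numerically: a convex combination of distributions is, token by token, bounded by its sharpest component. A minimal sketch, with component count, vocabulary size, and Dirichlet sampling chosen purely for illustration (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 1000, 8  # vocabulary size and number of mixture components (illustrative)

# K component distributions over V tokens, plus convex mixture weights.
components = rng.dirichlet(np.ones(V), size=K)  # shape (K, V)
weights = rng.dirichlet(np.ones(K))             # shape (K,)

# Probability-space mixture: a convex combination of distributions.
# Per token it is bounded above by the sharpest component, so the
# mixture can never be peakier than its peakiest component.
p_prob = weights @ components
assert p_prob.max() <= components.max() + 1e-12

# Logit-space combination: average in log space, then renormalize.
# After renormalization this weighted geometric mean is not subject to
# the same per-token bound and can concentrate more mass on its mode.
logits = weights @ np.log(components)
p_logit = np.exp(logits - logits.max())
p_logit /= p_logit.sum()
```

The asserted inequality is the geometric constraint the paper attributes to probability-space parameterization; the logit-space variant sketched here is one way such a constraint can be relaxed, not necessarily the authors' exact construction.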

Core claim

Structured-decomposable probabilistic circuits can match the separation rank of transformers on vtree-aligned partitions, yet this capacity is confined to partitions consistent with the fixed routing structure, causing severe degradation on data with heterogeneous dependency topologies. Decomposable probabilistic circuits are strictly more expressive than structured-decomposable ones. Parameterizing predictions as convex combinations in probability space creates an additional output bottleneck for sharp language distributions, though logit-space parameterization substantially narrows the gap.

What carries the argument

Vtree-aligned separation rank in structured-decomposable probabilistic circuits, which fixes the allowable variable partitions and thereby bounds context-encoding capacity.
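Separation rank with respect to a partition can be probed for a tiny discrete function by matricizing it and taking a matrix rank. A hedged sketch, with a 4-variable toy function invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def separation_rank(tensor, side_a):
    """Rank of the matricization of `tensor`: axes in `side_a` become
    rows and the remaining axes become columns."""
    rest = [i for i in range(tensor.ndim) if i not in side_a]
    mat = tensor.transpose(list(side_a) + rest).reshape(2 ** len(side_a), -1)
    return np.linalg.matrix_rank(mat)

# A function over four binary variables X1..X4 that factorizes as
# g(x1, x2) * h(x3, x4): rank 1 under the aligned split, but generally
# full rank once the split interleaves the two factors.
g, h = rng.random((2, 2)), rng.random((2, 2))
f = np.einsum("ab,cd->abcd", g, h)

aligned = separation_rank(f, (0, 1))      # split {X1,X2} | {X3,X4}
interleaved = separation_rank(f, (0, 2))  # split {X1,X3} | {X2,X4}
```

A structured-decomposable PC whose vtree splits {X1,X2} from {X3,X4} handles the first partition cheaply; the higher rank under the interleaved split is the kind of capacity demand the fixed routing cannot serve.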

If this is right

  • Logit-space parameterization of PC outputs reduces the gap with transformers on sharp distributions.
  • Structured PCs exhibit severe degradation precisely when data dependency topologies misalign with the fixed vtree.
  • Decomposable PCs are strictly more expressive than structured-decomposable PCs, indicating a route to higher capacity once optimization improves.
  • The exact inference advantage of PCs cannot offset these expressivity limits in autoregressive language modeling.
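The strict separation between decomposable and structured-decomposable PCs rests, in the paper's appendix, on a switch construction: take functions $A$ and $B$ over variables $y$ such that no single vtree admits polynomial-size structured-decomposable circuits for both, and define

```latex
F(y, s) \;=\; \mathbb{1}[s = 1]\,A(y) \;+\; \mathbb{1}[s = 0]\,B(y).
```

A top-level sum node over the two branches yields a polynomial-size decomposable PC for $F$, but a structured-decomposable circuit must respect one vtree for both $A$ and $B$ simultaneously, which by assumption forces super-polynomial size.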

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Dynamic or data-dependent routing structures inside PCs could bypass the fixed-vtree alignment requirement.
  • Hybrid models that retain PC exact inference while borrowing flexible routing from transformers may close the remaining gap.
  • Optimization advances for general decomposable PCs would test whether the strict expressivity advantage translates to practical gains.
  • The results imply that any architecture with rigid decomposition structure will face analogous limits on data with variable dependency patterns.

Load-bearing premise

Language data contains heterogeneous dependency topologies that systematically misalign with any single fixed vtree structure.

What would settle it

An experiment in which a structured-decomposable PC with one fixed vtree maintains transformer-level performance across language datasets whose dependency topologies are known to be heterogeneous.

Figures

Figures reproduced from arXiv: 2605.12940 by Anji Liu, Muhan Zhang, Xuejie Liu, Zhiyu Zhao.

Figure 1. Impact of output parameterization on modeling performance. Lower validation loss is better. Across all datasets, logit-space variants improve over their probability-space counterparts within the same model family: Logit-HMM improves over HMM, and Transformer improves over Prob-Transformer. This supports the hypothesis that probability-space convex combinations impose a stronger geometric constraint than lo…
read the original abstract

Probabilistic Circuits (PCs) are deep generative models that support exact and efficient probabilistic inference. Yet in autoregressive language modeling, PCs still lag behind Transformer-based large language models (LLMs), suggesting an important expressivity gap. In this work, we compare PCs and LLMs under a unified autoregressive formulation. First, an output bottleneck: PCs parameterize predictions as convex combinations in probability space, which struggles to represent the sharp distributions typical of language; adopting a logit-space parameterization substantially narrows this gap. Second, a context-encoding bottleneck: we prove that structured-decomposable PCs can match Transformer separation rank on vtree-aligned partitions, but show, both theoretically and empirically, that this capacity is limited to partitions aligned with the fixed routing structure, leading to severe degradation when the data exhibits heterogeneous dependency topologies. We further prove that decomposable PCs are strictly more expressive than structured-decomposable ones, though effectively optimizing them remains an open challenge.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that Probabilistic Circuits (PCs) lag behind Transformer-based LLMs in autoregressive language modeling due to two main bottlenecks. First, an output bottleneck arises because PCs parameterize predictions as convex combinations in probability space, which struggles with sharp language distributions; switching to logit-space parameterization narrows the gap. Second, a context-encoding bottleneck is identified where structured-decomposable PCs can match Transformer separation ranks only on vtree-aligned partitions, with severe degradation on data exhibiting heterogeneous dependency topologies. The paper proves that decomposable PCs are strictly more expressive than structured-decomposable ones, though optimizing the former remains open.

Significance. If the central claims hold, this work provides a valuable theoretical and empirical dissection of PC expressivity limits relative to Transformers, explaining performance gaps in language modeling while suggesting concrete fixes like logit parameterization and decomposable structures. The separation-rank analysis and proofs offer a unified framework that could guide future exact-inference models toward closing the gap with LLMs, particularly if the heterogeneous-topology assumption is validated.

major comments (3)
  1. [Abstract and §3] Abstract and §3 (Context-Encoding Bottleneck): The degradation claim rests on language data systematically exhibiting heterogeneous dependency topologies that misalign with any fixed vtree; this assumption is load-bearing but receives only indirect empirical support. Direct analysis of dependency structures (e.g., via mutual information or parse trees) across datasets is needed to confirm the misalignment drives the observed performance gap rather than other factors.
  2. [§4] §4 (Expressivity Proof): The proof that decomposable PCs are strictly more expressive than structured-decomposable ones must explicitly state the measure (e.g., separation rank or function class) establishing the strict inequality and verify it holds under the autoregressive formulation used elsewhere; without this, the claim that structured-decomposable PCs are fundamentally limited remains incomplete.
  3. [Empirical section] Empirical section and associated tables: The reported degradation on misaligned partitions is central, yet details on vtree construction, how alignment is quantified, and baseline controls (e.g., random vs. learned vtrees) are insufficient for verification; this weakens the link between theory and the claimed practical expressivity gap.
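The dependency analysis called for in major comment 1 could be sketched as an empirical pairwise mutual-information matrix over sequence positions. A toy estimator on synthetic binary sequences; nothing here is from the paper's pipeline:

```python
import numpy as np

def pairwise_mi(data):
    """Empirical mutual information (in nats) between every pair of
    columns of a 0/1 integer array `data`, shape (n_samples, n_positions)."""
    n, d = data.shape
    mi = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            # Empirical joint distribution of positions i and j.
            joint = np.zeros((2, 2))
            for a in range(2):
                for b in range(2):
                    joint[a, b] = np.mean((data[:, i] == a) & (data[:, j] == b))
            pi, pj = joint.sum(axis=1), joint.sum(axis=0)
            for a in range(2):
                for b in range(2):
                    if joint[a, b] > 0:
                        mi[i, j] += joint[a, b] * np.log(joint[a, b] / (pi[a] * pj[b]))
    return mi

rng = np.random.default_rng(2)
x = rng.integers(0, 2, size=(5000, 4))
x[:, 1] = x[:, 0]  # make positions 0 and 1 strongly dependent
mi = pairwise_mi(x)
```

Heterogeneous topology, in this picture, would show up as MI matrices whose block structure shifts across datasets or sequence types, so that no single vtree cut avoids severing high-MI pairs everywhere.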
minor comments (2)
  1. [Notation] Notation section: Define vtree, separation rank, and structured-decomposability at first use with a small illustrative example to aid readers outside the PC community.
  2. [Figures] Figure captions: Add explicit descriptions of what each panel shows regarding alignment vs. misalignment to improve clarity.
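The small illustrative example the referee requests could be as simple as the following: a vtree is a full binary tree whose leaves are the variables, here written as nested pairs (variable names invented):

```python
# A vtree over {X1, X2, X3, X4} as nested pairs: each internal node
# splits its variable scope between its two children.
vtree = (("X1", "X2"), ("X3", "X4"))

def scope(node):
    """Set of variables under a vtree node."""
    if isinstance(node, str):
        return {node}
    left, right = node
    return scope(left) | scope(right)

# Structured decomposability requires every product node in the circuit
# to split its scope exactly as the vtree node with the same scope does,
# e.g. {X1,X2,X3,X4} may only split into {X1,X2} | {X3,X4} here.
assert scope(vtree) == {"X1", "X2", "X3", "X4"}
assert scope(vtree[0]) == {"X1", "X2"}
```

Separation rank with respect to a split (A, B) is then the minimal number of product terms f_a(x_A) * g_b(x_B) needed to express a function, which is what ties the fixed vtree to a fixed family of rank bounds.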

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us identify areas to strengthen the manuscript. We address each major comment below and will incorporate revisions to improve clarity, completeness, and empirical support.

read point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (Context-Encoding Bottleneck): The degradation claim rests on language data systematically exhibiting heterogeneous dependency topologies that misalign with any fixed vtree; this assumption is load-bearing but receives only indirect empirical support. Direct analysis of dependency structures (e.g., via mutual information or parse trees) across datasets is needed to confirm the misalignment drives the observed performance gap rather than other factors.

    Authors: We appreciate the referee's emphasis on strengthening the empirical grounding of the heterogeneous topology assumption. Our current results show substantial performance degradation specifically on misaligned partitions (while controlling for model size and training), which we interpret as evidence that language data exhibits such heterogeneity. To directly address the concern, the revised manuscript will include an additional analysis subsection quantifying dependency structures via mutual information matrices computed on the training data, along with a comparison of alignment scores between learned vtrees and these structures. This will provide more explicit confirmation that topology mismatch, rather than other factors, drives the observed gaps. revision: yes

  2. Referee: [§4] §4 (Expressivity Proof): The proof that decomposable PCs are strictly more expressive than structured-decomposable ones must explicitly state the measure (e.g., separation rank or function class) establishing the strict inequality and verify it holds under the autoregressive formulation used elsewhere; without this, the claim that structured-decomposable PCs are fundamentally limited remains incomplete.

    Authors: We thank the referee for this precise observation. The proof in §4 establishes the strict inequality using separation rank as the expressivity measure: we construct a family of functions whose separation rank exceeds what any structured-decomposable PC can achieve for a given vtree, while decomposable PCs can represent them exactly. This construction is carried out in the autoregressive setting by considering the joint distribution factorized as a product of conditionals. In the revision, we will add an explicit statement of the measure at the beginning of the proof and include a dedicated paragraph verifying that the separation-rank separation holds under the autoregressive formulation employed in the rest of the paper. revision: yes

  3. Referee: [Empirical section] Empirical section and associated tables: The reported degradation on misaligned partitions is central, yet details on vtree construction, how alignment is quantified, and baseline controls (e.g., random vs. learned vtrees) are insufficient for verification; this weakens the link between theory and the claimed practical expressivity gap.

    Authors: We agree that greater detail on the experimental setup is necessary for reproducibility and to solidify the theory-empirics connection. The revised empirical section will expand the description of vtree construction (including the specific bottom-up clustering heuristic and hyperparameters), introduce an explicit alignment metric (average normalized mutual information between variable pairs and vtree partitions), and report new baseline results using randomly generated vtrees of matching size. These additions will allow readers to verify that the performance differences arise from alignment rather than other implementation choices. revision: yes
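The alignment metric and random-vtree baseline promised in response 3 could take the shape of a cut-cost score over a mutual-information matrix; the metric, MI values, and splits below are invented for illustration, not the authors':

```python
import numpy as np

def cut_cost(mi, side_a, side_b):
    """Total pairwise mutual information severed by splitting the
    variables into side_a vs side_b (lower = better-aligned split)."""
    return sum(mi[i, j] for i in side_a for j in side_b)

# Toy MI matrix: variables {0,1} and {2,3} form two tightly coupled blocks.
mi = np.array([[0.0, 0.9, 0.1, 0.1],
               [0.9, 0.0, 0.1, 0.1],
               [0.1, 0.1, 0.0, 0.8],
               [0.1, 0.1, 0.8, 0.0]])

aligned_cost = cut_cost(mi, (0, 1), (2, 3))      # respects the blocks
misaligned_cost = cut_cost(mi, (0, 2), (1, 3))   # cuts through both blocks
```

Scoring every internal split of a vtree this way, and comparing learned vtrees against randomly generated ones of matching size, would make the claimed alignment-performance link directly checkable.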

Circularity Check

0 steps flagged

No circularity: proofs rely on external separation-rank results and standard PC definitions

full rationale

The paper's derivation chain consists of theoretical proofs establishing that structured-decomposable PCs match Transformer separation rank exactly on vtree-aligned partitions and that decomposable PCs are strictly more expressive than structured-decomposable ones. These steps invoke established separation-rank properties of Transformers and the standard definitions of vtrees and decomposability in probabilistic circuits; they do not reduce any claimed prediction or expressivity bound to a fitted parameter, self-referential equation, or load-bearing self-citation. The empirical degradation result is conditioned on the external modeling assumption of heterogeneous dependency topologies in language data, which is not derived from the paper's own equations. No self-definitional, fitted-input, or ansatz-smuggling patterns appear.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Central claims rely on standard definitions of probabilistic circuits, vtree structures, separation rank from transformer literature, and convex combination properties, with no new free parameters, invented entities, or ad-hoc axioms introduced.

axioms (2)
  • standard math: Standard axioms of probability and convex combinations in output parameterization.
    Invoked when defining the output bottleneck in probability space versus logit space.
  • domain assumption: Properties of separation rank for transformers and decomposability for PCs, taken from prior literature.
    Used to establish matching capacity on vtree-aligned partitions.

pith-pipeline@v0.9.0 · 5466 in / 1348 out tokens · 68382 ms · 2026-05-14T20:44:11.017758+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 6 internal anchors
