arxiv: 2605.12671 · v1 · submitted 2026-05-12 · 💻 cs.CL

All Circuits Lead to Rome: Rethinking Functional Anisotropy in Circuit and Sheaf Discovery for LLMs

Bangwei Guo, Dimitris N. Metaxas, Gerald Penn, Jingcheng Niu, Jinman Zhao, Mingyu Jin, Xi Chen, Yutao Yue, Yutong Yin, Zhaoran Wang

Pith reviewed 2026-05-14 20:48 UTC · model grok-4.3

classification 💻 cs.CL

keywords circuit discovery · mechanistic interpretability · large language models · sheaf discovery · functional anisotropy · superposition · non-unique explanations

The pith

A single task in large language models can be performed by multiple structurally distinct circuits or sheaves that are all faithful, sparse, and complete.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper challenges the common assumption that each function in an LLM is handled by one unique or near-unique internal mechanism. Instead, it demonstrates that the same task can be supported by several different circuits or sheaves that each achieve high performance with minimal shared structure. To find these alternatives, the authors add an overlap penalty to standard circuit discovery methods. This non-uniqueness grows more evident when more circuits are sought and appears across different discovery techniques. The results lead to the claim that mechanistic explanations for LLM behavior are inherently non-canonical.

Core claim

We show that a single LLM task can instead be supported by multiple, structurally distinct circuits or sheaves that are simultaneously faithful, sparse, and complete. We introduce Overlap-Aware Sheaf Repulsion to systematically uncover such competing mechanisms by penalizing structural overlap across discovery runs. We identify an ultra-sparse three-edge sheaf in which no edge is individually indispensable. We propose the Distributive Dense Circuit Hypothesis and provide a theoretical analysis showing that non-unique, low-overlap circuit explanations arise naturally from high-dimensional superposition under mild assumptions.

What carries the argument

Overlap-Aware Sheaf Repulsion, an augmentation to the circuit or sheaf discovery objective that adds an explicit penalty on structural overlap between multiple independent discovery runs.
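
The page does not reproduce the paper's exact objective, so the following is only a minimal sketch of what an overlap penalty of this kind could look like. It assumes soft edge masks with values in [0, 1] and measures overlap as shared edge mass; the names `overlap_penalty`, `repelled_objective`, `sparsity_weight`, and `overlap_weight` are hypothetical and not taken from the paper.

```python
from typing import Sequence


def overlap_penalty(mask: Sequence[float],
                    previous_masks: Sequence[Sequence[float]]) -> float:
    """Average soft overlap between a candidate edge mask and earlier masks.

    Assumption: each mask holds one value in [0, 1] per edge, and the overlap
    with one earlier mask is the sum of elementwise products.
    """
    if not previous_masks:
        return 0.0
    total = 0.0
    for prev in previous_masks:
        if len(prev) != len(mask):
            raise ValueError("all masks must cover the same edge set")
        total += sum(m * p for m, p in zip(mask, prev))
    return total / len(previous_masks)


def repelled_objective(task_loss: float,
                       mask: Sequence[float],
                       previous_masks: Sequence[Sequence[float]],
                       sparsity_weight: float = 0.01,
                       overlap_weight: float = 1.0) -> float:
    """Task loss + sparsity term + overlap term (illustrative weights only)."""
    sparsity = sum(mask)
    return (task_loss
            + sparsity_weight * sparsity
            + overlap_weight * overlap_penalty(mask, previous_masks))
```

With `overlap_weight` set to zero this reduces to an ordinary sparse-mask objective, which is the unpenalized baseline the referee report asks to be compared against.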

Load-bearing premise

That the overlap penalty uncovers genuinely different mechanisms rather than artifacts created by the penalty itself.

What would settle it

An experiment in which every high-performing low-overlap circuit recovered by the method still shares a small core set of edges whose removal disables the task.
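
One way to make this test concrete is sketched below, under stated assumptions: circuits are represented as sets of `(source, target)` edge pairs, and `score_without` is a hypothetical callback that returns task performance after ablating a given edge set. Neither the representation nor the callback comes from the paper.

```python
from itertools import combinations
from typing import Callable, List, Set, Tuple

Edge = Tuple[str, str]  # hypothetical (source, target) edge identifier


def shared_core(circuits: List[Set[Edge]]) -> Set[Edge]:
    """Edges present in every discovered circuit."""
    if not circuits:
        return set()
    return set(circuits[0]).intersection(*circuits[1:])


def pairwise_jaccard(circuits: List[Set[Edge]]) -> List[float]:
    """Jaccard overlap for every pair of circuits."""
    return [len(a & b) / len(a | b) if (a | b) else 0.0
            for a, b in combinations(circuits, 2)]


def core_is_necessary(core: Set[Edge],
                      score_without: Callable[[Set[Edge]], float],
                      threshold: float) -> bool:
    """True if a non-empty shared core exists and ablating it breaks the task."""
    return bool(core) and score_without(core) < threshold
```

If `shared_core` comes back empty, or `core_is_necessary` is false for every recovered family of circuits, the settling experiment described above has failed to find a canonical core.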

Figures

Figures reproduced from arXiv: 2605.12671 by Bangwei Guo, Dimitris N. Metaxas, Gerald Penn, Jingcheng Niu, Jinman Zhao, Mingyu Jin, Xi Chen, Yutao Yue, Yutong Yin, Zhaoran Wang.

**Figure 1.** Illustration of our findings. We can identify multiple distinct circuits or sheaves that perform the same LLM task.

**Figure 2.** Analysis of structural differences between two IOI sheaves. The two sheaves exhibit markedly different layer-wise edge distributions despite identical task performance, indicating that the observed low overlap is not simply due to a trivial reparameterisation or rotation of model components.

**Figure 3.** EAP circuits’ task performance and pairwise overlap as a function of the top-k selection threshold.

**Figure 5.** The three-edge IOI circuit under zero ablation.

**Figure 4.** The sheaf profiles of iterative intersections and unions of sheaves discovered by DiscoGP with OASR for IOI.

**Figure 6.** Edge pruning viewed as structured removal of residual-stream terms at the level of attention head components.

**Figure 7.** Example prompts from each dataset together with their correct and incorrect continuations.

**Figure 8.** The distributions of incoming edge count across layers and different types of components in pairs of sheaves discovered.

**Figure 9.** We visualize the computational graphs of the sheaves discovered for the IOI and DNA tasks. When the computation graph is rendered at a very fine granularity (e.g., with explicit Q/K/V/O decomposition), the resulting edge density leads to severe visual clutter that obscures the global structure. Note that this also results in fewer edge counts by re-merging the Q/K/V/O nodes. In contrast, representing the m…

**Figure 10.** We visualize the computational graphs of the sheaves discovered for the AGA and ANA tasks. The sparseness of the overlapping edges supports the similar argument as given above for …

**Figure 11.** We visualize the computational graphs of the sheaves discovered for the DNA ia and Docstring tasks. The same conclusion still holds for these two tasks given the visual similarity of the connectivity patterns across all sheaf visualizations.

Original abstract

In this paper, we present empirical and theoretical evidence against a central but largely implicit assumption in circuit and sheaf discovery (CSD), which we term the Functional Anisotropy Hypothesis: the idea that functions in large language models (LLMs) are localised to a unique or near-unique internal mechanism. We show that a single LLM task can instead be supported by multiple, structurally distinct circuits or sheaves that are simultaneously faithful, sparse, and complete. To systematically uncover such competing mechanisms, we introduce Overlap-Aware Sheaf Repulsion, a method that augments the CSD objective with an explicit penalty on structural overlap across multiple discovery runs, enabling the discovery of circuits or sheaves with strong task performance but minimal shared structure across a plethora of common CSD benchmarks. We find that this phenomenon becomes increasingly pronounced as the number of discovered sheaves grows and persists robustly across major CSD methods. We further identify an ultra-sparse three-edge sheaf and show that none of its edges is individually indispensable, undermining even weakened notions of canonical or essential components. To explain these findings, we propose a Distributive Dense Circuit Hypothesis and provide a theoretical analysis demonstrating that non-unique, low-overlap circuit explanations arise naturally from high-dimensional superposition under mild assumptions. Together, our results suggest that mechanistic explanations in LLMs are inherently non-canonical and call for a rethinking of how CSD results should be interpreted and evaluated.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper argues that LLM tasks often have multiple distinct faithful circuits rather than one canonical one, using a new repulsion penalty to find them and some theory to explain why.

The central claim is that circuit and sheaf discovery does not have to land on a single mechanism; the same task can be handled by several structurally different ones that all stay faithful, sparse, and complete. They add an Overlap-Aware Sheaf Repulsion term to the usual CSD objective so that repeated runs are pushed to avoid sharing edges, and they report that this works across several standard benchmarks and existing discovery methods. They also highlight one ultra-sparse three-edge sheaf in which no individual edge is required, which undercuts even soft versions of the idea that some components are essential. The Distributive Dense Circuit Hypothesis is offered as an explanation, with a sketch showing that high-dimensional superposition can produce low-overlap solutions under mild assumptions.

That combination of a concrete penalty, empirical checks on multiple methods, and a distributional story is the actual new piece. The empirical pattern is worth seeing because it appears consistently as the number of discovered sheaves increases.

The main soft spot is whether the repulsion term is doing more than just manufacturing diversity. If the penalized circuits lose task performance or completeness compared with the unpenalized baseline, then the multiplicity is partly an optimization artifact rather than evidence against functional anisotropy. The paper needs to show the performance numbers side-by-side and test whether the same spread appears without the penalty. The theoretical section is light on derivations, so it is hard to judge how mild the assumptions really are.

This is aimed at people already running circuit discovery experiments. It is solid enough to go to referees; the question it raises is real even if the current evidence is only suggestive.

Referee Report

3 major / 2 minor

Summary. The manuscript presents empirical and theoretical evidence challenging the Functional Anisotropy Hypothesis in circuit and sheaf discovery (CSD) for LLMs. It claims that a single task can be supported by multiple structurally distinct circuits or sheaves that remain simultaneously faithful, sparse, and complete. To uncover these, the authors introduce Overlap-Aware Sheaf Repulsion, an augmentation to the CSD objective that penalizes structural overlap across discovery runs. They report an ultra-sparse three-edge sheaf with no individually indispensable edges and propose the Distributive Dense Circuit Hypothesis, supported by a theoretical analysis showing that non-unique low-overlap explanations arise naturally from high-dimensional superposition under mild assumptions.

Significance. If the central claims hold, the work would meaningfully shift mechanistic interpretability by showing that CSD outputs are inherently non-canonical. This would require the field to move beyond seeking unique mechanisms and toward systematic evaluation of multiplicity, with direct consequences for how faithfulness, sparsity, and completeness are assessed in practice. The Overlap-Aware Sheaf Repulsion method and the Distributive Dense Circuit Hypothesis constitute concrete contributions if the empirical performance is shown to be preserved.

major comments (3)

[Empirical results] Empirical results (CSD benchmarks section): the claim that the penalized circuits remain 'simultaneously faithful, sparse, and complete' is load-bearing. Direct quantitative comparisons of task performance (accuracy or loss) between standard CSD runs and Overlap-Aware Sheaf Repulsion runs are required; without them it is impossible to rule out that the observed structural distinctness trades off faithfulness or completeness.
[Theoretical analysis] Theoretical analysis (Distributive Dense Circuit Hypothesis section): the mild assumptions under which non-unique low-overlap circuits arise must be stated explicitly and shown to hold independently of the penalized objective. If the analysis is validated only inside the repulsion regime, the argument risks circularity with the empirical observations.
[Ultra-sparse sheaf results] Ultra-sparse three-edge sheaf (results subsection): the claim that none of its edges is individually indispensable is central to undermining even weakened notions of canonical components. The manuscript must report the precise task, the exact faithfulness and completeness metrics, and the ablation protocol used to establish this property.

minor comments (2)

[Preliminaries] The terms 'faithful', 'sparse', and 'complete' are invoked repeatedly but lack a single, formal definition early in the paper; a dedicated preliminaries subsection would improve clarity.
[Figures] Figure captions for the discovered sheaves should include the exact numerical values of the overlap penalty weight and the resulting task performance to allow direct assessment of the trade-off.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive report. We agree that strengthening the empirical comparisons, clarifying the theoretical assumptions, and expanding the ultra-sparse sheaf details will improve the manuscript. We address each major comment below and will incorporate the requested changes in the revised version.

Referee: [Empirical results] Empirical results (CSD benchmarks section): the claim that the penalized circuits remain 'simultaneously faithful, sparse, and complete' is load-bearing. Direct quantitative comparisons of task performance (accuracy or loss) between standard CSD runs and Overlap-Aware Sheaf Repulsion runs are required; without them it is impossible to rule out that the observed structural distinctness trades off faithfulness or completeness.

Authors: We agree that side-by-side quantitative comparisons are necessary to confirm no performance trade-off. The original manuscript states that the repulsion-augmented runs preserve strong task performance, but we will add explicit tables in the CSD benchmarks section reporting accuracy and loss for both standard CSD and Overlap-Aware Sheaf Repulsion across all evaluated tasks, demonstrating that faithfulness and completeness metrics remain comparable. revision: yes
Referee: [Theoretical analysis] Theoretical analysis (Distributive Dense Circuit Hypothesis section): the mild assumptions under which non-unique low-overlap circuits arise must be stated explicitly and shown to hold independently of the penalized objective. If the analysis is validated only inside the repulsion regime, the argument risks circularity with the empirical observations.

Authors: The Distributive Dense Circuit Hypothesis analysis relies on properties of high-dimensional superposition (random subspace projections with bounded interference) that are independent of the discovery objective. In the revision we will explicitly enumerate these assumptions in the theoretical section and include a short proof sketch demonstrating that non-unique low-overlap explanations arise under the same conditions even without the repulsion term, thereby avoiding circularity. revision: yes
Referee: [Ultra-sparse sheaf results] Ultra-sparse three-edge sheaf (results subsection): the claim that none of its edges is individually indispensable is central to undermining even weakened notions of canonical components. The manuscript must report the precise task, the exact faithfulness and completeness metrics, and the ablation protocol used to establish this property.

Authors: We will expand the relevant results subsection to specify the exact task (the primary benchmark on which the three-edge sheaf was discovered), report the numerical faithfulness and completeness scores, and describe the ablation protocol in full: each of the three edges is removed in turn while measuring the resulting drop in task performance to confirm that no single edge is indispensable. revision: yes
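
The leave-one-out protocol described in this response can be sketched as follows. This is an illustration only: `score` stands in for whatever task metric the authors use, removal is treated abstractly rather than as the zero ablation named in the Figure 5 caption, and `toy_score` is a made-up redundant task rather than IOI.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List


def leave_one_out_scores(edges: Iterable[str],
                         score: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Task score of the circuit with each edge removed in turn."""
    edge_set = frozenset(edges)
    return {edge: score(edge_set - {edge}) for edge in edge_set}


def indispensable_edges(edges: Iterable[str],
                        score: Callable[[FrozenSet[str]], float],
                        min_score: float) -> List[str]:
    """Edges whose individual removal drops the score below min_score."""
    scores = leave_one_out_scores(edges, score)
    return sorted(edge for edge, value in scores.items() if value < min_score)


def toy_score(kept: FrozenSet[str]) -> float:
    """Hypothetical redundant task: succeeds when any two of three edges remain."""
    return 1.0 if len(kept) >= 2 else 0.0


if __name__ == "__main__":
    # Under the toy task, no single edge is indispensable.
    print(indispensable_edges(["e1", "e2", "e3"], toy_score, min_score=0.9))
```

An empty result for a real three-edge sheaf would support the claim that no edge is individually indispensable. The choice of `min_score` is itself a free parameter that the manuscript would need to report.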

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper introduces Overlap-Aware Sheaf Repulsion as a new augmentation to the CSD objective and reports empirical results from its application, then proposes the Distributive Dense Circuit Hypothesis with a separate theoretical analysis under mild assumptions on high-dimensional superposition. No load-bearing self-citations, self-definitional reductions, or fitted parameters renamed as predictions appear in the provided text. The empirical discovery of multiple low-overlap circuits follows directly from the stated purpose of the new penalty term, while the theoretical component is presented as independent justification rather than a restatement of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Since only the abstract is available, specific free parameters, axioms, and invented entities cannot be fully audited. The paper introduces the Overlap-Aware Sheaf Repulsion method and the Distributive Dense Circuit Hypothesis as key elements.

invented entities (1)

Distributive Dense Circuit Hypothesis (no independent evidence)
purpose: To explain non-unique, low-overlap circuit explanations arising from high-dimensional superposition
Proposed in the paper to account for the empirical findings of multiple faithful circuits.
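
As background for the superposition argument, the toy sketch below illustrates one well-known fact: random unit vectors in high dimensions are nearly orthogonal, so many directions can coexist with little mutual interference. It is not the paper's analysis; the function names, the Gaussian sampling, and the chosen dimensions are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations
from typing import List


def random_unit_vector(dim: int, rng: random.Random) -> List[float]:
    """Sample a direction uniformly on the unit sphere via normalized Gaussians."""
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_abs_interference(dim: int, count: int, seed: int = 0) -> float:
    """Mean absolute cosine similarity over all pairs of random directions."""
    rng = random.Random(seed)
    vectors = [random_unit_vector(dim, rng) for _ in range(count)]
    overlaps = [abs(sum(a * b for a, b in zip(u, v)))
                for u, v in combinations(vectors, 2)]
    return sum(overlaps) / len(overlaps)


if __name__ == "__main__":
    for dim in (8, 64, 1024):
        print(dim, round(mean_abs_interference(dim, count=20), 4))
```

The mean interference shrinks roughly like one over the square root of the dimension. That is the kind of regime in which distinct low-overlap subsets of components could plausibly carry the same function, though whether the paper's "mild assumptions" amount to this is not stated on this page.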
