Recognition: 2 Lean theorem links
Causal Explanations from the Geometric Properties of ReLU Neural Networks
Pith reviewed 2026-05-12 05:06 UTC · model grok-4.3
The pith
ReLU neural networks divide input space into polytopal regions that directly yield accurate causal explanations for decisions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A ReLU network computes a piecewise linear function over an input space divided into regions, each an n-dimensional convex polytope. This geometric representation can be used to generate causal explanations for the network's behaviour by extracting rules directly from the geometry, so the explanations are an accurate reflection of the network's behaviour.
What carries the argument
The partitioning of the input space into convex polytopal regions, inside each of which every output neuron applies a fixed linear function.
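Concretely, fixing the on/off pattern of the hidden ReLUs fixes both the polytope and the affine map that is exactly the network on it. A minimal sketch, assuming a toy two-layer network with arbitrary weights (nothing here comes from the paper itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network; the weights are made up for illustration.
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)   # hidden layer
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)   # output layer

def activation_pattern(x):
    """Bit vector recording which hidden ReLUs are active at x.
    The polytope containing x is the set of points sharing this pattern:
    an intersection of half-spaces, one per hidden neuron's hyperplane."""
    return W1 @ x + b1 > 0

def region_affine_map(pattern):
    """Inside one polytopal region the network is exactly x -> A x + c."""
    D = np.diag(pattern.astype(float))  # zeroes out inactive units
    A = W2 @ D @ W1
    c = W2 @ D @ b1 + b2
    return A, c

x = np.array([0.3, -1.2])
A, c = region_affine_map(activation_pattern(x))
forward = W2 @ np.maximum(W1 @ x + b1, 0.0) + b2
assert np.allclose(A @ x + c, forward)  # the region rule is the network here
```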
Load-bearing premise
The polytopal regions and their associated linear functions inside a ReLU network correspond to causally meaningful factors that can be extracted as human-interpretable explanations without additional assumptions about the task or data.
What would settle it
An input point for which the causal rule derived from its containing polytope and bounding hyperplanes produces a different output value or decision than the actual forward pass of the network.
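Such a test is mechanical to run: derive the region rule at each sampled point and compare it to the forward pass. A hedged sketch on the same kind of toy network (hypothetical weights, not the paper's experiments); exact piecewise linearity predicts the assertion never fires:

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)

def forward(x):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

def region_rule(x):
    """Affine rule read off the polytope containing x."""
    D = np.diag((W1 @ x + b1 > 0).astype(float))
    return (W2 @ D @ W1) @ x + W2 @ D @ b1 + b2

# A single mismatch would be the settling counterexample.
for _ in range(10_000):
    x = rng.normal(size=3)
    assert np.allclose(region_rule(x), forward(x))
print("no counterexample in 10,000 random inputs")
```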
read the original abstract
Neural networks have proved an effective means of learning control policies for autonomous systems, but these learned policies are difficult to understand due to the black-box nature of neural networks. This lack of interpretability makes safety assurance for such autonomous systems challenging. The fields of eXplainable Artificial Intelligence (XAI) and eXplainable Reinforcement Learning (XRL) aim to interpret the decision making processes of neural networks and autonomous agents, respectively. In particular, work on causal explanations aims to provide "why" and "why not" explanations for why a model made a given decision. However, most of the work on explainability to date utilises a distilled version of the original model. While this distilled policy is interpretable, it necessarily degrades in performance significantly when compared to the original model, and is not guaranteed to be an accurate reflection of the decision making processes in the original model and as such cannot be used to guarantee its safety. Recent work on understanding the geometry of ReLU neural networks shows that a ReLU network corresponds to a piecewise linear function divided into regions defined by an n-dimensional convex polytope. Through this lens, a neural network can be understood as dividing the input space into distinct regions which apply a single linear function for each output neuron. We show that this geometric representation can be used to generate causal explanations for the network's behaviour similar to previous work, but which extracts rules directly from the geometry of Neural Networks with the ReLU activation function, and is therefore an accurate reflection of the network's behaviour.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that the geometric decomposition of ReLU networks into polytopal regions of input space, each governed by a single linear map, can be directly mined to produce causal 'why' and 'why not' explanations that are faithful to the original network, avoiding the fidelity loss of distilled surrogate models for safety-critical autonomous control policies.
Significance. If the geometric-to-causal mapping were made explicit and validated, the approach would usefully extend existing geometric analyses of ReLU networks to the causal-explanation setting in XAI/XRL, preserving exact piecewise-linear behavior. The manuscript correctly highlights the limitations of distillation-based methods and grounds its proposal in the established polytope characterization of ReLU activations.
major comments (2)
- [Abstract] The assertion that the geometric representation 'can be used to generate causal explanations' and 'extracts rules directly from the geometry' is presented without any derivation, algorithm, pseudocode, or worked example showing how polytopal boundaries or per-region linear coefficients are mapped onto causal interventions, counterfactuals, or a structural causal model.
- [Abstract, stated weakest assumption] The claim that the hyperplane boundaries and linear functions inside each polytope 'correspond to causally meaningful factors' is asserted but not justified; the boundaries are determined solely by learned weights and pre-activation thresholds, supplying an exact functional partition rather than an explicit causal graph or do-operator semantics.
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive report. The comments correctly identify that the abstract and core claim would benefit from greater explicitness in mapping geometry to causal semantics. We will revise the manuscript to include the requested derivations, algorithm, and worked example while preserving the central contribution that the exact polytopal decomposition yields faithful explanations without surrogate fidelity loss.
read point-by-point responses
- Referee: [Abstract] The assertion that the geometric representation 'can be used to generate causal explanations' and 'extracts rules directly from the geometry' is presented without any derivation, algorithm, pseudocode, or worked example showing how polytopal boundaries or per-region linear coefficients are mapped onto causal interventions, counterfactuals, or a structural causal model.
Authors: We accept the observation. The current manuscript establishes the geometric equivalence and contrasts it with distillation methods but does not yet supply the explicit extraction procedure. In revision we will insert a dedicated subsection containing (i) a formal mapping from per-region affine coefficients to local causal effects, (ii) pseudocode for enumerating 'why' attributions via feature weights and 'why-not' counterfactuals via adjacent-polytope boundary crossings, and (iii) a fully worked numerical example on a two-layer ReLU policy network that demonstrates the generated explanations match the original network's piecewise-linear behavior exactly. revision: yes
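For orientation, a minimal sketch of what such an extraction procedure could look like, assuming a toy action-valued network; this is our reading of the promised pseudocode, not the authors' implementation, and the L2 boundary-distance heuristic for 'why not' is an added assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(6, 4)), rng.normal(size=6)   # hypothetical policy net
W2, b2 = rng.normal(size=(3, 6)), rng.normal(size=3)   # three discrete actions

def explain(x):
    pattern = W1 @ x + b1 > 0
    D = np.diag(pattern.astype(float))
    A, c = W2 @ D @ W1, W2 @ D @ b1 + b2      # region-local affine rule
    action = int(np.argmax(A @ x + c))

    # 'Why': within this polytope the decision is linear, so the chosen
    # action's row of A gives feature-wise contributions.
    why = A[action] * x

    # 'Why not': distance to each bounding hyperplane w_i . x + b_i = 0;
    # crossing the nearest one enters an adjacent polytope where a
    # different affine rule, and possibly a different action, applies.
    dists = np.abs(W1 @ x + b1) / np.linalg.norm(W1, axis=1)
    nearest = int(np.argmin(dists))
    return action, why, nearest, dists[nearest]

action, why, boundary, dist = explain(rng.normal(size=4))
print(f"action={action}, nearest boundary: neuron {boundary} at distance {dist:.3f}")
```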
- Referee: [Abstract, stated weakest assumption] The claim that the hyperplane boundaries and linear functions inside each polytope 'correspond to causally meaningful factors' is asserted but not justified; the boundaries are determined solely by learned weights and pre-activation thresholds, supplying an exact functional partition rather than an explicit causal graph or do-operator semantics.
Authors: We agree that the partitions are induced by the network's learned parameters and therefore constitute a functional rather than an exogenous causal graph. In the revision we will explicitly qualify the scope of our causal claims: the hyperplanes delineate changes in the network's internal activation pattern, which, within the model's own computation, function as intervention points. Crossing a boundary corresponds to a do-intervention on the relevant pre-activation that alters downstream linear maps. We will add a short discussion distinguishing this model-internal notion of causality from full structural causal model discovery and will cite the relevant literature on causal abstraction in neural networks to ground the terminology. revision: partial
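As a concrete gloss on this model-internal reading (our illustration, not the paper's construction): a do-intervention clamps one ReLU gate to the opposite state while holding the input fixed, which is exactly what changes when that neuron's hyperplane is crossed into the adjacent polytope:

```python
import numpy as np

rng = np.random.default_rng(3)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(1, 5)), rng.normal(size=1)

def output_under_gates(x, gates):
    """Forward pass with the ReLU on/off pattern forced to `gates`."""
    D = np.diag(gates.astype(float))
    return W2 @ D @ (W1 @ x + b1) + b2

x = rng.normal(size=3)
natural = W1 @ x + b1 > 0            # observed activation pattern
y = output_under_gates(x, natural)   # equals the ordinary forward pass

# do(gate_2 := flipped): force neuron 2's gate to its opposite state,
# isolating the downstream effect attributed to crossing its hyperplane.
intervened = natural.copy()
intervened[2] = ~intervened[2]
y_do = output_under_gates(x, intervened)
print("effect of the intervention on the output:", y_do - y)
```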
Circularity Check
No circularity: geometric representation cited as external input; causal extraction presented as new application without self-referential reduction
full rationale
The paper attributes the polytopal decomposition and piecewise-linear structure of ReLU networks to 'recent work on understanding the geometry of ReLU neural networks' without re-deriving or fitting those properties inside the present manuscript. The central move—extracting rules directly from the input-space regions and their associated linear maps—is described as a novel way to produce explanations that remain faithful to the original network, but this step does not define any quantity in terms of itself, rename a fitted parameter as a prediction, or rest on a load-bearing self-citation whose content is unverified. No equations appear in the provided text that would create a self-definitional loop, and the distinction between functional fidelity and causal semantics is an interpretive claim rather than a circular derivation. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A ReLU network corresponds to a piecewise linear function divided into regions defined by an n-dimensional convex polytope.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear; the relation between the paper passage and the cited Recognition theorem is ambiguous)
  Paper passage: "a ReLU network corresponds to a piecewise linear function divided into regions defined by an n-dimensional convex polytope... each neuron divides the input space by a hyperplane"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, absolute_floor_iff_bare_distinguishability (tag: unclear; the relation between the paper passage and the cited Recognition theorem is ambiguous)
  Paper passage: "adjacent polytopes can be identified by flipping any of the bits in the bit vector... Hamming distance" (a minimal sketch of this bit-flip adjacency follows below)
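For the second passage, a minimal illustration of the bit-flip adjacency it describes (illustrative only; whether it matches the cited Lean theorem is precisely what the unclear tag flags):

```python
import numpy as np

def hamming1_neighbours(pattern):
    """Candidate adjacent polytopes: flip one bit of the activation vector.
    Not every flip yields a non-empty region, so candidates still need a
    feasibility check against the region's half-space constraints."""
    neighbours = []
    for i in range(len(pattern)):
        q = pattern.copy()
        q[i] = ~q[i]
        neighbours.append(q)
    return neighbours

for q in hamming1_neighbours(np.array([True, False, True])):
    print(q.astype(int))  # each is at Hamming distance 1 from [1, 0, 1]
```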
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Y. Liu, C. Cole, C. Peterson, and M. Kirby, 'ReLU Neural Networks, Polyhedral Decompositions, and Persistent Homology', 2023
- [2] G. Katz, C. Barrett, D. Dill, K. Julian, and M. Kochenderfer, 'Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks', May 19, 2017, arXiv:1702.01135. doi: 10.48550/arXiv.1702.01135
- [3] P. Pukowski, J. Spoerhase, and H. Lu, 'SkelEx and BoundEx - Geometrical Framework for Interpretable ReLU Neural Networks', in 2024 International Joint Conference on Neural Networks (IJCNN), Jun. 2024, pp. 1–8. doi: 10.1109/IJCNN60899.2024.10650882
- [5] J. A. Vincent and M. Schwager, 'Reachable Polyhedral Marching (RPM): An Exact Analysis Tool for Deep-Learned Control Systems', 2022. doi: 10.48550/arXiv.2210.08339
- [6] X. Yang, T. T. Johnson, H.-D. Tran, T. Yamaguchi, B. Hoxha, and D. Prokhorov, 'Reachability analysis of deep ReLU neural networks using facet-vertex incidence', in Proceedings of the 24th International Conference on Hybrid Systems: Computation and Control, Nashville, Tennessee: ACM, May 2021, pp. 1–7. doi: 10.1145/3447928.3456650
- [7] S. Xu, J. Vaughan, J. Chen, A. Zhang, and A. Sudjianto, 'Traversing the Local Polytopes of ReLU Neural Networks: A Unified Approach for Network Verification', arXiv, Nov. 2021
- [8] P. Madumal, T. Miller, L. Sonenberg, and F. Vetere, 'Explainable Reinforcement Learning Through a Causal Lens', Nov. 20, 2019, arXiv:1905.10958
- [9] E. Puiutta and E. M. Veith, 'Explainable Reinforcement Learning: A Survey', May 13, 2020, arXiv:2005.06247
- [10] T. Chakraborti, S. Sreedharan, Y. Zhang, and S. Kambhampati, 'Plan Explanations as Model Reconciliation: Moving Beyond Explanation as Soliloquy', in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, Melbourne, Australia: International Joint Conferences on Artificial Intelligence Organization, Aug. 2017, pp. 156–...
- [11]
- [12]
- [13] International Maritime Organisation, 'Development of a Goal-Based Instrument for Maritime Autonomous Surface Ships (MASS)', MSC 108/4, 13 February 2024
- [14] M. J. Villani et al., 'PICE: Polyhedral Complex Informed Counterfactual Explanations', Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, vol. 7, no. 1, Art. no. 1, Oct. 2024. doi: 10.1609/aies.v7i1.31742
- [15] B. Hanin and D. Rolnick, 'Deep ReLU Networks Have Surprisingly Few Activation Patterns', Oct. 20, 2019, arXiv:1906.00904. doi: 10.48550/arXiv.1906.00904
- [16] R. P. Stanley, 'An Introduction to Hyperplane Arrangements'
- [17] R. Balestriero and Y. LeCun, 'Fast and Exact Enumeration of Deep Networks Partitions Regions', Jan. 20, 2024, arXiv:2401.11188. doi: 10.48550/arXiv.2401.11188
- [18] G. Singh, T. Gehr, M. Püschel, and M. Vechev, 'An abstract domain for certifying neural networks', Proceedings of the ACM on Programming Languages, vol. 3, no. POPL, pp. 1–30, Jan. 2019. doi: 10.1145/3290354