CUBE: Contrastive Understanding by Balanced Experiments

Dongseok Kim; Gisung Oh; Hyoungsun Choi; Mohamed Jismy Aashik Rasool

arxiv: 2509.10825 · v5 · pith:ECTCUHKRnew · submitted 2025-09-13 · 💻 cs.LG · cs.AI · stat.ML

Pith reviewed 2026-05-21 22:43 UTC · model grok-4.3

keywords: post-hoc explanations · factorial design · black-box models · model interpretability · contrastive analysis · tabular data · query-efficient explanations

The pith

CUBE explains black-box models by summarizing responses to balanced low-high probes as factorial effects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CUBE as a method that brings factorial experimental design to the analysis of trained predictors. It queries the model on balanced combinations of low and high feature values and decomposes the outputs into main effects and pairwise interactions. These are read as controlled contrasts within a chosen region of the input space. A sympathetic reader would care because the approach offers a structured way to trace how features drive predictions, while also exposing when fewer queries still allow reliable recovery of those drivers.

Core claim

CUBE evaluates a trained predictor on balanced low-high probe combinations and summarizes the responses as factorial effects. Main effects and pairwise interactions are interpreted as controlled contrasts on a specified explanation region. Complete factorial probes identify these effects exactly on the selected design space, while fractional probes reduce query cost and expose aliasing and resolution constraints.
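
The core claim can be made concrete with a minimal sketch, assuming the textbook two-level convention in which an effect is the mean response where a contrast column is +1 minus the mean response where it is -1. The names `full_factorial`, `decode`, `factorial_effects`, and `toy_model` are hypothetical illustrations, not the paper's API, and the paper may scale its effects differently.

```python
from itertools import product, combinations

def full_factorial(k):
    """All 2**k balanced sign vectors with entries in {-1, +1}."""
    return [tuple(signs) for signs in product((-1, 1), repeat=k)]

def decode(signs, low, high):
    """Map coded levels (-1 -> low, +1 -> high) to an actual probe point."""
    return [hi if s > 0 else lo for s, lo, hi in zip(signs, low, high)]

def factorial_effects(predict, low, high):
    """Main effects and pairwise interactions from a complete 2**k design."""
    k = len(low)
    design = full_factorial(k)
    y = [predict(decode(s, low, high)) for s in design]
    half = len(design) / 2

    def contrast(column):
        # Mean response at +1 minus mean response at -1 for this column.
        return sum(c * yi for c, yi in zip(column, y)) / half

    main = {j: contrast([s[j] for s in design]) for j in range(k)}
    pair = {(j, l): contrast([s[j] * s[l] for s in design])
            for j, l in combinations(range(k), 2)}
    return main, pair

# Hypothetical stand-in for a trained predictor (an assumption, not from the paper).
def toy_model(x):
    return 1.0 + 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[0] * x[2]

main, pair = factorial_effects(toy_model, low=[0.0, 0.0, 0.0], high=[1.0, 1.0, 1.0])
# main -> {0: 2.25, 1: -1.0, 2: 0.25}; pair[(0, 2)] -> 0.25, other pairs -> 0.0
```

Because every factor combination is queried, each contrast column is orthogonal to the others, which is why the effects are identified exactly on this design space and nowhere else.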

What carries the argument

Factorial experimental design that generates balanced low-high probe combinations and extracts main effects plus pairwise interactions as contrasts on the explanation region.

If this is right

Experiments on synthetic and real tabular tasks recover the dominant learned effect structure.
The method clarifies the identifiability limits of query-efficient explanations.
Fractional probes lower the number of model queries while revealing aliasing between effects.
Complete designs give exact identification of effects on the chosen design space.
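
The aliasing point in the list above can be illustrated with a small sketch, assuming a standard half-fraction of a three-factor design with generator C = A·B (defining relation I = ABC, resolution III). The function names and the toy predictor are hypothetical and work directly in coded ±1 units.

```python
from itertools import product

def half_fraction():
    """2**(3-1) design: columns A and B are free, C is generated as A*B."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

def contrast(column, y):
    """Mean response at +1 minus mean response at -1 for one contrast column."""
    return sum(c * yi for c, yi in zip(column, y)) / (len(y) / 2)

# Hypothetical predictor in coded units: a real A*B interaction, no C effect at all.
def toy_model(s):
    a, b, c = s
    return 3.0 + 1.0 * a + 0.75 * a * b

design = half_fraction()          # 4 queries instead of 8
y = [toy_model(s) for s in design]

effect_c = contrast([s[2] for s in design], y)
effect_ab = contrast([s[0] * s[1] for s in design], y)
# Both evaluate to 1.5: the C column and the A*B column are identical in this
# fraction, so the design cannot tell a C main effect from an A*B interaction.
```

Halving the query count is paid for in identifiability: the apparent main effect of C here is entirely the A·B interaction.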

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Different choices of the explanation region could surface different dominant effects in the same model.
The approach might extend to models trained on non-tabular inputs by adapting how probes are generated.
Comparing effect structures across multiple models on identical probe sets could highlight behavioral differences.

Load-bearing premise

Responses to balanced low-high probe combinations can be summarized as factorial main effects and interactions that accurately represent the model's learned behavior on the chosen explanation region.

What would settle it

Direct comparison on a model with known ground-truth structure, such as a linear model or low-order polynomial, where the main effects and interactions recovered by CUBE fail to match the true coefficients or interaction terms.
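
As a worked instance of such a check, assume a hypothetical predictor with known coefficients and probe levels $\ell_j < h_j$ for each feature. The symbols below are illustrative rather than the paper's notation, and effects follow the textbook convention (mean response at the high level minus mean response at the low level on a complete two-level design).

```latex
% Purely linear ground truth
f(x) = \beta_0 + \sum_{j=1}^{d} \beta_j x_j, \qquad x_j \in \{\ell_j, h_j\}

% Expected main effect of feature j; every pairwise interaction contrast is zero
\widehat{A}_j = \bar{y}_{x_j = h_j} - \bar{y}_{x_j = \ell_j} = \beta_j \,(h_j - \ell_j)

% Adding one product term \gamma\, x_j x_k changes the expectations to
\widehat{AB}_{jk} = \tfrac{1}{2}\,\gamma\,(h_j - \ell_j)(h_k - \ell_k), \qquad
\widehat{A}_j = \Big(\beta_j + \gamma\,\tfrac{\ell_k + h_k}{2}\Big)(h_j - \ell_j)
```

Recovered effects that depart from these values on such a model would count against the method; note that the main effect absorbs part of the product term whenever the midpoint of the partner feature is nonzero.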

Figures

Figures reproduced from arXiv: 2509.10825 by Dongseok Kim, Gisung Oh, Hyoungsun Choi, Mohamed Jismy Aashik Rasool.

**Figure 1.** Grouped bar plots of mean interaction metrics across datasets. Each panel shows one metric (NDCG@K, Peak-IoU@q, Xfer-NDCG@K, CCC, IG@K,B with K = 5, q = 0.10, B = 3); within each dataset, bars compare ORACLE to SHAP-family interaction methods. Higher values indicate better interaction detection, localization, transfer, scale agreement, or intervention utility.

**Figure 2.** Classical ORACLE main-effect plots for the Airfoil dataset (Backbone A). Each panel shows the marginal response $\mu + m_j(x_j)$ as a function of the bin centers for feature $j$, with all other features integrated out under the empirical input distribution.

**Figure 3.** Classical ORACLE interaction plots for the Airfoil dataset (Backbone A). Rows and columns correspond to factors; each upper-triangular panel shows the predicted SPL across bins of the column factor for three representative bins (Bin 0, Bin 2, Bin 4) of the row factor. Non-parallel or crossing lines indicate strong non-additive interactions, while nearly parallel lines correspond to approximately additive b…

Original abstract

Explaining a trained model requires a clear account of how explanatory evidence is generated. We propose CUBE, a post-hoc explanation framework that brings factorial experimental design to black-box model analysis. CUBE evaluates a trained predictor on balanced low-high probe combinations and summarizes the responses as factorial effects. Main effects and pairwise interactions are interpreted as controlled contrasts on a specified explanation region. Complete factorial probes identify these effects exactly on the selected design space, while fractional probes reduce query cost and expose aliasing and resolution constraints. Experiments on synthetic and real tabular tasks show that CUBE recovers dominant learned effect structure and clarifies the identifiability limits of query-efficient explanations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

CUBE adapts factorial designs for post-hoc tabular explanations but the abstract leaves the recovery claims without visible quantitative backing.

Hey, the main point is that this paper takes factorial experimental design and applies it to probe black-box models with balanced low-high combinations, then summarizes the outputs as main effects and interactions for explanations on tabular data. Complete factorials give exact effects on the design space while fractional ones cut queries and flag aliasing. That framing of controlled contrasts and identifiability limits is the clearest new piece. It is not something I have seen packaged this way in the usual explanation literature. The approach also keeps the explanation region explicit, which is a small but useful discipline.

The experiments are said to recover dominant structure on synthetic and real tasks, and if the full paper shows clean metrics against ground truth or sensible baselines, that would be the part worth checking. The soft spot is exactly the one the stress test flags: effects recovered at those discrete probe points do not automatically tell you what the model has learned everywhere else, especially if higher-order terms or behavior outside the low-high levels matter. The abstract does not give numbers, error bars, or comparison methods, so it is hard to judge how large that gap actually is in practice.

This is aimed at interpretability researchers and auditors who already work with tabular models and want a more structured statistical lens rather than local approximations. A serious referee should see it because the core idea is coherent and the design concepts are imported cleanly, even though the empirical support needs tightening to match the claims.

Referee Report

2 major / 1 minor

Summary. The paper proposes CUBE, a post-hoc explanation framework that imports factorial experimental design to analyze black-box models. It evaluates a trained predictor on balanced low-high probe combinations within a specified explanation region and summarizes responses as main effects and pairwise interactions, interpreted as controlled contrasts. Complete factorial probes are claimed to identify effects exactly on the design space, while fractional probes reduce query cost and expose aliasing; experiments on synthetic and real tabular tasks are stated to recover dominant learned effect structure and clarify identifiability limits of query-efficient explanations.

Significance. If the central empirical claims hold, CUBE would offer a structured, design-theoretic approach to model explanation that could improve control and transparency over perturbation-based methods in XAI, particularly for tabular data. It usefully emphasizes trade-offs between query efficiency and effect resolution.

major comments (2)

Abstract: the claim that 'experiments on synthetic and real tabular tasks show that CUBE recovers dominant learned effect structure' is load-bearing for the paper's contribution, yet the abstract (and available text) provides no metrics, baselines, error analysis, or description of how recovery or dominance was quantified or validated.
Framework and Experiments sections: the assumption that main effects and interactions recovered from balanced low-high probes accurately represent the model's learned behavior is central to the recovery claim, but the manuscript does not address or test whether higher-order interactions, non-additive behavior, or sensitivity outside the chosen probe levels and region would invalidate the low-order summary.

minor comments (1)

Clarify notation for the explanation region, probe levels, and effect definitions (e.g., how main effect A is computed from responses) before presenting equations or results.
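
For orientation, the textbook definitions that such a notation section would presumably pin down are sketched here. This is the standard two-level convention, stated as an assumption; the paper's own scaling (for example, regression-style half-effects) may differ. Here $s_{ij} \in \{-1, +1\}$ is the coded level of factor $j$ on run $i$ and $y_i$ is the model response on that run.

```latex
% Main effect of factor j on a complete 2^k design
\widehat{A}_j = \frac{1}{2^{k-1}} \sum_{i=1}^{2^k} s_{ij}\, y_i

% Pairwise interaction of factors j and l
\widehat{AB}_{jl} = \frac{1}{2^{k-1}} \sum_{i=1}^{2^k} s_{ij}\, s_{il}\, y_i

% Special case k = 2, with subscripts giving the levels of (A, B)
\widehat{A} = \tfrac{1}{2}\big[(y_{+-} + y_{++}) - (y_{--} + y_{-+})\big]
```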

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. These points help clarify the scope of our claims and the presentation of experimental validation. We address each major comment below and indicate the revisions we will incorporate.

Referee: Abstract: the claim that 'experiments on synthetic and real tabular tasks show that CUBE recovers dominant learned effect structure' is load-bearing for the paper's contribution, yet the abstract (and available text) provides no metrics, baselines, error analysis, or description of how recovery or dominance was quantified or validated.

Authors: We agree that the abstract would benefit from greater specificity on this central claim. The Experiments section quantifies recovery on synthetic tasks (where ground-truth effects are known) using effect-size correlations and dominance ranking accuracy, and validates real tabular tasks via consistency with domain knowledge and comparisons to other post-hoc methods. We will revise the abstract to include a concise clause summarizing the evaluation approach and key quantitative findings.

Revision: yes

Referee: Framework and Experiments sections: the assumption that main effects and interactions recovered from balanced low-high probes accurately represent the model's learned behavior is central to the recovery claim, but the manuscript does not address or test whether higher-order interactions, non-additive behavior, or sensitivity outside the chosen probe levels and region would invalidate the low-order summary.

Authors: This observation correctly identifies a boundary of our current claims. CUBE recovers exact main effects and pairwise interactions on the discrete design space defined by the chosen probe levels and region (as guaranteed by complete factorial designs), but does not claim to capture higher-order terms or behavior outside that space. Fractional designs already surface aliasing of higher-order effects. We will add an explicit limitations paragraph to the Framework section stating these scope conditions and include a targeted synthetic experiment in the Experiments section that perturbs probe levels to assess stability of the low-order summary.

Revision: yes
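
A probe-level perturbation of the kind promised here might look like the following sketch. It is an assumption about the shape of such an experiment, not the authors' protocol; `main_effect` and `toy_model` are hypothetical names, and the toy predictor is deliberately nonlinear in its first feature.

```python
from itertools import product

def main_effect(predict, low, high, j):
    """Main effect of feature j on a complete two-level design over [low, high]."""
    k = len(low)
    total, runs = 0.0, 0
    for signs in product((-1, 1), repeat=k):
        x = [high[i] if s > 0 else low[i] for i, s in enumerate(signs)]
        total += signs[j] * predict(x)
        runs += 1
    return total / (runs / 2)

# Hypothetical predictor: quadratic in x[0], linear in x[1].
def toy_model(x):
    return x[0] ** 2 + x[1]

narrow = main_effect(toy_model, [0.0, 0.0], [1.0, 1.0], j=0)  # levels 0 and 1
wide = main_effect(toy_model, [0.0, 0.0], [2.0, 1.0], j=0)    # levels 0 and 2

# Effect per unit of feature range; equal values would indicate a stable,
# approximately linear response across the two probe settings.
slope_narrow = narrow / (1.0 - 0.0)
slope_wide = wide / (2.0 - 0.0)
```

Here the per-unit effect doubles when the high level moves from 1 to 2, which is the kind of instability a two-level summary cannot reveal from a single choice of probe levels.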

Circularity Check

0 steps flagged

CUBE applies external factorial design concepts without internal reduction or self-referential derivation

The paper presents CUBE as a post-hoc framework that imports standard factorial experimental design to evaluate black-box models on balanced low-high probes and summarize responses as main effects and interactions. These summaries are defined via established statistical contrasts on a chosen design space, with complete factorials identifying effects exactly on that space and fractional designs exposing aliasing. No equations or claims reduce the recovered effect structure to parameters fitted inside the paper itself; the method is self-contained by relying on external experimental design principles and external benchmarks from synthetic and real tabular experiments. The central claims about identifiability limits and dominant structure recovery do not loop back to self-defined inputs or self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on standard black-box access assumptions common to post-hoc explanation work and imports factorial design concepts from statistics without introducing new fitted parameters or postulated entities in the abstract.

axioms (1)

Domain assumption: The trained predictor can be queried arbitrarily as a deterministic black-box function on chosen input points.
Required for the probing step that generates the factorial responses.
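
This assumption can be checked cheaply before any probing. The sketch below wraps a predictor to count queries and tests whether repeated queries at the same point return the same value. `CountingOracle` and `is_repeatable` are hypothetical helpers, and the lambda stands in for a real trained model.

```python
from itertools import product

class CountingOracle:
    """Wraps a predictor and records how many times it has been queried."""
    def __init__(self, predict):
        self._predict = predict
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self._predict(x)

def is_repeatable(oracle, points, repeats=2):
    """True if every point yields the same output on each repeated query."""
    for x in points:
        first = oracle(x)
        if any(oracle(x) != first for _ in range(repeats - 1)):
            return False
    return True

# Hypothetical deterministic predictor over four features.
oracle = CountingOracle(lambda x: sum(x))
corners = [list(p) for p in product((0.0, 1.0), repeat=4)]  # 2**4 = 16 probe points
ok = is_repeatable(oracle, corners)
# ok is True and oracle.calls is 32 (two queries at each of the 16 corners).
```

A stochastic predictor (for example, one with dropout left active at inference) would fail this check, and its factorial contrasts would then carry sampling noise rather than being exact.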

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: We instead revisit the DoE perspective and treat a trained neural network as a black-box response surface on which a factorial ANOVA surrogate is fitted. Our framework, ORACLE, learns an orthogonal surrogate on a discretized input grid and applies residual centering and µ-rebalancing to recover main-effect and pairwise-interaction tables

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: the oracle ANOVA decomposition is $f(x) = \mu + \sum_{j} m_j(x_j) + \sum_{j<k} g_{jk}(x_j, x_k) + r(x)$ with $g_{jk}$ the pairwise interaction component
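
The decomposition quoted in this passage can be illustrated on a small discretized grid. The sketch below assumes two factors and uniform weights over bins, which is a simplification: the figure captions indicate the paper integrates under the empirical input distribution. The function name and the example table are hypothetical. With only two factors the residual term is zero by construction.

```python
def anova_two_factor(table):
    """Decompose table[a][b] into mu + m1[a] + m2[b] + g12[a][b], uniform weights."""
    na, nb = len(table), len(table[0])
    mu = sum(sum(row) for row in table) / (na * nb)
    # Centered main effects: conditional mean over the other factor, minus mu.
    m1 = [sum(table[a]) / nb - mu for a in range(na)]
    m2 = [sum(table[a][b] for a in range(na)) / na - mu for b in range(nb)]
    # Pairwise interaction: what remains after removing mu and both main effects.
    g12 = [[table[a][b] - mu - m1[a] - m2[b] for b in range(nb)]
           for a in range(na)]
    return mu, m1, m2, g12

# Hypothetical model outputs on a 2 x 2 grid of bin centers.
table = [[1.0, 2.0],
         [3.0, 6.0]]
mu, m1, m2, g12 = anova_two_factor(table)
# mu -> 3.0, m1 -> [-1.5, 1.5], m2 -> [-1.0, 1.0], g12 -> [[0.5, -0.5], [-0.5, 0.5]]
```

Each main-effect vector averages to zero and each row and column of the interaction table sums to zero, which is the centering property that makes the components unique.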

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages

[1]

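Proof of (1). For each $j$, $\mathbb{E}[m_j(X_j)] = \mathbb{E}\big[\mathbb{E}[f(X) \mid X_j] - \mu\big] = \mathbb{E}[f(X)] - \mu = 0$, so $m_j(X_j)$ is centered and depends only on $X_j$, hence $m_j(X_j) \in H_j$

This decomposition is unique in the sense that if $f(X) = \tilde{\mu} + \sum_{j=1}^{d} \tilde{m}_j(X_j) + \sum_{1 \le j < k \le d} \tilde{g}_{jk}(X_j, X_k) + \tilde{r}(X)$ with $\tilde{\mu} \in H_0$, $\tilde{m}_j \in H_j$, $\tilde{g}_{jk} \in H_{jk}$, and $\tilde{r} \in R$, then $\tilde{\mu} = \mu$, $\tilde{m}_j = m_j$, $\tilde{g}_{jk} = g_{jk}$, and $\tilde{r} = r$ almost surely. Proof of (1). For each $j$, $\mathbb{E}[m_j(X_j)] = \mathbb{E}\big[\mathbb{E}[f(X) \mid X_j] - \mu\big] = \mathbb{E}[f(X)] - \mu = 0$, so $m_j(X_j)$ is centered and depends only on $X_j$, hence $m_j(X_j) \in H_j$. For $g_{jk}$, note first that ...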

work page
[2]

There exists an invertible matrix $T \in \mathbb{R}^{p \times p}$ such that the ANOVA design matrix $\Phi$ and the classical factorial design matrix $X$ satisfy $\Phi = XT$

work page
[3]

The population ANOVA surrogate $f^{L}(Z) = \beta^{\star\top} \phi(Z)$ from Proposition A.7 coincides with $f^{\star}(U)$ on the $2^5$ design: $f^{L}(Z) = f^{\star}(U)$ almost surely

work page
[4]

In particular, for each pair $(j, k)$, $g^{L}_{jk}(Z_j, Z_k) = \phi_{jk}(Z)^{\top} \beta^{\star}_{jk} = c_{jk}\, \gamma_{jk}\, U_j U_k$, for some nonzero constant $c_{jk}$ depending only on the chosen contrast scaling

The ANOVA coefficient blocks $\beta^{\star}_0$, $\{\beta^{\star}_j\}$, $\{\beta^{\star}_{jk}\}$ are an invertible linear reparameterization of $(\beta_0, \{\beta_j\}, \{\gamma_{jk}\})$. In particular, for each pair $(j, k)$, $g^{L}_{jk}(Z_j, Z_k) = \phi_{jk}(Z)^{\top} \beta^{\star}_{jk} = c_{jk}\, \gamma_{jk}\, U_j U_k$, for some nonzero constant $c_{jk}$ depending only on the chosen contrast scaling

work page
[5]

correctly specified

The target ANOVA interaction strengths satisfy $S^{\star}_{jk} = \| g_{jk} \|_{L^2(P_{U_j, U_k})} \propto |\gamma_{jk}|$, and the discrete strengths $S^{L}_{jk}$ from Appendix A.5 satisfy $S^{L}_{jk} = \| g^{L}_{jk} \|_{L^2(P_{Z_j, Z_k})} \propto |\gamma_{jk}|$, with the same ordering over pairs $(j, k)$. Proof of 1. Each column of $X$ is a $\{-1, +1\}$-valued contrast on ...

work page 2000
[6]

Ranking rows by Sum Sq or the normalized %Total quickly identifies the main effects and interactions that account for most of the ORACLE importance (here, Frequency, Suction side, and their interactions)

work page
[7]

The %Expl. column rescales contributions relative to the explained variance only (i.e., excluding the residual term), and is therefore useful for understanding how the most important terms share the portion of variation that the two-way ORACLE surrogate is actually able to capture

work page
[8]

The residual %Total provides a simple diagnostic of how faithfully the two-way decomposition approximates the full ORACLE surrogate: when the residual contribution is small, patterns in the main-effect and interaction plots can be read as describing most of the predictor’s behaviour; when the residual is large, it is safer to focus interpretation on the t...

work page 1980
[9]

This provides a quick visual sanity check that the surrogate decomposition is consistent

Compare the vertical range in each panel; larger ranges indicate factors with stronger main effects and should roughly agree with the ordering in the ANOVA table. This provides a quick visual sanity check that the surrogate decomposition is consistent

work page
[10]

lower frequency ⇒ lower SPL

Inspect the shape of each curve: near-linear trends suggest approximately additive, monotone effects that are easy to reason about and to intervene on (e.g., “lower frequency ⇒ lower SPL”), whereas pronounced curvature or non-monotonicity (as for Angle of attack and Suction side) reveals regimes where the effect changes sign or saturates

work page
[11]

Use the bin centers on the x-axis as candidate design points for one-factor-at-a-time interventions: by moving along the curve in Figure 2, practitioners can anticipate how much reduction in SPL is attainable by adjusting a single factor while keeping the others near their empirical distribution. These main-effect plots thus complement the ANOVA-style ta...

work page
[12]

This cross-check ensures that the numerical interaction ranking aligns with visually interpretable patterns

Start from the pairs with large interaction sums of squares in Table 9 and verify that their panels show clearly non-parallel or crossing lines. This cross-check ensures that the numerical interaction ranking aligns with visually interpretable patterns

work page
[13]

design grid

Within a given panel, treat the three lines (Bin 0, 2, 4) as a coarse “design grid”: steep changes in slope or reversals between lines mark regimes where simultaneous adjustment of both factors can yield larger gains than tuning either factor alone

work page
[14]

This helps focus attention on a small number of genuinely interacting pairs rather than over-interpreting small visual deviations

For panels with almost parallel lines, interpret the corresponding pair as approximately additive and fall back to the main-effect plots in Figure 2 when reasoning about interventions. This helps focus attention on a small number of genuinely interacting pairs rather than over-interpreting small visual deviations. Together, Figure 2 and Figure 3 provide a...

work page 1938
