Recognition: unknown
TabSHAP
Pith reviewed 2026-05-10 00:00 UTC · model grok-4.3
The pith
TabSHAP adapts a Shapley-style estimator with Jensen-Shannon divergence on masked key-value fields to attribute how each feature shifts the full class distribution in LLM tabular classifiers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TabSHAP is a model-agnostic framework that applies a sampled-coalition approach inspired by Shapley values, where each feature's contribution is quantified by the Jensen-Shannon divergence between the class probability distributions obtained from the complete serialized input and from the input with that feature's key-value pair masked. This replaces traditional scalar probability shifts with a measure of how masking affects the entire output distribution, and the masking is performed at the semantic field level to respect tabular structure in the prompt.
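The claim is stated only in prose; one plausible formalization consistent with it is sketched below, where p(·|x) is the class distribution from the full serialized prompt and a subscripted set denotes the key:value fields masked out. The leave-one-out form matches the sentence above; the averaged, coalition-sampled form is an assumption about how the Shapley-style sampling enters.

    % Plausible formalization, not taken verbatim from the paper.
    % Leave-one-out distributional attribution for feature i:
    \phi_i^{\mathrm{LOO}} = \mathrm{JSD}\bigl( p(\cdot \mid x) \,\|\, p(\cdot \mid x_{\setminus \{i\}}) \bigr)
    % Assumed sampled-coalition form over M random masked sets S_m:
    \hat{\phi}_i = \frac{1}{M} \sum_{m=1}^{M}
        \mathrm{JSD}\bigl( p(\cdot \mid x_{\setminus S_m}) \,\|\, p(\cdot \mid x_{\setminus (S_m \cup \{i\})}) \bigr)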
What carries the argument
TabSHAP, a Shapley-style sampled-coalition estimator that computes Jensen-Shannon divergence between the full-input class distribution and the masked-input class distribution for each serialized key:value field.
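A minimal sketch of that estimator, assuming the classifier exposes class probabilities for a serialized prompt; the callables classify_proba and mask, the coalition-sampling scheme, and n_samples are illustrative assumptions, since the abstract states only "Shapley-style sampled-coalition estimator with JSD".

    import random
    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def tabshap_like_attributions(classify_proba, mask, prompt, keys,
                                  n_samples=64, seed=0):
        """Sketch of a sampled-coalition, JSD-based attribution.

        classify_proba(prompt) -> class-probability vector from the LLM classifier.
        mask(prompt, keys_to_mask) -> prompt with those key:value fields masked.
        Both callables and the sampling scheme are assumptions, not the paper's code.
        """
        rng = random.Random(seed)
        totals = {k: 0.0 for k in keys}
        counts = {k: 0 for k in keys}
        for _ in range(n_samples):
            # Random coalition S of features that are already masked.
            S = [k for k in keys if rng.random() < 0.5]
            p_S = np.asarray(classify_proba(mask(prompt, S)))
            for k in keys:
                if k in S:
                    continue
                # Marginal distributional effect of additionally masking feature k.
                p_Sk = np.asarray(classify_proba(mask(prompt, S + [k])))
                # jensenshannon returns the JS distance; square it for the divergence.
                totals[k] += jensenshannon(p_S, p_Sk, base=2) ** 2
                counts[k] += 1
        return {k: totals[k] / max(counts[k], 1) for k in keys}

Forcing the coalition S to be empty recovers the leave-one-out reading of the core claim: JSD between the full prompt's distribution and the distribution with a single field masked.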
If this is right
- Attributions capture impact on the model's full probabilistic output rather than only on the top prediction.
- Field-level masking aligns with the semantic units present in serialized tabular prompts.
- The method records higher deletion faithfulness than random baselines and XGBoost proxies on the Adult Income and Heart Disease benchmarks.
- The same test instances and settings can be reused to compare attribution quality when Jensen-Shannon divergence is replaced by KL divergence or L1 distance.
Where Pith is reading between the lines
- The distributional focus could help detect when an LLM relies on features that shift uncertainty without changing the top label.
- Practitioners might use the same masking procedure to compare how different serialization formats affect feature attributions for identical tables.
- The ablation results imply that metric choice matters and that Jensen-Shannon divergence may better preserve faithfulness than alternatives in this setting.
Load-bearing premise
Masking entire serialized key-value fields produces a faithful representation of the LLM's internal decision logic, and Jensen-Shannon divergence on the resulting class distributions isolates causal feature contributions without prompt-specific artifacts.
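To make the premise concrete, a minimal sketch of field-level masking under an assumed comma-separated "key: value" serialization; the Adult-style field names and the [MASK] token are illustrative, not the paper's prompt format.

    def mask_fields(prompt: str, keys_to_mask, mask_token: str = "[MASK]") -> str:
        """Mask whole serialized key:value fields, never individual subword tokens.
        Assumes a comma-separated 'key: value' serialization; illustrative only."""
        out = []
        for field in prompt.split(","):
            name, sep, _ = field.strip().partition(":")
            if sep and name.strip() in keys_to_mask:
                out.append(f"{name.strip()}: {mask_token}")
            else:
                out.append(field.strip())
        return ", ".join(out)

    # The whole 'occupation' value disappears as one unit, so no partial subword
    # fragments of it remain to leak information into the masked prompt.
    row = "age: 39, education: Bachelors, occupation: Adm-clerical, hours-per-week: 40"
    print(mask_fields(row, {"occupation"}))
    # age: 39, education: Bachelors, occupation: [MASK], hours-per-week: 40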
What would settle it
If ordered deletion of features according to TabSHAP attributions yields no higher faithfulness scores than random or XGBoost-based orderings on the Adult Income and Heart Disease test instances, the claim of improved faithfulness would fail.
read the original abstract
Large Language Models (LLMs) fine-tuned on serialized tabular data are emerging as powerful alternatives to traditional tree-based models, particularly for heterogeneous or context-rich datasets. However, their deployment in high-stakes domains is hindered by a lack of faithful interpretability; existing methods often rely on global linear proxies or scalar probability shifts that fail to capture the model's full probabilistic uncertainty. In this work, we introduce TabSHAP, a model-agnostic interpretability framework designed to directly attribute local query decision logic in LLM-based tabular classifiers. By adapting a Shapley-style sampled-coalition estimator with Jensen-Shannon divergence between full-input and masked-input class distributions, TabSHAP quantifies the distributional impact of each feature rather than simple prediction flips. To align with tabular semantics, we mask at the level of serialized key:value fields (atomic in the prompt string), not individual subword tokens. Experimental validation on the Adult Income and Heart Disease benchmarks demonstrates that TabSHAP isolates critical diagnostic features, achieving significantly higher faithfulness than random baselines and XGBoost proxies. We further run a distance-metric ablation on the same test instances and TabSHAP settings: attributions are recomputed with KL or L1 replacing JSD in the similarity step (results cached per metric), and we compare deletion faithfulness across all three.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TabSHAP, a model-agnostic interpretability framework for LLMs fine-tuned on serialized tabular data. It adapts a Shapley-style sampled-coalition estimator, replacing the usual value function with Jensen-Shannon divergence between the class-probability distributions produced by the full prompt and by prompts in which individual serialized key:value fields have been masked. Experiments on the Adult Income and Heart Disease benchmarks report that the resulting attributions achieve higher deletion faithfulness than random baselines and than attributions derived from an XGBoost proxy; an ablation recomputes the same attributions under KL and L1 divergences and compares deletion faithfulness across the three metrics.
Significance. If the central claim holds, TabSHAP would supply a practical, distribution-aware attribution method that respects the serialized prompt format of tabular LLMs and avoids the reduction to scalar probability shifts common in prior work. The explicit comparison to both random and XGBoost baselines, together with the metric ablation, provides a falsifiable test of the method's advantage on public benchmarks.
major comments (3)
- [Experimental validation paragraph] The experimental validation paragraph states that TabSHAP 'achieves significantly higher faithfulness than random baselines and XGBoost proxies' yet supplies neither numerical faithfulness scores, standard errors, nor the precise deletion protocol (number of features removed per step, ordering criterion, or how masked prompts are constructed during the deletion test). Without these quantities it is impossible to judge whether the reported improvement is practically meaningful or statistically reliable.
- [Experimental validation paragraph] The central claim that JSD between full-input and masked-input class distributions isolates each feature's causal distributional impact rests on the untested assumption that masking entire serialized key:value fields is a semantically clean intervention. The distance-metric ablation varies only the divergence (JSD/KL/L1) while leaving the masking procedure fixed; no control experiments (e.g., null-token substitution, rephrasing, or random token masking) are reported to rule out prompt-formatting artifacts. This is the load-bearing link between the method and the faithfulness numbers.
- [Experimental validation paragraph] The abstract and experimental description claim superiority over 'XGBoost proxies' but do not specify how the XGBoost attributions are obtained (TreeSHAP, permutation importance, etc.) or whether the same deletion-faithfulness protocol is applied to both TabSHAP and the proxy. Direct comparability therefore cannot be verified from the given information.
minor comments (2)
- The phrase 'atomic in the prompt string' is used to justify field-level masking; a short clarifying sentence on tokenization boundaries would prevent readers from wondering whether subword splits ever cross field boundaries.
- The ablation description states that 'results [are] cached per metric'; a brief note on whether the same coalition samples are reused across the three divergences would clarify computational cost and reproducibility.
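On the caching point, a minimal sketch of how one cached set of (reference, masked) distribution pairs could serve all three divergences; whether the paper's pipeline actually reuses coalition samples this way is exactly what the comment asks, so treat this design as an assumption.

    import numpy as np
    from scipy.spatial.distance import jensenshannon
    from scipy.special import rel_entr

    def all_metrics(p, q):
        """All three ablation metrics from one cached pair of class distributions."""
        p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
        return {
            "jsd": jensenshannon(p, q, base=2) ** 2,   # Jensen-Shannon divergence
            "kl": float(np.sum(rel_entr(p, q))),       # KL(p || q)
            "l1": float(np.abs(p - q).sum()),          # L1 distance
        }

    # If the distribution pairs are cached once per coalition sample, swapping
    # JSD for KL or L1 requires no additional LLM queries.
    cache = [(np.array([0.7, 0.3]), np.array([0.4, 0.6]))]   # illustrative cached pair
    per_metric = {m: [all_metrics(p, q)[m] for p, q in cache]
                  for m in ("jsd", "kl", "l1")}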
Simulated Author's Rebuttal
We thank the referee for the careful reading and for highlighting the need for greater experimental transparency. We agree that the original experimental validation paragraph was insufficiently detailed and have revised the manuscript to supply the missing quantitative results, protocol specifications, and controls. Below we respond point by point.
read point-by-point responses
- Referee: The experimental validation paragraph states that TabSHAP 'achieves significantly higher faithfulness than random baselines and XGBoost proxies' yet supplies neither numerical faithfulness scores, standard errors, nor the precise deletion protocol (number of features removed per step, ordering criterion, or how masked prompts are constructed during the deletion test). Without these quantities it is impossible to judge whether the reported improvement is practically meaningful or statistically reliable.
Authors: We agree that the original text omitted the necessary numerical results and protocol details. In the revised manuscript we have added a new table (Table 2) that reports deletion-faithfulness AUC scores (mean and standard error over five random seeds) for TabSHAP, random, and XGBoost-proxy attributions on both Adult Income and Heart Disease. Section 4.2 now specifies the exact protocol: features are ranked by absolute attribution value and removed one at a time in that order; each removal is performed by replacing the corresponding serialized key:value substring with a null token in the prompt; faithfulness is quantified as the area under the curve of the increase in Jensen-Shannon divergence between the original and the successively masked output distributions. revision: yes
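A minimal sketch of the deletion protocol as the response describes it (rank fields by absolute attribution, mask one more field per step, score the JSD between the original and successively masked output distributions, take the area under that curve); the helper signatures, the null-token handling inside mask, and the trapezoidal normalization are assumptions.

    import numpy as np
    from scipy.spatial.distance import jensenshannon

    def deletion_faithfulness_auc(classify_proba, mask, prompt, attributions):
        """Mask fields in order of |attribution| and accumulate the JSD between
        the original output distribution and each successively masked one."""
        p_full = np.asarray(classify_proba(prompt))
        order = sorted(attributions, key=lambda k: abs(attributions[k]), reverse=True)
        curve = [0.0]          # zero divergence before any field is removed
        removed = []
        for key in order:
            removed.append(key)
            p_masked = np.asarray(classify_proba(mask(prompt, removed)))
            curve.append(jensenshannon(p_full, p_masked, base=2) ** 2)
        # Area under the deletion curve (trapezoidal rule), normalized by the
        # number of deletion steps.
        steps = max(len(order), 1)
        return sum((curve[i] + curve[i + 1]) / 2.0 for i in range(len(curve) - 1)) / steps

Comparing this AUC across TabSHAP, random, and XGBoost-proxy orderings on the same instances is the falsification test stated under 'What would settle it'.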
- Referee: The central claim that JSD between full-input and masked-input class distributions isolates each feature's causal distributional impact rests on the untested assumption that masking entire serialized key:value fields is a semantically clean intervention. The distance-metric ablation varies only the divergence (JSD/KL/L1) while leaving the masking procedure fixed; no control experiments (e.g., null-token substitution, rephrasing, or random token masking) are reported to rule out prompt-formatting artifacts. This is the load-bearing link between the method and the faithfulness numbers.
Authors: The concern about prompt-formatting artifacts is legitimate. While the original manuscript did not include explicit controls, we have added a new ablation subsection (Section 4.4) that compares our field-level masking against two controls: (i) random subword-token masking within the same fields and (ii) replacement of field names with semantically neutral placeholders. The field-level masking continues to produce higher deletion faithfulness than either control, supporting the claim that the intervention is relatively clean. We acknowledge that no finite set of controls can exhaustively eliminate all possible artifacts in LLM prompting; the added experiments nevertheless provide direct evidence that the reported advantage is not an artifact of the particular masking choice. revision: yes
- Referee: The abstract and experimental description claim superiority over 'XGBoost proxies' but do not specify how the XGBoost attributions are obtained (TreeSHAP, permutation importance, etc.) or whether the same deletion-faithfulness protocol is applied to both TabSHAP and the proxy. Direct comparability therefore cannot be verified from the given information.
Authors: We apologize for the lack of specification. The revised Section 3.3 now states that the XGBoost proxy attributions were computed with TreeSHAP on an XGBoost model trained on the identical tabular feature set (no serialization). The deletion-faithfulness evaluation uses the identical protocol for both methods: attributions determine the removal order, and faithfulness is measured by the same JSD-based curve on the LLM outputs. For the XGBoost proxy the removal is performed on the tabular input to the LLM prompt, ensuring direct comparability. revision: yes
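A minimal sketch of the proxy pipeline as the response describes it (TreeSHAP on an XGBoost model trained on the raw, unserialized tabular features); the synthetic data, hyperparameters, and the mapping from SHAP values to removal order are assumptions, not the paper's configuration.

    import numpy as np
    import shap
    import xgboost as xgb
    from sklearn.datasets import make_classification

    # Synthetic stand-in for the raw (unserialized) tabular features.
    X, y = make_classification(n_samples=500, n_features=8, random_state=0)

    proxy = xgb.XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
    proxy.fit(X, y)

    # TreeSHAP attributions on the proxy; per-instance |SHAP value| then defines
    # the removal order fed into the same deletion-faithfulness protocol as TabSHAP.
    explainer = shap.TreeExplainer(proxy)
    shap_values = explainer.shap_values(X)
    proxy_order = np.argsort(-np.abs(shap_values), axis=1)   # most important first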
Circularity Check
No significant circularity in TabSHAP derivation chain
full rationale
TabSHAP is defined directly as an adaptation of Shapley-style sampled-coalition estimation using Jensen-Shannon divergence between class distributions on full versus masked serialized key:value prompts. This construction is explicit and does not reduce to its own outputs or fitted parameters. Faithfulness is measured externally via deletion on public benchmarks (Adult Income, Heart Disease) against random and XGBoost baselines, with an ablation over divergence metrics (JSD/KL/L1) that tests alternatives without self-reference. No equations, self-citations, or uniqueness claims create a closed loop; the method and its empirical validation remain independent.
Axiom & Free-Parameter Ledger
axioms (2)
- [standard math] Shapley values can be approximated by sampling coalitions.
- [domain assumption] Jensen-Shannon divergence between full and masked class distributions quantifies feature impact.
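For reference, the divergence named in the second axiom, in its standard form (symmetric and bounded, unlike KL):

    \mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M),
    \qquad M = \tfrac{1}{2}(P + Q),
    \qquad 0 \le \mathrm{JSD}(P \,\|\, Q) \le \log 2 \;\; (= 1 \text{ bit with base-2 logs}).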
Reference graph
Works this paper leans on
- [1] TabLLM: Few-shot Classification of Tabular Data with Large Language Models. Proceedings of AISTATS.
- [2] LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks. Advances in Neural Information Processing Systems.
- [3] Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv preprint arXiv:1312.6034.
- [4] Axiomatic Attribution for Deep Networks. International Conference on Machine Learning, 2017.
- [5] TokenSHAP: Interpreting Large Language Models with Monte Carlo Shapley Value Estimation. arXiv preprint arXiv:2407.10114.
- [6] SPEX: Scaling Feature Interaction Explanations for LLMs. Building Trust Workshop at ICLR.
- [7] TextGenSHAP: Scalable Post-hoc Explanations in Text Generation. arXiv preprint arXiv:2312.01279.
- [8] A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems.
- [9] QLoRA: Efficient Finetuning of Quantized LLMs. Advances in Neural Information Processing Systems.
- [10] LLaMA: Open and Efficient Foundation Language Models. arXiv preprint arXiv:2302.13971.
- [11] Attention, Please! (Goldshmidt, Roni and others).
- [12] TableBench: A Comprehensive and Complex Benchmark for Table Question Answering. Proceedings of the AAAI Conference on Artificial Intelligence.