Interpreto: An Explainability Library for Transformers

2025 · cs.CL · arXiv 2512.09730

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Interpreto is an open-source Python library for interpreting HuggingFace language models, from early BERT variants to LLMs. It provides two complementary families of methods: attribution methods and concept-based explanations. The library bridges recent research and practical tooling by exposing explanation workflows through a unified API for both classification and text generation. A key differentiator is its end-to-end concept-based pipeline (from activation extraction to concept learning, interpretation, and scoring), which goes beyond feature-level attributions and is uncommon in existing libraries. See GitHub: https://github.com/FOR-sight-ai/interpreto and the demo website: https://for-sight-ai.github.io/interpreto-demo/.

representative citing papers

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

cs.LG · 2026-06-08 · unverdicted · novelty 6.0

Introduces BrainSurgery, a declarative YAML-based tool for reproducible tensor surgery on deep learning checkpoints with built-in validation assertions, shown via examples and case studies in model upcycling and LoRA extraction.

citing papers explorer

BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling

cs.LG · 2026-06-08 · unverdicted · none · ref 14 · internal anchor

Introduces BrainSurgery, a declarative YAML-based tool for reproducible tensor surgery on deep learning checkpoints with built-in validation assertions, shown via examples and case studies in model upcycling and LoRA extraction.
