pith. sign in

arxiv: 2606.21992 · v1 · pith:NZ7QVT5Onew · submitted 2026-06-20 · 💻 cs.PL

Analyzing the Analyzers: Model Counting Meets Abstract Interpretation

Pith reviewed 2026-06-26 11:04 UTC · model grok-4.3

classification 💻 cs.PL
keywords abstract interpretationmodel countingprecision measurementabstract domainsstatic analysissymbolic abstractionimprecision metricabstract transformers
0
0 comments X

The pith

MCAI encodes concrete semantics and abstract values as logical formulas so model counting can measure the inherent semantic loss of any abstract domain without reference to a client analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Abstract interpretation approximates program behavior with abstract domains, but comparing how much precision each domain loses has usually required tying the comparison to particular analysis queries. MCAI instead turns both the concrete semantics and the abstract values into logical formulas whose model counts directly yield a ratio that reflects only the semantic gap created by the abstraction. This ratio serves as a single, client-independent number for imprecision. When the method is applied to the best abstract transformers of four domains via symbolic abstraction, it shows that Interval often loses little more precision than Octagon, that many octagonal constraints are redundant, and that bit-level KnownBit domains consistently outperform word-level ones.

Core claim

The paper introduces MCAI, a methodology that encodes both concrete program semantics and abstract domain values as logical formulas and then employs model counting to compute a quantitative, client-independent metric of the imprecision that each abstract transformer introduces. The metric isolates the semantic loss inherent in the abstraction itself. Application to Interval, Octagon, and KnownBit domains via symbolic abstraction produces the findings that Interval precision is often comparable to Octagon, many octagonal constraints add no new semantic information, and bit-level KnownBit domains are more precise than word-level domains.

What carries the argument

MCAI (Model Counting meets Abstract Interpretation), the encoding of concrete semantics and abstract values as logical formulas whose model counts serve as a direct measure of semantic loss.

If this is right

  • Precision comparisons between abstract domains can be performed without dependence on any particular client query or analysis goal.
  • Redundant constraints within relational domains such as Octagon can be identified and removed on the basis of model-count evidence.
  • Bit-level KnownBit domains can be selected over word-level domains when higher precision is required for an analysis.
  • The precision of transformers produced by symbolic abstraction can be evaluated directly with the same model-count metric.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same formula-encoding technique could be used to compare precision loss across static analyses that rely on over-approximations outside the abstract-interpretation setting.
  • Analysis frameworks could integrate model counting to automatically select, for a given program, the least expensive domain that meets a chosen precision threshold.
  • The redundancy result points to possible simplifications in the implementation of octagon-like domains that would reduce computational cost with no change in precision.

Load-bearing premise

The logical formulas faithfully represent the meanings of the concrete semantics and abstract values without adding or subtracting any imprecision beyond what the abstract domain itself causes.

What would settle it

An exhaustive enumeration of all concrete states for a small program fragment in which the model-count ratio for a given abstract value differs from the actual fraction of concrete states that value over-approximates.

Figures

Figures reproduced from arXiv: 2606.21992 by Junda Zheng, Peisen Yao.

Figure 1
Figure 1. Figure 1: Illustration of an example of a concrete semantics formula and its best [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Illustration of the architecture and overall workflow of MCAI. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Distribution of variables and atomic predicates across the benchmark [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Precision comparison across domains: Distribution of precision scores [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Distribution of benchmark formulas for precision comparison across differ [PITH_FULL_IMAGE:figures/full_fig_p013_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Specific false positive rate comparison between Interval and KnownBit: [PITH_FULL_IMAGE:figures/full_fig_p014_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Impact of constraint categories in Octagon: Precision when removing [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
read the original abstract

Abstract interpretation offers a principled foundation for static analysis by approximating concrete program semantics via abstract domains. However, quantitatively comparing the precision of different domains remains a longstanding challenge. We present MCAI (Model Counting meets Abstract Interpretation), a new methodology that employs model counting to measure the precision of abstract domains. Unlike prior approaches that assess precision relative to specific analysis queries, MCAI encodes both concrete semantics and abstract values as logical formulas, enabling a client-independent, quantitative metric of imprecision that captures the inherent semantic loss in the abstractions. We apply MCAI to four abstract domains and evaluate the precision of their best abstract transformers via symbolic abstraction. Our results yield several insights: the Interval domain, despite its simplicity, often achieves precision comparable to that of Octagon; many octagonal constraints are semantically redundant; and the bit-level KnownBit domains consistently outperform the word-level domains. MCAI offers both theoretical insights into the precision of abstract domains and practical guidance for selecting suitable abstractions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MCAI, a methodology that encodes both concrete program semantics and abstract values as logical formulas, then applies model counting to obtain a client-independent quantitative measure of imprecision (semantic loss) induced by an abstract domain. It evaluates the approach on four domains by computing precision of their best abstract transformers via symbolic abstraction, and reports concrete insights including that the Interval domain often matches Octagon precision, many octagonal constraints are redundant, and bit-level KnownBit domains outperform word-level ones.

Significance. If the encoding step is faithful, MCAI supplies a novel, query-independent metric for comparing abstract-domain precision that could guide domain selection in static analysis. The evaluation on multiple domains yields actionable comparisons and identifies redundancy patterns; the absence of fitted parameters or circular definitions is a strength.

major comments (2)
  1. [§3 (MCAI methodology)] The central claim rests on the assertion that the logical encoding of concrete semantics and abstract values introduces no additional bias beyond the domain's inherent loss. No explicit validation, soundness proof, or error analysis of this encoding step is supplied; without it the quantitative metric cannot be confirmed to isolate domain imprecision alone.
  2. [§5 (Evaluation)] The reported results on the four domains presuppose that model counting on the encodings accurately reflects semantic loss; the manuscript provides no cross-check against known precision orderings or manual inspection of small programs to confirm the counts match expected loss.
minor comments (2)
  1. [§3] Notation for the model-counting metric and the symbolic-abstraction procedure should be introduced with explicit definitions before first use.
  2. [§5] The abstract claims 'several insights' but the evaluation section would benefit from a table summarizing the numeric precision values across domains for direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. The two major comments correctly identify areas where additional justification would strengthen the presentation of the MCAI methodology and its empirical results. We address each point below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [§3 (MCAI methodology)] The central claim rests on the assertion that the logical encoding of concrete semantics and abstract values introduces no additional bias beyond the domain's inherent loss. No explicit validation, soundness proof, or error analysis of this encoding step is supplied; without it the quantitative metric cannot be confirmed to isolate domain imprecision alone.

    Authors: We agree that the manuscript would benefit from an explicit discussion of the encoding's faithfulness. The encodings follow standard SMT representations of concrete semantics (as transition relations) and abstract values (as conjunctions of domain constraints), which are known to be semantics-preserving when constructed directly from the domain definitions. Nevertheless, to address the concern directly we will add a dedicated paragraph in §3 that sketches why the encoding introduces no extraneous models or losses beyond the abstraction itself, together with a short note on the exactness of the model counter used. This addition will make the isolation of domain imprecision explicit. revision: yes

  2. Referee: [§5 (Evaluation)] The reported results on the four domains presuppose that model counting on the encodings accurately reflects semantic loss; the manuscript provides no cross-check against known precision orderings or manual inspection of small programs to confirm the counts match expected loss.

    Authors: The evaluation section relies on the assumption that the model counts correctly quantify the loss, and the reported orderings (KnownBit bit-level > word-level, Interval often close to Octagon) are consistent with prior qualitative knowledge. We acknowledge, however, that an explicit cross-check is absent. We will therefore augment §5 with a small-scale validation: for a handful of tiny programs we will manually enumerate the concrete and abstract reachable states and compare them against the model-counting results, confirming that the quantitative metric reproduces the expected precision ordering. This material will be added as a new subsection or appendix. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces MCAI as a direct methodology that encodes concrete semantics and abstract values into logical formulas and applies model counting to quantify imprecision. No equations, fitted parameters, self-citations, or ansatzes are referenced in the provided text that would reduce the central metric to an input by construction. The derivation is self-contained as a new measurement technique without load-bearing reliance on prior author results or renaming of known patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is based solely on the abstract; the ledger therefore records only the assumptions stated or implied in that text. The approach rests on the correctness of the encoding step and on model counting being an accurate proxy for semantic loss.

axioms (1)
  • domain assumption Encoding concrete program semantics and abstract values as logical formulas preserves the semantic relationship without introducing extraneous loss.
    This premise is required for the model-counting step to measure only the imprecision of the abstract domain itself.

pith-pipeline@v0.9.1-grok · 5687 in / 1249 out tokens · 31457 ms · 2026-06-26T11:04:20.072203+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

36 extracted references · 4 canonical work pages

  1. [1]

    Static determination of dynamic properties of programs

    Patrick Cousot and Radhia Cousot. Static determination of dynamic properties of programs. InProceedings of the 2nd International Symposium on Programming, Paris, France, pages 106–130. Dunod, 1976

  2. [2]

    Automatic discovery of linear restraints among variables of a program

    Patrick Cousot and Nicolas Halbwachs. Automatic discovery of linear restraints among variables of a program. InProceedings of the 5th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL ’78, page 84–96, New York, NY, USA, 1978. Association for Computing Machinery. doi: 10.1145/512760. 512770. URLhttps:doi.org/10.1145/512760.512770

  3. [3]

    The dacapo benchmarks: Java benchmarking develop- ment and analysis

    Stephen M Blackburn, Robin Garner, Chris Hoffmann, Asjad M Khang, Kathryn S McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z Guyer, et al. The dacapo benchmarks: Java benchmarking develop- ment and analysis. In21st annual ACM SIGPLAN conference on Object-oriented programming systems, languages, and applications, pages 169–19...

  4. [4]

    OWASP Benchmark.https://owasp.org/ www-project-benchmark/

    OWASP Foundation. OWASP Benchmark.https://owasp.org/ www-project-benchmark/

  5. [5]

    Towards a quantitative estimation of abstract interpretations

    Francesco Logozzo, Corneliu Popeea, and Vincent Laviron. Towards a quantitative estimation of abstract interpretations. InWorkshop on Quantitative Analysis of Software. Citeseer, 2009

  6. [6]

    Relational analysis and precision via probabilistic abstract interpretation.Electronic Notes in Theoretical Computer Science, 220(3):23–42, 2008

    Alessandra Di Pierro, Pascal Sotin, and Herbert Wiklicky. Relational analysis and precision via probabilistic abstract interpretation.Electronic Notes in Theoretical Computer Science, 220(3):23–42, 2008

  7. [7]

    Quantifying the precision of numerical abstract domains

    Pascal Sotin. Quantifying the precision of numerical abstract domains. 2010

  8. [8]

    Symbolic implementation of the best transformer

    Thomas Reps, Mooly Sagiv, and Greta Yorsh. Symbolic implementation of the best transformer. InVMCAI, volume 2937, pages 252–266. Springer, 2004

  9. [9]

    A new numerical abstract domain based on difference-bound matrices

    Antoine Min´ e. A new numerical abstract domain based on difference-bound matrices. InProceedings of the Second Symposium on Programs As Data Ob- jects, PADO ’01, pages 155–172, London, UK, UK, 2001. Springer-Verlag. URL http://dl.acm.org/citation.cfm?id=645774.668110

  10. [10]

    The octagon abstract domain.Higher Order Symbol

    Antoine Min´ e. The octagon abstract domain.Higher Order Symbol. Comput., 19 (1):31–100, March 2006. doi: 10.1007/s10990-006-8609-1. URLhttps:doi.org/ 10.1007/s10990-006-8609-1

  11. [11]

    Formal language, grammar and set-constraint- based program analysis by abstract interpretation

    Patrick Cousot and Radhia Cousot. Formal language, grammar and set-constraint- based program analysis by abstract interpretation. InProceedings of the seventh international conference on Functional programming languages and computer ar- chitecture, pages 170–181, 1995

  12. [12]

    Logical abstract do- mains and interpretations

    Patrick Cousot, Radhia Cousot, and Laurent Mauborgne. Logical abstract do- mains and interpretations. InThe Future of Software Engineering, pages 48–71. Springer, 2011

  13. [13]

    Cbmc – c bounded model checker

    Daniel Kroening and Michael Tautschnig. Cbmc – c bounded model checker. In Erika ´Abrah´ am and Klaus Havelund, editors,Tools and Algorithms for the Construction and Analysis of Systems, pages 389–391, Berlin, Heidelberg, 2014. Springer Berlin Heidelberg

  14. [14]

    Frans Kaashoek

    Xi Wang, Haogang Chen, Zhihao Jia, Nickolai Zeldovich, and M. Frans Kaashoek. Improving integer security for systems with kint. InProceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, page 163–177, USA, 2012. USENIX Association

  15. [15]

    Symbolic optimization with smt solvers

    Yi Li, Aws Albarghouthi, Zachary Kincaid, Arie Gurfinkel, and Marsha Chechik. Symbolic optimization with smt solvers. InProceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’14, pages 607–618, New York, NY, USA, 2014. ACM. doi: 10.1145/2535838. 2535857. URLhttp:doi.acm.org/10.1145/2535838.2535857

  16. [16]

    Block-wise abstract interpretation by combining abstract domains with smt

    Jiahong Jiang, Liqian Chen, Xueguang Wu, and Ji Wang. Block-wise abstract interpretation by combining abstract domains with smt. InInternational Confer- ence on Verification, Model Checking, and Abstract Interpretation, pages 310–329. Springer, 2017

  17. [17]

    Program analy- sis via efficient symbolic abstraction.Proceedings of the ACM on Programming Languages, 5(OOPSLA):1–32, 2021

    Peisen Yao, Qingkai Shi, Heqing Huang, and Charles Zhang. Program analy- sis via efficient symbolic abstraction.Proceedings of the ACM on Programming Languages, 5(OOPSLA):1–32, 2021

  18. [18]

    Learning fast and precise numerical analysis

    Jingxuan He, Gagandeep Singh, Markus P¨ uschel, and Martin Vechev. Learning fast and precise numerical analysis. InProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 1112– 1127, 2020

  19. [19]

    Evaluating the imprecision of static analysis

    Atanas Rountev, Scott Kagan, and Michael Gibas. Evaluating the imprecision of static analysis. InProceedings of the 5th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, pages 14–16, 2004

  20. [20]

    Systematic design of program analysis frame- works

    Patrick Cousot and Radhia Cousot. Systematic design of program analysis frame- works. InProceedings of the 6th ACM SIGACT-SIGPLAN symposium on Prin- ciples of programming languages, pages 269–282, 1979

  21. [21]

    Completeness in abstract interpreta- tion: A domain perspective

    Roberto Giacobazzi and Francesco Ranzato. Completeness in abstract interpreta- tion: A domain perspective. InInternational Conference on Algebraic Methodology and Software Technology, pages 231–245. Springer, 1997

  22. [22]

    Making abstract interpretations complete.Journal of the ACM (JACM), 47(2):361–416, 2000

    Roberto Giacobazzi, Francesco Ranzato, and Francesca Scozzari. Making abstract interpretations complete.Journal of the ACM (JACM), 47(2):361–416, 2000

  23. [23]

    A logic for locally complete abstract interpretations

    Roberto Bruni, Roberto Giacobazzi, Roberta Gori, and Francesco Ranzato. A logic for locally complete abstract interpretations. In2021 36th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 1–13. IEEE, 2021

  24. [24]

    sharpsat – counting models with advanced component caching and implicit bcp

    Marc Thurley. sharpsat – counting models with advanced component caching and implicit bcp. In Armin Biere and Carla P. Gomes, editors,Theory and Applica- tions of Satisfiability Testing - SAT 2006, pages 424–429, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg

  25. [25]

    Meel, and Moshe Y

    Supratik Chakraborty, Kuldeep S. Meel, and Moshe Y. Vardi. A scalable approx- imate model counter, 2013

  26. [26]

    Model counting for recursively-defined strings

    Minh-Thai Trinh, Duc-Hiep Chu, and Joxan Jaffar. Model counting for recursively-defined strings. InInternational Conference on Computer Aided Ver- ification, pages 399–418. Springer, 2017

  27. [27]

    Subformula caching for model counting and quantitative program analysis

    William Eiers, Seemanta Saha, Tegan Brennan, and Tevfik Bultan. Subformula caching for model counting and quantitative program analysis. In2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 453–464. IEEE, 2019

  28. [28]

    Satisfiability modulo counting: A new ap- proach for analyzing privacy properties

    Matthew Fredrikson and Somesh Jha. Satisfiability modulo counting: A new ap- proach for analyzing privacy properties. InProceedings of the Joint Meeting of the Twenty-Third EACSL Annual Conference on Computer Science Logic (CSL) and the Twenty-Ninth Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 1–10, 2014

  29. [29]

    A model counter for constraints over unbounded strings

    Loi Luu, Shweta Shinde, Prateek Saxena, and Brian Demsky. A model counter for constraints over unbounded strings. InProceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 565– 576, 2014

  30. [30]

    It- erative distribution-aware sampling for probabilistic symbolic execution

    Mateus Borges, Antonio Filieri, Marcelo d’Amorim, and Corina S P˘ as˘ areanu. It- erative distribution-aware sampling for probabilistic symbolic execution. InPro- ceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pages 866–877, 2015

  31. [31]

    Automata-based model count- ing for string constraints

    Abdulbaki Aydin, Lucas Bang, and Tevfik Bultan. Automata-based model count- ing for string constraints. InInternational Conference on Computer Aided Veri- fication, pages 255–272. Springer, 2015

  32. [32]

    Parameterized model counting for string and numeric constraints

    Abdulbaki Aydin, William Eiers, Lucas Bang, Tegan Brennan, Miroslav Gavrilov, Tevfik Bultan, and Fang Yu. Parameterized model counting for string and numeric constraints. InProceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 400–410, 2018

  33. [33]

    Automatic discovery and quantification of information leaks

    Michael Backes, Boris K¨ opf, and Andrey Rybalchenko. Automatic discovery and quantification of information leaks. In2009 30th IEEE Symposium on Security and Privacy, pages 141–153. IEEE, 2009

  34. [34]

    String analysis for side channels with segmented oracles

    Lucas Bang, Abdulbaki Aydin, Quoc-Sang Phan, Corina S P˘ as˘ areanu, and Tevfik Bultan. String analysis for side channels with segmented oracles. InProceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 193–204, 2016

  35. [35]

    Online synthesis of adaptive side- channel attacks based on noisy observations

    Lucas Bang, Nicol´ as Rosner, and Tevfik Bultan. Online synthesis of adaptive side- channel attacks based on noisy observations. In2018 IEEE European Symposium on Security and Privacy (EuroS&P), pages 307–322. IEEE, 2018

  36. [36]

    Quantifying information leakage using model counting constraint solvers

    Tevfik Bultan. Quantifying information leakage using model counting constraint solvers. In Supratik Chakraborty and Jorge A. Navas, editors,Verified Software. Theories, Tools, and Experiments, pages 30–35, Cham, 2020. Springer Interna- tional Publishing