arxiv: 2604.24579 · v1 · submitted 2026-04-27 · 💻 cs.SE

Recognition: unknown

Measuring the Unmeasurable: Markov Chain Reliability for LLM Agents

Phat T. Tran-Truong, Xuan-Bach Le

Authors on Pith no claims yet

Pith reviewed 2026-05-08 02:47 UTC · model grok-4.3

classification 💻 cs.SE

keywords LLM agentsMarkov chainsreliability modelingsuccess-time distributionabsorbing DTMCreliability decay curveTraceToChainfirst-passage analysis

0 comments

The pith

Fitting LLM agent execution traces to absorbing Markov chains accurately predicts their success-time distributions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Scalar metrics such as pass@k summarize LLM agent performance but leave the underlying distribution of time-to-success unidentified and offer no built-in uncertainty. The paper develops TraceToChain, a pipeline that automatically clusters traces into states and fits an absorbing discrete-time Markov chain whose first-passage times supply the full success-time distribution. This single object recovers pass@k, pass^k, and the reliability decay curve as different projections of the same distribution. On seven controlled frameworks using a strict 50/50 fit/test split, the analytic curves overlay held-out empirical curves with maximum L-infinity deviation of 0.053, and Kolmogorov-Smirnov tests accept the fit at p greater than 0.05 in every case. The approach therefore converts reliability measurement from a post-hoc scalar into a testable, extrapolatable probabilistic model.

Core claim

Agent traces, once clustered into states, can be represented by an absorbing DTMC whose transition matrix yields a first-passage distribution that matches observed reliability behavior. On the seven evaluated frameworks the fitted chains produce analytic reliability decay curves that stay within 0.053 of held-out empirical curves, the two-sample KS test on the first-passage CDF accepts the model in all cases, and posterior and bootstrap interval estimates agree closely.

What carries the argument

The TraceToChain pipeline, which performs automatic state clustering, Laplace-smoothed MLE transition estimation for an absorbing DTMC, and composite AIC-plus-KS validation while reporting Dirichlet posterior and bootstrap uncertainty intervals.

Load-bearing premise

Agent execution traces, after automatic clustering, satisfy the Markov property well enough that an absorbing DTMC captures the true success-time distribution and supports extrapolation to unseen traces.

What would settle it

A new collection of agent frameworks processed with the identical 50/50 protocol that yields either a KS p-value below 0.05 on the first-passage CDF or an L-infinity RDC deviation above 0.1 on held-out data would falsify the claim of reliable modeling.

Figures

Figures reproduced from arXiv: 2604.24579 by Phat T. Tran-Truong, Xuan-Bach Le.

**Figure 1.** Figure 1: Motivating example: a fitted absorbing chain turns one trace corpus view at source ↗

**Figure 2.** Figure 2: Conceptual translation from an agent’s interactive loop to a DTMC. view at source ↗

**Figure 3.** Figure 3: Illustration of a fitted absorbing DTMC. Transitions among transient view at source ↗

**Figure 4.** Figure 4: SS1 closed-form validation: absolute error view at source ↗

**Figure 5.** Figure 5: Additional theorem checks retained in the artifact: (a) perturbation view at source ↗

**Figure 6.** Figure 6: Goel–Okumoto rare-failure limit check: the cumulative first-passage view at source ↗

**Figure 7.** Figure 7: SS7 tests the GoF safeguard across Markov ground truth, 2nd-order view at source ↗

**Figure 9.** Figure 9: SS9 tests held-out recovery on controlled MAST-style traces: empirical ˆ view at source ↗

read the original abstract

Large language model (LLM) agents increasingly operate as sequential software systems, but their reliability is often summarized by scalar benchmark metrics. Metrics such as pass$@k$, pass$^k$, and the reliability decay curve (RDC) are useful summaries, but they do not identify the success-time distribution being estimated, test whether traces support that distribution, or quantify finite-trace uncertainty. We present \textsc{TraceToChain}, a reproducible pipeline that fits agent execution traces to an absorbing discrete-time Markov chain (DTMC), $\hat M=(\hat Q,\hat R_\oplus,\hat R_\ominus)$, with explicit diagnostics and uncertainty. The pipeline builds an automatic cluster taxonomy, estimates transitions with Laplace-smoothed maximum-likelihood estimation (MLE), checks fit with a composite Akaike information criterion (AIC) and Kolmogorov--Smirnov (KS) goodness-of-fit certificate, and reports Dirichlet-posterior credible intervals and non-parametric bootstrap intervals. We adapt classical reliability mathematics (Kemeny--Snell~\cite{kemenysnell}, Cheung~\cite{cheung1980}, Goel--Okumoto~\cite{goelokt}) to agent traces. The resulting first-passage view reconciles metrics usually reported separately: pass$@k$, pass$^k$, and the RDC are projections of one success-time distribution. On seven controlled MAST-style frameworks with a strict 50/50 fit/test protocol, held-out empirical RDCs overlay their analytic counterparts with max $L_\infty^{\mathrm{RDC}} = 0.053$ (median $0.048$). A two-sample KS test on the first-passage cumulative distribution function (CDF) accepts the fitted chain with $p>0.05$ on $7/7$ frameworks (min $p = 0.78$), and per-entry $95\%$ posterior and bootstrap intervals agree to $\approx\!0.01$ at the median.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TraceToChain turns agent traces into absorbing Markov chains with a full diagnostic pipeline and shows decent held-out matches on seven frameworks, though the Markov property after clustering gets only indirect support.

read the letter

This paper introduces TraceToChain, which clusters LLM agent execution traces into states, fits an absorbing DTMC with Laplace-smoothed MLE, runs AIC and KS checks, and pulls uncertainty from Dirichlet posteriors and bootstrap. It then uses first-passage times to treat pass@k, pass^k, and the reliability decay curve as projections of one underlying success-time distribution. That unification plus the end-to-end pipeline with classical reliability references is the actual new piece; nothing quite like the specific combination appears in the cited prior work. The empirical side is handled cleanly. They enforce a 50/50 fit/test split across seven MAST-style frameworks, and the analytic RDCs sit close to the held-out empirical curves (max L-infinity error 0.053, median 0.048). The two-sample KS test on the first-passage CDF clears p > 0.05 on all seven cases, with the weakest at 0.78, and the credible intervals line up with the bootstrap ones to about 0.01 at the median. Those numbers give concrete evidence that the fitted chains reproduce the observed distributions in their controlled setups. The main soft spot is that the KS and overlay checks only confirm the marginal success-time distribution. They do not directly test whether the clustered states satisfy the first-order Markov property or whether residual history dependence remains after the automatic taxonomy step. Without higher-order Markov tests or conditional-independence checks on the sequences, the model could still be a useful projection even if the generative assumption is only approximate. The abstract does not report those extra diagnostics, so the extrapolation reliability stays partly unverified. This is aimed at people who evaluate sequential agent systems and want something beyond scalar benchmarks. It deserves peer review because the validation protocol is strict and the numbers are reported transparently; referees can press on the clustering validity and Markov tests without the paper being dismissed outright.

Referee Report

2 major / 2 minor

Summary. The paper introduces TraceToChain, a reproducible pipeline that automatically clusters LLM agent execution traces, fits them to an absorbing DTMC (Q̂, R̂⊕, R̂⊖) via Laplace-smoothed MLE, and adapts classical reliability results (Kemeny–Snell, Cheung, Goel–Okumoto) to unify pass@k, pass^k, and the reliability decay curve (RDC) as projections of a single success-time distribution. It supplies AIC/KS diagnostics, Dirichlet posterior credible intervals, and bootstrap intervals. On seven MAST-style frameworks under a strict 50/50 fit/test split, held-out empirical RDCs overlay analytic ones with max L∞RDC = 0.053 (median 0.048), and two-sample KS tests on the first-passage CDF accept the model (p > 0.05 on all 7, min p = 0.78).

Significance. If the post-clustering Markov property holds, the work supplies a statistically grounded, extrapolative model for agent reliability that replaces scalar benchmarks with a full distribution plus uncertainty quantification. The strict held-out protocol, adaptation of classical reliability mathematics, and explicit diagnostics are genuine strengths that would allow falsifiable predictions and reproducible analysis of sequential LLM systems.

major comments (2)

[Abstract / validation procedure] Abstract and validation section: the reported two-sample KS test on the first-passage CDF (p > 0.05 on 7/7) together with composite AIC only confirm that the fitted chain reproduces the observed marginal success-time distribution. They do not test the Markov property itself (i.e., whether P(next state | current state) is independent of history after automatic clustering). Because the central claim is that the DTMC yields the correct success-time distribution and reliable analytic RDC, this gap is load-bearing; a direct test (conditional independence or order-2 Markov check on the clustered sequences) is required.
[Methods / clustering and transition estimation] Methods / clustering step: the automatic cluster taxonomy is presented as producing states suitable for an absorbing DTMC, yet no quantitative check (e.g., memory test or silhouette analysis tied to transition independence) is described. If residual history dependence remains, the Laplace-smoothed MLE transitions and the subsequent first-passage formulas become a convenient projection rather than a verified model, undermining extrapolation claims even when L∞RDC ≤ 0.053.

minor comments (2)

[Model definition] Notation: the symbols R̂⊕ and R̂⊖ are introduced without an explicit equation linking them to the standard Kemeny–Snell canonical form; a short display equation would remove ambiguity.
[Results / figures] Figure clarity: the held-out RDC overlay plots would benefit from shaded 95 % posterior/bootstrap bands so readers can visually assess the reported median agreement of ≈0.01.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and precise comments, which help clarify the scope of our validation. We address each major point below and outline the revisions we will make to strengthen the manuscript.

read point-by-point responses

Referee: [Abstract / validation procedure] Abstract and validation section: the reported two-sample KS test on the first-passage CDF (p > 0.05 on 7/7) together with composite AIC only confirm that the fitted chain reproduces the observed marginal success-time distribution. They do not test the Markov property itself (i.e., whether P(next state | current state) is independent of history after automatic clustering). Because the central claim is that the DTMC yields the correct success-time distribution and reliable analytic RDC, this gap is load-bearing; a direct test (conditional independence or order-2 Markov check on the clustered sequences) is required.

Authors: We agree that the KS test on the first-passage CDF and the composite AIC primarily validate agreement between the empirical and analytic marginal success-time distributions rather than directly confirming the Markov property after clustering. While the strong held-out RDC overlay (max L∞ = 0.053) and high p-values provide indirect support—non-Markovian residuals would be expected to degrade the first-passage match—we acknowledge that this is not a substitute for an explicit test. In the revised manuscript we will add an order-2 Markov test (likelihood-ratio comparison of order-1 vs. order-2 transitions on the clustered sequences) together with a conditional-independence diagnostic, reporting the results in the validation section. revision: yes
Referee: [Methods / clustering and transition estimation] Methods / clustering step: the automatic cluster taxonomy is presented as producing states suitable for an absorbing DTMC, yet no quantitative check (e.g., memory test or silhouette analysis tied to transition independence) is described. If residual history dependence remains, the Laplace-smoothed MLE transitions and the subsequent first-passage formulas become a convenient projection rather than a verified model, undermining extrapolation claims even when L∞RDC ≤ 0.053.

Authors: The referee is correct that the manuscript currently lacks an explicit quantitative check linking the clustering output to transition independence. The automatic taxonomy is intended to yield states for which the Markov assumption is approximately valid, but without a memory test this remains an unverified modeling choice. We will therefore add, in the methods section, a memory test (order-1 vs. order-2 likelihood-ratio test on the clustered traces) and report any associated silhouette scores or transition-stability metrics. These diagnostics will be used to qualify the reliability of the subsequent first-passage formulas and extrapolation claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained via held-out validation

full rationale

The paper fits an absorbing DTMC to 50% of traces, derives first-passage success-time quantities (including unified pass@k, pass^k, and RDC as projections), and validates them directly against held-out empirical distributions on the remaining 50% using KS tests (p>0.05 on 7/7) and RDC L∞ overlays (max 0.053). This external test-set comparison prevents any reduction of predictions to training inputs by construction. Classical reliability results (Kemeny-Snell, Cheung, Goel-Okumoto) are adapted without self-citation chains, and no self-definitional, ansatz-smuggling, or renaming steps appear in the provided text. The central claim rests on empirical match rather than tautology.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claim rests on the Markov property holding after clustering and on the fitting procedure producing a faithful model of the success-time distribution.

free parameters (1)

Transition probabilities (Q, R)
Estimated via Laplace-smoothed MLE from the traces

axioms (2)

domain assumption Agent traces can be represented as an absorbing discrete-time Markov chain after automatic clustering
Core modeling choice invoked for the entire pipeline
domain assumption The Markov property holds for the clustered states
Required for the DTMC to be a valid model

pith-pipeline@v0.9.0 · 5656 in / 1361 out tokens · 80333 ms · 2026-05-08T02:47:17.397407+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Why Retrying Fails: Context Contamination in LLM Agent Pipelines
cs.AI 2026-05 conditional novelty 7.0

A Context-Contaminated Restart Model derives exact success probabilities and an optimal pipeline depth T* = sqrt(B * log(1/(1-ε1)) / log(1/(1-ε0))) for fixed budget B, validated on SWE-bench where it fits data far bet...

Reference graph

Works this paper leans on

39 extracted references · 9 canonical work pages · cited by 1 Pith paper · 5 internal anchors

[1]

J. G. Kemeny and J. L. Snell,Finite Markov Chains. Princeton, NJ: Van Nostrand, 1960

1960
[2]

A user-oriented software reliability model,

R. C. Cheung, “A user-oriented software reliability model,”IEEE Transactions on Software Engineering, vol. SE-6, no. 2, pp. 118–125, 1980

1980
[3]

Time-dependent error-detection rate model for software reliability and other performance measures,

A. L. Goel and K. Okumoto, “Time-dependent error-detection rate model for software reliability and other performance measures,”IEEE Transactions on Reliability, vol. R-28, no. 3, pp. 206–211, 1979

1979
[4]

ReAct: Synergizing reasoning and acting in language models,

S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y . Cao, “ReAct: Synergizing reasoning and acting in language models,” in International Conference on Learning Representations, 2023

2023
[5]

Toolformer: Language Models Can Teach Themselves to Use Tools

T. Schick, J. Dwivedi-Yu, R. Dess `ı, R. Raileanu, M. Lomeli, L. Zettle- moyer, N. Cancedda, and T. Scialom, “Toolformer: Language models can teach themselves to use tools,”arXiv preprint arXiv:2302.04761, 2023

work page internal anchor Pith review arXiv 2023
[6]

A survey on large language model based autonomous agents,

L. Wang, C. Ma, X. Feng, Z. Zhang, H. Yang, J. Zhang, Z. Chen, J. Tang, X. Chen, Y . Lin, W. X. Zhao, Z. Wei, and J.-R. Wen, “A survey on large language model based autonomous agents,”Frontiers of Computer Science, vol. 18, no. 6, p. 186345, 2024

2024
[7]

Basic concepts and taxonomy of dependable and secure computing,

A. Aviˇzienis, J.-C. Laprie, B. Randell, and C. Landwehr, “Basic concepts and taxonomy of dependable and secure computing,”IEEE Transactions on Dependable and Secure Computing, vol. 1, no. 1, pp. 11–33, 2004

2004
[8]

Beyer, C

B. Beyer, C. Jones, J. Petoff, and N. R. Murphy,Site Reliability Engineering: How Google Runs Production Systems. Sebastopol, CA: O’Reilly Media, 2016

2016
[9]

τ-bench: A benchmark for tool-agent-user interaction in real-world domains,

S. Yao, N. Shinn, P. Razavi, and K. Narasimhan, “ τ-bench: A benchmark for tool-agent-user interaction in real-world domains,” inAdvances in Neural Information Processing Systems, 2024

2024
[10]

G. W. Stewart,Matrix Algorithms, Volume I: Basic Decompositions. Philadelphia, PA: SIAM, 1998, neumann-series perturbation bound on (I−Q) −1

1998
[11]

R. A. Horn and C. R. Johnson,Matrix Analysis, 2nd ed. Cambridge, UK: Cambridge University Press, 2013, jordan-form norm bounds on ∥Qd∥; see Thms. 3.1.12 and 5.6.12

2013
[12]

K. S. Trivedi and A. Bobbio,Reliability and Availability Engineering: Modeling, Analysis, and Applications. Cambridge, UK: Cambridge University Press, 2017

2017
[13]

J. D. Musa, A. Iannino, and K. Okumoto,Software Reliability: Measure- ment, Prediction, Application. New York, NY: McGraw-Hill, 1987

1987
[14]

Karlin,Total Positivity, Volume I

S. Karlin,Total Positivity, Volume I. Stanford, CA: Stanford University Press, 1968, proves that TP2 kernels are stochastically monotone

1968
[15]

Monotone matrices and monotone Markov processes,

J. Keilson and A. Kester, “Monotone matrices and monotone Markov processes,”Stochastic Processes and their Applications, vol. 5, no. 3, pp. 231–241, 1977, formal definition of stochastic monotonicity for transition kernels

1977
[16]

Evaluating Large Language Models Trained on Code

M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y . Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Chan, S. Gray, N. Ryder, M. Pavlov, A. Power, L. Kaiser, M. Bavarian, C. Winter, P. Tillet, F. P. Such, D. Cummings, M. Plappert, F. Chantzis, E. Barnes, A. Herbert-V...

work page internal anchor Pith review arXiv 2021
[17]

arXiv:2603.29231 (2026)

A. Khanal, Y . Tao, and J. Zhou, “Beyond pass@1: A reliability science framework for long-horizon LLM agents,”arXiv preprint arXiv:2603.29231, 2026

work page arXiv 2026
[18]

PRISM 4.0: Verification of probabilistic real-time systems,

M. Kwiatkowska, G. Norman, and D. Parker, “PRISM 4.0: Verification of probabilistic real-time systems,” inComputer Aided Verification. Springer, 2011, pp. 585–591

2011
[19]

A storm is coming: A modern probabilistic model checker,

C. Dehnert, S. Junges, J.-P. Katoen, and M. V olk, “A storm is coming: A modern probabilistic model checker,” inComputer Aided Verification. Springer, 2017, pp. 592–600

2017
[20]

TriCEGAR: A trace-driven abstraction mechanism for agentic AI,

R. Koohestani, A. G ¨orpelio˘glu, E. Klimov, B. K. Ozkan, and M. Izadi, “TriCEGAR: A trace-driven abstraction mechanism for agentic AI,”arXiv preprint arXiv:2601.22997, 2026

work page arXiv 2026
[21]

Pro2guard: Proactive runtime enforcement of llm agent safety via probabilistic model checking

H. Wang, C. M. Poskitt, J. Wei, and J. Sun, “ProbGuard: Probabilistic run- time monitoring for LLM agent safety,”arXiv preprint arXiv:2508.00500, 2025

work page arXiv 2025
[22]

Reflexion: Language agents with verbal reinforcement learning,

N. Shinn, F. Cassano, E. Berman, A. Gopinath, K. Narasimhan, and S. Yao, “Reflexion: Language agents with verbal reinforcement learning,” inAdvances in Neural Information Processing Systems, 2023

2023
[23]

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

Y . Qin, S. Liang, Y . Ye, K. Zhu, L. Yan, Y . Lu, Y . Lin, X. Cong, X. Tang, B. Qian, S. Zhao, L. Hong, R. Tian, R. Xie, J. Zhou, M. Gerstein, D. Li, Z. Liu, and M. Sun, “ToolLLM: Facilitating large language models to master 16000+ real-world APIs,”arXiv preprint arXiv:2307.16789, 2023

work page internal anchor Pith review arXiv 2023
[24]

SWE-bench: Can language models resolve real-world GitHub issues?

C. E. Jimenez, J. Yang, A. Wettig, S. Yao, K. Pei, O. Press, and K. Narasimhan, “SWE-bench: Can language models resolve real-world GitHub issues?” inInternational Conference on Learning Representations, 2024

2024
[25]

WebArena: A realistic web environment for building autonomous agents,

S. Zhou, F. F. Xu, H. Zhu, X. Zhou, R. Lo, A. Sridhar, X. Cheng, T. Ou, Y . Bisk, D. Fried, U. Alon, and G. Neubig, “WebArena: A realistic web environment for building autonomous agents,” inInternational Conference on Learning Representations, 2024

2024
[26]

ALFWorld: Aligning text and embodied environments for interactive learning,

M. Shridhar, X. Yuan, M.-A. C ˆot´e, Y . Bisk, A. Trischler, and M. Hausknecht, “ALFWorld: Aligning text and embodied environments for interactive learning,” inInternational Conference on Learning Representations, 2021

2021
[27]

Voyager: An Open-Ended Embodied Agent with Large Language Models

G. Wang, Y . Xie, Y . Jiang, A. Mandlekar, C. Xiao, Y . Zhu, L. Fan, and A. Anandkumar, “V oyager: An open-ended embodied agent with large language models,”arXiv preprint arXiv:2305.16291, 2023

work page internal anchor Pith review arXiv 2023
[28]

Why Do Multi-Agent LLM Systems Fail?

M. Cemri, M. Z. Pan, S. Yang, L. A. Agrawal, B. Chopra, R. Tiwari, K. Keutzer, A. Parameswaran, D. Klein, K. Ramchandran, M. Zaharia, J. E. Gonzalez, and I. Stoica, “Why do multi-agent LLM systems fail?” arXiv preprint arXiv:2503.13657, 2025

work page internal anchor Pith review arXiv 2025
[29]

AgentBench: Evaluating LLMs as agents,

X. Liu, H. Yu, H. Zhang, Y . Xu, X. Lei, H. Lai, Y . Gu, H. Ding, K. Men, K. Yang, S. Zhang, X. Deng, A. Zeng, Z. Du, C. Zhang, S. Shen, T. Zhang, Y . Su, H. Sun, M. Huang, Y . Dong, and J. Tang, “AgentBench: Evaluating LLMs as agents,” inInternational Conference on Learning Representations, 2024

2024
[30]

A logic for reasoning about time and reliability,

H. Hansson and B. Jonsson, “A logic for reasoning about time and reliability,”Formal Aspects of Computing, vol. 6, no. 5, pp. 512–535, 1994

1994
[31]

Baier and J.-P

C. Baier and J.-P. Katoen,Principles of Model Checking. Cambridge, MA: MIT Press, 2008

2008
[32]

Hierarchical grouping to optimize an objective function,

J. H. Ward, “Hierarchical grouping to optimize an objective function,” Journal of the American Statistical Association, vol. 58, no. 301, pp. 236–244, 1963

1963
[33]

Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,

P. J. Rousseeuw, “Silhouettes: A graphical aid to the interpretation and validation of cluster analysis,”Journal of Computational and Applied Mathematics, vol. 20, pp. 53–65, 1987

1987
[34]

A new look at the statistical model identification,

H. Akaike, “A new look at the statistical model identification,”IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716–723, 1974

1974
[35]

Table for estimating the goodness of fit of empirical distributions,

N. Smirnov, “Table for estimating the goodness of fit of empirical distributions,”Annals of Mathematical Statistics, vol. 19, no. 2, pp. 279–281, 1948

1948
[36]

Stochastically monotone Markov chains,

D. J. Daley, “Stochastically monotone Markov chains,”Z. Wahrschein- lichkeitstheorie verw. Gebiete, vol. 10, pp. 305–317, 1968, foundational paper on the stochastic-monotonicity notion used in A4. [Online]. Available: https://link.springer.com/journal/440/volumes-and-issues/10-4

1968
[37]

Classes of orderings of measures and related correlation inequalities. I. multivariate totally positive distributions,

S. Karlin and Y . Rinott, “Classes of orderings of measures and related correlation inequalities. I. multivariate totally positive distributions,” Journal of Multivariate Analysis, vol. 10, no. 4, pp. 467–498, 1980, log-concavity of CDFs under TP2 kernels

1980
[38]

D. A. Levin and Y . Peres,Markov Chains and Mixing Times, 2nd ed. American Mathematical Society, 2017, pre-mixing constants in non- reversible chains (Thm. 12.3)

2017
[39]

ULTRA-MC: A unified approach to learning mixtures of markov chains via hitting times,

F. Spaeh, K. Sotiropoulos, and C. E. Tsourakakis, “ULTRA-MC: A unified approach to learning mixtures of markov chains via hitting times,” arXiv preprint arXiv:2405.15094, 2024

work page arXiv 2024