pith. machine review for the scientific record. sign in
theorem proved tactic proof high

kl_nonneg

show as:
view Lean formalization →

Kullback-Leibler divergence is nonnegative for any pair of probability distributions on a finite set. Workers in variational inference and statistical mechanics cite this result when bounding free energies. The argument rearranges the standard log inequality, cancels the normalization sums, and invokes linear arithmetic on the resulting expressions.

claimFor probability distributions $p$ and $q$ on a finite index set $ι$, where each assigns strictly positive values summing to one, the Kullback-Leibler divergence satisfies $0 ≤ ∑_i p_i log(p_i / q_i)$.

background

In the module deriving variational free energy from the recognition composition law, a ProbDist on finite $ι$ is a structure whose prob function is strictly positive and sums to 1. The variational free energy is defined as VFE $q$ $E$ $β$ := ∑ $q_i E_i$ + (1/β) ∑ $q_i$ log $q_i$, which equals expected energy minus temperature times entropy for the Boltzmann reference. The module proves that the Boltzmann distribution minimizes this quantity and equates the minimum to the Helmholtz free energy.

proof idea

The proof first shows the negative KL sum is ≤ 0 by applying Real.log_le_sub_one_of_pos to each term $p_i$ log($q_i$/$p_i$) ≤ $p_i$($q_i$/$p_i$ - 1), then summing and using normalization of both distributions to reach zero. It next invokes the identity log($q$/$p$) = -log($p$/$q$) to relate the two sums, and concludes nonnegativity by linarith.

why it matters in Recognition Science

This result is invoked by boltzmann_minimizes_vfe, which establishes that the Boltzmann distribution achieves the VFE minimum and identifies that value with the Helmholtz free energy. It supplies the Gibbs inequality step in the derivation of variational free energy from the recognition composition law. The module records zero sorrys, confirming the inequality in the discrete finite setting.

scope and limits

formal statement (Lean)

  69theorem kl_nonneg (p q : ProbDist ι) :
  70    0 ≤ ∑ i, p.prob i * Real.log (p.prob i / q.prob i) := by

proof body

Tactic-mode proof.

  71  -- Equivalent: sum_i p_i log(p_i/q_i) >= 0
  72  -- Use log(x) >= 1 - 1/x (i.e. -log(1/x) <= 1 - 1/x → log(x) >= 1 - 1/x).
  73  -- Equivalent statement: -KL = sum p log(q/p) <= sum p (q/p - 1) = sum q - sum p = 0.
  74  -- So KL >= 0 follows.
  75  have h_neg_kl_le : ∑ i, p.prob i * Real.log (q.prob i / p.prob i) ≤ 0 := by
  76    have h_each : ∀ i, p.prob i * Real.log (q.prob i / p.prob i) ≤
  77        p.prob i * (q.prob i / p.prob i - 1) := by
  78      intro i
  79      have hp := p.prob_pos i
  80      have hq := q.prob_pos i
  81      have hratio_pos : 0 < q.prob i / p.prob i := div_pos hq hp
  82      have hlog_le : Real.log (q.prob i / p.prob i) ≤ q.prob i / p.prob i - 1 :=
  83        Real.log_le_sub_one_of_pos hratio_pos
  84      exact mul_le_mul_of_nonneg_left hlog_le (le_of_lt hp)
  85    calc ∑ i, p.prob i * Real.log (q.prob i / p.prob i)
  86        ≤ ∑ i, p.prob i * (q.prob i / p.prob i - 1) := Finset.sum_le_sum (fun i _ => h_each i)
  87      _ = ∑ i, (q.prob i - p.prob i) := by
  88          apply Finset.sum_congr rfl
  89          intro i _
  90          have hp := p.prob_pos i
  91          field_simp
  92      _ = (∑ i, q.prob i) - ∑ i, p.prob i := by rw [Finset.sum_sub_distrib]
  93      _ = 1 - 1 := by rw [q.prob_sum, p.prob_sum]
  94      _ = 0 := by ring
  95  have h_log_swap : ∀ i, Real.log (q.prob i / p.prob i) = -Real.log (p.prob i / q.prob i) := by
  96    intro i
  97    have hp := p.prob_pos i
  98    have hq := q.prob_pos i
  99    rw [show q.prob i / p.prob i = (p.prob i / q.prob i)⁻¹ by
 100      rw [inv_div]]
 101    rw [Real.log_inv]
 102  have h_neg_eq : ∑ i, p.prob i * Real.log (q.prob i / p.prob i) =
 103      -∑ i, p.prob i * Real.log (p.prob i / q.prob i) := by
 104    rw [← Finset.sum_neg_distrib]
 105    apply Finset.sum_congr rfl
 106    intro i _
 107    rw [h_log_swap i]
 108    ring
 109  linarith [h_neg_kl_le, h_neg_eq.symm.le]
 110
 111/-! ## Boltzmann minimizes VFE -/
 112
 113/-- VFE evaluated at the Boltzmann reference equals the Helmholtz free energy. -/

used by (1)

From the project-wide theorem graph. These declarations reference this one in their body.

depends on (6)

Lean names referenced from this declaration's body.