arithmeticCoding
Arithmetic coding encodes an arbitrary message as a single real number in the unit interval whose binary digits form the compressed representation. Information theorists cite the construction when establishing that the average code length approaches the source entropy H for long messages under any distribution. The declaration is a direct string assignment that records the asymptotic optimality property.
claimArithmetic coding is the scheme that maps a message to a real number $r$ in $[0,1)$ such that the expected code length $L$ satisfies $L$ approaches the Shannon entropy $H$ of the source as message length tends to infinity.
background
The Information.Compression module derives fundamental limits on lossless compression from J-cost. The shifted cost satisfies $H(x) = J(x) + 1$, converting the Recognition Composition Law into the d'Alembert equation $H(xy) + H(x/y) = 2 H(x) H(y)$. Entropy of a configuration equals its total defect, which vanishes for the minimum-entropy initial state. Compression is identified with J-cost minimization, so the entropy bound becomes the minimum J-cost required for faithful representation.
proof idea
The declaration is a one-line definition that assigns the descriptive string to arithmeticCoding, followed by inline comments contrasting it with dictionary-based Lempel-Ziv methods.
why it matters in Recognition Science
The definition supplies the concrete mechanism realizing the entropy bound inside the INFO-003 J-cost framework. It directly supports the module's claim that compression equals J-cost minimization and connects to the source-coding theorem via the H functional and defect-based entropy. No downstream uses are recorded, leaving its role as a benchmark for later lossless and lossy compression statements.
scope and limits
- Does not implement the arithmetic encoding procedure.
- Does not prove convergence of code length to entropy.
- Does not specify the source probability distribution.
- Does not compute explicit J-cost values for encoded messages.
formal statement (Lean)
191def arithmeticCoding : String :=
proof body
Definition body.
192 "Near-optimal, L → H for long messages"
193
194/-- Lempel-Ziv compression (LZ77, LZ78, LZW):
195 - Dictionary-based
196 - Adaptive (learns patterns)
197 - Achieves entropy in limit
198 - Used in ZIP, PNG, GIF -/