Improving PPM Algorithm Using Dictionaries

Farooq Khan; Jianzhong (Charlie) Zhang; Yichuan Hu; Ying Li

arXiv: 1012.3790 · v2 · pith:377RGW5P · submitted 2010-12-17 · cs.IT · math.IT

keywords: algorithm, character-based, models, dictionary, encode, text, words, algorithms

We propose a method to improve traditional character-based PPM text compression algorithms. Considering a text file as a sequence of alternating words and non-words, the basic idea of our algorithm is to encode non-words and word prefixes using character-based context models, and to encode word suffixes using dictionary models. By using dictionary models, the algorithm can encode multiple characters as a whole and thus improve compression efficiency. The proposed algorithm has three advantages: 1) it does not require any text preprocessing; 2) it does not need any explicit codeword to signal a switch between the context and dictionary models; 3) it can be applied to any character-based PPM algorithm without incurring much additional computational cost. Test results show that significant improvements can be obtained over character-based PPM, especially in low-order cases.
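
The abstract leaves several details open: what counts as a word, where a word's prefix ends and its suffix begins, and how the dictionary is built. The Python sketch below fills these gaps with its own assumptions, not the paper's: words are runs of ASCII letters, the switch happens after a fixed prefix length (the hypothetical parameter `prefix_len`), the dictionary is the set of words already coded, and the dictionary model's alphabet has an extra escape entry for unseen words (a simplification borrowed from PPM's escape mechanism, which may differ from how the paper handles such words). The functions `split_tokens`, `plan_events` and `decode_events` are hypothetical. They only decide which model codes which symbol and leave out probability estimation and arithmetic coding. The decoder illustrates why no switch codeword is needed: it recomputes the same candidate list from data it has already decoded.

```python
import re
import string
from typing import Iterable, List, Tuple

# Assumption: a "word" is a maximal run of ASCII letters; every other
# character (spaces, punctuation, digits, non-ASCII) belongs to a non-word.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def is_letter(c: str) -> bool:
    return c in string.ascii_letters


def split_tokens(text: str) -> List[Tuple[str, bool]]:
    """Split text into (token, is_word) pairs; words and non-words alternate."""
    return [(m.group(), is_letter(m.group()[0])) for m in TOKEN_RE.finditer(text)]


def candidates_for(prefix: str, dictionary: set) -> List[str]:
    """Dictionary words starting with prefix, in a fixed (sorted) order."""
    return sorted(w for w in dictionary if w.startswith(prefix))


def plan_events(text: str, initial_words: Iterable[str] = (), prefix_len: int = 2):
    """Decide which model codes each part of text.

    ("ctx", c)       -> character c is coded by the character-based context model
    ("dict", (i, n)) -> choice i out of n: indices 0..n-2 select a dictionary
                        word sharing the prefix, index n-1 is the escape entry
    """
    if prefix_len < 1:
        raise ValueError("prefix_len must be at least 1")
    dictionary = set(initial_words)
    events = []
    for token, is_word in split_tokens(text):
        if not is_word or len(token) < prefix_len:
            # Non-words and short words are coded character by character.
            events.extend(("ctx", c) for c in token)
        else:
            prefix, suffix = token[:prefix_len], token[prefix_len:]
            events.extend(("ctx", c) for c in prefix)
            # The decoder can rebuild this list from already-decoded data,
            # so switching to the dictionary model needs no codeword.
            cands = candidates_for(prefix, dictionary)
            if not cands:
                events.extend(("ctx", c) for c in suffix)
            elif token in dictionary:
                events.append(("dict", (cands.index(token), len(cands) + 1)))
            else:
                events.append(("dict", (len(cands), len(cands) + 1)))  # escape
                events.extend(("ctx", c) for c in suffix)
        if is_word:
            dictionary.add(token)  # adaptive dictionary, mirrored by the decoder
    return events


def decode_events(events, initial_words: Iterable[str] = (), prefix_len: int = 2) -> str:
    """Rebuild the text from plan_events output by mirroring the encoder state."""
    dictionary = set(initial_words)
    out: List[str] = []
    word = ""  # letters of the word currently being decoded
    it = iter(events)
    for kind, value in it:
        if kind != "ctx":
            raise ValueError("unexpected dictionary event")
        if is_letter(value):
            word += value
            if len(word) == prefix_len:
                cands = candidates_for(word, dictionary)
                if cands:
                    _, (i, n) = next(it)  # the encoder emitted a dict event here
                    if i < n - 1:
                        word = cands[i]  # whole word recovered in one step
        else:
            if word:  # a non-letter ends the current word
                dictionary.add(word)
                out.append(word)
                word = ""
            out.append(value)
    if word:
        out.append(word)
    return "".join(out)


if __name__ == "__main__":
    text = "theory and theory"
    events = plan_events(text)
    print(len(text), len(events))  # 17 14
    assert decode_events(events) == text
```

For "theory and theory" (17 characters) the sketch produces 14 coding events: the second "theory" costs two context-coded characters plus one dictionary choice between "theory" and the escape entry. Whether this actually saves bits depends on the probabilities a real PPM model assigns, which the sketch does not model.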
