Content deleted Content added
No edit summary |
m →The transform: monospace |
||
(10 intermediate revisions by 4 users not shown) | |||
Line 3:
The '''move-to-front (MTF) transform''' is an [[code|encoding]] of [[data]] (typically a stream of [[byte]]s) designed to improve the performance of [[entropy encoding]] techniques of [[Data compression|compression]]. When efficiently implemented, it is fast enough that its benefits usually justify including it as an extra step in data compression [[algorithm]].
This algorithm was first published by [[:ru:Рябко, Борис Яковлевич|Boris Ryabko]] under the name of "book stack" in 1980.<ref>{{cite journal |
==The transform==
Line 21:
1,1,13,1,1,1,0,0
{| class="wikitable"
|-
! Iteration
Line 80:
<syntaxhighlight lang="python">
from collections.abc import Generator, Iterable
from itertools import chain
class MoveToFront:
"""
>>> list(mtfwiki.encode("Wikipedia"))
[87, 105, 107, 1, 112, 104, 104, 3, 102]
>>> mtfwiki.decode([87, 105, 107, 1, 112, 104, 104, 3, 102])
'Wikipedia'
>>> list(mtfwiki.encode("wikipedia"))
[119, 106, 108, 1, 113, 105, 105, 3, 103]
>>> mtfwiki.decode([119, 106, 108, 1, 113, 105, 105, 3, 103])
'wikipedia'
"""
def __init__(self, common_dictionary: Iterable[int] = range(256)):
it is simpler to just agree on an initial set.
Here we use the 256 possible values of a byte.
"""
self.common_dictionary = common_dictionary
def encode(self, plain_text: str) -> Generator[int]:
# Changing the common dictionary is a bad idea. Make a copy.
dictionary
# Read in each character
for c in plain_text.encode("latin-1"): # Change to bytes for 256.
# Find the rank of the character in the dictionary [O(k)]
rank = dictionary.index(c) # the encoded character
yield rank
# Update the dictionary [O(1)]
dictionary.pop(rank)
dictionary.insert(0, c)
def decode(self, compressed_data: Iterable[int]) -> str:
"""
Inverse function that recover the original text
"""
dictionary = list(self.common_dictionary)
plain_text = []
# Read in each rank in the encoded text
for rank in compressed_data:
# Read the character of that rank from the dictionary
plain_text.append(dictionary[rank])
# Update the dictionary
e = dictionary.pop(rank)
dictionary.insert(0, e)
return bytes(plain_text).decode("latin-1") # Return original string
</syntaxhighlight>
In this example we can see the MTF code taking advantage of the three repetitive {{code|i}}'s in the input word. The common dictionary here, however, is less than ideal since it is initialized with more commonly used [[ASCII]] printable characters put after little-used control codes, against the MTF code's design intent of keeping what's commonly used in the front. If one rotates the dictionary to put the more-used characters in earlier places, a better encoding can be obtained:
<syntaxhighlight lang="python">
def block32(x):
return (x + i for i in range(32))
class MoveToFrontMoreCommon(MoveToFront):
"""
>>> mtfwiki = MoveToFrontMoreCommon()
>>> list(mtfwiki.encode("Wikipedia"))
[55, 10, 12, 1, 17, 9, 9, 3, 7]
"""
def __init__(self):
super().__init__(
chain( # Sort the ASCII blocks:
block32(ord("a") - 1), # first lowercase,
block32(ord("A") - 1), # then uppercase,
block32(ord("!") - 1), # punctuation/number,
block32(0), # the control code,
range(128, 256), # and finally the non-ASCII stuff
)
)
if __name__ == "__main__":
import doctest
doctest.testmod()
</syntaxhighlight>
Line 166 ⟶ 189:
{{main | Self-organizing list#Move to Front Method (MTF) }}
* The term Move To Front (MTF) is also used in a slightly different context, as a type of a dynamic [[linked list]]. In an MTF list, each element is moved to the front when it is accessed.<ref>{{Cite journal | doi = 10.1145/359997.360000| title = On self-organizing sequential search heuristics| journal = Communications of the ACM| volume = 19| issue = 2| pages = 63–67| year = 1976| last1 = Rivest | first1 =
==References==
|