Move-to-front transform: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 08:12, 4 March 2024 edit 95.24.165.148 (talk) No edit summary ← Previous edit		Latest revision as of 08:01, 9 August 2024 edit undo Cedar101 (talk \| contribs) Extended confirmed users 16,377 edits m →‎The transform: monospace
(10 intermediate revisions by 4 users not shown)
Line 3: The '''move-to-front (MTF) transform''' is an [[code\|encoding]] of [[data]] (typically a stream of [[byte]]s) designed to improve the performance of [[entropy encoding]] techniques of [[Data compression\|compression]]. When efficiently implemented, it is fast enough that its benefits usually justify including it as an extra step in data compression [[algorithm]]. This algorithm was first published by [[:ru:Рябко, Борис Яковлевич\|Boris Ryabko]] under the name of "book stack" in 1980.<ref>{{cite journal \|~~author~~last=Ryabko, \|first=Boris Yakovlevich \|author-link=:ru:Рябко, Борис Яковлевич \|title=Data compression by means of a "book stack” \|journal=Problems of Information Transmission \|year=1980 \|volume=16 \|issue=4 \|pages=265-269 \|url=https://www.mathnet.ru/links/185b89a03327cb0466fb2846e55f6404/ppi1458.pdf \|zbl=0466.94007}}</ref> Subsequently, it was rediscovered by J.K. Bentley et al. in 1986,<ref>{{cite journal\|url=https://www.cs.cmu.edu/~sleator/papers/Adaptive-data-compression.htm\|title=A Locally Adaptive Data Compression Scheme\|~~author1~~first1=J.Jon L.Louis \|last1=Bentley \|~~author2~~author-link1=D.Jon D.Bentley (computer scientist) \|first2=Daniel Dominic Kaplan \|last2=Sleator \|~~author3~~author-link2=R.Daniel E.Sleator \|first3=Robert Endre \|last3=Tarjan \|author-link3= Robert Tarjan \|~~author4~~first4=V. K. \|last4=Wei \|journal=Communications of the ACM\|volume=29\|issue=4\|pages=320–330\|year=1986\|doi=10.1145/5684.5688\|citeseerx=10.1.1.69.807\|s2cid=5854590 }}</ref> as attested in the explanatory note.<ref>{{cite journal \|~~author1~~last1=Ryabko, B.\|first1=Boris ~~Ya.~~Yakovlevich \|~~author2~~author-link1=~~Horspool~~:ru:Рябко, Борис Яковлевич \|first2=R. Nigel \|~~author3~~last2=Horspool \|author-link2=Nigel Horspool \|first3=Gordon Villy \|last3=Cormack, \|author-link3=Gordon V.Cormack \|title=Comments to: "A locally adaptive data compression scheme" by J. L. Bentley, D. D. Sleator, R. E. Tarjan and V. K. Wei.\|journal=Comm. ACM\|volume=30\|year=1987\|issue=9\|pages=792–794\|doi=10.1145/30401.315747\|s2cid=16138142 \|doi-access=free}}</ref> ==The transform== Line 21: 1,1,13,1,1,1,0,0 {\| class="wikitable" ~~border~~style="1font-family:monospace;" \|- ! Iteration Line 80: <syntaxhighlight lang="python"> # """mtfwiki.py""" ~~from typing import Tuple, Union~~ ~~# Instead of always transmitting an "original" dictionary, it is simpler to just agree on an initial set.~~ ~~# Here we use the 256 possible values of a byte:~~ ~~common_dictionary = list(range(256))~~ from collections.abc import Generator, Iterable ~~def encode(plain_text: str) -> list[int]:~~ from itertools import chain ~~# Change to bytes for 256.~~ ~~plain_text = plain_text.encode("utf-8")~~ ~~# Changing the common dictionary is a bad idea. Make a copy.~~ ~~dictionary = common_dictionary.copy()~~ class MoveToFront: ~~# Transformation~~ """ ~~compressed_text = list() # Regular array~~ ~~rank~~>>> mtfwiki = 0MoveToFront() >>> list(mtfwiki.encode("Wikipedia")) [87, 105, 107, 1, 112, 104, 104, 3, 102] >>> mtfwiki.decode([87, 105, 107, 1, 112, 104, 104, 3, 102]) 'Wikipedia' >>> list(mtfwiki.encode("wikipedia")) [119, 106, 108, 1, 113, 105, 105, 3, 103] >>> mtfwiki.decode([119, 106, 108, 1, 113, 105, 105, 3, 103]) 'wikipedia' """ def __init__(self, common_dictionary: Iterable[int] = range(256)): ~~# Read in each character~~ ~~for~~ c in ~~plain_text:~~ """ ~~rank = dictionary.index(c) # Find the rank~~Instead of ~~the~~always ~~character~~transmitting inan ~~the~~"original" dictionary ~~[O(k)]~~, it is simpler to just agree on an initial set. ~~compressed_text.append(rank) # Update the encoded text~~ Here we use the 256 possible values of a byte. """ self.common_dictionary = common_dictionary def encode(self, plain_text: str) -> Generator[int]: ~~# Update the dictionary [O(k)]~~ # Changing the common dictionary is a bad idea. Make a copy.~~pop(rank)~~ dictionary~~.insert(0,~~ c= list(self.common_dictionary) # Read in each character ~~return compressed_text # Return the encoded text~~ for c in plain_text.encode("latin-1"): # Change to bytes for 256. # Find the rank of the character in the dictionary [O(k)] rank = dictionary.index(c) # the encoded character yield rank # Update the dictionary [O(1)] dictionary.pop(rank) dictionary.insert(0, c) def decode(self, compressed_data: Iterable[int]) -> str: """ Inverse function that recover the original text """ dictionary = list(self.common_dictionary) plain_text = [] # Read in each rank in the encoded text for rank in compressed_data: # Read the character of that rank from the dictionary plain_text.append(dictionary[rank]) # Update the dictionary e = dictionary.pop(rank) dictionary.insert(0, e) return bytes(plain_text).decode("latin-1") # Return original string </syntaxhighlight> In this example we can see the MTF code taking advantage of the three repetitive {{code\|i}}'s in the input word. The common dictionary here, however, is less than ideal since it is initialized with more commonly used [[ASCII]] printable characters put after little-used control codes, against the MTF code's design intent of keeping what's commonly used in the front. If one rotates the dictionary to put the more-used characters in earlier places, a better encoding can be obtained: ~~The inverse will recover the original text:~~ <syntaxhighlight lang="python"> def block32(x): ~~def decode(compressed_data: list[int]) -> str:~~ return (x + i for i in range(32)) ~~compressed_text = compressed_data~~ ~~dictionary = common_dictionary.copy()~~ ~~plain_text = []~~ ~~# Read in each rank in the encoded text~~ ~~for rank in compressed_text:~~ ~~# Read the character of that rank from the dictionary~~ ~~plain_text.append(dictionary[rank])~~ class MoveToFrontMoreCommon(MoveToFront): ~~# Update the dictionary~~ """ ~~e = dictionary.pop(rank)~~ >>> mtfwiki = MoveToFrontMoreCommon() ~~dictionary.insert(0, e)~~ >>> list(mtfwiki.encode("Wikipedia")) [55, 10, 12, 1, 17, 9, 9, 3, 7] """ def __init__(self): ~~return bytes(plain_text).decode("utf-8") # Return original string~~ super().__init__( ~~</syntaxhighlight>~~ chain( # Sort the ASCII blocks: block32(ord("a") - 1), # first lowercase, block32(ord("A") - 1), # then uppercase, block32(ord("!") - 1), # punctuation/number, block32(0), # the control code, range(128, 256), # and finally the non-ASCII stuff ) ) ~~Example output:~~ ~~<syntaxhighlight lang="pycon">~~ ~~>>> import mtfwiki~~ ~~>>> mtfwiki.encode('Wikipedia')~~ ~~[87, 105, 107, 1, 112, 104, 104, 3, 102]~~ ~~>>> mtfwiki.decode([119, 106, 108, 1, 113, 105, 105, 3, 103])~~ ~~'wikipedia'~~ ~~</syntaxhighlight>~~ if __name__ == "__main__": In this example we can see the MTF code taking advantage of the three repetitive {{code\|i}}'s in the input word. The common dictionary here, however, is less than ideal since it is initialized with more commonly used [[ASCII]] printable characters put after little-used control codes, against the MTF code's design intent of keeping what's commonly used in the front. If one rotates the dictionary to put the more-used characters in earlier places, a better encoding can be obtained: import doctest doctest.testmod() ~~<syntaxhighlight lang="pycon">~~ ~~>>> import mtfwiki~~ ~~>>> block32 = lambda x : [x + i for i in range(32)]~~ ~~>>> # Sort the ASCII blocks: first lowercase, then uppercase, punctuation/number, and finally the control code and the non-ASCII stuff~~ ~~>>> mtfwiki.common_dictionary = block32(0x60) + block32(0x40) + block32(0x20) + block32(0x00) + list(range(128, 256))~~ ~~>>> mtfwiki.encode('Wikipedia')~~ ~~[55, 10, 12, 1, 17, 9, 9, 3, 7]~~ </syntaxhighlight> Line 166 ⟶ 189: {{main \| Self-organizing list#Move to Front Method (MTF) }} * The term Move To Front (MTF) is also used in a slightly different context, as a type of a dynamic [[linked list]]. In an MTF list, each element is moved to the front when it is accessed.<ref>{{Cite journal \| doi = 10.1145/359997.360000\| title = On self-organizing sequential search heuristics\| journal = Communications of the ACM\| volume = 19\| issue = 2\| pages = 63–67\| year = 1976\| last1 = Rivest \| first1 = R.Ronald Linn \| author-link1=Ron Rivest \| s2cid = 498886\| doi-access = free}}</ref> This ensures that, over time, the more frequently accessed elements are easier to access. ==References==