LZW compression has its roots in the work of Jacob Ziv and Abraham Lempel. In 1977, they published a paper on "sliding-window" compression, and followed it with another paper in 1978 on "dictionary" based compression. These algorithms were named LZ77 and LZ78, respectively. Then in 1984, Terry Welch made a modification to LZ78 which became very popular and was dubbed LZW (guess why). The LZW algorithm is what we are going to talk about here. The Concept: Many files, especially text files, have certain strings that repeat very often, for example " the ". With the spaces, the string takes 5 bytes, or 40 bits to encode. But what if we were to add the whole string to the list of characters after the last one, at 256. Then every time we came across " the ", we could send the code 256 instead of 32,116,104,101,32. This would take 9 bits instead of 40 (since 256 does not fit into 8 bits). This is exactly the approach that LZW compression takes. It starts with a "dictionary" of all the single character with indexes 0..255. It then starts to expand the dictionary as information gets sent through. Pretty soon, redundant strings will be coded as a single bit, and compression has occured. The Algorithm:
Ok, so how is this done? Here is the basic algorithm:
set w = NIL loop read a character k if wk exists in the dictionary w = wk else output the code for w add wk to the dictionary w = k endloop So what happens here? The program reads one character at a time. If the code is in the dictionary, then it adds the character to the current work string, and waits for the next one. This occurs on the first character as well. If the work string is not in the dictionary, (such as when the second character comes across), it adds the work string to the dictionary and sends over the wire the works string without the new character. It then sets the work string to the new character. How about decompression? read a character k output k w = k loop read a character k entry = dictionary entry for k output entry add w + first char of entry to the dictionary w = entry endloop The nice thing is that the decompressor builds its own dictionary on its side, that matches exactly the compressor's, so that only the codes need to be sent. If you want more detail, there is an article on LZW Compression by Mark Nelson, the author of the Data Compression Book. The Applet: To help demonstrate exactly what happens in the LZW algorithm, I wrote an Interactive LZW Compressor/Decompressor. Simply type in a string into the input and the applet will build the dictionaries and send the string across.
|