|
|
LZW Compression/Decompression By Dennis R. Cohen, Santa Clara, CA Adaptive Lempel-Zev-Welch Compression/Decompression Data compression is a topic that becomes of interest to programmers (and the people who hire us) as our programs get larger and the data sets they manipulate grow. One of the most prevalent examples of data compression for most of us is compressing and archiving files either for backup purposes or for transmission via modem or network. The most common Mac programs for this are StuffIt (by Raymond Lau) and PackIt (by Harry Chesley). PackIt uses a technique known as Huffman-encoding; StuffIt employs Lempel-Zev-Welch (LZW, for short), Huffman, or Run-Length-Encoding (RLE) based upon the characteristics of the input data. While it is not necessarily the most efficient technique for all files, LZW has proven in practice to be the most efficient general technique and is the subject of this article. The seminal discussion concerning LZW compression may be found in:
A Technique for High Performance Data Compression by Terry Welch IEEE Computer, June 1984 (v17.6) Since Macintosh files are really two files (resource fork and data fork) and have various other pieces of associated data, we start each compressed file with a header record which contains the original name of the file, the size of each fork, and the Finder information (creator, file-type, etc) so that we will be able to reconstruct an indistinguishable copy of the original file. LZW operates by finding common “substrings” and replaces them by a fixed-length (frequently 12-bit) code. This technique is deterministic and is accomplished during a single pass over the input file. The decompression procedure needs no input table, and rebuilds the table as it goes. A 12-bit code means that there are only 4K possible codes. We’ll be using a 14-bit code and thus allow 16K possible encodings before our string table fills up. The table is initialized to contain encodings of the 256 possible single-byte values. The decompression algorithm translates each received code into a prefix string and an extension byte. The extension byte is pushed onto a stack and the prefix translated again, cycling until the prefix is atomic. The entire (decompressed) code is then output by popping the stack until it is empty. Due to the nature of the algorithm, the first code encountered is known to be of a single byte (atomic) and can be directly converted. An update to the string table is made for each code received after the first one, which is known to already be in the table from the initialization. When a code has been translated, its final byte is used as the suffix and combined with the prior string to add the new entry to the string table so long as the string table hasn’t been filled. This new string is assigned a new value that is the same code the compressor assigned to that string. In this manner, the decompressor incrementally reconstructs the same string table that the compressor used. Unfortunately, there is an abnormal case for which this algorithm does not work. The abnormal case occurs whenever an input character string containing a cyclic sequence of the form byte-code-byte-code-byte already appears in the compressor string table. The decompression algorithm is modified (see step 3 of the decompression description, below) to handle the special case. To Compress 1. Initialize the table to contain all single-byte strings 2. Read the first byte, c, from the input file. Set the prefix, w, to that byte. 3. Read the next input byte into c. 4. If at end of file then go to step 7. 5. If «w»c is in the string table then set «w» to «w»c and go to 3. 6. Output Code(«w»); Put «w»c in the string table; Set «w» to c; go to 3 7. Output Code(«w») and signify completion To Decompress 1. Read the first input code and store in both code and oldcode. Since the first code is known to be atomic (cf. 2, above), code is the encoding of c, output c, and set the extension to c. 2. Read the next code and save the value in incode. If at end of file then go to 8. 3. If the code is not in the string table, we have the special case so, output extension, set code to oldcode, and set incode to the coding of «w»c 4. If the retrieved code is the code for «w»c then push c onto the stack, set code to code of «w» and repeat this step. 5. If the code is atomic, output the byte it represents and set the extension to that byte. 6. While the stack is non-empty, output the byte on top and pop the stack. 7. Put «oldcode»c into the string table, set oldcode to incode and go to 2. 8. Signify completion. In the case of adaptive LZW (which is demonstrated by the source in this article) the compression is tested periodically, in our case every 10000 bytes, to see whether the compression ratio has declined. If it has, a reserved code is saved to the output file, the string table is reinitialized, and we start the process again. Obviously, during decompression, if the reset code is encountered then the program reinitializes the string table and then continues blithely with its processing. A hashing algorithm is employed to determine the code for a string (prefix + suffix byte). In the example program, I just shift and xor the two values. This is a very simplistic hash evaluation and will result in a lot of collisions; however, a more efficient hashing algorithm could introduce additional processing overhead in its computation. One such possibility is included as a comment. The choice of hash algorithm is yours, just be sure that you use the same one for both compression and decompression. There are a lot of places where this program could be optimized; however, most of the optimizations are at the expense of clarity and are avoided here. The major difference between this straightforward presentation of adaptive LZW and that used by Raymond Lau in his StuffIt program follows. StuffIt uses the unix compress code to do its compression with 14-bit compression being selected. The unix compress code uses up all the 9-bit codes first, then the 10-bit codes, etc. until it has used up the 14-bit codes, meanwhile packing them in VAX byte/bit order. This involves a great deal of masking and shifting that obscures the general algorithm. We are using a general string table into which we hash and always store 14-bit values in the “natural” 680x0 byte order, although we do buffer our partial bytes and pack the output data. While our example is for compressing files, it could be modified to compress and decompress structures in memory by replacing the GetByte routine on the compression side and the PutByte routine on the decompression side. All the compression code cares about is that it is fed data to be compressed one byte at a time, not from whence the data came. Similarly, all the decompression code cares about is that it be fed one-byte chunks of data to be decompressed and that it will spew the result out a byte at a time. The example programs could also be consolidated into a single application with the introduction of a less modal (and more Macish) interface, but that is not the purpose of this article. I used TMLPascal II (vers. 3.0) for this article but have also recompiled with both MPW 3.0 and Think Pascal 2.01 - Think Pascal requires some changes in the area of compiler directives, USES statements, and references to _DataInit; the MPW Pascal compiler does not require any changes to the source (just the makefile). Turbo Pascal would require a lot of changes in the source due to a number of incompatible features (OR, AND, SHL, etc. for bit operations, for example). Listing: LComp.Proj ############################################################# # # Created: Monday, October 10, 1988 4:08:42 PM # # Project Name: LComp # Project Type: Application # # Finder Type: APPL # Finder Creator: ???? # Set Bundle Bit: TRUE # # Project Source Files: # LComp.r # lcomp.p lcomp.p.o ƒ � lcomp.p TMLPascal lcomp.p LComp ƒƒ � lcomp.p.o Link -w -t ‘APPL’ -c ‘LZWC’ � lcomp.p.o � “{Libraries}”Runtime.o � “{Libraries}”Interface.o � “{TMLPLibraries}”PasLib.o � -o LComp LComp ƒƒ � LComp.r Rez -append -o LComp � LComp.r SetFile -a B LComp Listing: LComp.p {$R-} {$D+} {$DEFC DEBUG} {$SETC DEBUG=TRUE} PROGRAM LComp; { Simple case LZW compression } USES MemTypes, QuickDraw, OSIntf, ToolIntf, PackIntf; CONST maxBuff = 8192; {i/o buffer size} maxTab = 16383; {Table size minus 1 ($3FFF)} noPrev = $7FFF; eofChar = -2; endList = -1; empty = -3; clearCode = 256;{Reserved code to signal adaptive reset ($100) } checkGap = 10000; {How frequently do we check for adaptive?} TYPE StringTableEntry = RECORD prevByte: Integer; follByte: Integer; next: Integer; used: Boolean; reserved: Boolean; END; StringTableArray = ARRAY [0..maxTab] OF StringTableEntry; {128K structure unless packed} StringTablePtr = ^StringTableArray; IntPtr = ^Integer; Buffer = PACKED ARRAY [1..maxBuff] OF Char; BufPtr = ^Buffer; HeaderRecord = RECORD name: String[31]; dfSize: LongInt; rfSize: LongInt; fndrInfo: FInfo; END; Remainder = (none, sixBit, fourBit, twoBit); VAR inRef: Integer; {Reference number of input file} outRef: Integer;{Reference number of output file} inVRefNum: Integer;{Volume/WD reference num. of input file} outVRefNum: Integer;{Volume/WD reference number of output file} eofSignal: Boolean; {Flag that it’s time to clean up} inBufSize: LongInt; {Count of characters in input buffer } inputPos: Integer;{Position in input buffer} outputPos: Integer; {Position in output buffer} bytesRead: LongInt; {Total bytes read from input file} bytesWritten: LongInt; {Total bytes written to output file} ratio: Extended;{Compression ratio (bytesRead/bytesWritten)} checkPoint: LongInt;{Next time we check to see whether table adaptation necessary} inputBuffer: BufPtr;{Dynamically allocated data storage} outputBuffer: BufPtr; { “ } stringTable: StringTablePtr; infileName:Str255;{Name of the file we’re compressing} tableUsed: Integer;{Number of entries currently in string table} outputCode: Integer;{Code (14-bit) that we’re going to output} carryOver: Remainder; {How many bits we have in the code we’re building} doingDFork: Boolean;{Flag that tells which fork of the file we’re compressing} fsErr: OSErr; {Result of last file system call} dataForkSize: LongInt; {Number of bytes in data fork} rsrcForkSize: LongInt; {Number of bytes in resource fork} progWindow: WindowPtr; {Window where we display progress} boundsRect: Rect; {Bounding rect of the progress window} hdrRec: HeaderRecord; {File information so that decompress will get things right} resetCode: Integer; {This is the hashCode for clearCode} PROCEDURE _DataInit; EXTERNAL; {MPW specific} PROCEDURE FileAlert(str: Str255); CONST fsAlert =1111; VAR item: Integer; BEGIN ParamText(str, ‘’, ‘’, ‘’); item := StopAlert(fsAlert, NIL); fsErr := FSClose(inRef); fsErr := FSClose(outRef); fsErr := FlushVol(NIL, outVRefnum); END {FileAlert} ; {$IFC DEBUG} PROCEDURE DebugAlert(l1, l2: LongInt); CONST dbgAlert = 1112; VAR s1, s2: Str255; item: Integer; BEGIN NumToString(l1, s1); NumToString(l2, s2); ParamText(s1, s2, ‘’, ‘’); item := NoteAlert(dbgAlert, NIL); END {DebugAlert} ; {$ENDC} PROCEDURE ShowProgress; VAR savePort: GrafPtr; aStr: Str255; BEGIN GetPort(savePort); SetPort(progWindow); EraseRect(progWindow^.portRect); NumToString(bytesWritten, aStr); MoveTo(5, 10); DrawString(aStr); NumToString(bytesRead, aStr); MoveTo(5, 25); DrawString(aStr); NumToString(tableUsed, aStr); MoveTo(5, 40); DrawString(aStr); SetPort(savePort); END {ShowProgress} ; FUNCTION HashIt(prevC, follC: Integer): Integer; {“Dumb” hash routine, must match the routine in decompress} VAR temp, local: LongInt; BEGIN {Possible alternative commented out below} {local := BOR((prevC+follC), $00008000); temp := local * local; local := BAND(BSR(temp, 7), maxTab);} HashIt := BAND(BXOR(BSL(prevC, 5), follC), maxTab); END {HashIt} ; FUNCTION GetHashCode(prevC, follC: Integer): Integer; { Return value is the hash code for <w>c string } VAR index: Integer; index2: Integer; BEGIN index := HashIt(prevC, follC); {If the entry isn’t already used we have a hash code} IF (stringTable^[index].used) THEN BEGIN {Entry already used, skip to end of collision list} WHILE stringTable^[index].next <> endList DO index := stringTable^[index].next; {Begin a linear probe down a bit from last entry in the collision list} index2 := BAND(index + 101, maxTab); {Look for an unused entry using linear probing} WHILE stringTable^[index2].used DO index2 := BAND(Succ(index2), maxTab); {Point previous end of collision list at this new node} stringTable^[index].next := index2; GetHashCode := index2; END ELSE GetHashCode := index; END {GetHashCode} ; PROCEDURE MakeTableEntry(prevC, follC: Integer); VAR aCode: Integer; BEGIN IF tableUsed <= maxTab THEN BEGIN aCode := GetHashCode(prevC, follC); WITH stringTable^[aCode] DO BEGIN used := true; next := endList; prevByte := prevC; follByte := follC; END; tableUsed := tableUsed + 1; END; END {MakeTableEntry} ; FUNCTION LookupString(prevC, follC: Integer): Integer; VAR index: Integer; found: Boolean; myEntry: StringTableEntry; BEGIN index := HashIt(prevC, follC); LookupString := endList; found := FALSE; {Search list of collision entries for one that matches <w>c} REPEAT myEntry := stringTable^[index]; IF (myEntry.prevByte = prevC) & (myEntry.follByte = follC) THEN found := true ELSE index := myEntry.next; UNTIL found OR (index = endList); { Return index if <w>c found, endList otherwise } IF found THEN LookupString := index; END {LookupString} ; PROCEDURE GetChar(VAR c: Integer); { Read a character from the input file. If the input file is the data fork and at the end. Close it and open the resource fork, inputting from it. } VAR logEOF: LongInt; BEGIN inputPos := inputPos + 1; IF inputPos > inBufSize THEN BEGIN inBufSize := maxBuff; fsErr := FSRead(inRef, inBufSize, Ptr(inputBuffer)); inputPos := 1; END; IF inBufSize = 0 THEN BEGIN {We’re in a possible eof situation} IF doingDFork THEN BEGIN {Check for resource fork} doingDFork := false; fsErr := FSClose(inRef); fsErr := OpenRF(infileName, inVRefnum, inRef); IF fsErr = noErr THEN BEGIN fsErr := GetEOF(inRef, logEOF); rsrcForkSize := logEOF; hdrRec.rfSize := logEOF; fsErr := SetFPos(inRef, fsFromStart, 0); inputPos := 1; inBufSize := maxBuff; fsErr := FSRead(inRef, inBufSize, Ptr(inputBuffer)); IF inBufSize = 0 THEN BEGIN {Empty resource fork} c := eofChar; eofSignal := true; END ELSE BEGIN c := Ord(inputBuffer^[inputPos]); bytesRead := bytesRead + 1; END; END ELSE BEGIN {No resource fork, we’re done!} rsrcForkSize := 0; hdrRec.rfSize := 0; eofSignal := true; c := eofChar; Exit(GetChar); END; END ELSE BEGIN {We are done, eof has been reached!} eofSignal := true; c := eofChar; END; END ELSE BEGIN c := Ord(inputBuffer^[inputPos]); bytesRead := bytesRead + 1; END; END {GetChar} ; PROCEDURE PutChar(c: Integer); VAR count: LongInt; BEGIN IF outputPos >= maxBuff THEN BEGIN count := maxBuff; fsErr := FSWrite(outRef, count, Ptr(outputBuffer)); IF fsErr <> noErr THEN FileAlert(‘Write error in PutChar’); outputPos := 0; ShowProgress; END; outputPos := outputPos + 1; bytesWritten := bytesWritten + 1; outputBuffer^[outputPos] := Chr(c); END {PutChar} ; PROCEDURE InitStrTable; VAR i: Integer; BEGIN tableUsed := 0; FOR i := 0 TO maxTab DO BEGIN WITH stringTable^[i] DO BEGIN prevByte := noPrev; follByte := noPrev; next := -1; used := false; reserved := false; END; END; {Enter all single ascii characters into the string table} FOR i := 0 TO clearCode DO MakeTableEntry(noPrev, i); END {InitStrTable} ; PROCEDURE Initialize; PROCEDURE InitManagers; BEGIN MaxApplZone; InitGraf(@thePort); InitFonts; FlushEvents(everyEvent, 0); InitWindows; InitMenus; TEInit; InitDialogs(NIL); InitCursor; UnLoadSeg(@_DataInit); {MPW-specific unload} END {InitManagers} ; BEGIN InitManagers; inputBuffer := BufPtr(NewPtr(SizeOf(Buffer))); outputBuffer := BufPtr(NewPtr(SizeOf(Buffer))); stringTable := StringTablePtr(NewPtr(SizeOf(StringTableArray))); inBufSize := 0; inputPos := 1; {With inBufSize set to zero this will force the 1st read} outputPos := 0; bytesRead := 0; bytesWritten := 0; doingDFork := true; outputCode := empty; carryOver := none; dataForkSize := 0; rsrcForkSize := 0; ratio := 0.0; checkPoint := checkGap; InitStrTable; resetCode := LookupString(noPrev, clearCode); END {Initialize} ; PROCEDURE GetTopLeft({using} dlogID: Integer; {returning} VAR where: Point); { -- Return the point where DLOG(dlogID) should have its top-left corner so as to be centered in the area below the menubar of the main screen. The centering is horizontal, vertically it should be one-third of the way. This is achieved by getting the DLOG resource and centering its rectangle within screenBits.bounds after adjusting screenBits.bounds by mBarHeight. } CONST mBarHeight = $0BAA; {Address of global integer containing menu bar height} VAR screenRect, dlogRect:Rect; mBarAdjustment: IntPtr; aDlog: DialogTHndl; BEGIN screenRect := screenBits.bounds; mBarAdjustment := IntPtr(mBarHeight); screenRect.top := screenRect.top + mBarAdjustment^; aDlog := DialogTHndl(GetResource(‘DLOG’, dlogID)); DetachResource(Handle(aDlog)); dlogRect := aDlog^^.boundsRect; WITH screenRect DO BEGIN where.v := ((bottom - top) - (dlogRect.bottom - dlogRect.top)) DIV 3; where.h := ((right - left) - (dlogRect.right - dlogRect.left)) DIV 2; END; END {GetTopLeft}; FUNCTION GetInputFile({returning} VAR refNum: Integer): Boolean; { -- Return false if the user cancels the request, true otherwise. If a file is selected for compression, open the file and pass back the refnum. The constant getDlgID is from PackIntf. Global side-effects of this routine include the initialization of a number of fields of the hdrRec global and the setting of the inVRefNum global.} CONST allFiles = -1; VAR tl: Point; reply: SFReply; typeList: SFTypeList; anErr, error: OSErr; finderInfo: FInfo; logEOF: LongInt; dtRec: DateTimeRec; BEGIN GetTopLeft(getDlgID, tl); {typeList doesn’t need to be initialized since we’re asking for all files with the -1} SFGetFile(tl, ‘’, NIL, allFiles, typeList, NIL, reply); IF reply.good THEN BEGIN error := FSOpen(reply.fName, reply.vRefnum, refNum); inVRefNum := reply.vRefnum; IF error = noErr THEN error := SetFPos(refNum, fsFromStart, 0) ELSE anErr := FSClose(refNum); IF error = noErr THEN BEGIN GetInputFile := true; infileName := reply.fName; anErr := GetEOF(refNum, logEOF); dataForkSize := logEOF; rsrcForkSize := 0;{for the moment -- corrected when the resource fork is opened} hdrRec.name := infileName; hdrRec.dfSize := dataForkSize; anErr := GetFInfo(reply.fName, inVRefnum, finderInfo); hdrRec.fndrInfo := finderInfo; END ELSE GetInputFile := false; END ELSE GetInputFile := false; END {GetInputFile} ; FUNCTION GetOutputFile({returning} VAR refNum: Integer): Boolean; VAR tl: Point; reply: SFReply; error: OSErr; count: LongInt; BEGIN GetTopLeft(putDlgID, tl); SFPutFile(tl, ‘’, ‘’, NIL, reply); IF reply.good THEN BEGIN error := FSOpen(reply.fName, reply.vRefnum, refNum); IF error <> noErr THEN BEGIN {File didn’t already exist, need to create it} error := Create(reply.fName, reply.vRefnum, ‘LZWC’, ‘DATA’); IF error = noErr THEN error := FSOpen(reply.fName, reply.vRefnum, refNum); IF error = noErr THEN BEGIN error := SetFPos(refNum, fsFromStart, 0); count := SizeOf(HeaderRecord); error := FSWrite(refNum, count, @hdrRec); END ELSE error := FSClose(refNum); END; IF error = noErr THEN BEGIN GetOutputFile := true; outVRefNum := reply.vRefnum; END ELSE GetOutputFile := false; END ELSE GetOutputFile := false; END {GetOutputFile} ; PROCEDURE Terminate; VAR count: LongInt; BEGIN ShowProgress; count := outputPos; fsErr := FSWrite(outRef, count, Ptr(outputBuffer)); IF fsErr = noErr THEN BEGIN fsErr := SetEOF(outRef, bytesWritten+SizeOf(HeaderRecord)); IF fsErr = noErr THEN BEGIN fsErr := SetFPos(outRef, fsFromStart, 0); IF fsErr = noErr THEN BEGIN count := SizeOf(HeaderRecord); fsErr := FSWrite(outRef, count, @hdrRec); IF (fsErr <> noErr) | (count <> SizeOf(hdrRec)) THEN FileAlert(‘Header update error in Terminate’); END ELSE FileAlert(‘Positioning error in Terminate’); fsErr := FSClose(outRef); fsErr := FSClose(inRef); fsErr := FlushVol(NIL, outVRefNum); END ELSE FileAlert(‘SetEOF Error in Terminate’); END ELSE FileAlert(‘Write Error in Terminate’); END {Terminate} ; PROCEDURE PutCode(hashCode: Integer); { If the output code word is empty, then put out the first 8 bits of the compression code and save the last six bits for the next time through. If it’s not empty, then put out the (saved) n bits from above prepended to the first 8-n bits of the new code. Then put out the last eight bits of this code. } BEGIN IF carryOver = none THEN BEGIN PutChar(BAND(BSR(hashCode, 6), $00FF)); {most significant 8 bits} outputCode := BAND(hashCode, $003F);{save 6 lsb for next time} carryOver := sixBit; END ELSE IF carryOver = twoBit THEN BEGIN PutChar(BAND(BSL(outputCode, 6), $00C0) + BAND(BSR(hashCode, 8), $003F)); {leftover 2 + first 6} PutChar(BAND(hashCode, $00FF)); {least significant 8 bits} outputCode := empty;{nothing left} carryOver := none; END ELSE IF carryOver = fourBit THEN BEGIN PutChar(BAND(BSL(outputCode, 4), $00F0) + BAND(BSR(hashCode, 10), $000F)); {leftover 4 + 4 msbits} PutChar(BAND(BSR(hashCode, 2), $00FF)); {next 8 bits} outputCode := BAND(hashCode, $0003);{save these two bits} carryOver := twoBit; END ELSE IF carryOver = sixBit THEN BEGIN PutChar(BAND(BSL(outputCode, 2), $00FC) + BAND(BSR(hashCode, 12), $0003)); {leftover 6 + first 2 bits} PutChar(BAND(BSR(hashCode, 4), $00FF)); {next 8 bits} outputCode := BAND(hashCode, $000F);{four bits left} carryOver := fourBit; END; END {PutCode} ; PROCEDURE CheckReset; { -- CheckReset tests the compression ratio to guarantee that it is monotonic increasing. It modifies the global variables ratio and checkPoint. If the compression ratio has decreased since the last checkPoint, the string table is reinitialized, the code for a clearCode is issued to the output, and ratio is reset to zero. } VAR e1, e2, temp: Extended; BEGIN { Set the next checkPoint for checkGap from now } checkPoint := bytesRead + checkGap; e1 := bytesRead; e2 := bytesWritten; temp := e1 / e2; IF temp >= ratio THEN ratio := temp ELSE BEGIN ratio := 0.0; InitStrTable; PutCode(resetCode); END; END {CheckReset} ; PROCEDURE DoCompression; VAR c: Integer; w: Integer; wc: Integer; anEvent: EventRecord; BEGIN GetChar(c); w := LookupString(noPrev, c); GetChar(c); WHILE c <> eofChar DO BEGIN wc := LookupString(w, c); IF (wc = endList) THEN BEGIN PutCode(w); IF GetNextEvent(everyEvent, anEvent) THEN ; IF tableUsed <= maxTab THEN MakeTableEntry(w, c) ELSE IF bytesRead >= checkPoint THEN CheckReset; w := LookupString(noPrev, c) END ELSE w := wc; GetChar(c); END; PutCode(w); {Flush any remaining partial code to disk} IF carryOver = sixBit THEN PutChar(BAND(BSL(outputCode, 2), $00FC)) ELSE IF carryOver = fourBit THEN PutChar(BAND(BSL(outputCode, 4), $00F0)) ELSE IF carryOver = twoBit THEN PutChar(BAND(BSL(outputCode, 6), $00C0)); END {DoCompression} ; BEGIN Initialize; IF GetInputFile(inRef) THEN IF GetOutputFile(outRef) THEN BEGIN SetRect(boundsRect, 100, 50, 250, 100); progWindow := NewWindow(NIL, boundsRect, ‘Bytes Read’, true, noGrowDocProc, Pointer(-1), false, 0); DoCompression; Terminate; {$IFC DEBUG} DebugAlert(bytesRead, bytesWritten); {$ENDC} END; END. Listing: LComp.r #include “Types.r” #include “SysTypes.r” resource ‘ALRT’ (1111, “FileSytem Alert”, preload, nonpurgeable) { {100, 100, 250, 400}, 1111, { OK, visible, silent; OK, visible, silent; OK, visible, silent; OK, visible, silent } }; resource ‘DITL’ (1111, preload, nonpurgeable) { { /*1*/ {115, 75, 135, 135}, button { enabled, “OK” }; /*2*/ {30, 80, 60, 290}, StaticText { disabled, “FileSystem Error: ^0” } } }; resource ‘ALRT’ (1112, “Debugging Alert”, preload, nonpurgeable) { {100, 100, 250, 400}, 1112, { OK, visible, silent; OK, visible, silent; OK, visible, silent; OK, visible, silent } }; resource ‘DITL’ (1112, preload, nonpurgeable) { { /*1*/ {115, 75, 135, 135}, button { enabled, “OK” }; /*2*/ {30, 80, 45, 290}, StaticText { disabled, “^0” }; /*3*/ {46, 80, 61, 290}, StaticText { disabled, “^1” } } }; resource ‘BNDL’ (1001) { ‘LZWC’, 0, { /* array TypeArray: 2 elements */ /* [1] */ ‘ICN#’, { /* array IDArray: 1 element */ /* [1] */ 0, 1001 }, /* [2] */ ‘FREF’, { /* array IDArray: 1 element */ /* [1] */ 0, 1001 } } }; resource ‘FREF’ (1001) { ‘APPL’, 0, “” }; resource ‘ICN#’ (1001, “LZWC APPL”, purgeable) { { /* array: 2 elements */ /* [1] */ $”0000 0000 0000 0000 0000 0000 0000 0300" $”0000 0C80 0000 3040 0000 C820 0003 0410" $”000C 8208 0030 4104 00C8 208E 0104 105A” $”0282 0862 0441 05A2 0820 8622 1010 5A22" $”2008 6222 4005 A222 FFFE 2222 9292 2226" $”9292 2238 9292 2260 9292 2380 9292 2600" $”9292 3800 9292 6000 9293 8000 FFFE”, /* [2] */ $”0000 0000 0000 0000 0000 0000 0000 0300" $”0000 0F80 0000 3FC0 0000 FFE0 0003 FFF0" $”000F FFF8 003F FFFC 00FF FFFE 01FF FFFE” $”03FF FFFE 07FF FFFE 0FFF FFFE 1FFF FFFE” $”3FFF FFFE 7FFF FFFE FFFF FFFE FFFF FFFE” $”FFFF FFF8 FFFF FFE0 FFFF FF80 FFFF FE00" $”FFFF F800 FFFF E000 FFFF 8000 FFFE” } }; data ‘LZWC’ (0) { /* © 1988 Claris Corp by Dennis Cohen */ $”1CA9 2031 3938 3820 436C 6172 6973 2043" $”6F72 7020 6279 2044 656E 6E69 63" }; Listing: LDecomp.Proj ############################################################# # # Created: Monday, October 10, 1988 6:05:59 PM # # Project Name: LDecomp # Project Type: Application # # Finder Type: APPL # Finder Creator: LZWD # Set Bundle Bit: TRUE # # Project Source Files: # lDecomp.p # LDecomp.r lDecomp.p.o ƒ � lDecomp.p TMLPascal lDecomp.p LDecomp ƒƒ � lDecomp.p.o Link -w -t ‘APPL’ -c ‘LZWD’ � lDecomp.p.o � “{Libraries}”Runtime.o � “{Libraries}”Interface.o � “{TMLPLibraries}”PasLib.o � “{TMLPLibraries}”SANELib.o � -o LDecomp LDecomp ƒƒ � LDecomp.r Rez -append -o LDecomp � LDecomp.r SetFile -a B LDecomp Listing: lDecomp.p {$R-} {$DEFC DEBUG} {$SETC DEBUG=TRUE} PROGRAM LDecomp; { Adaptive LZW decompression } USES MemTypes, QuickDraw, OSIntf, ToolIntf, PackIntf; CONST maxBuff = 8192; {i/o buffer size} tableSize = 16383;{Table size minus 1, 14 bits for 0-based array} noPrev = $7FFF; {First entry in chain} eofChar = -2; {Got to end of input file} endList = -1; {End of chain} empty = -3;{Table entry is unused} clearCode = 256;{Reserved code signalling adaptive reset} maxStack = 4096;{Handles up to 16MB repetition before overflow} TYPE {With some older compilers, you’ll need to break the following into multiple arrays since they won’t allow data structure definitions larger than 32K bytes} StringTableEntry = RECORD prevChar: Integer; followingByte: Integer; next: Integer; used: Boolean; reserved: Boolean; END; StringTableArray = ARRAY [0..tableSize] OF StringTableEntry; {128K structure unless packed} StringTablePtr = ^StringTableArray; IntPtr = ^Integer; Buffer = PACKED ARRAY [1..maxBuff] OF Char; BufPtr = ^Buffer; HeaderRecord = RECORD name: String[31]; dfSize: LongInt; rfSize: LongInt; fndrInfo: FInfo; END; StackType = ARRAY [1..maxStack] OF Integer; StkPtr = ^StackType; Remainder = (none, sixBit, fourBit, twoBit); VAR inRef: Integer; {File reference number of the input file} outRef: Integer;{File reference number of the output file} outVRefNum: Integer;{Volume/WD reference number of output file} eofSignal: Boolean; inBufSize: Integer; {Count of characters in input buffer} inputPos: Integer;{Current position in the input buffer} outputPos: Integer; {Current position in output buffer} bytesRead: LongInt; {Total bytes read from input file} bytesWritten: LongInt; {Total bytes written to output file} bytesInBuffer: LongInt; {Number of bytes read into input buffer at last attempt} inputBuffer: BufPtr;{Where we read the compressed data} outputBuffer: BufPtr; {Where we write the uncompressed data} stringTable: StringTablePtr; {Pointer to memory structure} outfileName: Str255;{Name of file that we’re recreating} tableUsed: Integer; {How many entries currently in string table} inputCode: Integer; {The 14-bit code that we’re working on} carryOver: Remainder; {How many bits are to be prepended to next input byte} doingDFork: Boolean;{Flag to tell which fork of the file we’re decompressing} fsErr: OSErr; {For file system calls} dataForkSize: LongInt; {Size of data fork we will decompress} rsrcForkSize: LongInt; {Size of resource fork we will decompress} progWindow: WindowPtr; {Window for debugging/progress information} boundsRect: Rect; {Rectangle for creating progress window} stackPointer: Integer; {Index into decode stack array} stack: StkPtr; {Pointer into decode stack array} hdrRec: HeaderRecord; {Our header that tells about the file we’re decompressing} PROCEDURE _DataInit; EXTERNAL; {Comment this out for THINK Pascal} PROCEDURE FileAlert(str: Str255); << Same as in Listing: LComp.p >> {$IFC DEBUG} PROCEDURE DebugAlert(l1, l2: LongInt); << Same as in Listing: LComp.p >> {$ENDC} PROCEDURE ShowProgress; << Same as in Listing: LComp.p >> FUNCTION HashIt(prevC, follC: Integer): Integer; {You can come up with much better hash functions, just make sure that both the compression and decompression programs use the same one.} VAR temp, local: LongInt; BEGIN {local := BOR((prevC+follC), $00008000); temp := local * local; local := BAND(BSR(temp, 7), tableSize);} HashIt := BAND(BXOR(BSL(prevC, 5), follC), tableSize); END {HashIt} ; FUNCTION GetHashCode(prevC, follC: Integer): Integer; { Return value is the hash code for <w>c string } VAR index: Integer; index2: Integer; BEGIN index := HashIt(prevC, follC); {If the entry isn’t already used we have a hash code} IF (stringTable^[index].used) THEN BEGIN {Entry already used, skip to end of collision list} WHILE stringTable^[index].next <> endList DO index := stringTable^[index].next; {Begin a linear probe down a bit from last entry in the collision list} index2 := BAND(index + 101, tableSize); {Look for an unused entry using linear probing} WHILE stringTable^[index2].used DO index2 := BAND(Succ(index2), tableSize); {Point the previous end of collision list at this new node} stringTable^[index].next := index2; GetHashCode := index2; END ELSE GetHashCode := index; END {GetHashCode} ; PROCEDURE MakeTableEntry(prevC, follC: Integer); {We could put the conditional test before each call to MakeTableEntry instead of inside the routine} VAR aCode: Integer; BEGIN IF tableUsed <= tableSize THEN BEGIN aCode := GetHashCode(prevC, follC); WITH stringTable^[aCode] DO BEGIN used := true; next := endList; prevChar := prevC; followingByte := follC; END; tableUsed := tableUsed + 1; END; END {MakeTableEntry} ; FUNCTION LookupString(prevC, follC: Integer): Integer; VAR index: Integer; found: Boolean; BEGIN index := HashIt(prevC, follC); LookupString := endList; found := FALSE; {Search list of collision entries for one that matches <w>c } REPEAT IF (stringTable^[index].prevChar = prevC) & (stringTable^[index].followingByte = follC) THEN found := true ELSE index := stringTable^[index].next; UNTIL found OR (index = endList); { Return index if <w>c found, endList otherwise } IF found THEN LookupString := index; END {LookupString} ; PROCEDURE GetByte(VAR c: Integer); { -- Read a character from the input file. Make sure the compiler doesn’t sign -- extend anything. -- Parameter --c output -- Globals affected --inputPos, bytesInBuffer, inputBuffer^ (global because no statics in Pascal) --bytesRead} VAR count: LongInt; error: OSErr; BEGIN inputPos := inputPos + 1; { This will force a read the first time through and every time after that where inputPos has “cycled back” to 0 } IF inputPos > bytesInBuffer THEN BEGIN bytesInBuffer := maxBuff; error := FSRead(inRef, bytesInBuffer, Ptr(inputBuffer)); inputPos := 1; END; IF bytesInBuffer = 0 THEN BEGIN c := eofChar; eofSignal := true; END ELSE BEGIN bytesRead := bytesRead + 1; c := Ord(inputBuffer^[inputPos]); END; END {GetByte} ; PROCEDURE PutByte(c: Integer); VAR count: LongInt; error: OSErr; BEGIN IF outputPos = maxBuff THEN BEGIN count := maxBuff; error := FSWrite(outRef, count, Ptr(outputBuffer)); outputPos := 0; ShowProgress; END; IF doingDFork AND (bytesWritten >= dataForkSize) AND (NOT eofSignal) THEN BEGIN doingDFork := false; dataForkSize := bytesWritten; IF outputPos > 0 THEN BEGIN count := outputPos; error := FSWrite(outRef, count, Ptr(outputBuffer)); END; error := SetEOF(outRef, bytesWritten); outputPos := 0; error := FSClose(outRef); IF rsrcForkSize > 0 THEN BEGIN {only need to open it if we have something to write} error := OpenRF(outfileName, outVRefNum, outRef); IF error <> noErr THEN FileAlert(‘Error opening resource fork’); error := SetFPos(outRef, fsFromStart, 0); END; END; outputPos := outputPos + 1; outputBuffer^[outputPos] := Chr(c); bytesWritten := bytesWritten + 1; END {PutByte} ; PROCEDURE InitStrTable; VAR i: Integer; BEGIN tableUsed := 0; FOR i := 0 TO tableSize DO WITH stringTable^[i] DO BEGIN prevChar := noPrev; followingByte := noPrev; next := -1; used := false; reserved := false; END; {Enter all single ascii characters into the string table} FOR i := 0 TO clearCode DO MakeTableEntry(noPrev, i); END {InitStrTable} ; PROCEDURE Initialize; PROCEDURE InitManagers; BEGIN MaxApplZone; InitGraf(@thePort); InitFonts; FlushEvents(everyEvent, 0); InitWindows; InitMenus; TEInit; InitDialogs(NIL); InitCursor; UnLoadSeg(@_DataInit); {MPW-specific unload, comment out for THINK Pascal} END {InitManagers} ; BEGIN InitManagers; inputBuffer := BufPtr(NewPtr(SizeOf(Buffer))); IF inputBuffer = NIL THEN ExitToShell; outputBuffer := BufPtr(NewPtr(SizeOf(Buffer))); IF outputBuffer = NIL THEN ExitToShell; stringTable := StringTablePtr(NewPtr(SizeOf(StringTableArray))); IF stringTable = NIL THEN ExitToShell; inputPos := 0; outputPos := 0; inBufSize := 0; bytesRead := 0; bytesWritten := 0; bytesInBuffer := 0; doingDFork := true; inputCode := empty; carryOver := none; InitStrTable; END {Initialize} ; PROCEDURE GetTopLeft({using} dlogID: Integer; {returning} VAR where: Point); { -- Return the point where DLOG(dlogID) should have its top-left corner so as to be centered in the area below the menubar of the main screen. The centering is horizontal, vertically it should be one-third of the way. This is achieved by getting the DLOG resource and centering its rectangle within screenBits.bounds after adjusting screenBits.bounds by mBarHeight. } CONST {Probably should use Script Mgr. routine, GetMBarHeight, instead mBarHeight = $0BAA;{Address of global integer containing menu bar height} VAR screenRect, dlogRect:Rect; mBarAdjustment: IntPtr; aDlog: DialogTHndl; BEGIN screenRect := screenBits.bounds; mBarAdjustment := IntPtr(mBarHeight); screenRect.top := screenRect.top + mBarAdjustment^; aDlog := DialogTHndl(GetResource(‘DLOG’, dlogID)); DetachResource(Handle(aDlog)); dlogRect := aDlog^^.boundsRect; WITH screenRect DO BEGIN where.v := ((bottom - top) - (dlogRect.bottom - dlogRect.top)) DIV 3; where.h := ((right - left) - (dlogRect.right - dlogRect.left)) DIV 2; END; END {GetTopLeft}; FUNCTION GetInputFile({returning} VAR refNum: Integer): Boolean; { -- Return false if the user cancels, the request, true otherwise. If a file is selected for compression, open the file and pass back the refnum. The constant getDlgID is from PackIntf. Global side-effects of this routine include the initialization of a number of fields of the hdrRec global and the setting of the inVRefNum global.} CONST allFiles = -1; VAR tl: Point; reply: SFReply; typeList: SFTypeList; anErr, error: OSErr; finderInfo: FInfo; count: LongInt; dtRec: DateTimeRec; BEGIN GetTopLeft(getDlgID, tl); {typeList doesn’t need to be initialized since we’re asking for all files with the -1} SFGetFile(tl, ‘’, NIL, allFiles, typeList, NIL, reply); IF reply.good THEN BEGIN error := FSOpen(reply.fName, reply.vRefnum, refNum); IF error = noErr THEN error := SetFPos(refNum, fsFromStart, 0) ELSE anErr := FSClose(refNum); IF error = noErr THEN BEGIN GetInputFile := true; count := SizeOf(HeaderRecord); error := FSRead(refNum, count, @hdrRec); IF error = noErr THEN BEGIN dataForkSize := hdrRec.dfSize; rsrcForkSize := hdrRec.rfSize; END ELSE BEGIN anErr := FSClose(refNum); GetInputFile := false; END; END ELSE GetInputFile := false; END ELSE GetInputFile := false; END {GetInputFile} ; FUNCTION GetOutputFile({returning} VAR refNum: Integer): Boolean; VAR tl: Point; reply: SFReply; error: OSErr; count: LongInt; BEGIN GetTopLeft(putDlgID, tl); SFPutFile(tl, ‘’, hdrRec.name, NIL, reply); IF reply.good THEN BEGIN outfileName := reply.fName; error := FSOpen(reply.fName, reply.vRefnum, refNum); IF error <> noErr THEN BEGIN {File didn’t already exist, need to create it} error := Create(reply.fName, reply.vRefnum, hdrRec.fndrInfo.fdCreator, hdrRec.fndrInfo.fdType); IF error = noErr THEN IF hdrRec.dfSize > 0 THEN error := FSOpen(reply.fName, reply.vRefnum, refNum) ELSE BEGIN error := OpenRF(reply.fName, reply.vRefNum, refNum); doingDFork := false; END; IF error = noErr THEN error := SetFPos(refNum, fsFromStart, 0); END; IF error = noErr THEN BEGIN GetOutputFile := true; outVRefNum := reply.vRefnum; END ELSE GetOutputFile := false; END ELSE GetOutputFile := false; END {GetOutputFile} ; PROCEDURE Terminate; VAR count: LongInt; BEGIN ShowProgress; IF outputPos > 0 THEN BEGIN count := outputPos; fsErr := FSWrite(outRef, count, Ptr(outputBuffer)); IF fsErr = noErr THEN BEGIN IF doingDFork THEN BEGIN dataForkSize := bytesWritten; fsErr := SetEOF(outRef, dataForkSize); END ELSE IF rsrcForkSize > 0 THEN BEGIN rsrcForkSize := bytesWritten - dataForkSize; fsErr := SetEOF(outRef, rsrcForkSize); END; IF fsErr <> noErr THEN FileAlert(‘SetEOF Error in Terminate’); END ELSE FileAlert(‘Write Error in Terminate’); END; fsErr := FSClose(outRef); fsErr := FlushVol(NIL, outVRefNum); fsErr := FSClose(inRef); END {Terminate} ; PROCEDURE GetCode(VAR hashCode: Integer); VAR localBuf, localBuf2: Integer; BEGIN CASE carryOver OF none: {get two bytes and return 14 ms bits, carry over two least} BEGIN GetByte(localBuf); IF (localBuf = eofChar) THEN BEGIN hashCode := eofChar; Exit(GetCode); END; GetByte(inputCode); IF (inputCode = eofChar) THEN BEGIN hashCode := eofChar; Exit(GetCode); END; hashCode := BAND(BSL(localBuf, 6), $3FC0) + BAND(BSR(inputCode, 2), $003F); inputCode := BAND(inputCode, $0003); carryOver := twoBit; END; twoBit:{have two bits, get two bytes, return 14 ms bits, save 4 ls bits} BEGIN GetByte(localBuf); IF (localBuf = eofChar) THEN BEGIN hashCode := eofChar; Exit(GetCode); END; GetByte(localBuf2); IF (localBuf2 = eofChar) THEN BEGIN hashCode := eofChar; Exit(GetCode); END; hashCode := BAND(BSL(inputCode, 12), $3000) + BAND(BSL(localBuf, 4), $0FF0) + BAND(BSR(localBuf2, 4), $000F); inputCode := BAND(localBuf2, $000F); carryOver := fourBit; END; fourBit: {Have four bits, get two bytes, return 14 ms bits, save 6 ls bits} BEGIN GetByte(localBuf); IF (localBuf = eofChar) THEN BEGIN hashCode := eofChar; Exit(GetCode); END; GetByte(localBuf2); IF (localBuf2 = eofChar) THEN BEGIN hashCode := eofChar; Exit(GetCode); END; hashCode := BAND(BSL(inputCode, 10), $3C00) + BAND(BSL(localBuf, 2), $03FC) + BAND(BSR(localBuf2, 6), $0003); inputCode := BAND(localBuf2, $003F); carryOver := sixBit; END; sixBit:{have six bits, get a byte, return the 14 bits, carry nothing} BEGIN GetByte(localBuf); IF (localBuf = eofChar) THEN BEGIN hashCode := eofChar; Exit(GetCode); END; hashCode := BAND(BSL(inputCode, 8), $3F00) + BAND(localBuf, $00FF); inputCode := empty; carryOver := none; END; END; END {GetCode} ; PROCEDURE Push(c: Integer); BEGIN stackPointer := stackPointer + 1; stack^[stackPointer] := c; IF (stackPointer >= maxStack) THEN BEGIN {If this happens, you’ve typed something in wrong -- would take a degenerate case of over 16MB in size to do so otherwise} FileAlert(‘***STACK OVERFLOW***’); END; END {Push} ; PROCEDURE Pop(VAR c: Integer); BEGIN IF stackPointer > 0 THEN BEGIN c := stack^[stackPointer]; stackPointer := stackPointer - 1; END ELSE c := empty; END {Pop} ; PROCEDURE DoDecompression; VAR c: Integer; code: Integer; oldCode: Integer; finalByte: Integer; inCode: Integer; lastChar: Integer; unknown: Boolean; tempC: Integer; resetCode: Integer; anEvent: EventRecord; BEGIN {Initialize things and “prime the pump”} stackPointer := 0; stack := StkPtr(NewPtr(SizeOf(StackType))); unknown := false; {First string is always known as it is a single char} resetCode := LookupString(noPrev, clearCode); GetCode(oldCode); code := oldCode; c := stringTable^[code].followingByte; PutByte(c); finalByte := c; {Now, we get down to work} GetCode(inCode); WHILE inCode <> eofChar DO BEGIN code := inCode; IF (NOT stringTable^[code].used) THEN BEGIN lastChar := finalByte; code := oldCode; unknown := true; END; { Run through code extracting single bytes until no more bytes can be removed. Push these onto the stack. They will be entered in reverse order and will come out in proper order when popped. } WHILE (stringTable^[code].prevChar <> noPrev) DO WITH stringTable^[code] DO BEGIN Push(followingByte); code := prevChar; END; { We now have the first byte in the string. } finalByte := stringTable^[code].followingByte; PutByte(finalByte); { Now pop everything off the stack } Pop(tempC); WHILE tempC <> empty DO BEGIN PutByte(tempC); Pop(tempC); END; { If the code isn’t known, then output the follower byte of the last byte in the string. } IF unknown THEN BEGIN finalByte := lastChar; PutByte(finalByte); unknown := false; END; IF GetNextEvent(everyEvent, anEvent) THEN ; MakeTableEntry(oldCode, finalByte); oldCode := inCode; GetCode(inCode); IF (inCode = resetCode) THEN BEGIN {Compression ratio dropped, time to build a new table} InitStrTable; GetCode(oldCode); c := stringTable^[oldCode].followingByte; PutByte(c); finalByte := c; GetCode(inCode); END; END; END {DoDecompression} ; BEGIN Initialize; IF GetInputFile(inRef) THEN IF GetOutputFile(outRef) THEN BEGIN SetRect(boundsRect, 100, 50, 250, 100); progWindow := NewWindow(NIL, boundsRect, ‘Bytes Read’, true, noGrowDocProc, Pointer(-1), false, 0); DoDecompression; Terminate; {$IFC DEBUG} DebugAlert(bytesRead, bytesWritten); {$ENDC} END; END. Listing: LDecomp.r #include “Types.r” #include “SysTypes.r” resource ‘ALRT’ (1111, “FileSytem Alert”, preload, nonpurgeable) { {100, 100, 250, 400}, 1111, { OK, visible, silent; OK, visible, silent; OK, visible, silent; OK, visible, silent } }; resource ‘DITL’ (1111, preload, nonpurgeable) { { /*1*/ {115, 75, 135, 135}, button { enabled, “OK” }; /*2*/ {30, 80, 60, 290}, StaticText { disabled, “FileSystem Error: ^0” } } }; resource ‘ALRT’ (1112, “Debugging Alert”, preload, nonpurgeable) { {100, 100, 250, 400}, 1112, { OK, visible, silent; OK, visible, silent; OK, visible, silent; OK, visible, silent } }; resource ‘DITL’ (1112, preload, nonpurgeable) { { /*1*/ {115, 75, 135, 135}, button { enabled, “OK” }; /*2*/ {30, 80, 45, 290}, StaticText { disabled, “^0” }; /*3*/ {46, 80, 61, 290}, StaticText { disabled, “^1” } } }; resource ‘BNDL’ (1001) { ‘LZWD’, 0, { /* array TypeArray: 2 elements */ /* [1] */ ‘ICN#’, { /* array IDArray: 1 element */ /* [1] */ 0, 1001 }, /* [2] */ ‘FREF’, { /* array IDArray: 1 element */ /* [1] */ 0, 1001 } } }; resource ‘FREF’ (1001) { ‘APPL’, 0, “” }; resource ‘ICN#’ (1001, “LZWD APPL”, purgeable) { { /* array: 2 elements */ /* [1] */ $”0000 0000 0000 0000 0000 0000 0000 0300" $”0000 0C80 0000 3040 0000 C820 0003 0410" $”000C 8208 0030 4104 00C8 208E 0104 105A” $”0282 0862 0441 05A2 0820 8622 1010 5A22" $”2008 6222 4005 A222 FFFE 2222 9292 2226" $”9292 2238 9292 2260 9292 2380 9292 2600" $”9292 3800 9292 6000 9293 8000 FFFE”, /* [2] */ $”0000 0000 0000 0000 0000 0000 0000 0300" $”0000 0F80 0000 3FC0 0000 FFE0 0003 FFF0" $”000F FFF8 003F FFFC 00FF FFFE 01FF FFFE” $”03FF FFFE 07FF FFFE 0FFF FFFE 1FFF FFFE” $”3FFF FFFE 7FFF FFFE FFFF FFFE FFFF FFFE” $”FFFF FFF8 FFFF FFE0 FFFF FF80 FFFF FE00" $”FFFF F800 FFFF E000 FFFF 8000 FFFE” } }; data ‘LZWD’ (0) { $”12A9 2031 3938 3820 436C 6172 6973 2043" /* .© 1988 Claris C */ $”6F72 70" /* orp */ }; |
|