To appear
Previous: 5.3 Said and Pearlman
One objection to compressing data is that using any part of the file
requires uncompressing the whole file. We could partition the file into
pieces and compress them separately, but then the total size of all the
compressed pieces might be larger than the compressed single file. To test
this, we split adir1024 into four pieces. Table 6 shows that
quartering the file has little effect, and sometimes even slightly improves
the compression. While this might not always hold, it suggests that
splitting the file before compressing should not be disastrous.
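The experiment above can be imitated on any gridded data set. The following is a minimal sketch, not the procedure used for the tables here: it assumes a synthetic 256 by 256 grid of 16-bit values in place of adir1024, stores each tile in row-major order, and uses Python's `zlib` module (the same DEFLATE method as gzip, without the gzip file header). The names `make_grid`, `pack`, `tiles`, and `compressed_sizes` are hypothetical helpers introduced only for illustration.

```python
import math
import struct
import zlib


def make_grid(n):
    """Return an n x n grid of synthetic 16-bit 'elevations' (a stand-in for a DEM)."""
    return [
        [int(1000 + 400 * math.sin(x / 37.0) * math.cos(y / 53.0) + 3 * ((x * y) % 7))
         for x in range(n)]
        for y in range(n)
    ]


def pack(rows):
    """Serialize rows of small non-negative integers as little-endian 16-bit values."""
    return b"".join(struct.pack("<%dH" % len(r), *r) for r in rows)


def tiles(grid, k):
    """Yield the k*k square tiles of a square grid whose side is divisible by k."""
    step = len(grid) // k
    for ty in range(k):
        for tx in range(k):
            yield [row[tx * step:(tx + 1) * step]
                   for row in grid[ty * step:(ty + 1) * step]]


def compressed_sizes(grid, k):
    """Compress each of the k*k tiles separately; return their sizes in bytes."""
    return [len(zlib.compress(pack(t), 9)) for t in tiles(grid, k)]


grid = make_grid(256)
whole = len(zlib.compress(pack(grid), 9))
for k in (2, 4, 8):
    sizes = compressed_sizes(grid, k)
    print(k * k, "tiles: total", sum(sizes),
          "min", min(sizes), "max", max(sizes), "whole file", whole)
```

Comparing each total with the whole-file size shows the cost (or gain) of partitioning for that data set; the minimum and maximum show how much the compressed tile sizes vary. The numbers depend entirely on the data, so results for synthetic terrain say nothing about adir1024 itself.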
This table also suggests how variable compression can be. With either method the largest compressed quarter was triple the size of the smallest. This could cause a problem in practice since a program handling compressed files would either have to dynamically allocate storage according to each file's size, or else always have enough storage reserved for the rare worst case.
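One common way to live with variable compressed sizes is to store the pieces in a single container preceded by a table of offsets and lengths, so that a reader can allocate exactly what one piece needs and decompress only that piece. The sketch below is an illustration under stated assumptions, not a format used in this paper: the container layout, `build_container`, and `read_block` are all hypothetical, and `zlib` stands in for whichever compressor is actually used.

```python
import struct
import zlib


def build_container(blocks):
    """Compress each block; prepend a count and a table of (offset, length) pairs."""
    compressed = [zlib.compress(b) for b in blocks]
    header_size = 4 + 8 * len(compressed)      # 4-byte count + 8 bytes per entry
    table, offset = [], header_size
    for c in compressed:
        table.append((offset, len(c)))
        offset += len(c)
    header = struct.pack("<I", len(compressed)) + b"".join(
        struct.pack("<II", off, length) for off, length in table)
    return header + b"".join(compressed)


def read_block(container, i):
    """Decompress only block i, reading nothing but the table entry and that block."""
    (count,) = struct.unpack_from("<I", container, 0)
    if not 0 <= i < count:
        raise IndexError(i)
    off, length = struct.unpack_from("<II", container, 4 + 8 * i)
    return zlib.decompress(container[off:off + length])


# Three blocks of very different compressibility (made-up data).
sample_blocks = [bytes(1000), bytes(range(256)) * 4, b"terrain" * 50]
container = build_container(sample_blocks)
assert read_block(container, 1) == sample_blocks[1]
```

With such a table, the length field tells the reader how much storage a compressed piece needs before it is read, which avoids reserving space for the worst case.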
Table 6: Effect of Quartering Adir1024 Before Compressing
What if we split adir1024 into 16 pieces
and compress them separately? Gzip gives files ranging from 12,684
to 81,199 bytes, for a total size of 731,067 bytes, still smaller than the
gzipped complete file.
The sp_compressed files range from 6,386 to 27,098 bytes, for a
total of 251,042 bytes, or only 1.7% larger than the sp_compressed
complete file.
What about partitioning adir1024 into 64 blocks?
The gzipped files range from 357 to 21,739 bytes,
totaling 747,438 bytes. With sp_compress, the files range from
279 to 7,455 bytes, totaling 258,489 bytes.
What about 256 blocks?
Gzip produced files from 45 to 5,827 bytes,
totaling 771,455 bytes. sp_compress crashed while compressing
three of the files, which were almost all zeros, so we used
progcode for those three. The resulting 256 files ranged from 27 to 2,217 bytes,
totaling 279,060 bytes. Table 7 summarizes these results,
which may be stated briefly thus: Compression is useful even when
we wish to use only a portion of the file.
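The cost of finer partitioning can be read directly from the totals quoted above. The short calculation below uses only those totals; the helper `growth_percent` is a hypothetical name, and the 256-block sp_compress total includes the three files that were compressed with progcode instead.

```python
# Total compressed sizes in bytes, copied from the text above,
# keyed by compression method and number of blocks.
totals = {
    "gzip": {16: 731_067, 64: 747_438, 256: 771_455},
    "sp_compress": {16: 251_042, 64: 258_489, 256: 279_060},
}


def growth_percent(method, blocks, baseline=16):
    """Percentage by which the total grows relative to the baseline partitioning."""
    base = totals[method][baseline]
    return 100.0 * (totals[method][blocks] - base) / base


for method in totals:
    for blocks in (64, 256):
        print(f"{method}: {blocks} blocks are "
              f"{growth_percent(method, blocks):.1f}% larger than 16 blocks")
```

Going from 16 to 256 blocks increases the gzip total by about 5.5% and the sp_compress total by about 11.2%, a modest price for being able to decompress one block in 256.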
Table 7: Effect of Partitioning Adir1024 Into Blocks Before Compressing
Figure 4: The 24 Sample USGS DEMs
Table 8: Gzip, Progcode, and Sp_compress Compressing 24 Random USGS DEMs