On Jul 12, 2009, at 3:15 PM, Anders Broman wrote:
(That doesn't say this is the wrong thing to do - I've been advocating
this for a while, and made a version of the GTK 1.2[.x] GtkCList with
"dynamic" column data and prototyped the same thing - it says we need
to make random access to gzipped files faster.)
Did you say at Sharkfest that bzipped files might be better suited
for that?
If so, perhaps we should go for bzip instead of gzip?
In bzip2 format, the stream is a sequence of blocks, and, at least as
I understand it, each block can be decompressed independently, so
seeking to a particular offset in the decompressed version of a
bzip2'ed stream involves seeking to the beginning of the block
containing the data at that offset, decompressing the block, and then
moving to the right offset within the decompressed data. The default,
and maximum, block size is 900K.
In gzip/zlib format, the stream is, as far as I know, a sequence of
blocks, but the blocks can't be decompressed independently; the
dictionary doesn't get reset with each block. That means that you'd
either need to decompress the entire file, or save the state of the
dictionary periodically, or something like that, to make random
access fast.
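The "save the state periodically" idea can be sketched in Python, whose zlib decompression objects have a copy() method that snapshots the decompressor, dictionary included. This is only an illustration under a few assumptions: the input is a zlib-format stream held in memory (a gzip file would need wbits=31 passed to zlib.decompressobj), and the names build_checkpoints and read_from_checkpoint, the 64K checkpoint spacing, and the 4096-byte feed size are all invented for the example.

```python
import zlib


def build_checkpoints(compressed, span=64 * 1024, feed=4096):
    """Decompress once, saving a decompressor snapshot roughly every
    `span` bytes of output.

    Each checkpoint is (uncompressed offset, compressed offset, snapshot).
    """
    d = zlib.decompressobj()
    checkpoints = [(0, 0, d.copy())]
    out_pos = 0
    for in_pos in range(0, len(compressed), feed):
        out_pos += len(d.decompress(compressed[in_pos:in_pos + feed]))
        if not d.eof and out_pos - checkpoints[-1][0] >= span:
            next_in = min(in_pos + feed, len(compressed))
            checkpoints.append((out_pos, next_in, d.copy()))
    return checkpoints


def read_from_checkpoint(compressed, checkpoints, offset, length, feed=4096):
    """Read `length` bytes at uncompressed `offset`, resuming from the
    nearest checkpoint at or before it rather than from the start."""
    out_pos, in_pos, saved = max(
        (c for c in checkpoints if c[0] <= offset), key=lambda c: c[0]
    )
    d = saved.copy()            # keep the stored snapshot reusable
    skip = offset - out_pos     # bytes to discard before the target
    buf = bytearray()
    while len(buf) < skip + length and in_pos < len(compressed):
        buf += d.decompress(compressed[in_pos:in_pos + feed])
        in_pos += feed
    return bytes(buf[skip:skip + length])
```

The cost is the one full pass needed to build the checkpoints, plus the memory for each snapshot, so the spacing is a trade-off between seek time and memory.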
So bzip2 format would be better as a "native" format; unfortunately,
there are gzipped files already out there, and the native compressed
format of the Windows sniffer appears to be a gzipped version of the
file format (except for the file header), so the gunzipping code could
still be useful.