On Thu, Jun 27, 2019 at 7:17 AM Guy Harris <guy@xxxxxxxxxxxx> wrote:
>
> On Jun 26, 2019, at 2:03 PM, Jaap Keuter <jaap.keuter@xxxxxxxxx> wrote:
>
> > On 26 Jun 2019, at 19:41, Guy Harris <guy@xxxxxxxxxxxx> wrote:
> >
> >> It could probably be done (note that for decompressing capture files that would require the ability to do random access I/O,
> >
> > It (http://sourceware.org/bzip2/manual/manual.html#limits) now says: "Further ahead, it would be nice to be able to do random access into files. This will require some careful design of compressed file formats."
>
> gzip format wasn't carefully designed for that, either, but it can be - and has been - made to work. It requires storing dictionary state.
Yep. bgzip, and the library that comes with it that you can link
against, does this. I even built a FUSE filesystem to transparently
"unzip" these kinds of files.
What bgzip does is start a new dictionary every ~64 KB and store an
index in a separate file.
The bgzip file itself is compatible with gzip, so you can uncompress it
using vanilla gzip, but in order to do random reads/seeks in the file
you need the index file.
It works, quite well.
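
To make the idea concrete, here is a minimal Python sketch of the
general scheme, not of bgzip's actual on-disk layout or index format.
It assumes a fixed 64 KiB uncompressed block size, keeps the index in
memory as a plain list of compressed offsets, and the names
compress_blocks and read_at are made up for illustration.

```python
import gzip

BLOCK_SIZE = 64 * 1024  # assumed block size, roughly the ~64 KB mentioned above


def compress_blocks(data, block_size=BLOCK_SIZE):
    """Compress data as a series of independent gzip members.

    Returns (compressed_bytes, index), where index[i] is the offset in the
    compressed stream at which block i starts.
    """
    out = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        index.append(len(out))
        # Each block is a complete gzip member, so it starts with a fresh dictionary.
        out += gzip.compress(data[start:start + block_size])
    return bytes(out), index


def read_at(compressed, index, offset, length, block_size=BLOCK_SIZE):
    """Read `length` uncompressed bytes at `offset`, inflating only the blocks needed."""
    result = bytearray()
    block = offset // block_size
    skip = offset % block_size
    while length > 0 and block < len(index):
        end = index[block + 1] if block + 1 < len(index) else len(compressed)
        chunk = gzip.decompress(compressed[index[block]:end])
        piece = chunk[skip:skip + length]
        result += piece
        length -= len(piece)
        skip = 0  # later blocks are read from their beginning
        block += 1
    return bytes(result)
```

Because the output is just concatenated gzip members, decompressing the
whole stream in one go still returns the original data, while read_at
only touches the blocks that cover the requested range.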
The problem I found is that when you restart with a new dictionary
every ~64 KB, there is not much for the compression engine to work
with, so the compression ratio is usually (in my cases) quite poor
compared to normal gzip.
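
If you want to measure that cost on your own data, something like the
following Python sketch would do. It is only an approximation: it
compresses each block as a separate gzip member rather than using
bgzip itself, compare_sizes is a made-up helper name, and how large
the difference is depends entirely on the input.

```python
import gzip


def compare_sizes(data, block_size=64 * 1024):
    """Return (whole_stream_size, blocked_size) in bytes for gzip-compressed data."""
    # One gzip stream over everything: the compressor keeps its history throughout.
    whole = len(gzip.compress(data))
    # One gzip member per block: the history is thrown away at every block boundary.
    blocked = sum(
        len(gzip.compress(data[start:start + block_size]))
        for start in range(0, len(data), block_size)
    )
    return whole, blocked
```

Calling it with the contents of a capture file and a few different
block sizes shows how the blocked total changes relative to a single
stream.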