Wireshark-bugs: [Wireshark-bugs] [Bug 8563] Support reading bzip2, lzma, and 7zip compressed pca

Date: Sat, 25 May 2013 21:16:09 +0000

changed bug 8563

What Removed Added
CC   [email protected]

Comment # 2 on bug 8563 from
(In reply to comment #1)
> For bzip2, there's libbzip2:
> 
>     http://www.bzip.org
> 
> The page at
> 
>     http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html#limits
> 
> says
> 
>     Further ahead, it would be nice to be able to do random access into
> files. This will require some careful design of compressed file formats.
> 
> although
> 
>     http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html#recovering
> 
> says
> 
>     bzip2 compresses files in blocks, usually 900kbytes long. Each block is
> handled independently.
> 
> which already puts it ahead of gzip format in its ability to conveniently
> support random access (there aren't convenient block boundaries of that sort
> in gzip files, so we have to checkpoint the decompression dictionary).

Yeah, it's a shame that the original bzip2 author doesn't support it.
We can either implement our own heuristic to search for block boundaries
(something like what bzip2recover does), or we need to find another library.
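
To make the boundary-search idea concrete, here is a rough sketch in Python (not Wireshark code). It relies on the fact that every bzip2 block starts with the 48-bit magic 0x314159265359 and that blocks are not byte-aligned, so the scan has to try all eight bit shifts. The function name find_block_starts is made up for illustration, and a real implementation would also have to cope with false positives inside compressed data.

```python
import bz2

# 48-bit magic that begins every bzip2 block (BCD digits of pi).
BLOCK_MAGIC = bytes.fromhex("314159265359")


def find_block_starts(data):
    """Return sorted bit offsets at which the block magic occurs.

    Hypothetical helper: blocks are bit-aligned, so the stream is
    re-packed once per bit shift and searched as ordinary bytes.
    """
    if not data:
        return []
    nbits = 8 * len(data)
    stream = int.from_bytes(data, "big")
    mask = (1 << nbits) - 1
    offsets = []
    for shift in range(8):
        # Drop the first `shift` bits so bit offset 8*i + shift lands on byte i.
        shifted = ((stream << shift) & mask).to_bytes(len(data), "big")
        pos = shifted.find(BLOCK_MAGIC)
        while pos != -1:
            offsets.append(8 * pos + shift)
            pos = shifted.find(BLOCK_MAGIC, pos + 1)
    return sorted(offsets)


if __name__ == "__main__":
    compressed = bz2.compress(b"some capture data" * 1000)
    print(find_block_starts(compressed))
```

The first block always follows the 4-byte "BZh" plus level stream header, so its magic sits at bit offset 32; later blocks can start at any bit position.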

Anyway, it can still be slow. According to Wikipedia: "Because of the
first-stage RLE compression (see above), the maximum length of plaintext that a
single 900 kB bzip2 block can contain is around 46 MB."

In typical pcap files RLE probably won't work that well, but testing it on a
random pcap file shows that a 900 kB block expands to 2-6 MB of data.

In gzip we create a fast-seek point every 1 MB, and gzip is much faster than
bzip2...

>     http://tukaani.org/xz/format.html
> 
> says
> 
>     Random-access reading: The data can be split into independently
> compressed blocks. Every .xz file contains an index of the blocks, which
> makes limited random-access reading possible when the block size is small
> enough.
> 
> which is very very very very very very very very very very very very very
> very very good news for xz support

Not really, because blocks in xz are almost the same as multiple concatenated
gzip files, so a multi-block file won't compress as well as a one-block xz file.
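
A small Python sketch of that size penalty. Python's lzma module does not expose a block-size option, so this assumes that independently compressed, concatenated .xz streams are a fair stand-in for independent blocks (they carry a bit more header overhead than real blocks). The helper names and the test data are made up.

```python
import lzma
import random


def compress_whole(data):
    """Compress everything as one stream (one block)."""
    return lzma.compress(data)


def compress_in_chunks(data, chunk_size):
    """Compress each chunk independently and concatenate the results.

    Each chunk starts with an empty dictionary, so redundancy that
    spans chunk boundaries cannot be exploited.
    """
    return b"".join(
        lzma.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    unit = bytes(rng.getrandbits(8) for _ in range(4096))
    data = unit * 64  # redundancy only visible across chunk boundaries
    whole = compress_whole(data)
    chunked = compress_in_chunks(data, 4096)
    assert lzma.decompress(chunked) == data  # concatenated streams decode fine
    print(len(whole), len(chunked))
```

The smaller the chunks, the cheaper a seek becomes and the worse the ratio gets, which is the trade-off behind picking a block size.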

I think that standard xz-utils doesn't create such files by default,
and the options to create files with multiple blocks are quite new:
 --block-size since 5.1.1alpha (2011-04-12)
 --block-list since 5.1.2alpha (2012-07-04)

(ref: http://git.tukaani.org/?p=xz.git;a=blob;f=NEWS;hb=HEAD)

Of course we can have full support for xz files, including compression, where
we could create lots of blocks to support random access nicely. The question
is: what are good values for the block size?

Also, what should we do if we hit an xz file with only one block? Warn the user
about that? Can anyone propose a good message?
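
For detecting the one-block case, a rough Python sketch (again, not Wireshark code) that reads the record count from the index at the end of an .xz file. It assumes a single stream with no trailing stream padding and does not verify any CRC32; count_xz_blocks and _read_varint are made-up names.

```python
import lzma
import struct


def _read_varint(buf, pos):
    """Decode an .xz multibyte integer (7 bits per byte, low bits first)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def count_xz_blocks(data):
    """Return the number of blocks listed in the index of the last stream.

    Assumes no stream padding after the footer; CRC32 fields are ignored.
    """
    footer = data[-12:]
    if len(footer) != 12 or footer[10:12] != b"YZ":
        raise ValueError("no .xz stream footer at end of data")
    # Backward Size is stored as (index size / 4) - 1, little endian.
    (stored,) = struct.unpack("<I", footer[4:8])
    index_size = (stored + 1) * 4
    index = data[-12 - index_size:-12]
    if len(index) != index_size or index[0] != 0x00:
        raise ValueError("malformed .xz index")
    count, _ = _read_varint(index, 1)
    return count


if __name__ == "__main__":
    blob = lzma.compress(b"pcap" * 1000)
    print(count_xz_blocks(blob))
```

A reader could call something like this when opening the file and warn when the count is 1 and the file is large.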

