Wireshark-dev: Re: [Wireshark-dev] How to skip unrecognizable packets in saved pcap files

From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Mon, 19 Sep 2011 00:17:36 -0700

On Sep 18, 2011, at 9:22 PM, Ye Deng wrote:

> I have a serious issue when using libpcap functions to process pcap files.
> The error happens when I use the pcap_next_ex() function to read packets from saved pcap files one by one. pcap_next_ex() stops processing and returns an error saying "bogus savefile header".
> 
> Therefore I would like to know how to skip the unrecognizable packets and let the libpcap functions process the remaining valid packets. I would really prefer to use some *existing* modules/tools to do the job.
> I tried "mergecap" and "editcap", and found that they cannot skip the unrecognizable packets. Is there some "improved mergecap/editcap" that can do the job and produce pcap files without any unrecognizable packets?

None that I know of.  The program would have to use something other than either libpcap or Wireshark's Wiretap code to read the capture file, because (as you've discovered) both of them regard packets with a size bigger than 65535 as invalid.

> After doing some research online, I found that the "unrecognizable packets" may be produced by transferring the file over HTTP/FTP in text mode.
> Please search for "corrupt" on the web page below.
> http://www.winpcap.org/ntar/draft/PCAP-DumpFileFormat.html
> Therefore, I think the pcap next generation dump file format can deal with this issue.

Yes, it can deal with this issue.

It deals with it by having a field in the file that will be changed if you transfer a file in text mode between systems with different line ending conventions (for example, between Windows and UN*X) and by treating a file with a wrong value in that field as being damaged.

It does *NOT* deal with it by doing anything more than that.  In particular, it does *NOT*, and cannot, magically undo the damage done to the file by transferring it in text mode.
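
As a rough illustration of that detection-only behavior: to the best of my understanding, the field in question is the Block Type of the pcap-ng Section Header Block, whose on-disk bytes are 0x0a 0x0d 0x0d 0x0a.  The sketch below assumes that, and the helper name is hypothetical, not something from libpcap or Wiretap.  All it can do is say "damaged" or "not obviously damaged"; it repairs nothing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Block Type of a pcap-ng Section Header Block as it appears on disk.
 * The bytes are CR and LF characters, so a text-mode transfer that
 * rewrites line endings alters them.  The sequence is a palindrome, so
 * it looks the same regardless of the byte order of the writer. */
static const uint8_t PCAPNG_SHB_TYPE[4] = { 0x0a, 0x0d, 0x0d, 0x0a };

/* Hypothetical helper: returns 1 if buf starts with an intact Section
 * Header Block type, 0 otherwise.  A 0 result means "not pcap-ng, or
 * damaged"; it gives no hint about how to undo the damage. */
int looks_like_undamaged_pcapng(const uint8_t *buf, size_t len)
{
    if (len < sizeof PCAPNG_SHB_TYPE)
        return 0;
    return memcmp(buf, PCAPNG_SHB_TYPE, sizeof PCAPNG_SHB_TYPE) == 0;
}
```

A Windows-to-UN*X text-mode transfer would turn those four bytes into 0x0a 0x0d 0x0a, and a UN*X-to-Windows one would expand each 0x0a into 0x0d 0x0a; either way the comparison fails and the file is rejected.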

> But I tried "pcap-ng" in Wireshark, and got an assertion failure in every capture test, which shows that the "pcap-ng" related functions are still unfinished...

No, that shows that there's a bug somewhere.  What was the assertion failure?  We'd like to fix the bug, but we'd need to know the assertion failure.

However, even if we fix that bug, and any other bugs you run into:

	1) it will not magically be able to read pcap-ng files that have been damaged by being transferred in text mode;

	2) even if it could, it wouldn't help you, because your file is in pcap format, not pcap-ng format.  (And it can't: there's no way for it to figure out where, in the file, the pair of bytes 0x0d 0x0a was turned into the single byte 0x0a, which is what would happen if a file were transferred in text mode from Windows to UN*X, or where the single byte 0x0a was turned into the pair of bytes 0x0d 0x0a, which is what would happen if a file were transferred in text mode from UN*X to Windows.)
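
To make the Windows-to-UN*X case concrete, here is a minimal sketch of what such a transfer does to the bytes.  The function name is made up for illustration, and a real FTP or HTTP client may differ in details; the point is only that two different original byte sequences can produce identical output, so the output alone cannot tell you which one you started with.

```c
#include <stddef.h>
#include <stdint.h>

/* Simulates a text-mode transfer from Windows to UN*X: every 0x0d 0x0a
 * pair in src becomes a single 0x0a in dst.  dst must have room for at
 * least len bytes.  Returns the number of bytes written to dst. */
size_t crlf_to_lf(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        /* Drop a CR that is immediately followed by an LF; the LF
         * itself is copied on the next iteration. */
        if (src[i] == 0x0d && i + 1 < len && src[i + 1] == 0x0a)
            continue;
        dst[out++] = src[i];
    }
    return out;
}
```

Feeding it the packet bytes 0x45 0x0d 0x0a 0x00 and the packet bytes 0x45 0x0a 0x00 gives the same three bytes in both cases, and every later offset in the file shifts by however many bytes were dropped.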

> Also, I read the libpcap source code; that error is reported when the captured length of a packet is considered too big.
> In "/libpcap-1.1.1/sf-pcap.c", in the function below:
> static int pcap_next_packet(pcap_t *p, struct pcap_pkthdr *hdr, u_char **data)
> {
> 	... ...
> 	if (hdr->caplen > 65535) {
> 		snprintf(p->errbuf, PCAP_ERRBUF_SIZE, "bogus savefile header");
> 		return (-1);
> 	}
> 	... ...
> }
> 
> Based on the pcap file format:  http://wiki.wireshark.org/Development/LibpcapFileFormat
> I think it is possible to do a "magic number search" when the if() above is true. The bytes holding that "magic number" can be considered the beginning of the next valid packet.
> Notice that every valid packet has a timestamp in its packet header.
> typedef struct pcaprec_hdr_s {
> 	guint32 ts_sec;   /* timestamp seconds */
> 	guint32 ts_usec;  /* timestamp microseconds */
> 	guint32 incl_len; /* number of octets of packet saved in file */
> 	guint32 orig_len; /* actual length of packet */
> } pcaprec_hdr_t;
> If we know the time range of the capture, we can use some bytes of "pcaprec_hdr_s.ts_sec" as the "magic number".

There is no guarantee that

	1) the packets in the file before the first packet with a too-large captured length have not had their data damaged by transferring the file in text mode.  Just because they're "valid" in the sense that the captured length isn't > 65535, that doesn't mean they're "valid" in the sense that the data actually reflects what was captured;

	2) the packets after the ones you've skipped have not been damaged in the same way.
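
For reference, the kind of resynchronization you describe might be sketched as below.  This is only an illustration under stated assumptions: the file is assumed to be little-endian, the capture's time range (ts_min, ts_max) is assumed to be known, valid records are assumed to have incl_len <= orig_len, and all names are hypothetical rather than anything in libpcap.  It scans for 16 bytes that could plausibly be a record header; a match proves nothing about whether the packet data that follows is intact.

```c
#include <stddef.h>
#include <stdint.h>

#define PCAP_REC_HDR_LEN 16      /* ts_sec, ts_usec, incl_len, orig_len */
#define MAX_CAPLEN       65535u  /* limit quoted from sf-pcap.c above   */

/* Decode a 32-bit little-endian value (assumed file byte order). */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the offset of the first position at or after start whose
 * 16 bytes look like a plausible record header, or len if there is
 * none.  "Plausible" is a heuristic, not a validity check. */
size_t find_next_plausible_hdr(const uint8_t *buf, size_t len, size_t start,
                               uint32_t ts_min, uint32_t ts_max)
{
    for (size_t off = start; off + PCAP_REC_HDR_LEN <= len; off++) {
        uint32_t ts_sec   = get_le32(buf + off);
        uint32_t ts_usec  = get_le32(buf + off + 4);
        uint32_t incl_len = get_le32(buf + off + 8);
        uint32_t orig_len = get_le32(buf + off + 12);

        if (ts_sec >= ts_min && ts_sec <= ts_max &&
            ts_usec < 1000000u &&
            incl_len <= MAX_CAPLEN && incl_len <= orig_len)
            return off;
    }
    return len;
}
```

Packet data can contain bytes that happen to pass these tests, so false matches are possible, and a header that does match may belong to a packet whose payload has had bytes added or removed.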