Wireshark-dev: Re: [Wireshark-dev] Enhanced PCAP-NG dissection

From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Thu, 18 Apr 2013 15:06:34 -0700
On Apr 18, 2013, at 1:01 PM, Brandon Carpenter <hashstat@xxxxxxxx> wrote:

> On 04/17/2013 4:22 PM, Guy Harris wrote:
>> I'm not talking about saving/exporting from Wireshark (or "-r" and "-w" from
>> 
>> I'm talking about using *editcap*, which includes no dissectors and should not include any dissectors, to do that form of transformation.
> Yes, sorry.  I was unfamiliar with editcap (and just educated myself).  I now see the problem.  And I was wrong in my response anyway.  My change passes the whole PCAP-NG block as if it were the packet data which is something that would cause conversions with editcap to fail miserably.
> 
> And I agree with everything else you said, too (well, mostly anyway).
> 
> So what if we allow wiretap readers to pass on a list of buffers, each with a type?  Then dissectors and writers can "look" through the list, use only what they are able to, and ignore items they don't understand or don't want to process.  So pcapng_read() could return something like the following (using Pythonic syntax for lists and tuples):
> 
>    1. [(PCAPNG_BLOCK, (SHB, header data))]
>    2. [(PCAPNG_BLOCK, (IDB, interface data))]
>    3. [(PCAPNG_BLOCK, (NRB, name options)), (NAME, (ip address, names, ...))]
>    4. [(PCAPNG_BLOCK, (EPB, packet options)), (FRAME, (wtap_pkthdr, packet data))]
>    5. [(PCAPNG_BLOCK, (EPB, packet options)), (FRAME, (wtap_pkthdr, packet data))]
>    6. [(PCAPNG_BLOCK, (IDB, interface data))]
>    7. [(PCAPNG_BLOCK, (EPB, packet options)), (FRAME, (wtap_pkthdr, packet data))]

I'm assuming the tokens with characters from [A-Z_] are atoms used as tags.

My idea was that what would be returned from wtap_read() would be:

	(block type, information)

where the block types were assigned by Wiretap, e.g. "section header" (which would be a file header for file formats not supporting multiple sections), "interface list" (only 1 interface per list with pcap-ng, possibly multiple interfaces for NetMon), "name resolution information", "packet", etc.

One block type is "file-type-specific", which is the "escape hatch" for all blocks that don't fit into the libwiretap abstraction; the abstraction can be extended over time, but so can the set of block types in capture file formats that support multiple block types, so the abstraction always runs the risk of running behind, and some block types might be *so* specific to a particular file format that there's no point in extending the abstraction to include them.  (The point of the abstraction is to cover items that are supported in more than one capture file type, so that non-file-type-specific blocks can be used when writing out a file in a different format, and so that code running atop libwiretap need only, for example, handle "packets", not "pcap packets" and "pcap-ng PBs" and "pcap-ng EPBs" and "pcap-ng SPBs" and "NetMon packets" and "DOS Sniffer packets" and....)

The "information" would be block-type specific.
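
To make the (block type, information) idea concrete, here is a rough sketch in Python (matching the Pythonic notation quoted above).  It is not libwiretap code: BlockType, Block and describe() are hypothetical names invented for illustration, and the shape of the "information" part is an assumption.

```python
from enum import Enum, auto
from typing import Any, NamedTuple


class BlockType(Enum):
    """Hypothetical block types assigned by the reading library."""
    SECTION_HEADER = auto()
    INTERFACE_LIST = auto()
    NAME_RESOLUTION = auto()
    PACKET = auto()
    FILE_TYPE_SPECIFIC = auto()   # the "escape hatch"


class Block(NamedTuple):
    """What a read call would hand back: (block type, information)."""
    block_type: BlockType
    information: Any              # layout depends on block_type


def describe(block):
    """Code atop the library handles "packets", not per-format packet kinds."""
    if block.block_type is BlockType.PACKET:
        return "packet, %d bytes" % len(block.information["data"])
    if block.block_type is BlockType.FILE_TYPE_SPECIFIC:
        return "file-type-specific block, skipped"
    return block.block_type.name.lower().replace("_", " ")
```

A caller that only cares about packets can ignore every other block type, whatever file format the block was read from.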

For non-file-type-specific blocks, it would be something such as:

	some fixed information that must be present for all blocks of that type;

	a collection of optional information, with each such item tagged, with one tag being "file-type-specific" - the data for items with that tag starts with whatever tag the file format uses, e.g. the option code for pcap-ng.

For example, a packet block would have as required items the packet length and the packet data, and everything else, including relative or absolute time stamps, "captured length", and comments, as optional data.  
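
A minimal sketch of such a packet block, again in Python and again with hypothetical names (PacketBlock, the OPT_* tags): the two required items are ordinary fields, and everything else lives in a list of tagged options.  Falling back to the length of the data when no "captured length" option is present is an assumption made for this example.

```python
from dataclasses import dataclass, field

# Hypothetical option tags; none of these names exist in libwiretap.
OPT_TIMESTAMP = "timestamp"
OPT_CAPTURED_LENGTH = "captured_length"
OPT_COMMENT = "comment"
OPT_FILE_TYPE_SPECIFIC = "file_type_specific"


@dataclass
class PacketBlock:
    packet_length: int                            # required
    data: bytes                                   # required
    options: list = field(default_factory=list)   # optional (tag, value) pairs

    def get(self, tag, default=None):
        """Return the value of the first option with this tag, if any."""
        for option_tag, value in self.options:
            if option_tag == tag:
                return value
        return default

    def captured_length(self):
        # Assumption: without an explicit option, use the size of the data.
        return self.get(OPT_CAPTURED_LENGTH, len(self.data))


pkt = PacketBlock(
    packet_length=60,
    data=bytes(60),
    options=[
        (OPT_COMMENT, "retransmission?"),
        # File-type-specific item: its data starts with the format's own
        # tag, here a pcap-ng option code (2 is epb_flags), then raw bytes.
        (OPT_FILE_TYPE_SPECIFIC, (2, b"\x01\x00\x00\x00")),
    ],
)
```

Code that doesn't know the file format can still read the comment and skip the file-type-specific item.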

File-type-specific blocks would include whatever identifier tag is used in the particular capture file format, e.g. the block type code if it's a pcap-ng file, followed by the length of the raw contents of the block, followed by the raw contents themselves.  Some wrapper information would be removed; e.g. for pcap-ng it wouldn't include the block total length fields, as they're implied by the length of the raw contents (add 12, for the two length fields and the type field, to the length of the raw contents).
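
The pcap-ng arithmetic above can be sketched as follows.  The function names are hypothetical; the sketch assumes a little-endian section by default and raw contents that are already padded to a 32-bit boundary, and the block type code in the example is an arbitrary value picked for illustration.

```python
import struct


def pcapng_block_total_length(raw_contents):
    """Type field (4) + leading length (4) + trailing length (4) = 12."""
    return len(raw_contents) + 12


def pcapng_wrap(block_type_code, raw_contents, byte_order="<"):
    """Rebuild the on-disk block from the type code and raw contents."""
    if len(raw_contents) % 4 != 0:
        raise ValueError("raw contents must be padded to a 32-bit boundary")
    total = pcapng_block_total_length(raw_contents)
    header = struct.pack(byte_order + "II", block_type_code, total)
    trailer = struct.pack(byte_order + "I", total)
    return header + raw_contents + trailer


# Arbitrary type code and eight bytes of raw contents.
wrapped = pcapng_wrap(0x80000001, bytes(8))
```

So a writer for the same file type needs only the type code and the raw contents to reproduce the block.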

> In libwireshark, the dissector would store comments from the first item, a section header block, but would not display it in the packet list.

BTW, I may have changed my mind about displaying SHBs in the block list (former "packet list"); if you've concatenated multiple captures, it might be useful to know when a new capture begins (complete with showing the comments for the section, if any).  It could conceivably, for other file formats, show whatever summary information is in the file header.

> Item 2, an interface description block, might append the interface data to a separate interface list and also not add anything to the packet list.

Those might also be useful to show in the packet list, especially if additional interfaces show up in the middle of a capture.

> Item 3, a name resolution block, would provide the name resolution, which could be added to the names list while also ignoring the packet list.

That's one that doesn't represent an event in the capture process (unlike, for example:

	the SHB, which could be viewed as the start of a capture process;

	the IDB, which could be viewed as the addition of that interface to the capture process;

	the ISB, which could be viewed as a sample of the statistics at that point in the capture process)

so I wouldn't put that one in the packet list.
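
One way to read that distinction, as a small Python sketch: blocks that mark an event in the capture process get a row in the list, and anything else (such as an NRB) does not.  The table and function names are hypothetical, and including SHBs and IDBs simply follows the tentative comments above rather than any settled design.

```python
# Hypothetical table of pcap-ng block kinds that mark a capture event.
CAPTURE_EVENTS = {
    "SHB": "start of a capture",
    "IDB": "interface added to the capture",
    "ISB": "statistics sample",
    "EPB": "packet",
}


def block_list_rows(block_kinds):
    """Return (kind, description) rows for blocks that are capture events."""
    return [(kind, CAPTURE_EVENTS[kind])
            for kind in block_kinds
            if kind in CAPTURE_EVENTS]


rows = block_list_rows(["SHB", "IDB", "NRB", "EPB"])
```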

> An expert dissector could be enabled to also show the PCAP-NG blocks in the packet listing, along with detailed dissection (a great tool for learning PCAP-NG or for exploring new block types and options).

It's called "Fileshark":

	http://wiki.wireshark.org/GSoC2013#Fileshark

> When the data is transformed to another format, as with editcap, unknown items can be ignored.

Yup.  File-type-specific blocks and data would be ignored by all libwiretap writing modules except the module for that file type.

When *reading* a capture file, the actual file format type is implicitly that of the file you happen to be reading, although if it's more convenient, we could supply that as part of the block information when reading.  (The file type affects the interpretation of the file-type-specific block type, for "file-type-specific" blocks, and of all the file-type-specific options in the block.)

When *writing* a capture file, the file type of the file from which the block being written came would either have to be supplied as an argument to the "write block" call or carried in the block information.

My inclination might be to include the file type in the block information, so that you don't have to remember to pass the *correct* file type when writing.  (You have to fill it in if what you're writing doesn't come from a block you've read from libwiretap, but that's just the difference between "fill in a structure member" and "pass it as an argument".)
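
A rough sketch of that option, with the file type carried as a member of the block so that the writer can decide for itself what to ignore.  Block, Writer and write_block() are hypothetical names, not existing libwiretap calls, and the file type strings are placeholders.

```python
from dataclasses import dataclass

FILE_TYPE_SPECIFIC = "file-type-specific"


@dataclass
class Block:
    block_type: str      # e.g. "packet" or FILE_TYPE_SPECIFIC
    information: object
    file_type: str       # format of the file this block was read from


class Writer:
    """Stand-in for a per-format writing module."""

    def __init__(self, file_type):
        self.file_type = file_type
        self.written = []

    def write_block(self, block):
        # File-type-specific blocks only make sense to the writer for
        # the same file type; every other writer ignores them.
        if (block.block_type == FILE_TYPE_SPECIFIC
                and block.file_type != self.file_type):
            return False
        self.written.append(block)
        return True


pcap_writer = Writer("pcap")
pcapng_writer = Writer("pcapng")
packet = Block("packet", b"\x00" * 60, "pcapng")
custom = Block(FILE_TYPE_SPECIFIC, (0x80000001, b""), "pcapng")
```

Because the file type travels with the block, the caller can't pass the wrong one by accident; it only has to be filled in by hand for blocks that weren't read from a file.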

I should probably look at other file formats to see what use they could make of this, if any.