Wireshark-dev: Re: [Wireshark-dev] Some questions about the "option block" interface in libwire

From: Anthony Coddington <Anthony.Coddington@xxxxxxxxxx>
Date: Thu, 19 May 2016 00:05:11 +0000
Hi,

Having worked with the wtap_optionblock generic option block system beyond refactoring existing code, I have some thoughts on the current design that may be relevant. I have numbered paragraphs from 10 so we don't get confused.

 Michael Mann <mmann78@xxxxxxxxxxxx> wrote:
 > On May 15, 2016, at 6:40 PM, Guy Harris <guy@xxxxxxxxxxxx> wrote:
 > > 3) What mechanisms are available for handling block/record types, or options, not currently supported by pcapng, but that might be provided by other file types? Hadriel Kaplan suggested getting a Private Enterprise Number (PEN) for wireshark.org, and using custom blocks and options for this purpose; have we gotten a PEN for wireshark.org yet?
 > The current interface may need another layer of refactoring but the idea was that you would have a block and just call wtap_optionblock_add_option to add your options.  For "defined but not yet used" options of the standard blocks that turn into "now used" options, the wtap_optionblock_add_option would be done within the creation of that standard block.  The creation of "standard blocks" and their option types is where we may need another layer of registration and refactoring. Copying/reading/writing on any/all blocks is supposed to happen within the "option block" functionality (where default behavior would be "inclusive" in copying blocks).  I wasn't sure where "merging" would go and if that would be part of the "option block" functionality (at least for "provided" blocks/options) or left to external tools (like mergecap and editcap).

 10) Generic versus filetype-specific block types: I think the current implementation is too specific to PCAP-NG blocks. This makes sense for highly specialized blocks but common metadata that can be (and are) supported across formats used by Wireshark like interface information (IDB/ISB), capture information (SHB) and potentially things like name resolution blocks, should be as agnostic as possible. The same block type could then be extended with new option types for format-specific information. If that is via a Wireshark PEN and custom option that is fine, but it shouldn't be too cumbersome to use compared to native PCAP-NG options for simple types. In particular I don't think the mandatory struct makes much sense in most cases. Other formats should not be required to provide any options not directly required by Wireshark, regardless of whether they are part of a fixed header in PCAP-NG. Examples of this could include IDB snap_len and SHB section_length. Ideally Wireshark-wide timestamp formats would also be used or at least included.

 11) Self-writing/serializing block types: For the same reason I really don't think self-writing of PCAP-NG blocks belongs in wtap_opttypes.c (as relatively recently added in https://code.wireshark.org/review/#/c/14357/). I think the optionblock should act more like a well-known list of generic and filetype specific options that fit into that metadata category. Perhaps formats could register a write function each for the block type, but I personally don't see much benefit in this. The format might as well treat it similarly to foreign encap types in its dump function, where the wiretap determines how the information should be converted or written out for that format.

 12) Multiple options: As Guy Harris has mentioned, it should be possible to add multiple options of a given type such as (but not limited to!) multiple comments. Personally I think a list-based approach works well here (possibly also indexed by a hash table or something like a GSequence) as it also preserves ordering which may be important for some formats. Perhaps an append() vs add() (or set()) distinction could be made, for replacing an existing value rather than adding another. On the display side, multiple comments for a packet may even come from multiple blocks.

 13) Checking for and merging options: There seems to be some conflation between adding options to blocks and registering "standard" option types to block types. There doesn't appear to be a way to determine whether an option was already present in the record or simply has a default value, as all options are 'added' up front. As far as I can tell wtap_optionblock_copy_options() will always take the 'existing option' path, even when that option was not explicitly set before. This effectively means all options get overridden rather than a merge. Personally I think some kind of merge behaviour is useful, as it also allows partial updates of metadata while reading or editing the file. Merging of individual options could probably be limited to simple replacing (or removing) the value of an existing option though and possibly should be handled by the merger. This is the one place where per-format optionblock types and a merge callback could be useful, but personally I'm not sure the loss of other genericness is worth it. A hybrid approach might work where a merge callback of the reading wiretap is called and generic options used, possibly combined with 15) below. I'm undecided on whether an automatic merge of the union of options together when merging (which I am assuming what is meant by Michael Mann by "inclusive" copying) is a good idea. In many cases it probably makes sense, but in some cases it may be contradictory in a way that is only determinable by understanding the native format. Similar issues come up when deciding whether to merge IDBs, what happens here currently? Perhaps a dedicated flag similar to PCAP-NG custom option copy/don't copy behaviour could be useful.

 14) String options: I think the current behaviour of duplicating string options with strdup() does more harm than good. As there is no variant that takes a string length, almost every single current string option is copied to a temporary variable, copied again by wtap_optionblock_set_string() and then immediately freed by the caller. This even applies to options that simply need to be copied directly from a non-null-terminated TLV field out of the record header. Formatting is also not supported but this is a less common case. A contract on how memory passed to the function is allocated like the old behaviour might make more sense, or at least make wtap_optionblock_set_string() take a string length which is good for safety anyway.

 15) Custom options: I think adopting a custom option system where the user can specify a native binary part (e.g. for saving the file in the same format after modification) and a string/integer part (or callback to the original wiretap to get this) that can be used for display or conversion makes a lot of sense. I thought this was how PCAP-NG worked, but it appears to be an either/or thing there.

 16) wtap_optionblock_t being an opaque pointer rather than a struct is rather confusing and inconsistent with most of the rest of Wireshark and even within the optionblock code itself. It looks nice when used with g_array but a GPtrArray could potentially be used in its place and the naming is confusing. Alternatively the struct could be exposed but marked private like most of wtap and epan.

 Guy Harris <guy@xxxxxxxxxxxx> wrote:
 > 4) The existence of wtap_file_get_shb() seems to imply that a file has *a* Section Header Block, but a pcapng file could have multiple SHBs; we don't currently support that, but we should be prepared to do so in the future.
 >
 > A file can also have multiple Name Resolution Blocks as well; as the pcapng specification says:
 ...
 > so we should not have routines that assume a single NRB. Perhaps the routines in question should take an array of NRBs - combining the NRBs into a single table would lose information about which names were resolved by which name servers.

 17) Some option blocks, such as interface statistics blocks, name resolution or even interface configuration may also only apply to a particular time/frame range. It would be nice to support this generically somehow, but this may require more thought.

 Guy Harris <guy@xxxxxxxxxxxx> wrote:
 > For example, we currently don't handle packet metadata very cleanly.  We have a bunch of WTAP_ENCAP_ values that correspond to a regular encapsulation plus file-type-specific metadata.
 ...
 > In addition, the metadata that doesn't come directly from the packet data should probably be put into a Buffer rather than a union as is done now; that way we don't have a hard upper limit on the amount of metadata (the ERF handler imposes a limit, about which Anthony Coddington of Endace has commented).
 > 
 > And I think there are still some issues with mergecap that would require better handling of IDBs...
 > 
 > ...and we need to think about whether we're correctly handling all of:
 >
 >       files that have a per-file link-layer header type (most file formats, including pcap);
 >
 >       files that have per-interface link-layer header types (such as pcapng);
 >
 >       files that have per-packet link-layer header types.

 18) How this all fits in with REC_TYPE_FT_SPECIFIC_ records needs consideration, currently it is necessary to dissect everything twice to also display in the epan tree. One option would be to allow epan to do things like add comments and update other metadata while doing a safely bounded parse. I'm not sure how this interacts with simpler tools like mergecap though? Do these use libwiretap on its own? On the subject of buffers and unions it is also not possible to use a ft_specfic_record_phdr together with another phdr type, and PCAP-NG assumes it is always a PCAP-NG block type code. It may also make sense to have filetype specific not-on-wire pseudo header and trailer buffers or more carefully distinguishing record versus captured lengths (currently it is not possible to dissect data not accounted for and counted as 'captured' bytes), but that is probably a separate discussion.

 I think the general idea makes a lot of sense and can improve the flexibility of Wireshark, and prefer it to the previously proposed idea of synthesizing special record types (non-synthesized special record types make sense though). I just think the API needs some tweaks to make it more generic.

Thanks,
Anthony Coddington

Software Engineer – CTO Office
Endace

 References:
https://code.wireshark.org/review/#/c/13667/
https://code.wireshark.org/review/#/c/14300/
https://code.wireshark.org/review/#/c/14357/
 Other earlier discussions:
https://wiki.wireshark.org/WiretapPcapng
https://wiki.wireshark.org/Development/PcapngCustom
https://code.wireshark.org/review/#/c/9729/