Wireshark-dev: [Wireshark-dev] Re: Discussion: Untangling the situation with the Darwin process

From: Guy Harris <gharris@xxxxxxxxx>
Date: Fri, 25 Apr 2025 12:49:29 -0700
On Apr 24, 2025, at 5:37 PM, Omer Shapira via Wireshark-dev <wireshark-dev@xxxxxxxxxxxxx> wrote:

> On Apr 24, 2025, at 4:29 PM, Guy Harris <gharris@xxxxxxxxx> wrote:
> 
>> On Apr 24, 2025, at 2:56 PM, Omer Shapira via Wireshark-dev <wireshark-dev@xxxxxxxxxxxxx> wrote:
> 
> 
>> 3) means either "have dissection code for the metadata blocks, *and* have a way for the dissection of packets associated with a given process to include the process metadata" or "have some way for the Wireshark packet filter language to specify fields from blocks pointed to by the packet block" (which would also allow filtering on Interface Description Block fields).
> 
> What I have in mind is the second: allow the engineers to do stuff like
> a. $ tshark -r file.pcapng -T fields -e darwin.process_id -e darwin.interface
> b. $ tshark -r file.pcapng -Y 'tcp.port == 6040 && darwin.flags.wake_pkt'
> c. Same in Wireshark

To quote a comment from Wireshark's epan/dissectors/file-pcapng-darwin.c file (which dissects Process Event Blocks if you're using Wireshark as "Fileshark" on a pcapng file that contains Process Event Blocks; there is currently no code to handle Process Event Blocks if you're reading a capture file to see the packets rather than to see the file's structure):

/*
 * Apple's Pcapng Darwin Process Event Block
 *
 *    A Darwin Process Event Block (DPEB) is an Apple defined container
 *    for information describing a Darwin process.
 *
 *    Tools that write / read the capture file associate an incrementing
 *    32-bit number (starting from '0') to each Darwin Process Event Block,
 *    called the DPEB ID for the process in question.  This number is
 *    unique within each Section and identifies a specific DPEB; a DPEB ID
 *    is only unique inside the current section. Two Sections can have different
 *    processes identified by the same DPEB ID values.  DPEB ID are referenced
 *    by Enhanced Packet Blocks that include options to indicate the Darwin
 *    process to which the EPB refers.
 *
 *
 *         0                   1                   2                   3
 *         0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 *         +---------------------------------------------------------------+
 *       0 |                   Block Type = 0x80000001                     |
 *         +---------------------------------------------------------------+
 *       4 |                     Block Total Length                        |
 *         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 *       8 |                          Process ID                           |
 *         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 *      12 /                                                               /
 *         /                      Options (variable)                       /
 *         /                                                               /
 *         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 *         |                     Block Total Length                        |
 *         +---------------------------------------------------------------+
 *
 *                   Figure XXX.1: Darwin Process Event Block
 *
 *    The meaning of the fields are:
 *
 *    o  Block Type: The block type of a Darwin Process Event Block is 2147483649.
 *
 *       Note: This specific block type number falls into the range defined
 *       for "local use" but has in fact been available publicly since Darwin
 *       13.0 for pcapng files generated by Apple's tcpdump when using the PKTAP
 *       enhanced interface.
 *
 *    o  Block Total Length: Total size of this block, as described in
 *       Pcapng Section 3.1 (General Block Structure).
 *
 *    o  Process ID: The process ID (PID) of the process.
 *
 *       Note: It is not known if this field is officially defined as a 32 bits
 *       (4 octets) or something smaller since Darwin PIDs currently appear to
 *       be limited to maximum value of 100000.
 *
 *    o  Options: A list of options (formatted according to the rules defined
 *       in Section 3.5) can be present.
 *
 *    In addition to the options defined in Section 3.5, the following
 *    Apple defined Darwin options are valid within this block:
 *
 *           +------------------+------+----------+-------------------+
 *           | Name             | Code | Length   | Multiple allowed? |
 *           +------------------+------+----------+-------------------+
 *           | darwin_proc_name | 2    | variable | no                |
 *           | darwin_proc_uuid | 4    | 16       | no                |
 *           +------------------+------+----------+-------------------+
 *
 *              Table XXX.1: Darwin Process Description Block Options
 *
 *    darwin_proc_name:
 *            The darwin_proc_name option is a UTF-8 string containing the
 *            name of a process producing or consuming an EPB.
 *
 *            Examples: "mDNSResponder", "GoogleSoftwareU".
 *
 *            Note: It appears that Apple's tcpdump currently truncates process
 *            names to a maximum of 15 octets followed by a NUL character.
 *            Multi-byte UTF-8 sequences in process names might be truncated
 *            resulting in an invalid final UTF-8 character.
 *
 *            This is probably because the process name comes from the
 *            p_comm field in a proc structure in the kernel; that field
 *            is MAXCOMLEN+1 bytes long, with the +1 being for the NUL
 *            terminator.  That would give 16 characters, but the
 *            proc_info kernel interface has a structure with a
 *            process name field of only MAXCOMLEN bytes.
 *
 *            This all ultimately dates back to the "kernel accounting"
 *            mechanism that appeared in V7 UNIX, with an "accounting
 *            file" with entries appended whenever a process exits; not
 *            surprisingly, that code thinks a file name is just a bunch
 *            of "char"s, with no multi-byte encodings (1979 called, they
 *            want their character encoding back), so, yes, this can
 *            mangle UTF-8 file names containing non-ASCII characters.
 *
 *    darwin_proc_uuid:
 *            The darwin_proc_uuid option is a set of 16 octets representing
 *            the process UUID.
 *
 */
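
As a rough illustration of the layout in that comment, here is a minimal Python sketch that parses one such block. It is not Wireshark's code. The function name parse_dpeb and the little-endian default are assumptions made for the example; in a real pcapng file the byte order comes from the Section Header Block. Option handling follows the generic pcapng rule of a 16-bit code, a 16-bit length, and a value padded to a 32-bit boundary.

```python
import struct
import uuid

DPEB_BLOCK_TYPE = 0x80000001      # 2147483649, as in the comment above
OPT_ENDOFOPT = 0                  # generic pcapng end-of-options marker
OPT_DARWIN_PROC_NAME = 2
OPT_DARWIN_PROC_UUID = 4


def parse_dpeb(block: bytes, endian: str = "<") -> dict:
    """Parse one Darwin Process Event Block (hypothetical helper).

    `endian` is a struct byte-order prefix; "<" is only an assumed default.
    """
    block_type, total_len, pid = struct.unpack_from(endian + "III", block, 0)
    if block_type != DPEB_BLOCK_TYPE:
        raise ValueError("not a Darwin Process Event Block")
    if total_len != len(block) or total_len < 16 or total_len % 4:
        raise ValueError("bad Block Total Length")
    (trailer,) = struct.unpack_from(endian + "I", block, total_len - 4)
    if trailer != total_len:
        raise ValueError("leading and trailing lengths differ")

    info = {"pid": pid, "name": None, "uuid": None}
    offset, end = 12, total_len - 4          # options sit between PID and trailer
    while offset + 4 <= end:
        code, length = struct.unpack_from(endian + "HH", block, offset)
        offset += 4
        if code == OPT_ENDOFOPT:
            break
        if offset + length > end:
            raise ValueError("option runs past end of block")
        value = block[offset:offset + length]
        if code == OPT_DARWIN_PROC_NAME:
            # A truncated name may end in a partial UTF-8 sequence.
            info["name"] = value.rstrip(b"\0").decode("utf-8", errors="replace")
        elif code == OPT_DARWIN_PROC_UUID and length == 16:
            info["uuid"] = uuid.UUID(bytes=value)
        offset += (length + 3) & ~3          # skip value plus padding to 32 bits
    return info
```

Unknown option codes are skipped rather than rejected, so a block carrying options not listed in the table would still yield the PID.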

(And, yes, that's why darwin_proc_name is limited to 15 octets and, yes, I'm old enough to remember when V7 Unix came out and provided the "kernel accounting" mechanism, and, yes, that mechanism is in XNU, see bsd/kern/kern_acct.c.

And they even reimplemented it in Linux - kernel/acct.c - although they incorrectly refer to it as "BSD-style" rather than "Bell Labs Research-style".) 
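
To make the truncation concrete, here is a small Python sketch of what byte-wise truncation into a fixed-size field does to a name. It only imitates the effect and is not the kernel code. MAXCOMLEN is taken as 16 because the comment says MAXCOMLEN+1 bytes hold 16 characters plus the NUL; the helper name and both long process names are made up for the example.

```python
MAXCOMLEN = 16  # assumption, implied by the comment quoted above


def truncated_comm(name: str, field_size: int = MAXCOMLEN) -> bytes:
    """Cut the UTF-8 bytes to fit a fixed field, leaving one byte for the NUL."""
    return name.encode("utf-8")[: field_size - 1]


# An ASCII name loses only its tail.
ascii_name = truncated_comm("GoogleSoftwareUpdateAgent")

# Here the 15th octet is the first half of a two-byte UTF-8 character,
# so the result no longer decodes as strict UTF-8.
clipped = truncated_comm("fourteen_charsé")
shown = clipped.decode("utf-8", errors="replace")
```

A reader that wants to display such names has to decode leniently, as the last line does, or it will reject the option value outright.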

So it appears that packet blocks don't have the process information - instead, according to epan/dissectors/file-pcapng.c, there are darwin_dpeb_id and darwin_edpeb_id options for packet blocks that indicate the Process Event Block ordinal number of the process to which or from which the packet was sent.

Given the use of "-e darwin.process_id", this means that either:

	1) the frame dissector should copy the data from a PEB to fields in the packet, so there's no indirection needed

or


	2) *all* references to PEB fields, not just references in packet-matching expressions (commonly called "display filters", although they're used for other purposes, such as coloring), but also references elsewhere, e.g. in the -e option, must support indirection.


"The second" refers to the second of those options; either the first or the second of them would support "-e darwin.process_id" and "-Y 'tcp.port == 6040 && darwin.flags.wake_pkt'".  The first of them, but not the second, would also support seeing the process information in the packet details for a given packet.

This is similar to the way we copy the interface name and description from the Interface Description Block into the dissection of a packet.
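
A minimal Python sketch of the first option, copying process data into per-packet fields, might look like this. It is an illustration, not Wireshark's design. Only the darwin_dpeb_id and darwin_edpeb_id option names and the darwin.process_id field name come from this thread; the Section class, frame_fields, the other field names, and reading darwin_edpeb_id as an "effective" process are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Per-section reader state; DPEB IDs restart at 0 in every section."""
    processes: list = field(default_factory=list)

    def add_process_block(self, pid: int, name: str) -> int:
        self.processes.append({"pid": pid, "name": name})
        return len(self.processes) - 1      # the DPEB ID is just the ordinal


def frame_fields(section: Section, packet_options: dict) -> dict:
    """Resolve a packet's DPEB references into flat, filterable fields."""
    fields = {}
    for option, prefix in (
        ("darwin_dpeb_id", "darwin.process"),
        ("darwin_edpeb_id", "darwin.effective_process"),  # hypothetical name
    ):
        dpeb_id = packet_options.get(option)
        if dpeb_id is None or not 0 <= dpeb_id < len(section.processes):
            continue                         # no reference, or a dangling one
        proc = section.processes[dpeb_id]
        fields[prefix + "_id"] = proc["pid"]
        fields[prefix + "_name"] = proc["name"]
    return fields
```

Because the lookup happens once, when the packet is read, everything downstream (filters, -e, the packet details) sees ordinary fields and needs no indirection.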

	

>>> Moreover, due to the quantity of the “legacy” pcap files, it might be a pragmatic idea to mention the block 0x80000001 as an “exception” in https://datatracker.ietf.org/doc/draft-ietf-opsawg-pcapng/ , so that future developers would skip this.
>> 
>> I.e., mark it as "used by Apple" to avoid having other people who (because they lack a Private Enterprise Number, or whatever) choose to use local-use blocks use that *particular* value?  If so, that's an issue for the pcapng spec.
> 
> That’s a possible way to proceed, but I am not sure whether this is the *best* way to proceed. Other possibilities:
> 1. Add a preference to say that 0x80000001 always means Darwin PIB.
> 2. Add a heuristic to attempt to parse 0x80000001 as Darwin PIB; if successful, mark the file as created by tcpdump.
> 3. …

If nobody other than Apple uses 0x80000001 as a block type, there's no need for *any* of these; we can treat this as a *de facto* assignment of 0x80000001 to Apple, and always dissect it as a PEB.

If somebody other than Apple *does* use 0x80000001, there should be a way to control how it's dissected - preferably a way that allows plugins, so that whoever *other* than Apple uses it can use a plugin with standard Wireshark to handle it.
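
One way to picture that kind of control is a table of handlers keyed by block type, with the Apple interpretation as the default and a plugin free to replace it. The Python sketch below is purely illustrative; BlockRegistry, register, and the handler functions are hypothetical names and do not correspond to any existing Wireshark or wiretap API.

```python
DARWIN_PEB_TYPE = 0x80000001


def dissect_apple_peb(body: bytes) -> str:
    """Stand-in for the built-in Darwin Process Event Block handling."""
    return "Darwin Process Event Block (%d bytes)" % len(body)


class BlockRegistry:
    """Maps local-use block types to handlers; later registrations win."""

    def __init__(self):
        # De facto default: treat 0x80000001 as Apple's block.
        self._handlers = {DARWIN_PEB_TYPE: dissect_apple_peb}

    def register(self, block_type: int, handler) -> None:
        self._handlers[block_type] = handler

    def dissect(self, block_type: int, body: bytes) -> str:
        handler = self._handlers.get(block_type)
        if handler is None:
            return "unknown block 0x%08x" % block_type
        return handler(body)
```

With this shape, a stock build dissects 0x80000001 as a PEB, and a third party that uses the same value only has to register its own handler from a plugin.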

> Sounds like we are on the same page.
> 
> I would like to hear from more people, but my tentative plan is to proceed in three or maybe four steps:
> 
> Step one: Wireshark land.
> I want to make sure that the legacy 0x80000001 is supported, in the sense that there are new frame fields that contain the darwin process metadata, if present.

"New frame fields" is "the first" from above. That makes sense to me, as it

	1) is similar to the way we handle IDB fields;

	2) avoids adding the indirection mechanism of "the second";

	3) makes the process information visible in the packet details.

> I know that Jim Young has done work on that in the past, I will touch base with him, but even if not, the support should be quite straightforward.

As noted, the Darwin extensions for this are handled if you're using Wireshark as Fileshark to look at the internal structure of a pcapng file, but there isn't any support for it if you're just using Wireshark to read network traces. There *are* merged changes from Jim Young, but those are the "Wireshark as Fileshark" changes - there aren't any changes to make it work for regular network trace processing.

The pcapng reader may require some changes to do a good job of allowing support for vendor and custom blocks to be as "pluggable" as we want.  I have some work on that lying around somewhere.

We may also want to have the "Frame" dissector be extensible rather than requiring it to directly include code to pull data from IDBs, PEBs, etc. into the Frame section of the packet dissection.

> Step three: Wireshark land. 
> I will add support for the new custom block to Wireshark, so that it will support the new custom block as well as the legacy.

You'll need to add support for both; as indicated, the only support for the legacy block is in Wireshark-as-Fileshark.