[Wireshark-dev] Re: Inquiry Regarding Protocol Identification Process in Wireshark

From: Jaap Keuter <jaap.keuter@xxxxxxxxx>
Date: Mon, 31 Mar 2025 23:33:06 +0200
Hi Yoon-Seong Jang,

Thank you for taking an interest in Wireshark and its internals. Let me try to discuss your inquiry.

As you may be aware, Wireshark is designed to handle all kinds of protocols, at any layer of the OSI model (apart from the Physical layer, that is). To that end it supports various methods of determining the protocol stack of each packet. Please note that I refer to a protocol stack here, not just a single protocol. In that respect, Wireshark’s handling of a packet closely mimics the layered concept of the OSI model.

Starting at the lowest level, the frame has an associated encapsulation type. This encapsulation, as determined by the packet capture mechanism, defines the first protocol dissector Wireshark uses to look at the packet data. In many cases this is the Ethernet encapsulation, so the first protocol is often “Ethernet II”.
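To make this first step concrete, here is a minimal Python sketch (my own illustration, not Wireshark code; Wireshark reads capture files through its wiretap library) that reads the link-layer type from a classic libpcap file header and maps it to a first protocol. The numeric values are link-layer header types as registered at tcpdump.org; the protocol names on the right are illustrative labels. pcapng files record the link type per interface instead, which this sketch does not handle.

```python
import struct

# Classic libpcap magic numbers (microsecond and nanosecond timestamp variants).
PCAP_MAGICS = {0xA1B2C3D4, 0xA1B23C4D}

# A few link-layer header types registered at tcpdump.org, mapped to
# illustrative names for the first protocol to dissect (not Wireshark identifiers).
LINKTYPE_TO_FIRST_PROTOCOL = {
    1: "eth",          # LINKTYPE_ETHERNET
    101: "raw-ip",     # LINKTYPE_RAW: packets start directly with an IP header
    113: "linux-sll",  # LINKTYPE_LINUX_SLL: Linux "cooked" capture
}


def first_protocol(global_header: bytes) -> str:
    """Return the first protocol implied by a classic pcap global header."""
    if len(global_header) < 24:
        raise ValueError("a classic pcap global header is 24 bytes long")
    # The magic number tells us the byte order used for the rest of the header.
    for byte_order in ("<", ">"):
        (magic,) = struct.unpack(byte_order + "I", global_header[0:4])
        if magic in PCAP_MAGICS:
            (link_field,) = struct.unpack(byte_order + "I", global_header[20:24])
            linktype = link_field & 0xFFFF  # low 16 bits hold the link-layer type
            return LINKTYPE_TO_FIRST_PROTOCOL.get(linktype, f"unknown-linktype-{linktype}")
    raise ValueError("not a classic pcap file")
```

Everything above the link layer is then decided by the first dissector and the mechanisms described next, not by the file header.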

The Ethernet protocol itself has a Type field, which indicates the protocol carried in the payload of the Ethernet packet. To forward the payload to the appropriate dissector for this Type, the Ethernet dissector exposes a ‘dissector table’. Other dissectors can register in this table with the value for their particular protocol, indicating that they are capable of dissecting it.
For example, the IPv4 dissector registers with value 0x0800 in the Ethernet Type dissector table, and the UDP dissector registers with value 17 in the IPv4 Protocol dissector table.
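The dissector table mechanism can be modelled in a few lines. The following is a hypothetical Python sketch: the class, function and table names only loosely resemble Wireshark's (whose dissectors are written in C against a different API), and each dissector here merely appends its name to a list representing the protocol stack.

```python
from typing import Callable, Dict, List

# A dissector receives its payload and the protocol stack built so far.
Dissector = Callable[[bytes, List[str]], None]


class DissectorTable:
    """Maps a payload-type value (EtherType, IP protocol number, ...) to a dissector."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._entries: Dict[int, Dissector] = {}

    def register(self, value: int, dissector: Dissector) -> None:
        self._entries[value] = dissector

    def dispatch(self, value: int, payload: bytes, stack: List[str]) -> bool:
        dissector = self._entries.get(value)
        if dissector is None:
            stack.append("data")  # nobody registered for this value: raw bytes
            return False
        dissector(payload, stack)
        return True


# Hypothetical tables, one per "next protocol" field.
ethertype_table = DissectorTable("ethertype")
ip_proto_table = DissectorTable("ip.proto")


def dissect_eth(frame: bytes, stack: List[str]) -> None:
    stack.append("eth")
    ethertype = int.from_bytes(frame[12:14], "big")  # after dst and src MAC
    ethertype_table.dispatch(ethertype, frame[14:], stack)


def dissect_ipv4(packet: bytes, stack: List[str]) -> None:
    stack.append("ip")
    header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    protocol = packet[9]
    ip_proto_table.dispatch(protocol, packet[header_len:], stack)


def dissect_udp(datagram: bytes, stack: List[str]) -> None:
    stack.append("udp")


# Registration: each dissector announces the value it can handle.
ethertype_table.register(0x0800, dissect_ipv4)
ip_proto_table.register(17, dissect_udp)
```

Because the lookup is keyed only on the value found in the header, a frame carrying an EtherType that nobody registered for simply ends up as raw data.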

This same principle, based on a dissector table, is used whenever a protocol has a field that identifies the payload protocol, e.g. IPv4 (protocol), IPv6 (next header), TCP (port number), UDP (port number), LLC (DSAP), etc.
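Port-based dispatch for TCP and UDP works the same way, and it is where the port-number behaviour you observed comes from. A transport header carries two ports, so a dissector has to decide which one to look up first. The Python sketch below assumes a lower-port-first policy and uses a made-up table of three well-known UDP ports; it illustrates the principle and is not a copy of Wireshark's logic.

```python
from typing import Dict, Optional

# Hypothetical port table: a few well-known UDP ports and illustrative names.
UDP_PORT_TABLE: Dict[int, str] = {
    53: "dns",
    123: "ntp",
    161: "snmp",
}


def protocol_for_ports(src_port: int, dst_port: int) -> Optional[str]:
    """Look up the lower port first, then the higher one (an assumed policy)."""
    low, high = sorted((src_port, dst_port))
    for port in (low, high):
        name = UDP_PORT_TABLE.get(port)
        if name is not None:
            return name
    return None  # no match: left to heuristics or shown as data


# protocol_for_ports(53211, 53) returns "dns"
# protocol_for_ports(40000, 40001) returns None
# protocol_for_ports(123, 53) returns "dns", even if the traffic is really NTP
```

The last case shows the weakness of port-based identification: the result depends only on which number happens to be in the table, so a service on a non-standard port, or traffic between two well-known ports, can be labelled incorrectly.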

Whenever a protocol has an explicit indication of the payload protocol, and assuming this value is correct, Wireshark can unambiguously determine the type of the next protocol layer. However, not all protocols have such a field! For example, MPLS only has a label stack. Once the stack indicates ‘bottom of stack’, what follows is a packet of some unknown protocol.
This is where two other methods of protocol identification come into play: the first is heuristics, the second is user configuration.
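The MPLS case can be illustrated by parsing a label stack. Each 32-bit label stack entry holds a 20-bit label, 3 traffic class bits, a bottom-of-stack (S) bit and an 8-bit TTL; nothing in it names the payload protocol. The Python sketch below (an illustration only) walks the stack and returns the labels plus the remaining, still unidentified payload.

```python
from typing import List, Tuple


def parse_mpls_stack(data: bytes) -> Tuple[List[int], bytes]:
    """Return the label values and the payload that follows the bottom of stack."""
    labels: List[int] = []
    offset = 0
    while offset + 4 <= len(data):
        entry = int.from_bytes(data[offset:offset + 4], "big")
        offset += 4
        labels.append(entry >> 12)  # bits 31..12: label value
        if (entry >> 8) & 0x1:      # bit 8: bottom-of-stack (S) flag
            # The payload starts here, but its protocol is not stated anywhere.
            return labels, data[offset:]
    raise ValueError("label stack not terminated by a bottom-of-stack entry")
```

Whatever follows the last entry might look like IPv4 because its first byte is 0x45, but the MPLS header itself gives no such guarantee.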

Heuristics is a method of examining a sample of the payload and attempting to guess the right protocol. Some protocols have a distinct signature, while others are harder to identify. When heuristics have to be used, the dissector exposes a heuristic dissector table in which dissectors can register their interest in identifying the payload as their protocol. Whenever a payload comes by, it is presented to these registered dissectors in turn. If the first dissector indicates that it does not recognise it, the payload is handed to the next, until a dissector indicates that it has recognised the protocol and takes care of dissecting it.
Heuristics are only as good as the identification of the payload can be. This is not without flaws, so Wireshark cannot unambiguously determine the type of the next protocol layer; it can only make a best-effort attempt.
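A heuristic dissector table can be sketched as an ordered list of "does this look like mine?" functions, tried one after the other until one accepts. The two checks below (a plausible IPv4 header, an IPv6 version nibble) are simplified examples made up for illustration; real heuristic dissectors are usually more thorough, and in a scheme like this the order of the list affects the result.

```python
from typing import Callable, List, Optional, Tuple

Heuristic = Callable[[bytes], bool]


def looks_like_ipv4(payload: bytes) -> bool:
    """Accept if version is 4, IHL is sane and the total length fits the data."""
    if len(payload) < 20:
        return False
    version, ihl = payload[0] >> 4, payload[0] & 0x0F
    total_length = int.from_bytes(payload[2:4], "big")
    return version == 4 and ihl >= 5 and ihl * 4 <= total_length <= len(payload)


def looks_like_ipv6(payload: bytes) -> bool:
    """Accept if there is room for a fixed IPv6 header and the version is 6."""
    return len(payload) >= 40 and payload[0] >> 4 == 6


# Hypothetical heuristic table: tried in registration order.
HEURISTIC_LIST: List[Tuple[str, Heuristic]] = [
    ("ipv4", looks_like_ipv4),
    ("ipv6", looks_like_ipv6),
]


def try_heuristics(payload: bytes) -> Optional[str]:
    """Return the first protocol whose heuristic accepts the payload, if any."""
    for name, accepts in HEURISTIC_LIST:
        if accepts(payload):
            return name
    return None
```

Note that any payload that merely happens to start with 0x45 and a consistent length field would be accepted as IPv4, which is exactly the best-effort nature described above.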

When heuristics are of too poor quality (that is, the protocol cannot be determined from the payload with enough certainty), the dissector can also expose a user-configurable dissector table. In this table some characteristic of the protocol content is used to determine the next protocol layer, and this mapping has to be defined by the user.
For instance, the MPLS dissector can attempt to identify the next protocol layer heuristically, or the user can set the next protocol layer based on the last label on the stack. Even though the relationship between an MPLS label and the protocol contained in the MPLS payload is in general completely ambiguous, a fixed relationship may hold within a particular capture file. Needless to say, these mappings often need adjustment from capture to capture.
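User configuration can be thought of as a per-capture lookup table filled in by hand (in the Wireshark GUI this roughly corresponds to the "Decode As" feature). In this hypothetical Python sketch the user's mapping for the bottom MPLS label is consulted first and a crude heuristic is used only as a fallback; that ordering, the class and the protocol names are assumptions made for the example.

```python
from typing import Dict, Optional


class UserDecodeTable:
    """Hypothetical per-capture mapping from an MPLS label to a payload protocol."""

    def __init__(self) -> None:
        self._by_label: Dict[int, str] = {}

    def set(self, label: int, protocol: str) -> None:
        self._by_label[label] = protocol

    def clear(self) -> None:
        # Mappings are capture-specific, so they are typically reset between files.
        self._by_label.clear()

    def lookup(self, label: int) -> Optional[str]:
        return self._by_label.get(label)


def next_protocol(bottom_label: int, payload: bytes, user_table: UserDecodeTable) -> str:
    configured = user_table.lookup(bottom_label)
    if configured is not None:
        return configured  # the user's explicit choice wins (assumed ordering)
    if payload and payload[0] >> 4 == 4:
        return "ipv4 (guessed)"  # crude heuristic fallback, clearly a guess
    return "data"
```

For example, after `table.set(2000, "ethernet-pw")` a payload below label 2000 is always treated as an Ethernet pseudowire, even if its first byte would have satisfied the IPv4 guess.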

One final method is the use of signalling protocols. These protocols are used to communicate the mapping between certain opaque identifiers in protocols and the interpretation of the protocol data. For instance, RTP packets carry audio data, but its encoding is identified only by a 7-bit payload type value. For some values the encoding is defined (e.g. 0 = PCM µ-law), but other values are dynamic. For those, SDP is used, which defines the mapping between the RTP payload type and the codec used. This is an example of Out-of-Band signalling: a separate protocol is used to define the relationship between identifier and protocol. Wireshark can store these mappings learned from the signalling protocol and apply them when dissecting other protocols.
Whenever a protocol uses In-Band signalling the solution is the same: Wireshark can store these mappings and use them for subsequent packets in the capture file.
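The RTP/SDP example can be sketched as follows. The SDP "a=rtpmap:" attribute maps a payload type number to an encoding name, and the RTP payload type is the low 7 bits of the second RTP header byte. This Python sketch is illustrative only: a real implementation would associate the learned mappings with the particular RTP stream the SDP describes, whereas here a single dictionary is used for brevity, and the static table lists only two entries from the RTP/AVP profile.

```python
import re
from typing import Dict, Optional

# A small subset of the static payload types defined in the RTP/AVP profile.
STATIC_PAYLOAD_TYPES: Dict[int, str] = {0: "PCMU", 8: "PCMA"}

# SDP attribute of the form "a=rtpmap:<payload type> <encoding name>/<clock rate>..."
RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/", re.MULTILINE)


def learn_from_sdp(sdp: str, learned: Dict[int, str]) -> None:
    """Record every payload type -> encoding mapping announced in an SDP body."""
    for payload_type, encoding in RTPMAP.findall(sdp):
        learned[int(payload_type)] = encoding


def rtp_payload_type(packet: bytes) -> int:
    """The payload type is the low 7 bits of the second RTP header byte."""
    return packet[1] & 0x7F


def codec_for(packet: bytes, learned: Dict[int, str]) -> Optional[str]:
    """Prefer a mapping learned via signalling, else fall back to the static table."""
    payload_type = rtp_payload_type(packet)
    if payload_type in learned:
        return learned[payload_type]
    return STATIC_PAYLOAD_TYPES.get(payload_type)
```

Without the SDP exchange in the capture, a dynamic payload type such as 96 would remain unresolved, which is why captures that start mid-call often show RTP payloads that cannot be decoded.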

To recap, the following general methods exist in Wireshark to determine the protocol stack of a packet:

1. Explicit, through some protocol type field
2. Heuristically, through payload examination
3. User configuration
4. In-Band, or Out-of-Band signalling 


Regards,
Jaap


On 31 Mar 2025, at 08:35, brave1094 <brave1094@xxxxxxxxxxx> wrote:

Dear Wireshark Team,

My name is Yoon-Seong Jang, a combined Master's and Ph.D. student at Korea University in the Republic of Korea.

We are currently conducting research focused on analyzing various types of application traffic and malicious traffic, with the goal of classifying them using deep learning techniques.

In this process, Wireshark has been an invaluable tool and is widely used in our research.

The reason I am reaching out via email is to ask how Wireshark determines the protocol of each packet or flow when decoding a given pcap file.

From our observations, it seems that the protocol is often determined based on the port number. However, we would greatly appreciate a more objective explanation or documentation regarding the actual rules or logic used by Wireshark for protocol decoding.

A detailed explanation would be extremely helpful for our research.

Thank you very much for taking the time to read this email despite your busy schedule.

Sincerely,
Yoon-Seong Jang