Wireshark-dev: [Wireshark-dev] Idle Thoughts on Parallelized Packet Dissection
From: Evan Huus <eapache@xxxxxxxxx>
Date: Sun, 13 May 2012 10:33:56 -0400
This is a topic that's been stewing in the back of my brain for a while now, but it's cooked enough that I think it's worth getting feedback on. This is a long, (overly) detailed email - read with caution :)

tl;dr: I think it's possible to support parallelized (multi-threaded) packet dissection in a manner that's both useful (provides a good distribution of work over multiple cores) and backwards compatible with dissectors written in the current, single-threaded style. The changes to the Wireshark core would be significant and intrusive, but once finished, individual dissectors could be updated at our own convenience.

---

First off, there are a lot of obvious, known problems (like global variables) that would need to be fixed before multi-threading makes sense. Most of them appear to be documented at [1]. But we know, more or less, how to fix those - we just need to put in the work.

Unfortunately, there isn't a lot of motivation, because we're still stuck at the second part of the problem. In Guy's words (from [2], #6), packet dissection is an "embarrassingly serial" problem in a lot of ways. Dissecting a single protocol in a single packet can depend on other packets, on other protocols in the same packet, on conversations and other structures, and on who-knows-what-else, and right now that information isn't really stored anywhere (except implicitly through the use of certain APIs, but that's certainly incomplete).

And even if we were to somehow collect all that information, what would we do with it? Since TCP conversations (and window calculations, and ack analysis, and ...) depend on all the previous TCP packets in the capture, does that mean that we have to revert to strictly single-threaded dissection as soon as we see a TCP packet? That wouldn't be very useful at all.

It is worth noting that a lot of TCP dissection could be done without any previous information.
The locations and values of the fields themselves don't depend on anything stored in previous packets. Neither does the choice of sub-dissector. It's really all of the bells and whistles (conversations, ack analysis, expert info, etc.) that are the problem, but right now they're all mixed together with the easy stuff. So let's split them up.

---

In broad strokes, the idea is this: instead of registering a single dissect_proto() function, allow dissectors to register multiple functions to be chained together for dissecting a single packet. Allow these registrations to specify various levels of dependency on other parts of the capture (the three that come to mind are "Totally Independent", "Conversation Only", and "Everything", but I'm sure there are others). Then the core can run parts of the dissection in parallel, making sure for each function that its dependencies are satisfied before it gets called.

For example, let's consider a bunch of TCP/IP/Ethernet packets (with nothing below TCP for simplicity's sake). The total work for a single packet, in the traditional serial dissection, would look like:

1. Ethernet
2. IP
3. TCP

Now let's say that each of those three dissectors has been converted to use two chained functions, where the first is "Totally Independent" and the second depends on "Everything". If the newly parallelized dissectors were run serially, it would look like:

1. Ethernet
---a) Totally Independent
---b) Depends on Everything
2. IP
---a) Totally Independent
---b) Depends on Everything
3. TCP
---a) Totally Independent
---b) Depends on Everything

However, they can now be run at least partially in parallel. Since sub-dissector choice depends only on a field in the current packet, it could be placed in the first, "Independent" function (part a) for all three protocols. With proper parallelization, 1b and 2a could be run simultaneously, and 2a could trigger 3a whether or not 1b is done yet.
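
To make the chained-registration idea a bit more concrete, here is a minimal sketch in C of what a registration record and a scheduler readiness check might look like. Every name in it (dep_level_t, chained_dissector_t, chain_add_stage(), sched_state_t, stage_is_ready()) is hypothetical and invented for illustration; none of it is existing Wireshark API, and a real design would need far more than this.

```c
#include <stddef.h>

/* Hypothetical dependency levels, named after the three suggested above. */
typedef enum {
    DEP_INDEPENDENT,   /* "Totally Independent": needs only the current packet */
    DEP_CONVERSATION,  /* "Conversation Only": needs earlier packets of the same conversation */
    DEP_EVERYTHING     /* "Everything": needs every earlier packet in the capture */
} dep_level_t;

struct packet;                                /* opaque stand-in for per-packet state */
typedef void (*stage_fn)(struct packet *pkt);

/* One function in a protocol's chain, plus its declared dependency level. */
typedef struct {
    const char *name;
    dep_level_t dep;
    stage_fn    fn;
} stage_t;

#define MAX_STAGES 8

/* A protocol's chain of stages, kept in registration order. */
typedef struct {
    const char *proto;
    stage_t     stages[MAX_STAGES];
    size_t      n_stages;
} chained_dissector_t;

/* Append one stage to a protocol's chain; returns 0 on success, -1 if full. */
static int chain_add_stage(chained_dissector_t *d, const char *name,
                           dep_level_t dep, stage_fn fn)
{
    if (d->n_stages >= MAX_STAGES)
        return -1;
    d->stages[d->n_stages].name = name;
    d->stages[d->n_stages].dep  = dep;
    d->stages[d->n_stages].fn   = fn;
    d->n_stages++;
    return 0;
}

/* What a scheduler would need to know before running a stage (assumed model). */
typedef struct {
    int predecessor_done;        /* the stage that triggers this one has finished */
    int conversation_caught_up;  /* earlier packets in this conversation are fully dissected */
    int capture_caught_up;       /* all earlier packets in the capture are fully dissected */
} sched_state_t;

/* Returns 1 if a stage with the given dependency level may run now, else 0. */
static int stage_is_ready(dep_level_t dep, const sched_state_t *st)
{
    if (!st->predecessor_done)
        return 0;
    switch (dep) {
    case DEP_INDEPENDENT:  return 1;
    case DEP_CONVERSATION: return st->conversation_caught_up;
    case DEP_EVERYTHING:   return st->capture_caught_up;
    }
    return 0;
}
```

In the Ethernet/IP/TCP example, the predecessor of 2a would be 1a rather than 1b, so under this model 2a becomes ready as soon as 1a finishes, while 1b keeps waiting until the rest of the capture has caught up.
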
It would also be possible to do inter-packet parallelization, with step 1a being started in parallel on as many packets as desired.

---

This idea is based on the assumption that most protocols have at least some parts of their dissection that don't depend on anything prior. These parts can therefore be split out into lower-dependency functions. Based on what I've seen looking at some common protocol dissectors, I don't think that's an entirely unreasonable assumption.

It does mean, however, that APIs will need to be enhanced to support accessing and manipulating proto_trees in more interesting ways: if I register a field in my first function and then want to verify it in a later function, I should be able to pull it from the proto_tree by field (hf_whatever), rather than finding its value in the tvb again. If I subsequently want to add expert info to the field, or insert a generated field immediately after it, I should be able to do so easily and quickly. Based on my understanding of the current proto_tree layout, all of this should be doable with some work (and possibly an extra data structure or two).

On the plus side, this method provides a really easy way to maintain backwards compatibility for older dissectors. Simply leave the current registration function as a wrapper around the new registration function with a dependency of "Everything", and dissectors that haven't been adapted yet will be automatically serialized.

---

Obviously this is a hugely ambitious undertaking, but I think it's doable, and the benefits on modern multi-core systems would be significant. Please ask questions and provide feedback; I'm sure there are things I've missed.

Thoughts?

[1] http://wiki.wireshark.org/Development/multithreading
[2] http://wiki.wireshark.org/Development/Wishlist#General_.2F_Unsorted
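
As a footnote to the backwards-compatibility point above, the wrapper could be as thin as the following sketch. The names here (new_register_stage(), legacy_register(), the registry array) are hypothetical stand-ins chosen for illustration and deliberately do not match any existing Wireshark function; the only point is that an old-style registration maps onto a single stage with the most conservative dependency level.

```c
#include <stddef.h>

/* Hypothetical dependency levels, mirroring the three named in the text. */
typedef enum { DEP_INDEPENDENT, DEP_CONVERSATION, DEP_EVERYTHING } dep_level_t;

typedef void (*dissect_fn)(void *pkt);   /* stand-in for a dissector callback */

typedef struct {
    const char *proto;
    dep_level_t dep;
    dissect_fn  fn;
} registration_t;

#define MAX_REGS 16
static registration_t registry[MAX_REGS];
static size_t n_regs;

/* New-style registration: the caller states its dependency level explicitly. */
static int new_register_stage(const char *proto, dep_level_t dep, dissect_fn fn)
{
    if (n_regs >= MAX_REGS)
        return -1;
    registry[n_regs].proto = proto;
    registry[n_regs].dep   = dep;
    registry[n_regs].fn    = fn;
    n_regs++;
    return 0;
}

/* Old-style entry point kept as a wrapper: an unconverted dissector is
 * registered as one stage that depends on "Everything", so the core
 * would never run it ahead of earlier packets. */
static int legacy_register(const char *proto, dissect_fn fn)
{
    return new_register_stage(proto, DEP_EVERYTHING, fn);
}
```

An unconverted dissector would keep calling the old entry point unchanged, and converting it later would just mean replacing that one call with several explicit stage registrations.
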