Wireshark-dev: Re: [Wireshark-dev] Multiple-line parsing of packets dissected over HTTP

From: Joey Salazar <jgsal@xxxxxxxxxxxxxx>
Date: Wed, 20 Jan 2021 23:17:37 +0000
Hi all, Jonathan,

On Tuesday, January 19, 2021 6:09 PM, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:

> Hi,
>
> Pascal Quantin wrote:
>
> > Le mar. 19 janv. 2021 à 17:45, Joey Salazar via Wireshark-dev <
> > wireshark-dev@xxxxxxxxxxxxx> a écrit :
>
> > > In commit 33af2649 [1] we can keep dissecting the contents of the req,
> > > adv, and res packets by setting
> > > while (plen > 0) { }
> > > either in `dissect_git_pdu()` or in `dissect_one_pkt_line()`, but for now
> > > in `dissect_git_pdu()` it'd be a bit messy, so wanted to ask for your
> > > feedback for getting `dissect_one_pkt_line()` to work properly first.
> > > As you can see in pcap 169 [2], it correctly parses the length of the
> > > first line as 0x0014 (20 bytes) until `0x0a`, then it's supposed to get the
> > > length of the next line by the first 4 hex bytes in that line, but instead
> > > of reading the length as 0x0018 (24 bytes) it's reading it as 0x0010 (16
> > > bytes), and anyways, this particular line's length actually is 59 bytes.
>
> Interesting. Let me summarize your question: getting the information
> in one place here, the relevant code at line 114 looks like
>
> | + while (plen > 0) {
> | + proto_tree_add_uint(git_tree, hf_git_packet_len, tvb, offset, 4, plen);
> | + offset += 4;
> | + plen -= 4;
> | +
> | + proto_tree_add_item(git_tree, hf_git_packet_data, tvb, offset, plen, ENC_NA);
> | + offset += plen;
> | + // To-do: add lines for parsing of terminator packet 0000
> | + }
>
> The relevant part of the pcap is shown in an image; transcribing
> imperfectly, I see
>
> | 0014command=ls-refs\n
> | 0018agent=git/2.29.0.rc2
> | 0016object-format=sha1
> | 0001
> [etc]
>
> where \n denotes a newline byte and there are no newlines between
> these pkt-lines.
>
> That first pkt-line has 4 bytes for the length, followed by 12 bytes
> for "command=ls-refs\n", including newline. So why does this parse as
>
> packet-length: 0x0014
> packet data: "command=ls-refs\n"
> packet-length: 0x0010
> packet data: "agent=[etc]"
>
> ?
>
> [...]
>
> > So what is the code leading to this dissection? It does not seem to be
> > https://gitlab.com/joeysal/wireshark/-/commit/33af2649927cb5660d4aeb64b9a9e9a58a1823aa
> > as dissect_one_pkt_line() seem to read only one line (BTW using a while
> > loop in this commit is useless as you are incrementing offset by plen, and
> > the code you shared considers that plen includes the 4 bytes of the packet
> > length field while your screenshot does not assume that).
>
> This reply is a bit dense, but it contains the hints to move forward:
>
> First, there's the matter of the contract of the dissect_one_pkt_line()
> function. The name suggests it would dissect a single pkt-line, but
> it has this loop in it. What does it actually do? And what do we
> want it to do?
>
> That second question is easy to answer: this code will be much easier
> to read if dissect_one_pktline dissects a single pkt-line! For
> example, if we imagine a contract like
>
> /** Dissects a single pkt-line.
> *
> * @param[in] tvb Buffer containing a pkt-line.
> * @param offset Offset at which to start reading.
> * @param[out] tree Tree to attach the dissected pkt-line to.
> * @return Number of bytes dissected, or -1 on error.
> */
> static int dissect_one_pkt_line(tvbuff_t *tvb, int offset, proto_tree *tree)
>
> then we could call this in a loop, like:
>
> int offset = 0;
>
> while (offset < total length)
> offset += dissect_one_pkt_line(tvb, offset, tree);
>
> Obtaining the total length and including some error handling left as
> an exercise to the reader.
>
> As for the first question: what does the current code do? The loop at
> l114 doesn't modify plen except by subtracting 4 from it. So instead
> of reading the pkt-length from the next pkt-line, it assumes it is 4
> bytes less. 0x14 - 4 is 0x10, hence the 0x10 as pkt length
> assumption.

This was most helpful, thank you, in some cases I couldn't understand why `pkt-length` was being parsed the one it was.
Apologies to all for my poor explanation of what I wanted the code to do!

I've updated the commit and it now is 9342d7fb [1], it is now reading the size of each pkt-line before checking for `0000` Flush Packet (checking for `0001` delimiter could be added in line 131) and jumping to the next pkt-line.

However, I understand now that `while (plen > 0)` in line 119 should not be doing that since that `plen` is for the first pkt-line only, making this an unsuitable approach.

I'll start working on the approach suggested above by Jonathan. All your feedback and observations most welcome.

Thank you all for the input and time invested so far,
Joey

[1] gitlab.com/joeysal/wireshark/-/commit/9342d7fbc15fad4ec7765bbbb0f1672319dc705a