Wireshark-dev: Re: [Wireshark-dev] Tshark closing unexpectedly due to failure reading from file

From: James Ko <jck@xxxxxxxxxx>
Date: Thu, 19 Nov 2020 16:15:11 -0800
Hi John,

Thanks for your analysis.  If you still have the strace logs would you attach them to the bug report together with your analysis?

So I think the questions now are..
Why do we have the difference in read/write timing?
What are the events that need to happen before the first bytes are written to the /tmp/wireshark file ?
Does the initial SHB to the file get generated from within dumpcap itself at startup, or does dumpcap wait for the start of the TCP stream with its own SHB and just pass that data through?  I think it's the latter as you've proven the temporary file is exactly the same as the source.
Does the delay for TCP connection cause that initial 0 byte read?
What other delays are happening in dumpcap startup for which tshark isn't allowing enough time?

I see that there is a update_cb parameter to sync_pipe_start() but it is NULL in the tshark case.  wireshark uses it for ui update. The callback if it exists is called just before the pipe_input_set_handler().  Perhaps adding a callback to introduce a delay at this point when connecting to a TCP socket may help.

Are there any other things we might try to fix or avoid this race condition?

James




On Thu, Nov 19, 2020 at 6:13 AM John Sullivan <jsethdev@xxxxxxxxxxxxxx> wrote:
On Monday, November 16, 2020, 8:45:48 PM, Alastair Scott wrote:
> I've posted an in-depth description of the issue with logs and pcap's
> attached here: https://gitlab.com/wireshark/wireshark/-/issues/17013

Observations after running under strace with "-f -s 4096" with the
same capture file (5992 bytes, SHA1 cefbb7):

* tshark execs dumpcap to do the actual "capture" and communicates
with it over pipes and a temporary file

* Given we're reading from a pcapng file, despite the convoluted data
path, dumpcap's output to the temp file is actually identical to the
original pcap file (same size, same sha1sum). This is written out in what
looks like packet-sized chunks as it reads them from the socket.

* The reading end does three read() calls. The first two between them
read the first 4096 bytes, the third reads the remaining 1896 bytes.
The first happens almost immediately after opening the file, the
second two not until the writing end has finished writing all data
(which it signals by writing the packet count over the pipe). This is
the same between failing and successful runs.

* By extracting the data from strace output I can prove that the
reading end does in fact read all data, and exactly the same data as
in the original file, even in the failing case. There is no problem
there.

* One key difference appears to be that in the failing case the first
read happens before the writing end has written anything at all, and
reads 0 bytes. The second read then reads 4096 bytes, and the third
1896 bytes, before the reading end throws that all away and aborts. In
the successful case the first read happens late enough to return 28
bytes, then the second reads asks for (and gets) only 4068 bytes to
make up the difference to 4096 bytes.

So it looks like a race at the beginning, and even though the complete
data transfer is executed in all cases, if that initial read didn't
get the first valid file chunk so the "section header block" can be
validated, then it will ultimately cause the file to be rejected.

John
--
Dead stars still burn

___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev@xxxxxxxxxxxxx>
Archives:    https://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://www.wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-request@xxxxxxxxxxxxx?subject=unsubscribe