On 12/29/21 5:15 PM, John Thacker wrote:
> I was working on an MR to move the text2pcap/text_import debug output over to the ws_log features and ran into a seemingly bizarre problem. Setting the log level to a non-default value causes the pytest tests to fail with heap corruption on the GitLab Windows CI.
>
> Some of the text2pcap pytests depend on grepping through the stderr output for some of the debug information. Those tests originally passed the -d flag to text2pcap, so I replaced it with setting the log level to "debug" (and later "info") with the standard "--log-level debug" argument read by ws_log_parse_args().
>
> On Windows (but not on Linux or macOS, whether built with clang or gcc, and not with either compiler using ASAN), those tests which set the log level (and only those tests) started failing with a return code of 0xc0000374, heap corruption.
>
> When I looked into it more closely, all the debug information that those tests used ought to be logged at "warning" or "message," which are both within the default log level, so I was able to remove that flag, and then the tests passed.
>
> It looks like it might be related to some of the things discussed here, though I'm not 100% sure because I'm not a Windows programmer:
>
> https://discuss.wxpython.org/t/heap-corruption-on-windows/35583
> https://bugs.python.org/issue36792
> https://bugs.python.org/issue37945
>
> There seems to be an issue, seen with Python 3.8 and higher on Windows 10 build 1809 (a long-term support build, and the one the CI build server uses), that involves UTF-8 locales, log systems that fetch system locale information and print dates, and the Windows 10 Universal CRT, and that results in heap corruption.
>
> It might have something to do with the tests spawning a lot of subprocesses in parallel, and with setting the log level to a different value, which eventually calls free_log_filter() from ws_log_set_debug_filter().
Is https://gitlab.com/wireshark/wireshark/-/pipelines/438735249 one of the pipelines that failed? If so, it looks like Wireshark is crashing and Python is complaining about its return code:
----
Traceback (most recent call last):
  File "C:\builds\wireshark\wireshark\test\fixtures.py", line 54, in wrapped
    test_fn(self, *fixtures)
  File "C:\builds\wireshark\wireshark\test\suite_text2pcap.py", line 186, in test_text2pcap_ikev1_certs_pcap
    check_text2pcap(self, 'ikev1-certs.pcap', 'pcap')
  File "C:\builds\wireshark\wireshark\test\suite_text2pcap.py", line 144, in check_text2pcap_real
    self.assertRun(text2pcap_cmd, shell=True)
  File "C:\builds\wireshark\wireshark\test\subprocesstest.py", line 304, in assertRun
    self.assertEqual(process.returncode, expected_return)
AssertionError: 3221226356 != 0
----
Yes, that's one of them. 3221226356 = 0xc0000374 and is apparently a special Windows return code for heap corruption.
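For anyone who wants to check that conversion, here is a minimal Python sketch. The names describe_returncode and KNOWN_NTSTATUS are hypothetical and not part of the Wireshark test suite; the two status values are the ones Microsoft documents for heap corruption and access violation. The mask is only there so that a tool reporting the code as a signed 32-bit number would compare equal too.

```python
# Hypothetical helper for illustration only; not part of the Wireshark test suite.
# The NTSTATUS values below are the documented Windows codes.
KNOWN_NTSTATUS = {
    0xC0000374: "STATUS_HEAP_CORRUPTION",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def describe_returncode(returncode):
    """Format a Windows process exit code as hex plus a known status name."""
    # Normalise to an unsigned 32-bit value, so that both the unsigned form
    # seen in the traceback and a signed form map to the same code.
    code = returncode & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"0x{code:08x} ({name})"


if __name__ == "__main__":
    # The value from the AssertionError in the traceback.
    print(describe_returncode(3221226356))
```

A test helper along these lines could make the assertion message easier to read than the bare decimal number, but that is only a suggestion.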
Just a wild guess, but maybe we need to call setlocale at the beginning of text2pcap similar to our other executables?
That probably makes sense to do anyway. However, I tried another draft merge request adding "--log-level debug" to tshark (and dumpcap) executables, with no other changes, and saw the same issue:
All the tests where I added the log-level option fail with heap corruption on Windows (and nowhere else). Since tshark already calls setlocale, I guess that's not it. The various bug reports seem to indicate that it only happens with UTF-8 code pages, and that it was fixed in a later Windows release.
John Thacker