Wireshark-dev: Re: [Wireshark-dev] epan/asm_utils* and NASM

From: Guy Harris <guy@xxxxxxxxxxxx>
Date: Fri, 13 Oct 2017 18:45:27 -0700
On Oct 13, 2017, at 2:11 PM, Guy Harris <guy@xxxxxxxxxxxx> wrote:

> On Oct 13, 2017, at 1:50 PM, Gerald Combs <gerald@xxxxxxxxxxxxx> wrote:
> 
>> Before we migrated away from NMake, epan/Makefile.nmake built the assembly versions of various routines for x86 (but not x64) defined in epan/asm_utils_win32_x86.asm. Should we resurrect it in epan/CMakeLists.txt or get rid of it along with the NASM download in tools/win-setup.ps1?
> 
> Are there any platforms on which the assembler versions are significantly faster than the non-assembler versions?
> 
> If not, I'd say get rid of it.

OK, this all dates back to:

	https://www.wireshark.org/lists/wireshark-dev/200711/msg00303.html

which was about speeding up Wireshark startup.

asm_utils_win32_x86.asm contains:

	wrs_strcmp() (with an apparently-unused alias wrs_strcmp_with_data()), which is a 4-bytes-at-a-time unrolled version of strcmp()

	wrs_str_equal(), which is a 4-bytes-at-a-time routine to compare strings for equality, without caring whether, if unequal, string A is greater than or less than string B (thus a bit simpler than strcmp())

	wrs_check_charset(), which is a 4-bytes-at-a-time routine to check whether all characters in a 1-byte-character string are in a given character set, with the set represented as a table of 256 bytes with 1 meaning "in the set" and 0 meaning "not in the set";

	wrs_str_hash(), which is a 4-bytes-at-a-time string hashing function.
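
To make the 256-byte table representation concrete, here is a minimal byte-at-a-time sketch in C. The names charset_table_init() and charset_check_bytewise() and their signatures are hypothetical, made up for illustration; they are not taken from asm_utils_win32_x86.asm, and this sketch does not attempt the 4-bytes-at-a-time trick.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: fill a 256-entry table so that
 * table[c] == 1 means "c is in the set" and 0 means "not in the set". */
static void charset_table_init(unsigned char table[256], const char *members)
{
    assert(members != NULL);
    memset(table, 0, 256);
    for (; *members != '\0'; members++)
        table[(unsigned char)*members] = 1;
}

/* Hypothetical checker: returns 1 if every byte of the NUL-terminated
 * string is in the set, 0 as soon as one byte is not. */
static int charset_check_bytewise(const unsigned char table[256], const char *str)
{
    assert(str != NULL);
    for (; *str != '\0'; str++) {
        if (!table[(unsigned char)*str])
            return 0;
    }
    return 1;
}
```

The cast to unsigned char matters on platforms where plain char is signed; without it a byte above 0x7f would produce a negative index.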

It's in Intel assembler syntax; I don't know how many UN*X assemblers support Intel syntax rather than AT&T syntax, so for use on UN*X this might require two versions.

For wrs_strcmp(), that seems useful only if Microsoft's own strcmp() isn't fast enough.

For wrs_str_equal(), the bulk of the loop is the same as wrs_strcmp(), so, if Microsoft's own strcmp() is fast enough, the only advantage of wrs_str_equal() would be that you'd spend a little less time per string pair computing a 3-way less/greater/equal result and then turning it into 1 for equal and 0 for less or greater.
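
As a rough illustration of that difference, the two approaches look like this in plain C. Both function names are hypothetical and neither is the code from the .asm file; the second one simply skips the less/greater decision.

```c
#include <assert.h>
#include <string.h>

/* Equality via the library's 3-way compare: strcmp() works out
 * less/greater/equal, and we then collapse that to 1 or 0. */
static int str_equal_via_strcmp(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Equality computed directly: stop at the first mismatch without
 * ever deciding which string is greater. */
static int str_equal_direct(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a == *b) {
        if (*a == '\0')
            return 1;   /* reached both terminators together */
        a++;
        b++;
    }
    return 0;
}
```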

For the others, they're interesting optimizations, but if they were rewritten in C, and used on all platforms where you can do unaligned loads and stores (at this point, that might mean "anything that's not SPARC"), they might be as fast (assuming the compiler generated similar code for extracting the 4 bytes from the word) and usable on other platforms.  For extra credit, do it 8 bytes at a time on LP64/LLP64 platforms.
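
A sketch of what such a C rewrite of the character-set check might look like, under some loud assumptions: the name check_charset_words() and its signature are hypothetical, and it takes an explicit length rather than a NUL-terminated string, so that the 4-byte loads never read past the end of the buffer. The load goes through memcpy(), which compilers typically turn into a single load on platforms that allow unaligned access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C version: table[c] is 1 if byte c is in the set, else 0.
 * Returns 1 if all len bytes of str are in the set, 0 otherwise. */
static int check_charset_words(const unsigned char table[256],
                               const char *str, size_t len)
{
    size_t i = 0;

    assert(str != NULL);

    /* Main loop: one 4-byte load, then four table lookups. */
    for (; i + 4 <= len; i += 4) {
        uint32_t w;

        memcpy(&w, str + i, sizeof w);  /* portable unaligned load */
        /* Byte order does not matter: all four bytes get checked. */
        if (!(table[w & 0xff] & table[(w >> 8) & 0xff] &
              table[(w >> 16) & 0xff] & table[w >> 24]))
            return 0;
    }

    /* Tail: the remaining 0 to 3 bytes, one at a time. */
    for (; i < len; i++) {
        if (!table[(unsigned char)str[i]])
            return 0;
    }
    return 1;
}
```

Widening this to 8 bytes at a time would mean loading a uint64_t and doing eight lookups per iteration; whether either version beats a plain byte loop is exactly the kind of thing that would have to be measured.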

So some questions are:

	1) How much do they speed up Wireshark startup on 32-bit x86 on Windows?

	2) How much do they speed up Wireshark startup on 32-bit x86 on various UN*Xes (which may mean "translate them to AT&T assembler")?  The answers may differ from platform to platform.

	3) What about x86-64?

	4) For wrs_check_charset() and wrs_str_hash(), how much of a difference do they make on non-x86 platforms not from Oracle :-) if done in C?