Comment # 8
on bug 10214
from Peter Wu
On second thought, the garbage copy does not matter. It first looks for a
terminating NUL; anything after that gets ignored.
Here are some results from attempts to optimize it, with n=1e9 iterations, a
haystack of 16 bytes ('x') and a needle of 1 byte ('x'). The CPU cycle count
(rdtsc) is used as the indicator: each number is cycles per iteration, and
multiplying it by 1e9 gives the total that was actually measured. The first
number is a normal build, the second is with ASAN enabled, and the third is a
normal build but with a needle of 8 bytes.
18.1 (22.8; 18.1) - current SSE implementation
32.3 (45.0; 44.2) - test and copy without memcpy (aliasing the mask as char *,
                    skipping _mm_load_si128)
33.2 (47.2; 51.5) - test and copy without memcpy
37.2 (66.0; 42.8) - loop with ptr
38.5 (69.0; 50.1) - loop through needles
45.0 (90.0; 43.3) - naive implementation with strlen (weird, a longer string
                    is faster each time?!)
46.3 (100.; 47.3) - memchr to find the needles' length, subtract ptrs for length
50.9 (74.6; 65.3) - _ws_strpbrk
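The measurement loop itself is not part of this comment. A minimal sketch of how such a per-iteration cycle count might be taken is shown below. All names (`read_ticks`, `ticks_per_call`, `naive_mempbrk`, `pbrk_fn`) are hypothetical, `__rdtsc()` is assumed to come from `<x86intrin.h>` (GCC/Clang on x86), and the `clock()` branch is only a stand-in for other CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  /* __rdtsc() on GCC and Clang */
static uint64_t read_ticks(void) { return __rdtsc(); }
#else
#include <time.h>
/* Assumption: no TSC available, so use clock(); this is not a cycle count. */
static uint64_t read_ticks(void) { return (uint64_t)clock(); }
#endif

/* Hypothetical signature shared by the variants being compared. */
typedef const char *(*pbrk_fn)(const char *s, size_t slen, const char *needles);

/* Plain reference search, used here only as something to time. */
static const char *
naive_mempbrk(const char *s, size_t slen, const char *needles)
{
    size_t nlen = strlen(needles);
    size_t i;

    for (i = 0; i < slen; i++) {
        if (memchr(needles, s[i], nlen) != NULL)
            return &s[i];
    }
    return NULL;
}

/* Average ticks per call over n calls: the total is measured, then divided. */
static double
ticks_per_call(pbrk_fn fn, const char *s, size_t slen, const char *needles,
               uint64_t n)
{
    const char *volatile sink = NULL;  /* store results so calls are not dropped */
    uint64_t start, end, i;

    assert(fn != NULL && n > 0);
    start = read_ticks();
    for (i = 0; i < n; i++)
        sink = fn(s, slen, needles);
    end = read_ticks();
    (void)sink;
    return (double)(end - start) / (double)n;
}
```

A call such as `ticks_per_call(naive_mempbrk, "xxxxxxxxxxxxxxxx", 16, "x", 1000000000)` would mirror the setup described above. An optimizing compiler may still hoist or inline the call, so the numbers from a sketch like this are only comparable within one build configuration.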
---
I'll make an ASAN build use one of these changes rather than disabling
everything. The performance advantage is still something to consider even after
adding the ASAN quirk.
---
The following snippets replace the alignment-check branches for the needles:
#define FALLBACK return _ws_mempbrk(s, slen, a)

// strlen:
length = strlen(a);
if (length > 16)
    FALLBACK;

// memchr:
const char *p = memchr(a, '\0', 16);
if (p == NULL)
    FALLBACK;
length = p - a;

// loop through needles:
length = 0;
for (i = 0; i < 16; i++) {
    if (a[i] == '\0') {
        length = i;
        break;
    }
}
if (length == 0)
    FALLBACK;

// loop through ptrs (assumes small needle)
const char *p = a;
while (*p)
    p++;
length = p - a;
if (length > 16)
    FALLBACK;

// each of the replacements above is followed by:
__m128i a128 = _mm_setzero_si128();
memcpy(&a128, a, length);
mask = _mm_load_si128 (&a128);
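
// test and copy (without memcpy):
char tmp[16] = { '\0' };    /* one full 128-bit block, zero-filled */
int i;
/* check the bound before reading a[i] */
for (i = 0; i < 16 && a[i]; i++)
    tmp[i] = a[i];
/* 16 bytes or larger */
if (tmp[15] != '\0')
    return _ws_mempbrk(s, slen, a);
/* _mm_load_si128 requires tmp to be 16-byte aligned */
mask = _mm_load_si128 ((__m128i *) (void *) tmp);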
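
For comparison, the test-and-copy variant can also be written as a standalone helper. This is only a sketch: `prepare_needles` and `NEEDLE_BLOCK` are made-up names that are not part of the patch, and it follows the rule from the test-and-copy snippet that a needle set of 16 bytes or more goes to the fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NEEDLE_BLOCK 16  /* bytes in one 128-bit SSE register */

/*
 * Hypothetical helper: copy the needles into a zero-filled 16-byte block.
 * Returns true and sets *length when the needles plus a terminating NUL fit.
 * Returns false when they do not fit, in which case the caller should use
 * the non-SSE fallback and the block contents should not be used.
 */
static bool
prepare_needles(const char *a, unsigned char block[NEEDLE_BLOCK],
                size_t *length)
{
    size_t i;

    assert(a != NULL && block != NULL && length != NULL);
    memset(block, 0, NEEDLE_BLOCK);

    /* Check the bound first so that a[16] is never read. */
    for (i = 0; i < NEEDLE_BLOCK && a[i] != '\0'; i++)
        block[i] = (unsigned char)a[i];

    /* No NUL within the first 16 bytes: too long for one block. */
    if (i == NEEDLE_BLOCK)
        return false;

    *length = i;
    return true;
}
```

The filled block could then be loaded with `_mm_loadu_si128`, which does not require 16-byte alignment, or with `_mm_load_si128` if the caller guarantees that the buffer is aligned.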