Wireshark-dev: Re: [Wireshark-dev] RFC: sorted value_string + bsearch

From: "Maynard, Chris" <Christopher.Maynard@xxxxxxxxx>
Date: Fri, 23 Apr 2010 10:04:30 -0400
I think this is a good idea, but it obviously requires the value string  array to be properly sorted for it to work well.  And the bigger the array, the more enticing it would be to use this faster method, but I think it will also be potentially harder to ensure that those bigger arrays are indeed properly sorted.  Developers are bound to make mistakes, especially with the very large arrays, and as other developers add new entries, they may not know that a particular array is supposed to be sorted, and even if they do, there's no guarantee that a properly sorted array will remain so.

So, with that in mind ... is there a way, or could there be a way, to verify that certain value string arrays are properly sorted and remain so, perhaps during compilation or at least via a tools/checkXYZ.pl script?

Also, is there a recommendation on the cutoff size of a value string array where one would choose to use the faster method?  Maybe this requires more profiling?  Perhaps it would be a good idea to also have checkXYZ.pl complain about all unsorted value string arrays above a certain size, then they could be more easily identified and switched over to the faster method?

- Chris

________________________________________
From: wireshark-dev-bounces@xxxxxxxxxxxxx [wireshark-dev-bounces@xxxxxxxxxxxxx] On Behalf Of Anders Broman [anders.broman@xxxxxxxxxxxx]
Sent: Friday, April 23, 2010 4:04 AM
To: Developer support list for Wireshark
Subject: Re: [Wireshark-dev] RFC: sorted value_string + bsearch

Hi,
Looking at the code again I think it looks promising a further enhancement could
Be to add a flags field indicating the search method if the value_string is "full"
E.g. all values 0-x used the value could be directly used as an index to the sought string.

I just need to find the time to do some profiling/tests...

Regards
Anders

-----Original Message-----
From: wireshark-dev-bounces@xxxxxxxxxxxxx [mailto:wireshark-dev-bounces@xxxxxxxxxxxxx] On Behalf Of Jakub Zawadzki
Sent: den 22 april 2010 23:40
To: Ed Beroset; Developer support list for Wireshark
Subject: Re: [Wireshark-dev] RFC: sorted value_string + bsearch

On Thu, Apr 22, 2010 at 10:47:14AM -0400, Ed Beroset wrote:
> I may have missed it, but have we measured to see if this is worth optimizing in the first place?

I have some old callgrind log, where match_strval (for 58736 frames) is called a lot:

 from dissect_ieee80211 58,7k
 from ieee_80211_add_tagged_parameters 228k

It's problem with ieee80211 dissector, but when you grep dissector sources:

#v+
$ grep -Ir 'val_to_str' ./ | wc -l
3621

$ grep -Ir 'match_strval' ./ | wc -l
305
#v-

It's a common thing to use these functions.

> If not, I think I'd favor choosing code

Come on, this code require only adding:
   static const VALUE_STRING_FAST(some_value_string_array);

I don't see much difference between using:
   value_to_str_fast(..., &foo_fast) and value_to_str(..., foo)

If you don't like automatic variable naming I can replace VALUE_STRING_FAST with:

  #define VALUE_STRING_FAST_INIT(x) = { array_length(x)-1, x }
  static const value_string_fast foo = VALUE_STRING_FAST_INIT(bar);

Dissector maintainer is free to use old/new (or write his own) function he want, but putting 57 new lines to core won't hurt.

> clarity over an inconsequential speedup.

ATM I don't have working profiler to benchmark wireshark, but I made some _raw_ benchmark [1], for 5k entries and searching 0 to 6k three times

Average time for 16 tests:
 val_to_str_fast: 0.02s
      val_to_str: 1.56s

Sorry for not providing real-life benchmark.

Cheers.

[1] http://szara.chmurka.net/value_string-bench.c
CONFIDENTIALITY NOTICE: The contents of this email are confidential
and for the exclusive use of the intended recipient. If you receive this
email in error, please delete it from your system immediately and 
notify us either by email, telephone or fax. You should not copy,
forward, or otherwise disclose the content of the email.