On 01/06/2011 12:49 PM, Martin Mathieson wrote:
Martin, I assume the pre-tvb_get_ptr() code here was similar to this
change in that it only retrieved the string once? (I ask since
several of the strings are used multiple times.)
The diff is at
http://anonsvn.wireshark.org/viewvc/trunk/epan/dissectors/packet-catapult-dct2000.c?r1=28085&r2=28183&pathrev=35393
<http://anonsvn.wireshark.org/viewvc/trunk/epan/dissectors/packet-catapult-dct2000.c?r1=28085&r2=28183&pathrev=35393>.
(Sorry, I don't know why I didn't look that up myself...)
There were some strings that were being retrived in more than one place,
but probably only once for any given frame. I suspect the slowest part
was probably the block around line 1610 where it was pulling out these 3
strings for every frame.
Well, a couple are added to the protocol item (around line 1610) and
then again to col_info (around line 2102).
After remembering that profiling (at least in its easiest form with the
'time' command) isn't so hard, I played around a bit. After building a
decent-sized dct2000 file (taking the sample from SampleCaptures and
merging it with itself until I had a 276 Mb file), I tried before and
after this change and I can't find a measurable difference in the CPU
usage. I even tried forcing my (AMD) CPU down to 1 GHz to exaggerate
the difference, but I still got only a couple of seconds CPU time
difference out of over 5 minutes--and in that case rev 35393's code was
faster.
Maybe I'll try tomorrow on a SPARC: I know that memcpy()s are a lot more
expensive there than on x86.
I suppose for cases like this a get_string function that uses the
TVB as the backing store (and only pushes the
tvb_get_strsize()+tvb_get_ptr() out of the dissector and into
tvbuff.c) might be an option to help reduce the prevalence of
tvb_get_ptr() in dissectors. That may not be a bad idea because I
think I saw some other dissectors doing the same thing as this
one--and why take the performance hit?
My strings are null-terminated, and I was using
tvb_get_ephemeral_string(), whereas I already knew the lengths from when
I first parsed the line, and that they do terminate with a NULL before
the end of the tvb. Would this function do more than just cast
tvb_get_ptr() to a char* ? If not, it wouldn't be any more safe.
Note that my change also removed the tvb_getstrsize() in favor of using
the _string*z* function (which does it for you) and each string is
retrieved exactly once.
My thought on the new function was to only provide a, say,
tvb_get_const_stringz() function which:
- gets the string length with tvb_get_strsize() # now we know we'll
provide a NULL-terminated string
- calls tvb_get_ptr() to get a pointer (and ensure the TVB is flat)
- return a *const* char*
It is no safer than the code in this dissector (which is already safe)
and doesn't leave the dissector writer in a more dangerous position
(going off the end of a string--whether it's stored in a TVB or not--is
always going to be bad). Its sole purpose is to remove some dissector
usage of tvb_get_ptr(). Not sure if it's worth it--maybe I'll see if
other dissectors could use it too.