On Fri, Jan 7, 2011 at 3:33 AM, Jeff Morriss
<jeff.morriss.ws@gmail.com> wrote:
(Sorry, I don't know why I didn't look that up myself...)
No worries. I was just relieved to find that I'd written a meaningful checkin comment ;)
There were some strings that were being retrieved in more than one place,
but probably only once for any given frame. I suspect the slowest part
was the block around line 1610 where it was pulling out these 3
strings for every frame.
Well, a couple are added to the protocol item (around line 1610) and then again to col_info (around line 2102).
After remembering that profiling (at least in its easiest form with the 'time' command) isn't so hard, I played around a bit. After building a decent-sized dct2000 file (taking the sample from SampleCaptures and merging it with itself until I had a 276 MB file), I tried before and after this change and couldn't find a measurable difference in the CPU usage. I even tried forcing my (AMD) CPU down to 1 GHz to exaggerate the difference, but I still got only a couple of seconds of CPU time difference out of over 5 minutes--and in that case rev 35393's code was faster.
Maybe I'll try tomorrow on a SPARC: I know that memcpy()s are a lot more expensive there than on x86.
I think you win: the difference isn't worth it, and it'd be better not to leave unnecessary examples of tvb_get_ptr() use around.
The profiling I did was with "valgrind --tool=callgrind --trace-children=yes ./wireshark", then I looked at the callgrind file using kcachegrind. I no longer have those files, but I do now remember that adding the 'double' field took 0.4%, and that was more than what this is trying to fix. So really it was in the noise - I guess I was just concentrating on improving the parts I had written (and I often profile embedded programs on the workstation, where extra spoofing and other test code isn't optimised, so I'm used to speeding up parts of the code that don't account for a large share of the overall CPU usage in kcachegrind).
Soon after that I realised that spending 30% of the CPU time reading text lines from the file (in wiretap) was too high. When I configured the build without zlib support that went down to 0.3%, and I haven't worried too much about performance since then.
I suppose for cases like this a get_string function that uses the
TVB as the backing store (and only pushes the
tvb_get_strsize()+tvb_get_ptr() out of the dissector and into
tvbuff.c) might be an option to help reduce the prevalence of
tvb_get_ptr() in dissectors. That may not be a bad idea because I
think I saw some other dissectors doing the same thing as this
one--and why take the performance hit?
My strings are null-terminated, and I was using
tvb_get_ephemeral_string(); I already knew the lengths from when
I first parsed the line, and that they do terminate with a NULL before
the end of the tvb. Would this function do more than just cast
tvb_get_ptr() to a char*? If not, it wouldn't be any safer.
Note that my change also removed the tvb_get_strsize() in favor of using the _string*z* function (which does it for you), and each string is retrieved exactly once.
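In other words, something along these lines (a sketch only: hf_dct2000_example, the offsets, and the little helper function are placeholders rather than the actual dissector code):

    #include <epan/packet.h>

    /* Sketch: fetch a NUL-terminated string once and reuse it for both
     * the protocol tree and the Info column.  hf_dct2000_example is a
     * placeholder field id. */
    static gint
    add_one_string(tvbuff_t *tvb, packet_info *pinfo, proto_tree *tree,
                   gint offset, int hf_dct2000_example)
    {
        gint    length;
        guint8 *string;

        /* Copies the NUL-terminated string into ephemeral (per-packet)
         * memory and reports its length (including the terminator), so
         * the tvb is read exactly once per string. */
        string = tvb_get_ephemeral_stringz(tvb, offset, &length);

        /* The same buffer is reused for the protocol item... */
        proto_tree_add_string(tree, hf_dct2000_example, tvb, offset, length,
                              (char*)string);

        /* ...and for the Info column, instead of fetching it again. */
        col_append_fstr(pinfo->cinfo, COL_INFO, " %s", string);

        return offset + length;
    }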
My thought on the new function was to only provide a, say, tvb_get_const_stringz() function which:
- gets the string length with tvb_get_strsize() # now we know we'll provide a NULL-terminated string
- calls tvb_get_ptr() to get a pointer (and ensure the TVB is flat)
- returns a *const* char*
It is no safer than the code in this dissector (which is already safe) and doesn't leave the dissector writer in a more dangerous position (going off the end of a string--whether it's stored in a TVB or not--is always going to be bad). Its sole purpose is to remove some dissector usage of tvb_get_ptr(). Not sure if it's worth it--maybe I'll see if other dissectors could use it too.
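Roughly like this (just a sketch of the proposal: the name and signature are what's suggested above, not an existing API, and it leans on the tvb_strsize()/tvb_get_ptr() helpers already in tvbuff.h):

    #include <epan/tvbuff.h>

    /* Proposed helper (sketch): return a const pointer to a
     * NUL-terminated string stored inside the tvb, without copying it. */
    const char *
    tvb_get_const_stringz(tvbuff_t *tvb, const gint offset, gint *lengthp)
    {
        guint size;

        /* tvb_strsize() throws an exception if no NUL terminator is
         * found before the end of the tvb, so callers are guaranteed a
         * terminated string. */
        size = tvb_strsize(tvb, offset);

        if (lengthp)
            *lengthp = (gint)size;

        /* tvb_get_ptr() ensures the data is contiguous and returns a
         * pointer into the tvb itself: no allocation and no memcpy of
         * the string, and the const return discourages writing to it. */
        return (const char *)tvb_get_ptr(tvb, offset, (gint)size);
    }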
I see, so you could confirm that it is null-terminated and avoid the cost of allocating a string, instead just returning a char* to the memory in the tvb.
Martin