Wireshark-dev: Re: [Wireshark-dev] Header field with scaling factor/units?

From: "John Dill" <John.Dill@xxxxxxxxxxxxxxxxx>
Date: Thu, 10 Apr 2014 17:38:54 -0400
>Message: 3
>Date: Wed, 9 Apr 2014 16:16:40 -0700
>From: Guy Harris <guy@xxxxxxxxxxxx>
>To: Developer support list for Wireshark <wireshark-dev@xxxxxxxxxxxxx>
>Subject: Re: [Wireshark-dev] Header field with scaling factor/units?
>Message-ID: <4D624AB4-5FCD-47A2-8850-39CC4E6B4325@xxxxxxxxxxxx>
>Content-Type: text/plain; charset=iso-8859-1
>
>
>On Apr 9, 2014, at 11:01 AM, "John Dill" <John.Dill@xxxxxxxxxxxxxxxxx> wrote
>(in a font that gets rendered as rather small characters in my mail reader -
> you might want to use larger type to help out those of us with aging eyes):
>
>>I have a common use case (hundreds to low thousands of data elements) where
>>I need to take some data, encoded in an integer FT_UINT[8|16|32], sometimes
>>has a bitmask applied, and needs to be multiplied by a scaling factor that
>>may be an integer or floating point value, with an optional units string.
>>I didn't see a use case in README.developer that directly handles this scenario.
>
>Unfortunately, while both scaling and appending units would be useful for a
>number of fields, we don't have mechanisms to support them.
>
>>Since at the moment it appears that I need to overwrite the item's text
>>string to accomplish what I want, I was considering hijacking the 'strings'
>>member to store the scaling factor and units strings.  Then I could test
>>for the existence of a scaling factor/units string in the hf->strings member.
>>I'll probably have to package it into a VALS and use try_val_to_str to
>>access the units string to remain compatible with 'proto_tree_add_item'
>>before I rewrite the text representation.  The scale factor could be
>>encoded as a string where I'd have to convert it on the fly using some form
>>of strto[d|l|ul].  Of course this could be just added inline with the
>>dissector code, but it would be nice to have a place in the hf_register_info
>>declaration that documents this information.
>>
>>I would think it would be possible to extend the FT_ types with a constant,
>>that informs the api that the scaling factor and units are encoded in
>>'hf->strings' as [{ 0, "0.25" } { 1, "pounds" }] with a new interface
>>function or two to implement it.
>
>Currently, the header_field_info structure has a field named "display".
>Originally, it was used only for numerical values, to control the base in
>which to display the number; it's now used for other field types to control
>how they're displayed as well.  In addition, for numerical fields, there
>are some flags that can be set to indicate how to interpret the next field...
>
>...which is the "strings" field.
>
>I might be inclined to, for numerical fields, divide the "display" field
>into a 4-bit field used for the base and another field indicating how the
>strings field is to be interpreted (currently, that's what we have, but
>they're implemented as bit flags, which means that there are mixtures of
>flags that might not make sense).
>
>We could add an additional type, in which the "strings" field encodes a
>scaling factor and units; perhaps integral and floating-point scaling
>factors could have different types.  (I would include the scaling factor
>as a number, not a string.)
>
>I would also rename "strings", while we're at it - I think "display" might
>have been called "base" in the old days.
>
>>How difficult would it be to allow a filter expression to be able to
>>search on a header field whose condition assumes that the scaling factor
>>has been applied, i.e., the data is an integer and has a scaling factor
>>of .25 and you want to filter its value using a floating point value
>>(probably quite difficult I'm guessing)?
>
>We'd probably want to support both filtering on the raw and displayed value
>of a field.  We can already do that with "enumerated" fields - if you have
>a field with a value_string table that maps 2 to "Hello", then you can do
>
>        proto.field == 2
>
>or
>
>        proto.field == "Hello"
>
>We might want to add syntax so that, for a field with a scale factor of 0.5,
>we might have
>
>        wlan.rate == raw(22)
>
>or
>
>        wlan.rate == 11

I think raw() is a fine option.  Another idea is to surround the value with a
delimiter character that's not used otherwise to signal that the value should
be compared against the raw data.  If '~' isn't used for instance, one could
use ~22~.  I don't have a particular preference at this point in time.

>(no, that was not a randomly-chosen field example :-)).  Other suggestions
>for the syntax are welcome.

I spent some time thinking about this, and I believe that the scaling factor
and units could probably be implemented independently (as in one after the
other).  The units string is probably the easier of the two, so here are my
thoughts so far.  Hopefully not all of it is chaff.

Of all the FT_ types defined, this is the list that I think makes sense to
allow an optional unit string.

FT_INT[8|16|24|32|64]
FT_UINT[8|16|24|32|64]
FT_FLOAT
FT_DOUBLE

You could add FT_ABSOLUTE_TIME and FT_RELATIVE_TIME to turn on a units
display, but the only units that would make sense would be 'seconds' or
a derived abbreviation 's', 'sec', etc.

I do not know if FT_NONE should allow a unit string.

I'm going to iterate all of the possible scenarios to add a unit string
that I can think of.  The question is where this unit string should live,
and how the display of the unit string should be activated.

Obviously one can supply their own unit string by taking over the
formatting with one of the proto_tree_add_xxx_format(_value) functions,
or by using the callback function to custom format the data, without any
changes to the dissector API.  The drawback in this scenario is that the
unit string is separated from the hf_register_info array declaration,
and writing the appropriate proto_tree_add_xxx_format(_value)
function call is more cumbersome.
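
As a rough illustration of what the hand-formatted approach involves,
here is a minimal sketch in plain C that builds the value text from a
raw integer, a scale factor and a unit string.  'format_scaled_value' is
a hypothetical helper, not part of the Wireshark API, and the "%g"
format is just an assumption about how the number should look.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "<raw * scale> <units>" into buf.
 * If units is NULL or empty, only the number is written.
 * Returns what snprintf returns (the length that was needed). */
static int format_scaled_value(char *buf, size_t len, uint32_t raw,
                               double scale, const char *units)
{
    double value;

    assert(buf != NULL && len > 0);
    value = (double)raw * scale;
    if (units == NULL || strlen(units) == 0)
        return snprintf(buf, len, "%g", value);
    return snprintf(buf, len, "%g %s", value, units);
}
```

In a dissector, text built this way would be handed to one of the
proto_tree_add_xxx_format_value functions, which is the per-field
bookkeeping I would like to avoid repeating hundreds of times.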

One alternative to integrating the unit string into the header_field_info
structure is to create an external "database" of unit strings using
a lookup table based on registered hf_ integers.  It could be searched
the same way val_to_str searches a table, using { hf_index, "unit string" }
pairs.
This structure would have to be populated after all the

static int hf_xxx = -1;

fields have been registered.  Once these unit strings have been associated
with their hf_ integer, one could incorporate adding that string as a
suffix to field types that allow it.  One can follow the same pattern as
proto_registrar_get_nth but have a new function prototype reference the
unit string database.

extern const char* proto_registrar_get_units(const int n);
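
A minimal sketch of such a lookup in plain C might look like the
following.  The 'unit_string' type and 'lookup_units' function are
hypothetical names for illustration; the table is terminated by an entry
with a NULL string, which is an assumption borrowed from how value_string
tables are usually terminated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry pairing a registered hf_ index with units. */
typedef struct {
    int         hf_index;
    const char *units;
} unit_string;

/* Linear search over a table that ends with a NULL units pointer.
 * Returns the unit string, or NULL if the index has no entry. */
static const char *lookup_units(const unit_string *table, int hf_index)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; table[i].units != NULL; i++) {
        if (table[i].hf_index == hf_index)
            return table[i].units;
    }
    return NULL;
}
```

A linear search is fine for a sketch; with thousands of fields, a table
indexed directly by hf_ integer would probably be the better fit.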

The other proposal is to try to integrate the unit string into the header
field structure using the 'strings' member, in a fashion similar to how
BASE_RANGE_STRING and BASE_EXT_STRING are implemented.

To this end, a new define would be needed.

#define BASE_UNIT_STRING 0x40

After the label string is populated, a suffix function could inspect
'strings' and append the unit string to the display if that flag
is active.

\code snippet
if (hf->strings) {
  if (hf->display & BASE_UNIT_STRING) {
    proto_item_append_text(item, " %s", (const char *)hf->strings);
  }
}
\endcode

At the same time, any time that 'hf->strings' is used as a condition
when it's not NULL, an additional condition would have to check that
BASE_UNIT_STRING is not set for instances where 'strings' is assumed
to be one of the following.

(const value_string *)
(const value_string_ext *)
(const range_string *)
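
One way to keep those checks in one place is a single function that
decides how 'strings' should be interpreted.  The sketch below is plain
C with stand-in types: 'sketch_hfinfo', 'strings_kind' and
'classify_strings' are hypothetical, the SKETCH_ flag values are
assumptions rather than copies from proto.h, and it only covers the
three existing cases listed above plus the proposed unit string.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flag values for illustration only. */
#define SKETCH_BASE_RANGE_STRING 0x10
#define SKETCH_BASE_EXT_STRING   0x20
#define SKETCH_BASE_UNIT_STRING  0x40  /* the proposed new flag */

/* How the 'strings' pointer should be interpreted. */
typedef enum {
    STRINGS_NONE,
    STRINGS_VALUE,   /* const value_string *     */
    STRINGS_RANGE,   /* const range_string *     */
    STRINGS_EXT,     /* const value_string_ext * */
    STRINGS_UNIT     /* const char * unit string */
} strings_kind;

/* Hypothetical cut-down version of the header field info. */
typedef struct {
    int         display;
    const void *strings;
} sketch_hfinfo;

/* Check the unit flag first so a unit string is never mistaken for
 * one of the table types. */
static strings_kind classify_strings(const sketch_hfinfo *hf)
{
    assert(hf != NULL);
    if (hf->strings == NULL)
        return STRINGS_NONE;
    if (hf->display & SKETCH_BASE_UNIT_STRING)
        return STRINGS_UNIT;
    if (hf->display & SKETCH_BASE_RANGE_STRING)
        return STRINGS_RANGE;
    if (hf->display & SKETCH_BASE_EXT_STRING)
        return STRINGS_EXT;
    return STRINGS_VALUE;
}
```

This also shows the weakness of bit flags mentioned earlier in the
thread: nothing stops two of them from being set at once, so the order
of the checks silently picks a winner.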

I don't know how far the side effects would go, but those are my initial
thoughts on trying to inject a unit string based on my limited
exposure to the Wireshark codebase.  If the "right" thing to do for
unit strings can be developed, then introducing the more complex
idea of a scaling factor can be approached within the context of
the "best" proposal for adding a unit string.

I'm still thinking about the scaling factor issue, and I'm out of time
for today.  Thanks for your comments and insight as always.

Best regards,
John Dill
