Wireshark-dev: [Wireshark-dev] FT_STRING, FT_STRINGZPAD, and null padding

From: John Thacker <johnthacker@xxxxxxxxx>
Date: Sat, 5 Sep 2020 01:56:20 -0400
I don't understand this change, which makes me wonder if I understand FT_STRINGZPAD vs FT_STRING in general:

Everything in proto.c and tvbuff.c treats FT_STRING and FT_STRINGZPAD exactly the same currently, except for checking stray characters. If the length is given as -1, the length to the end of the tvbuff is used. Otherwise, the specified length is checked and copied from the tvbuff, and a NUL is added at the end regardless. The last byte ensures that they both work even if the buffer is not null terminated.

However, if the string is null terminated somewhere in the middle, all the bytes after that (presumably padding) are still copied into the return value, along with the extra '\0'. This doesn't really cause any problems other than wasting a little bit of memory and time, since the first null ends the string.

Before that commit, there was absolutely no difference that I could tell. Now the difference is that if the string is terminated by a NUL somewhere in the middle before the specified field length, whether or not the bytes in the padding after that must be null padding. It is FT_STRING that requires that any padding be zero-padding (and adds an expert warning if not), whereas FT_STRINGZPAD does not.

While I agree with the commit message that "there are protocols where a string that's not the full length of the part of the packet for the string has a null terminator but isn't guaranteed to be fully padded with nulls," and that there can be "a separate type for fields where we really *should* check that the padding is all nulls," I don't see why FT_STRINGZPAD isn't that latter type. To me, the name FT_STRINGZPAD implies that it is the one where we really should check that the padding is zero padding.

I would reverse the handling in that commit, and have FT_STRINGZPAD be the type that does check, and make FT_STRING be the type that does not check for trailing characters.

The documentation in README.dissector does not really explain the differences between the use cases, as it seems like FT_STRING can be used for everything where FT_STRINGZPAD can:

    FT_STRING   A string of characters, not necessarily
                NULL-terminated, but possibly NULL-padded.
                This, and the other string-of-characters
                types, are to be used for text strings,
                not raw binary data.
...
    FT_STRINGZPAD   A NULL-padded string of characters.
                The length is given in the proto_tree_add_item()
                call, but may be larger than the length of
                the string, with extra bytes being NULL padding.
                This is typically used for fixed-length fields
                that contain a string value that might be shorter
                than the fixed length.
John Thacker