Wireshark-users: Re: [Wireshark-users] Reading a zero-terminated string in Lua dissector

From: Tony Trinh <tony19@xxxxxxxxx>
Date: Mon, 4 Jun 2012 16:22:03 -0400
On Mon, May 28, 2012 at 5:00 PM, Carsten Fuchs <CarstenFuchs@xxxxxxxxxxx> wrote:
Dear Wireshark group,

I'm currently writing my first Lua dissector for the network protocol of the Cafu game engine (http://www.cafu.de), and would be very grateful for your help:

The presentation at http://sharkfest.wireshark.org/sharkfest.09/DT06_Bjorlykke_Lua%20Scripting%20in%20Wireshark.pdf got me started very well, and reading fixed-width data works fine. For example (incomplete excerpt):

   local CafuProto = Proto("Cafu", "Cafu Engine network protocol");

   CafuProto.fields.SequNr = ProtoField.uint32("Cafu.SequNr", "1st sequence number")

   function CafuProto.dissector(buffer, pinfo, tree)
       local subtree = tree:add(CafuProto, buffer())
       local offset  = 0

       subtree:add(CafuProto.fields.SequNr, buffer(offset, 4));
       offset = offset + 4
   end


However, what is the best way to read zero-terminated strings?

Of course it is possible to loop over i until the terminating zero is found:

   local i = 0
   while buffer(offset + i, 1):uint8() ~= 0 do
       i = i + 1
   end
   i = i + 1    -- Read the zero as well.
   subtree:add(CafuProto.fields.MyString, buffer(offset, i))
   offset = offset + i

But is there a more direct and/or more elegant way?

Btw., what is the difference between ProtoField.string and ProtoField.stringz as mentioned at http://www.wireshark.org/docs/wsug_html_chunked/lua_module_Proto.html#lua_class_ProtoField ?

Any help or comments would be much appreciated!

Best regards,
Carsten



TvbRange.string() gets a sequence of n 8-bit characters from a TvbRange, including all intermediate zeroes, where n is the byte length of the TvbRange. This is consistent with the Lua definition of a string, described in the Lua reference manual:

Strings in Lua can contain any 8-bit value, including embedded zeros, which can be specified as '\0'.
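A quick plain-Lua check of that property. It runs in any standalone Lua interpreter (no Wireshark needed); `describe` is just a hypothetical helper that makes the zero byte visible when printed.

```lua
-- Plain Lua: a string may hold embedded zero bytes,
-- and the length operator counts every byte, zeros included.
local s = "foo\0bar"

-- Hypothetical helper: render zero bytes as the two characters \0 so they show up in output.
local function describe(str)
  local parts = {}
  for i = 1, #str do
    local b = str:byte(i)
    parts[#parts + 1] = (b == 0) and "\\0" or string.char(b)
  end
  return table.concat(parts)
end

print(#s)           --> 7
print(describe(s))  --> foo\0bar
```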

TvbRange.stringz() gets the 8-bit characters of a TvbRange up to, but not including, the first zero (or up to the end of the range if it contains no zero).
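To make the string()/stringz() difference concrete outside Wireshark, here is a plain-Lua sketch that mimics what stringz() returns when applied to an ordinary byte string. The helper name `stringz_of` is hypothetical; this illustrates the semantics described above and is not Wireshark's implementation.

```lua
-- Hypothetical helper (not part of Wireshark's API): mimics what
-- TvbRange.stringz() returns, applied to an ordinary Lua string.
local function stringz_of(bytes)
  -- plain = true makes find() treat "\0" as a literal byte, not a pattern
  local zero = bytes:find("\0", 1, true)
  if zero then
    return bytes:sub(1, zero - 1)   -- everything before the first zero
  end
  return bytes                      -- no zero: the whole range
end

local data = "foo\0bar"
print(stringz_of(data))             --> foo
print(stringz_of(data:sub(1, 5)))   --> foo   (compare buf(0,5):stringz())
print(stringz_of("abc"))            --> abc
```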

There are also equivalents for UTF-16 strings, TvbRange.ustring() and TvbRange.ustringz().

If the field is a null-terminated string and its maximum length is known, you should use stringz() with a length-limited TvbRange (e.g., buf(0,10):stringz() for a field occupying at most 10 bytes, starting at buffer offset 0). Otherwise, omit the length argument from the TvbRange (e.g., buf(0):stringz()), so that the string is read up to the first zero.
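As a sketch of why the length limit matters for a fixed-width field, here is a plain-Lua emulation that works on an ordinary string instead of a Tvb. It assumes a hypothetical layout of a 10-byte, zero-padded name followed by a 1-byte value, and the function name `read_fixed_stringz` is illustrative only. The text stops at the first zero, but the next field still starts 10 bytes later.

```lua
-- Hypothetical plain-Lua sketch of a fixed-width, zero-padded string field,
-- mirroring buf(offset, width):stringz() in a dissector.
-- offset is 0-based, as with TvbRange offsets.
local function read_fixed_stringz(bytes, offset, width)
  local slot = bytes:sub(offset + 1, offset + width)   -- the length-limited range
  local zero = slot:find("\0", 1, true)
  local value = zero and slot:sub(1, zero - 1) or slot
  -- A fixed-width field always occupies `width` bytes, however short the text.
  return value, offset + width
end

-- Assumed layout: "foo" padded with zeros to 10 bytes, then one byte with value 42
local packet = "foo" .. string.rep("\0", 7) .. "\42"
local name, next_offset = read_fixed_stringz(packet, 0, 10)
print(name, next_offset)                -- prints "foo" and 10
print(packet:byte(next_offset + 1))     --> 42
```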

Example:

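local proto_foo = Proto("foo", "Foo protocol")

function proto_foo.dissector(buf, pinfo, tree)
    -- assume buf (a Tvb) contains the 7 bytes "foo\0bar"

    -- gets "foo\0bar" (every byte, including the embedded zero)
    local s = buf(0):string()

    -- gets "foo\0b" (the first 5 bytes)
    s = buf(0,5):string()

    -- gets "foo" (stops at the first zero)
    s = buf(0,5):stringz()
end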

ProtoField.string() and ProtoField.stringz() differ in the same way. The former declares a protocol field that is a string of arbitrary bytes (including any embedded zeroes), and the latter declares a field that is a null-terminated string. I can't think of a case where it would make sense to display embedded zeroes of a string in the Packet Details pane (in fact, they are displayed as unsightly boxes). I would even say that you should normally opt for stringz() unless you have a specific use case for string().

In your sample code, I would make a few minor changes:

  1. Change the field types of f.SC1_WI_GameName and f.SC1_WI_WorldName to ProtoField.stringz().
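  2. Rewrite getStringLength() as follows (the result excludes the terminating zero):

       local function getStringLength(buffer, offset)
           -- Length of the string at offset, up to but not including the first zero
           return buffer(offset):stringz():len()
       end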
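Building on getStringLength(), a dissector usually also needs to add the field and move the offset past the terminator. The helper below is a hypothetical sketch (the name `add_stringz_field` is illustrative); it uses only calls that appear in this thread (`buffer(offset)`, `:stringz()`, `tree:add()`) plus `Tvb:len()`. It assumes stringz() yields one Lua byte per packet byte, which holds for plain ASCII text; newer Wireshark releases also document a TvbRange:strsize() method that reports the size including the terminator, so check the API reference for your version.

```lua
-- Hypothetical helper for a Wireshark Lua dissector.
-- Adds a zero-terminated string field covering the text *and* its terminator,
-- and returns the offset of the first byte after the string.
local function add_stringz_field(tree, field, buffer, offset)
  local text = buffer(offset):stringz()   -- text up to the first zero
  local size = text:len() + 1             -- + 1 for the terminating zero
  -- If the packet ends without a terminator, do not run past the end.
  local remaining = buffer:len() - offset
  if size > remaining then
    size = remaining
  end
  tree:add(field, buffer(offset, size))
  return offset + size
end
```

In a dissector this would be called as `offset = add_stringz_field(subtree, CafuProto.fields.MyString, buffer, offset)`, replacing the manual loop.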

Hope that helps.

-Tony