Ethereal-dev: RE: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Biot Olivier <Olivier.Biot@xxxxxxxxxxx>
Date: Tue, 18 Nov 2003 13:27:49 +0100
| From: Loïc Minier
| 
| Biot Olivier <Olivier.Biot@xxxxxxxxxxx> - Mon, Nov 17, 2003:
| 
| > No problems with that, but some handsets do not always 
| comply with this
| > rule, so maybe we want to fall-back to text-string 
| afterwards? Additionally,
| > some handsets only display the starting quote, other 
| handsets also encode
| > the closing quote. And others don't quote at all :)
| 
|  In my implementation, I've got a generic header name parser and a
|  generic header value parser used in cases where the guessed parser
|  fails. In this case (I've never met), the QuotedString parser would
|  fail and the HeaderValue generic parser will attempt to decode the
|  value as a stringz. This is based on section 8.4.1.2 which 
| I've always
|  seen respected.
|    I see two possibilities to support nicely quoted-strings:
|  - we always parse the headers with a generic header-name + 
| header-value
|  parser and put that in a tvb, then we try to call a specific 
| parser on
|  this tvb;
|  - we try to guess the correct parser, with respect for the 
| norm, and if
|  this fails, we fall back on a generic parser.
| 
|  Right now, I think the main loop does something like trying 
| to call the
|  parser from a table of macro generated parsers.
|    What do you think?

The current implementation is also based on a generic header parser, and a
generic value parser is called only for well-known headers (represented with
a 7-bit index into an array of parser functions; this removes a lengthy
switch statement). If the header name is textually encoded, then I bypass
calling a header parsing function as the value MUST be encode as text too.
Unfortunately some early WAP phones implemented a Time-of-Day clock synch
mechanism by breaking the mentioned MUST rule, so this functionality
remained in add_headers().

I heavily used macros for keeping the code readible and maintainable (to me
at least, and hopefully to others too :). There are 72 well-known header
encodings for the default code page, and 35 for the Openwave code page. Add
the 2 default parsers, and you get about 100 parsers. The framework of those
100 parsers is the same, and is documented around line 1800 of the code.

Regarding the Quoted-string issue, I have already seen the following:
	1. "text
	2. "text"
	3. text (on rare occasions; should be flagged as an error anyway)
We *might* want to check if the last character of the quoted-string is a
quote, and strip it; however the definition of TEXT tells us that a quote is
valid, meaning that then example 2 should be interpreted as "text"" (with
double quote at end)...

I like your proposal for is_quoted_string(x) but I also want the
get_quoted_string() macro to be correct :)
I would suggest to fall-back to get_text_string() in the code wherever
get_quoted_string() fails (ok == FALSE), and to document this. In case we
fall back, I want to display a warning too, like I already do in similar
cases. And the logic of this fall-back must be explained in the code too so
other developers know why it's in etc etc...

For the sake of efficiency, we may want to do a get_text_string(), check for
is_quoted_string(val_str[0]), and if true, display val_str+1 else display
val_str and append a warning. I'll implement this this evening, unless
someone objects.

| > |    - what is this for:
| > |     "val_len++; /* For extra '\0' byte */"
| > |  ... also in packet-wsp.c, about line 1740 after this patch
| > If the text string is { 'a', 'b', 'c', '\0' }, then val_len 
| will only return
| > 3 while the value is indeed encoded as 4 bytes (val_len is 
| the actual length
| > of the value as found in the packet). So we need to add 
| this zero byte to
| > the count.
| 
|  That was understood, sorry for being unclear, I was 
| wondering where do
|  you use your computed val_len? Maybe I'm misunderstanding some macro
|  stuff or this statement is "just in case", but it seems to me val_len
|  isn't used after this if {}.

This is some over-generalized piece of the code, which I already removed and
it will be sent as a patch.

Regards,

Olivier