Ethereal-dev: RE: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string
Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.
From: Biot Olivier <Olivier.Biot@xxxxxxxxxxx>
Date: Tue, 18 Nov 2003 13:27:49 +0100
| From: Loïc Minier
|
| Biot Olivier <Olivier.Biot@xxxxxxxxxxx> - Mon, Nov 17, 2003:
|
| > No problems with that, but some handsets do not always
| comply with this
| > rule, so maybe we want to fall-back to text-string
| afterwards? Additionally,
| > some handsets only display the starting quote, other
| handsets also encode
| > the closing quote. And others don't quote at all :)
|
| In my implementation, I've got a generic header name parser and a
| generic header value parser used in cases where the guessed parser
| fails. In this case (I've never met), the QuotedString parser would
| fail and the HeaderValue generic parser will attempt to decode the
| value as a stringz. This is based on section 8.4.1.2 which
| I've always
| seen respected.
| I see two possibilities to support nicely quoted-strings:
| - we always parse the headers with a generic header-name +
| header-value
| parser and put that in a tvb, then we try to call a specific
| parser on
| this tvb;
| - we try to guess the correct parser, with respect for the
| norm, and if
| this fails, we fall back on a generic parser.
|
| Right now, I think the main loop does something like trying
| to call the
| parser from a table of macro generated parsers.
| What do you think?
The current implementation is also based on a generic header parser, and a
generic value parser is called only for well-known headers (represented with
a 7-bit index into an array of parser functions; this removes a lengthy
switch statement). If the header name is textually encoded, then I bypass
calling a header parsing function as the value MUST be encode as text too.
Unfortunately some early WAP phones implemented a Time-of-Day clock synch
mechanism by breaking the mentioned MUST rule, so this functionality
remained in add_headers().
I heavily used macros for keeping the code readible and maintainable (to me
at least, and hopefully to others too :). There are 72 well-known header
encodings for the default code page, and 35 for the Openwave code page. Add
the 2 default parsers, and you get about 100 parsers. The framework of those
100 parsers is the same, and is documented around line 1800 of the code.
Regarding the Quoted-string issue, I have already seen the following:
1. "text
2. "text"
3. text (on rare occasions; should be flagged as an error anyway)
We *might* want to check if the last character of the quoted-string is a
quote, and strip it; however the definition of TEXT tells us that a quote is
valid, meaning that then example 2 should be interpreted as "text"" (with
double quote at end)...
I like your proposal for is_quoted_string(x) but I also want the
get_quoted_string() macro to be correct :)
I would suggest to fall-back to get_text_string() in the code wherever
get_quoted_string() fails (ok == FALSE), and to document this. In case we
fall back, I want to display a warning too, like I already do in similar
cases. And the logic of this fall-back must be explained in the code too so
other developers know why it's in etc etc...
For the sake of efficiency, we may want to do a get_text_string(), check for
is_quoted_string(val_str[0]), and if true, display val_str+1 else display
val_str and append a warning. I'll implement this this evening, unless
someone objects.
| > | - what is this for:
| > | "val_len++; /* For extra '\0' byte */"
| > | ... also in packet-wsp.c, about line 1740 after this patch
| > If the text string is { 'a', 'b', 'c', '\0' }, then val_len
| will only return
| > 3 while the value is indeed encoded as 4 bytes (val_len is
| the actual length
| > of the value as found in the packet). So we need to add
| this zero byte to
| > the count.
|
| That was understood, sorry for being unclear, I was
| wondering where do
| you use your computed val_len? Maybe I'm misunderstanding some macro
| stuff or this statement is "just in case", but it seems to me val_len
| isn't used after this if {}.
This is some over-generalized piece of the code, which I already removed and
it will be sent as a patch.
Regards,
Olivier
- Follow-Ups:
- Re: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string
- From: Loïc Minier
- Re: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string
- Prev by Date: Re: [Ethereal-dev] How big is Ethereal?
- Next by Date: Re: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string
- Previous by thread: Re: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string
- Next by thread: Re: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string
- Index(es):