Ethereal-dev: RE: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string
Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.
From: Biot Olivier <Olivier.Biot@xxxxxxxxxxx>
Date: Tue, 18 Nov 2003 13:27:49 +0100
| From: Loïc Minier | | Biot Olivier <Olivier.Biot@xxxxxxxxxxx> - Mon, Nov 17, 2003: | | > No problems with that, but some handsets do not always | comply with this | > rule, so maybe we want to fall-back to text-string | afterwards? Additionally, | > some handsets only display the starting quote, other | handsets also encode | > the closing quote. And others don't quote at all :) | | In my implementation, I've got a generic header name parser and a | generic header value parser used in cases where the guessed parser | fails. In this case (I've never met), the QuotedString parser would | fail and the HeaderValue generic parser will attempt to decode the | value as a stringz. This is based on section 8.4.1.2 which | I've always | seen respected. | I see two possibilities to support nicely quoted-strings: | - we always parse the headers with a generic header-name + | header-value | parser and put that in a tvb, then we try to call a specific | parser on | this tvb; | - we try to guess the correct parser, with respect for the | norm, and if | this fails, we fall back on a generic parser. | | Right now, I think the main loop does something like trying | to call the | parser from a table of macro generated parsers. | What do you think? The current implementation is also based on a generic header parser, and a generic value parser is called only for well-known headers (represented with a 7-bit index into an array of parser functions; this removes a lengthy switch statement). If the header name is textually encoded, then I bypass calling a header parsing function as the value MUST be encode as text too. Unfortunately some early WAP phones implemented a Time-of-Day clock synch mechanism by breaking the mentioned MUST rule, so this functionality remained in add_headers(). I heavily used macros for keeping the code readible and maintainable (to me at least, and hopefully to others too :). There are 72 well-known header encodings for the default code page, and 35 for the Openwave code page. Add the 2 default parsers, and you get about 100 parsers. The framework of those 100 parsers is the same, and is documented around line 1800 of the code. Regarding the Quoted-string issue, I have already seen the following: 1. "text 2. "text" 3. text (on rare occasions; should be flagged as an error anyway) We *might* want to check if the last character of the quoted-string is a quote, and strip it; however the definition of TEXT tells us that a quote is valid, meaning that then example 2 should be interpreted as "text"" (with double quote at end)... I like your proposal for is_quoted_string(x) but I also want the get_quoted_string() macro to be correct :) I would suggest to fall-back to get_text_string() in the code wherever get_quoted_string() fails (ok == FALSE), and to document this. In case we fall back, I want to display a warning too, like I already do in similar cases. And the logic of this fall-back must be explained in the code too so other developers know why it's in etc etc... For the sake of efficiency, we may want to do a get_text_string(), check for is_quoted_string(val_str[0]), and if true, display val_str+1 else display val_str and append a warning. I'll implement this this evening, unless someone objects. | > | - what is this for: | > | "val_len++; /* For extra '\0' byte */" | > | ... also in packet-wsp.c, about line 1740 after this patch | > If the text string is { 'a', 'b', 'c', '\0' }, then val_len | will only return | > 3 while the value is indeed encoded as 4 bytes (val_len is | the actual length | > of the value as found in the packet). So we need to add | this zero byte to | > the count. | | That was understood, sorry for being unclear, I was | wondering where do | you use your computed val_len? Maybe I'm misunderstanding some macro | stuff or this statement is "just in case", but it seems to me val_len | isn't used after this if {}. This is some over-generalized piece of the code, which I already removed and it will be sent as a patch. Regards, Olivier
- Follow-Ups:
- Re: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string
- From: Loïc Minier
- Re: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string
- Prev by Date: Re: [Ethereal-dev] How big is Ethereal?
- Next by Date: Re: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string
- Previous by thread: Re: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string
- Next by thread: Re: [Ethereal-dev] [PATCH][WSP]Decoding of Quoted-string
- Index(es):