Wireshark-dev: Re: [Wireshark-dev] Protocol Parser Compiler
From: "Luis EG Ontanon" <luis.ontanon@xxxxxxxxx>
Date: Wed, 24 Oct 2007 16:05:01 +0200
IMHO BNF or the like is not the way to go! BNF parser generators have a few issues that make them unfit for protocol dissectors the way we write them. I started to write an ABNF-based LR dissector generator but found many things that would make it unfit.

Take the following BNF:

    a ::= b c.
    b ::= b b.
    b ::= B.
    c ::= C.

Let's say we get a packet containing BBBC (a mechanism, besides the BNF, to define terminal symbols is needed). The code for the reductions "B -> b", "b b -> b" and "C -> c" will be evaluated before the code for reducing "b c -> a" is triggered. That means we'll have a call sequence like this:

    B -> b
    B -> b
    b b -> b
    B -> b
    b b -> b
    C -> c
    b c -> a
    a -> $

If we want to create a dissection tree from this call sequence, we would need the calls in the reverse order. The code for the reduction of the start symbol (which should create the root of our tree) should be called first, but an LR parser is going to call it last. We would have to evaluate the entire message (hoping that it is complete, or else we will not be able to reduce the start symbol), creating interim containers before being able to add anything to the tree, which is cumbersome.

This phenomenon shows up in the XML dissector (which is based on a bad idea I had, similar to that of a BNF-generated parser). There, to avoid being unable to reduce the start symbol when the message is truncated, I wrote many grammars for many different elements instead of a single grammar for the entire XML message, and I manage the overall parsing with a separate stack. Not only that: in order to be able to create a subtree before its children, the parser first creates a tree of its own, then does some callbacks before pushing the subtrees and some others later on after popping, which makes the code unintelligible. It does not even do the whole thing via the grammar!

For generating dissectors for arbitrary protocols I would look into something more similar to lex than to yacc.
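
The reduction order described above can be reproduced with a small Python sketch. This is a hand-rolled, greedy shift-reduce loop over the four example rules, not the output of a real LR generator (the rule b ::= b b makes the grammar ambiguous, so a generator would need that conflict resolved first). The function name reduction_order is made up for illustration, and the final accept step "a -> $" is not modelled.

```python
def reduction_order(tokens):
    """Greedy shift-reduce parse of: a ::= b c, b ::= b b, b ::= B, c ::= C.

    Returns the final stack and the list of reductions in the order
    their semantic actions would run.
    """
    rules = [  # (right-hand side, left-hand side); illustrative table only
        (("B",), "b"),
        (("C",), "c"),
        (("b", "b"), "b"),
        (("b", "c"), "a"),
    ]
    stack, trace = [], []
    for tok in tokens:
        stack.append(tok)  # shift the next terminal
        reduced = True
        while reduced:  # keep reducing while some rule matches the stack top
            reduced = False
            for rhs, lhs in rules:
                if tuple(stack[-len(rhs):]) == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)
                    trace.append(" ".join(rhs) + " -> " + lhs)
                    reduced = True
                    break
    return stack, trace


if __name__ == "__main__":
    for line in reduction_order("BBBC")[1]:
        print(line)
```

For "BBBC" every b and c reduction is reported before "b c -> a". For the truncated input "BBB" the start symbol is never reduced at all, so an action attached to it (the one that would create the root of the tree) never runs.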
What I have in mind is a cursor-based tool with an FSM. That means not generating code from a context-free grammar (like BNF) but looking into a contextual parser.

    <UDP> {
      <START>   src_pt = UINT(2,"src.port") -> GET_DST.
      <GET_DST> dst_pt = UINT(2,"dst.port") -> GET_LEN.
      <GET_LEN> data_len = UINT(2,"len") -> GET_CHK.
      <GET_CHK> UINT(2,"checksum") -> DATA.
      <DATA>    DISSECT_TABLE(,"udp.port",src_pt,data_len)
             || DISSECT_TABLE(,"udp.port",dst_pt,data_len)
             || CALL_DISSECTOR("data",data_len).
    }

This would allow us to create the tree from the root (as we do) instead of building it from the leaves, and would also allow parsing truncated messages, which at least for me should be a requirement for dissectors.

Luis

On 10/23/07, Andrew Feren <acferen@xxxxxxxxx> wrote:
>
> --- Guy Harris <guy@xxxxxxxxxxxx> wrote:
>
> > Graham Bloice wrote:
> > > Might be interesting for some:
> > >
> > > binpac: A yacc for Writing Application Protocol Parsers
> > > http://lambda-the-ultimate.org/node/2496
> >
> > Sebastien Tandel mentioned that back in May - I didn't get around to
> > replying back then; thanks for reminding me of this and getting me to
> > reply. Apologies to Sebastien for not replying then....
> >
> > Yes, something such as this would, I suspect, be a Very Good Thing.
>
> [ snip ]
>
> I'm looking at binpac for other reasons, but would be interested in using it
> to generate Wireshark dissectors too.
>
> I do, however, have one question before I head too far down this path. How
> do people feel about introducing C++ to Wireshark? I ask because binpac
> currently generates C++ code.
>
> I can use binpac as it stands to generate dissectors, but adding a C backend
> to binpac is out of scope for me at this time.
>
> -Andrew
>
>
> -Andrew Feren
> acferen@xxxxxxxxx
> _______________________________________________
> Wireshark-dev mailing list
> Wireshark-dev@xxxxxxxxxxxxx
> http://www.wireshark.org/mailman/listinfo/wireshark-dev
>

--
This information is top security. When you have read it, destroy yourself.
-- Marshall McLuhan
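
The <UDP> state machine in the message body can be approximated by a short Python sketch. It only illustrates the cursor-plus-FSM idea: the function name dissect_udp, the dictionary used as a stand-in for the protocol tree, and the "truncated" flag are all assumptions made for this example and are not Wireshark APIs. The hand-off to a subdissector in the DATA state is left out; the payload is simply attached as raw bytes.

```python
import struct


def dissect_udp(data):
    """Walk a UDP header with a cursor; each FSM state reads one field."""
    # state -> (field name, width in bytes, next state), mirroring <UDP> above
    states = {
        "START": ("src.port", 2, "GET_DST"),
        "GET_DST": ("dst.port", 2, "GET_LEN"),
        "GET_LEN": ("len", 2, "GET_CHK"),
        "GET_CHK": ("checksum", 2, "DATA"),
    }
    # The root node exists before any field is read (hypothetical tree shape).
    tree = {"name": "udp", "fields": [], "truncated": False}
    cursor, state = 0, "START"
    while state != "DATA":
        name, width, next_state = states[state]
        if cursor + width > len(data):
            # Not enough bytes left: keep what was dissected so far.
            tree["truncated"] = True
            return tree
        (value,) = struct.unpack_from("!H", data, cursor)  # 16-bit big-endian
        tree["fields"].append((name, value))
        cursor += width
        state = next_state
    # DATA state: subdissector lookup omitted, payload kept as raw bytes.
    tree["fields"].append(("data", data[cursor:]))
    return tree


if __name__ == "__main__":
    packet = struct.pack("!HHHH", 53, 1024, 12, 0) + b"abcd"
    print(dissect_udp(packet))
    print(dissect_udp(packet[:5]))
```

Because the root is created first and fields are appended as the cursor advances, a packet cut off after five bytes still yields a tree holding both ports, marked as truncated, instead of failing outright.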