Ethereal-dev: Re: [Ethereal-dev] [Patch] Add 2 more media types to XML dissector

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

Date: Sat, 4 Mar 2006 05:42:10 +0100
On 3/3/06, Mike Duigou <ethereal@xxxxxxxxxx> wrote:
> This sounds really interesting.
I needed it once, it wasn't there... It was an interesting challenge
(I really enjoyed writing it). I never really enjoyed it as a user
myself.

> I took a very small look at the xml
> dissector but still wasn't quite sure how this feature is supposed to
> work.
That, I agree, is complex code. It turned out to be more challenging
than I thought when I started (the second time).

The DTD parser is made of two parts: a preparser that resolves
entities, and the parser proper.

> Can you point me in the directions of any extra documentation for
> the XML dissector and any of the more non-obvious features? I promise
> I'll turn whatever I receive/find/learn into a wiki page!  :-)

Don't forget to redirect the wiki page for Ethereal's XML protocol
(what the user gets if there's no DTD) to the page you write.
:-)

I'll try to explain the machinery in the simplest way and then wait
for further questions.

The Ethereal data directory contains a directory called dtds that
holds the DTDs (take a look at what's in there).
All files in it ending in ".dtd" will be processed.
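A minimal Python sketch of that lookup rule, only to make it concrete. It assumes the caller passes in the data directory (its real location depends on the installation), and the function name dtd_files is made up for illustration; it is not Ethereal's C code.

```python
from pathlib import Path
from typing import List


def dtd_files(data_dir: str) -> List[Path]:
    """Return every regular file ending in ".dtd" inside <data_dir>/dtds, sorted by name.

    Hypothetical helper: mirrors the "all files ending in .dtd" rule only.
    """
    dtds = Path(data_dir) / "dtds"
    if not dtds.is_dir():
        return []  # no dtds directory: nothing to load
    return sorted(p for p in dtds.iterdir() if p.is_file() and p.name.endswith(".dtd"))
```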

Since we're interested only in names, almost anything that "looks
like" a DTD is OK; you do not need the real one. It would be neat if
someone wrote a tap listener (in Lua) or a script that automagically
generates/updates the DTDs from what is found in the dissected XML.
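As a rough sketch of the "script" variant (not a tap listener), here is a hypothetical Python helper, guess_dtd, that walks sample XML documents with xml.etree.ElementTree and emits a names-only DTD in the format described below. It assumes you already have the XML bodies as text, treats the first sample's root as the DOCTYPE and root element, drops namespace prefixes, and declares every attribute as CDATA #IMPLIED, since only the names matter.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def local(name: str) -> str:
    """Drop an ElementTree namespace prefix: '{urn:x}tag' -> 'tag'."""
    return name.rsplit("}", 1)[-1]


def guess_dtd(xml_samples: List[str], proto_name: str, media: str) -> str:
    """Hypothetical: build a names-only DTD from sample XML documents."""
    children: Dict[str, List[str]] = {}  # element -> child names, first-seen order
    attrs: Dict[str, List[str]] = {}     # element -> attribute names, first-seen order
    root_tag: Optional[str] = None

    def visit(elem: ET.Element) -> None:
        tag = local(elem.tag)
        children.setdefault(tag, [])
        attrs.setdefault(tag, [])
        for name in elem.attrib:
            if local(name) not in attrs[tag]:
                attrs[tag].append(local(name))
        for child in elem:
            if local(child.tag) not in children[tag]:
                children[tag].append(local(child.tag))
            visit(child)

    for sample in xml_samples:
        root = ET.fromstring(sample)
        root_tag = root_tag or local(root.tag)
        visit(root)

    lines = [
        f'<?ethereal:protocol proto_name="{proto_name}" media="{media}" hierarchy="yes" ?>',
        f"<!DOCTYPE {root_tag} [",
    ]
    for tag in children:  # dicts keep insertion order, so the root element comes first
        model = "|".join(children[tag] + ["#PCDATA"])
        lines.append(f"   <!ELEMENT {tag} ({model}) >")
        if attrs[tag]:
            defs = " ".join(f"{a} CDATA #IMPLIED" for a in attrs[tag])
            lines.append(f"   <!ATTLIST {tag} {defs} >")
    lines.append("]>")
    return "\n".join(lines)
```

Pick a proto_name that does not clash with an existing Ethereal protocol; see footnote (7).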

The minimal file contains just the <? ethereal:protocol ?> (1) XML
processing instruction (XMLPI).

Only the following DTD tags are (partially) implemented, and they must
come after the <?ethereal:protocol?> (1) XMLPI:
   <!ENTITY> (2)
   <!DOCTYPE> (3)
   <!ELEMENT> (4)
   <!ATTLIST> (5)

There are more complex DTD tags that allow "parameterized types" or
"templates" that I won't implement(6).

<!-- the example dtd -->
<?ethereal:protocol proto_name="this" media="application/this"
hierarchy="yes" ?>
<!DOCTYPE this [
   <!ELEMENT that (other|another|#PCDATA) >
   <!-- #PCDATA is assumed to be there even if it isn't -->

   <!ATTLIST that one CDATA #REQUIRED two CDATA #IMPLIED >
   <!-- we don't care about #REQUIRED and other #THINGS -->
   <!ELEMENT other (#PCDATA) >
   <!ELEMENT another (#PCDATA) >
]>

This creates the following filter fields:

this
this.that
this.that.one
this.that.two
this.that.other
this.that.another
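The naming above can be sketched as follows, assuming hierarchy="yes" and the root rule from footnote (4): the first ELEMENT is the root, named proto_name.element_name when it differs from proto_name. The DtdModel dictionary is a hypothetical stand-in for whatever dtd_parse.l really builds, and attributes are listed before child elements only to match the order shown above.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical model: element name -> (child element names, attribute names).
# The first key is treated as the root element.
DtdModel = Dict[str, Tuple[List[str], List[str]]]


def field_names(proto_name: str, elements: DtdModel) -> List[str]:
    """Sketch of hierarchical filter-field naming for one DTD."""
    names = [proto_name]  # the protocol itself is always a field
    root = next(iter(elements))
    root_field = proto_name if root == proto_name else f"{proto_name}.{root}"

    def walk(elem: str, prefix: str, seen: FrozenSet[str]) -> None:
        if prefix not in names:
            names.append(prefix)
        children, attrs = elements.get(elem, ([], []))
        for attr in attrs:
            names.append(f"{prefix}.{attr}")
        for child in children:
            if child in seen:  # skip recursive content models instead of looping forever
                continue
            walk(child, f"{prefix}.{child}", seen | {child})

    walk(root, root_field, frozenset({root}))
    return names
```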

Given the XML:

<this>aaa<that one="bbb">ccc<other>ddd</other>eee</that>fff</this>

All these filter expressions match:

this == "aaa"
this == "fff"
this.that == "ccc"
this.that.one == "bbb"
this.that.other == "ddd"
this.that.other != "<other>ddd</other>"
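One way to see why several values match a single field: each run of character data directly inside an element is a separate occurrence of that element's field, and attribute values become fields of their own. The hypothetical field_values function below models this with xml.etree.ElementTree (the text before the first child, plus each child's tail). It builds names from the element path rather than from the DTD, so it illustrates the matching, not the dissector's code. Under this model "eee" is also a value of this.that.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def field_values(xml_text: str) -> List[Tuple[str, str]]:
    """Return (field, value) pairs, one per occurrence, in document order."""
    root = ET.fromstring(xml_text)
    out: List[Tuple[str, str]] = []

    def walk(elem: ET.Element, field: str) -> None:
        for name, value in elem.attrib.items():
            out.append((f"{field}.{name}", value))  # attributes: field.attr == value
        if elem.text and elem.text.strip():
            out.append((field, elem.text))          # text before the first child
        for child in elem:
            walk(child, f"{field}.{child.tag}")
            if child.tail and child.tail.strip():
                out.append((field, child.tail))     # text after each child belongs to the parent

    walk(root, root.tag)
    return out
```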


(1) dtd_parse.l:68. The <?ethereal:protocol?> tag tells Ethereal how
to deal with the rest of the file. It takes some parameters (7):
 * proto_name="this" -- the name to be used as the root of the
   namespace(*), i.e. the protocol. It SHOULD be there.
 * root="that" -- makes "that" the root element, so all field names
   from there on will start with "this.that".
 * hierarchy="yes|no" -- defaults to no, which gives flat element
   names (this, this.that, this.that.one, this.other instead of
   this.that.other) (8)
 * media="application/this_or_that" -- use this as the DTD for the
   given media type.
 * description="application/this_or_that" -- add this as the handler
   of the given media type.
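A regex-based Python sketch of reading those parameters, assuming simple double-quoted values. parse_protocol_pi is a made-up name, and this is not how the flex lexer in dtd_parse.l works; it only fills in the documented default for hierarchy.

```python
import re
from typing import Dict

# Accepts both "<?ethereal:protocol" and "<? ethereal:protocol".
PI_RE = re.compile(r"<\?\s*ethereal:protocol\b(?P<body>.*?)\?>", re.DOTALL)
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_protocol_pi(dtd_text: str) -> Dict[str, str]:
    """Return the key="value" parameters of the first ethereal:protocol XMLPI."""
    m = PI_RE.search(dtd_text)
    if m is None:
        raise ValueError("no <?ethereal:protocol ?> processing instruction found")
    params = dict(ATTR_RE.findall(m.group("body")))
    params.setdefault("hierarchy", "no")  # documented default
    return params
```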

(2) ENTITYs of both types (parameter entities with % and general
entities without) are resolved in the preparser, so no information
whatsoever about entities is left while dissecting. File inclusion is
not supported; it has to be done "by hand".
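The preparser's job can be approximated by textual substitution. This sketch assumes only internal entities with double-quoted replacement text (no SYSTEM or PUBLIC, in line with the lack of file inclusion), keeps the first declaration of a name as XML does, and caps the number of expansion passes so a self-referencing entity cannot loop forever. The function name preparse is hypothetical; the real code is dtd_preparse.l.

```python
import re
from typing import Dict

# <!ENTITY name "text"> or <!ENTITY % name "text">
ENTITY_DECL = re.compile(r'<!ENTITY\s+(%\s+)?([\w.-]+)\s+"([^"]*)"\s*>')


def preparse(dtd_text: str, max_passes: int = 10) -> str:
    """Remove internal ENTITY declarations and expand %name; and &name; references."""
    general: Dict[str, str] = {}
    param: Dict[str, str] = {}
    for m in ENTITY_DECL.finditer(dtd_text):
        table = param if m.group(1) else general
        table.setdefault(m.group(2), m.group(3))  # first declaration wins
    text = ENTITY_DECL.sub("", dtd_text)
    for _ in range(max_passes):  # repeat so entities defined via other entities expand too
        new = re.sub(r"%([\w.-]+);", lambda m: param.get(m.group(1), m.group(0)), text)
        new = re.sub(r"&([\w.-]+);", lambda m: general.get(m.group(1), m.group(0)), new)
        if new == text:
            break
        text = new
    return text
```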

(3) Only the DOCTYPE form used in the example is supported; all
others will cause a syntax error.
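A sketch of that restriction as a check, assuming "the form in the example" means a DOCTYPE with an internal subset, <!DOCTYPE name [ ... ]>. doctype_name is a hypothetical helper; returning None for a file without any DOCTYPE reflects the minimal file described above.

```python
import re
from typing import Optional

# Assumed accepted form: <!DOCTYPE name [ ...internal subset... ]>
INTERNAL_DOCTYPE = re.compile(r"<!DOCTYPE\s+([\w.:-]+)\s*\[.*\]\s*>", re.DOTALL)


def doctype_name(dtd_text: str) -> Optional[str]:
    """Return the DOCTYPE name, None if there is no DOCTYPE, or raise on other forms."""
    start = dtd_text.find("<!DOCTYPE")
    if start == -1:
        return None
    m = INTERNAL_DOCTYPE.match(dtd_text, start)
    if m is None:
        raise ValueError("syntax error: only <!DOCTYPE name [ ... ]> is supported")
    return m.group(1)
```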

(4) The first ELEMENT will be the default root if no other root is
given. If its name is different from proto_name, its field name will
be proto_name.elemname.

(5) ATTLIST parameters are ignored; we only care about the attribute
names.
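For illustration, a Python sketch that keeps only attribute names from ATTLIST declarations and throws away types and defaults. It assumes the standard XML attribute types, double-quoted default values, and no ">" inside quoted defaults; attribute_names is a made-up name. Repeated names are reported once.

```python
import re
from typing import List, Tuple

ATTLIST_RE = re.compile(r"<!ATTLIST\s+([\w.:-]+)(.*?)>", re.DOTALL)
ATTDEF_RE = re.compile(
    r"([\w.:-]+)\s+"                                        # attribute name (kept)
    r"(?:CDATA|IDREFS|IDREF|ID|ENTITIES|ENTITY|NMTOKENS|NMTOKEN"
    r"|NOTATION\s*\([^)]*\)|\([^)]*\))\s+"                  # attribute type (ignored)
    r"(?:#REQUIRED|#IMPLIED|(?:#FIXED\s+)?\"[^\"]*\")"      # default declaration (ignored)
)


def attribute_names(dtd_text: str) -> List[Tuple[str, List[str]]]:
    """Return (element, [attribute names]) for every ATTLIST, in file order."""
    result = []
    for m in ATTLIST_RE.finditer(dtd_text):
        names: List[str] = []
        for name in ATTDEF_RE.findall(m.group(2)):
            if name not in names:
                names.append(name)
        result.append((m.group(1), names))
    return result
```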

(6) If someone wants to implement "templates", the place to do it is
dtd_preparse.l, dtd_parse.l, or both.

(7) There's an issue with conflicting protocol names: the DOCTYPE
name or the root name (4) cannot be that of an existing Ethereal
protocol, or else Ethereal will abort during startup.

--
This information is top security. When you have read it, destroy yourself.
-- Marshall McLuhan