Wireshark-bugs: [Wireshark-bugs] [Bug 9607] TFShark (Terminal FileShark)

Date: Wed, 01 Jan 2014 18:58:47 +0000

Comment # 12 on bug 9607 from
(In reply to comment #11)
> (In reply to comment #10)
> > I think
> > you're over-complicating things if I understand correctly. Wiretap's job is
> > to abstract all the different capture formats into a single API providing a
> > list of packets + extra stuff like name-resolution blocks. For tfshark we
> > have no "pre-processing" to do before we hand the data to the dissectors -
> > all we need is an API that abstracts the file into a series of bytes, which
> > is what fopen and friends already do.
> 
> I was certainly worried about starting to over-complicating things, but I
> thought one of Wiretap's jobs was to provide a "header" and an array of
> "records" (aka packets), so I thought Filetap was necessary to provide "a
> generic version" of that for non-capture files.

My understanding of our current packet architecture is this: we start with a
file handle, which gets associated with a capture_file struct. Wiretap turns
that into some metadata (e.g. capture device, timestamp) and an array of
packet records. Each packet gets a frame_data, a wtap_pkthdr, a tvb, a pinfo,
etc. that are used for dissection of that packet.

Non-capture files may not be record-based, and we have much less "generic"
metadata for them (basically just the filename and whatever fstat gives us).
This means we can't reliably talk about frames or headers in any generic way -
all of that information must come from the dissector, since it can be totally
different for every file type.

So for fileshark the simpler method is: we start with a file handle, which gets
turned into a tvb. That tvb is passed to the dissector. None of the
intermediate layers (capture_file struct, wiretap, record headers, etc.) are
applicable.
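
As a rough illustration of that flow, here is a minimal C sketch. It does not
use the real epan API: byte_view stands in for a tvb, and file_dissector,
byte_view_from_handle, dissect_file and looks_like_png are hypothetical names
made up for this example. It also reads the whole file into memory, which a
tvb backed by seek/read would presumably avoid.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tvb: a read-only view of the file's bytes. */
typedef struct {
    unsigned char *data;
    size_t         len;
} byte_view;

/* Hypothetical dissector signature: it only ever sees the byte view. */
typedef int (*file_dissector)(const byte_view *view);

/* Read everything from an open handle into memory. Returns 0 on success,
 * -1 on allocation or read failure. The caller frees out->data. */
static int byte_view_from_handle(FILE *fh, byte_view *out)
{
    size_t cap = 4096, len = 0;
    unsigned char *buf;

    assert(fh != NULL && out != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return -1;

    for (;;) {
        size_t n = fread(buf + len, 1, cap - len, fh);
        len += n;
        if (n == 0)
            break;                      /* EOF or error */
        if (len == cap) {               /* buffer full: double it */
            unsigned char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) { free(buf); return -1; }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fh)) { free(buf); return -1; }

    out->data = buf;
    out->len  = len;
    return 0;
}

/* No capture_file, no wiretap, no record headers: just bytes in. */
static int dissect_file(const byte_view *view, file_dissector dissector)
{
    return dissector(view);
}

/* Example "dissector": returns 1 if the file starts with the PNG signature. */
static int looks_like_png(const byte_view *view)
{
    static const unsigned char sig[8] =
        { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };
    return view->len >= sizeof sig && memcmp(view->data, sig, sizeof sig) == 0;
}
```

The point of the sketch is only that the dissector's input is fully described
by the bytes of the file; everything else it needs it has to work out itself.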

> > In my mind, the simplest way forward
> > is first to implement epan/tvbuff_file.c which fopens a file and uses
> > seek/read to implement the TVB interface (I don't expect this to be too
> > hard, Jakub knows this interface best).
> 
> But file.c already provides seek/read functionality within a file itself,
> it's just uses a capture_file structure (which I don't really want).  This
> sits below wiretap (which is below tvb), and I wanted to somehow put filetap
> between file.c (functionality) and the tvb.

The file.c code, on a quick skim, appears entirely packet-oriented. I don't
know how much of it (if any) can be reused.

> > Then the normal "dissect" path in tfshark becomes very simple:
> > - create a tvb_file backed by the file we are dissecting
> > - create dummy wtap_pkthdr and frame_data structs (probably just
> >   memzeroed is fine)
> > - pass all of the above to epan_dissect_run()
> > Wiretap/filetap/etc never need to get involved.
> 
> > Does this make sense?
> 
> I can see this is a simpler design, but I thought we wanted to leverage
> frame_data structs as a "generic record/frame" and discard (preferably
> remove) wtap_pkthdr in favor of a "generic header" (data blob) within
> fileshark.  Are you seeing my design as more of a "long term solution" or do
> you think it will always be "too complicated"?

I'm not sure what you mean here. Epan is fundamentally record-based, and files
are not, so we can fake it by wrapping the entire file in a single fake
"record". If I were writing fileshark from scratch I wouldn't use
records/frames at all.
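
A small sketch of the "single fake record" trick, under heavy assumptions:
rec_hdr and record are simplified stand-ins invented for this example, not the
real wtap_pkthdr and frame_data, and run_engine is a hypothetical placeholder
for a record-oriented engine such as epan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a per-record header. */
typedef struct {
    unsigned long ts_sec;   /* capture timestamp: meaningless for a file */
    unsigned      caplen;   /* bytes available in this record */
    unsigned      len;      /* original length of this record */
} rec_hdr;

/* Hypothetical, simplified stand-in for a frame/record. */
typedef struct {
    unsigned             num;    /* record number, starting at 1 */
    rec_hdr              hdr;
    const unsigned char *bytes;
} record;

/* Wrap a whole file in one fake record: zero everything, then fill in
 * only the fields that still make sense for a plain file. */
static void wrap_file_as_record(const unsigned char *bytes, size_t n,
                                record *rec)
{
    assert(rec != NULL);
    memset(rec, 0, sizeof *rec);
    rec->num        = 1;
    rec->hdr.caplen = (unsigned)n;
    rec->hdr.len    = (unsigned)n;
    rec->bytes      = bytes;
}

/* Placeholder for a record-oriented engine: it walks an array of records
 * and returns the total number of bytes it was handed. */
static size_t run_engine(const record *recs, size_t count)
{
    size_t i, total = 0;
    for (i = 0; i < count; i++)
        total += recs[i].hdr.caplen;
    return total;
}
```

The engine never learns that its one "record" is really an entire file; the
zeroed timestamp is exactly the kind of dummy value being discussed.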

---

These architectural questions are why I was hesitant earlier to just start
dumping file dissectors into Wireshark without some sort of separation (sorry
Michal Labedzki if I explained that poorly at the time); the requirements
overlap a great deal, but where they differ they are not always compatible.

---

Taking a step back for a slightly bigger picture, we have two possible general
approaches:

- In the first approach we leave as much of wireshark as possible the way it is
and put some hacks (dummy header values, etc.) in fileshark to make it work
(i.e. pretending that the file is one big "packet"). Fileshark becomes a
second-class citizen in one sense, but it's the least amount of work for a
functional program. The question is then whether this is really "good enough"
or whether we'll eventually hit incompatible feature requirements and be unable
to hack our way around them.

- In the second approach we extract all the components of epan and/or wiretap
that are also useful to file dissection into a separate library (epan-base?)
with a totally file/packet-agnostic API. epan then makes use of epan-base to
provide its current packet-specific API, and we write a small "efan" library to
provide a file-specific API on top of epan-base.
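
The layering in the second approach could look roughly like the C sketch
below. None of these functions exist anywhere: base_dissect, pkt_dissect_all
and file_dissect are hypothetical names standing in for epan-base, epan and
"efan" respectively, and the "dissection" is reduced to counting bytes so the
example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* --- hypothetical "epan-base": knows about bytes, nothing else --- */
typedef struct {
    size_t calls;        /* how many buffers were dissected */
    size_t bytes_seen;   /* total bytes across all buffers */
} base_stats;

static void base_dissect(const unsigned char *data, size_t len,
                         base_stats *st)
{
    assert(st != NULL);
    (void)data;          /* a real core would walk the bytes here */
    st->calls++;
    st->bytes_seen += len;
}

/* --- hypothetical packet-facing API on top of the base --- */
typedef struct {
    const unsigned char *data;
    size_t               len;
} packet;

static void pkt_dissect_all(const packet *pkts, size_t count, base_stats *st)
{
    size_t i;
    for (i = 0; i < count; i++)
        base_dissect(pkts[i].data, pkts[i].len, st);
}

/* --- hypothetical file-facing API on top of the same base --- */
static void file_dissect(const unsigned char *data, size_t len,
                         base_stats *st)
{
    /* No records, no fake headers: the file is handed over as-is. */
    base_dissect(data, len, st);
}
```

Both front-ends share the core but neither has to pretend to be the other,
which is the property the first approach gives up.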

With either approach I don't see wiretap's abstractions being useful to
fileshark: with the first approach we'll have to fake some wiretap structs for
compatibility with epan's current API; with the second approach we'll have a
totally wiretap-free fileshark.

The second approach is obviously architecturally cleaner, but it's a lot more
work and a lot more disruptive, meaning it needs strong buy-in from the rest of
the devs. The first approach gets us to a working program without affecting
Wireshark-proper too much; if it doesn't catch on then it can be killed or
moved out to a third-party project without too much effort. If it does catch on
then hopefully we will have the manpower to gradually migrate to the second
model.

