Ethereal-users: Re: [Ethereal-users] HTTP Dissector & reassembler, tethereal, and mirroring a we

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <gharris@xxxxxxxxx>
Date: Wed, 16 Feb 2005 17:49:31 -0800
Jon Passki wrote:

While doing off-line analysis of some HTTP traffic, I would like to
reconstruct the results back into a webpage.  I understand the GUI
has the TCP reassembly [1,2,3], plus the HTTP dissector understands
data such as JPEGs.

"Understands" in the sense that it can dissect the structure of a JPEG file; it doesn't "understand" it in the sense of being able to display the image. (Also, the HTTP dissector only "understands" that "image/jpeg" means that the entity body should be handed to the JPEG dissector - which it knows because the JPEG dissector has registered itself with a media type of "image/jpeg".)
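
To make the registration idea concrete, here is a rough sketch in Python. It is not Ethereal's actual C API: the names media_type_table, register_media_type, dissect_jpeg and dissect_entity_body are hypothetical, and the toy JPEG "dissector" only checks for the start-of-image marker.

```python
# Hypothetical sketch of media-type dispatch; not Ethereal's real API.
media_type_table = {}


def register_media_type(media_type, dissector):
    """Record which dissector handles entity bodies of a given media type."""
    media_type_table[media_type.lower()] = dissector


def dissect_jpeg(body):
    """Toy JPEG 'dissector': checks the SOI marker only, renders nothing."""
    has_soi = body[:2] == b"\xff\xd8"
    return {"protocol": "jpeg", "starts_with_soi": has_soi, "length": len(body)}


def dissect_entity_body(content_type, body):
    """Hand the body to whichever dissector registered for the Content-Type."""
    # Strip parameters such as "; charset=..." before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    dissector = media_type_table.get(media_type)
    if dissector is None:
        # Nothing registered for this media type: treat the body as opaque data.
        return {"protocol": "data", "length": len(body)}
    return dissector(body)


# The JPEG dissector registers itself; the HTTP side knows nothing about JPEG.
register_media_type("image/jpeg", dissect_jpeg)
```

The point of the sketch is that the HTTP side only does a table lookup on the media type; everything it "knows" about JPEG comes from the registration call.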

What I'd like to do is feed a pcap session
into tethereal, reconstruct an HTTP session, and have the HTTP
dissector magically spit out a web page.

To do this seems non-trivial to me, since there might be multiple
TCP sessions for one web page (e.g. a JPEG download).

By "Web page" do you mean "page displayed by a Web browser"? If so, then that's not really a concept that exists at the HTTP layer, and thus, it's not really something that the HTTP dissector should be doing.

A tap could perhaps be used to gather together various HTTP entities that could be considered the components of a Web page, but I'm not sure what it'd do with them after that. Is there some representation of a Web page, in that sense, as a single file? If not, what would the tap in question do with the components of the page to "spit out a Web page"?

So, I'd
assume a state machine of some sort.  Example: the initial page had
some image src, so the state machine would check to see if there
were any HTTP requests for the link.  Then this has the added
difficulty that time would be the only thing to separate multiple
downloads of the same file (JPEG Session 1 was 10 seconds later,
JPEG Session 2 was 60 seconds later, JPEG Session 3 was 120 seconds
later - use JPEG Session 1).

So, does this functionality exist?

No.

If so, what did I miss in reading up on reassembly?

None of that has anything to do with "reassembly" in Ethereal's sense of the word. "Reassembly", in Ethereal's sense, refers to assembling the parts of a higher-level packet that are contained in multiple lower-level packets, e.g. reassembling fragments of a fragmented IP datagram, reassembling the parts of an HTTP request or reply split across multiple TCP segments, etc. There's no notion of a "Web page" at the HTTP layer or any other protocol layer, so there's no notion of "reassembly" of a Web page at the protocol layer, so the existing reassembly code wouldn't help.
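
As an illustration of reassembly in that narrower sense, here is a minimal Python sketch of putting one HTTP reply back together from several TCP segment payloads. It is not Ethereal's reassembly code: the function name reassemble_http_message is hypothetical, and the sketch assumes the payloads arrive in order, do not overlap, and that the body length is given by a Content-Length header.

```python
def reassemble_http_message(segments):
    """Join in-order TCP payloads until one complete HTTP message is buffered.

    Returns (message, segments_used), or (None, segments_used) if the
    payloads seen so far do not yet contain a complete message.
    """
    buf = b""
    used = 0
    for used, payload in enumerate(segments, 1):
        buf += payload
        end_of_headers = buf.find(b"\r\n\r\n")
        if end_of_headers < 0:
            continue  # header block is still incomplete
        header_block = buf[:end_of_headers].decode("latin-1")
        body_len = 0
        # Skip the request/status line, then look for Content-Length.
        for line in header_block.split("\r\n")[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                body_len = int(value.strip())
        total = end_of_headers + 4 + body_len
        if len(buf) >= total:
            return buf[:total], used
    return None, used


segments = [b"HTTP/1.1 200 OK\r\nContent-Le", b"ngth: 5\r\n\r\nhe", b"llo"]
message, used = reassemble_http_message(segments)
```

The result is a single HTTP message and nothing more; working out which messages belong to the same browser-level "page" is a separate problem that this kind of code does not address.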