Ethereal-dev: Re: [Ethereal-dev] extracting http

Note: This archive is from the project's previous web site, ethereal.com. This list is no longer active.

From: Guy Harris <gharris@xxxxxxxxx>
Date: Thu, 19 May 2005 02:24:18 -0700
F Lace wrote:

I am trying to extract the http content following the example of
udpdump.c available in WpdPack_3_01_a.

This is really more of a tcpdump-workers or winpcap-users question than
an Ethereal developer's question - Ethereal already has code to extract
HTTP content.

I tried the following, and
pkt_data doesn't seem to contain anything. Any pointers in the right
direction will be helpful. Thanks.

// 20 bytes TCP Header
typedef struct tcp_header {
 u_short sport; // Source port
 u_short dport; // Destination port
 u_int seqnum; // Sequence Number
 u_int acknum; // Acknowledgement number
 u_char hlen; // Header length
 u_char flags; // packet flags
 u_short win; // Window size
 u_short crc; // Checksum (covers the TCP header and data)
 u_short urgptr; // Urgent pointer...still don't know what this is...

See RFC 793 to find out what it is.

}tcp_header;


    /* retrieve the position of the ip header */
    ih = (ip_header *) (pkt_data + 14); // length of ethernet header

If "pkt_data doesn't seem to contain anything", that suggests to me that
the raw packet data itself is empty - you don't use "pkt_data" after
that, so presumably you don't even have an IP header, a TCP header, or
even an Ethernet header.
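One way to rule that case out is to check the capture length before parsing anything. This is a minimal sketch, not WinPcap code: `looks_like_ipv4_tcp()` is a hypothetical helper, `caplen` is assumed to be the `caplen` field of the `struct pcap_pkthdr` passed to your packet handler, and the link layer is assumed to be Ethernet, as udpdump.c assumes.

```c
#include <stddef.h>

#define ETHER_HDR_LEN 14   /* assumption: Ethernet link layer, as in udpdump.c */

/*
 * Hypothetical helper: check that the captured data (caplen bytes) really
 * holds an Ethernet header followed by an IPv4 header whose protocol is TCP.
 * Returns 1 if so, 0 otherwise.
 */
int looks_like_ipv4_tcp(const unsigned char *pkt_data, size_t caplen)
{
    const unsigned char *ip;

    if (pkt_data == NULL || caplen < ETHER_HDR_LEN + 20)
        return 0;                      /* too short for Ethernet + minimal IPv4 */

    /* EtherType (bytes 12-13, network byte order) must be 0x0800 for IPv4 */
    if (pkt_data[12] != 0x08 || pkt_data[13] != 0x00)
        return 0;

    ip = pkt_data + ETHER_HDR_LEN;
    if ((ip[0] >> 4) != 4)             /* IP version field, upper 4 bits */
        return 0;
    if (ip[9] != 6)                    /* IP protocol field: 6 = TCP */
        return 0;
    return 1;
}
```

If this returns 0 for every packet, the problem is upstream of the HTTP parsing (filter, adapter, or link-layer type), not in the pointer arithmetic.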

Or do you mean...

    /* retrieve the position of the tcp header */
    ip_len = (ih->ver_ihl & 0xf) * 4;
    th = (tcp_header *) ((u_char*)ih + ip_len);

    /* convert from network byte order to host byte order */
    sport = ntohs( th->sport );
    dport = ntohs( th->dport );

    http_data = (char *) (th + (int)th->hlen + 8*2*5);
    ZeroMemory (httpstr, sizeof(httpstr));
    snprintf (httpstr, sizeof(httpstr)-1,
        "src: %d, dst: %d, th->hlen: %d\n, %s\n",
        sport, dport, (int)th->hlen, http_data);
    httpstr[255] = 0;
    fprintf  (http_fp, "%s\n", httpstr);

that *http_data* doesn't seem to contain anything?

If so, note that

	http_data = (char *) (th + (int)th->hlen + 8*2*5);

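is wrong on two counts.  First, "th" is presumably a pointer to a
"tcp_header" data structure, so adding N to it advances it by N such
data structures, i.e. N*20 bytes, not N bytes.  Second, the "header
length" field (the data offset) is in units of 4-byte words, and it
occupies only the upper 4 bits of that byte; the lower 4 bits are
reserved (see RFC 793).  So you want

	http_data = (char *) ((u_char *)th + (th->hlen >> 4) * 4);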

which will take you past the TCP header to the first byte of the HTTP
header.  If you want only the HTTP *payload*, you'll have to scan
through the HTTP header until you either run out of data or find the
blank line separating the HTTP header from the payload.

By the way, there's no guarantee that there's a null byte at the end of
the HTTP data, so printing it with "%s" could run past the end of the
HTTP data into whatever follows it in memory.  "snprintf()" won't write
more than the buffer holds, but reading past the end of the packet is
still undefined behavior.  You should calculate the length of the HTTP
data (the total length field from the IP header, minus the lengths of
the IP and TCP headers) and print it with "%.*s", passing that length as
the argument for the ".*" part of the format.

(Of course, there's no guarantee that the HTTP payload is text - what if
it's fetching a GIF or JPEG, for example? - so you'd probably either want
to print only the headers, meaning you'll have to scan the headers
yourself, or print each character of the HTTP data separately, checking
whether it's printable.)
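A sketch of the length calculation and of the two ways of printing. The helper names are hypothetical. `http_data_len()` assumes `ip` points at an IPv4 header followed by a TCP header, and that the caller has already checked that both headers were captured; since the snapshot length can truncate a packet, the result should also be clamped to the number of bytes actually captured.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/*
 * HTTP data length = IP total length - IP header length - TCP header length.
 * Returns -1 if the header fields are inconsistent.
 */
long http_data_len(const unsigned char *ip)
{
    unsigned int total, ip_len, tcp_len;

    ip_len = (ip[0] & 0x0fu) * 4u;               /* IHL, in 4-byte words */
    if (ip_len < 20)
        return -1;
    total = ((unsigned int)ip[2] << 8) | ip[3];  /* total length, network byte order */
    tcp_len = (ip[ip_len + 12] >> 4) * 4u;       /* TCP data offset, upper 4 bits */
    if (tcp_len < 20 || total < ip_len + tcp_len)
        return -1;
    return (long)(total - ip_len - tcp_len);
}

/* Print exactly len bytes with "%.*s"; no NUL terminator is needed
 * (output stops early if the data itself contains a NUL byte). */
void print_http_text(FILE *fp, const char *data, long len)
{
    fprintf(fp, "%.*s\n", (int)len, data);
}

/* Print byte by byte, replacing non-printable bytes (e.g. GIF or JPEG
 * data) with '.'; line feeds are kept and carriage returns dropped so
 * the headers come out one per line. */
void print_http_printable(FILE *fp, const unsigned char *data, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        int c = data[i];
        if (isprint(c) || c == '\n')
            fputc(c, fp);
        else if (c != '\r')
            fputc('.', fp);
    }
    fputc('\n', fp);
}
```

Using the IP total length rather than the captured length also keeps the padding on short Ethernet frames from being printed as HTTP data.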