DRZOIDBERG@xxxxxxxx wrote:
Actually I'm writing an application which tries to retrieve all the HTTP
packets transported in TCP packets. These TCP packets are often
fragmented, forming different segments that belong to the whole TCP
packet. I think I can't use a filter to directly obtain the HTTP
packets, so I have to reassemble the TCP segments myself. First of
all, I notice that TCP segments should be reassembled using their
sequence numbers, which link up the segments. I also know how this kind
of fragmentation works: the next expected sequence number is obtained by
adding the payload length of the segment to the current sequence number.
Now, I have some doubts:
* I don't know how to tell which TCP segment is the first one of the
chain of segments which forms a packet.
For HTTP:
the very first data segments, in each direction, of the entire TCP
connection are the first segments in an HTTP request or response;
you then process, according to the HTTP/1.1 specification, the request
or response, until you get to the end;
the next byte in the flow is the first byte of the next request or
response.
Note that I said "next byte in the flow" - there is *NO* guarantee that
an HTTP request or response begins at the beginning of a TCP segment.
TCP doesn't supply to protocols running atop it any notion of "packets";
it supplies a notion of a sequenced byte stream, and it's entirely the
responsibility of the protocol running atop TCP to divide the data
stream into packets.
I.e., there's no such thing as a "TCP packet"; there are only TCP segments.
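To make the "sequenced byte stream" idea concrete, here is a minimal Python sketch of putting one direction of a connection back in order by sequence number. The names (StreamReassembler, add_segment, seq_diff) are made up for illustration, and it assumes you have already extracted the sequence number and payload of each segment from your capture and know the sequence number of the first data byte (the initial sequence number plus one, if you saw the SYN). It handles out-of-order arrival, retransmissions, overlap and 32-bit wraparound, but not lost segments.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_diff(a, b):
    """Signed distance a - b in 32-bit sequence space."""
    d = (a - b) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


class StreamReassembler:
    """Rebuilds one direction of a TCP connection as an in-order byte stream."""

    def __init__(self, first_seq):
        self.next_seq = first_seq  # sequence number of the next byte we expect
        self.pending = {}          # seq -> payload, for segments not yet delivered

    def add_segment(self, seq, payload):
        """Accept one segment and return whatever bytes are now in order."""
        if payload:
            # For duplicate sequence numbers keep the longest payload seen.
            if len(payload) > len(self.pending.get(seq, b"")):
                self.pending[seq] = payload
        out = bytearray()
        while True:
            # Segments starting at or before next_seq can be delivered now.
            ready = [s for s in self.pending if seq_diff(s, self.next_seq) <= 0]
            if not ready:
                break
            for s in ready:
                data = self.pending.pop(s)
                skip = seq_diff(self.next_seq, s)  # bytes already delivered
                if skip < len(data):
                    out += data[skip:]
                    self.next_seq = (self.next_seq + len(data) - skip) % SEQ_MOD
        return bytes(out)
```

For example, if the segment with sequence number 1005 arrives before the one with 1000, nothing is delivered until the gap is filled:

```python
r = StreamReassembler(1000)
r.add_segment(1005, b"WORLD")   # returns b"" (still waiting for byte 1000)
r.add_segment(1000, b"HELLO")   # returns b"HELLOWORLD"
```

Note that the output is just bytes; nothing in it says where an HTTP request or response starts or stops.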
* I don't know how to decide whether a TCP segment is the last one of a
sequence of segments. I notice that an acknowledgement for the last
segment could be used for this purpose, but if I haven't got this
acknowledgement, I don't know when the TCP packet finishes.
As I said, there's no such thing as a "TCP packet"; you can't tell,
purely from using information in a TCP header, where HTTP requests start
or end. Acknowledgements are *NOT* used to indicate whether a TCP
segment is the start or end of a packet for a protocol running atop TCP;
they're used *solely* by TCP to indicate that it has received a segment.
In fact, there's no guarantee that an HTTP request starts at the
beginning of a TCP segment or ends at the end of a TCP segment.
You will have to process the HTTP requests or responses yourself to see
where they end.
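As a rough illustration of what "process the HTTP requests or responses yourself" can look like, here is a Python sketch that cuts a reassembled byte stream into messages by finding the blank line that ends the headers and then reading Content-Length bytes of body. The function name split_http_messages is made up for this example. It is deliberately incomplete: it assumes a message with no Content-Length header has no body (which is not true of every response, e.g. one whose body runs until the connection closes), and it gives up on chunked transfer encoding rather than decoding it.

```python
def split_http_messages(stream):
    """Split a reassembled byte stream into complete HTTP messages.

    Returns (messages, leftover), where leftover is the incomplete tail
    that needs more data before it can be parsed.
    """
    messages = []
    pos = 0
    while True:
        # The header block ends at the first empty line.
        header_end = stream.find(b"\r\n\r\n", pos)
        if header_end == -1:
            break  # headers not complete yet
        body_start = header_end + 4
        body_len = 0
        # Skip the request/status line, then scan the header fields.
        for line in stream[pos:header_end].split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            name = name.strip().lower()
            if name == b"content-length":
                body_len = int(value.strip())
            elif name == b"transfer-encoding" and b"chunked" in value.lower():
                raise NotImplementedError("chunked bodies are not handled here")
        end = body_start + body_len
        if end > len(stream):
            break  # body not complete yet
        messages.append(stream[pos:end])
        pos = end  # the next byte in the flow starts the next message
    return messages, stream[pos:]
```

Feed it the bytes accumulated so far for one direction of the connection, and keep the leftover to prepend to the data that arrives next. TCP segment boundaries play no part in it, which is the point.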