On Oct 9, 2009, at 7:43 AM, Jeff Morriss wrote:
> One advantage of using memory mapped files instead of swap is that if
> your OS is swapping, *everything* is slow. If only Wireshark is, er,
> swapping, only Wireshark is slow.
That depends on the OS's policies for managing main memory - and on
any policy hints given to the OS by the application. If, for example,
the OS uses the same policy to find a page frame when servicing a page
fault for a page backed by a mapped file as it does when servicing a
page fault for a page backed by swap space (an "anonymous" page), the
only advantages to memory mapping would be:
1) if the file is mapped into multiple processes' address spaces (and
the mapping is either read-only or not copy-on-write), those processes
can share a single page frame for a page from the file - but that's
not the case here, as I understand it;
2) if the data in anonymous pages is a copy of data from a file,
memory-mapping the file even in only one process means that you don't
even temporarily have two copies of the data in memory.
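
To make point 2 and the "policy hints" remark concrete, here is a
minimal Python sketch; it is not taken from Wireshark. The function
names checksum_with_read and checksum_with_mmap are made up for the
example. mmap.madvise() only exists on Python 3.8 and later on systems
that have madvise(), so the hint is applied only when available, and
whether the OS acts on it is up to the OS.

```python
import mmap
import os


def checksum_with_read(path):
    """Copy the whole file into a process buffer, then sum its bytes.

    After f.read() the data exists both in the OS's cache of the file
    and in the process's own (anonymous) memory.
    """
    with open(path, "rb") as f:
        data = f.read()
    return sum(data) & 0xFFFFFFFF


def checksum_with_mmap(path, chunk=64 * 1024):
    """Sum the file's bytes through a read-only memory mapping.

    The mapped pages are backed by the file itself; only a chunk-sized
    temporary copy is made at a time by the slice below.
    Note: mmap cannot map an empty file, so path must be non-empty.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Optional policy hint: we intend to walk the file in order.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                try:
                    m.madvise(mmap.MADV_SEQUENTIAL)
                except OSError:
                    pass  # the hint is advisory; carry on without it
            total = 0
            for off in range(0, len(m), chunk):
                total += sum(m[off:off + chunk])
            return total & 0xFFFFFFFF
```

Both functions return the same value for the same file; the
difference is only in where the file's data lives while it is being
processed.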
> Using memory mapped files would probably help quite a bit with keeping
> the UI responsive because only Wireshark's, for example, packet data
> would be on disk but the executable pages and "core" memory like the
> statistics could be kept in RAM (or at least whatever the OS gives
> us).
As per my mail to Erlend, the frame data isn't kept in Wireshark's
address space, although reassembled data is (and frame_data structures
are, and some or all column text is).
However, if Wireshark reads a large capture file, on many OSes the
blocks of the file will be brought into the page pool. On those OSes
the "buffer cache" is implemented atop the page pool, so pages being
read in with read()/ReadFile() compete for memory with pages faulted
in. It may even be that a read is done by mapping the region of the
file being read into the kernel's address space and copying from that
region into the userland buffer, so that the actual file system reads
are done in response to page faults. *Hopefully* the OS will recognize
the access as sequential and, at least, not completely blow the page
cache if the file is big enough. (Although, if you have enough memory
that you *don't* blow the page cache, you might as well keep the pages
in memory; the menagerie of capture files I use for Wireshark/tcpdump
regression testing of some changes fits entirely in main memory on my
machine, so if I run the tests twice in a row, the disk hardly does
anything.)
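
Rather than only hoping the OS notices the sequential access, an
application on a POSIX-ish system can say so. The following is a
minimal Python sketch, not what Wireshark does. read_sequentially and
its drop_behind parameter are invented for the example.
os.posix_fadvise() is only available on Unix-like systems, so the code
checks for it, and the OS is free to ignore the advice.

```python
import os


def read_sequentially(path, chunk_size=1 << 20, drop_behind=False):
    """Read a file front to back with read(), returning the byte count.

    If posix_fadvise() is available, tell the OS the access pattern is
    sequential. With drop_behind=True, also tell it that each chunk is
    no longer needed once read, so a huge file is less likely to push
    everything else out of the page cache.
    """
    have_fadvise = hasattr(os, "posix_fadvise")
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        if have_fadvise:
            try:
                # offset 0, length 0 means "the whole file"
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                have_fadvise = False  # advice not supported for this file
        offset = 0
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            total += len(chunk)
            if drop_behind and have_fadvise:
                try:
                    os.posix_fadvise(fd, offset, len(chunk),
                                     os.POSIX_FADV_DONTNEED)
                except OSError:
                    pass  # advisory only; ignore failures
            offset += len(chunk)
    finally:
        os.close(fd)
    return total
```

If the whole file set fits in memory, as with the regression-test
captures mentioned above, drop_behind=False is the better choice,
since the second run can then be served from the cache.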