Wireshark-bugs: [Wireshark-bugs] [Bug 7863] New: Have editcap use modulo when calculating filenu

Date: Sun, 14 Oct 2012 22:13:40 -0700 (PDT)
https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=7863

           Summary: Have editcap use modulo when calculating filenum
                    component of fileset name
           Product: Wireshark
           Version: 1.9.x (Experimental)
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Minor
          Priority: Low
         Component: Extras
        AssignedTo: bugzilla-admin@xxxxxxxxxxxxx
        ReportedBy: jyoung@xxxxxxx


Created attachment 9352
  --> https://bugs.wireshark.org/bugzilla/attachment.cgi?id=9352
Patch to use modulo in editcap's fileset_get_filename_by_pattern()

Build Information:
bash-3.2$ wireshark -v
wireshark 1.9.0-SVN-45281 (SVN Rev 45281 from /trunk)

Copyright 1998-2012 Gerald Combs <gerald@xxxxxxxxxxxxx> and contributors.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Compiled (64-bit) with GTK+ 2.24.10, with Cairo 1.8.6, with Pango 1.30.0, with
GLib 2.32.3, with libpcap, with libz 1.2.3, without POSIX capabilities, without
libnl, with SMI 0.4.8, without c-ares, without ADNS, with Lua 5.1, without
Python, with GnuTLS 2.12.19, with Gcrypt 1.5.0, with MIT Kerberos, with GeoIP,
with PortAudio V19-devel (built Aug 12 2012 22:27:54), with AirPcap.

Running on Mac OS X 10.6.8, build 10K549 (Darwin 10.8.0), with locale .UTF-8,
with libpcap version 1.0.0, with libz 1.2.3, GnuTLS 2.12.19, Gcrypt 1.5.0,
without AirPcap.

Built using gcc 4.2.1 (Apple Inc. build 5666) (dot 3).
bash-3.2$ 

--
When using editcap's -c or -i option to split a trace file into multiple files,
editcap currently uses the 5 most significant digits of a file number
counter to generate the filenum component of the fileset name.

When editcap generates more than 100000 output files, the five-digit filenum
component of the fileset name rolls over from 99999 to 10000 instead of the
expected 00000.  The next 9 trace files also use a filenum value of 10000,
the following 10 use 10001, etc.

When the same filenum component is used several times in a row and
multiple fileset output files are generated within the same one-second period,
editcap uses the exact same fileset name for multiple consecutive
output files, resulting in loss of data.
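
The numbering described above can be reproduced with a small sketch.  The
helper name filenum_truncated() is hypothetical and is not editcap's actual
code; it only assumes that the counter is formatted as a zero-padded decimal
and that the first five characters are kept:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from editcap.c: format the counter as a
 * zero-padded decimal and keep only its first five characters, i.e. the
 * five most significant digits once the counter has six or more digits. */
static void filenum_truncated(unsigned idx, char out[6])
{
    char tmp[16];                      /* large enough for any 32-bit value */

    snprintf(tmp, sizeof tmp, "%05u", idx);
    memcpy(out, tmp, 5);               /* keep the leading five characters */
    out[5] = '\0';
}
```

Under that assumption, counters 100000 through 100009 all yield "10000" and
100010 yields "10001", which matches the repetition described in this report.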

The attached patch uses the same modulo value defined in ringbuffer.h and used
within ringbuffer.c to limit editcap's filenum component to the 5 least
significant digits.  With the patch applied, editcap behaves like dumpcap and
rolls over from 99999 to 00000.  An additional benefit of this patch is that
editcap can produce up to 100000 unique fileset names per second without
collisions.
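
As a rough illustration of the modulo approach (this is not the attached
patch itself), the sketch below uses a hypothetical constant FILENUM_MODULO.
Its value of 100000 is an assumption chosen to match a five-digit field; the
real value is whatever ringbuffer.h defines:

```c
#include <stdio.h>

/* Assumed value for illustration only; the real constant is defined in
 * ringbuffer.h and is not reproduced here. */
#define FILENUM_MODULO 100000u

/* Hypothetical helper: reduce the counter modulo FILENUM_MODULO so that
 * only the five least significant digits are printed. */
static void filenum_modulo(unsigned idx, char out[6])
{
    snprintf(out, 6, "%05u", idx % FILENUM_MODULO);
}
```

With this reduction, counter 99999 is followed by "00000" and then "00001",
so the filenum repeats only after 100000 consecutive files.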

Some additional background:

When editcap's -c or -i option is used, a trace file is potentially split
into multiple output files using Wireshark's fileset naming convention:

  [OPTIONALPREFIX]_NNNNN_YYYYMMDDhhmmss[.OPTIONALSUFFIX].

The NNNNN is a zero-padded filenum value that starts with 00000 and normally
increments by one with each new output file.

The YYYYMMDDhhmmss is a date/time stamp component that is regenerated at the
moment that the new fileset name is created. 

The fileset name algorithm used by editcap is basically the same one used by
dumpcap's -a and -b options, with one exception.  Dumpcap creates its first
fileset name with a filenum component of 00001, whereas editcap starts at
00000.
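
The naming convention above could be assembled roughly as follows.  This is
a minimal sketch, not the code in editcap.c or ringbuffer.c: the function
name build_fileset_name() and its parameters are hypothetical, and the
five-digit reduction by 100000 is an assumption:

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: build PREFIX_NNNNN_YYYYMMDDhhmmss[.SUFFIX] into out.
 * Pass suffix == NULL to omit the ".SUFFIX" part.  Returns the snprintf
 * result, or -1 if the timestamp could not be formatted. */
static int build_fileset_name(char *out, size_t outlen,
                              const char *prefix, const char *suffix,
                              unsigned filenum, const struct tm *when)
{
    char stamp[15];                    /* 14 digits plus terminating NUL */

    if (strftime(stamp, sizeof stamp, "%Y%m%d%H%M%S", when) == 0)
        return -1;

    return snprintf(out, outlen, "%s_%05u_%s%s%s",
                    prefix,
                    filenum % 100000u,     /* assumed five-digit field */
                    stamp,
                    suffix ? "." : "",
                    suffix ? suffix : "");
}
```

A caller would typically pass the result of localtime() for the moment the
new output file is opened, which is why two files created within the same
second share the same date/time component.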

Future work:

The current patch takes a conservative approach and only modifies editcap.c's
fileset_get_filename_by_pattern() to be consistent with ringbuffer.c's
ringbuf_open_file().

Some refactoring of fileset code could perhaps be considered.  Currently
fileset code is implemented in at least three files: ringbuffer.c, fileset.c
and editcap.c.

In addition to refactoring, an alternate fileset name might be considered that
swaps the filenum and date/time stamp components to use the following:

  [OPTIONALPREFIX]_YYYYMMDDhhmmss_NNNNN[.OPTIONALSUFFIX].

In this alternate format, the NNNNN value, instead of incrementing
monotonically with each new output file, would be reset to 00001 whenever the
date/time component differs from that of the previous fileset name.  Or, more
likely, the NNNNN component could simply represent the fractional part of the
second when the fileset name was generated.

This alternate fileset name would in theory always sort in chronological order.
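
The sorting claim can be checked with a plain lexicographic sort.  The
sketch below uses hypothetical helper names and assumes that all names in a
fileset share the same prefix and suffix:

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings (byte-wise strcmp order). */
static int cmp_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sort fileset names the way a simple directory
 * listing sorted by name would. */
static void sort_names(const char **names, size_t n)
{
    qsort(names, n, sizeof names[0], cmp_names);
}
```

With the current format, a file named t_00000_20121014221341.pcap (written
after a rollover) sorts ahead of the earlier t_99999_20121014221340.pcap.
With the alternate format, t_20121014221340_99999.pcap sorts ahead of
t_20121014221341_00000.pcap, which is the chronological order.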
