Wireshark-dev: Re: [Wireshark-dev] Why are authors never Cc'ed before their code is changed?

From: Pascal Quantin <pascal.quantin@xxxxxxxxx>
Date: Sun, 19 Aug 2012 15:56:18 +0200
2012/8/19 Harald Welte <laforge@xxxxxxxxxxxx>
Hi all,

I understand that different FOSS projects have different cultures,
norms, rules, etc.  However, my experience with wireshark it has reached
a point where I think a post like this is requierd.

I don't want to see this as some kind of flame, or to claim that the
wireshark development model is bad.  I just really don't understand it,
coming from a different background.  Please help me understand it.

In other large projects (let's say the Linux kernel), it is customary
that the original author of code (or a designated maintainer who has
taken over that particular module) is always Cc'ed before a change to
his code is being made.

In wireshark, it has happened repeatedly, that code contributed by
Osmocom developers (like the GMR dissector of Sylvain Munaut, or my
GSMTAP dissector) has been modified erroneously (and thus broken)
without any notice to us.

I find this at least disturbing (if not annoying) adn am wondering what
is the benefit of this practise to wireshark or its users?

It is generally a fair assumption to make that somebody who writes a
particular dissector actually cares about that code being functional,
and that he probably knows the respective protocol quite well.  In most
caess, I would expect that author to be able to review changes to his
code.

So why are those authors not Cc'ed in any kind of patches, or merge
requests to their code?  If you don't want to wait for their explicit
approval for efficiency reasons, you could at least notify them that
there was a change to their code that they should review.

The current situation ends up like this:

* People like me who just contribute particular dissectors have no time
  to go through all of the committlog or read all of the mailing list.

* Somebody else quietly makes a change, without discussing the change
  beforehand, without Cc'ing the proposed change to us

* A wireshark developer committs that patch, again without Cc'ing the
  original author

* Wireshark ends up being broken for the given protocol

* This is not discovered for a long time, until one of the few 'bleedign
  edge' users of those protocols discovers a problem and finds the time
  to report it, at which point we get the complaint about something not
  working and have to go back in history.

I would like to raise the following questions:

1) Is this the way how the wireshark development model / flow is
   supposed to work ?

2) If yes, do you really think that the gain in flexibilty caused by
    this anarchy overweighs the benefit of having dissector-authors give
    timely feedback to proposed changes, or prevent breakage?

3) Do you have any idea how to improve this situation?

4) How do other wireshark dissector authors deal with this problem?

Thanks in advance!

Regards,
        Harald

Hi Harald,

I can fully understand you concern and the warning you put in packet-gsmtap.* files (following several changes done without notice to Osmocom guys) has been a good step forward. As far as I know the GSMTAP compatibility has not been broken since and this notice prevented new breakage (see bug 7615 comments for example). Did you have any major issue since this warning was put in place (except the filter rename Sylvain raised)?

For me not all changes have the same potential of breakage: a change in an API used by an external tool can easily lead to a complete incompatibility (GSMTAP format unapproved changes for example) while a filter rename or a wrong subtree used to display a field (GMR-1 breakages seen lately) are "just" an annoyance (unless you have an external tool configuring filters by command line?).
I agree that this massive filter rename could / should have been discussed before being done, and for me things can still be changed as it is only done in the development tree (see my reply to Sylvain). That said, applying general rules (like filter names) to the whole code base once defined should have precedence over a single developer feelings.

I can see different type of commits:
- enhancement / fix for an existing dissector. This case is easier to handle and adding an original author in copy is a good idea when you have a single contributor to a given dissector (your use case). It can become much more complex when you have several authors across the years... If you check bugzilla, you will see that checking with an original author is sometimes done, depending on the knowledge of the reviewer for this given protocol and the impact of the change. This is not a golden rule though.
- general code cleaning spreading across several files. Usually it should not impact the dissector usability while complying with the coding style. Getting in touch with any contributor simply because a given Wireshark internal API gets deprecated and replaced by another one is not doable from my point of vue. When the changes are related to warnings removal (if I remember correctly it was the root cause of one of the two regressions seen with GMR-1 dissector) author could be contacted also. To avoid this kind of problems, an author can also check by himself the Clang / MSVC Code Analysis warnings for his own dissectors as they are freely available on the buildbot pages ;)

Wireshark model can probably be enhanced but calling it anarchy seems a bit excessive to me. This flexibility is also what makes Wireshark great with tons of new dissectors / features for each major release. Unfortunately it also implies a few annoyances / regressions that we must mitigate (for example we try to maintain API compatibilities across minor versions, but not between major versions). With more than 1000 dissectors and all the other components that are part of the code base, the complexity gets huge and making mistakes becomes easier. Using development versions is one way to ensure that things are still working for you. Moreover you can see bug reporters or dissector authors not responding to requests in Bugzilla simply because they moved to something else and do not care anymore, so we would need to define a timeout when trying to get an approval. Things get even worse when you start dealing with dissectors that were submitted several years ago (can go up to 10 years or more!).
We need to find a good tradeoff (not found yet) and we do not have any easy way to know who is still maintaining a given dissector (you cannot only rely on the header file only). On top of this, the core team is a few people doing this on their spare time (I know it should not be an excuse but it's a fact :) ).

Personally, before even joining the core team, I started following more closely the commits to check the changes done in the dissectors I use everyday (for most of them I contributed actively while not being the original author so your proposal would not have helped me) to ensure that nothing wrong was checked in. By that time I decided to check it by myself as I did not find any other way to be kept informed.

Having a RSS link (or something similar) to the subversion file log (http://anonsvn.wireshark.org/viewvc/trunk/epan/dissectors/packet-gsmtap.c?view=log for example) could be a way to easily monitor the changes done to a given file for whoever is interested. Of course it would not prevent the offending commit but it would allow a faster feedback / bugfix if needed (which could be sufficient from my point of vue). Would it be an acceptable first step towards improving the monitoring for changes by original authors / contributors? Is it feasible with ViewVC?


Regards,
Pascal.