Wireshark-dev: Re: [Wireshark-dev] Regenerating packet-parlay.c

From: Jaap Keuter <jaap.keuter@xxxxxxxxx>
Date: Mon, 4 May 2020 10:55:54 +0200
On 5/3/20 10:35 AM, Luke Mewburn wrote:
> On 20-05-01 13:46, Jaap Keuter wrote:
>   | On 5/1/20 12:02 PM, Luke Mewburn wrote:
>   | > On 20-05-01 07:34, Jaap Keuter wrote:
>   | >   | 
>   | >   | > On 1 May 2020, at 04:13, Luke Mewburn <luke@xxxxxxxxxxx>
>   | >   | > wrote: However, looking at the code some more, it appears
>   | >   | > that generally wireshark_gen.py generates code in the order
>   | >   | > the operations are defined; the exception (hah!) is the user
>   | >   | > exceptions.
>   | >   | > 
>   | >   | > If I instead add at the top import collections and change
>   | >   | > get_exceptionList() from ex_hash = {}  # holds a hash of
>   | >   | > unique exceptions.  to ex_hash = collections.OrderedDict()
>   | >   | > # holds a hash of unique exceptions.
>   | >   | > 
>   | >   | > This results in consistent generated code with both python
>   | >   | > 2.7 (CentOS 7) and python 3.7 (Fedora 31).
>   | >   | > 
>   | >   | > I've also fixed a whitespace issue in the generated code by
>   | >   | > indenting the break in
>   | >   | > template_helper_switch_msgtype_default_end, so that it
>   | >   | > matches the epan/dissectors code and other default
>   | >   | > statements.
>   | >   | > 
>   | >   | > 
>   | >   | > Here's a patch with my suggested fixes.  Or would you prefer
>   | >   | > a commit/pull request (etc)?
>   | >   | > 
>   | >   | > 
>   | >   | > regards, Luke.
>   | >   | > 
>   | >   | 
>   | >   | Hi Luke,
>   | >   | 
>   | >   | That’s great, I didn’t have the opportunity yet to dig into
>   | >   | this.  Nice that you compared Python 2.7 and 3.7 already.
>   | >   | I’ll pick this up and put it in with the other fixes I've
>   | >   | lined up, so you won’t have to push a change. I’ll credit you
>   | >   | in the commit :)
>   | >   | 
>   | >   | Thanks, Jaap
>   | 
>   | He Luke,
>   | 
>   | Are you able to recreate the parlay dissector currently in the
>   | repository with this. So far I haven't succeeded.
>   | 
>   | Thanks, Jaap
> 

Hi Luke,

> Yes, I regenerated the code using that patch to tools/wireshark_gen.py,
> and it builds fine across a couple of platforms.

Yes, I could build it too, so the generated code itself was okay. My quest is to
regenerate the _same_ source files.


> 
> (BTW: I needed a separate build fix on CentOS 7 for packet-dof.c
> and I've pushed a change to review for that)
> 

Yes, that was an unintended consequence of an include glib.h cleanup (of which
this is also part). You and Dario are on the case, so we'll work that out.


> 
> I wrote a simple wrapper tool idl-regen (attached) to assist each
> regen from the idl, used with something like:
> 	$ ./idl-regen ../epan/dissectors/
> 	mmccs.idl:120: Warning: Identifier 'EventType' clashes with CORBA 3 keyword 'eventtype'
> 	policy_data.idl:119: Warning: Anonymous sequences for recursive unions are deprecated. Use a forward declaration instead.
> 	policy_data.idl:123: Warning: Anonymous sequences for recursive unions are deprecated. Use a forward declaration instead.
> 
> I get the same .c files generated from either:
> - CentOS 7,  python 2.7.5, omniidl 4.2.0
> - Fedora 31, python 3.7.6, omniidl 4.2.3
> 

So the code generation is identical between Python 2 and 3, that's a good sign.


> 
> Three files get changes from git master:
> - packet-gias.c
> - packet-parlay.c
> - packet-tango.c
> 
> The generated code for packet-gias.c and packet-tango.c
> differ to git master because of the following fix to idl2wrs
> that doesn't appear to have been rerun to regenerate those files:
> 	commit 443df93896
> 	Author: Yannik Enss <Yannik.Enss@xxxxxxxxxxxxxxxxx>
> 	Date:   2019-05-29 14:20:42 +0200
> 
> 	idl2wrs: fix 'undeclared identifier' error
> 
> 	the 'x_octetx' variables were removed a few years back, replace them with get_CDR_xxx()
> 	I8cf3410d8a152c834e7019f7d1d80de3798530c3
> 

Good find. I've applied the anti-patch for this and reran code generation. Now
the same source files come out (with the exception below). I've also build with
these files without issue, so can anyone tell me what this commit achieves? It
references something 'a few years back' which doesn't help either. If nothing
I'm inclined to apply this anti-patch.

Gerald, you seem to have 'rubber stamped' this patch series about a year ago. I
know it's much to ask, but I can't find anything on this, no bug report, no
wireshark-dev posts, can you tell what this was for and why the repository code
is not the same as the code generated?


> 
> packet-parlay.c gets that above change as well as many other
> changes to generated code for /* User exception filters */,
> mostly in the ordering of the declarations and the list of
> array entries added to proto_register_giop_parlay() variable
> 	static hf_register_info hf[] 
> 

Yes, this was the trigger for my initial question. Why is that ordering
different than what's committed in the repo. My position is that generated files
must always be possible to be regenerated.


> 
> As far as I can tell, the new packet-parlay.c has the same
> number of lines, just in different order.

That was my impression too, but there are just too many to check by hand.


> I manually applied a change of:
> 	disc_s_XXX = (gint32) u_octet4;     /* save Enum Value discriminant and cast to gint32 */
> to
> 	disc_s_XXX = (gint32) get_CDR_ulong(tvb,offset,stream_is_big_endian, boundary);     /* save Enum Value  discriminant and cast to gint32 */

Which is a consequence of the commit you referenced above, correct.


> and compared the regenerated file with that, they both still have same
> size and number of lines, and if I sort the lines in both the files are
> identical. (Of course, that doesn't compile... :)
> 

Oh, that is indeed a nice trick to see if at least the same lines are in the
generated code files. I'm going to use that.


> 
> I tried doing some comparison of the generated object files but they
> differed too much, probably because of the reordering of the source
> lines, even though the hf[] array ends up with the same content in
> different order.
> 

Okay, so your statement is that we just have to accept that the ordering of the
the currently generated source file is going to be different from the committed
source file. Do you think we still should introduce the ordering statements?

> 
> I don't know how much more verification is required, however.
> It looks like in the past there have been changes to wireshark_gen.py
> and then regen of packet-parlay.c that reordered a lot of the content.
> For example, see commit c99bee9b5d on 2019-06-05.
> 

Well, for me now reproducibility is the main goal. With your 'line sorting
trick' we can at least confidently state that the actual items are not changed,
and have to assume that the code generation is solid enough to create equivalent
code. I'm okay with that.