Wireshark-dev: Re: [Wireshark-dev] Regenerating packet-parlay.c
From: Jaap Keuter <jaap.keuter@xxxxxxxxx>
Date: Mon, 4 May 2020 14:46:58 +0200
On 5/4/20 11:16 AM, Luke Mewburn wrote: > On 20-05-04 10:55, Jaap Keuter wrote: > | Hi Luke, > | > | > Yes, I regenerated the code using that patch to > | > tools/wireshark_gen.py, and it builds fine across a couple of > | > platforms. > | > | Yes, I could build it too, so the generated code itself was okay. My > | quest is to regenerate the _same_ source files. > > I don't think that's going to be possible to regenerate the same as the > currently committed file, because it was generated with wireshark_gen.py > whose implementation that did not guarantee reproducible output. > > I think that unless you have access to the exact system setup that > was used to generate packet-parlay.c that you won't be able reproduce > the files as-is with the existing tool anyway. > Good point. Also this drives home the point that generated code needs to be reproducible. > > | > I wrote a simple wrapper tool idl-regen (attached) to assist each > | > regen from the idl, used with something like: > | > $ ./idl-regen ../epan/dissectors/ > | > mmccs.idl:120: Warning: Identifier 'EventType' clashes with CORBA 3 keyword 'eventtype' > | > policy_data.idl:119: Warning: Anonymous sequences for recursive unions are deprecated. Use a forward declaration instead. > | > policy_data.idl:123: Warning: Anonymous sequences for recursive unions are deprecated. Use a forward declaration instead. > | > > | > I get the same .c files generated from either: > | > - CentOS 7, python 2.7.5, omniidl 4.2.0 > | > - Fedora 31, python 3.7.6, omniidl 4.2.3 > | > > | > | So the code generation is identical between Python 2 and 3, that's a > | good sign. > > Yes. > > > | > Three files get changes from git master: - packet-gias.c - > | > packet-parlay.c - packet-tango.c > | > > | > The generated code for packet-gias.c and packet-tango.c differ to > | > git master because of the following fix to idl2wrs that doesn't > | > appear to have been rerun to regenerate those files: commit > | > 443df93896 Author: Yannik Enss <Yannik.Enss@xxxxxxxxxxxxxxxxx> > | > Date: 2019-05-29 14:20:42 +0200 > | > > | > idl2wrs: fix 'undeclared identifier' error > | > > | > the 'x_octetx' variables were removed a few years back, replace > | > them with get_CDR_xxx() > | > I8cf3410d8a152c834e7019f7d1d80de3798530c3 > | > > | > | Good find. I've applied the anti-patch for this and reran code > | generation. Now the same source files come out (with the exception > | below). I've also build with these files without issue, so can > | anyone tell me what this commit achieves? It references something 'a > | few years back' which doesn't help either. If nothing I'm inclined > | to apply this anti-patch. > | > | Gerald, you seem to have 'rubber stamped' this patch series about a > | year ago. I know it's much to ask, but I can't find anything on > | this, no bug report, no wireshark-dev posts, can you tell what this > | was for and why the repository code is not the same as the code > | generated? > > I actually think the change is probably fine to leave, but I'll defer > to others. > True, but than it becomes a solution searching for a problem. I'd rather understand the problem, or find there's no problem at all, so the solution can be dropped. > > | > packet-parlay.c gets that above change as well as many other > | > changes to generated code for /* User exception filters */, > | > mostly in the ordering of the declarations and the list of > | > array entries added to proto_register_giop_parlay() variable > | > static hf_register_info hf[] > | > > | > | Yes, this was the trigger for my initial question. Why is that ordering > | different than what's committed in the repo. My position is that > | generated files must always be possible to be regenerated. > > I completely agree that generated files must be able to be reliably > regenerated from the same inputs and tools without change. > > As we both know, the issue is that the existing tool did not guarantee > reproducible output, due to the use of a python dict which did not > guarantee a particular ordering before python 3.6. > > > > | > As far as I can tell, the new packet-parlay.c has the same > | > number of lines, just in different order. > | > | That was my impression too, but there are just too many to check by hand. > | > | > | > and compared the regenerated file with that, they both still have same > | > size and number of lines, and if I sort the lines in both the files are > | > identical. (Of course, that doesn't compile... :) > | > > | > | Oh, that is indeed a nice trick to see if at least the same lines are in the > | generated code files. I'm going to use that. > > It wasn't super robust, but it was just part of a broader group of > tests I ran. > Of course, it's partial, but significant. > > | > I tried doing some comparison of the generated object files but > | > they differed too much, probably because of the reordering of the > | > source lines, even though the hf[] array ends up with the same > | > content in different order. > | > > | > | Okay, so your statement is that we just have to accept that the > | ordering of the the currently generated source file is going to be > | different from the committed source file. Do you think we still > | should introduce the ordering statements? > > I think we should use the OrderedDict change I proposed, which means > that the ordering will be in order the data structure is populated. > This matches all of the other data structures in wireshark_gen.py > and wireshark_be.py (lists). > > I don't think we need to change the code to sort all the different > identifiers and generated code, however. That will result in a bigger > change to all the source files generated from IDL. > Agreed, so your improvements go in, as far as I'm concerned. > > | > I don't know how much more verification is required, however. It > | > looks like in the past there have been changes to wireshark_gen.py > | > and then regen of packet-parlay.c that reordered a lot of the > | > content. For example, see commit c99bee9b5d on 2019-06-05. > | > > | > | Well, for me now reproducibility is the main goal. With your 'line > | sorting trick' we can at least confidently state that the actual > | items are not changed, and have to assume that the code generation > | is solid enough to create equivalent code. I'm okay with that. > > There may be some more testing that could be done with processing the > output of objdump, but I'm not sure it's worth it. > > As mentioned previously, I suspect that this level of verification > of regenerated files has not been applied to the previously > committed versions of these regenerated files based on reviewing > their commit history. > > > Another other test, of course, is to process some parlay captures with > tshark with a full decode, and compare the output :) > > Sure thing. I'll see what I can do. thanks, Jaap
- References:
- Re: [Wireshark-dev] Regenerating packet-parlay.c
- From: Luke Mewburn
- Re: [Wireshark-dev] Regenerating packet-parlay.c
- From: Jaap Keuter
- Re: [Wireshark-dev] Regenerating packet-parlay.c
- From: Luke Mewburn
- Re: [Wireshark-dev] Regenerating packet-parlay.c
- From: Jaap Keuter
- Re: [Wireshark-dev] Regenerating packet-parlay.c
- From: Luke Mewburn
- Re: [Wireshark-dev] Regenerating packet-parlay.c
- From: Jaap Keuter
- Re: [Wireshark-dev] Regenerating packet-parlay.c
- From: Luke Mewburn
- Re: [Wireshark-dev] Regenerating packet-parlay.c
- Prev by Date: Re: [Wireshark-dev] Regenerating packet-parlay.c
- Next by Date: [Wireshark-dev] Proposed changes to make tcp.ack and tcp.seq relative
- Previous by thread: Re: [Wireshark-dev] Regenerating packet-parlay.c
- Next by thread: [Wireshark-dev] Building under Ubuntu on WSL
- Index(es):