Wireshark-dev: Re: [Wireshark-dev] display filter scanner.l possible weirdness

From: João Valverde <j@xxxxxx>
Date: Tue, 23 Aug 2022 10:30:01 +0100

On 8/22/22 14:42, Richard Sharpe wrote:
> Hi folks,
>
> In trying to introduce my contexts approach for display filters to
> handle embedded/recursive structures in 802.11 Information Elements
> (TLVs) I came across this in epan/dfilter/scanner.l:
>
> -----------------------------
> -               ([.][-+[:alnum:]_:]+)+[.]{0,2} |
> -[-+[:alnum:]_:]+([.][-+[:alnum:]_:]+)*[.]{0,2} {
> +              ([.][-+[:alnum:]_]+)+[.]{0,2} |
> +[-+[:alnum:]_]+([.][-+[:alnum:]_]+)*[.]{0,2} {
> ------------------------------
>
> Basically, the original scanner allowed colons (:) in field names. I
> had to change that since I needed to parse out colons separately in
> the grammar. It almost looks like someone made a mistake and assumed
> they needed ':]' in contexts where that was not necessary.
>
> I do not believe anyone uses colons in filter strings and did not
> think it was possible.
>
> Does anyone think this will be a problem?

It is a problem, because that regex also has to match things other than
field names: byte strings, MAC addresses and IPv6 addresses all contain
colons.
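
To make this concrete, here is a rough sketch in Python, not Wireshark code. It transcribes the two flex alternatives from the diff into Python's `re` syntax and checks whether a whole literal would match as a single token. It assumes `[:alnum:]` means ASCII letters and digits. It does not model flex's longest-match rule or anything else in scanner.l. The helper names (`token_pattern`, `matches_whole`) are hypothetical.

```python
import re

# Python approximations of the two flex character classes from the diff.
# Assumption: [:alnum:] is treated as ASCII letters and digits only.
OLD_CHARS = r"[-+A-Za-z0-9_:]"   # before the change: colon allowed
NEW_CHARS = r"[-+A-Za-z0-9_]"    # after the change: colon removed


def token_pattern(chars: str) -> re.Pattern:
    """Build the two-alternative scanner pattern for a given character class."""
    return re.compile(
        rf"(?:[.]{chars}+)+[.]{{0,2}}"             # alternative 1: leading dot
        rf"|{chars}+(?:[.]{chars}+)*[.]{{0,2}}"    # alternative 2: dotted words
    )


OLD = token_pattern(OLD_CHARS)
NEW = token_pattern(NEW_CHARS)


def matches_whole(pattern: re.Pattern, text: str) -> bool:
    """True if the entire literal would be matched as one token."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    # A field name, a MAC address, an IPv6 address and a byte string.
    for s in ["eth.src", "00:11:22:33:44:55", "fe80::1", "aa:bb:cc"]:
        print(f"{s:20} old={matches_whole(OLD, s)!s:5} new={matches_whole(NEW, s)}")
```

With the colon removed from the class, "eth.src" is still one token. The MAC address, the IPv6 address and the byte string no longer match as a whole, so the scanner would split them at the first colon.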

Is there a reason why you are not developing this on the master branch?
That is odd.

And I urge you to come up with a design first that can garner some
support. Maybe you could explain what "protocol contexts" are. I fail to
see what makes contexts recursive.

> Also, are there automated tests for the dfilter stuff? I have been
> using dftest to test my changes but it would be good to see if I have
> disturbed anything.

Read README.test and try running "pytest -k dfilter" in the build directory.