Wireshark-dev: Re: [Wireshark-dev] Fixing #12958 (Duplicated keys in -T json output)

From: Daan De Meyer <daan.j.demeyer@xxxxxxxxx>
Date: Tue, 13 Jun 2017 10:10:15 +0000
I've submitted a patch that refactors the JSON output functions to support grouping multiple nodes into a json array. Before a node's children are printed, they are grouped using a grouping function. If multiple children end up in the same group, they are printed as a json array in the output.

Right now the grouping function puts every child in a separate group so as not to change the current json output. Removing duplicate keys from the output is then as simple as replacing the grouping function with one that groups children by their json key. More complex grouping functions could also be added in the future.
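A minimal Python sketch of the idea, assuming a simplified tree in which each node has a json key and either a leaf value or child nodes. The names `Node`, `group_each_separately`, `group_by_key` and `children_to_json` are hypothetical; the actual patch works on Wireshark's C protocol tree and its functions are not reproduced here. The printer asks a pluggable grouping function how to partition a node's children and prints any group with more than one member as a json array.

```python
import json


class Node:
    """Hypothetical stand-in for a protocol tree node: a json key plus
    either a leaf value or a list of child nodes."""

    def __init__(self, key, value=None, children=None):
        self.key = key
        self.value = value
        self.children = children if children is not None else []


def group_each_separately(children):
    """Current behaviour: every child is its own group, so duplicate keys remain."""
    return [[child] for child in children]


def group_by_key(children):
    """Alternative: children sharing a json key form one group (first-seen order)."""
    groups = {}
    for child in children:
        groups.setdefault(child.key, []).append(child)
    return list(groups.values())


def node_value(node, group_fn):
    """Print a leaf as a json scalar, or recurse into the node's children."""
    if not node.children:
        return json.dumps(node.value)
    return children_to_json(node.children, group_fn)


def children_to_json(children, group_fn):
    """Print a list of sibling nodes as one json object using group_fn."""
    members = []
    for group in group_fn(children):
        key = json.dumps(group[0].key)
        if len(group) == 1:
            members.append(f"{key}: {node_value(group[0], group_fn)}")
        else:
            # Several nodes in the same group are printed as a json array.
            values = ", ".join(node_value(n, group_fn) for n in group)
            members.append(f"{key}: [{values}]")
    return "{" + ", ".join(members) + "}"


tree = [Node("ip", children=[Node("ip.addr", "10.0.0.1"),
                             Node("ip.addr", "10.0.0.2"),
                             Node("ip.ttl", "64")])]
print(children_to_json(tree, group_each_separately))
# {"ip": {"ip.addr": "10.0.0.1", "ip.addr": "10.0.0.2", "ip.ttl": "64"}}
print(children_to_json(tree, group_by_key))
# {"ip": {"ip.addr": ["10.0.0.1", "10.0.0.2"], "ip.ttl": "64"}}
```

With `group_each_separately` the output keeps the duplicated `"ip.addr"` key, as the current output does; switching to `group_by_key` puts both addresses in one array that a standard parser keeps intact, without touching the printing code itself.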

The patch can be found here: https://code.wireshark.org/review/#/c/22064/ . I've tested the changes by diffing json output from this commit against json output from the current master branch. The output is identical for multiple traces and multiple combinations of options (-x, -j, -T jsonraw).

Is creating the change on the code review site all I need to do, or is some other step required before the patch can be reviewed?

Regards,

Daan


On Wed, 7 Jun 2017 at 21:32 Daan De Meyer <daan.j.demeyer@xxxxxxxxx> wrote:
Hello,

Right now, to use the tshark -T json output in a project, I have to use a streaming json parser to keep the values of duplicated keys from being overwritten. With a standard json parser like JavaScript's JSON.parse(), only the last value of a duplicated key is available in the resulting object. This is not ideal, and I'd like to fix this bug so I can read tshark's json output with JSON.parse() instead of a streaming json parser.
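For illustration, Python's standard json module behaves the same way as JSON.parse() here: by default it ignores all but the last name-value pair for a repeated name. The input below is a simplified, hypothetical fragment, not real tshark output (which nests fields under additional objects).

```python
import json

# Hypothetical, simplified fragment with a repeated key.
raw = '{"ip": {"ip.addr": "10.0.0.1", "ip.addr": "10.0.0.2"}}'

parsed = json.loads(raw)
# The first "ip.addr" value is silently overwritten by the second one.
print(parsed["ip"]["ip.addr"])  # prints 10.0.0.2
```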

The way I currently work around the problem is to intercept each duplicated key/value pair before it gets overwritten and to store all of that key's values in an array next to it, under the same key with an "_array" suffix.
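The workaround can be sketched in Python (this is not the JavaScript streaming parser actually used) with the `object_pairs_hook` parameter of `json.loads`, which receives every key/value pair of an object in document order, duplicates included. The hook name `keep_duplicates`, and the choice to keep the last value under the plain key alongside the `_array` key, are assumptions about the workaround's exact shape.

```python
import json
from collections import Counter


def keep_duplicates(pairs):
    """object_pairs_hook: receives all (key, value) pairs of one json
    object in document order, before duplicates are collapsed."""
    counts = Counter(key for key, _ in pairs)
    result = {}
    for key, value in pairs:
        # Plain key: last value wins, mirroring JSON.parse() (assumption).
        result[key] = value
        if counts[key] > 1:
            # Collect every value of a duplicated key under "<key>_array".
            result.setdefault(key + "_array", []).append(value)
    return result


raw = '{"ip": {"ip.addr": "10.0.0.1", "ip.addr": "10.0.0.2", "ip.ttl": "64"}}'
print(json.loads(raw, object_pairs_hook=keep_duplicates))
# {'ip': {'ip.addr': '10.0.0.2', 'ip.addr_array': ['10.0.0.1', '10.0.0.2'], 'ip.ttl': '64'}}
```

One caveat of this scheme: if a protocol already has a field whose name ends in "_array", the generated key would collide with it.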

I'd solve the problem in Wireshark in a similar way. A key that is currently duplicated in the output would be written only once (in the object), and its value would be a json array containing all of the key's values. A simple suffix like "_array" or "s" could be added to the key to clearly indicate that it has multiple values.
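A sketch of the proposed output shape, assuming the fields of one json object are available as an ordered list of (key, value) pairs. `merge_duplicate_keys` and the sample keys and values are illustrative only; the suffix is a parameter so both "_array" and "s" can be compared.

```python
import json


def merge_duplicate_keys(pairs, suffix="_array"):
    """Write each duplicated key once, renamed with `suffix`, holding a json
    array of all its values in order; unique keys are left unchanged."""
    values = {}
    for key, value in pairs:
        values.setdefault(key, []).append(value)
    merged = {}
    for key, vals in values.items():
        if len(vals) > 1:
            merged[key + suffix] = vals
        else:
            merged[key] = vals[0]
    return merged


pairs = [("ip.addr", "10.0.0.1"), ("ip.addr", "10.0.0.2"), ("ip.ttl", "64")]
print(json.dumps(merge_duplicate_keys(pairs)))
# {"ip.addr_array": ["10.0.0.1", "10.0.0.2"], "ip.ttl": "64"}
print(json.dumps(merge_duplicate_keys(pairs, suffix="s")))
# {"ip.addrs": ["10.0.0.1", "10.0.0.2"], "ip.ttl": "64"}
```

One trade-off of renaming: because a key only gets the suffix when it actually repeats, a consumer may still need to look up both `ip.addr` and `ip.addr_array` (or `ip.addrs`), depending on how many times the field occurred in a given packet.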

My current workaround with a streaming json parser does the same thing, and it has worked for the ip, tcp, http and http/2 tshark json output. However, I don't know whether there are other protocols where this approach would not work.

Would this be a good solution for the problem or am I missing something?

Regards,

Daan