Wireshark-dev: Re: [Wireshark-dev] RTP player - a suggestion

From: "Peter Budny" <pbudny@xxxxxxxxxx>
Date: Mon, 27 Mar 2017 11:40:17 -0400
Hi Erik and Jirka,

Some comments for both of you (makes sense to reply to both together).

Firstly, Erik, did you see my comments about the RTP player from a few
months ago?
(https://www.wireshark.org/lists/wireshark-dev/201612/msg00081.html)
I didn't submit any of it as bug reports, but I could if that would
help.

>> My proposal:
>> Add 'mixer' layer which will provide the following features to
>> improve usability:

Your idea sounds good overall, but before you dive into it, can I
first suggest that a lot of the current problems are due to the use of
QCustomPlot for graphing the audio?  It draws *very* slowly, and
that's true even after we downsample the audio just so the plot has
less data points to draw.

If I had the free time and the Qt knowledge, I'd love to see this use
of QCustomPlot replaced with a widget that behaves more like the old
GTK RTP player's display did, or better yet, more like an Audacity
track (with proper support for zooming and displaying the waveform
differently depending on the zoom level).  Since I don't think such a
widget exists yet, that would mean first making a new UI widget and
then rewriting Wireshark's Qt UI to use it. :-/

If that sounds like too much work, a better first step might be to
change the current Qt RTP player so that it creates one QCustomPlot
for each stream (like the GTK player did), rather than displaying
multiple streams overlapping.  It probably won't help or hurt
performance much, but changing the layout of the UI may give a clearer
picture of how a user would like to select or mix streams together, so
maybe that will affect how you think about the mixer layer.

>> I'm also guessing using just one audio output gives better
>> performance with lots of parallel streams.

To be sure, I *desperately* want to see the audio go back to the
left-right panning that the GTK player did, so that when listening to
multiple streams, you can hear which one is which because they come
from different positions.

Come to think of it, that may obviate the need for mixing in the first
place; if you have 3 channels, just tell the audio system you want to
play audio that has 3 channels (left, center, right) and provide the
data appropriately interleaved.  No mixing.

But I'm not sure how that would scale to more than 3 channels.  Then
again, how many people are listening to more than 3 streams
simultaneously anyway?  I can't think of many use cases for that.
Maybe a multicast stream with a bunch of participants, who may be
taking turns talking.  Maybe call transfers (but as you say, it'd be
nicer to handle those so that one stream "replaces" the other).
Nevertheless, I suppose we do have to handle that case somehow.  So I
guess we can't really avoid mixing.  But if there is *any* way to get
the audio system to do it for us, wouldn't that be preferable?

> I started work on "proof of concept" for very similar idea. I didn't
> finished it yet. But I have a few points which I would like to
> mention and discuss it there:
>
>3) When I analysed GTK and Qt code, I found that there is main
>difference between it. GTK stores all RTP data in memory, Qt extract
>it to file. Playing is based on extracted data then.

Not only does Qt extract it to a temporary file, it never deletes
those files.  That's rude.

I suspect the only reason it does this in the first place is because
that's just the API to the Speex resampler that someone chose to use
to downsample the audio.  And the only reason it downsamples in the
first place is because the QCustomPlot it uses to display the audio is
too slow.

Hence why I led by talking about the use of QCustomPlot.  I feel that
it's the root cause of a bunch of problems.  Instead of dealing with
these various symptoms, it would be better to address the root cause,
and these other questions (like how to downsample and store audio)
would go away.

There shouldn't be any need to downsample the audio at all.  If it is
downsampled, it should only be for *display*, so ideally it would be
the widget's job to downsample based on the zoom level.  No code
external to the widget ought to be downsampling anything.

There also shouldn't be any need for temporary files; audio is small
enough that it can fit in memory.
-- 
~Peter Budny