Wireshark-dev: [Wireshark-dev] slow sip voip flow for large captures
From: Cristian Constantin <const.crist@xxxxxxxxxxxxxx>
Date: Fri, 3 Feb 2012 13:01:29 +0100
hi!
wireshark can draw call flows for sip voip calls
(accessible through the menu Telephony/VoIP Calls).
however, when the capture is large, containing tens of
thousands of sip voip calls, wireshark becomes very slow
at producing the list of calls and the call flows.
here are my experiences with a capture file of 500 MB containing
approx. 40K calls on a debian machine having:
4xAMD Phenom(tm) II X4 940 Processor
4GB RAM
with the svn stock it takes about 10 hours to do it; the
report of the profiling done with oprofile looks like this:
CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 750000
samples % image name app name symbol name
8146520 55.2011 lt-wireshark lt-wireshark
append_to_frame_graph.isra.7
3698478 25.0610 libglib-2.0.so.0.2800.6 lt-wireshark g_list_last
1566933 10.6176 lt-wireshark lt-wireshark
SIPcalls_packet
1162377 7.8763 libc-2.13.so lt-wireshark
__strcmp_sse2
8734 0.0592 libwireshark.so.0.0.1 lt-wireshark
serv_name_lookup
8130 0.0551 libwireshark.so.0.0.1 lt-wireshark
dissect_sip_common
6504 0.0441 libwireshark.so.0.0.1 lt-wireshark format_text
5749 0.0390 libwireshark.so.0.0.1 lt-wireshark
tap_push_tapped_queue
4753 0.0322 libglib-2.0.so.0.2800.6 lt-wireshark
g_unichar_isprint
4627 0.0314 libglib-2.0.so.0.2800.6 lt-wireshark
g_hash_table_lookup
3699 0.0251 libwireshark.so.0.0.1 lt-wireshark guint8_pbrk
step 1
------
by changing SIPcalls_packet() to use a hash table for the lookup of the calls
(on sip call-id) the runtime decreases to an (estimated) 3-4 hours.
SIPcalls_packet()'s cpu usage has decreased from 10.61% to 0.0115%
(1000 times faster)
CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 750000
samples % image name symbol name
411305 69.4523 lt-wireshark append_to_frame_graph.isra.7
171549 28.9675 libglib-2.0.so.0.2800.6 g_list_last
475 0.0802 libwireshark.so.0.0.1 serv_name_lookup
439 0.0741 libwireshark.so.0.0.1 dissect_sip_common
375 0.0633 libwireshark.so.0.0.1 format_text
360 0.0608 libwireshark.so.0.0.1 tap_push_tapped_queue
303 0.0512 libglib-2.0.so.0.2800.6 g_hash_table_lookup
283 0.0478 libglib-2.0.so.0.2800.6 g_unichar_isprint
202 0.0341 libwireshark.so.0.0.1 guint8_pbrk
172 0.0290 libwireshark.so.0.0.1 proto_tree_add_item
149 0.0252 libwireshark.so.0.0.1 compute_offset_length
143 0.0241 libc-2.13.so __memcpy_sse2
141 0.0238 libwireshark.so.0.0.1 call_dissector_work
141 0.0238 libwireshark.so.0.0.1 emem_free_all
115 0.0194 libc-2.13.so __strncmp_sse2
112 0.0189 libc-2.13.so vfprintf
105 0.0177 libwireshark.so.0.0.1 proto_item_add_subtree
104 0.0176 libglib-2.0.so.0.2800.6 g_slice_free_chain_with_offset
99 0.0167 libwireshark.so.0.0.1 dissect_sdp
91 0.0154 libc-2.13.so free
91 0.0154 libwireshark.so.0.0.1 dissect_ip
90 0.0152 libwireshark.so.0.0.1 proto_field_is_referenced
90 0.0152 libwireshark.so.0.0.1 tvb_length_remaining
90 0.0152 lt-wireshark
process_specified_packets.constprop.10
89 0.0150 libglib-2.0.so.0.2800.6 g_slice_free1
89 0.0150 libwiretap.so.0.0.1 file_seek
88 0.0149 libwireshark.so.0.0.1 dissect
86 0.0145 libwireshark.so.0.0.1 dissect_frame
82 0.0138 libwireshark.so.0.0.1 proto_is_protocol_enabled
79 0.0133 libc-2.13.so __strcmp_sse2
79 0.0133 libc-2.13.so _int_malloc
78 0.0132 libglib-2.0.so.0.2800.6 g_slice_alloc
75 0.0127 libpthread-2.13.so pthread_getspecific
73 0.0123 libwireshark.so.0.0.1 tvb_find_guint8
72 0.0122 libc-2.13.so malloc
71 0.0120 libwireshark.so.0.0.1 fast_ensure_contiguous
68 0.0115 lt-wireshark SIPcalls_packet
step 2
------
the next improvment was to change append_to_frame_graph() to use a
hash table for graph analysis (still slow, over 1 hour):
CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 750000
samples % image name symbol name
565268 94.6185 libglib-2.0.so.0.2800.6 g_list_last
1863 0.3118 libwireshark.so.0.0.1 serv_name_lookup
1620 0.2712 libwireshark.so.0.0.1 dissect_sip_common
1340 0.2243 libwireshark.so.0.0.1 format_text
987 0.1652 libglib-2.0.so.0.2800.6 g_unichar_isprint
982 0.1644 libglib-2.0.so.0.2800.6 g_hash_table_lookup
819 0.1371 libwireshark.so.0.0.1 tap_push_tapped_queue
731 0.1224 libwireshark.so.0.0.1 guint8_pbrk
628 0.1051 libwireshark.so.0.0.1 proto_tree_add_item
587 0.0983 libglib-2.0.so.0.2800.6 g_hash_table_insert_internal
574 0.0961 libwireshark.so.0.0.1 compute_offset_length
405 0.0678 libwireshark.so.0.0.1 proto_is_protocol_enabled
401 0.0671 libwireshark.so.0.0.1 emem_free_all
395 0.0661 libwireshark.so.0.0.1 dissect_ip
391 0.0654 libc-2.13.so __strncmp_sse2
367 0.0614 libwireshark.so.0.0.1 dissect
...
20 0.0017 lt-wireshark append_to_frame_graph.isra.8
however, append_to_frame_graph() got from 69.45% to 0.0017% (40852 times better)
step 3
------
I have changed the graph analysis list and the callsinfo list to use
prepend instead of append:
CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 750000
samples % image name symbol name
130285 18.6875 libc-2.13.so msort_with_tmp
66464 9.5333 libglib-2.0.so.0.2800.6 node_get_pos
53790 7.7154 libglib-2.0.so.0.2800.6 g_hash_table_insert_internal
51146 7.3361 libc-2.13.so __memcpy_sse2
44419 6.3712 libglib-2.0.so.0.2800.6 node_get_next
42461 6.0904 libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_fixup
41265 5.9189 libglib-2.0.so.0.2800.6 is_end
40819 5.8549 libglib-2.0.so.0.2800.6 g_hash_table_resize
38905 5.5803 libgtk-x11-2.0.so.0.2400.4 _gtk_rbtree_next
36069 5.1736 libgtk-x11-2.0.so.0.2400.4 _gtk_rbtree_reorder
35828 5.1390 libglib-2.0.so.0.2800.6 g_hash_table_lookup
29469 4.2269 libgtk-x11-2.0.so.0.2400.4 save_positions
26858 3.8524 libgtk-x11-2.0.so.0.2400.4 generate_order
7892 1.1320 libc-2.13.so __memset_sse2
7752 1.1119 libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_sort_func
6860 0.9840 libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_invert_func
6835 0.9804 libglib-2.0.so.0.2800.6 g_hash_table_remove_all_nodes
4085 0.5859 libglib-2.0.so.0.2800.6 g_array_append_vals
3864 0.5542 libglib-2.0.so.0.2800.6 g_sequence_iter_is_end
3591 0.5151 libc-2.13.so __memset_x86_64
2057 0.2950 libglib-2.0.so.0.2800.6 g_array_maybe_expand
step 4
------
reordering is performed _continously_ while adding elements to the
voip calls list.
(by _gtk_rbtree_reorder which is calling qsort from libc which is
calling msort_with_tmp)
after disabling continous reordering, by using in voip_calls_dlg_update():
/* disable the re-ordering */
gtk_tree_sortable_set_sort_column_id(sortable,
GTK_TREE_SORTABLE_UNSORTED_SORT_COLUMN_ID, GTK_SORT_ASCENDING);
listx = g_list_first(listx);
while (listx) {
add_to_list_store((voip_calls_info_t*)(listx->data));
listx = g_list_next(listx);
}
/* enable the re-ordering */
gtk_tree_sortable_set_sort_column_id(sortable,
CALL_COL_START_TIME, GTK_SORT_ASCENDING);
CPU: AMD64 family10, speed 3e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 750000
samples % image name symbol name
13495 8.8003 libwireshark.so.0.0.1 format_text
10698 6.9763 libglib-2.0.so.0.2800.6 g_unichar_isprint
7421 4.8394 libwireshark.so.0.0.1 guint8_pbrk
4973 3.2430 libwireshark.so.0.0.1 compute_offset_length
4776 3.1145 libglib-2.0.so.0.2800.6 g_hash_table_lookup
4029 2.6274 libwireshark.so.0.0.1 dissect_sip_common
3851 2.5113 libglib-2.0.so.0.2800.6 g_atomic_pointer_get
3186 2.0776 libc-2.13.so __memcpy_sse2
3064 1.9981 libglib-2.0.so.0.2800.6 g_hash_table_insert_internal
3008 1.9616 libwireshark.so.0.0.1 emem_free_all
2964 1.9329 libwireshark.so.0.0.1 fast_ensure_contiguous
2803 1.8279 libwireshark.so.0.0.1 serv_name_lookup
2793 1.8214 libwireshark.so.0.0.1 check_offset_length_no_exception
2114 1.3786 libc-2.13.so _int_malloc
2063 1.3453 libglib-2.0.so.0.2800.6 g_slice_free1
2046 1.3342 libwireshark.so.0.0.1 tap_push_tapped_queue
1841 1.2005 libwireshark.so.0.0.1 tvb_get_guint8
1713 1.1171 libc-2.13.so memchr
1668 1.0877 libwireshark.so.0.0.1 tvb_length_remaining
1625 1.0597 libglib-2.0.so.0.2800.6 g_atomic_int_get
1550 1.0108 libwireshark.so.0.0.1 proto_tree_add_item
1524 0.9938 libc-2.13.so vfprintf
1475 0.9619 libwireshark.so.0.0.1 emem_alloc_chunk
1439 0.9384 libc-2.13.so __strncmp_sse2
1395 0.9097 libglib-2.0.so.0.2800.6 g_slice_alloc
1312 0.8556 libwireshark.so.0.0.1 tvb_find_guint8
1279 0.8341 libwireshark.so.0.0.1 ensure_contiguous_no_exception
1052 0.6860 libc-2.13.so malloc
1028 0.6704 libwireshark.so.0.0.1 dissect_sdp
1022 0.6665 libwireshark.so.0.0.1 tvb_pbrk_guint8
the runtime decreased now to approx. 1 minute (pretty reasonable
when compared to 10 hours)
I can publish the patches on bugzilla; however,
the changes/tests have been explicitly targeted at sip voip calls.
some of the data structures are global and used by _all_
voip protocol taps; it won't work correctly with dumps having
also non-sip voip calls.
pls. let me know your opinion.
thanks.
cristian
- Follow-Ups:
- Re: [Wireshark-dev] slow sip voip flow for large captures
- From: Anders Broman
- Re: [Wireshark-dev] slow sip voip flow for large captures
- From: Jaap Keuter
- Re: [Wireshark-dev] slow sip voip flow for large captures
- Prev by Date: Re: [Wireshark-dev] Conference room before FOSDEM
- Next by Date: Re: [Wireshark-dev] slow sip voip flow for large captures
- Previous by thread: Re: [Wireshark-dev] [Wireshark-commits] rev 40820: /trunk/doc/ /trunk/doc/: capinfos.pod dumpcap.pod editcap.pod mergecap.pod rawshark.pod text2pcap.pod tshark.pod wireshark-filter.pod wireshark.pod.template
- Next by thread: Re: [Wireshark-dev] slow sip voip flow for large captures
- Index(es):