Wireshark-dev: [Wireshark-dev] slow sip voip flow for large captures
From: Cristian Constantin <const.crist@xxxxxxxxxxxxxx>
Date: Fri, 3 Feb 2012 13:01:29 +0100
hi! wireshark can draw call flows for sip voip calls (accessible through the menu Telephony/VoIP Calls). however, when the capture is large, containing tens of thousands of sip voip calls, wireshark becomes very slow at producing the list of calls and the call flows. here are my experiences with a capture file of 500 MB containing approx. 40K calls on a debian machine having: 4xAMD Phenom(tm) II X4 940 Processor 4GB RAM with the svn stock it takes about 10 hours to do it; the report of the profiling done with oprofile looks like this: CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000 samples % image name app name symbol name 8146520 55.2011 lt-wireshark lt-wireshark append_to_frame_graph.isra.7 3698478 25.0610 libglib-2.0.so.0.2800.6 lt-wireshark g_list_last 1566933 10.6176 lt-wireshark lt-wireshark SIPcalls_packet 1162377 7.8763 libc-2.13.so lt-wireshark __strcmp_sse2 8734 0.0592 libwireshark.so.0.0.1 lt-wireshark serv_name_lookup 8130 0.0551 libwireshark.so.0.0.1 lt-wireshark dissect_sip_common 6504 0.0441 libwireshark.so.0.0.1 lt-wireshark format_text 5749 0.0390 libwireshark.so.0.0.1 lt-wireshark tap_push_tapped_queue 4753 0.0322 libglib-2.0.so.0.2800.6 lt-wireshark g_unichar_isprint 4627 0.0314 libglib-2.0.so.0.2800.6 lt-wireshark g_hash_table_lookup 3699 0.0251 libwireshark.so.0.0.1 lt-wireshark guint8_pbrk step 1 ------ by changing SIPcalls_packet() to use a hash table for the lookup of the calls (on sip call-id) the runtime decreases to an (estimated) 3-4 hours. SIPcalls_packet()'s cpu usage has decreased from 10.61% to 0.0115% (1000 times faster) CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000 samples % image name symbol name 411305 69.4523 lt-wireshark append_to_frame_graph.isra.7 171549 28.9675 libglib-2.0.so.0.2800.6 g_list_last 475 0.0802 libwireshark.so.0.0.1 serv_name_lookup 439 0.0741 libwireshark.so.0.0.1 dissect_sip_common 375 0.0633 libwireshark.so.0.0.1 format_text 360 0.0608 libwireshark.so.0.0.1 tap_push_tapped_queue 303 0.0512 libglib-2.0.so.0.2800.6 g_hash_table_lookup 283 0.0478 libglib-2.0.so.0.2800.6 g_unichar_isprint 202 0.0341 libwireshark.so.0.0.1 guint8_pbrk 172 0.0290 libwireshark.so.0.0.1 proto_tree_add_item 149 0.0252 libwireshark.so.0.0.1 compute_offset_length 143 0.0241 libc-2.13.so __memcpy_sse2 141 0.0238 libwireshark.so.0.0.1 call_dissector_work 141 0.0238 libwireshark.so.0.0.1 emem_free_all 115 0.0194 libc-2.13.so __strncmp_sse2 112 0.0189 libc-2.13.so vfprintf 105 0.0177 libwireshark.so.0.0.1 proto_item_add_subtree 104 0.0176 libglib-2.0.so.0.2800.6 g_slice_free_chain_with_offset 99 0.0167 libwireshark.so.0.0.1 dissect_sdp 91 0.0154 libc-2.13.so free 91 0.0154 libwireshark.so.0.0.1 dissect_ip 90 0.0152 libwireshark.so.0.0.1 proto_field_is_referenced 90 0.0152 libwireshark.so.0.0.1 tvb_length_remaining 90 0.0152 lt-wireshark process_specified_packets.constprop.10 89 0.0150 libglib-2.0.so.0.2800.6 g_slice_free1 89 0.0150 libwiretap.so.0.0.1 file_seek 88 0.0149 libwireshark.so.0.0.1 dissect 86 0.0145 libwireshark.so.0.0.1 dissect_frame 82 0.0138 libwireshark.so.0.0.1 proto_is_protocol_enabled 79 0.0133 libc-2.13.so __strcmp_sse2 79 0.0133 libc-2.13.so _int_malloc 78 0.0132 libglib-2.0.so.0.2800.6 g_slice_alloc 75 0.0127 libpthread-2.13.so pthread_getspecific 73 0.0123 libwireshark.so.0.0.1 tvb_find_guint8 72 0.0122 libc-2.13.so malloc 71 0.0120 libwireshark.so.0.0.1 fast_ensure_contiguous 68 0.0115 lt-wireshark SIPcalls_packet step 2 ------ the next improvment was to change append_to_frame_graph() to use a hash table for graph analysis (still slow, over 1 hour): CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000 samples % image name symbol name 565268 94.6185 libglib-2.0.so.0.2800.6 g_list_last 1863 0.3118 libwireshark.so.0.0.1 serv_name_lookup 1620 0.2712 libwireshark.so.0.0.1 dissect_sip_common 1340 0.2243 libwireshark.so.0.0.1 format_text 987 0.1652 libglib-2.0.so.0.2800.6 g_unichar_isprint 982 0.1644 libglib-2.0.so.0.2800.6 g_hash_table_lookup 819 0.1371 libwireshark.so.0.0.1 tap_push_tapped_queue 731 0.1224 libwireshark.so.0.0.1 guint8_pbrk 628 0.1051 libwireshark.so.0.0.1 proto_tree_add_item 587 0.0983 libglib-2.0.so.0.2800.6 g_hash_table_insert_internal 574 0.0961 libwireshark.so.0.0.1 compute_offset_length 405 0.0678 libwireshark.so.0.0.1 proto_is_protocol_enabled 401 0.0671 libwireshark.so.0.0.1 emem_free_all 395 0.0661 libwireshark.so.0.0.1 dissect_ip 391 0.0654 libc-2.13.so __strncmp_sse2 367 0.0614 libwireshark.so.0.0.1 dissect ... 20 0.0017 lt-wireshark append_to_frame_graph.isra.8 however, append_to_frame_graph() got from 69.45% to 0.0017% (40852 times better) step 3 ------ I have changed the graph analysis list and the callsinfo list to use prepend instead of append: CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000 samples % image name symbol name 130285 18.6875 libc-2.13.so msort_with_tmp 66464 9.5333 libglib-2.0.so.0.2800.6 node_get_pos 53790 7.7154 libglib-2.0.so.0.2800.6 g_hash_table_insert_internal 51146 7.3361 libc-2.13.so __memcpy_sse2 44419 6.3712 libglib-2.0.so.0.2800.6 node_get_next 42461 6.0904 libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_fixup 41265 5.9189 libglib-2.0.so.0.2800.6 is_end 40819 5.8549 libglib-2.0.so.0.2800.6 g_hash_table_resize 38905 5.5803 libgtk-x11-2.0.so.0.2400.4 _gtk_rbtree_next 36069 5.1736 libgtk-x11-2.0.so.0.2400.4 _gtk_rbtree_reorder 35828 5.1390 libglib-2.0.so.0.2800.6 g_hash_table_lookup 29469 4.2269 libgtk-x11-2.0.so.0.2400.4 save_positions 26858 3.8524 libgtk-x11-2.0.so.0.2400.4 generate_order 7892 1.1320 libc-2.13.so __memset_sse2 7752 1.1119 libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_sort_func 6860 0.9840 libgtk-x11-2.0.so.0.2400.4 gtk_rbtree_reorder_invert_func 6835 0.9804 libglib-2.0.so.0.2800.6 g_hash_table_remove_all_nodes 4085 0.5859 libglib-2.0.so.0.2800.6 g_array_append_vals 3864 0.5542 libglib-2.0.so.0.2800.6 g_sequence_iter_is_end 3591 0.5151 libc-2.13.so __memset_x86_64 2057 0.2950 libglib-2.0.so.0.2800.6 g_array_maybe_expand step 4 ------ reordering is performed _continously_ while adding elements to the voip calls list. (by _gtk_rbtree_reorder which is calling qsort from libc which is calling msort_with_tmp) after disabling continous reordering, by using in voip_calls_dlg_update(): /* disable the re-ordering */ gtk_tree_sortable_set_sort_column_id(sortable, GTK_TREE_SORTABLE_UNSORTED_SORT_COLUMN_ID, GTK_SORT_ASCENDING); listx = g_list_first(listx); while (listx) { add_to_list_store((voip_calls_info_t*)(listx->data)); listx = g_list_next(listx); } /* enable the re-ordering */ gtk_tree_sortable_set_sort_column_id(sortable, CALL_COL_START_TIME, GTK_SORT_ASCENDING); CPU: AMD64 family10, speed 3e+06 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 750000 samples % image name symbol name 13495 8.8003 libwireshark.so.0.0.1 format_text 10698 6.9763 libglib-2.0.so.0.2800.6 g_unichar_isprint 7421 4.8394 libwireshark.so.0.0.1 guint8_pbrk 4973 3.2430 libwireshark.so.0.0.1 compute_offset_length 4776 3.1145 libglib-2.0.so.0.2800.6 g_hash_table_lookup 4029 2.6274 libwireshark.so.0.0.1 dissect_sip_common 3851 2.5113 libglib-2.0.so.0.2800.6 g_atomic_pointer_get 3186 2.0776 libc-2.13.so __memcpy_sse2 3064 1.9981 libglib-2.0.so.0.2800.6 g_hash_table_insert_internal 3008 1.9616 libwireshark.so.0.0.1 emem_free_all 2964 1.9329 libwireshark.so.0.0.1 fast_ensure_contiguous 2803 1.8279 libwireshark.so.0.0.1 serv_name_lookup 2793 1.8214 libwireshark.so.0.0.1 check_offset_length_no_exception 2114 1.3786 libc-2.13.so _int_malloc 2063 1.3453 libglib-2.0.so.0.2800.6 g_slice_free1 2046 1.3342 libwireshark.so.0.0.1 tap_push_tapped_queue 1841 1.2005 libwireshark.so.0.0.1 tvb_get_guint8 1713 1.1171 libc-2.13.so memchr 1668 1.0877 libwireshark.so.0.0.1 tvb_length_remaining 1625 1.0597 libglib-2.0.so.0.2800.6 g_atomic_int_get 1550 1.0108 libwireshark.so.0.0.1 proto_tree_add_item 1524 0.9938 libc-2.13.so vfprintf 1475 0.9619 libwireshark.so.0.0.1 emem_alloc_chunk 1439 0.9384 libc-2.13.so __strncmp_sse2 1395 0.9097 libglib-2.0.so.0.2800.6 g_slice_alloc 1312 0.8556 libwireshark.so.0.0.1 tvb_find_guint8 1279 0.8341 libwireshark.so.0.0.1 ensure_contiguous_no_exception 1052 0.6860 libc-2.13.so malloc 1028 0.6704 libwireshark.so.0.0.1 dissect_sdp 1022 0.6665 libwireshark.so.0.0.1 tvb_pbrk_guint8 the runtime decreased now to approx. 1 minute (pretty reasonable when compared to 10 hours) I can publish the patches on bugzilla; however, the changes/tests have been explicitly targeted at sip voip calls. some of the data structures are global and used by _all_ voip protocol taps; it won't work correctly with dumps having also non-sip voip calls. pls. let me know your opinion. thanks. cristian
- Follow-Ups:
- Re: [Wireshark-dev] slow sip voip flow for large captures
- From: Anders Broman
- Re: [Wireshark-dev] slow sip voip flow for large captures
- From: Jaap Keuter
- Re: [Wireshark-dev] slow sip voip flow for large captures
- Prev by Date: Re: [Wireshark-dev] Conference room before FOSDEM
- Next by Date: Re: [Wireshark-dev] slow sip voip flow for large captures
- Previous by thread: Re: [Wireshark-dev] [Wireshark-commits] rev 40820: /trunk/doc/ /trunk/doc/: capinfos.pod dumpcap.pod editcap.pod mergecap.pod rawshark.pod text2pcap.pod tshark.pod wireshark-filter.pod wireshark.pod.template
- Next by thread: Re: [Wireshark-dev] slow sip voip flow for large captures
- Index(es):