Wireshark-dev: [Wireshark-dev] 3GPP 23.038 encoding and string length

From: Pascal Quantin <pascal.quantin@xxxxxxxxx>
Date: Tue, 24 Dec 2013 11:43:43 +0100
Hi all,

r54428 introduced an ENC_3GPP_TS_23_038 encoding type so that proto_tree_add_item can be used directly instead of manually decoding the string with the gsm_sms_char_7bit_unpack() / gsm_sms_chars_to_utf8() functions.
While it is a very good idea (much easier to use), it raises an interesting issue. With this 7-bit encoding, a payload of 7 bytes can hold either 7 or 8 characters: 8 septets fill the 56 bits exactly, while 7 septets (49 bits) also need 7 bytes and leave 7 padding bits. The gsm_sms_char_7bit_unpack() function handles this thanks to an extra parameter specifying the number of characters.
We had several bugs in this area in the GSM SMS and ANSI 637 dissectors (an extra '@' character was added at the end of the buffer because the zero padding bits were decoded as a character), which I fixed some time ago (those protocols specify the number of characters to decode in another field).
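
To make the ambiguity concrete, here is a minimal sketch of septet unpacking driven by a character count. The helper name unpack_septets() and its signature are hypothetical and only meant for illustration; this is not the Wireshark implementation of gsm_sms_char_7bit_unpack().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the Wireshark API): extract up to n_chars
 * septets from a GSM 7-bit packed buffer, least significant bits first.
 * Returns the number of septets actually written to out. */
static size_t unpack_septets(const uint8_t *in, size_t in_len,
                             size_t n_chars, uint8_t *out)
{
    size_t i;

    for (i = 0; i < n_chars; i++) {
        size_t bit = i * 7;           /* first bit of septet i */
        size_t byte = bit / 8;
        unsigned shift = (unsigned)(bit % 8);
        uint16_t window;

        if (bit + 7 > in_len * 8)
            break;                    /* septet does not fit in the buffer */

        window = in[byte];
        if (byte + 1 < in_len)
            window |= (uint16_t)((uint16_t)in[byte + 1] << 8);

        out[i] = (uint8_t)((window >> shift) & 0x7F);
    }
    return i;
}
```

For example, the 7 bytes C1 60 30 18 0C 06 01 hold seven 'A' characters (0x41). Asking this helper for 8 characters from the same buffer also succeeds, and the eighth septet is 0x00, which is '@' in the GSM default alphabet. The buffer length alone cannot tell the two cases apart.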
The ETSI CAT specification requires adding an explicit <CR> in this situation to avoid the problem.
GSM MAP dissection is wrong (as briefly discussed on -users: http://www.wireshark.org/lists/wireshark-users/201311/msg00014.html) and needs to use the lengthInCharacter variable, as is done in the GSM SMS / ANSI 637 dissectors.
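
As a rough illustration of that <CR> rule on the encoding side, here is a minimal sketch. It assumes "this situation" means a character count that leaves 7 spare bits after packing (n % 8 == 7); the helper name pad_septets_with_cr() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define GSM_CR 0x0D  /* <CR> in the GSM default alphabet */

/* Hypothetical helper: if packing n_chars septets would leave 7 spare
 * bits (n_chars % 8 == 7), append a <CR> so that the padding cannot be
 * mistaken for a trailing '@'. Returns the new septet count. */
static size_t pad_septets_with_cr(uint8_t *septets, size_t n_chars,
                                  size_t capacity)
{
    if (n_chars % 8 == 7 && n_chars < capacity) {
        septets[n_chars] = GSM_CR;
        return n_chars + 1;
    }
    return n_chars;   /* no full spare septet, nothing to add */
}
```

With this padding a 7-character string is sent as 8 septets, so a decoder that only knows the buffer size sees a trailing <CR> instead of a spurious '@'.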
If we want to start using ENC_3GPP_TS_23_038 for the remaining dissectors using gsm_sms_char_7bit_unpack() we need to find a solution for this.
Should we change the meaning of the length parameter so that it represents the number of characters rather than the buffer size? This would be a major difference compared to other encodings, and it might be difficult to correctly determine the number of bytes to highlight. Or maybe we should continue handling it manually, as before...
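
On the highlighting question, the byte span can at least be derived from a character count. A minimal sketch, with hypothetical helper names and assuming the string starts on a byte boundary with no fill bits:

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes occupied by n_chars packed
 * septets, i.e. ceil(7 * n_chars / 8). */
static size_t septets_to_bytes(size_t n_chars)
{
    return (n_chars * 7 + 7) / 8;
}

/* Hypothetical helper: largest number of septets that fit in n_bytes.
 * This is only an upper bound; the real count may be one less. */
static size_t bytes_to_max_septets(size_t n_bytes)
{
    return (n_bytes * 8) / 7;
}
```

The mapping from characters to bytes is unambiguous (7 and 8 characters both give 7 bytes), whereas the reverse direction only gives an upper bound, which is the root of the problem described above.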
Thoughts?

Thanks,
Pascal.