Wireshark-dev: Re: [Wireshark-dev] [Wireshark-commits] rev 39143: /trunk/epan/dissectors/ /trun

From: "Maynard, Chris" <Christopher.Maynard@xxxxxxxxx>
Date: Fri, 30 Sep 2011 15:43:32 -0400
> -----Original Message-----
> From: wireshark-dev-bounces@xxxxxxxxxxxxx [mailto:wireshark-dev-
> bounces@xxxxxxxxxxxxx] On Behalf Of Guy Harris
> Sent: Tuesday, September 27, 2011 3:37 PM
> To: Developer support list for Wireshark
> Subject: Re: [Wireshark-dev] [Wireshark-commits] rev 39143:
> /trunk/epan/dissectors/ /trunk/epan/dissectors/: packet-dvbci.c
> 
> 
> On Sep 27, 2011, at 12:11 PM, Maynard, Chris wrote:
> 
> > I suppose, although I still question the real need for ENC_NA at all.
> 
> At this point, I'm willing to see it go away, too.

Hmm, "willing to" != "prefer to".  If you of all people would prefer to keep it then I think we should definitely keep it.

> > Now that being said, changing the "const gboolean little_endian"
> argument to an "int endian" or better yet, "int encoding" argument
> 
> $ egrep encoding epan/proto.h
> 
> 	...
> 
>     const gint start, gint length, const guint encoding);
>
[etc.]

Well what's the expression?  I guess this is me not seeing the forest through the trees here.  I was so focused on ENC_NA == ENC_BIG_ENDIAN, I didn't even notice the other ENC's, nor the guint encoding.  I also happened to read other things like, "In the future" from README.developer, so I didn't even think to look.  I do apologize for this.  But looking at the code and relative lack of use of these other ENC's, and noticing what's been changing (mostly TRUE -> ENC_LITTLE_ENDIAN, FALSE -> ENC_BIG_ENDIAN, and some ENC_NA's), it does seem that others might have been unaware of the other encodings as well?  I don't know, maybe there's been some confusion in thinking that the last argument to proto_tree_add_item() still meant only the endian used and so ENC_NA only meant that the endian-ness was not applicable?

OK, well that confusion aside, and assuming we want to do *something* to try to avoid situations where ENC_NA is incorrectly used where ENC_BIG_ENDIAN or [especially] ENC_LITTLE_ENDIAN should be, how does something like the attached patch grab you?  Keep in mind that this is a VERY INCOMPLETE patch (especially proto.c) and obviously needs more work; it's just to get the basic idea across.

So for example, in theory if changes such as this were to be made, then calls to proto_tree_add_item() that today don't warn us of any problems might be able to in the future.  Such as:

    Warning: TRUE/FALSE deprecated.
    proto_tree_add_item(tree, hf_ft_le_uint32, tvb, offset, len, TRUE);
    proto_tree_add_item(tree, hf_ft_be_uint32, tvb, offset, len, FALSE);

    Warning: Endian not applicable for this type.
    proto_tree_add_item(tree, hf_ft_bytes, tvb, offset, len, TRUE);
    proto_tree_add_item(tree, hf_ft_bytes, tvb, offset, len, FALSE);
    proto_tree_add_item(tree, hf_ft_bytes, tvb, offset, len, ENC_LITTLE_ENDIAN);
    proto_tree_add_item(tree, hf_ft_bytes, tvb, offset, len, ENC_BIG_ENDIAN);

    Warning: Invalid endian for this type.
    proto_tree_add_item(tree, hf_ft_le_uint32, tvb, offset, len, ENC_NA);
    proto_tree_add_item(tree, hf_ft_be_uint32, tvb, offset, len, ENC_NA);

- Chris

CONFIDENTIALITY NOTICE: The contents of this email are confidential
and for the exclusive use of the intended recipient. If you receive this
email in error, please delete it from your system immediately and 
notify us either by email, telephone or fax. You should not copy,
forward, or otherwise disclose the content of the email.
Index: epan/proto.h
===================================================================
--- epan/proto.h	(revision 39201)
+++ epan/proto.h	(working copy)
@@ -197,7 +197,7 @@
  * still need to be able to specify that at run time).
  *
  * So, for now, we define ENC_BIG_ENDIAN and ENC_LITTLE_ENDIAN as
- * bit flags, to be combined, in the future, with other information
+ * bit fields, to be combined, in the future, with other information
  * to specify the encoding in the last argument to
  * proto_tree_add_item(), and possibly to specify in a field
  * definition (e.g., ORed in with the type value).
@@ -205,13 +205,31 @@
  * Currently, proto_tree_add_item() treats its last argument as a
  * Boolean - if it's zero, the field is big-endian, and if it's non-zero,
  * the field is little-endian - and other code in epan/proto.c does
- * the same.  We therefore define ENC_BIG_ENDIAN as 0x00000000 and
- * ENC_LITTLE_ENDIAN as 0x80000000 - we're using the high-order bit
- * so that we could put a field type and/or a value such as a character
- * encoding in the lower bits.
+ * the same.  We therefore define the encoding argument as follows:
+ *
+ *    32 31        30 - 2        1
+ *  +---+---+--------//--------+---+
+ *  |ENDIAN |   CHAR ENCODING  | H |
+ *  +---+---+--------//--------+---+
+ *
+ *  ENDIAN = Endian as follows:
+ *      00 = Historic.  The H bit determines the endian
+ *      10 = Little Endian
+ *      01 = Big Endian
+ *      11 = N/A or Invalid
+ *  CHAR ENCODING = Character encoding as follows:
+ *      0x00000000 = UTF-8 or ASCII
+ *      0x0EBCD1C0 = EBCDIC
+ *  H = Historic endian bit:
+ *      0 = Big Endian
+ *      1 = Little Endian
  */
-#define ENC_BIG_ENDIAN		0x00000000
+#define ENC_ENDIAN_MASK		0xC0000000
+#define ENC_HISTORIC_ENDIAN	0x00000000
 #define ENC_LITTLE_ENDIAN	0x80000000
+#define ENC_BIG_ENDIAN		0x40000000
+#define ENC_NA_ENDIAN		0xC0000000
+#define ENC_INVALID_ENDIAN	ENC_NA_ENDIAN
 
 /*
  * Historically FT_TIMEs were only timespecs; the only question was whether
@@ -250,11 +268,16 @@
  * the control character in small caps, diagonally.  (Unfortunately,
  * those only exist for C0, not C1.)
  */
-#define ENC_CHARENCODING_MASK	0x7FFFFFFE	/* mask out byte-order bits */
+#define ENC_CHARENCODING_MASK	0x3FFFFFFE	/* mask out byte-order bits */
 #define ENC_UTF_8		0x00000000
 #define ENC_ASCII		0x00000000
 #define ENC_EBCDIC		0x0EBCD1C0
 
+/* The 'H' bit: */
+#define ENC_HISTORIC_ENDIAN_MASK	0x00000001
+#define ENC_HISTORIC_LITTLE_ENDIAN	TRUE
+#define ENC_HISTORIC_BIG_ENDIAN		FALSE
+
 /*
  * For protocols (FT_PROTOCOL), aggregate items with subtrees (FT_NONE),
  * opaque byte-array fields (FT_BYTES), and other fields where there
@@ -262,7 +285,7 @@
  * of bytes" or because the encoding is completely fixed), we
  * have ENC_NA (for "Not Applicable").
  */
-#define ENC_NA			0x00000000
+#define ENC_NA			(ENC_NA_ENDIAN|ENC_UTF_8|ENC_HISTORIC_BIG_ENDIAN)
 
 /* Values for header_field_info.display */
 
Index: epan/proto.c
===================================================================
--- epan/proto.c	(revision 39201)
+++ epan/proto.c	(working copy)
@@ -1260,6 +1260,12 @@
 			break;
 
 		case FT_BYTES:
+			endian = encoding & ENC_ENDIAN_MASK;
+			if (endian != ENC_NA_ENDIAN) {
+				/* TODO: Issue some sort of warning, expert_info(), ...?
+				 * g_warning("Endian not applicable for type %d.\n",
+				 * 	new_fi->hfinfo->type); */
+			}
 			proto_tree_set_bytes_tvb(new_fi, tvb, start, length);
 			break;
 
@@ -1298,8 +1304,22 @@
 			 * Map all non-zero values to little-endian for
 			 * backwards compatibility.
 			 */
-			if (encoding)
-				encoding = ENC_LITTLE_ENDIAN;
+			encoding = encoding_arg & ENC_ENDIAN_MASK;
+			if (encoding == ENC_INVALID_ENDIAN) {
+				/* TODO: Issue some sort of warning, expert_info(), ...?
+				 * g_warning("Invalid endian for type %d.\n",
+				 * 	new_fi->hfinfo->type); */
+				encoding = encoding_arg & ENC_HISTORIC_ENDIAN_MASK;
+			}
+			else if (encoding == ENC_HISTORIC_ENDIAN) {
+				/* TODO: Issue some sort of warning, expert_info(), ...?
+				 * g_warning("Using TRUE/FALSE for indicating type %d "
+				 * 	"endianness is deprecated.\n",
+				 * 	new_fi->hfinfo->type); */
+				encoding = encoding_arg & ENC_HISTORIC_ENDIAN_MASK;
+			}
+			else
+				encoding >>= 31;
 			proto_tree_set_uint(new_fi,
 				get_uint_value(tvb, start, length, encoding));
 			break;
@@ -1695,7 +1715,9 @@
 			DISSECTOR_ASSERT_NOT_REACHED();
 			break;
 	}
-	FI_SET_FLAG(new_fi, (encoding & ENC_LITTLE_ENDIAN) ? FI_LITTLE_ENDIAN : FI_BIG_ENDIAN);
+	FI_SET_FLAG(new_fi,
+		(encoding & ENC_HISTORIC_ENDIAN_MASK) == ENC_HISTORIC_LITTLE_ENDIAN ?
+		FI_LITTLE_ENDIAN : FI_BIG_ENDIAN);
 
 	/* Don't add new node to proto_tree until now so that any exceptions
 	 * raised by a tvbuff access method doesn't leave junk in the proto_tree. */