https://bugs.wireshark.org/bugzilla/show_bug.cgi?id=6613
--- Comment #2 from Guy Harris <guy@xxxxxxxxxxxx> 2011-11-23 23:55:49 PST ---
At least according to
https://secure.wikimedia.org/wikipedia/en/wiki/PCRE
the PCRE library has an option to enable "Unicode character properties":
Unicode defines several properties for each character. Patterns in PCRE can
match these properties. e.g. \p{Ps}.*?\p{Pe} would match a string beginning
with any "opening punctuation" and ending with any "close punctuation" such as
"[abc]". Since version 8.10, matching of certain "normal" metacharacters can be
driven by Unicode properties when the compile option PCRE_UCP is set. The
option can be set for a pattern by including (*UCP) at the start of the pattern.
The option alters behavior of the following metacharacters: \B, \b, \D, \d, \S,
\s, \W, \w, and some of the POSIX character classes. For example, the
set of characters matched by \w (word characters) is expanded to include letters and
accented letters as defined by Unicode properties. Such matching is slower than
the normal (ASCII-only) non-UCP alternative. Note that the UCP option requires
the PCRE library to have been built to include Unicode property support.
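
As a rough illustration of the \w difference described above, here is a small
sketch using Python's "re" module as a stand-in. This is not PCRE or GLib: the
assumption is only that Python's default (Unicode-aware) handling of str
patterns behaves roughly like PCRE with UCP, and that its re.ASCII flag behaves
roughly like the ASCII-only non-UCP mode. The sample text is made up.

```python
import re

# Hypothetical sample text: "naive" with U+00EF, "cafe" with U+00E9, and a number.
text = "na\u00efve caf\u00e9 42"

# Python str patterns are Unicode-aware by default, so \w also matches
# accented letters (used here as an analogue of PCRE with UCP set).
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_] (an analogue of the non-UCP mode),
# so each word is split at its accented letter.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```

In the ASCII-only mode the accented letters stop being word characters, so
"naive" and "cafe" are cut short at them.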
Does the GLib regex matching support Unicode characters (presumably encoded
with UTF-8) in that fashion by default? If so, would setting G_REGEX_RAW
disable that? If so, would setting G_REGEX_RAW be, overall, an improvement or
a regression? ("Overall" means "to all users, regardless of whether they use
hex escapes in patterns or not".)
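
To make the trade-off in that question concrete, here is a sketch of how a hex
escape can behave differently when the subject is treated as UTF-8 text versus
raw bytes. It again uses Python's "re" module as an analogy, with str patterns
standing in for UTF-8 mode and bytes patterns standing in for a raw byte mode.
Whether GLib with and without G_REGEX_RAW behaves the same way is an assumption
that would need checking against the GLib documentation.

```python
import re

subject_text = "caf\u00e9"                    # 4 characters, last one is U+00E9
subject_bytes = subject_text.encode("utf-8")  # 5 bytes: b"caf\xc3\xa9"

# Text mode: \xe9 in the pattern means the character U+00E9, which is present.
text_hit = re.search(r"\xe9", subject_text)

# Byte mode: \xe9 means the single byte 0xE9, which does not occur in the
# UTF-8 encoding of the subject (U+00E9 is encoded as 0xC3 0xA9).
byte_hit = re.search(rb"\xe9", subject_bytes)

# Byte mode does match when the pattern spells out the encoded bytes.
encoded_hit = re.search(rb"\xc3\xa9", subject_bytes)

# Byte mode also limits \w to ASCII, so the accented letter is not a word character.
raw_words = re.findall(rb"\w+", subject_bytes)

print(text_hit is not None, byte_hit is not None, encoded_hit is not None)
print(raw_words)
```

In this analogy, byte mode makes hex escapes match individual bytes, but it
gives up Unicode-aware character classes.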