Richard Sharpe <sharpe@xxxxxxxxxx> writes:
> Tim Potter wrote:
> > Richard Sharpe writes:
> >
> >>>Hmm... do you mean strings of wide characters or some other encoding
> >>>(utf-8?).
> >>>
> >> Good question. I was thinking specifically of what MS puts on the
> >> wire, which I think is wide characters, but we need to think this
> >> issue through. Internally, Samba will be using UTF-8, I think,
> >> (although, this is irrelevant) and there may be other protocols
> >> that use other forms of UNICODE.
> >>
> > NT actually uses UCS2-LE (i.e. 2-byte characters, little endian)
> > on the wire for the rpc calls. So it's a bit more complicated
> > than having ustring calls. )-:
>
> Hmmm, but don't they look like lots of UCS2-LE characters, followed by
> 0x000x00?
>
Some things to bear in mind:

Ignoring the SMB parts of things for the moment, what you're actually
seeing in the DCE/RPC parts of these calls is usually one of two
different things:

 - An array of unsigned shorts with the [string] attribute. The
   [string] attribute means that there'll be a terminating 0 element,
   in this case a 0x0000.

 - A 'counted' array of unsigned shorts, e.g.,

       typedef struct _LSA_UNICODE_STRING {
           unsigned short len;
           unsigned short max_len;
           [size_is(max_len/2), length_is(len/2)] unsigned short *Buffer;
       } LSA_UNICODE_STRING;

   In this case, there may or may not be a null terminator. It all
   depends on the caller. In practice, some APIs tend to always
   include one and others tend not to, but those are just quirks of
   MS's implementation, not something that should be assumed.
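
As a rough sketch of what the two cases mean for code that consumes
these strings (C, with made-up names: struct counted_ustr,
counted_ustr_units and terminated_ustr_units are hypothetical helpers,
not taken from any existing codebase, and the struct is an in-memory
view, not the NDR wire layout): take the length of a counted string
from len, never from a terminator, and bound the scan for the 0x0000
of a [string] array by the data actually available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of a counted string. The fields mirror
 * the IDL struct, but this is NOT the wire layout. */
struct counted_ustr {
    uint16_t len;           /* bytes in use */
    uint16_t max_len;       /* bytes allocated */
    const uint16_t *buffer; /* may or may not be 0-terminated */
};

/* Number of 16-bit elements to read from a counted string.
 * Returns 0 if the fields are inconsistent. */
static size_t counted_ustr_units(const struct counted_ustr *s)
{
    assert(s != NULL);
    if (s->buffer == NULL || s->len > s->max_len || (s->len % 2) != 0)
        return 0;
    return s->len / 2;
}

/* Number of elements before the terminating 0x0000 of a [string]
 * array, looking at no more than 'avail' elements. Returns 'avail'
 * if no terminator was found, which the caller can treat as
 * malformed data. */
static size_t terminated_ustr_units(const uint16_t *p, size_t avail)
{
    size_t n = 0;

    assert(p != NULL || avail == 0);
    while (n < avail && p[n] != 0)
        n++;
    return n;
}
```

Neither helper decides whether the elements are really characters;
they only say how many elements there are.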

Also note that in either case, the endianness is specified by the
DCE/RPC packet header, so you can't assume that it's LE on the wire.
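
For illustration, a minimal C sketch of reading the array elements
according to the header rather than hard-coding LE. It assumes, as I
understand the DCE/RPC data representation label, that bit 0x10 of
the first drep byte means little-endian integers; DREP_LITTLE_ENDIAN,
read_u16 and read_u16_array are names made up for this example, and
pulling the drep byte out of the packet header is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag in the first byte of the data representation label. */
#define DREP_LITTLE_ENDIAN 0x10

/* Read one 16-bit element in the byte order the header announced. */
static uint16_t read_u16(const uint8_t *p, uint8_t drep0)
{
    assert(p != NULL);
    if (drep0 & DREP_LITTLE_ENDIAN)
        return (uint16_t)(p[0] | (p[1] << 8));
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Convert 'count' wire elements into host-order values in 'out'. */
static void read_u16_array(const uint8_t *wire, size_t count,
                           uint8_t drep0, uint16_t *out)
{
    for (size_t i = 0; i < count; i++)
        out[i] = read_u16(wire + 2 * i, drep0);
}
```

The same four bytes 41 00 42 00 come out as 0x0041 0x0042 with the
little-endian flag set and as 0x4100 0x4200 without it.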

In theory, this kind of stuff should be auto-generated by an .idl
compiler, but even if we had one, there's really no way to guess
whether an array of unsigned shorts is really a UCS2 string or not.

Todd