Unicode anomaly

If I start TCC with /U ...

Code:
v:\> for /l %i in (1,1,1000) ( echo abc >> abc.txt )

v:\> echo %@lines[abc.txt]
999
That's as expected (@LINES is zero-based, so 1000 lines give 999).

Now I use a hex editor to remove the BOM (the leading FF FE bytes).
Code:
v:\> hexe abc.txt

v:\> echo %@lines[abc.txt]
1000
I'm not sure why the result differs, but I am confident TCC doesn't identify the modified file as Unicode. The test IS_TEXT_UNICODE_ASCII16 ("The text is Unicode, and contains only zero-extended ASCII values/characters") seems useless for this: it gives inconsistent answers even for short, all-ASCII strings. Here's part of a query I made (without satisfactory results) in microsoft.public.vc.language.
Code:
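#include <windows.h>   // IsTextUnicode (link with Advapi32.lib)
#include <stdio.h>
#include <wchar.h>

int main(void)
{
    LPCWSTR szStr[6] = {L"A", L"A ", L"A b", L"A bu", L"A bug", L"A bug!"};
    for ( INT i=0; i<6; i++ )
    {
        INT test = IS_TEXT_UNICODE_ASCII16;   // in: the test to run; out: its result
        // size is in bytes, not counting the terminating NUL
        BOOL bResult = IsTextUnicode(szStr[i], (int)(2*wcslen(szStr[i])), &test);
        wprintf(L"L\"%ls\" is %lsUnicode", szStr[i], bResult ? L"" : L"not ");
        wprintf(L" (0x%X)\n", test);
    }
}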

L"A" is not Unicode (0x5)
L"A " is Unicode (0x1)
L"A b" is Unicode (0x1)
L"A bu" is not Unicode (0x0)
L"A bug" is not Unicode (0x0)
L"A bug!" is not Unicode (0x0)

The results are different (but equally confusing) if the terminating NUL
is included in the test:

/* as above, but the size now includes the terminating NUL (2 more bytes) */
BOOL bResult = IsTextUnicode(szStr[i], (int)(2*wcslen(szStr[i])+2), &test);

L"A" is Unicode (0x1)
L"A " is Unicode (0x1)
L"A b" is not Unicode (0x0)
L"A bu" is not Unicode (0x0)
L"A bug" is not Unicode (0x0)
L"A bug!" is not Unicode (0x0)
FWIW, the BOM-less file from the hex-editing step is exactly the kind of file you get from CMD started with /U (UTF-16 with no BOM).
 
