Incorrect Unicode detection in "type" and "head" commands

TCC 26.02.43 x64 Windows 10 [Version 10.0.19044.1503]

I have an ASCII file "fred.hex" containing one very long line, mostly repetitions of "|00 00", terminated by CRLF. When the line is longer than 512 chars, the output from "type" (or "head") shows the per mille symbol (‰), the unknown-character symbol, and a space. "type /X" shows the correct hex codes on the left but also renders them as Unicode chars on the right (as per the attached PNG).

The line is:
-13:03:59| 0| 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00|00 00|00 00|00 00|00 00|00 00|00 00|00 00...

Originally it was 1700 chars long, but continually reducing the length (by bisection initially) resulted in correct display as the length dropped below 512 chars. Examining the critical 6 chars shows no hidden non-ASCII chars. Deleting one of the " 00" from the initial group also results in the correct display, even though the line is over 512 chars.
Regenerating the file by hand in Notepad exhibits the same behaviour, which rules out a hidden control char that I missed.
I do not have UTF8 or Unicode output enabled in "option".
Is there any way to force ASCII display?
Thanks, Len.
 

Attachment: Bad TCC type.png

Similar (but different) here (with TCC v28). VIEW gets it right (as does GNU cat).

[Image: 1645582730272.png]


But TYPE (without /X), LIST, HEAD, and TAIL all show

[Image: 1645582918112.png]


With /X, TYPE shows the hex correctly but the text is as above.

I wonder if it's the Win32 function IsTextUnicode? I'll test it.

I use codepage 1252 if it matters.
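
If the file really is being misdetected as UTF-16, the symbols reported in the first post are roughly what one would expect. Below is a minimal Python sketch, not TCC's code: it assumes the display path decodes the raw bytes as UTF-16LE, which nobody here has confirmed. The helper name decode_as_utf16le and the short sample are made up for illustration.

```python
import unicodedata

# Hypothetical sample: a few of the ASCII "00 " groups from the long line.
sample = b"00 00 00 00 "


def decode_as_utf16le(data: bytes) -> str:
    """Decode bytes as UTF-16LE (the assumed misdetection), pairing bytes two at a time."""
    return data.decode("utf-16-le")


as_ascii = sample.decode("ascii")
as_utf16 = decode_as_utf16le(sample)

if __name__ == "__main__":
    print("ASCII :", repr(as_ascii))
    for ch in as_utf16:
        # Show each 16-bit unit with its code point and Unicode name.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```

The ASCII pair "0 " (bytes 0x30 0x20), read as one little-endian 16-bit unit, is U+2030, the per mille sign. The other two pairings ("00" and " 0") land on U+3030 and U+3020 in the CJK Symbols and Punctuation block, which many console fonts cannot draw.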
 
Since I don't know which tests TCC uses, I told IsTextUnicode to use all tests (lpiResult = nullptr). I got

Code:
546 bytes were read
IsTextUnicode() returned TRUE

I don't know if anything can be done about that.
 
And when I shorten the line, I get

Code:
508 bytes were read
IsTextUnicode() returned FALSE
 
IsTextUnicode() is a Microsoft Win32 API function. I also tested TCC's QueryIsFileUnicode() function (which, no doubt, uses the Win32 function) in a plugin. The results were as I reported above.
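
For anyone who wants to repeat that kind of test without writing a plugin, here is a rough Python sketch that calls IsTextUnicode() through ctypes with lpiResult set to NULL, so the API applies all of its tests. It is only a sketch: make_line() is a hypothetical stand-in that imitates the shape of the line in the first post, not the real fred.hex, so the TRUE/FALSE results may not match the ones above. The function returns None when it is not run on Windows.

```python
import ctypes
import sys


def is_text_unicode(data: bytes):
    """Return IsTextUnicode's verdict for data, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    fn = ctypes.windll.advapi32.IsTextUnicode
    fn.argtypes = [ctypes.c_char_p, ctypes.c_int, ctypes.POINTER(ctypes.c_int)]
    fn.restype = ctypes.c_int  # Win32 BOOL
    # None is passed as NULL for lpiResult: use all available tests.
    return bool(fn(data, len(data), None))


def make_line(groups: int, leading: int = 23) -> bytes:
    """Hypothetical stand-in for the problem line (the group counts are guesses)."""
    return b"-13:03:59| 0|" + b" 00" * leading + b"|00 00" * groups + b"\r\n"


if __name__ == "__main__":
    for groups in (60, 80, 100):
        line = make_line(groups)
        print(len(line), "bytes ->", is_text_unicode(line))
```

Varying the groups and leading arguments is a quick way to see whether length or byte alignment changes the verdict on a given machine.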
 
@LurkingKiwi

If no workaround turns out to be possible (we will see), then it may be best to use an external command for this.

For example, "cat" (bundled with Git), which works here.
 
