
UNICODE mixed with ANSI Code

Some of my TCC log files (in Unicode format) sometimes contain sequences in
ANSI code (there are reasons for this :artist:)
Question: Is there a way to convert these files completely to Unicode with TCC?
I know TPIPE is a monster tool, but at the moment I have the impression that it is not
suitable for this. So maybe a BTM can do the job better.
 
You have 8-bit characters embedded in otherwise UTF-16 text files?
 
You have 8-bit characters embedded in otherwise UTF-16 text files?
Yes, exactly. Here is (a fragment of) such a problem file.
BTW, FILEREAD may have a problem with this file that I can't explain....
 

Attachments

  • UNI_ANSI-Code-Kurz.txt (623 KB)
That file is a mess. Some of the 8-bit stuff is syntax messages (which you wouldn't expect to get into a log file).
[screenshot]


And the times don't always go in order. For example, there are entries from [24.10.17 15:57:09] which are before entries from [25.10.17 09:21:41].

There are also many "Param/Comment" entries referring to MetaPad sessions. What are they?

Do you have another program also logging to that file?
 
I don't know what's expected here, but I can mess up a Unicode text file easily with >>.
Code:
v:\> notepad uctest.txt
[screenshot of the file in Notepad]

Code:
v:\> type uctest.txt
My dog has fleas.
v:\> echo foo >> uctest.txt

[screenshot]


Now (this is the best part).
[screenshot]
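For anyone reading without the screenshots, here is a minimal sketch, in Python rather than TCC, of what that append does at the byte level (same hypothetical file name as the demo above):
Code:
# Write a small UTF-16 file (BOM + UTF-16LE), then append raw 8-bit
# bytes -- which is what an ANSI-mode "echo foo >>" does.
with open("uctest.txt", "w", encoding="utf-16") as f:
    f.write("My dog has fleas.")

with open("uctest.txt", "ab") as f:     # binary append: no re-encoding
    f.write(b"foo")

print(open("uctest.txt", "rb").read())
# b'\xff\xfeM\x00y\x00 \x00d\x00o\x00g\x00 ... a\x00s\x00.\x00foo'
# Two bytes per character up to the append, then one byte per character;
# an editor that trusts the BOM renders the tail as garbage.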
 
There are also many "Param/Comment" entries referring to MetaPad sessions. What are they?

Do you have another program also logging to that file?
They're part of my real JPSOFT history log. Back in the dark ages, at the beginning of 4DOS/4OS/4NT, I worked for a long time with its log feature, so the weird "Param/Comment" entries date from that time :blackalien:

TCC is the only logger, but sometimes I must switch to ANSI, and I do it with OPTION //UNICODEOUTPUT=NO.

One reason, for example, is DISKPART. I've written a BTM to drive it via
"DISKPART /s Commandscript.txt"
but DISKPART cannot handle Unicode in "Commandscript.txt", so it must be ANSI.
And when I looked into the log last month, I saw the disastrous mix of encodings.
It's not a big problem, but I'm looking for a way to clean the file, so I had hoped I could do it with TCC.
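For illustration, a minimal sketch of that workaround in Python rather than a BTM (the script contents and the cp1252 codepage are assumptions; DISKPART itself needs an elevated session):
Code:
import subprocess

# Write the script in an 8-bit ANSI codepage (cp1252 assumed here),
# because "DISKPART /s" cannot read a UTF-16 script file.
with open("Commandscript.txt", "w", encoding="cp1252") as f:
    f.write("list disk\n")        # hypothetical DISKPART command

subprocess.run(["diskpart", "/s", "Commandscript.txt"])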
 
Some of my TCC log files (in Unicode format) sometimes contain sequences in
ANSI code (there are reasons for this :artist:)

I don't know of any way to do it automatically, as there's no obvious difference between a 2-byte Unicode character and a 2-byte pair of ANSI characters.

Provided your file didn't have any extended Unicode characters (i.e., > 255), you could pick it apart with @FILEREADB looking for characters that didn't have a 0 for the high byte, and converting those to Unicode.
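A minimal sketch of that detection pass, in Python rather than TCC's @FILEREADB (the file name is hypothetical): walk the file in two-byte steps and flag any pair whose high byte is not zero. Note that a word-aligned scan like this only works until the first odd-length ANSI run; the follow-up below deals with that.
Code:
# Pure UTF-16LE text with no characters > 255 has 0x00 as every second
# byte; any pair that breaks the pattern is suspect.
data = open("tcc.log", "rb").read()
start = 2 if data[:2] == b"\xff\xfe" else 0   # skip the BOM, if present
for i in range(start, len(data) - 1, 2):
    if data[i + 1] != 0:
        print(f"offset {i}: {data[i]:#04x} {data[i + 1]:#04x} "
              "- looks like embedded ANSI bytes")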
 
I don't know of any way to do it automatically, as there's no obvious difference between a 2-byte Unicode character and a 2-byte pair of ANSI characters.

Provided your file didn't have any extended Unicode characters (i.e., > 255), you could pick it apart with @FILEREADB looking for characters that didn't have a 0 for the high byte, and converting those to Unicode.

I was afraid you'd say that, Rex. So I have to read the whole thing as a byte or word stream
and check whether the LSB and MSB are non-NUL, and so on.
Maybe the functions BALLOC, BREAD, BPEEK, and so on are helpful, or faster in execution?
I will try it out :joyful:
 
I was afraid you'd say that, Rex. So I have to read the whole thing as a byte or word stream and check whether the LSB and MSB are non-NUL, and so on.
It would have to be byte-by-byte, not even a word at a time. Thanks to the 8-bit characters in the stream, the Unicode characters may not be word-aligned....
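Following that advice, here is a byte-by-byte repair sketch, again in Python rather than a BTM (file names are mine; it assumes, per the posts above, that the genuine Unicode characters are all <= 255 and that the ANSI runs contain no NUL bytes):
Code:
raw = open("tcc.log", "rb").read()
if raw[:2] == b"\xff\xfe":              # drop the UTF-16LE BOM, if any
    raw = raw[2:]

chars = []
i = 0
while i < len(raw):
    if i + 1 < len(raw) and raw[i + 1] == 0:
        chars.append(chr(raw[i]))       # low byte + NUL: a UTF-16LE char
        i += 2
    else:
        chars.append(chr(raw[i]))       # no NUL follows: a lone ANSI byte
        i += 1                          # advancing 1, not 2, re-aligns the
                                        # stream after odd-length ANSI runs
# chr() maps the ANSI bytes as Latin-1; decode with the real ANSI codepage
# (e.g. cp1252) instead if the log contains characters above 0x7F.
open("tcc-clean.log", "w", encoding="utf-16",
     newline="").write("".join(chars))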
 
