WAD PDIR: a percent sign % in a file name causes the other files in the directory not to be listed

This is the command I used:

[C:\TMP] ctzls F:\ > f.cat

where
ctzls is an alias : *pdir/nej/ou/s/a:-d/(dy-m-d th:m:s z 8r @inode[*] 3@links[*] fpn)

Issues:

- first run terminated after processing 41371 of 190608 files (no error message) [NOTE: I had reported identical problem in past, but had not followed up - I will run the same command again...]

- second run stopped processing each directory AND ITS SUBDIRECTORIES when a filename containing a percent sign % was encountered; 1822 files were not reported (only 4 contained % in the name). An error message from @inode[*] or @links[*] (sorry, I do not recall which) displayed the file name containing the % sign, with the % sign and the subsequent characters that are legal in a variable name replaced by the variable's supposed value, which happened to be empty, and indicated that the file was not found. More pernicious than misinterpreting file names is that no further processing of files in the whole subtree was performed.
 
Firstly, since PDIR only reports, but does not change anything, I'd rather see a gibberish result than have the command aborted. Secondly, the command was not aborted - it continued from the next directory, after skipping all entries in the directory and its subdirectories.
- Why does PDIR attempt to parse a file name instead of passing it as a quoted literal to the function?
- Why does PDIR ignore the remaining files in the current directory, but resumes in the next directory?
- How can we deal with all files whose names NTFS considers legal?
 
You were (formerly) one of the most ardent proponents of not returning erroneous data! Your new preference means that you could never use the results of PDIR as input to another command.

PDIR is not attempting to parse a filename. You asked PDIR to call a variable function, and the variable expansion routine (which knows nothing whatever about any individual command) is parsing a string. PDIR has behaved this way since it was first created.
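The effect described above can be illustrated with a toy model in Python. This is only a sketch, not TCC's actual expansion routine: the function name expand_single_percent, the sample path, and the rule that a variable name consists of letters, digits and underscores are all assumptions made for illustration. It shows how a generic "%name" expansion, applied to a string that happens to be a filename, silently produces a different (and probably nonexistent) name.

```python
import re

def expand_single_percent(text, env):
    """Toy model (assumption, not TCC's real parser): replace each
    %name with its value from env, or with nothing if undefined."""
    return re.sub(r"%([A-Za-z0-9_]+)",
                  lambda m: env.get(m.group(1), ""),
                  text)

# Hypothetical filename containing a single percent sign.
original = r"F:\data\50%off.txt"
expanded = expand_single_percent(original, {})

# "%off" is taken as a variable reference; with no such variable
# defined it expands to an empty string, so the name changes.
print(original)
print(expanded)
```

The expansion step sees only a string, so it has no way to know that the text came from a directory entry rather than from something the user typed.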

If you don't want your filenames expanded, then either don't create such highly dubious (and impractical) filenames, or turn off nested variable expansion before calling PDIR.

- How can we deal with all files whose names NTFS considers legal?

Use a compiler, not a command interpreter. Or adopt some minimal responsibility and don't propagate filenames you know are at best problematic (and at worst technically illegal).
 
You were (formerly) one of the most ardent proponents of not returning erroneous data! Your new preference means that you could never use the results of PDIR as input to another command.
What I'd like is for the bad name to be listed in errors, and PDIR to continue. This is especially relevant to find ALL filenames that are not TCC compatible.
PDIR is not attempting to parse a filename. You asked PDIR to call a variable function, and the variable expansion routine (which knows nothing whatever about any individual command) is parsing a string. PDIR has behaved this way since it was first created.
But PDIR uses special syntax to call the variable function IF THE PARAMETER IS JUST THE FULL NAME OF A MATCHING FILE. Why can't this syntax bypass the variable expansion routine? IIRC the documentation never mentioned that filenames are expanded as if they were variables...
If you don't want your filenames expanded, then either don't create such highly dubious (and impractical) filenames, or turn off nested variable expansion before calling PDIR.
For the specific instance where I used it this would be feasible, and I will try it.
Use a compiler, not a command interpreter. Or adopt some minimal responsibility and don't propagate filenames you know are at best problematic (and at worst technically illegal).
In the current situation all such names were created either to experiment with bad names in response to postings in the SUPPORT forum, or through bad code somewhere in my system, and can thus be easily eliminated. However, NTFS now accepts names that are listed in some documentation as illegal; such files can be created, read, modified, and deleted. Some of these exist on systems over which TCMD users have no change authority. POSIX shells have no problem dealing with them. It is time for TCC to be able to do so directly, without the need for work-arounds.
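As a stopgap outside TCC, the kind of report asked for here (every file listed with its inode and link count, failures recorded, and the scan continuing) can be sketched in Python. This is an assumption-laden illustration, not a replacement for the PDIR format above: the function name scan and the returned tuple layout are made up for the example, and only the standard os.walk and os.lstat calls are used.

```python
import os

def scan(root):
    """Walk root and return (rows, errors).

    rows   : list of (path, inode, link_count) for each file found
    errors : list of (path, message) for files that could not be read
    The walk never stops at a failing entry; it records it and goes on.
    """
    rows, errors = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # name is passed literally, never expanded
            except OSError as exc:
                errors.append((path, str(exc)))
                continue
            rows.append((path, st.st_ino, st.st_nlink))
    return rows, errors

if __name__ == "__main__":
    rows, errors = scan(".")
    for path, inode, links in rows:
        print(f"{inode:>20} {links:>3} {path}")
    for path, message in errors:
        print(f"ERROR {path}: {message}")
```

Because the filename is handed to the operating system as data, characters such as % or ^ need no special treatment in this sketch.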
 
What I'd like is for the bad name to be listed in errors, and PDIR to continue. This is especially relevant to find ALL filenames that are not TCC compatible.

It's not a TCC compatibility issue; you told TCC to expand a variable, and it did. If you don't want it to expand variables, tell it not to.

But PDIR uses special syntax to call the variable function IF THE PARAMETER IS JUST THE FULL NAME OF A MATCHING FILE. Why can't this syntax bypass the variable expansion routine? IIRC the documentation never mentioned that filenames are expanded as if they were variables...

No, it doesn't. The variable expansion function gets exactly the same argument as if you had passed it as a command line argument. It has no idea where the command line came from (and I'm not writing a custom variable expansion routine for every command!).

Some of these exist on systems over which TCMD users have no change authority. POSIX shells have no problem dealing with them. It is time for TCC to be able to do so directly, without the need for work-arounds.

That's nonsensical -- there are filenames that POSIX shells can't handle that TCC can. Should I change TCC so it will fail on those names too?

It is simply impossible for TCC to handle every imaginable marginal or outright invalid filename without disabling features (which is what SETDOS will do for you). Repeatedly asking for the impossible doesn't eventually make it possible. (I urge you to write your own command interpreter and find out for yourself.)

If you really want to use idiotic filenames with embedded %'s, you can always set CMDVariables=Yes. Then (like CMD) TCC will only try to expand the filenames when they have two or more %'s.
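The difference between the two rules can be sketched with another toy model in Python. It is an illustration only, under the assumption that the CMD-style rule treats text between a pair of % signs as a variable name and leaves a lone % alone; the function name expand_paired_percent and the sample names are hypothetical, and real CMD and TCC behavior has more cases than this.

```python
import re

def expand_paired_percent(text, env):
    """Toy model (assumption): expand only %name% pairs, replacing an
    undefined variable with nothing; a single % is left untouched."""
    return re.sub(r"%([^%]+)%",
                  lambda m: env.get(m.group(1), ""),
                  text)

# One percent sign: there is no closing %, so nothing is expanded.
print(expand_paired_percent("50%off.txt", {}))

# Two percent signs: "%b%" is treated as a variable reference,
# so the name is still altered.
print(expand_paired_percent("a%b%c.txt", {}))
```

Under this rule a name with a single % survives, while a name with two or more can still be changed, which is the caveat stated above.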
 
It is simply impossible for TCC to handle every imaginable marginal or outright invalid filename without disabling features (which is what SETDOS will do for you). Repeatedly asking for the impossible doesn't eventually make it possible. (I urge you to write your own command interpreter and find out for yourself.)

Hi Rex,
I appreciate your frustration on this issue, but I must say that I identify with Steve's point. I love virtually everything about TCC, and I use it constantly, but the one real gripe that I have is about this point. There are just too many "marginal" files on my system whose names are not within my control. I've got files with %, files with a backtick, files with a caret (^), and more. I understand that TCC is built around the assumption that such filenames should not be valid, but I'm surrounded by such files all over my system. (And I can't simply use Vince's FixNames - firstly, because some of those files are relied upon by other databases/programs and I can't rename them; and secondly, because some of those files are being created on the fly, and I'd have to have a hook in the file system fixing the filenames as they are created).
When LFNs became popular, these also posed a serious problem to command line interpreters, and this problem was effectively circumvented by use of the double quote symbol.
Would you be willing to brainstorm together with us to find a parallel convention for filename constants? That is - something akin to double-quotes which surround a filename, and which would indicate that *everything* within should be treated as literal, be it a caret, a percent sign, a backtick, or anything else. Perhaps it could be a user-definable Unicode character, so that every user could choose a far-out Unicode char (or an undefined Unicode char) that is never used in filenames on a given system.
 
Hi Rex,
I appreciate your frustration on this issue, but I must say that I identify with Steve's point. I love virtually everything about TCC, and I use it constantly, but the one real gripe that I have is about this point. There are just too many "marginal" files on my system whose names are not within my control. I've got files with %, files with a backtick, files with a caret (^), and more. I understand that TCC is built around the assumption that such filenames should not be valid, but I'm surrounded by such files all over my system.

But those files cannot be used with CMD either (with the caveat that CMD ignores single %'s and fails on multiple %'s).

Would you be willing to brainstorm together with us to find a parallel convention for filename constants? That is - something akin to double-quotes which surround a filename, and which would indicate that *everything* within should be treated as literal, be it a caret, a percent sign, a backtick, or anything else. Perhaps it could be a user-definable Unicode character, so that every user could choose a far-out Unicode char (or an undefined Unicode char) that is never used in filenames on a given system.

The problem has always been in finding something -- it's not possible within the ASCII character set, and a lot of people (maybe most?) are still using raster fonts so Unicode characters don't work. I'm not enthusiastic about having it user-definable -- it would make it impossible to share aliases and batch files.
 
But those files cannot be used with CMD either (with the caveat that CMD ignores single %'s and fails on multiple %'s).
True. Indeed, because of the limitations of CMD, I used to do all my complex file operations with Visual C++ code using Windows API functions. TCC has saved me hours upon hours! The precise reason that we all use TCC is because it can do what CMD does not let us do. And that's why we wish that it would be even more complete, to be able to handle all of our files without getting stuck on an odd filename that someone created.

The problem has always been in finding something -- it's not possible within the ASCII character set, and a lot of people (maybe most?) are still using raster fonts so Unicode characters don't work. I'm not enthusiastic about having it user-definable -- it would make it impossible to share aliases and batch files.

1] Regarding the "user-definable" point - well, even the current escape chars are user-definable as well, but in practice it doesn't matter because almost everybody uses the default. Similarly here - we can pick an undefined unicode char from a fairly marginal unicode range, and then we'll also have virtually everybody sticking with it, except for a rare exception.
2] Regarding the unicode/ASCII point. First of all, the world is moving towards unicode more and more, especially power users (and users of TCC tend to be power users!). Secondly, this wouldn't modify existing behavior, but rather it is an added capability. So if someone is still on ASCII, they could either continue as they do now, or they would have the option of moving to unicode to gain this very useful added capability. And for someone who really needs this capability, moving to unicode fonts would be a very small move to make in order to gain a huge advantage. Finally, you could allow the character to be set to an ASCII char as well, so that users who are on ASCII systems could have the option of setting it to a low-ASCII or high-ASCII char that they never use in filenames.
All in all, I think that it would be a very useful and appreciated feature. I know it would completely perfect TCC in my mind, because it would mean that I'd be able to run batches on all filenames, regardless of oddball chars. I gather that Steve Fabian would agree, as well as many other users. Would you prefer that I post the request on the uservoice forum?
 
True. Indeed, because of the limitations of CMD, I used to do all my complex file operations with Visual C++ code using Windows API functions. TCC has saved me hours upon hours! The precise reason that we all use TCC is because it can do what CMD does not let us do. And that's why we wish that it would be even more complete, to be able to handle all of our files without getting stuck on an odd filename that someone created.
I used to use all kinds of other tools, until I started to use 4DOS. After that I abandoned most of them!

1] Regarding the "user-definable" point - well, even the current escape chars are user-definable as well, but in practice it doesn't matter because almost everybody uses the default. Similarly here - we can pick an undefined unicode char from a fairly marginal unicode range, and then we'll also have virtually everybody sticking with it, except for a rare exception.
Disagree. Many of us who started with 4DOS still use its set of special characters (CommandSep, EscapeChar, ParameterChar directives; SETDOS options /C, /E, /P; resp.) both to maintain functionality of old batch programs and aliases, and because of habit; although I only use %+ and %= in new or revised aliases and batch programs.

2] Regarding the unicode/ASCII point. First of all, the world is moving towards unicode more and more, especially power users (and users of TCC tend to be power users!). Secondly, this wouldn't modify existing behavior, but rather it is an added capability. So if someone is still on ASCII, they could either continue as they do now, or they would have the option of moving to unicode to gain this very useful added capability. And for someone who really needs this capability, moving to unicode fonts would be a very small move to make in order to gain a huge advantage. Finally, you could allow the character to be set to an ASCII char as well, so that users who are on ASCII systems could have the option of setting it to a low-ASCII or high-ASCII char that they never use in filenames.
There are many old ASCII-only utilities which I continue to use for lack of similar capabilities in newer products (or in some cases affordability), including the 147 home-brew BRIEF macros. So ASCII is still viable. But you realized yourself that the solution to this issue need not be a strictly Unicode one.
All in all, I think that it would be a very useful and appreciated feature. I know it would completely perfect TCC in my mind, because it would mean that I'd be able to run batches on all filenames, regardless of oddball chars. I gather that Steve Fabian would agree, as well as many other users. Would you prefer that I post the request on the uservoice forum?
Yes, I agrrrrrreeeee. But the real issue is not with filenames only, it is also with strings that contain data, e.g., read from a file. The problem is that the LANGUAGE used by TCC, inherited from the COMMAND.COM of PC-DOS 3 (or earlier), does not distinguish between command and data. All the enhancements 4DOS, 4NT and TCC brought required using many more characters as syntactical delimiters. In the days of FAT file systems these delimiters were not legal in file names, and rare in data. This allowed continued use of the old system of totally untyped strings. Based on my own theoretical knowledge of programming languages, this lack of types will never allow unrestricted use of all characters. The POSIX-based shells make the distinction between command token and data.

If you go back in the history of the JPsoft support, you will find that others and I had requested a new syntactic form making a strict distinction between command and data many years ago. Of course, it would not really be CMD compatible; in some ways it would be closer to POSIX shells... But as long as it is a superset, or a startup option, possibly using a new file extension for its command procedures, CMD and backward compatibility can be retained.
 
