WAD Peculiar Behavior in Quoted Strings

Coming from the world of CP/M and DOS rather than Unix, I almost never used regular expressions; the extended wildcards in TCC generally did everything that I needed. Lately, however, after seeing how others use regular expressions, I've started to experiment with them.

And that has led to my noticing something that I find very peculiar.

The help text goes to great lengths to describe two ways to pass strings that have special characters.

Double Quotes

A string can be enclosed in double-quote characters. Most special characters are treated literally, but environment variables are expanded. Thus, if we have defined the variable char as follows

set char=A

then we get the following:

C:\>echo "<%char>"
"<A>"

The variable is expanded, but the "redirection" characters are treated literally. And the quotation characters are retained.

Back Quotes

A string can also be enclosed in back quotes. In that case, the entire string is to be taken literally, variables are not expanded, and the quotation characters are not passed along. Thus

C:\>echo `<%char>`
<%char>

The Peculiar Case of the Caret Character

The one character that defies the rules is the caret character. When it appears in a string enclosed in double-quote characters, it is treated literally. Its special behavior is lost.

C:\>echo "a^sb"
"a^sb"

Ordinarily, ^s would be converted to the space character, but in double-quoted strings it is not.

Counter-intuitively, inside back quotes, the caret character, unlike all other characters, maintains its special meaning. Thus

C:\>echo `a^sb`
a b

The string ^s is converted to a space character.

The Problem

The big problem with this is that the caret character is very important in regular expressions. It can indicate the beginning of the string or serve to negate another expression. I expected that I could pass that character in a regular expression by enclosing the entire expression in back quotes. But that fails to work! I would have expected the following command to find all files whose name starts with the letter A.

dir ::`^a`

However, the ^a becomes just a, and all files that contain the letter A anywhere in their name are displayed.
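To make the difference concrete, here is a minimal sketch in Python. It uses Python's re module as a stand-in, not TCC's own regular-expression engine, and the file names are made up for illustration. It only shows what an anchored pattern selects compared with the same pattern after the caret has been stripped.

```python
import re

# Hypothetical file names, for illustration only.
names = ["alpha.txt", "banana.doc", "Apple.ini", "notes.md"]

def matching(pattern, candidates):
    """Return the candidates in which the pattern is found, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [name for name in candidates if rx.search(name)]

# The pattern as intended: names that start with the letter A.
anchored = matching("^a", names)

# The pattern after the caret has been removed: names containing A anywhere.
unanchored = matching("a", names)

print(anchored)
print(unanchored)
```

Losing the caret turns "starts with A" into "contains A", which is exactly the broader listing described above.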

Using double-quotes works.

dir ::"^a"

Fortunately, the regular-expression interpreter does not mind the double-quote characters that are not removed from the string.

A Bug?

This certainly looks like a bug to me. Maybe Rex will provide a reason for this behavior, but why do we need to retain the special behavior of the caret character in a back-quoted string? We don't need it. We can pass special characters without using a caret.

C:\>echo `a b` & echo `a^sb`
a b
a b

But we can't pass a caret without doing something extra!

C:\>echo `a^b` & echo `a^^b`
a
a^b
 
I'm not sure I could spell out all the rules that govern ^ (escape), ` (strong quote), and " (double quote). Maybe it's something like this, with parsing proceeding left to right.

Anything inside strong quotes is left alone, except that (1) ^^ turns into ^, (2) ^` gives a literal `, (3) documented escapes (^q, et al.) are substituted, and (4) otherwise ^ disappears. The strong quotes themselves are removed.

Except as above, inside double quotes, ^ and ` are nothing special, %% turns into %, and variables are expanded; the double quotes remain.

It's a bit mind-boggling and I could be way off; better explanations are welcome.
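The guessed rules can be written down as a small Python model, which at least makes it easy to check them against the examples in this thread. This is only a sketch of the guess, not TCC's actual parser. The function names, the partial escape table (^s is from the thread; ^q as a double quote and ^b as a backspace are assumptions), the case-sensitive escape letters, and the treatment of undefined variables as empty strings are all assumptions made for illustration.

```python
import re

# Partial escape table. ^s (space) appears in the thread; the meanings given
# here for ^q (double quote) and ^b (backspace) are assumptions.
ESCAPES = {"s": " ", "q": '"', "b": "\b"}

def strong_quoted(body):
    """Apply the guessed back-quote rules to the text between the back quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "^" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt in ESCAPES:
                out.append(ESCAPES[nxt])  # rule 3: documented escape
            else:
                # Rules 1, 2 and 4: ^^ gives ^, ^` gives `, and for any other
                # character the caret disappears while the character is kept.
                out.append(nxt)
            i += 2
        elif ch == "^":
            i += 1  # a trailing caret is simply dropped (assumption)
        else:
            out.append(ch)
            i += 1
    return "".join(out)  # the back quotes themselves are not passed along

def double_quoted(body, env):
    """Apply the guessed double-quote rules; env maps upper-case names to values."""
    def expand(match):
        if match.group(0) == "%%":
            return "%"
        # Undefined variables expand to nothing here (assumption).
        return env.get(match.group(1).upper(), "")
    # ^ and ` are left alone; only %% and %name (optionally %name%) are touched.
    return '"' + re.sub(r"%%|%(\w+)%?", expand, body) + '"'

print(strong_quoted("a^sb"))
print(strong_quoted("a^^b"))
print(double_quoted("<%char>", {"CHAR": "A"}))
print(double_quoted("a^sb", {}))
```

Run against the examples earlier in the thread, the model reproduces the observed results: `a^sb` loses its caret escape only inside double quotes, and `^a` inside back quotes collapses to a plain a.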

However it was done, I'm confident we would find challenges.
 

WAD.

As with most incomprehensible things the parser does, this is for compatibility with CMD.

Try it with CMD; you'll get the same result.
 
I hate CMD! That's why I use TCC. In fact, I don't think that I have ever used CMD in my life! I think I've had 4DOS from the time I switched from CP/M (Z-System), but maybe I had to use COMMAND.COM for a little while. Too long ago to remember.

I do understand why you need to maintain compatibility (even to the level of reproducing CMD's bugs). However, in an ideal world, there would be a configuration switch, something like "Duplicate CMD bugs", that could be turned off so that TCC would just do things right and ignore CMD (or simply an "Ignore CMD" switch) :smile:

Seriously, it would be good if these things were documented in the help. (Of course, maybe they are but I didn't see them or remember them.)
 
