Insert text at start/end of line

Charles Dye

Super Moderator
Staff member
May 20, 2008
4,365
83
Albuquerque, NM
prospero.unm.edu
I spent way too much time trying to do this with regular expressions....

Insert text at start of line:
Code:
tpipe /input=filename.txt /insert=0,1,"rem "

Insert text at end of line:
Code:
tpipe /input=filename.txt /insert=0,0," //"

Both at once:
Code:
tpipe /input=filename.txt /insert=0,1,"[[ " /insert=0,0," ]]"
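For comparison outside TPIPE, the same three edits can be sketched in Python without any regular expressions. This is only an illustration: `wrap_lines` is a made-up helper name, the sample text is invented, and the sketch assumes ordinary `\n` or `\r\n` line endings.

```python
def wrap_lines(text, prefix="", suffix=""):
    """Add prefix and suffix to every line, keeping each line's own ending."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")      # the visible part of the line
        ending = line[len(body):]       # "\n", "\r\n", or "" on the last line
        out.append(prefix + body + suffix + ending)
    return "".join(out)

sample = "first\nsecond\n"              # hypothetical input
print(wrap_lines(sample, prefix="rem "), end="")
print(wrap_lines(sample, suffix=" //"), end="")
print(wrap_lines(sample, prefix="[[ ", suffix=" ]]"), end="")
```

Splitting the ending off first is what keeps the suffix from landing after the line break.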
 
May 20, 2008
11,046
90
Syracuse, NY, USA
This too.

Code:
vefatica@jj:~$ echo foo | sed -e 's/.*/prefix \0 postfix/g'
prefix foo postfix
vefatica@jj:~$ echo foo | sed -e 's/.*/[[ \0 ]]/g'
[[ foo ]]

But it gets all fouled up when I try to use it from Windows. Below, the second result is expected; the first seems wacky.

Code:
v:\> echo foo | (wsl sed -e 's/.*/[[ \0 ]]/g')
 ]]foo

v:\> echo foo | (wsl sed -e 's/.*/[[ \\0 ]]/g')
[[ \0 ]]
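One plausible explanation for the first result (an assumption, not something confirmed in this thread): ECHO on Windows ends the line with CR LF, sed inside WSL splits only on LF, so the CR stays inside the matched text, and the console then draws " ]]" over the start of the line. The Python sketch below imitates that; `render_with_cr` is a hypothetical helper that mimics how a console handles a carriage return.

```python
import re

def render_with_cr(text):
    """Mimic a console: CR moves the cursor to column 0 and later
    characters overwrite whatever is already on the line."""
    cells = []
    col = 0
    for ch in text:
        if ch == "\r":
            col = 0
        else:
            if col < len(cells):
                cells[col] = ch
            else:
                cells.append(ch)
            col += 1
    return "".join(cells)

# Assumption: the pipe delivered "foo\r\n" and sed split the input on "\n".
line_seen_by_sed = "foo\r"
wrapped = re.sub(r"^(.*)$", r"[[ \1 ]]", line_seen_by_sed)

print(repr(wrapped))            # '[[ foo\r ]]'
print(render_with_cr(wrapped))  # " ]]foo", the same shape as the wacky output
```

If that is what is happening, stripping the CR before sed sees the line should give the expected result.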
 
May 20, 2008
11,046
90
Syracuse, NY, USA
The TPIPE/regex one wasn't too hard (only about 5 minutes).

Code:
v:\> echo foo | tpipe /replace=4,0,0,0,0,0,0,0,0,"^(.*)$","pre $1 post"
pre foo post

v:\> echo foo | tpipe /replace=4,0,0,0,0,0,0,0,0,"^(.*)$","[[ $1 ]]"
[[ foo ]]
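The same capture-and-reuse idea carries over to other regex engines. A rough Python equivalent is sketched below; Python writes the backreference as `\1` where the TPIPE command uses `$1`, and `wrap_each_line` is a made-up name.

```python
import re

def wrap_each_line(text, replacement):
    """Apply the ^(.*)$ substitution to each line separately."""
    return "\n".join(
        re.sub(r"^(.*)$", replacement, line, count=1)
        for line in text.splitlines()
    )

print(wrap_each_line("foo", r"pre \1 post"))   # pre foo post
print(wrap_each_line("foo", r"[[ \1 ]]"))      # [[ foo ]]
```

Working line by line sidesteps any question of whether `^` and `$` match at embedded line breaks.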
 

Charles Dye

The TPIPE/regex one wasn't too hard (only about 5 minutes).

You win.

Okay, here's a puzzle for you. I have a number of HTML files I'd like to deHTMLize. TPIPE /SIMPLE=16 /SIMPLE=85 is a good start, but still leaves a whole lot of gribble: header, style sheet, scripts....

So what I'd like is a way to include only the text between <BODY> and </BODY>, and omit everything outside those two tags. /XML looks like it should be useful, but I can't get it to do anything. Can you see any way to do this with TPIPE?
 
May 20, 2008
11,046
90
Syracuse, NY, USA
So what I'd like is a way to include only the text between <BODY> and </BODY>, and omit everything outside those two tags. /XML looks like it should be useful, but I can't get it to do anything. Can you see any way to do this with TPIPE?

Geez! I used to have a plugin called FROMTO that would do just that. It's such a simple task that I'm surprised TPIPE doesn't have it. I wonder if there's something UNIXy that'll do it.
 
May 20, 2008
11,046
90
Syracuse, NY, USA
Charles, did you see this (restrict to between tags)? I can't figure out how to use it.

TPIPE - Text filtering, search, and substitution
/xml=Type,IncludeText,IncludeQuotes,MatchCase,BufferSize,Tag,Attribute,EndTag

Adds an HTML / XML filter. The arguments are:

Type - the operation to perform:

0 restrict to an element

1 restrict to an attribute

2 restrict to between tags

IncludeText - whether to include the find string in the restriction result (default false)

IncludeQuotes - whether to include surrounding quotes in the attribute result or not (default false)

MatchCase - match case exactly or not (default false)

BufferSize - the maximum expected size of the match (default 32768)

Tag - the element or start tag to find

Attribute - the attribute to find

EndTag - the endTag to find
 

Charles Dye

Charles, did you see this (restrict to between tags)? I can't figure out how to use it.

Yes, I spent a while messing with that one. I think that lets you set up subfilters that would affect everything between, e.g., <BODY> and </BODY>. But they don't affect anything outside the selection. So, not useful for my purposes.
 
Apr 18, 2014
269
9
I'm not sure if this will do what you need, Charles, but a tpipe replace filter that discards non-matching text might be the answer. I've only tried this with a trivial test file as below, so apologies if it fails horribly for your use case.

This is my test file:
Code:
d:\batch\test>type foobar.txt
This is before the body tags
<Body> This is <sometag> the body text <anothertag> interspersed with <onemoretag> tags
This is more body text
</Body>
This is after the body tags

And my understanding of what you want to do is end up with just the text between the <Body> and </Body> tags, and also strip out any other tags between them.

This tpipe command works on the above sample file:
Code:
d:\batch\test>tpipe /input=foobar.txt /eol=2,0,0,0,0 /replace=4,0,0,0,0,1,0,0,0,"(<Body>.*</Body>)","$1" /simple=16
This is  the body text  interspersed with  tags
This is more body text

It uses a regex backreference to replace the text between the body tags with itself (so it's unchanged), and the replace filter is set to discard non-matching text, so you're just left with the text you want, and then the /simple=16 gets rid of the tags themselves.
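As a cross-check of the idea, here is a minimal Python sketch of the same two steps (keep only what lies between the body tags, then drop the remaining tags). It is not a description of how tpipe works internally; `body_text` is a hypothetical helper, and the sample string is the test file shown above.

```python
import re

sample = (
    "This is before the body tags\n"
    "<Body> This is <sometag> the body text <anothertag> interspersed with <onemoretag> tags\n"
    "This is more body text\n"
    "</Body>\n"
    "This is after the body tags\n"
)

def body_text(html):
    """Return the text between <Body> and </Body> with inner tags removed."""
    # re.DOTALL lets ".*" run across line breaks.
    match = re.search(r"<Body>(.*)</Body>", html, flags=re.DOTALL)
    if match is None:
        return ""                     # no body section found
    return re.sub(r"<[^>]*>", "", match.group(1))

print(body_text(sample), end="")
```

Everything outside the match is simply never returned, which plays the role of the "discard non-matching text" setting.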

Hopefully that might give you something that will help...

Edited to add: I forgot to say I did a SETDOS /X-6 before that tpipe command to stop the tags being mistaken for redirection.
 
May 20, 2008
11,046
90
Syracuse, NY, USA
This is the first time I've seen a subfilter work. Below (1) the filter seems to affect only what's between the tags and (2) it seems to determine what is to be removed (as opposed to what's to be kept).

Code:
v:\> type tag.html
before
<body>
inside
</body>
after

v:\> tpipe /input=tag.html /xml=2,0,0,0,32768,"body",foo,"/body" /startsubfilters /grep=4,0,0,0,0,0,0,0,"junk|junk|junk" /endsubfilters
before
<body>
inside
</body>
after

v:\> tpipe /input=tag.html /xml=2,0,0,0,32768,"body",foo,"/body" /startsubfilters /grep=4,0,0,0,0,0,0,0,"before|inside|after" /endsubfilters
before
<body>
</body>
after
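
The behavior seen above (a filter that acts only between the tags and removes matching lines there) can be modelled with a short Python sketch. This models the observed output only; it makes no claim about tpipe's implementation, and `drop_lines_between` is a made-up name.

```python
import re

def drop_lines_between(text, start_tag, end_tag, pattern):
    """Remove lines matching pattern, but only between start_tag and end_tag."""
    def filter_region(m):
        kept = [ln for ln in m.group(2).splitlines(keepends=True)
                if not re.search(pattern, ln)]
        return m.group(1) + "".join(kept) + m.group(3)

    region = "(" + re.escape(start_tag) + ")(.*?)(" + re.escape(end_tag) + ")"
    return re.sub(region, filter_region, text, flags=re.DOTALL)

page = "before\n<body>\ninside\n</body>\nafter\n"
print(drop_lines_between(page, "<body>", "</body>", "junk"), end="")
print(drop_lines_between(page, "<body>", "</body>", "before|inside|after"), end="")
```

In the second call "before" and "after" survive because they lie outside the selected region, just as in the tpipe output.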
 
May 20, 2008
11,046
90
Syracuse, NY, USA
So, you're essentially treating the whole file as one very long line? Clever! Thank you.
Hmmm! It doesn't work very well on TCC's dir.htm ... no output at all.

Code:
v:\help26> grep -i body dir.htm
html, body {
html, body { overflow: auto; }
<body>
</body>

v:\help26> tpipe /input=dir.htm /eol=2,0,0,0,0 /replace=4,0,0,0,0,1,0,0,0,"(<body>.*</body>)","$1" /simple=16

v:\help26>
 

Charles Dye

It's not a big file. But viewed as a single line.... That's a long line!

(That is what that /EOL is doing, right? Am I understanding correctly? That's not how I would read the help file. But it seems to work.)
 
Apr 18, 2014
269
9
(That is what that /EOL is doing, right? Am I understanding correctly?)

I don't think so, but then I might be wrong - tpipe remains a bit of a black art to me, I'm sometimes surprised with the results I get!

What I meant the /EOL to do is convert the CR/LF line endings into LF; you can see it's not just one long line if you run the tpipe command on my sample file with just the /EOL filter. The reason for doing that is that my understanding (from experimentation) of the Perl pattern matching in the /replace filter is that the ".*" pattern will match line endings, provided they are just LF. So by changing the line endings, the pattern "<body>.*</body>" matches everything between the body tags, even if spread over multiple lines.

It's not a generic, bombproof solution unfortunately; Vince has discovered that large files need a bigger buffer, and the regex needs some work to match cases where the body tag is, for example:
Code:
<body bgcolor="#FFFFFF">
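
A pattern that tolerates attributes and mixed case could look like the Python sketch below. The regex syntax used is common to Perl-style engines, but whether tpipe accepts it unchanged is an assumption that has not been tested here; `BODY_RE` and `inner_body` are hypothetical names.

```python
import re

# "<body" followed by optional attributes, then the shortest run up to "</body>".
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)

def inner_body(html):
    """Return the text inside the body element, or None if there is none."""
    m = BODY_RE.search(html)
    return m.group(1) if m else None

print(repr(inner_body('<html><BODY bgcolor="#FFFFFF">\nhello\n</body></html>')))
print(inner_body("html, body { overflow: auto; }"))   # None: CSS text, not a tag
```

Requiring the leading "<" is what keeps style-sheet lines that merely mention "body" from matching.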
 

Charles Dye

What I meant the /EOL to do is convert the CR/LF line endings into LF; you can see it's not just one long line if you run the tpipe command on my sample file with just the /EOL filter. The reason for doing that is that my understanding (from experimentation) of the Perl pattern matching in the /replace filter is that the ".*" pattern will match line endings, provided they are just LF. So by changing the line endings, the pattern "<body>.*</body>" matches everything between the body tags, even if spread over multiple lines.

Thank you.
 
May 20, 2008
11,046
90
Syracuse, NY, USA
It's a little hard to figure out what's happening. If I put the bare LF in a file myself, then RogerB's strategy doesn't work (/replace doesn't see across the LF).

Code:
v:\> echos 1abc^ndef2 > 12.txt

v:\> type /x 12.txt
0000 0000 31 61 62 63 0a 64 65 66  32                       1abc.def2

v:\> tpipe /input=12.txt /replace=4,0,0,0,0,1,0,0,0,"(1.*2)","$1"

v:\>

If I've got that test right, what's going on?
 
Apr 18, 2014
269
9
If I've got that test right, what's going on?

It's me that's wrong in my use of the /EOL filter, Charles is correct and it's stripping out all of the line endings and making one long line of it.

Code:
d:\>type /x 12.txt
0000 0000 31 61 62 63 0a 64 65 66  32                       1abc.def2

d:\>tpipe /input=12.txt /eol=2,0,0,0,0 /replace=4,0,0,0,0,1,0,0,0,"(1.*2)","$1"
1abcdef2

d:\>tpipe /input=12.txt /eol=2,0,0,0,0 | type /x
0000 0000 31 61 62 63 64 65 66 32                           1abcdef2

Oh well, back to the help file to see if I can understand the EOL filter! I'm coming to the conclusion that once you get something working with tpipe it's best not to think too hard about it :smile:
 
May 20, 2008
11,046
90
Syracuse, NY, USA
It's me that's wrong in my use of the /EOL filter, Charles is correct and it's stripping out all of the line endings and making one long line of it.

I think it's OK. TPIPE isn't stripping them, it's leaving LF and the regex in /replace is seeing that as a character (EOL being CRLF).

Code:
v:\> echos 1^r^n2 > 12.txt

v:\> type /x 12.txt
0000 0000 31 0d 0a 32                                       1..2

v:\> tpipe /input=12.txt /eol=2,0,0,0,0 /replace=4,0,0,0,0,1,0,0,0,"1.2","xxx"
xxx

(A little odd) /grep doesn't see the bare LF as a character.

Code:
v:\> type /x 12.txt
0000 0000 31 0d 0a 32                                       1..2

v:\> tpipe /input=12.txt /eol=2,0,0,0,0 /grep=3,0,0,0,0,0,0,0,"1.2"

v:\>
 
Apr 18, 2014
269
9
I think it's OK. TPIPE isn't stripping them, it's leaving LF and the regex in /replace is seeing that as a character (EOL being CRLF).

Hmmm. I was just testing something similar and came to the same conclusion, the /eol=2,0,0,0,0 definitely changes a CR/LF pair to just an LF, and I see the /replace pattern matching the embedded LF too.

Which leaves the question of why your example doesn't work:
Code:
v:\> tpipe /input=12.txt /replace=4,0,0,0,0,1,0,0,0,"(1.*2)","$1"

I can't see what's happening there at all. After all, it works with the foobar.txt file in my earlier post.
 
May 20, 2008
11,046
90
Syracuse, NY, USA
Which leaves the question of why your example doesn't work:
Code:
v:\> tpipe /input=12.txt /replace=4,0,0,0,0,1,0,0,0,"(1.*2)","$1"

I can't see what's happening there at all. After all, it works with the foobar.txt file in my earlier post.

Do you mean when the bare LF is already in the file? See my post in "Support". Apparently TPIPE changes that to a CRLF upon input!
 
Apr 18, 2014
269
9
Do you mean when the bare LF is already in the file? See my post in "Support". Apparently TPIPE changes that to a CRLF upon input!

Ah, I see! That certainly looks like a bug in tpipe. You don't even need a filter; just let tpipe read the file and you can see it:
Code:
d:\>type /x 12.txt
0000 0000 31 61 62 63 0a 64 65 66  32                       1abc.def2

d:\>tpipe /input=12.txt | type /x
0000 0000 31 61 62 63 0d 0a 64 65  66 32                    1abc..def2
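
To make the byte-level change concrete, this Python sketch reproduces the two hex dumps. It only imitates the observed effect (a bare LF becoming CR LF); `lf_to_crlf` and `hex_bytes` are made-up helpers, not anything tpipe exposes.

```python
import re

def hex_bytes(data):
    """Format bytes as space-separated two-digit hex values."""
    return " ".join(f"{b:02x}" for b in data)

def lf_to_crlf(data):
    """Turn each bare LF into CR LF; existing CR LF pairs are left alone."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

original = b"1abc\ndef2"
print(hex_bytes(original))              # 31 61 62 63 0a 64 65 66 32
print(hex_bytes(lf_to_crlf(original)))  # 31 61 62 63 0d 0a 64 65 66 32
```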

We've got a bit sidetracked from the question Charles originally posed. I hope something in this thread has helped solve his issue!
 
Apr 18, 2014
269
9
(A little odd) /grep doesn't see the bare LF as a character.

The help describes /grep as a "line based filter", so perhaps it is only intended to work with lines from a file, and hence newlines will always terminate the expression that's being matched.

According to the help you can set newline matching behaviour for search/replace filters with /perl=. The "DotMatchesNewLines" option says "Allow the '.' operator to match all characters, including new lines. Default is true".
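
The difference between a line-based filter and a whole-buffer search can be shown with a small Python sketch. It illustrates the general idea only and assumes nothing about how /grep is implemented; both function names are hypothetical.

```python
import re

text = "1\n2"   # the test file after CR LF has been reduced to a bare LF

def line_based_match(pattern, data):
    """Test each line on its own, the way a line-oriented filter would."""
    return any(re.search(pattern, line) for line in data.splitlines())

def whole_text_match(pattern, data):
    """Test the whole buffer at once, letting '.' match a newline."""
    return re.search(pattern, data, flags=re.DOTALL) is not None

print(line_based_match("1.2", text))   # False
print(whole_text_match("1.2", text))   # True
```

A line-oriented filter never sees "1" and "2" in the same string, so no setting for "." can make the pattern match.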
 
May 20, 2008
11,046
90
Syracuse, NY, USA
As for DotMatchesNewLines, the default doesn't seem to be TRUE.

Code:
v:\> type tag.html
before
<body>
inside
</body>
after

v:\> tpipe /input=tag.html /replace=4,0,0,0,0,1,0,0,0,"(<body>.*</body>)","$1"

v:\> tpipe /input=tag.html /replace=4,0,0,0,0,1,0,0,0,"(<body>.*</body>)","$1" /perl=,,,1
<body>
inside
</body>
v:\>
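
Python's `re.DOTALL` flag plays a comparable role to the DotMatchesNewLines option, so the two runs above can be mirrored as a sketch (same sample text, Python's regex engine rather than tpipe's):

```python
import re

page = "before\n<body>\ninside\n</body>\nafter\n"
pattern = r"<body>.*</body>"

without_flag = re.search(pattern, page)                  # "." stops at "\n"
with_flag = re.search(pattern, page, flags=re.DOTALL)    # "." also matches "\n"

print(without_flag)          # None
print(with_flag.group(0))
```

With the flag off the match fails outright, which lines up with the empty output from the first tpipe command.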

"/perl=,,,1" seems to have no effect on a /grep filter.
 
Apr 18, 2014
269
9
if only I understood it better....

I know that feeling. Most of my efforts to understand tpipe leave me feeling like I’m in a dark room looking for a black cat that isn’t there. However, it is an insanely powerful command if you can manage to wrangle a set of filters into doing what you want.
 
I played with this a little without success (probably should have used SETDOS first). I would think the DotMatchesNewLines option of /perl would be helpful. Regular expressions are complicated enough, but if TCC messes with the command line, you have no idea whether it is TCC or tpipe that is messing things up. A debug option for tpipe that reported what it thinks you told it to do, i.e., all the parameters that TCC passes to it, would help.
 