Problem with the "List" command...

I've got the following batch file:
Code:
@Echo Off
SetLocal
Set FileName="The Name of A File Containing a WebPage.html"
Set FileSize=%@FileSize[%FileName]
Set FH=%@FileOpen[%FileName,r,t]
Iff %FH == -1 Then
  @EchoErr Unable to open webpage %FileName
  Quit 8
EndIff
SetDOS /X-45678
Set Content=%@FileRead[%FH,%FileSize]
SetDOS /X+45678
@Echo >NUL: %@FileClose[%FH]
SetDOS /X-45678
@Echo *****************************************************************
@Echo %Content
@Echo *****************************************************************
EndLocal
Quit 0
When I run it like this:
Code:
ReadHTML
I get this:
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml"  xmlns:og="http://opengraphprotocol.org/schema/">
<head>
<title>Iron &amp; Wine Related Musicians </title>
 
 
... Lots of html ...
 
<div style='display: inline-block; vertical-align: middle; height: 20px; width: 90px; padding: 0px; margin-left: 10px;'><g:plusone size="medium" href="http://www.starpulse.com/"></g:plusone></div>
    <div id='footer_bar_close' title='close' onclick='document.getElementById("footer_bar").style.display="none"; document.cookie="footer_bar_disabled=true;domain=.starpulse.com";'>x</div>
</div>
</div>
</body>
</html>
However, when I run it like this:
Code:
ReadHTML |& List
I get this:
Code:
In other words, nothing at all.

And when I comment out the "SetDOS /X-45678" and use "Echo %@SafeExp[%Content]", TCC crashes (32-bit TCC on a 64-bit machine).

What is happening here and how do I fix it?

Oh, as an aside, using a URL works fine for a copy command, for instance, but not at all anywhere in the batch-file language (%@FileSize, %@FileOpen, etc.) This is just the way things are?

- Dan
 
However, when I run it like this:
Code:
ReadHTML |& List
I get this:
Code:
In other words, nothing at all.

Piping from your batch file to LIST works for me. You don't have an alias for LIST, by any chance?

And when I comment out the "SetDOS /X-45678" and use "Echo %@SafeExp[%Content]", TCC crashes (32-bit TCC on a 64-bit machine).

How big is your file? I would expect Bad Things to happen if it's more than about 16,000 characters (= about 32 kilobytes if it's UTF-16).

Oh, as an aside, using a URL works fine for a copy command, for instance, but not at all anywhere in the batch-file language (%@FileSize, %@FileOpen, etc.) This is just the way things are?

Yes; it only works where Rex adds code to make it work, and he's almost certainly documented all of them.
 
Thank you for your response, Charles, but it's completely off base (although how would you know?).

A. The list command works fine in other contexts; it fails only with this particular batch file.

B. I would never, under any conceivable circumstance whatsoever, define an alias for the list command, because it's probably the command I use most often in TCC. In fact, my aliases that begin with "L" are:
Code:
Le*adPath=E:\DOS\Startx LeadPath
LF*unction=Function %$ | List
ListD*rives=E:\DOS\ListDrives.btm
So that isn't it.

C. The file is 23,328 bytes in UTF8, which would be more than what you suggest above, but the raw file lists fine with just the list command alone and I've never encountered this before. (Where is that documented?)

So the only possibility is C but, frankly, that seems unlikely to me (although, of course, I could be wrong). But, again, where is that documented?

As an odd note TCC crashes consistently after displaying the second line of asterisks when run in a 32-bit TCC session with no redirection or piping of any kind (i.e., just the naked command alone). Not terrible because I seldom use 32-bit mode, but odd nonetheless.

- Dan
 
C. The file is 23,328 bytes in UTF8, which would be more than what you suggest above, but the raw file lists fine with just the list command alone and I've never encountered this before. (Where is that documented?)

I was addressing the possible crash in SafeChars; that plugin uses a lot of 16K-character buffers. Dumping a 22K-char file into it might very well gork the plugin. (I don't know why you would want to use @SAFEEXP anyway -- the contents of an HTML file are unlikely to be a valid TCC variable name or function!)
 
Charles,

Now I understand what you were talking about! Sorry! Unfortunately, it's also completely irrelevant. That is because the "@SafeExp" was put there after the fact (and not removed when I made my posting because by that time I considered it to be irrelevant and therefore wasn't thinking about it any more) in an attempt to fix the problem and it didn't. The problem existed before the "@SafeExp" was put there as well as after (and the problem still exists).

- Dan
 
Dan, your batch file piped to LIST worked OK for me using a 21K Unicode (BuildLog) HTM file. I know nothing of UTF8 so I tried this:

Code:
v:\> Set FileName=P:\synergy-1.3.1\gen\debug\buildlog.htm

v:\> echo %@filesize[%filename]
21262

v:\> echo %@utf8encode[%filename, utf8.htm]
0

v:\> dir /k /m utf8.htm
2012-08-28  20:17             134  utf8.htm

v:\> type utf8.htm
├┐├╛<

v:\>

Since the output file was only 134 bytes, I doubt it was a correct conversion of the original. And using TYPE on it resulted in only a handful of characters being printed. So I'll start a thread about @UTF8ENCODE[].
 
There would seem to be something wrong with @FILEREAD[handle,size] when the file is UTF8 (or perhaps more particularly contains CRCRLFs). It stops after one line even though an ample size parameter was given.
Code:
v:\> type leontiev.utf8
Leontief won the Nobel Committee's Nobel Memorial Prize in Economic
Sciences in 1973, and three of his doctoral students have also been
awarded the prize (Paul Samuelson 1970, Robert Solow 1987,
Vernon L. Smith 2002).
 
Around 1949, Leontief used the primitive computer systems available
at the time at Harvard to model data provided by the U.S. Bureau of
Labor Statistics to divide the U.S. economy into 500 sectors.
Leontief modeled each sector with a linear equation based on the
data and used the computer, the Harvard Mark II, to solve the system,
one of the first significant uses of computers for mathematical modeling.
 
Input-output was novel and inspired large-scale empirical work; in 2010
its iterative method was recognized as an early intellectual precursor
to Google's PageRank.
 
 
v:\> echo %@filesize[leontiev.utf8]
841
 
v:\> set fh=%@fileopen[leontiev.utf8,r,t]
 
v:\> echo %@fileread[%fh,841]
Leontief won the Nobel Committee's Nobel Memorial Prize in Economic
 
v:\>
 
P.S. Leontiev.utf8 was created from leontiev.txt (ascii) with @UTF8ENCODE. The original did not contain any CRCRLFs.

And it actually contains, as EOLs, 0x0D0D0A00
 
You can use Notepad to save in different encodings. It's under the "File | Save As..." menu. A simple text file just gets the BOM added to the start of the file. I had a text file that had a 0x92 in it (not sure how it got there) that got UTF-8 encoded as 0xE2 0x80 0x99. So my file grew by 5 bytes (3 byte BOM + 2 add'l. encoding bytes).
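The byte counts above can be checked with a small Python sketch (Python rather than TCC, purely for illustration). It assumes the original file was in Windows-1252, where 0x92 is the right single quotation mark; the sample string `It\x92s` is made up for the demonstration.

```python
# Windows-1252 byte 0x92 is the right single quotation mark (U+2019).
# The sample text below is hypothetical; only the 0x92 byte comes from the post.
raw = b"It\x92s"                            # 4 bytes as stored in an ANSI (cp1252) file
text = raw.decode("cp1252")

utf8_no_bom = text.encode("utf-8")          # 0x92 becomes the 3 bytes E2 80 99
utf8_with_bom = text.encode("utf-8-sig")    # same bytes, preceded by the BOM EF BB BF

print(utf8_no_bom.hex(" "))                 # hex dump of the re-encoded text
print(len(utf8_with_bom) - len(raw))        # growth in bytes: BOM plus extra encoding bytes
```

The growth comes out as 3 bytes of BOM plus 2 extra bytes for the one re-encoded character, matching the 5 bytes reported.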

I used a 21349 byte UTF-8 encoded file. And I was able to pipe without issue. To be fair it was plain text and not HTML. And I had setdos /x0.

I tried with a 5495 byte UTF-8 encoded HTML file and once I did a setdos /x-6 I was able to pipe it to V so I could see it.

-Scott
 
Guys, I don't know if this has anything to do with anything, but I misspoke. The file isn't UTF8, it's plain old ASCII (8 bits confused me for a moment). And, as I said above, it has nothing at all to do with "List"; it's purely a piping issue related to that batch file with that input (and possibly my system, but it is independent of 32- vs. 64-bit). Sadly, the only (not completely reasonable!) alternative I can think of at the moment is a C++ program, and I've really been trying to get out of the C++ "habit" lately. I honestly don't know if what I want to do is worth doing in C++, because it'll be a fairly substantial program (it has to read web pages off the web, which seems to be non-trivial in C++ after looking into it, whereas it's relatively trivial in TCC because a URL can be used as the source in a copy command). If I don't get an answer in a day or two, I'll either do it in C++ (sigh!) or give up on it altogether because I can live without it (sigh!!).

- Dan
 
I don't understand what you're trying to do: slurp the entire file into an environment variable, and then dump it with ECHO? Why? What's wrong with ye olde TYPE command?

At any rate, I think that if you want anyone else to be able to replicate the issue, I suggest you zip up the HTML file in question and make it available somehow, say as an attachment in this forum.
 
And guys, while I thought I had already done this, since I don't see it anywhere in this thread (?) I'll do it again. Attached to this posting is a zip file containing the actual HTML that I am trying to parse.

Also, it dawned on me that the primary thing I need to do is just write a C++ program to parse the HTML file and extract the data I'm looking for. Still some work, but not nearly as much as doing the whole thing in C++.

- Dan

P. S. This website is not allowing me to upload the .zip file ("The following error occurred: A server error occurred. Please try again later.") I'll try again much later tonight or early tomorrow morning.
 
And Charles, the reason you don't understand what the batch file is trying to do is that the one I'm working with at the moment (the one that just tried to verify that the entire HTML file was contained in an environment variable) isn't doing anything even close to what the final batch file is intended to do: parse the HTML of possibly many different web pages to extract some specific data, and then correlate that data between the different web pages. The batch file we've been talking about is just an early step along the way: verifying that the entire HTML file is actually contained in the environment variable, from where it can be parsed (using @Index, primarily). However, being able to dump data along the way is an essential step (essential because I'm such a screw-up at this stage in my life) in verifying that the batch-file-so-far is doing what I intend it to do at any given point.

- Dan
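For readers following along, the index-based extraction described above can be sketched in Python (not TCC). Here `str.find` stands in for @Index, and that pairing is an assumption made for illustration; `extract_between` is a hypothetical helper, and the sample page is just the title line from the output shown earlier in the thread.

```python
import html

def extract_between(page, start_tag, end_tag):
    """Return the text between the first start_tag and the next end_tag, or None."""
    start = page.find(start_tag)          # -1 when the marker is absent
    if start == -1:
        return None
    start += len(start_tag)               # skip past the opening marker
    end = page.find(end_tag, start)       # search only after the opening marker
    if end == -1:
        return None
    return page[start:end]

# Fragment taken from the page output quoted earlier in the thread.
page = "<head>\n<title>Iron &amp; Wine Related Musicians </title>\n</head>"

title = extract_between(page, "<title>", "</title>")
print(html.unescape(title).strip())       # decode &amp; and trim whitespace
```

This is only a sketch of the approach, not the C++ program or the batch file discussed here.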
 
Lotsa luck! What happens when you run into a file that's bigger than 32,767 bytes? Such a file won't fit in an environment variable.
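As a rough illustration of that limit, here is a minimal Python sketch (not TCC) that compares a few sizes against 32,767. It assumes plain ASCII, so one byte equals one character; `fits_in_env_var` is a hypothetical helper, and the sizes other than 23,328 are made-up test values.

```python
ENV_VAR_LIMIT = 32_767  # the figure quoted above; assumes one byte per character (ASCII)

def fits_in_env_var(size_in_chars, limit=ENV_VAR_LIMIT):
    """Return True when content of this size could fit in a single variable."""
    return size_in_chars <= limit

# 23,328 is the file size reported earlier in the thread; the others are hypothetical.
for size in (23_328, 28 * 1024, 32_767, 40_000):
    verdict = "fits" if fits_in_env_var(size) else "too big"
    print(f"{size:>6} characters: {verdict}")
```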
 
Vince,

That could certainly be a valid concern, and it's one I'd already thought about, but it doesn't seem to be a problem in practice. The largest such file I've found so far is about 28K and, since there's very little size variation (the smallest I've seen so far is about 27K), it's doubtful that that will ever happen. And even if it does, assuming that the file is truncated at 32K, it won't matter at all because what I'm looking for is no more than about halfway into each file. And lastly, 100% accuracy, while certainly preferable, is not really needed in this application; 90% would pretty much be good enough.

But thank you for thinking about it.

And I still can't upload the file ("A server error has occurred."). But if anyone's curious/interested (which I tend to doubt ;)), you can get the page in question from http://www.starpulse.com/Music/Iron_&_Wine/MusicRelations/.

- Dan
 
Well, guys, it's become completely academic as of this point because I have, in fact, written and tested a C++ program to do the parsing of the web page(s). At this point it would seem that this would only be of interest to Rex because it's clearly an unexpected piping error as far as I can tell. But thanks to all of you who contributed to this thread! :)

- Dan
 
Well, thank you guys (I think! ;)). You've just proved that there is something wrong with my system (64-bit Windows on a new computer as delivered from the manufacturer, with virtually no "customization" of any kind, not even very many non-mainstream non-Microsoft apps). But again, I've done it in C++ (which works), so it's now irrelevant other than having wasted my time and several other people's. Sorry!

- Dan
 
