extract email addresses from FROM, TO, CC, and BCC in EML files

Code:
[C:\Users\csgal\OneDrive\Desktop\Export]tpipe /input=*.eml,255,1 /simple=28 /output=output.txt
The filename, directory name, or volume label syntax is incorrect.
 "*.eml"

Any idea what I am doing wrong?

/INPUT is:

Code:
/input=filename[,subfolders[,action]]

filename - Filename or folder to read. This can be either a disk file, file list (@filename), or CLIP:. If it is not specified, TPIPE will read from standard input.

subfolders - How many subfolders to include (default 0):
0 - no subfolders
1 to 255 - subfolder(s)
255 - all subfolders

action - the action to take (default 1):
1 - include the files
2 - exclude the files
3 - ignore the files
 
I fixed the file processing by using a FOR loop:

Code:
for /R %fn in (*.eml) (echo Working on %fn... & tpipe /input="%fn" /simple=28 /output=c:\fldr\outfile.txt /outputappend=1)

but that extracts only the bare address (whatever@where.com), not the name associated with it. I am not clear on how to use /Regex or the related options, so what I'd like is a regex that extracts just the From, To, Cc, and Bcc names and addresses, where each header might span multiple lines.

Once that is done, I need a way to output each name and address pair on a new line.
 
This regex might get you started.

"^(To|From|Cc|Bcc):.*$"

Code:
v:\> tpipe /input="Re A question about pointers.txt" /grep=3,0,0,1,0,0,0,0,"^(To|From|Cc|Bcc):.*$"
From:   Joe Blow <j.blow@gmail.com>
To: Vincent E Fatica

I don't have .EML files but I believe the headers should be plain text, with header names at the beginning of a line in Mixed case followed by a colon.
 
Here are some EML files; they are basically TXT files. The regex above works well, but not if the To/Cc/Bcc headers are longer than one line.
 

Attachment: HM_CSG.zip (135.4 KB)
Yeah, I thought of that. The rules are (I believe) that if a header is continued on a new line, that line begins with a space or a tab and that the headers are separated from whatever comes next by a completely empty line. You could add to the regex lines that start with a space or a tab but that would give you a lot more than you want, notably continuations of headers which you're not interested in.

I did a very brief search for software to help and I didn't find anything. Maybe a BTM is in order. It could read the file line by line, keeping track of whether you're inside or outside a header of interest.
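
As an aside, the folding rule described above (continuation lines start with a space or tab, and the headers end at the first completely empty line) can also be written as a single multi-line regex. The following is only a sketch in Python, not TPIPE or TCC; the function name `interesting_headers` and the sample message are made up for illustration.

```python
import re

# A To/From/Cc/Bcc header line, followed by zero or more continuation
# lines (lines whose first character is a space or a tab).
HEADER_RE = re.compile(
    r"^(To|From|Cc|Bcc):.*(?:\r?\n[ \t].*)*",
    re.MULTILINE,
)

def interesting_headers(message_text):
    """Return each To/From/Cc/Bcc header, including its folded lines."""
    # The header block stops at the first completely empty line.
    head = re.split(r"\r?\n\r?\n", message_text, maxsplit=1)[0]
    return [m.group(0) for m in HEADER_RE.finditer(head)]

# Hypothetical sample message, not taken from the attached files.
sample = (
    "Subject: hello\n"
    " folded subject\n"
    "To: a@example.com,\n"
    "\tb@example.com\n"
    "From: Joe Blow <j.blow@example.com>\n"
    "\n"
    "To: this line is body text, not a header\n"
)

for header in interesting_headers(sample):
    print(header)
```

Because a continuation is only accepted directly after a matching header line, the folded Subject line is skipped, which avoids the "a lot more than you want" problem of matching every line that starts with whitespace.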
 
Is there a way to /grep for the additional lines as long as the first character is a space, tab, or similar whitespace character?
 
I don't know. TPIPE has subfilters but I don't know how to use them

Maybe this will help. It seems to do the right thing (far below) on my modified version of one of your files.

Code:
c:\users\vefatica\desktop\hm_csg\hm_csg\inbox> type parse.btm
setlocal
setdos /x-1256789A
set inheader=false
do line in @%1
    if "%line" == "" goto done
    iff %@regex["^(To|From|Cc|Bcc):",%line] == 1 then
            echo %line
            set inheader=true
    elseiff %inheader == true then
        iff %@regex["^[ \t]",%line] == 1 then
            echo %line
        else
            set inheader=false
        endiff
    endiff
enddo
:done
setdos /x+1256789A

Code:
c:\users\vefatica\desktop\hm_csg\hm_csg\inbox> parse.btm "01-Transfer of Google data requested.eml"
From: Google <no-reply@accounts.google.com>
To: CharlesSGalloway@hotmail.com
 vefatica@foo.bar
    JOE <joe@foo.bar>
Cc: person <person@server.foo>
  MARY <mary@xxx.yyy>
  phil@google.com
 
Here's another test of that BTM.

Code:
c:\users\vefatica\desktop\hm_csg\hm_csg\inbox> do f in *.eml ( parse.btm "%f" )

Edit: It did the right thing, but I deleted the output since it's probably not a good idea to post a lot of valid email addresses.
 
@vefatica - I am trying to modify parse.btm so that it writes all addresses for To, Cc, and Bcc on the same line. The problem I'm trying to solve now is when From and To appear on consecutive lines, as in "01-Transfer of Google data requested.eml" in post #5 above.

The current parse is below...

Code:
COMMENT

    Trying to put all names and addresses for To, Cc, and Bcc on one line.

    Works fine except if From, TO appear on consecutive lines...

ENDCOMMENT

setlocal
setdos /x-1256789A
set inheader=false
echo ===============================================================================
echo File: %fn
echo ===============================================================================
do line in @%1
    if "%line" == "" goto done
    iff %@regex["^(To|From|Cc|Bcc):",%line] == 1 then
        echos %@trim[%line]
        set inheader=true
    elseiff %inheader == true then
        iff %@regex["^[ \t]",%line] == 1 then
          echos ` `%@trim[%line]
        else
          echo ' '
          set inheader=false
        endiff
    endiff
enddo
:done
setdos /x+1256789A
 
Go to a new line every time you encounter a header of interest. Keep the rest as you have it, but start each header with echo. first:

Code:
    iff %@regex["^(To|From|Cc|Bcc):",%line] == 1 then
            echo.
            echos %@trim[%line]
            set inheader=true
 
You should also use %@char[32]. ` ` won't work because the SETDOS /X line has turned off the special meaning of the back quote.

I don't know what you're doing with echo ' ' in the else clause.

I believe two addresses on the same line will be separated by a comma. Add a comma in the other cases, like this.

Code:
    iff %@regex["^(To|From|Cc|Bcc):",%line] == 1 then
            echo.
            echos %@trim[%line]
            set inheader=true
    elseiff %inheader == true then
        iff %@regex["^[ \t]",%line] == 1 then
            echos ,%@char[32]%@trim[%line]
        else
            set inheader=false
        endiff
    endiff

I'm getting output like this.

Code:
From: Google <no-reply@accounts.google.com>
To: CharlesSGalloway@hotmail.com, vefatica@foo.bar, JOE <joe@foo.bar>
Cc: person <person@server.foo>, MARY <mary@xxx.yyy>, phil@google.com


compared to ...

Code:
From: Google <no-reply@accounts.google.com>
To: CharlesSGalloway@hotmail.com
 vefatica@foo.bar
    JOE <joe@foo.bar>
Cc: person <person@server.foo>
  MARY <mary@xxx.yyy>
  phil@google.com
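
For anyone who wants to cross-check the batch file outside TCC, the same line-by-line logic can be sketched in Python. This is not the BTM itself, just an illustration under a few assumptions: `join_headers` is a made-up name, and stripping a trailing comma before joining is an addition that parse.btm does not make (it avoids a doubled comma when a folded line already ends with one).

```python
import re

HEADER_START = re.compile(r"^(To|From|Cc|Bcc):")

def join_headers(lines):
    """Collect To/From/Cc/Bcc headers, one joined line per header."""
    results = []
    in_header = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            break  # an empty line ends the header block
        if HEADER_START.match(line):
            results.append(line.strip())
            in_header = True
        elif in_header and line[:1] in (" ", "\t"):
            # Continuation of a header of interest: append it, comma-separated.
            # Assumption: drop an existing trailing comma to avoid ",,".
            results[-1] = results[-1].rstrip(",") + ", " + line.strip()
        else:
            in_header = False  # some other header, or its continuation
    return results

# Sample modelled on the test file shown above.
sample = [
    "Subject: x",
    " folded subject line",
    "From: Google <no-reply@accounts.google.com>",
    "To: vefatica@foo.bar",
    "    JOE <joe@foo.bar>",
    "Cc: person <person@server.foo>",
    "  MARY <mary@xxx.yyy>",
    "",
    "To: body text, ignored",
]

for header in join_headers(sample):
    print(header)
```

Each header of interest ends up on one line, and nothing after the first empty line is read.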
 
@vefatica - thanks for your help! One thing I do see is that, given the command "parse.btm sample.eml > out.txt", the first line of out.txt will be blank. Is there any way to keep the first line from being blank?
 
Hmmm! You could pipe to (for example)

Code:
tail /n 10 /n+1

or to

Code:
findstr /v /r "^$"

The first will give 10 lines (enough for one file) and skip the first line. The second will get rid of all empty lines (there should be only one).

Or, depending on your taste,

Code:
set newline=no
do line in @%1
    if "%line" == "" goto done
    iff %@regex["^(To|From|Cc|Bcc):",%line] == 1 then
            if %newline == yes (echo.) else (set newline=yes)
            echos %@trim[%line]
            set inheader=true

And if you want a newline at the very end (there isn't one) ...

Code:
:done
echo.
setdos /x+1256789A
 
parse.btm is currently:

Code:
COMMENT

    Trying to put all names and addresses for To, Cc, and Bcc on one line.

ENDCOMMENT

setlocal
setdos /x-1256789A
echo ===============================================================================
echo File: %fn
echo ===============================================================================
set inheader=false
set newline=no
do line in @%1
  if "%line" == "" goto done
  iff %@regex["^(To|From|Cc|Bcc):",%line] == 1 then
    if %newline == yes (echo.) else (set newline=yes)
    echos %@trim[%line]
    set inheader=true
  elseiff %inheader == true then
    iff %@regex["^[ \t]",%line] == 1 then
      echos %@char[32]%@trim[%line]
    else
      set inheader=false
    endiff
  endiff
enddo
:done
echo.
setdos /x+1256789A
 
I created a file with these contents:
Code:
$ type foo.txt
TO: a@a.com, b@b.net, c@d.info
CC: a@a.com, b@b.net, c@d.info
BCC: a@a.com, b@b.net, c@d.info
Code:
do l in @foo.txt (do i=1 to %((%@words[":,",%l]-1)) (echo %@word[":,",0,%l]: %@word[":,",%i,%l]))
TO:  a@a.com
TO:  b@b.net
TO:  c@d.info
CC:  a@a.com
CC:  b@b.net
CC:  c@d.info
BCC:  a@a.com
BCC:  b@b.net
BCC:  c@d.info
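
If the display names matter as well (the original request was for name and address pairs), splitting on commas alone can go wrong, for example when a quoted name contains a comma. As a rough sketch in Python rather than TCC, the standard library's `email.utils.getaddresses` does the parsing; the helper name `split_pairs` and the sample lines are only illustrative.

```python
from email.utils import getaddresses

def split_pairs(header_line):
    """Split 'Name: addr, addr, ...' into (header, display_name, address) tuples."""
    header, _, value = header_line.partition(":")
    return [
        (header.strip(), display, addr)
        for display, addr in getaddresses([value])
        if addr  # skip anything that did not parse to an address
    ]

lines = [
    "TO: a@a.com, b@b.net, c@d.info",
    "Cc: person <person@server.foo>, MARY <mary@xxx.yyy>",
]

for line in lines:
    for header, display, addr in split_pairs(line):
        if display:
            print(f"{header}: {display} <{addr}>")
        else:
            print(f"{header}: {addr}")
```

This expects each header already joined onto one line, as in the output of the modified parse.btm.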
 
The above parse.btm does work as expected. However, if I wanted to check for email addresses outside the headers, how could I best (and fairly efficiently) use the search string that email[line] has? And, optionally, what if the email address continues onto the next line?
 
