How to tell when it's the end of an email

Red Squirrel

No Lifer
May 24, 2003
70,403
13,702
126
www.anyf.ca
I'm writing a program that will run with procmail, but my issue is that I don't know when to stop looking for data. How can I programmatically know when it's the end of the email? Do I need to look for char 255? (looks like a Y with a cross in it) The end of an email seems to be full of those, e.g. if I grab more data than needed. But I'm not 100% sure if this is the way of doing it.

The Content-Length header seems to get added by the client and not by the MTA, so when the email goes through my program, that header does not exist yet. So that's not an option either.

Thanks in advance.
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
Are we talking about smtp here? The end of a message in smtp is a newline followed by a dot followed by a newline (aka a '.' on a line by itself)
 

Red Squirrel

Not exactly SMTP, more POP3-ish, but this is before the mail even hits the inbox. From the tests I've done I don't get that (I figured it was that too). Here's a sample email:

From test@borg.loc Tue May 24 19:16:56 2005
Return-Path: <test@borg.loc>
X-Original-To: test@borg.loc
Delivered-To: test@borg.loc
Received: from tools.loc (unknown [192.168.1.10])
        by mail.icetekshq.ath.cx (Postfix) with ESMTP id 97901EC1BD
        for <test@borg.loc>; Tue, 24 May 2005 19:16:56 -0400 (EDT)
Received: from 192.168.1.100
        (SquirrelMail authenticated user test);
        by tools.loc with HTTP;
        Tue, 24 May 2005 19:16:56 -0400 (EDT)
Message-ID: <3074.192.168.1.100.1116976616.squirrel@192.168.1.100>
Date: Tue, 24 May 2005 19:16:56 -0400 (EDT)
Subject: 1111111112
From: "Test Account" <test@borg.loc>
To: test@borg.loc
Reply-To: test@borg.loc
User-Agent: SquirrelMail/1.4.3a
X-Mailer: SquirrelMail/1.4.3a
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal
X-Virus-Status: No
X-Virus-Checker-Version: clamassassin 1.2.2 with clamscan / ClamAV 0.84/893/Tue May 24 02:27:20 2005

111111111111111112

ÿÿÿÿÿÿÿÿÿÿ

The ÿ's occur when my program keeps checking for data after the email is done, since for now I just set it to read 1500 chars with a for loop. Obviously I'll want something more practical, since I won't just be dumping this into a file, but handling it, modifying it, then outputting it back.

 

kamper

I think your best bet would be to find out exactly what the protocol is called and then find the specs for it (if no one here knows it). If it's a popular protocol then there's a good chance that there's an rfc or two about it.
 

Red Squirrel

Well it's mail, but at this level I'm not sure if it should be considered POP3 or SMTP, since it's in the middle of landing in an inbox. But it's a standard procedure, I'm just not sure how other programs like spamassassin do it. For now my program checks for char 255, but that's not reliable: if it happens to appear in an email, the program thinks that's the end.
 

kamper

There's a definite difference between pop and smtp, one's for retrieving email, one's for sending it.

Anyways, I just looked up pop and it also uses a '.' on its own line to specify the end of a message, so if you're not seeing that then I suspect you're not using either pop or smtp.
 

Red Squirrel

This is a little different since when the email is "received" by the program, it's in the process of going to the inbox, but is not there yet. (this is all server side)

Here's a better explanation of the mail route in my setup:

Fetchmail checks POP3 servers and receives mail, which then goes: SERVER -> procmail -> virus scan -> spamassassin -> my program -> user's inbox

If I were to telnet to an SMTP server to send mail then yes, I'd use . on its own line to end it, but in this case the message has already been sent. But I think spamassassin is open source, so maybe I can check their source to see how they do it.

I hope I explained this better. :D

Edit: I'm thinking this may be more procmail specific than protocol specific. I'm thinking there must be some kind of special ASCII char/combination that is hidden somewhere, but it's hard to know if it does not show up in the text file that I made it generate.
 

kamper

I'm not sure I understand the deal as I don't have much experience with mail servers but I would agree that you're not using either pop or smtp :)

Maybe you can dig into the procmail source code?
 

kevnich2

Platinum Member
Apr 10, 2004
2,465
8
76
You need to look at SMTP settings since that's what you're using. POP3 is ONLY used for retrieval, and actually IMAP is now becoming very standard for retrieving mail. But from the point of sending data to it landing on the receiving mail server, it's all port 25 (SMTP).
 

kamper

smtp is used on the internet, it doesn't have to be used by anything internal and by the time he's used fetchmail with pop to retrieve the email he's well past the point where smtp needs to be used. In theory he could be using any arbitrary protocol for passing the message along.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
procmail should give you 1 piece of mail at a time, all you should need to do is look for an EOF.
 

Red Squirrel

Isn't EOF only for when you are working with a file stream? Or am I wrong? If I am, how do I check for it? Here's my code for a better understanding of how I'm getting this data:

while(1)
{
    a[1]=a[0];
    a[0] = getchar();
    if(a[0]==255)break;

    cout<<a[0];

    fout.put(a[0]);
}

Basically the way it receives data is that it's "thrown" directly at the application, the same way you would with a program run as programname < file.

Right now I just check for char 255, but this is not the right way, since if that char happens to be in an email, it will make my program think it's the end of it.
 

Red Squirrel

I'm not using stdin though, just basically grabbing data with cin.get(), unless that is what's called stdin; if so, I did not know. What would be considered the handle then? Filestreams have a handle, so I can go while(!eof(handle)){ do stuff }.
 

Nothinman

Yes, you're reading the STDIN file descriptor. STDOUT is the default output and STDERR is the default for error output, all 3 are opened by default for all programs. For portability it's probably called STDIN or stdin, I can't say for sure since I don't do much C++. But you should be able to use 0, on most systems fd 0 is stdin, 1 is stdout and 2 is stderr, just realize that your program might not work properly on all systems if you use hardcoded numbers.
 

Red Squirrel

Oh I see, so is this the right way of doing it? Just did some googling and tried it and it works but just want to ensure it is in fact the right way.

while(!cin.eof())
 

Red Squirrel

My bad, it's not working. I thought I had commented out the check for char 255, but it was not actually commented out in the executable version. Commenting it out now makes it an infinite loop; it never receives an EOF.

string email;

while(!cin.eof())
{
    a = getchar();

    email+=a;
}
 

Nothinman

My C++ is extremely rusty, so I did this in C and it worked perfectly fine by running './test < somefile'. I don't have procmail handy to test with and I'd rather not risk breaking my mail setup just for this, but it should be no different.
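
A minimal sketch of the read-until-EOF approach described here, written in C style so that it also compiles as C++. It is an illustration only, not the code that was tested above. The name `read_until_eof` is hypothetical, and passing stdin is the assumed use for the procmail case.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: read bytes from `in` until fgetc reports EOF.
// getchar() behaves like fgetc(stdin).
std::string read_until_eof(std::FILE* in) {
    std::string data;
    int c;  // int rather than char, so EOF stays distinct from every byte value
    while ((c = std::fgetc(in)) != EOF) {
        data += static_cast<char>(c);
    }
    return data;
}
```

With this in place, read_until_eof(stdin) run as './test < somefile' would return the file's contents, including any 0xFF bytes in the middle.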
 

Red Squirrel

Thanks, that seems to have worked. I just had to switch the char back to a normal char instead of unsigned char, but now that I'm not looking for an ASCII value I don't need it to be unsigned anyway.
 

Red Squirrel

I was using if(a==255) since char 255 was indicating that the email was done, until I knew a better way, but now I don't need that. With a normal char the range is different, so 255 is out of range.
 

Nothinman

Yes, but you were using getchar and getchar's man page specifically says it returns a signed int.

Hell, the man page also says the getchar will return EOF if the end of the file is reached. Did you even look at the man page for getchar?
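
To illustrate the point about the return type: getchar returns an int so that EOF (a negative value, -1 on common implementations) can be told apart from all 256 byte values. Storing the result in an unsigned char turns -1 into 255, which displays as ÿ in ISO-8859-1; that would account for the trailing ÿ characters in the earlier dump. The helper names in this sketch are hypothetical.

```cpp
#include <cstdio>

// What an unsigned char variable ends up holding after `a = getchar();`
// once the input is exhausted: EOF narrowed to a single byte.
unsigned char eof_stored_in_unsigned_char() {
    return static_cast<unsigned char>(EOF);
}

// With the result kept as int, a real byte never compares equal to EOF,
// because bytes come back in the range 0..255 and EOF is negative.
bool byte_is_eof(unsigned char byte) {
    int as_returned = byte;  // how getchar hands back an ordinary byte
    return as_returned == EOF;
}
```

So a test such as `int c = getchar(); if (c == EOF)` detects the end of input reliably, while a comparison against 255 cannot distinguish the end from a genuine ÿ in the message.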
 

Nothinman

getchar is a C function, and there are man pages for a lot of C functions if you have the appropriate packages installed. C++ documentation is harder to come by.