6/13/2006

What's in a header

The protocol we use to transmit email was created decades ago, for the military and the academics who ran the network we know today as the Internet.
I guess that once computers were connected and had remote access to storage, someone came up with the idea of setting up a directory structure for local users where others could save a file, a message, for that particular user. Eventually SMTP was developed to do that job in an orderly, practical and automatic way. SMTP stands for Simple Mail Transfer Protocol, and it's exactly that: a simple protocol to pass mail files.
The file is built from the text you write; once that's done, the rest of the information is piled on top of it. The mail application (or the server, if you're dealing with it directly) adds information such as who you are (From field), who the mail is addressed to (To field), what the message is about (Subject field), a timestamp (Date field), the size of the message, carbon copy, blind carbon copy, etc. SMTP itself doesn't require or enforce any of these fields; in fact, you can send a message without a To field, or even a blank message without any fields at all. But almost all the mail clients we use, including web interfaces, take good care of them, so if you check the source of any of your messages, you'll see that most of the fields are there.
Here's an example:

Date: Tue, 13 Jun 2006 10:30:43 -0700 (PDT)
From: Barrister jones <barristeredwardjones2@yahoo.com>
Subject: REPLY ASAP.
To: jwolfy@gmail.com
(Message follows..........)
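To see those fields the way a program does, here's a minimal Python sketch that runs the standard library's email parser over the example header above. It's only an illustration, not how any particular mail client works; header_fields is a made-up helper name, and collecting the fields into a dict keeps one value per field name, which is fine here but not for repeated fields like Received.

```python
from email.parser import HeaderParser

# The example header above; the body is only a placeholder.
RAW = """Date: Tue, 13 Jun 2006 10:30:43 -0700 (PDT)
From: Barrister jones <barristeredwardjones2@yahoo.com>
Subject: REPLY ASAP.
To: jwolfy@gmail.com

(Message follows..........)
"""

def header_fields(raw: str) -> dict[str, str]:
    """Parse only the header block and return its fields as name -> value.
    A dict keeps one value per name, so repeated fields (e.g. Received) collapse."""
    msg = HeaderParser().parsestr(raw)
    return dict(msg.items())

for name, value in header_fields(RAW).items():
    print(f"{name}: {value}")
```

A field the sender left out, such as Cc here, simply isn't in the result; nothing in the protocol complains about its absence.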


So far, this is (almost all) what a message file contains when your SMTP mail server stores it. From this point up, everything is routing: each line added says who received the file, from whom, when and how. The first line should be your own SMTP server saying it got the message from you:
Received: from [196.3.62.3] by web55407.mail.re4.yahoo.com via HTTP; Tue, 13 Jun 2006 10:30:43 PDT

In this case, web55407 is the Yahoo web interface Barrister Jones used to send this message. His IP address is there, 196.3.62.3, and according to AFRINIC it's in Ebene, Mauritius. The user himself could be somewhere else. I think he's in Nigeria, because the phone number he gave me is there; he's probably using a satellite link with an earth station in Mauritius or something like that.

Once the message is complete, Yahoo tries to reach the destination server. First it actually passes the message to another process:

Received: (qmail 20439 invoked by uid 60001); 13 Jun 2006 17:30:44 -0000

and that process sends it to the destination server:

Received: from web55407.mail.re4.yahoo.com (web55407.mail.re4.yahoo.com [206.190.58.201])
 by mx.gmail.com with SMTP id 37si1451863nzf.2006.06.13.10.30.44;
 Tue, 13 Jun 2006 10:30:45 -0700 (PDT)
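Each Received line follows a loose "from ... by ... with ... id ...; date" pattern, which is how it says who received the message, from whom, how and when. As a rough illustration, this Python sketch splits one into those clauses plus the bracketed IP address. parse_received is a hypothetical helper, and the regular expressions are simplified assumptions that work on the lines in this post, not a full parser of the header grammar.

```python
import re

# Keywords that introduce clauses in a Received line (simplified assumption:
# the first word after each keyword is its value).
CLAUSE = re.compile(r"\b(from|by|via|with|id|for)\s+(\S+)")
# An IPv4 address in square brackets, e.g. [206.190.58.201].
BRACKET_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def parse_received(value: str) -> dict[str, str]:
    """Split a Received header value into clauses, bracketed IP and timestamp."""
    value = " ".join(value.split())        # unfold continuation lines
    body, _, date = value.rpartition(";")  # the timestamp follows the last ';'
    fields: dict[str, str] = {}
    for key, word in CLAUSE.findall(body):
        fields.setdefault(key, word)       # keep the first occurrence only
    ip = BRACKET_IP.search(body)
    if ip:
        fields["ip"] = ip.group(1)
    fields["date"] = date.strip()
    return fields

gmail_hop = """from web55407.mail.re4.yahoo.com (web55407.mail.re4.yahoo.com [206.190.58.201])
 by mx.gmail.com with SMTP id 37si1451863nzf.2006.06.13.10.30.44;
 Tue, 13 Jun 2006 10:30:45 -0700 (PDT)"""
print(parse_received(gmail_hop))
```

The same function applied to Yahoo's first line gives via = HTTP and the sender's IP, 196.3.62.3.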


Then Gmail passes it through two more servers, both on its local network (note the IP addresses whose first octet is 10, a range reserved for private networks). I can't say exactly why; it has to do with how their system is structured. My guess is that they have a front end connected to the Internet (mx.gmail.com, most likely more than one server) that passes the message to a hub (10.37.15.13), which knows where each mailbox is located and passes the message on to its final destination (10.36.250.24 in this case).

Received: by 10.36.250.24 with SMTP id x24cs7453nzh;
 Tue, 13 Jun 2006 10:30:45 -0700 (PDT)
Received: by 10.37.15.13 with SMTP id s13mr11224004nzi;
 Tue, 13 Jun 2006 10:30:45 -0700 (PDT)
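Python's standard ipaddress module can flag those internal hops, since 10.0.0.0/8 is one of the private ranges reserved by RFC 1918. A small sketch; is_internal_hop is just an illustrative name, and is_private also covers other reserved ranges (loopback, link-local and so on), not only 10.x.

```python
import ipaddress

def is_internal_hop(ip: str) -> bool:
    """True for private/reserved addresses such as 10.0.0.0/8 (RFC 1918)."""
    return ipaddress.ip_address(ip).is_private

# The Gmail front end's public address, then the two internal hops.
for ip in ["206.190.58.201", "10.37.15.13", "10.36.250.24"]:
    print(ip, "internal" if is_internal_hop(ip) else "public")
```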


Remember that each new line is added on top, so to follow the lines in chronological order, as I did here, you read the file upward.
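Because each server adds its line on top, reversing the Received lines gives the route oldest-first. This Python sketch shows it on the Received lines from this message, reassembled in the order they'd appear in the file (newest on top); route_chronological is a hypothetical helper.

```python
from email.parser import HeaderParser

# Received lines from the Yahoo -> Gmail example, newest on top as in the file.
RAW = """Received: by 10.36.250.24 with SMTP id x24cs7453nzh;
 Tue, 13 Jun 2006 10:30:45 -0700 (PDT)
Received: by 10.37.15.13 with SMTP id s13mr11224004nzi;
 Tue, 13 Jun 2006 10:30:45 -0700 (PDT)
Received: from web55407.mail.re4.yahoo.com (web55407.mail.re4.yahoo.com [206.190.58.201])
 by mx.gmail.com with SMTP id 37si1451863nzf.2006.06.13.10.30.44;
 Tue, 13 Jun 2006 10:30:45 -0700 (PDT)
Received: (qmail 20439 invoked by uid 60001); 13 Jun 2006 17:30:44 -0000
Received: from [196.3.62.3] by web55407.mail.re4.yahoo.com via HTTP; Tue, 13 Jun 2006 10:30:43 PDT

"""

def route_chronological(raw: str) -> list[str]:
    """Return the Received values oldest-first (the file stores them newest-first)."""
    received = HeaderParser().parsestr(raw).get_all("Received", [])
    # Unfold continuation lines so each hop prints on one line.
    return [" ".join(v.split()) for v in reversed(received)]

for step, hop in enumerate(route_chronological(RAW), start=1):
    print(step, hop)
```

The first hop printed is the Yahoo web interface receiving the message over HTTP, and the last one is the Gmail server holding the mailbox.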
In between you'll find other information that servers add to improve the service, such as a message ID, a Delivered-To field in case the To field doesn't exist or the destination address is in BCC (blind carbon copy), etc.
Yahoo adds a signature to each message (DomainKey-Signature) that allows servers to check that the message is valid as it passes from server to server and, in case of an abuse report, that it originated from Yahoo's servers.

As you can see, SMTP is very simple, but also unsafe and unreliable. No one is to blame: the people who designed it were trying to solve the problem they had at the time, and security wasn't an issue.
In this example, Yahoo verified the "identity" of the sender by means of a password. Gmail took the message from Yahoo in good faith: it doesn't check whether it's really Yahoo sending, nor can it verify the identity of the sender inside Yahoo.
In fact, take a look at this routing:

Received: (qmail 11191 invoked by uid 0); 30 May 2006 17:51:46 -0000
Received: from unknown (HELO 89-178-30-158.broadband.corbina.ru) (89.178.30.158)
 by 0 with SMTP; 30 May 2006 17:51:46 -0000
Received: by nyf15.pamico.com id 86jo739p33c9 for <user@server.com>; Tue, 30 May 2006 19:51:44 +0100 (envelope-from <BerniceClark@kertel.com>)
Received: (qmail 15334invoked from network); Tue, 30 May 2006 19:51:44 +0100
Date: Tue, 30 May 2006 19:51:44 +0100
Subject: Erection problems can be fixed Franklin
From: "Reyes" <BerniceClark@kertel.com>
To: user@server.com


There were a lot of telltale signs in the header identifying this as spam, but I stripped them out to focus on the routing.
user@server.com is my mail address; the message was generated for and addressed to me. Reyes, with the email address BerniceClark@kertel.com, sent it, and amazingly he/she knows about my erection problems even though he/she doesn't know my name.
Reading from the bottom up, the message is timestamped and handed to a qmail process. The line doesn't identify the sending node, and my best guess is that both are on the same machine.
Then the message is sent to nyf15.pamico.com, a domain hosted by GoDaddy somewhere in Arizona. Remember this, because it's important.
Next, my server receives it from 89-178-30-158.broadband.corbina.ru (Moscow, Russia) and passes it to qmail, a process that stores the message in its final destination.
I know this last part by heart: it's my server, and I know how it's configured.
But the odd thing is that a message sent to me went through Arizona and Russia.
The answer is that it didn't; it's a fake header. The message was generated by a client of Corbina's broadband service in Moscow, Russia.
The mass-mailer application creates a fake header to make the source harder to trace. It's silly, because the people who take the time to trace it won't fall for it.
The other small detail is that BerniceClark@kertel.com didn't send it either. It could have been any other address, including yours; typically the spammer takes one from the list he's spamming to.
So, how should this routing be read to find out where the message came from?
The routing is a list of declarations in which each server involved in the transit of the message states that it took custody of it. So we have to start from the one we trust: our own server.
It can be fooled, but only to some extent.
If you check the line where my server receives the message, there's a HELO. This is a literal copy of what the sender declared at the start of the SMTP session, typically HELO followed by its name (smtp.server.com), but it could say anything. What it says is up to the sending server, and nothing checks it.
Besides that, there's also the IP number. My server logged it from the connection itself, regardless of what the sending server declared.
This is the starting point: my server says it received the message from 89.178.30.158, and at this point that's the only thing I can trust.
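Here's a rough Python sketch of that starting point: take the topmost Received line that records an SMTP connection, which is the one my own server wrote, and pull out the declared HELO and the logged IP. The pattern assumes the qmail-style "from HOST (HELO name) (ip)" lines shown in this post; other servers format the line differently, and trusted_hop is a hypothetical name.

```python
import re
from email.parser import HeaderParser
from typing import Optional, Tuple

# qmail-style clause: from HOST (HELO declared-name) (logged-ip)
QMAIL_FROM = re.compile(r"from\s+(\S+)\s+\(HELO\s*([^)]*)\)\s+\(([\d.]+)\)")

def trusted_hop(raw: str) -> Optional[Tuple[str, str]]:
    """Return (helo, ip) from the topmost SMTP Received line, the one our own
    server wrote when it accepted the connection. Lines below it are only claims."""
    msg = HeaderParser().parsestr(raw)
    for value in msg.get_all("Received", []):  # newest (our server) first
        match = QMAIL_FROM.search(" ".join(value.split()))
        if match:
            return match.group(2).strip(), match.group(3)
    return None

SPAM = """Received: (qmail 11191 invoked by uid 0); 30 May 2006 17:51:46 -0000
Received: from unknown (HELO 89-178-30-158.broadband.corbina.ru) (89.178.30.158)
 by 0 with SMTP; 30 May 2006 17:51:46 -0000
Received: by nyf15.pamico.com id 86jo739p33c9 for <user@server.com>; Tue, 30 May 2006 19:51:44 +0100 (envelope-from <BerniceClark@kertel.com>)
Received: (qmail 15334invoked from network); Tue, 30 May 2006 19:51:44 +0100
Date: Tue, 30 May 2006 19:51:44 +0100
Subject: Erection problems can be fixed Franklin
From: "Reyes" <BerniceClark@kertel.com>
To: user@server.com

"""

print(trusted_hop(SPAM))
```

The forged GoDaddy and qmail lines further down are never consulted, which is the whole point: only the hop written by the server you control counts.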
Checking the IP number (you can try any "whois" web page online), I see that the declared name matches the IP number. In other cases the spammer uses a domain name that may or may not exist but is unrelated to the IP, or nothing at all.
Here's one with a fake domain name (the domain exists, but there's no server running under that name) and an IP belonging to Telenet in Bulgaria:

Received: from unknown (HELO coy.slivnica.com) (213.169.59.20)

In this one, the server identifies itself as Yahoo Argentina, but the IP belongs to Cablevision Argentina (cable TV and ISP):

Received: from unknown (HELO yahoo.com.ar) (200.114.224.55)

This one has an empty HELO:

Received: from unknown (HELO) (68.161.93.166)

And this one, just anything:

Received: from unknown (HELO 4736EC68) (221.168.136.188)
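A first, purely cosmetic filter is to look at the shape of the HELO string. This Python sketch sorts the examples above into rough categories; the category names and the host-name pattern are my own assumptions. Note that coy.slivnica.com and yahoo.com.ar pass as well-formed names even though they lie, which is why the declaration has to be compared with the IP.

```python
import ipaddress
import re

# Rough shape of a host name: dot-separated labels of letters, digits and
# hyphens, ending in an alphabetic top-level label. Shape only, not existence.
HOSTNAME = re.compile(r"^(?=.{1,253}$)(?:[A-Za-z0-9-]{1,63}\.)+[A-Za-z]{2,63}$")

def classify_helo(helo: str) -> str:
    helo = helo.strip()
    if not helo:
        return "empty"
    try:
        # Some clients send an address literal such as [192.0.2.1].
        ipaddress.ip_address(helo.strip("[]"))
        return "address literal"
    except ValueError:
        pass
    if HOSTNAME.match(helo):
        return "host name"  # well formed, but it may still be a lie
    return "junk"

for helo in ["coy.slivnica.com", "yahoo.com.ar", "", "4736EC68"]:
    print(repr(helo), classify_helo(helo))
```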

I think this is a good point to control spam: all the receiving server has to do is check that the HELO declaration matches the IP of the sender. That alone would cut half of the spam, and some "legal" mail too. But that's easy to fix: legitimate senders just have to use a proper HELO.
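One way to do that check is a forward DNS lookup: resolve the name given in HELO and see whether the connecting IP is among the addresses it returns (a reverse lookup of the IP is another option). A minimal Python sketch using socket.gethostbyname_ex from the standard library, which only returns IPv4 addresses; the lookup is passed in as a parameter so it can be tried offline, and helo_matches_ip and the table in the usage lines are made up for illustration.

```python
import socket
from typing import Callable

Resolver = Callable[[str], list[str]]

def resolve_ipv4(name: str) -> list[str]:
    """Forward DNS lookup: the IPv4 addresses a host name resolves to."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except (OSError, UnicodeError):  # lookup failures, or names IDNA rejects
        return []

def helo_matches_ip(helo: str, ip: str, resolve: Resolver = resolve_ipv4) -> bool:
    """True if the name declared in HELO resolves to the connecting IP."""
    helo = helo.strip()
    if not helo:
        return False  # an empty HELO can't match anything
    return ip in resolve(helo)

# Offline usage with a made-up table standing in for DNS.
FAKE_DNS = {"89-178-30-158.broadband.corbina.ru": ["89.178.30.158"]}

def fake_resolve(name: str) -> list[str]:
    return FAKE_DNS.get(name, [])

print(helo_matches_ip("89-178-30-158.broadband.corbina.ru", "89.178.30.158", fake_resolve))  # True
print(helo_matches_ip("yahoo.com.ar", "200.114.224.55", fake_resolve))  # False
print(helo_matches_ip("", "68.161.93.166", fake_resolve))  # False
```

In a real server the default resolve_ipv4 would be used, at the cost of one DNS query per incoming connection.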
Then we can raise the bar a notch and check that the sender's IP belongs to a server listed in an MX record for its domain, to filter out those who don't lie in their HELO declaration.
An MX record is an entry in the domain name database (DNS, the service that turns the names we understand into the IP numbers the network understands) saying which server handles mail for a domain.
All mail servers should be listed in an MX record, and most are. The problem is with huge mail services that have many servers, each handling either reception or delivery. Most likely the receiving servers have MX records but the sending ones don't. I still think that's easy to fix.
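The MX test can be sketched the same way. Python's standard library has no MX lookup, so this sketch takes the two lookups as parameters (in practice a DNS library such as dnspython, via dns.resolver.resolve(domain, "MX"), could supply the MX hosts). Which domain to check, the HELO name's or the envelope sender's, is an assumption left to the caller, and the names and addresses in the usage lines are reserved documentation values (example.com, 192.0.2.x), not real data.

```python
from typing import Callable

Lookup = Callable[[str], list[str]]

def ip_is_listed_mx(domain: str, ip: str, mx_hosts: Lookup, addresses: Lookup) -> bool:
    """True if ip belongs to one of the hosts named in the domain's MX records."""
    return any(ip in addresses(host) for host in mx_hosts(domain))

# Offline stand-ins for DNS, using reserved documentation names and addresses.
FAKE_MX = {"example.com": ["mx1.example.com", "mx2.example.com"]}
FAKE_A = {"mx1.example.com": ["192.0.2.10"], "mx2.example.com": ["192.0.2.11"]}

def mx_lookup(domain: str) -> list[str]:
    return FAKE_MX.get(domain, [])

def a_lookup(host: str) -> list[str]:
    return FAKE_A.get(host, [])

print(ip_is_listed_mx("example.com", "192.0.2.11", mx_lookup, a_lookup))  # True
print(ip_is_listed_mx("example.com", "192.0.2.99", mx_lookup, a_lookup))  # False
```

As said above, this would also reject legitimate mail from big providers whose outgoing servers aren't in their MX records.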
And then we can raise the bar yet another notch and start banning servers that don't lie in their HELO declaration, whose names match their IPs and that have an MX record, but still spam like crazy.

Wouldn't it be nice?

Today we say goodbye to these users; may the ceiling fall on their heads:

drmahmoudoffice@yahoo.fr
hamar1233@yahoo.co.uk
niclosedem@yahoo.co.in
john_wilson1947@yahoo.com
divinefoundation01@yahoo.com
legalmatterzng@yahoo.com


And a special dedication to my case officer, Inspector Jonhson. I guess that without his e-mail account I'm free from my e-arrest.

inspectorjonhson_britishpolice@yahoo.co.uk
