
In this article we explain what needs to happen before sending spam stops being profitable. To illustrate one of the problems in identifying and blocking spam, we give a short explanation of Bayesian poisoning and probing. Finally, we briefly discuss a method that makes it easier to tell legitimate messages from spam.

By Erik Loman and Mark Loman on 15th June 2006 for Security.NL.
With thanks to John Graham-Cumming and Menno Herkes.


Effectivity of spam | Spammers vs Spam filters | How do spammers increase their volume?
Bayesian Poisoning | The rise of contamination | Probing
Hashcash | How does Hashcash work? | SpamAssassin


In January 2004 Bill Gates said: "Two years from now, spam will be solved."

Today more than 90,000,000,000 (90 billion) spam messages are sent every day (source: Wikipedia).
Because spammers also make up e-mail addresses at random, most spam messages are not sent to addresses that actually exist. Even so, the internet and the computers of private users and companies alike are needlessly loaded and slowed down by all this traffic.

The fight against spam is therefore important. Spam not only overloads computers and networks, it also irritates users, who have to spend time and patience clearing it out by hand. Clearly, the visionary and successful Bill Gates got this one wrong.


Effectivity of Spam

Source: John Graham-Cumming www.jgc.org

Why do we get spammed at all, if everyone is annoyed by it? Spammers claim that they already break even on their costs if only 0.001% to 0.01% of the e-mails they send produce a response. Research showed that 5% of spam recipients actually answer the message and buy from the spammer. Measured against that figure, spam filters would have to become 500 to 5,000 times better than they are today (5% divided by 0.01% and by 0.001%, respectively) to make spamming financially unprofitable.


Spammers vs Spam Filters

At the moment 90% of all mailboxes in the world are protected by a spam filter, installed either by the internet service provider (ISP) or by the user. A reasonably effective spam filter keeps 90% of spam at bay. Taken together (90% of 90%), this means that roughly 80% of spam is stopped and 20% reaches the inbox. Spammers therefore need to be 5 times as good, for example by sending 5 times the volume, to still reach every mailbox in the world.

Suppose the spam filters on those mailboxes improve and keep out 99% of spam messages. Then effectively about 90% of spam is blocked.
That in turn means spammers have to become 2 times as good as they are now to beat such filters.

Suppose all mailboxes in the world (100%) have such spam filters. Effectively, 96% of all spam would then be stopped. Spammers would have to be 5 times as good as they are now.

Now imagine that all mailboxes in the world used the best spam filters, so that 99% of all spam is blocked. Spammers would then have to be 20 times better than they are today.
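The arithmetic behind these scenarios can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the 96% and 99% figures are taken from the text as given rather than derived.

```python
def blocked_fraction(coverage, filter_rate):
    """Share of all spam stopped when `coverage` of the mailboxes run
    a filter that catches `filter_rate` of the spam it sees."""
    return coverage * filter_rate


def volume_factor(blocked_now, blocked_future):
    """How many times better (e.g. in volume) a spammer must become to
    land as many messages in inboxes as he does now."""
    return (1 - blocked_now) / (1 - blocked_future)


# Today: 90% of mailboxes filtered, filter stops 90% -> 0.81, rounded to 80% in the text.
today = 0.80

# Blocked fractions for the scenarios in the text (assumed inputs, copied as stated).
scenarios = {
    "no filters vs. today": (0.00, today),
    "today vs. 99% filters on 90% of mailboxes (~90%)": (today, 0.90),
    "today vs. 96% blocked": (today, 0.96),
    "today vs. 99% blocked": (today, 0.99),
}

for name, (now, future) in scenarios.items():
    print(f"{name}: factor {volume_factor(now, future):.1f}")
```

Note that the factor depends only on the share of spam that still gets through, which is why going from 96% to 99% blocked is a much bigger step than the percentages suggest.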


How do spammers increase their volume?

The real question is: how hard is it for spammers to send 5 to 20 times more volume than they send now?

Technology is currently developing in the spammer's favour. Here are three examples:





The conclusion from all this is obvious: spammers simply need to be patient.



In other words: for the time being we won't be able to get rid of spam!



Bayesian Poisoning

Introduction

Bayesian filtering is a statistical analysis of an e-mail message that lets the filter learn what is and what is not spam. A Bayesian filter can learn to look at words in the body, the subject and the headers, but also at things such as the HTML code, combinations of words and sentences, and meta information (for example, where in the message a specific phrase occurs). A word like 'Viagra', for instance, appears often in spam but rarely in legitimate (ham) e-mail. A filter does not know this at first, but it can learn it. The next time a message containing the word 'Viagra' arrives, the filter will recognise it and classify the e-mail as spam.
Source: Security.NL www.security.nl
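To make the learning step concrete, here is a minimal sketch of a word-based Bayesian filter in Python. It is a toy under stated assumptions, not the algorithm of any particular product: the class name `TinyBayesFilter`, the whitespace tokenizer, the add-one smoothing and the training messages are all invented for this illustration, and real filters look at far more than single words.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy naive Bayes filter that only looks at which words occur."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each word once per message for the given class.
        self.message_counts[label] += 1
        self.word_counts[label].update(set(text.lower().split()))

    def _word_prob(self, word, label):
        # Add-one smoothing so unseen words never give probability 0.
        seen = self.word_counts[label][word]
        return (seen + 1) / (self.message_counts[label] + 2)

    def spam_probability(self, text):
        # Sum log-odds of the prior and of every word in the message.
        log_odds = math.log(
            (self.message_counts["spam"] + 1) / (self.message_counts["ham"] + 1)
        )
        for word in set(text.lower().split()):
            log_odds += math.log(
                self._word_prob(word, "spam") / self._word_prob(word, "ham")
            )
        return 1 / (1 + math.exp(-log_odds))


spam_filter = TinyBayesFilter()
spam_filter.train("buy cheap viagra now", "spam")
spam_filter.train("cheap viagra online pharmacy", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday with the team", "ham")

likely_spam = spam_filter.spam_probability("viagra for you")
likely_ham = spam_filter.spam_probability("agenda for the team meeting")
print(likely_spam, likely_ham)
```

After training, a message containing 'viagra' scores above 0.5, while a message made up of words seen only in ham scores well below it.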

The rise of contamination

Nowadays spammers put a lot of legitimate text (e.g. whole passages from books) into their messages:



This is done in order to fool the Bayesian spam filter. After all, if the filter finds enough legitimate words in the message, it will no longer recognise the message as spam and will let it go through.

Once such a message has reached the inbox, the user will want it to be recognised as spam in future, so that similar messages no longer get into his mailbox. Almost all spam filters have a button for this.
But because the Bayesian learning process analyses the complete message, it also learns the legitimate words and sentences as spam. This increases the chance that legitimate e-mails are classified as spam, the so-called false positives. To reduce the number of false positives, the user then sets his filter to be less strict, which in turn gives the spammer a better chance of reaching the inbox. A vicious circle.
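The contamination effect can be shown with a small Python sketch. Everything here is an assumption made for illustration: the helper names `train` and `word_spamminess`, the add-one smoothing, and the sample messages. It only demonstrates the direction of the effect, not how any real filter scores words.

```python
from collections import Counter

# Per-class word counts and message totals for a toy Bayesian filter.
counts = {"spam": Counter(), "ham": Counter()}
totals = Counter()


def train(text, label):
    """Learn every word of the message as belonging to `label`."""
    totals[label] += 1
    counts[label].update(set(text.lower().split()))


def word_spamminess(word):
    """Score between 0 (ham-like) and 1 (spam-like) for a single word."""
    p_spam = (counts["spam"][word] + 1) / (totals["spam"] + 2)
    p_ham = (counts["ham"][word] + 1) / (totals["ham"] + 2)
    return p_spam / (p_spam + p_ham)


train("cheap pills online", "spam")
train("the quarterly report is attached", "ham")
train("please review the report before the meeting", "ham")

before = word_spamminess("report")

# A poisoned spam: sales pitch padded with legitimate-looking text.
# The user clicks "this is spam", so the whole message is learned as spam.
train("cheap pills please review the quarterly report before the meeting", "spam")

after = word_spamminess("report")
print(f"'report' before: {before:.2f}, after: {after:.2f}")
```

The innocent word 'report' becomes more spam-like after a single training click, and every further poisoned message pushes it further, which is exactly what drives up false positives.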

See also: Caretaker Antispam and Training

Probing

Many spam messages sent today do not actually offer any product. Such messages are called probes. They are used to find out whether a message is actually received somewhere. If it is, the spammer knows that the e-mail address exists (he gets no undeliverable-message notice back).


Furthermore, we have to assume that such probes are used to find out whether the message gets added to collaborative spam filters. Collaborative spam filters are filters with a centrally shared database in which people all over the world classify messages as spam to help each other fight spam. Cloudmark and SPAMfighter offer this kind of function.

Of course, spammers use these spam filters themselves. So if a spam message that was sent a few days earlier no longer gets through the spammer's own filter, because a recipient marked it as spam, the spammer knows that an address used in the first mailing actually exists. He can now use that address for his real spam run.
If the user runs a Bayesian spam filter, either on its own or alongside other filters, the spammer will by then also have contaminated that filter's database.


This kind of spam is not easy to stop, since it gives a filter very little to go on. Often the e-mail consists of just one sentence or a couple of words:




Hashcash

To be able to distinguish between legitimate e-mail and spam, every message has to pass through the spam filter. Hashcash makes it possible to identify legitimate e-mails faster and more simply.

The principle behind the freely available Hashcash is that the sender invests computing time, which the recipient can easily verify. Every time an e-mail is sent, the sender has to perform a computation that costs a little time.

A spammer would not want to do this computation, because he could then no longer send enough messages to reach the desired effectiveness. The spammer is thus hit in the income he earns from spamming (less spam, less income).

How does Hashcash work?

With Hashcash, the sender adds an extra header to the e-mail message, the so-called stamp. Here is an example:

X-Hashcash: 1:22:070507:mark@surfright.nl::8BD6F813DBAC51A4:001B954E

If the recipient calculates the SHA-1 hash (160 bits) of this stamp, the first 22 bits will be 0. The recipient thus knows that the sender spent time computing the Hashcash stamp.

Making a Hashcash header takes time. The value of a Hashcash is put together (orange segment) by:

This is followed by a value (blue segment) that is incremented repeatedly until the SHA-1 hash of the whole value starts with 22 zero bits.

Computing a 22-bit Hashcash stamp on a 2 GHz computer takes about 1 to 2 seconds (on average about 2^22 iterations, roughly 4 million). The time needed to compute such a partial hash collision therefore grows exponentially with the number of zero bits: every extra bit doubles the work.
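The minting and checking of a stamp can be sketched in Python as follows. This is a simplified illustration, not a complete Hashcash implementation: the function names are invented, the field layout (version, bits, date, resource, empty extension, random string, counter) follows the version 1 stamp format as we understand it, the counter is written in hexadecimal as an assumption, and a real recipient would also check the date, the resource and whether the stamp was used before. The example mints only 12 bits so that it finishes quickly in pure Python.

```python
import hashlib
import itertools


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(stamp: str) -> bool:
    """Cheap check for the recipient: a single SHA-1 computation."""
    claimed_bits = int(stamp.split(":")[1])
    digest = hashlib.sha1(stamp.encode("ascii")).digest()
    return leading_zero_bits(digest) >= claimed_bits


def mint(bits: int, date: str, resource: str, rand: str):
    """Expensive step for the sender: try counters until the hash fits."""
    prefix = f"1:{bits}:{date}:{resource}::{rand}:"
    for counter in itertools.count():
        stamp = prefix + format(counter, "X")
        digest = hashlib.sha1(stamp.encode("ascii")).digest()
        if leading_zero_bits(digest) >= bits:
            return stamp, counter + 1  # stamp and number of attempts


# 12 bits instead of 22 keeps this demo fast; each extra bit doubles the work.
stamp, attempts = mint(12, "070507", "mark@surfright.nl", "8BD6F813DBAC51A4")
print("X-Hashcash:", stamp)
print("attempts needed:", attempts)
print("valid:", verify(stamp))
```

The asymmetry is the whole point: `mint` may need thousands or millions of hash computations, while `verify` needs exactly one.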

Should spammers decide to use Hashcash stamps, the computers that are part of their botnets (zombie PCs) will attract even more attention because of their continuously high CPU usage.

SpamAssassin

Normally, the makers of e-mail programs and spam filters have to build in support for a stamp such as Hashcash. These vendors, however, do not reach agreements with each other, or they come up with their own alternative, whereas major spam filters do use the open source Hashcash. Many providers use SpamAssassin as their filter, and it has supported Hashcash since 2004.

Microsoft Office Outlook 2007 and Microsoft Exchange 2007 use their own stamp, called E-Mail Postmark, which is not publicly available. Consequently, both the sender and the recipient need to use one of these products.




© SurfRight 2010