Spam filters are all about learning and adapting. But a malicious attacker can exploit these very qualities to hijack the classification process and have large numbers of legitimate e-mails classified as spam, sharply increasing the number of false positives.
Spam filters use learning algorithms to distinguish legitimate messages from spam. The most popular filters – SpamBayes, BogoFilter and the SpamAssassin learning component – all use learning algorithms from the same family, differing only in a few respects. In the absence of a reliable universal database of all spam in circulation at any given time, anti-spam tools must develop the ability to detect spam without any prior information. A group of Berkeley researchers recently showed that a carefully targeted attack can divert these filters from their intended purpose and increase the number of false positives on the recipient's side. Such an attack could be useful, for example, to an organization that wants to prevent a rival from receiving certain important e-mails, or to a spammer who wants the victim to disable their anti-spam filter entirely and thus receive all the spam sent to them.
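As a rough sketch of the approach these filters share, here is a toy token-based Bayesian classifier. Everything in it is simplified for illustration: the real filters use far richer tokenization and, for example, a chi-squared combining rule rather than this plain product of smoothed probabilities, but the learning principle is the same.

```python
from collections import Counter

class ToyBayesFilter:
    """Illustrative token-based Bayesian spam filter (not any real filter's API)."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of legitimate messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        # Learning step: record which tokens occur in which class.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam_tokens.update(tokens)
            self.n_spam += 1
        else:
            self.ham_tokens.update(tokens)
            self.n_ham += 1

    def score(self, text):
        # Naive-Bayes posterior probability of spam, equal priors,
        # Laplace smoothing: odds = product of P(token|spam) / P(token|ham).
        odds = 1.0
        for t in set(text.lower().split()):
            p_spam = (self.spam_tokens[t] + 1) / (self.n_spam + 2)
            p_ham = (self.ham_tokens[t] + 1) / (self.n_ham + 2)
            odds *= p_spam / p_ham
        return odds / (1.0 + odds)

f = ToyBayesFilter()
f.train("win money fast with cheap pills", is_spam=True)
f.train("minutes of the project meeting attached", is_spam=False)
print(f.score("cheap pills now"))           # ≈ 0.8: leans spam
print(f.score("project meeting tomorrow"))  # ≈ 0.2: leans legitimate
```

The key property for what follows: every training example, whoever sent it, shifts the per-token statistics that all future decisions depend on.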
Such an attack is possible on one condition: the attacker must be able to send e-mails that will be used to train the filter. In most cases, this condition is met: filters adapt their future decisions from the manual corrections users make on the e-mails they receive (reclassifying a false negative as spam, a false positive as non-spam, or resolving a message the filter left as indeterminate). Worse still, some anti-spam software includes self-learning rules that automatically feed detected spam back into the Bayesian databases, which only amplifies the process. All the attacker has to do to meet the condition is send e-mails to the victim.
The attack does not exploit any vulnerability; it relies on the very adaptive principle of the algorithm to manipulate its results. Consider the case where the goal is to turn specific, well-targeted legitimate messages into false positives. The attacker knows at least part of the content of these messages (e.g. standard responses to invitations to tender). He then sends the victim a number of messages containing the terms that will appear in the legitimate messages. To conceal his true intentions, he may add unrelated terms that serve purely as camouflage. His sole aim is to have the messages he sends manually classified as spam by the victim. As these messages are classified this way, the filter assigns higher and higher spam scores to the terms that regularly appear in them. When the legitimate e-mails arrive on the user's mail server, the filter immediately recognizes those terms and misclassifies the messages as spam: false positives that are automatically moved to the spam folder (or even destroyed, if that is the action the user has configured).
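The scenario can be reproduced at toy scale. The filter below, the training history, the target message and the number of poison messages are all invented for illustration; real filters are more robust in the details but learn by the same principle.

```python
from collections import Counter

spam_tok, ham_tok = Counter(), Counter()
n_spam = n_ham = 0

def train(text, is_spam):
    global n_spam, n_ham
    tokens = set(text.lower().split())
    if is_spam:
        spam_tok.update(tokens); n_spam += 1
    else:
        ham_tok.update(tokens); n_ham += 1

def score(text):
    # Naive-Bayes posterior, equal priors, Laplace smoothing.
    odds = 1.0
    for t in set(text.lower().split()):
        p_spam = (spam_tok[t] + 1) / (n_spam + 2)
        p_ham = (ham_tok[t] + 1) / (n_ham + 2)
        odds *= p_spam / p_ham
    return odds / (1.0 + odds)

# Normal history: a little legitimate mail and a little ordinary spam.
train("please find our response to the tender attached", is_spam=False)
train("meeting notes for the project review", is_spam=False)
train("cheap pills buy now", is_spam=True)
train("win money fast today", is_spam=True)

target = "our response to the tender is attached below"
print(score(target))  # ≈ 0.01: the expected reply is classified as legitimate

# Attack: knowing the likely wording of the reply, the attacker floods
# the victim with messages repeating those terms plus camouflage words,
# and the victim dutifully marks each one as spam, training the filter.
for _ in range(20):
    train("our response to the tender is attached below "
          "unbeatable offer click here", is_spam=True)

print(score(target))  # ≈ 0.99: the same legitimate reply now reads as spam
```

Twenty poisoned training examples are enough here to push the target message from a score near 0 to one near 1; the camouflage words cost the attacker nothing, since they never appear in the target.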
This type of targeted attack is particularly effective because other legitimate e-mail continues to arrive normally; unless the victim has been alerted to the targeted messages through another channel, they will never realize that legitimate e-mails have been silently lost as false positives.
Beyond the targeted attack, the same technique can be used to turn a large number of non-targeted e-mails into false positives, by feeding the filter training e-mails composed from a dictionary of common terms, ideally in the victim's own language. The victim then either has to comb through their junk mail folder every time to catch the false positives, or decides to deactivate the anti-spam filter for good.
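The dictionary variant can be sketched the same way, with the same kind of toy token-based filter; the word list, message counts and training data are all invented for illustration. Because the poison messages are built from everyday words, almost any legitimate message later shares tokens with them.

```python
from collections import Counter

spam_tok, ham_tok = Counter(), Counter()
n_spam = n_ham = 0

def train(text, is_spam):
    global n_spam, n_ham
    tokens = set(text.lower().split())
    if is_spam:
        spam_tok.update(tokens); n_spam += 1
    else:
        ham_tok.update(tokens); n_ham += 1

def score(text):
    # Naive-Bayes posterior, equal priors, Laplace smoothing.
    odds = 1.0
    for t in set(text.lower().split()):
        p_spam = (spam_tok[t] + 1) / (n_spam + 2)
        p_ham = (ham_tok[t] + 1) / (n_ham + 2)
        odds *= p_spam / p_ham
    return odds / (1.0 + odds)

train("please find our response to the tender attached", is_spam=False)
train("meeting notes for the project review", is_spam=False)
train("cheap pills buy now", is_spam=True)
train("win money fast today", is_spam=True)

victim_mail = "thanks for the report and the meeting notes"
print(score(victim_mail))  # ≈ 0.04: an ordinary legitimate message

# Non-targeted attack: poison messages built from common words of the
# victim's language, each marked as spam by the unsuspecting victim.
dictionary = ("the a to is and for of meeting report project "
              "notes review attached thanks regards")
for _ in range(30):
    train(dictionary, is_spam=True)

print(score(victim_mail))  # ≈ 0.997: an unrelated legitimate mail now looks like spam
```

No knowledge of any specific target message is needed: the common words alone carry enough weight to contaminate almost everything the victim receives.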
The conclusion? A system based solely on a Bayesian filter, however effective, is unreliable. We can only recommend combining several anti-spam detection techniques, or entrusting the fight against spam to professionals who can react very quickly in the event of abnormal system behavior.