How do you fight spam? By combining several techniques, each fallible on its own but effective together.
Ever since the first spam message landed in Internet users’ inboxes, the fight against it has escalated as the technical capabilities of both sides have grown. Today, any anti-spam platform needs to combine several approaches to stem the flow of unsolicited mail. Each approach has its own shortcomings, however: it either lets spammers slip through the net or penalizes legitimate use of e-mail.
– the blacklist: used from the very start of the anti-spam war, this relies on a list of e-mail addresses, domain names or IP addresses known to send or relay spam. The list is stored on an MTA (mail transfer agent) or MUA (mail user agent), and every e-mail from a blacklisted entity is automatically deleted. In theory there are no false positives, since we’re certain that the entity is a vile spammer. The problem is that spammers can always switch to a new e-mail address, impersonate another entity, or even take control of one. Such a platform is also an easy target for denial-of-service (DoS) attacks. Last but not least, if a spammer manages to infiltrate a legitimate domain and gets it blacklisted, all of that domain’s normal users will subsequently see their mail rejected.
– the whitelist: the principle mirrors that of the blacklist, except that the list holds entities we are certain are trustworthy. This technique works perfectly on a closed system with a defined, limited set of users. On an open system, every e-mail from an unknown entity is systematically rejected, resulting in a high false positive rate. Moreover, a spammer who gains access to the list can easily impersonate any of the entities on it.
– keyword filtering: this technique is widely used but ineffective, because spammers simply use variants, synonyms or deliberate misspellings of the targeted words, while legitimate e-mails that happen to contain those words are rejected.
– reputation: a system monitors the e-mail traffic of an entity (domain, IP address, e-mail address), and a significant change in volume is taken as a sign of spamming. The problem is that by the time an entity is flagged, millions of spam messages have already been sent. And what about legitimate entities hijacked by spammers?
– hash/signature: a database on the MTA stores hashes of previously identified spam, and each new e-mail is compared against them. This makes it possible to detect spam fragments embedded in legitimate-looking messages. However, the spammer only needs to insert a random element (e.g. a timestamp) to change the hash. The other problem is the growing number of hashes to store and the time it takes to compare against them.
– header analysis: the problem with this technique is that a non-compliant header doesn’t necessarily mean spam; it can also point to a poorly configured server. Conversely, if the spammer controls a zombie (a compromised machine), the headers will look completely normal.
– heuristic analysis: several of the above techniques (header, signature, etc.) are combined to determine whether a message is spam. Here the battle is waged at the user level: it is the user who sets the threshold beyond which an e-mail is treated as spam, and the system takes a very long time to fine-tune before it reaches acceptable efficiency. Despite this and other problems (use of images, varied character encodings, etc.), this is the method that gives the best results.
– artificial intelligence: automated cognitive techniques and statistical methods enable the system to adapt and set thresholds on its own, without human intervention. It’s heuristic anti-spam analysis on steroids. There are other problems, however: the system has to be thoroughly developed and tested before it can be used, it is resource-intensive because of the complexity of the processing, and last but not least, it is not 100% reliable.
– micropayment: the sender must pay each time an e-mail is sent. The payment can be financial (a tiny sum per message, but a considerable one across millions of e-mails) or in resources (captchas, etc.). This system is not very viable, since providers who implement it will see their customers flee to free messaging services such as Gmail. And should sites that send out newsletters be charged? What about mailing lists? If you exempt them from payment, there’s nothing to stop a spammer from impersonating them.
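
To make a few of these trade-offs concrete, here is a minimal Python sketch of how a blacklist lookup, a keyword filter and an exact-hash signature check might be combined into a heuristic score with a user-chosen threshold. Everything in it is an assumption made for illustration: the function names, the sample domains and messages, the weights and the default threshold are hypothetical and do not come from any real anti-spam product.

```python
import hashlib

# All names, weights and sample data below are hypothetical illustrations.
BLACKLIST = {"spammer.example"}   # sender domains assumed to send spam
KEYWORDS = {"viagra", "lottery"}  # words treated as spam indicators


def signature(body: str) -> str:
    """Return an exact-match fingerprint (SHA-256 hex digest) of the body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


# Hashes of messages previously identified as spam.
KNOWN_SPAM_HASHES = {signature("You won the lottery! Claim now.")}


def is_blacklisted(sender: str) -> bool:
    """Check the domain part of the sender address against the blacklist."""
    return sender.rsplit("@", 1)[-1].lower() in BLACKLIST


def keyword_hits(body: str) -> int:
    """Count the distinct listed keywords that appear as whole words."""
    words = {w.strip(".,!?:;").lower() for w in body.split()}
    return len(words & KEYWORDS)


def spam_score(sender: str, body: str) -> float:
    """Combine the individual checks into one score (weights are arbitrary)."""
    score = 0.0
    if is_blacklisted(sender):
        score += 5.0
    if signature(body) in KNOWN_SPAM_HASHES:
        score += 5.0
    score += 2.0 * keyword_hits(body)
    return score


def is_spam(sender: str, body: str, threshold: float = 4.0) -> bool:
    """The user-chosen threshold decides when a message counts as spam."""
    return spam_score(sender, body) >= threshold
```

Under these assumptions, `is_spam("alice@example.org", "You won the lottery! Claim now. 1700000000")` returns `False` at the default threshold: the appended timestamp changes the hash, so only the keyword check fires. Lowering the threshold to 2.0 catches that message, but it also flags a legitimate one such as "Our office lottery results are in.", which is the false-positive trade-off that makes threshold tuning so slow.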