How is spam detected?

by Stephane

Spam analysis and detection procedures

A nuisance for users and a headache for administrators, spam is a real problem. Several techniques are available to eliminate these unwanted emails.

Spam accounts for more than two-thirds of all e-mail traffic and poses a serious problem for IT managers. The damage it causes is not limited to the influx of unwanted mail or the loss of legitimate mail: spam is also related to phishing, viruses and other malware, and scams. In this article, we present current spam control methods.
Many of these methods aim for the most accurate spam detection possible using learning and probabilistic techniques, including Bayesian classification, artificial neural networks and text compression. Of these, Bayesian filters are undoubtedly the most widespread, and they come in various forms: with or without pre-processing, with or without learning, as one member of a set of classifiers or operating in isolation.
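To illustrate the idea behind Bayesian classification, here is a minimal sketch of a naive Bayes filter. It is not the implementation of any particular product: the class name `NaiveBayesFilter`, the tokenizer and the add-one smoothing are assumptions chosen for the example, and a production filter would add pre-processing and a much larger training corpus.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical minimal naive Bayes spam filter (illustration only)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record the tokens of one message known to be 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability that the text is spam."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        # One extra slot so that words never seen in training keep a
        # small, non-zero probability (add-one smoothing).
        vocabulary_size = len(vocabulary) + 1
        total_messages = sum(self.message_counts.values())
        tokens = tokenize(text)

        log_scores = {}
        for label in self.LABELS:
            # Smoothed prior: share of training messages with this label.
            prior = (self.message_counts[label] + 1) / (total_messages + 2)
            log_score = math.log(prior)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                log_score += math.log((count + 1) / (total_words + vocabulary_size))
            log_scores[label] = log_score

        # Convert the two log scores back into a normalized probability.
        highest = max(log_scores.values())
        weights = {label: math.exp(score - highest) for label, score in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

After training on a few labelled messages with `train(text, "spam")` and `train(text, "ham")`, a message can be flagged when `spam_probability(text)` exceeds a chosen threshold such as 0.5. The threshold is a tuning decision that trades missed spam against lost legitimate mail.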

Techniques currently on offer include:
– router control: the SMTP flow is redirected to a specific gateway which applies detection by fingerprinting and Bayesian classification, resulting in detection accuracy of around 97%. Spam processing at network access routers also enables SMTP connections to be closed at this stage, saving network resources.
– proxy-based detection, which limits the flow of spam and protects mail servers through the use of blacklists and content-based sorting techniques. E-mails are processed at Layer 7 by an MTA proxy, which reduces spam flows by refusing e-mail transfer requests during periods of heavy load, thereby shielding the real MTAs from overload and spam attacks.
– spam detection on assembled e-mails based on content classification and fingerprinting. Because this technique only processes e-mails once they have been assembled, it relies on a software implementation. Assembly is also increasingly costly in terms of memory and processing capacity as network speeds and transport capacity increase.
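Fingerprinting, mentioned in several of the techniques above, can be sketched as follows. This is a simplified illustration under stated assumptions: the names `fingerprint` and `FingerprintStore` are hypothetical, and the fingerprint here is an exact SHA-256 hash of the normalized body, whereas real systems generally use fuzzier signatures that tolerate small changes in the message.

```python
import hashlib
import re


def fingerprint(body):
    """Return a hex digest identifying the normalized message body.

    Normalization (an assumption for this sketch) lowercases the text
    and collapses every run of whitespace into a single space, so that
    trivial layout changes do not alter the fingerprint.
    """
    normalized = re.sub(r"\s+", " ", body.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class FingerprintStore:
    """Hypothetical in-memory store of fingerprints of known spam."""

    def __init__(self):
        self.known_spam = set()

    def report_spam(self, body):
        """Remember the fingerprint of a message identified as spam."""
        self.known_spam.add(fingerprint(body))

    def is_known_spam(self, body):
        """Check whether an assembled message matches a known fingerprint."""
        return fingerprint(body) in self.known_spam
```

Because the hash is computed over the whole body, the message must be fully assembled before it can be checked, which is the memory and processing cost described in the last item of the list.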

It should be noted that detection based on the origin and routing of an e-mail (blacklist, sender history, etc.) runs into a major problem: lack of prior information. While an e-mail is being received, that is, during the SMTP session, the receiving MTA does not have the information it needs to tell legitimate transfers from spam. The reason is that outbound spam checks are almost never carried out, either on the sending MTA or on the relay MTAs, so the analysis is left entirely to the final recipient MTA. As a result, detection techniques based on content analysis are currently more effective than those based on a blacklist or history.
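To make the limitation concrete, the sketch below shows what an origin-based check typically has to work with during the SMTP session: the IP address of the connecting client and little else. It assumes the common DNS blacklist (DNSBL) convention, in which the IPv4 octets are reversed and prepended to the blacklist zone. The zone name `dnsbl.example.org` and the function names are hypothetical, and the article does not say which blacklist mechanism any given product uses.

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip, zone):
    """Build the DNS name to query for an IPv4 client in a DNSBL zone."""
    address = ipaddress.IPv4Address(client_ip)  # raises ValueError if invalid
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(client_ip, zone, resolve=socket.gethostbyname):
    """Return True if the blacklist zone has an entry for the client IP.

    `resolve` is injectable so the lookup can be replaced in tests;
    by default it performs a real DNS query.
    """
    try:
        resolve(dnsbl_query_name(client_ip, zone))
    except socket.gaierror:
        # No DNS record: the address is not on this blacklist.
        return False
    return True
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")` yields `99.2.0.192.dnsbl.example.org`. If the sending or relay MTAs have never flagged that address, the lookup says nothing about the message, and the decision falls back on content analysis at the recipient MTA.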
