All anti-spam technologies rest on a single foundation: distinguishing spam from legitimate mail. Beyond the criteria we humans rely on (we recognize spam primarily by its content), it's worth looking at some other important characteristics that set spam apart.
We currently handle spam in two ways: pre-acceptance and post-acceptance. With the former, spam is detected and blocked before it is received; with the latter, it is detected and deleted after it is received. Examples of the first approach include blacklists and greylists, authentication of the sending server, e-mail pricing and even legislation. Post-acceptance methods include Bayesian filters, collaborative filters, mail prioritization and challenge/response systems.
None of these methods is 100% effective: legitimate e-mails are sometimes classified as spam (false positives), and conversely, some spam slips through the cracks (false negatives). To reduce false positives, anti-spam tools need to get better at distinguishing spam from legitimate e-mail. In this article, we'll look at what sets the two apart in terms of traffic patterns and spammer behavior.
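To make the two kinds of error concrete, here is a minimal sketch that computes both rates from a set of messages whose true nature is known. The function name `error_rates` and the sample labels are hypothetical, chosen only for illustration; the false positive rate is measured over legitimate mail and the false negative rate over spam.

```python
from collections.abc import Sequence


def error_rates(actual_spam: Sequence[bool], flagged_spam: Sequence[bool]) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) for a spam filter.

    actual_spam[i] is True if message i really is spam;
    flagged_spam[i] is True if the filter classified message i as spam.
    """
    if len(actual_spam) != len(flagged_spam):
        raise ValueError("both sequences must have the same length")
    # Verdicts the filter gave on legitimate mail and on spam, respectively.
    on_legit = [f for a, f in zip(actual_spam, flagged_spam) if not a]
    on_spam = [f for a, f in zip(actual_spam, flagged_spam) if a]
    # False positive: a legitimate message wrongly flagged as spam.
    fp_rate = sum(on_legit) / len(on_legit) if on_legit else 0.0
    # False negative: a spam message that slipped through unflagged.
    fn_rate = sum(not f for f in on_spam) / len(on_spam) if on_spam else 0.0
    return fp_rate, fn_rate


# Made-up labels: four legitimate messages followed by two spam messages.
actual = [False, False, False, False, True, True]
flagged = [False, True, False, False, True, False]
print(error_rates(actual, flagged))  # (0.25, 0.5)
```

Here one of the four legitimate messages was flagged (a 25% false positive rate) and one of the two spam messages got through (a 50% false negative rate).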
The first distinction is frequency and timing. Legitimate e-mail traffic generally follows a well-defined daily and weekly cycle, peaking on working days and during working hours, while spam is sent at any time of day, on weekdays and weekends alike.
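One way to observe this pattern in a mailbox is to measure the share of messages that arrive during office hours. The sketch below assumes each message's arrival time is already available as a `datetime`; the working-hours window (Monday to Friday, 09:00 to 17:59), the function names and the sample timestamps are all hypothetical choices for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical definition of "working hours": Monday-Friday, 09:00-17:59.
WORK_DAYS = range(0, 5)   # datetime.weekday(): Monday == 0 ... Sunday == 6
WORK_HOURS = range(9, 18)


def working_hours_share(timestamps: list[datetime]) -> float:
    """Fraction of messages that arrived on a weekday during working hours."""
    if not timestamps:
        return 0.0
    in_office = sum(
        1 for t in timestamps if t.weekday() in WORK_DAYS and t.hour in WORK_HOURS
    )
    return in_office / len(timestamps)


def hourly_profile(timestamps: list[datetime]) -> list[int]:
    """Number of messages received in each hour of the day (index 0 to 23)."""
    counts = Counter(t.hour for t in timestamps)
    return [counts[h] for h in range(24)]


# Made-up arrival times; 4 March 2024 is a Monday, 9-10 March a weekend.
legit = [datetime(2024, 3, 4, 10, 15), datetime(2024, 3, 5, 14, 0),
         datetime(2024, 3, 6, 16, 30), datetime(2024, 3, 9, 11, 0)]
spam = [datetime(2024, 3, 4, 3, 0), datetime(2024, 3, 9, 22, 0),
        datetime(2024, 3, 10, 13, 0), datetime(2024, 3, 6, 10, 0)]
print(working_hours_share(legit))  # 0.75
print(working_hours_share(spam))   # 0.25
```

On real traffic, plotting `hourly_profile` separately for each class would show the daytime peak of legitimate mail against the flatter spam curve.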
The second distinction concerns arrival rates. The arrival of both spam and legitimate mail can be modeled as a Poisson process, but whereas the spam arrival rate is virtually constant whatever the period analyzed, the arrival rate of legitimate e-mail varies from one period to the next.
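A simple way to check whether a rate stays constant is the index of dispersion: count messages in equal-length windows (say, per hour) and divide the variance of those counts by their mean. For a Poisson process with a constant rate the two are equal, so the ratio hovers around 1; a rate that shifts between periods pushes it well above 1. This sketch assumes the per-window counts have already been tallied; the function name and the sample counts are made up for illustration.

```python
from statistics import mean, pvariance


def dispersion_index(counts_per_window: list[int]) -> float:
    """Variance-to-mean ratio of message counts in equal-length windows.

    For a Poisson process with a constant rate, the count in each window has
    variance equal to its mean, so the ratio stays close to 1. A rate that
    changes from window to window inflates the variance and pushes it above 1.
    """
    m = mean(counts_per_window)
    if m == 0:
        raise ValueError("no messages observed")
    return pvariance(counts_per_window, mu=m) / m


# Made-up hourly counts for six consecutive hours.
spam_counts = [7, 13, 10, 6, 14, 10]
legit_counts = [2, 25, 30, 3, 28, 2]
print(f"{dispersion_index(spam_counts):.2f}")   # 0.83
print(f"{dispersion_index(legit_counts):.2f}")  # 10.84
```

With only a handful of windows the estimate is noisy, so in practice one would compute it over days or weeks of traffic.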
The third difference is size: legitimate messages are larger than spam on average, and their sizes also vary more. This is due to the automated, non-spontaneous nature of spam, whose messages have more or less the same content when they come from the same batch. It's rare to see a spammer send out several hundred thousand messages with content of widely varying sizes.
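Both properties (the average and the spread) can be measured with the mean and standard deviation of message sizes. In this minimal sketch the function name and the byte counts are hypothetical; the spam sample mimics a batch generated from a single template.

```python
from statistics import mean, pstdev


def size_summary(sizes_in_bytes: list[int]) -> tuple[float, float]:
    """Return the mean and (population) standard deviation of message sizes."""
    return mean(sizes_in_bytes), pstdev(sizes_in_bytes)


# Made-up sizes: a spam batch built from one template vs. ordinary messages.
spam_sizes = [2000, 2010, 1990, 2000]
legit_sizes = [500, 3000, 12000, 4500]
for label, sizes in (("spam", spam_sizes), ("legit", legit_sizes)):
    avg, sd = size_summary(sizes)
    print(f"{label}: mean {avg:.0f} B, stdev {sd:.0f} B")
# spam: mean 2000 B, stdev 7 B
# legit: mean 5000 B, stdev 4287 B
```

In this made-up sample, the legitimate messages have both the larger mean and a far larger spread, which is the pattern described above.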
The number of recipients is the fourth significant difference. Although a single recipient is the most frequent case for both kinds of mail, spam is more often sent to multiple recipients than legitimate e-mail is.
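As a rough approximation, recipients can be counted from the `To` and `Cc` headers using Python's standard `email` package. This sketch is hedged in one important way: Bcc recipients and the SMTP envelope (`RCPT TO`) do not appear in these headers, so a mail server counting envelope recipients would get a more accurate figure. The function names and sample messages are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses


def header_recipient_count(raw_message: str) -> int:
    """Count distinct addresses listed in the To and Cc headers of a message.

    Bcc recipients and the SMTP envelope (RCPT TO) are not visible in the
    headers, so this undercounts messages whose recipients were hidden.
    """
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    return len({addr.lower() for _, addr in getaddresses(fields) if addr})


def multi_recipient_share(raw_messages: list[str]) -> float:
    """Fraction of messages addressed to more than one recipient."""
    if not raw_messages:
        return 0.0
    multi = sum(1 for m in raw_messages if header_recipient_count(m) > 1)
    return multi / len(raw_messages)


one_to_one = "From: alice@example.com\nTo: bob@example.com\nSubject: Lunch\n\nNoon?\n"
bulk = ("From: offers@example.net\nTo: a@example.com, b@example.com\n"
        "Cc: c@example.com\nSubject: Deal\n\nBuy now\n")
print(header_recipient_count(one_to_one), header_recipient_count(bulk))  # 1 3
print(multi_recipient_share([one_to_one, bulk]))  # 0.5
```

Comparing `multi_recipient_share` across a spam folder and an inbox would show whether the pattern holds for a given mailbox.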
What can we conclude from these characteristics? Legitimate mail is the product of a bilateral relationship, usually initiated by a human being, and serves as a vector for social interaction, whereas spam is the product of unilateral, automated actions whose sole aim is to reach as many targets as possible. These distinctions alone won't win the fight against spam or eliminate false positives entirely, but they are (or will be) taken into account by the various anti-spam methods and will help to better distinguish solicited from unsolicited mail.