Of the new ways of combating the influx of spam, those modeled on the body’s own immune system offer some of the most promising prospects.
The idea of taking inspiration from the immune system is not new: work on Artificial Immune Systems, which explores how the cognitive capabilities of the immune system can be used in computer science, dates back at least to the 1990s. The recent application of immune-inspired algorithms to the fight against spam follows naturally from this work. The similarities between spam and the hostile organisms that attack the body are clear:
– Spam is constantly evolving, just like viruses and bacteria: this year’s strain of flu virus is not the same as last year’s, and will change next year. Spam also changes in order to avoid detection by anti-spam content analysis techniques, either by paraphrasing the original content, or by using an alternative spelling, which may be incorrect but is perfectly understandable to the human eye, the sole target of spam (Ci@li$ instead of Cialis, for example).
– Spam can be identified by its content, thanks to a mechanism similar to that used by the immune system: pattern matching.
Comparing spam to a pathogen that anti-spam lymphocytes must identify and neutralize is therefore an apt analogy. As this approach is relatively recent, several avenues are being pursued at the same time.
The first uses regular expressions to filter by pattern. The anti-spam system is equipped with several detectors (the equivalent of lymphocytes), each with its own weight. This weight is increased when the detector binds to a pathogenic organism (when it recognizes an expression indicative of spam in an e-mail) and decreased when it recognizes a non-pathogenic organism (a normal message). After passing through the set of detectors, the e-mail is assigned a score equal to the sum of the positive and negative weights of the detectors that matched it. When the score exceeds a certain threshold, the message is considered spam. In the event of a false positive or false negative, the anti-spam system can be corrected by adjusting the weights of the lymphocytes that recognized the message.
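The weighted-detector scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in any of the experiments: the `Detector` class, the three patterns, the threshold of 1.5 and the feedback step of 0.5 are all hypothetical values chosen for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Detector:
    """One 'lymphocyte': a regular expression with an adjustable weight."""
    pattern: re.Pattern
    weight: float = 1.0

    def matches(self, text: str) -> bool:
        return self.pattern.search(text) is not None


def score(detectors, text):
    """Sum the weights of every detector that binds to the message."""
    return sum(d.weight for d in detectors if d.matches(text))


def is_spam(detectors, text, threshold=1.5):
    """A message is spam when its score exceeds the (assumed) threshold."""
    return score(detectors, text) > threshold


def feedback(detectors, text, actually_spam, step=0.5):
    """Adjust only the detectors that matched the message:
    raise their weight if it was spam, lower it if it was legitimate."""
    for d in detectors:
        if d.matches(text):
            d.weight += step if actually_spam else -step


# Hypothetical detectors; the first one tolerates spellings such as Ci@li$.
detectors = [
    Detector(re.compile(r"c[i1!][a@][l1][i1!][s$]", re.IGNORECASE)),
    Detector(re.compile(r"\bfree\b", re.IGNORECASE)),
    Detector(re.compile(r"\bmeeting\b", re.IGNORECASE), weight=-1.0),
]

print(is_spam(detectors, "Buy Ci@li$ now, free shipping"))
print(is_spam(detectors, "Free lunch after the meeting"))
```

A user reporting a misclassified message would translate into a call such as `feedback(detectors, message, actually_spam=True)`, which shifts the weights of the matching detectors without touching the others.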
A second approach, called SRABNET (Supervised Real-Valued Antibody Network), is based on an antibody network: an antibody is created for each class of expression considered to be spam (the expressions being the equivalent of antigens). Each antibody is weighted in proportion to the importance of its class in a batch of previously identified spam. When a message is presented to the system, the Euclidean distance between the antigen it carries and each antibody is calculated, and the antibody that best corresponds to the antigen is determined. The concentration level of this antibody in the ensemble is then increased, as is its weight. In addition, antibodies that identify the antigen with the lowest affinity (greatest Euclidean distance) are cloned, and antibodies that do not recognize any antigen are eliminated.
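The matching step can be illustrated with a short Python sketch. It is a simplification based only on the description above, not the published SRABNET algorithm: it assumes messages have already been turned into numeric feature vectors, it takes "the antibody corresponding to the antigen" to mean the closest one, the `Antibody` class and the `weight_step` value are hypothetical, and the cloning step is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class Antibody:
    """A point in feature space with a weight and a concentration counter."""
    vector: list
    weight: float = 1.0
    concentration: int = 0


def present(antibodies, antigen, weight_step=0.1):
    """Present one antigen: the antibody at the smallest Euclidean
    distance wins, and its concentration and weight are increased."""
    winner = min(antibodies, key=lambda ab: math.dist(ab.vector, antigen))
    winner.concentration += 1
    winner.weight += weight_step
    return winner


def prune(antibodies):
    """Eliminate antibodies that did not recognize any antigen."""
    return [ab for ab in antibodies if ab.concentration > 0]
```

After a batch of antigens has been presented with `present`, calling `prune` keeps only the antibodies that won at least once, which mirrors the elimination step described in the text.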
Other avenues are also being pursued, such as an anti-spam system capable of detecting concept drift (in this case, a change over time in the characteristics or distribution of spam), or one based on the innate immune system, the body’s oldest defense system, which responds in a generic way to pathogenic organisms.
The immune system analogy is certainly one of the most theoretically promising avenues in the fight against spam. However, all experiments to date have been carried out in the laboratory, comparing these technologies with more conventional detection methods on a batch of messages into which known spam has been inserted. While immune-inspired anti-spam systems may fare better than other techniques, or at least as well, it remains to be seen how they behave in the real world, when faced with the constraints of load, relevance and unpredictability that come with fighting spam. We can’t wait to see the first real implementations of this new technique.