Two databases are created: one for spam and one for ham (legitimate messages). During a learning phase, a dictionary of keywords is built in which each term is associated with a probability of indicating spam. For example: viagra 100%, security 20%, mail 10% and free 60%. Then, when an e-mail is analyzed, each lexicon keyword found in it contributes its probability, and the e-mail is assigned the average of these probabilities. Following our example, an e-mail containing the words “free mail server security” obtains a score of (20% + 10% + 60%) / 3, i.e. 30% (“server” is not in the lexicon and is ignored). The message is therefore considered legitimate, since its score is below 50%. With a large number of spam and ham messages to learn from, this technique yields good classification results.
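The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not ALTOSPAM's implementation: the lexicon values and the 50% threshold come from the example in the text, while the tokenizer, the choice to count each keyword once, the 0% score for messages with no known keyword, ties going to spam, and the names `spam_score` and `classify` are all assumptions made for the sketch.

```python
import re

# Example lexicon from the text: keyword -> probability (0.0 to 1.0) that it indicates spam.
# In practice these values would come from the learning phase on the spam and ham databases.
LEXICON = {"viagra": 1.00, "security": 0.20, "mail": 0.10, "free": 0.60}

# Messages scoring at or above 50% are treated as spam (ties going to spam is an assumption).
SPAM_THRESHOLD = 0.50


def spam_score(text: str, lexicon: dict[str, float]) -> float:
    """Return the average probability of the distinct lexicon keywords found in text.

    Words not in the lexicon (e.g. "server") are ignored. If no keyword is found,
    the score is 0.0 (an assumption; the text does not cover this case).
    """
    # Hypothetical tokenizer: lowercase the text and keep runs of letters only.
    words = set(re.findall(r"[a-z]+", text.lower()))
    found = [lexicon[w] for w in words if w in lexicon]
    if not found:
        return 0.0
    return sum(found) / len(found)


def classify(text: str, lexicon: dict[str, float] = LEXICON,
             threshold: float = SPAM_THRESHOLD) -> str:
    """Label a message "spam" or "ham" by comparing its score to the threshold."""
    return "spam" if spam_score(text, lexicon) >= threshold else "ham"


if __name__ == "__main__":
    msg = "free mail server security"
    # Prints "30% -> ham", matching the worked example above.
    print(f"{spam_score(msg, LEXICON):.0%} -> {classify(msg)}")
```

This sketch follows the simplified averaging described in this section; a full naive Bayes classifier instead combines the per-word probabilities using Bayes' theorem rather than a plain mean.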
Most e-mail clients with built-in anti-spam features (Thunderbird, Outlook, etc.) rely almost exclusively on Bayesian filters. In ALTOSPAM, Bayesian filtering is one of the 15 technologies used. The higher the score obtained (between 0 and 100%), the more likely the e-mail is to be classified as spam; the lower the score, the more likely it is to be classified as ham.