[KMail][Spam Detection] Check words against a dictionary

When I was younger, I wrote a simple spam detection script for Evolution. It was a bash script that computed the share of words in a mail that belong to my native language (native-language words / all words in the mail). If the result was, for example, below 30%, it moved the mail to the spam directory.
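To illustrate the idea, here is a minimal sketch of that ratio check in Python (the original was a bash script, which I no longer have). The names `native_word_ratio` and `is_spam`, and the word set passed in as `native_words`, are made up for this example; the 30% threshold is the example value mentioned above.

```python
import re


def native_word_ratio(text, native_words):
    """Return the fraction of words in `text` found in `native_words`."""
    # Extract runs of letters (Unicode-aware), ignoring digits and punctuation.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        # Nothing to judge, so do not penalise an empty mail (assumption).
        return 1.0
    known = sum(1 for word in words if word in native_words)
    return known / len(words)


def is_spam(text, native_words, threshold=0.30):
    """Flag the mail when too few of its words are native-language words."""
    return native_word_ratio(text, native_words) < threshold
```

In practice `native_words` would be loaded from a system word list; a lower-cased set keeps each lookup cheap.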

I decided to use this script because I do not read English newsletters and do not correspond with people from other countries.

Today this would no longer work, since AI can produce spam in any language.

But we can do something different: match the user name and the domain name against a dictionary. A lot of spam comes from domains such as feineuifnfui3fbgnu3fnb.com or from senders such as infeiuebefu@gmail.com. We can use this as a rule to detect spam. Simply split the address into user name and domain name. In the next step, split each part using the dot (and maybe other characters, such as hyphens) as a separator. In the last step, compare every resulting piece with a dictionary. The dictionaries are selected as follows:

  1. We look at the last part of the domain (com, pl, de).
  2. For a non-country domain, we select the English dictionary.
  3. For a country domain such as .pl, we select the Polish dictionary; for .de, we select the German dictionary.
  4. We always also compare with the native dictionary (the language selected in the system).
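The steps above could be sketched roughly like this in Python. Everything here is hypothetical: the function names, the tiny `COUNTRY_TLD_LANG` table, and the `dictionaries` mapping (language code to a set of lower-cased words) are made up for the example. Two further assumptions: the last domain part is only used to pick the dictionary and is not itself looked up, and a country domain without an entry in the table falls back to English.

```python
import re

# Illustrative mapping from country domains to dictionary languages.
COUNTRY_TLD_LANG = {"pl": "pl", "de": "de"}


def split_address(address):
    """Split 'user@domain' into (user, domain) at the last '@'."""
    user, _, domain = address.rpartition("@")
    return user, domain


def tokens(part):
    """Split a name on dots, hyphens, underscores and plus signs."""
    return [t for t in re.split(r"[.\-_+]", part.lower()) if t]


def pick_languages(domain, native_lang):
    """Rules 1-4: language from the last domain part, plus the native one."""
    tld = domain.rsplit(".", 1)[-1].lower()
    return {native_lang, COUNTRY_TLD_LANG.get(tld, "en")}


def unknown_tokens(address, dictionaries, native_lang):
    """Return the address pieces not found in any selected dictionary."""
    user, domain = split_address(address)
    langs = pick_languages(domain, native_lang)
    vocab = set().union(*(dictionaries.get(lang, set()) for lang in langs))
    # Drop the last domain part; it only selects the dictionary (assumption).
    domain_body = ".".join(domain.split(".")[:-1])
    return [t for t in tokens(user) + tokens(domain_body) if t not in vocab]
```

A non-empty result from `unknown_tokens` would then count as a hint for spam; whether a single unknown piece is enough, or only a certain share of them, is left open here.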

What do you think?

We could also add an option to select the languages in which the user expects mail. For example: I know German and a little English, so I select German and English. If a mail comes in Italian, we mark it as spam.
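A minimal sketch of such an option, assuming the mail language is guessed simply by counting dictionary hits per language (a real implementation might use a proper language detector instead). The names `guess_language` and `is_unexpected_language`, and the `dictionaries` mapping from language code to word set, are made up for this example.

```python
import re


def guess_language(text, dictionaries):
    """Return the language whose dictionary matches the most words, or None."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return None
    scores = {
        lang: sum(1 for word in words if word in vocab)
        for lang, vocab in dictionaries.items()
    }
    best = max(scores, key=scores.get)
    # With no hits at all, the language is treated as unknown.
    return best if scores[best] > 0 else None


def is_unexpected_language(text, dictionaries, accepted):
    """True when the guessed language is not one the user selected."""
    lang = guess_language(text, dictionaries)
    return lang is not None and lang not in accepted
```

With `accepted = {"de", "en"}`, a mail guessed as Italian would be marked as spam, while a mail whose language cannot be guessed is left alone.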