Smart SPAM & Fighting it
For any machine-learning-based SPAM filter, such as the popular Bayesian methods, the key to success is the training data: the body of previously identified SPAM and HAM (valid emails). To trick the filter, a spammer must make their messages more HAM-like. The way to beat this is to give your email classifier as much training data as possible and to keep updating it. Learning only from your own company’s emails is probably not fool-proof when you consider the volume and variety of SPAM on the net. Web-based email, on the other hand, like Gmail and its hosted version, should never have this problem because the filter learns from thousands of users’ SPAM folders.
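To make the role of training data concrete, here is a minimal sketch of a naive Bayes filter in Python. It is an illustration only, not how any particular product works: the class name NaiveBayesFilter, the whitespace tokenizer, the add-one smoothing, and the tiny made-up training messages are all assumptions chosen for brevity. It shows a SPAM message scoring lower once it is padded with HAM-like words, and the score recovering once the padded message is itself added to the training data.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy naive Bayes SPAM/HAM classifier (hypothetical, for illustration)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Tokenize on whitespace; a real filter would do much more here.
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Start from the log prior for this class.
            score = math.log(self.message_counts[label] / total_messages)
            for word in text.lower().split():
                if word not in vocab:
                    continue  # words never seen in training carry no evidence
                # Add-one (Laplace) smoothing avoids zero probabilities.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
            log_score[label] = score
        # Convert the two log scores into P(spam | text).
        return 1.0 / (1.0 + math.exp(log_score["ham"] - log_score["spam"]))


spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap pills online now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("please review the project report", "ham")
spam_filter.train("lunch on monday", "ham")

plain = "buy cheap pills"
padded = "buy cheap pills meeting agenda project report monday"

plain_score = spam_filter.spam_probability(plain)
padded_score = spam_filter.spam_probability(padded)  # HAM-like padding lowers the score

# Once the padded message is reported as SPAM, the filter adapts.
spam_filter.train(padded, "spam")
retrained_score = spam_filter.spam_probability(padded)

print(f"plain: {plain_score:.2f}  padded: {padded_score:.2f}  "
      f"after retraining: {retrained_score:.2f}")
```

The more SPAM folders such a filter can learn from, the sooner a new padding trick shows up in its training data, which is the advantage claimed above for large web-based services.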
Researchers from the University of Calgary claim that the next evolution of SPAM will be smart SPAM, which will infiltrate your computer via spyware or viruses and ‘mine’ your emails. By creating emails based on messages you have actually sent, the spammers hope they will be more believable to readers.
I would argue, however, that such a situation would merely make services like Gmail more attractive. Firstly, because they have a truly massive body of knowledge to use to fine-tune their spam filters, and secondly, because it is unlikely such spyware could infiltrate a web-based system. Even if a program were distributed that waited for someone to log on and then took over, Google could have it effectively neutralised in a matter of hours.
March 22nd, 2007 at 10:10 am
Interesting ideas.
I guess I am not as hopeful as you that a centralized solution like Gmail would be smart enough quickly enough to defeat spammers.
For example, I get a lot of spam emails with near-duplicate but natural enough looking content. The problem is that the content (which has been harvested from the web) is not the real message (which is often for male performance-enhancing drugs), and it swamps the Bayesian filters. I am not sure that “Bayesian-like” classification tools will ever be smart enough for this.
I think I have a blog entry on this .. no, I just checked, it is not up, so I will put it up now. Here it is http://dsanalytics.com/dsblog/why-bayesian-spam-filters-are-doomed_92
btw, the medium of choice for people under 20 seems to be texting. Email is seen as old hat, and when it is used it is usually through e.g. Hotmail, which does seem to do a good job of spam suppression.
Email is still, however, the medium of business communication, although I am intrigued by the Gmail idea of conversations (not universally loved though - see http://philwilson.org/blog/2005/06/gmail-conversations.html)