It was a good chance to put into practice some of the theories I had already formed about the Internet. I had decided that since the Internet was built from open source software, a site which used and promoted open source would get listed in the search engines and would succeed in promoting itself.
All my assumptions proved correct. Eventually my site brought me a considerable amount of work.
Now looking after three kindergarten-age kids is pretty much a full-time job, but I still had enough spare time on my hands to investigate the phenomenon of spam. This led to the "Spam Diaries", which helped promote my site and even gained me a little notoriety as an anti-spam campaigner.
But if I had my time all over again, one of the things I would never do is put my email address in a public space. At the time I did it because I took a rather combative approach to defending my inbox. But life is too short to spend that much time on such an exercise. I realise now that I should have created a special spam-catching address for my Spam Diaries adventures, something like firstname.lastname@example.org, and circulated my real address only to the people who needed to know it.
At first the spam trickled in. And I went after every spammer who dared to send me anything. But as I lost interest in the project, the spam still kept coming. And it kept on growing.
I still stand by my original assertions that spam is essentially bad manners. It is an abuse of the network. People who understand the extent of the abuse feel particularly aggrieved. And the outrage of some of those early pioneers has flowed into the mainstream and helped fuel the rather misguided attempts to take some sort of legal action against spammers. I think this is mainly due to the litigious culture of the USA, the home of the Free and the Brave, the Internet and the fountain of the new gospel of prosperity.
The rather foolhardy attempts at outlawing spam have, if anything, emboldened and entrenched the criminal elements in the practice. Spamming was originally carried out by techies, propeller-heads and geeks. They were DIY bad guys with an attitude and (often) bad hygiene, poor diet and a tendency towards obesity.
But these days the real bad-guys have moved in. Organised crime now provides the funds for spamming. The practice is still just a lack of courtesy but the wares are mostly illegal, immoral, fraudulent and/or dangerous.
In the days immediately following the dot.com crash, there were lots of unemployed programmers and tech-heads. And the educational institutes were pouring out new graduates into the barren waste-land left after the crash. So it was possible to hire programmers on the black market if you were a criminal who did not have the necessary expertise to set up your own spamming enterprise.
In the latter half of this decade there has been a resurgence of demand for skilled programmers. The legitimate market is making it tough for the black market. And unless you just enjoy working with the dark side of the force, there just aren't the same incentives for illegal operations.
That's why a lot of the spam these days comes from Eastern Europe, India and China. Programmers are still pretty cheap there and for the time being criminals will look for the best deal (not just criminals -- a lot of corporations are looking also -- but that is another story).
But this is where the hand-spammers make a contribution. Hand Spam is the name I give to the spam that is entered (mostly by humans) into web forms or addressed individually to various sites. It's not really spam at all, but the wages that hand-spammers get paid are so low that they can be competitive with programmers. A lot of the programmers who turned towards the dark side aren't terribly bright and, like a lot of criminals, often have an exaggerated sense of their own cleverness and self-importance. The quality of their work is often sub-standard, and the spam is easily blocked by Bayesian filters, RTBLs (real-time blackhole lists) or a combination of both. Human operatives take that little bit of pride in their individual efforts, like craftsmen, and despite the very low wages that they get paid for their efforts, the results of their individually crafted labour will make their way through the various computer defences set up around a mailhub.
That's humans for you! They really are an ingenious species. Maybe there's hope for them still?
Of course the automated stuff is still coming. Microsoft Spam zombies have really revolutionised the industry.
I have had to resort to a combination of block lists and a Bayesian filter. The filtering technology that I use is SpamAssassin, the well-known Perl module. This is not really intended for blocking spam, but for filtering it, although it can be adapted for blocking. You can install it on Ubuntu by typing:
sudo apt-get install spamassassin
If you have configured apropos, then the following command will list the names of the SpamAssassin manual pages (which are generated from its perldoc):
man -k SpamAssassin
SpamAssassin Tools: The following Perl scripts provide an interface to the SpamAssassin module.
- sa-learn: Utility that learns and/or unlearns spam.
- sa-update: Admin utility to get updates for SpamAssassin.
- spamassassin: The main interface to SpamAssassin.
There is also an sa-compile utility which most people don't use.
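As a rough illustration of how sa-update might be wrapped for a nightly cron job, here is a minimal sketch. The function name update_rules is my own invention. The meaning of the exit statuses (0 for "new rules installed", 1 for "nothing new") is taken from the sa-update manual page, so check it against your installed version.

```shell
# Run sa-update and report what happened.
# Assumption: exit status 0 means new rules were installed, 1 means no
# fresh updates were available, and anything higher is an error.
update_rules() {
    status=0
    sa-update || status=$?
    case $status in
        0) echo "rules updated" ;;
        1) echo "no new rules" ;;
        *) echo "sa-update failed with status $status" >&2
           return "$status" ;;
    esac
}

# Hypothetical usage, run as root:
#   update_rules
```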
The sa-learn utility is a Perl script that gives an interface to SpamAssassin's learning features. Mail is defined as spam or ham (ham being legitimate mail).
SpamAssassin is much more effective if you give it large samples of both spam and ham. After it learns how to categorise mail, spam is marked up as such, whereas ham is left unmolested.
To teach the Bayesian filter from email stored in mbox format (e.g. a folder file from mutt), use the following command. I actually invoke this for each user, using the sudo command as root.
sa-learn --mbox --spam /foo/bar/spam
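The per-user invocation might look something like the sketch below. The function name learn_for_user and the mbox file names "spam" and "ham" are placeholders of mine, not the actual layout on my server. It also feeds in ham, since the filter needs samples of both.

```shell
# Train one user's Bayes database from their spam and ham mbox files.
# Assumptions: each user keeps mbox files called "spam" and "ham" in a
# single mail directory; both names are placeholders.
learn_for_user() {
    user=$1
    mail_dir=$2
    # -u runs sa-learn as that user, -H sets HOME so the user's own
    # SpamAssassin database is the one that gets updated.
    sudo -u "$user" -H sa-learn --mbox --spam "$mail_dir/spam"
    sudo -u "$user" -H sa-learn --mbox --ham "$mail_dir/ham"
}

# Hypothetical usage, run as root:
#   learn_for_user alice /foo/bar
```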
In each user's home folder I put a .procmailrc with a recipe that matches on the header SpamAssassin adds to flagged mail:
* ^X-Spam-Flag: YES
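That condition is only one line of a procmail recipe. For anyone starting from scratch, here is a minimal sketch of a complete file, written out by a small shell function. The function name write_procmailrc, the MAILDIR value and the "spam" mailbox name are all assumptions for illustration. It also assumes procmail itself pipes the mail through spamassassin, which may not match how your mail is tagged.

```shell
# Write a minimal .procmailrc into the given home directory.
# The MAILDIR value and the "spam" mailbox are illustrative assumptions,
# not the configuration described in the text.
write_procmailrc() {
    home_dir=$1
    cat > "$home_dir/.procmailrc" <<'EOF'
# Assumption: mail lives in mbox files under ~/mail (directory must exist).
MAILDIR=$HOME/mail

# Pipe each message through SpamAssassin so it gains X-Spam-* headers.
:0fw
| spamassassin

# File anything SpamAssassin flagged into the "spam" mbox.
:0:
* ^X-Spam-Flag: YES
spam
EOF
}

# Hypothetical usage, run as root for an existing user:
#   write_procmailrc /home/alice
```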
There's a bit more to it of course. I also have a script which moves the data to the webmail interface. And I am working on another script that reads IP addresses from the suspect file and places them into the Postfix access database (basically puts them in the sin bin for 24 hours). This will mean that any repeats can be blocked by the MTA (rather than being routed by SpamAssassin). And there is a confirmation script which moves the few spams that get through to a spam folder so that the Bayesian filter can learn about them as well.
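Since the sin-bin script is still a work in progress, here is only a rough sketch of its core step: turning a file of suspect addresses into Postfix access-map lines. The function name suspects_to_access_map and the file paths are hypothetical, and it assumes the suspect file is plain text containing IPv4 addresses. The 24-hour expiry is not handled here.

```shell
# Print one Postfix access-map line per unique IPv4 address in a file.
# Assumption: the suspect file is plain text containing dotted-quad
# addresses; anything else on a line is ignored.
suspects_to_access_map() {
    grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1" | sort -u | sed 's/$/ REJECT/'
}

# Hypothetical usage (placeholder paths; needs root and Postfix):
#   suspects_to_access_map /foo/bar/suspects > /etc/postfix/client_access
#   postmap /etc/postfix/client_access
```

For Postfix to consult the map, main.cf would also need something along the lines of smtpd_client_restrictions = check_client_access hash:/etc/postfix/client_access, again with a path of your own choosing.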
Despite my best efforts, however, I still can't get below three spam emails per week. The sheer volume has increased considerably, though. The various defences that I have in place are now fending off hundreds of emails every day.
Recent estimates suggest that spam now accounts for the majority of email traffic. A lot of it is just swatted down, but enough of it must be getting through to its targeted audience for the practice to continue.