Blog

Comments working again

I finally did it and wrote the spam filter that I had promised a while back. It was less work than I thought, actually. Anyway, you can now write comments again.

The filter is a so-called Naive Bayes filter. It calculates the probability that a comment is spam, based on how often the words in the comment were observed in spam comments and in normal comments. The implementation generally follows the English Wikipedia article on the subject, without any additional heuristics for rare words and the like.

If anyone cares, I can post the code for you all to read. It isn’t that much. The most important thing I noticed is that the spam filter can go crazy if it finds a word that has never been seen as spam or never been seen as not-spam: a count of zero turns into a probability of zero, which then dominates the whole result. This is the so-called zero-frequency problem. To solve it, whenever I add a new word, I first set both its spam sightings and its not-spam sightings to one (and then add one more for whatever I actually saw it as). This makes the results slightly less accurate, but it remains good enough to work.
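To illustrate the idea, here is a minimal Python sketch. It is not the code running on this site, only an approximation under some assumptions: the class name `NaiveBayesFilter`, the tokenizer, the choice to count each word once per comment, equal prior probabilities for spam and not-spam, and skipping words the filter has never seen are all made up for this example. The only part taken directly from the description above is that every new word starts with one sighting in each category.

```python
import math
import re


class NaiveBayesFilter:
    """Illustrative sketch only; names and details are assumptions."""

    def __init__(self):
        # word -> [sightings as spam, sightings as not-spam]
        self.counts = {}

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase words, each counted once per comment.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, is_spam):
        for word in self.tokenize(text):
            if word not in self.counts:
                # New word: start both counters at one, so neither
                # probability can ever become zero.
                self.counts[word] = [1, 1]
            # Then add one more for whatever it was actually seen as.
            self.counts[word][0 if is_spam else 1] += 1

    def spam_probability(self, text):
        spam_total = sum(spam for spam, _ in self.counts.values())
        ham_total = sum(ham for _, ham in self.counts.values())
        log_odds = 0.0  # equal priors assumed, so we start at even odds
        for word in self.tokenize(text):
            if word not in self.counts:
                continue  # assumption: unknown words carry no evidence
            spam, ham = self.counts[word]
            # Summing logs avoids underflow from multiplying many
            # small probabilities.
            log_odds += math.log(spam / spam_total) - math.log(ham / ham_total)
        # Clamp so that math.exp cannot overflow on extreme inputs.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

With a filter trained on a handful of comments via `train(text, is_spam)`, `spam_probability(text)` returns a value between 0 and 1; a comment made up entirely of unknown words comes out at exactly 0.5 in this sketch.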

Currently the filter has three levels. If the probability that a comment is spam is higher than 95%, the comment isn’t even written to the database, but rejected immediately. A comment with a spam probability of more than 70% is saved, but remains hidden until I’ve decided whether it is spam or not. Every time I make such a decision, the spam filter gets trained a little bit to become more accurate. Of course, I may have to change these thresholds in the future.
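As a rough sketch in Python, the decision could look like the following. The names `Verdict` and `classify` are invented for this example, and the third level (publishing everything at or below 70% right away) is my reading of "three levels", not something spelled out above.

```python
from enum import Enum

# Thresholds from the text; they may change in the future.
REJECT_THRESHOLD = 0.95
HOLD_THRESHOLD = 0.70


class Verdict(Enum):
    REJECT = "reject"    # never written to the database
    HOLD = "hold"        # saved, but hidden until reviewed by hand
    PUBLISH = "publish"  # assumption: shown right away


def classify(spam_probability):
    """Map a spam probability (0.0 to 1.0) to one of three levels."""
    if spam_probability > REJECT_THRESHOLD:
        return Verdict.REJECT
    if spam_probability > HOLD_THRESHOLD:
        return Verdict.HOLD
    return Verdict.PUBLISH
```

Both comparisons are strict, matching "higher than 95%" and "more than 70%", so a comment at exactly 95% would be held rather than rejected.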

New Car

I had not originally planned to bother you with a blog post for this, but Björn argues that I ought to. So behold, my new car!

A grey three-door hatchback.

To provide some background, my previous car was damaged beyond repair in a small accident [1], which made me rather sad. This new car, however, cures that. It is an Alfa Romeo 147, built in 2004, with a 2.0 TwinSpark engine (petrol, 150 hp), manual transmission and lots of extras such as leather seats, lowered suspension, a sunroof, ESC, ASR and lots of other things I don’t technically need, but really enjoy having.

It needs a new radio, for iPod compatibility (I am currently using an FM transmitter, and while it works, it is not exactly an ideal experience). In every other respect, it is great and I love it.


  1. Nobody was hurt. The accident wasn’t my fault either; I was not even in the car when it happened. Needless to say, I will be more reluctant when it comes to lending this car to others.

Spam fighting

I used to fight spam in comments here using a simple blacklist, which contained words that only spambots used and that were hence forbidden here. That worked pretty well. Now, however, I seem to be locked in some sort of epic struggle with a spambot that posts very similar comments, but has an astonishing variety of expressions at its disposal and thus manages to slip through again and again.
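For comparison, a blacklist of this kind takes only a few lines. The Python sketch below is illustrative, not the code that ran here: the function name `is_blacklisted` and the example words are placeholders, and the real list is not reproduced.

```python
import re

# Hypothetical placeholder entries; the actual blacklist is not shown here.
BLACKLIST = {"viagra", "casino"}


def is_blacklisted(comment, blacklist=BLACKLIST):
    """Return True if the comment contains any forbidden word."""
    words = set(re.findall(r"[a-z0-9']+", comment.lower()))
    return not words.isdisjoint(blacklist)
```

The weakness is easy to see: the check only catches exact words on the list, so a bot with a large enough vocabulary simply avoids them.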

To stop this, I decided to write a more advanced, learning spam filter, similar to those used in e-mail clients. This isn’t as hard as it sounds, since there are tutorials online, but to train it I need training data, i.e. actual and annoying spam.

I don’t store spam the way I store normal messages, so to get training data I have now deactivated my old spam filter completely. Since I don’t want lots of random crap to appear on this site, though, I have disabled the display of new comments. They are stored in the database, but not shown anymore. How long this will take I don’t know (it is possible that the spambot stops when it sees that none of its posts appear anymore, believing the site to be broken), so please don’t be surprised if no new comments appear for a while!

By the way: I haven’t posted any new pictures here lately. That is because I plan to change that part of the site a lot. At the moment, you can find all my new railroad pictures only at http://zcochrane.deviantart.com/gallery/, though I plan to post them here as well once I’m done.

Sensors

First of all, sorry it took so long: I have finally put up a PDF version of the slides from my presentation about my bachelor thesis on March 27th, 2010. The movie in there won’t play, this being a PDF and all, but it really is just the same as in my last blog entry about this, minus the “real” part, because that just didn’t look good enough to be shown on a huge projector.

Now, about my current status. There are a number of official milestones, which are listed in the slides and the proposal, but I think just as important, if not more, are the informal milestones. This simulator just hit the most important one for me: It is fun to play with it. This is such an important step because “This really annoys me” is a far more precise and useful sentiment than “I guess I could do it that way as well”, and you won’t get it if you don’t play with the program.

The important piece was to implement sensors, which are now fully supported. Let me give you a short overview: