spam filters as a general sorter
zhouhui at wam.umd.edu
Sat Mar 12 06:17:47 PST 2005
I have been reading Paul Graham's essays on spam filters and am amazed at
the effectiveness of his statistical filters.
I haven't encountered a big spam problem (I guess I am not popular
enough yet). However, I do get a huge amount of mail in my
mailboxes: tons from mailing lists, and quite a few from my banks, my
universities, my friends, and a bunch of opt-in promotions, alerts,
etc. Most of it doesn't qualify as spam, but a large percentage of it
I don't want to read promptly, and some of it I
only read from time to time and skip most of the time.
My current strategy is to use procmail to sort my mail into different
mailboxes (over a dozen atm and growing). However, it still
annoys me because, for example, my most often read mailbox, lfschat,
still contains only a very small portion of mail that I am really
interested in reading.
So while reading Paul's essay, I got this idea: apply the statistical
filter to all my mail, sorting it not into just two categories but into
several, such as Spam, Interesting, Advertisement, AccountUpdate,
StrangeLogEventsAndAlerts, PrivateMustRead, MildInterest, etc.
Obviously, the simple-minded token treatment in Paul's essay may not
be very effective for non-spam categories, but without actually
trying it out, who knows? It may amaze me.
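A rough sketch of what trying it out could look like, under some loud
assumptions: this is a plain multinomial naive Bayes classifier with
add-one smoothing, not the exact two-category formula from Paul's essay,
and the names MultiCategoryFilter, tokenize, train and classify are made
up here for illustration. The training messages are invented too.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a very simple tokenizer (letters, digits, $, ', -), lowercased.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


class MultiCategoryFilter:
    """Hypothetical N-category token filter (multinomial naive Bayes)."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # category -> token -> count
        self.message_counts = Counter()           # category -> messages seen
        self.vocabulary = set()                   # every token seen in training

    def train(self, category, text):
        tokens = tokenize(text)
        self.token_counts[category].update(tokens)
        self.message_counts[category] += 1
        self.vocabulary.update(tokens)

    def scores(self, text):
        """Return a log-probability score per trained category."""
        tokens = tokenize(text)
        total_messages = sum(self.message_counts.values())
        # One extra slot so tokens never seen in training get a nonzero share.
        vocab_size = len(self.vocabulary) + 1
        result = {}
        for category, n_messages in self.message_counts.items():
            counts = self.token_counts[category]
            total_tokens = sum(counts.values())
            log_prob = math.log(n_messages / total_messages)  # category prior
            for token in tokens:
                # Add-one (Laplace) smoothing of the per-category token frequency.
                log_prob += math.log((counts[token] + 1) / (total_tokens + vocab_size))
            result[category] = log_prob
        return result

    def classify(self, text):
        scores = self.scores(text)
        if not scores:
            raise ValueError("filter has not been trained on any category")
        return max(scores, key=scores.get)


# Invented training data, using a few of the categories listed above.
f = MultiCategoryFilter()
f.train("AccountUpdate", "Your bank statement for March is now available online")
f.train("AccountUpdate", "Your account balance has been updated")
f.train("Advertisement", "Special offer: save 20% on all orders this week")
f.train("Advertisement", "Limited time sale, free shipping on every order")
f.train("Interesting", "Patch for the kernel build failure in chapter 6")
f.train("Interesting", "How I fixed the glibc build error with gcc")
print(f.classify("Your statement and balance are available"))
```

Whether this kind of bag-of-tokens scoring separates something like
MildInterest from Interesting is exactly the open question; a real test
would need a few hundred hand-sorted messages per category rather than
two.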