More thoughts Re: spam filters as a general sorter

Jason Gurtz jason at tommyk.com
Sat Mar 12 07:07:46 PST 2005


On 12-Mar-05 09:29, Hui Zhou wrote:

> Reading my own mail (this one that I just sent :)) and I realize that 
> simple token treatment definitely won't work well enough to sort 
> my post into interesting (How shameless :). It may work for 
> categorization of regular notifications and alerts, but for a general 
> chatting list, something more needs to be taken into account. Maybe 
> the length of the original post? or the proportion of quotes to reply? 
> or the average length of sentences?

I think the hard part is really to come up with the heuristics that do
the sorting.  Beyond that, it's just separating those heuristics into
classes that each do the sort.  I personally find it harder to come up
with regexes that generically match non-spam mail because I seem to
think more in terms of what I don't want.  Maybe you can take a similar
approach, with a hierarchy from "least want to read" to "most want to read".

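To make that a bit more concrete, here is a rough sketch in Python of the
three measurements you suggested (reply length, proportion of quoted lines,
average sentence length) feeding a hierarchy that is checked from "least
want to read" upward.  The function names, the level names and every
threshold are made up for illustration and would need tuning against real
list traffic.

```python
import re


def message_features(body: str) -> dict:
    """Compute a few simple measurements from a plain-text message body."""
    lines = [ln for ln in body.splitlines() if ln.strip()]
    # Lines starting with ">" are treated as quoted text.
    quoted = [ln for ln in lines if ln.lstrip().startswith(">")]
    reply_text = " ".join(ln for ln in lines if not ln.lstrip().startswith(">"))
    # Crude sentence split: terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"[.!?]+\s+", reply_text) if s.strip()]
    words = reply_text.split()
    return {
        "reply_words": len(words),
        "quote_ratio": len(quoted) / len(lines) if lines else 0.0,
        "avg_sentence_words": len(words) / len(sentences) if sentences else 0.0,
    }


def interest_level(features: dict) -> str:
    """Walk the hierarchy from 'least want to read' upward.

    All thresholds below are arbitrary placeholders, not tested values.
    """
    if features["reply_words"] < 5:        # one-liners, "+1", "me too"
        return "low"
    if features["quote_ratio"] > 0.8:      # nearly all quoted text
        return "low"
    if features["avg_sentence_words"] >= 8 and features["quote_ratio"] < 0.5:
        return "high"                      # mostly original, full sentences
    return "medium"
```

Something like interest_level(message_features(body)) would then give you
one label per message to sort on.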
You may even want to look at something like MIMEDefang, which gives you
access via Perl to many different message qualities: number of
recipients, time it was sent, envelope From:, etc.  That may give you
more options in developing the heuristics, and then you can just use it
to add a custom header which procmail will then use for its sorting job.
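
The header-tagging step on its own is small.  As an illustration only, here
it is with Python's standard email module standing in for the filter; this
is not MIMEDefang's Perl interface, and the header name X-Interest-Level and
the function tag_message are invented for the example.

```python
from email import message_from_string


def tag_message(raw: str, level: str) -> str:
    """Return the message with a single X-Interest-Level header set.

    X-Interest-Level is a hypothetical header name chosen for this sketch.
    """
    msg = message_from_string(raw)
    # Remove any existing copy so a re-filtered message carries one value;
    # deleting a header that is absent is a no-op.
    del msg["X-Interest-Level"]
    msg["X-Interest-Level"] = level
    return msg.as_string()
```

A procmail recipe could then match on that one header line and file the
message into the corresponding folder.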

Sounds like an interesting project anyway.

~Jason

-- 



More information about the lfs-chat mailing list