How Does phpSpamManager Work?
v0.53
Guillaume Meister
20060706
g dot meister at sfig dot fr

Abstract

phpSpamManager (phpSM) is a maildir parser and spam analyser.
Its main goal is to help feed SpamAssassin's spam and ham learning tool.

Requirements

phpSM requires nothing other than its own files to be installed and configured on the mail server (or gateway).
phpSM requires PHP4 to be installed and properly configured.
phpSM requires that the mails stored in the maildir it parses be readable by the web server hosting it.
(On Unix, a file mode of 744 works but is insecure. It is better to grant access through the group, with the PHP/Apache user belonging to the group that owns the files. Note that a group digit of 4 grants read only; read-write would be 6.)

Main structure

phpSM uses 5 main files:
- index.php : parses the selected dirs and displays all messages in a table
- actions.php : processes the selected action on the selected messages
- detail.php : shows all mail header fields in full detail, without the body
- body.php : displays the mail body as well as it can
- lasts.php : shows the list of latest messages, over the time frame defined in include/phpsm.cfg

phpSM uses 6 include files:
- phpsm.cfg : default values
- config.inc.php : sets up the main configuration (and the regular expressions for parsing mail files)
- procs.inc.php : library of PHP procedures used by the main files
- resource.inc.php : internationalization support
- parsemail.inc.php : SquirrelMail mail parser class
- htmlfilter.inc.php : HTML sanitizer code

phpSM uses language resource files:
- lang_en.res : English text
- lang_fr.res : French text
(all resource files should define exactly the same variables)

phpSM uses one CSS file and one JavaScript library:
- styles.css
- generic.js

How phpSM is used

When calling phpSpamManager using a web browser, a menu page is displayed, offering 4 choices :

1) SEARCH : displays search engine
2) SHOW SELECTED EMAILS : not implemented
3) SHOW LAST EMAILS : displays last received emails
4) DISCONNECT : not implemented

The search engine is fairly complete but somewhat complex, especially the "Max file number" (search engine) and "Max file displayed" (search result) options:

- Max file number sets how many files (emails) within the time frame are parsed. "0" means "parse ALL files"; this can take a long time, so use it only when searching for a specific email.
- Max file displayed sets how many emails are displayed per page, among all the mails parsed within the time frame.

TIP: the "Sender", "Recipient" and "Subject" search fields are regular expressions. Typing a word in a field matches that word anywhere in the corresponding field of the parsed emails, and if you are comfortable with regular expressions you can use them here (PHP syntax).

What emails can phpSM parse and display

Any SMTP email can be parsed, but phpSM focuses on the spam score reported by SpamAssassin (or, better, by amavisd calling SpamAssassin).

Here is a sample mail header that phpSpamManager will parse and render:

Return-Path: <toto@wanadoo.fr>
X-Original-To: statsquarantine@localhost
Delivered-To: statsquarantine@localhost
Received: from localhost (localhost.localdomain [127.0.0.1])
    by hermes (Postfix) with ESMTP id 6377437F19
    for <belleville@titi.com>; Thu, 6 Jul 2006 14:04:15 +0200 (CEST)
Received: from meteor.synten.com (ns4.synten.com [193.47.141.42])
    by hermes.sfig.fr (Postfix) with ESMTP id 1520137F5F
    for <belleville@titi.com>; Thu, 6 Jul 2006 14:04:14 +0200 (CEST)
Received: from GFT2 (gft2.synten.com [193.47.141.142])
    by meteor.synten.com (8.13.2/8.12.9) with SMTP id k66C4Cc4002100
    for <belleville@laforet.com>; Thu, 6 Jul 2006 14:04:12 +0200
thread-index: Acag9Eu12ubKfskITG2s56xMNFkIZw==
Thread-Topic: LE SITE WEB
From: <toto@wanadoo.fr>
To: <belleville@titi.com>
Subject: LE SITE WEB
Date: Thu, 6 Jul 2006 14:04:12 +0200
Message-ID: <029501c6a0f4$4bb511e0$ca7c10ac@lsih.com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="----=_NextPart_000_0296_01C6A105.0F3DE1E0"
X-Mailer: Microsoft CDO for Windows 2000
Content-Class: urn:content-classes:message
Importance: high
Priority: normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.1830
X-SYNTEN-MailScanner-Information: Please contact the ISP for more information
X-SYNTEN-MailScanner: Found to be clean
X-Virus-Scanned: clamav-sophos
X-Spam-Status: No, hits=2.942 tagged_above=-999 required=3
    tests=[BAYES_05=-0.413, DNS_FROM_RFC_POST=1.614, HTML_00_10=1.068,
    HTML_MESSAGE=0.001, NO_REAL_NAME=0.007, SUBJ_ALL_CAPS=0.665]
X-Spam-Level: **

The most important field is:
X-Spam-Status: No, hits=2.942 tagged_above=-999 required=3
    tests=[BAYES_05=-0.413, DNS_FROM_RFC_POST=1.614, HTML_00_10=1.068,
    HTML_MESSAGE=0.001, NO_REAL_NAME=0.007, SUBJ_ALL_CAPS=0.665]

It gives us three pieces of information:
- is it spam? (Yes, No) ==> No
- what was the spam score? (hits=) ==> 2.942
- which tests matched? (here in brackets) ==> ...

If your amavisd/SpamAssassin setup returns this field in a different format, you will need to change the regular expressions in config.inc.php, and possibly the "parse_file" procedure in procs.inc.php.
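
As an illustration of what such parsing involves, here is a minimal sketch that extracts the three pieces of information from the X-Spam-Status value shown above. It is not phpSM's own code: the function name parse_spam_status and its regular expressions are assumptions made for this example, and it is written for current PHP versions rather than PHP4. The real expressions live in config.inc.php.

```php
<?php
// Hypothetical helper (not part of phpSM): parses an X-Spam-Status value
// in the amavisd format shown in the sample header.
function parse_spam_status(string $value): ?array
{
    // Folded header lines arrive with line breaks; collapse all whitespace.
    $value = preg_replace('/\s+/', ' ', trim($value));

    // Verdict and score, e.g. "No, hits=2.942".
    if (!preg_match('/^(Yes|No),\s*hits=(-?\d+(?:\.\d+)?)/i', $value, $m)) {
        return null; // format not recognised
    }

    // Matching tests, e.g. "tests=[NAME=score, NAME=score]".
    $tests = [];
    if (preg_match('/tests=\[([^\]]*)\]/', $value, $t)) {
        foreach (explode(',', $t[1]) as $pair) {
            $parts = explode('=', trim($pair), 2);
            if (count($parts) === 2 && $parts[0] !== '') {
                $tests[$parts[0]] = (float) $parts[1];
            }
        }
    }

    return [
        'spam'  => strcasecmp($m[1], 'Yes') === 0,
        'score' => (float) $m[2],
        'tests' => $tests,
    ];
}

$header = "No, hits=2.942 tagged_above=-999 required=3\n"
        . " tests=[BAYES_05=-0.413, DNS_FROM_RFC_POST=1.614, HTML_00_10=1.068,\n"
        . " HTML_MESSAGE=0.001, NO_REAL_NAME=0.007, SUBJ_ALL_CAPS=0.665]";

$status = parse_spam_status($header);
var_dump($status['spam']);           // bool(false)
echo $status['score'], "\n";         // 2.942
echo count($status['tests']), "\n";  // 6
```

A header in another layout makes this sketch return null, which is the case where the expressions would need adapting.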

Defining which columns to display

You can choose which columns phpSM will display in the table, and in which order, by modifying $tabheader in config.inc.php (all column names are described).

For example:
$tabheader=array(0=>COLDATE, COLSPAM, COLFROM, COLTO, COLSUBJECT,COLSCORE,COLTESTS,COLFILENAME);

Actions to be taken on selected mails

phpSM lets you select mails and pass them to a program.
The currently defined actions are:

- learn as spam
- learn as ham
- forward (to a chosen recipient or multi-recipients)
- resend (to its original recipient)
- delete

If the archivedir variable is set, files are moved to this directory after having been learnt as spam or ham.

1) To define the programs and parameters used for each action, change the corresponding strings in config.inc.php. Samples for Windows and Linux are defined.

2) To add new programs:
- add a program string in config.inc.php
- add a resource string in all .res files
- modify the print_tab function in procs.inc.php (add a line of the form: echo "<input type='radio' name='checkaction' value=' etc.)
- modify actions.php to handle the corresponding action

Optional parameters and default values
[DEPRECATED : all parameters are set using graphical interface]


phpSM is called using an HTTP GET/POST (usually a URL in your favourite browser).
Many parameters can be passed to phpSM as URL parameters.
Each one is defined here as // param (role, default value)

// noham (don't display HAM messages, unset)
note : if "nofilter" param is set, the ham messages will display using "filtercolor" background

// notag (don't display tagged messages, unset)
note : if "nofilter" param is set, the tagged messages will display using "filtercolor" background

// nospam (don't display SPAM messages, unset)
note : if "nofilter" param is set, the spam messages will display using "filtercolor" background

// showparams (display url params, unset)

// scoretag (required score for tagging, 3)
note : messages under this score (ham) will display using "hamcolor" background, while messages between this score and "scorespam" (tagged) will display using "tagcolor" background.

// scorespam (required score for rejecting, 6)
note : messages over this score (spam) will display using "spamcolor" background
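
The two thresholds split messages into three classes. The sketch below shows one way to express that rule. The function name classify_score is an assumption made for this example, and so is the handling of the boundaries: the notes above do not say how a score exactly equal to a threshold is treated, so here such a score is counted as having reached it.

```php
<?php
// Hypothetical helper (not part of phpSM): maps a score to "ham", "tag"
// or "spam". Defaults are the documented ones: scoretag=3, scorespam=6.
// Assumption: a score equal to a threshold counts as having reached it.
function classify_score(float $score, float $scoretag = 3.0, float $scorespam = 6.0): string
{
    if ($score < $scoretag) {
        return 'ham';   // displayed with "hamcolor"
    }
    if ($score < $scorespam) {
        return 'tag';   // displayed with "tagcolor"
    }
    return 'spam';      // displayed with "spamcolor"
}

echo classify_score(2.942), "\n";        // ham
echo classify_score(4.5), "\n";          // tag
echo classify_score(7.1), "\n";          // spam
echo classify_score(7.1, 3, 8), "\n";    // tag (with scorespam=8)
```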

// tagcolor (TAG messages color, bright orange)
note : use hexadecimal notation #rrggbb, like #ff0000 for full red.

// spamcolor (SPAM messages color, bright red)
// hamcolor (HAM messages color, pale green)
// filtercolor (filtered lines color if nofilter is set, pale grey)

// nbmails (messages per page, 200)
note : defines the number of displayed messages. It does not affect the number of parsed messages.

// start (start position in mail list, 0)
note : lets you browse through the parsed messages, showing a few at a time
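
Taken together, "start" and "nbmails" describe a simple paging window over the parsed messages. A minimal sketch of that idea follows. The function name page_of and the stand-in message list are assumptions made for this example, not phpSM's actual code.

```php
<?php
// Hypothetical helper (not part of phpSM): returns the slice of parsed
// messages to display, given "start" and "nbmails".
function page_of(array $messages, int $start = 0, int $nbmails = 200): array
{
    // Negative values are clamped to 0 so the window never wraps around.
    return array_slice($messages, max(0, $start), max(0, $nbmails));
}

// Stand-in for 450 parsed messages, numbered 1 to 450.
$parsed = range(1, 450);

$page = page_of($parsed, 400, 200);
echo count($page), "\n";   // 50 (only 50 messages remain after position 400)
echo $page[0], "\n";       // 401
```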

// maxfilesize (max mail size for parsing, 5000 bytes)
note : files larger than this value are not parsed

// maildir (array of directories to parse, "Dir3")

// archivedir (path to store mails after action)
note: if defined, mails on which action is taken (learn as spam or ham) will be archived in this dir

// destlist (regexp for the mail recipients to display, unset)
sample : /(jean|pierre)/ will show all mails received by jean@toto.com, jeannot@titi.com, pierre@tutu.com, etc.
note: the / delimiters are not mandatory, phpSM will add them if they are missing
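
The following sketch shows how such a recipient filter could behave, including the automatic addition of the / delimiters. The function name normalize_pattern is an assumption made for this example, and the address paul@toto.com is a made-up non-matching recipient. The sketch does not handle patterns that themselves contain a "/" or that carry trailing modifiers.

```php
<?php
// Hypothetical helper (not part of phpSM): adds "/" delimiters to a
// pattern when they are missing. Assumption: the pattern itself contains
// no "/" and has no trailing modifiers.
function normalize_pattern(string $pattern): string
{
    $delimited = strlen($pattern) >= 2
        && $pattern[0] === '/'
        && substr($pattern, -1) === '/';

    return $delimited ? $pattern : '/' . $pattern . '/';
}

$recipients = ['jean@toto.com', 'jeannot@titi.com', 'pierre@tutu.com', 'paul@toto.com'];

// Written without delimiters on purpose.
$regex = normalize_pattern('(jean|pierre)');

$shown = array_values(array_filter($recipients, function ($r) use ($regex) {
    return preg_match($regex, $r) === 1;
}));

echo $regex, "\n";               // /(jean|pierre)/
echo implode(', ', $shown), "\n"; // jean@toto.com, jeannot@titi.com, pierre@tutu.com
```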

// nofilter (display all messages even if filters are set, unset)
note: in this case, lines that would have been filtered out will display using "filtercolor" background.

// startdate (start date for messages, 20000101000000)
note : format is YYYYMMDDhhmmss

// enddate (end date for messages, 30000101000000)
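
Because the YYYYMMDDhhmmss format is fixed-width and ordered from year down to second, two such dates can be compared as plain strings. The sketch below relies on that property. The function name in_date_range is an assumption made for this example, as are the inclusive bounds and the use of UTC (phpSM may well use the server's local time).

```php
<?php
// Hypothetical helper (not part of phpSM): true when a Unix timestamp
// falls between startdate and enddate, both in YYYYMMDDhhmmss format.
// Assumptions: bounds are inclusive and times are expressed in UTC.
function in_date_range(
    int $timestamp,
    string $startdate = '20000101000000',
    string $enddate = '30000101000000'
): bool {
    $stamp = gmdate('YmdHis', $timestamp);

    // Fixed-width digit strings sort the same way as the dates they encode.
    return strcmp($stamp, $startdate) >= 0 && strcmp($stamp, $enddate) <= 0;
}

// 6 Jul 2006 12:04:15 UTC, the delivery time of the sample mail above.
$received = gmmktime(12, 4, 15, 7, 6, 2006);

var_dump(in_date_range($received));                                      // bool(true)
var_dump(in_date_range($received, '20060707000000'));                    // bool(false)
var_dump(in_date_range($received, '20060701000000', '20060706120415'));  // bool(true)
```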

Sample: http://..../phpspammanager?noham=1&scorespam=8&nbmails=300
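
If such a URL has to be generated from a script, the query string can be assembled with PHP's http_build_query. This is only a sketch: the parameter names and values are those of the sample above, and the host part stays elided exactly as in the sample.

```php
<?php
// Parameter names and values taken from the sample URL above.
$params = ['noham' => 1, 'scorespam' => 8, 'nbmails' => 300];

// The separator is passed explicitly so the result does not depend on php.ini.
$query = http_build_query($params, '', '&');

// The host is elided in this document, so it is left elided here as well.
$url = 'http://..../phpspammanager?' . $query;

echo $url, "\n";   // http://..../phpspammanager?noham=1&scorespam=8&nbmails=300
```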