Discussion:
Date adjustment script
daniel.poelzleithner
2003-09-11 18:53:01 UTC
Hi,

I wrote a script that checks the mail date tolerantly.
When the time is too far in the future or too far in the past, the Date
header is replaced by the current date.

You can find the script at http://files.poelzi.org/procmail/testdate.py
The procmail rules you need are in the documentation.

Please report any bugs you find :)

regards
Daniel
Professional Software Engineering
2003-09-11 20:35:16 UTC
Post by daniel.poelzleithner
You can find the script at http://files.poelzi.org/procmail/testdate.py
The procmail rules you need are in the documentation.
Please report any bugs you find :)
You might want to check the archives for the discussions about comparing
the Date: header against the date data in the From_ header to determine if the
mail is suspect or not. On those messages which have had stale, advanced,
or completely invalid date headers, it's been quite useful.

IMO, one shouldn't "fix" the date, since more than likely, you'll be
bringing it into your timezone (not that of the sender) -- the date header
should reflect the attributes of the sender, not the recipient. This holds
especially true if you ever FORWARD one of these messages.

Excepting for the 'date' command itself, the entire operation of comparing
dates for staleness/advancement/basic validity can be performed from within
procmail - no external scripting language is necessary.

You should also note that some external scripting languages can incur a
_significant_ CPU and memory load. I don't use Python since I'm familiar
with Perl and C/C++ and haven't seen anything compelling in Python, but you
should definitely benchmark the processing time against a large mailbox
with and without calls to your Python script to see what sort of overhead
you're adding.

AWK, for example, is a language that imposes a tremendous CPU load
considering what it accomplishes.

---
Sean B. Straw / Professional Software Engineering

Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
daniel.poelzleithner
2003-09-11 20:59:10 UTC
Post by Professional Software Engineering
IMO, one shouldn't "fix" the date, since more than likely, you'll be
bringing it into your timezone (not that of the sender) -- the date
header should reflect the attributes of the sender, not the recipient.
This holds especially true if you ever FORWARD one of these messages.
Maybe; I don't know procmail that well.
I'm not sure you understood the script correctly. Normally it does not
change any headers. The script is tolerant when comparing the date and
respects the timezone the sender resides in. If you manage your computer
correctly, you shouldn't have a clock that is more than an hour in the
future, if even that. Normally an email does not need 3 days to get routed
through the internet, even with server problems along the route. Both
thresholds are adjustable. Only if one of them is exceeded is the date
replaced by a new one. The old header is renamed to Old-Date.

This prevents your mailbox from being flooded by mails from sloppy users.
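
The behaviour described above (tolerant comparison, two adjustable
thresholds, Old-Date rename) can be sketched roughly as follows. This is not
the code of testdate.py. It is a minimal illustration that assumes the whole
message is available as text, uses the 1 hour and 3 day limits mentioned in
the post as defaults, and assumes that a missing or unparsable Date header is
treated like an out-of-range one. The name adjust_date is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import format_datetime, parsedate_to_datetime

# Assumed defaults taken from the discussion: 1 hour ahead, 3 days behind.
MAX_FUTURE = timedelta(hours=1)
MAX_PAST = timedelta(days=3)


def adjust_date(raw_message, now=None, max_future=MAX_FUTURE, max_past=MAX_PAST):
    """Return the message text, rewriting Date: only when it is out of range.

    `now` must be a timezone-aware datetime; it defaults to the current time.
    """
    now = now or datetime.now(timezone.utc)
    msg = message_from_string(raw_message)
    original = msg["Date"]
    try:
        sent = parsedate_to_datetime(original)
        if sent.tzinfo is None:          # "-0000" parses as naive; treat as UTC
            sent = sent.replace(tzinfo=timezone.utc)
        # Aware datetimes are compared as instants, so the sender's zone
        # does not distort the result.
        in_range = -max_past <= sent - now <= max_future
    except (TypeError, ValueError):
        in_range = False                 # assumption: bad header counts as out of range
    if in_range:
        return raw_message               # normal case: nothing is changed
    if original is not None:
        msg["Old-Date"] = original       # keep the sender's value
        del msg["Date"]
    msg["Date"] = format_datetime(now)
    return msg.as_string()
```

Passing `now` explicitly makes the function easy to check against stored mail
with a fixed reference time.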
Post by Professional Software Engineering
You should also note that some external scripting languages can incur a
_significant_ CPU and memory load. I don't use Python since I'm
familiar with Perl and C/C++ and haven't seen anything compelling in
Python, but you should definitely benchmark the processing time against
a large mailbox with and without calls to your Python script to see what
sort of overhead you're adding.
Python is very fast.
Professional Software Engineering
2003-09-11 21:58:15 UTC
Post by daniel.poelzleithner
Maybe; I don't know procmail that well.
Well, hitting the list archives would net you the procmail scripts which
check date validity and skew, which coincidentally rely on only standard
system components, not on optional script languages.

I'm not out to knock your use of Python, or that you wrote a script to do
something, just pointing out that there are scripts available which can
accomplish the same thing without special requirements, and that "fixing"
the problem isn't really fixing it, it's disguising it.
Post by daniel.poelzleithner
I'm not sure you understood the script correctly. Normally it does not
change any headers. The script is tolerant when comparing the date and
respects the timezone the sender resides in. If you manage your computer
correctly, you shouldn't have a clock that is more than an hour in the
future, if even that.
Well, my hosts all run calibrated clocks. That's totally aside from the
point - you're changing the Date: header from what the SENDER set it to,
and in the process, you're legitimizing hokey email.

I've been running the date checking recipe on ALL of my email (and I get a
LOT), and thus far, almost exclusively, the stuff that gets tagged by it is
spam (and since I don't use it as a POSITIVE identifier for spam, merely a
strong characteristic, things which merely have a boffed date don't get
trashed as spam) or mail which lags through some lists (for some reason, it
isn't uncommon for some messages to be delayed by more than
24h). I also utilize the date at the end of the From_ header as the
comparator, which means I can run the filter against stored email at will -
versus running it against week-old saved email and having it flag
everything as goofy.
Post by daniel.poelzleithner
Normally an email does not need 3 days to get routed through the internet,
even with server problems along the route. Both thresholds are adjustable.
Only if one of them is exceeded is the date replaced by a new one. The old
header is renamed to Old-Date.
At least you keep it, but the point is, if the date is far out of whack,
it's out of whack - making it _today_ doesn't make it any more right
because the sender certainly didn't send it to you right at this
instant. This is especially true if in fact the message IS old:

* messages get held up in the delivery system. FTR, 5d is a more
common default timeout with MTAs, at least that seems to be the case
when I see delayed message announcements via the large discussion
lists I manage.

* messages to lists might be held for moderator approval. Passing
over a weekend and then some wouldn't be unheard of. Add to that
a possible delay TO the list, plus a delay waiting for the moderator
(incl. possible delays delivering it to them for their approval and
delays sending it back to the list), and delays delivering it out of
the list, I could easily see the delay being amplified if something
went wrong, esp. at the common point: the list server.
Post by daniel.poelzleithner
This prevents your mailbox from being flooded by mails from sloppy users.
I prefer to filter cruft out, rather than make it appear legitimate. If
someone is such a moron that their clock is that far off, do you really
want to be reading what they've sent?
Post by daniel.poelzleithner
Python is very fast.
I'll take your word for it - I'm just suggesting that you might want to
benchmark a procmailrc with and without the call to the script to see what
overhead it actually adds, esp. since you're presumably subjecting ALL of
your email to this. Also, check how much memory resource the tool uses,
since you could receive quite a bit of mail concurrently if say, your mail
host is bumped offline for a while.
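
One way to get a first number for the overhead suggested above is to time
repeated invocations of the external command. The sketch below is only an
illustration: average_runtime is a hypothetical helper, the sample message is
made up, and it measures wall-clock time only, not memory use. The baseline
run shows the interpreter start-up cost that every call to a Python script
pays before any of its own logic runs.

```python
import subprocess
import sys
import time


def average_runtime(command, message, runs=20):
    """Average wall-clock seconds to pipe `message` through `command`."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(command, input=message,
                       stdout=subprocess.DEVNULL, check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    sample = b"Date: Thu, 11 Sep 2003 18:53:01 +0000\n\nbody\n"
    # Baseline: interpreter start-up alone, with no script logic at all.
    baseline = average_runtime([sys.executable, "-c", "pass"], sample)
    print(f"interpreter start-up: {baseline * 1000:.1f} ms per message")
    # Substitute the command your procmailrc actually runs to see what the
    # script adds on top of the start-up cost.
```

Comparing the two figures separates the cost of starting the interpreter from
the cost of the date check itself.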

---
Sean B. Straw / Professional Software Engineering

Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
daniel.poelzleithner
2003-09-12 04:42:29 UTC
Post by Professional Software Engineering
* messages get held up in the delivery system. FTR, 5d is a more
common default timeout with MTAs, at least that seems to be the case
[...]
Post by Professional Software Engineering
* messages to lists might be held for moderator approval. Passing
over a weekend and then some wouldn't be unheard of. Add to
[...]

That's a good point. A past limit of 5 days would be better.
Post by Professional Software Engineering
I prefer to filter cruft out, rather than make it appear legitimate. If
someone is such a moron that their clock is that far off, do you really
want to be reading what they've sent?
Yes, because spam is filtered separately.
Have you never sat at a computer without a BIOS battery? ;) Unfortunately,
not everyone knows how to sync the system clock with NTP servers.
Post by Professional Software Engineering
I'll take your word for it - I'm just suggesting that you might want to
benchmark a procmailrc with and without the call to the script to see
what overhead it actually adds, esp. since you're presumably subjecting
ALL of your email to this. Also, check how much memory resource the
tool uses, since you could receive quite a bit of mail concurrently if
say, your mail host is bumped offline for a while.
[***@quamquam]> bin/ $ time ./testdate2.py Thu, 14 Sep 2003 14:38 -0400
Fri, 12 Sep 2003 04:25:57 -0000

real 0m0.098s
user 0m0.070s
sys 0m0.000s

Fast enough for me.
I don't think that converting an RFC date back to seconds is easier or
faster in procmail than in Python. My server idles in any case :)
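
For reference, converting an RFC 2822 date to seconds takes only a few lines
with the standard library. This is a sketch, not the code of testdate2.py.
The helper name is hypothetical, and treating "-0000" or a missing zone as
UTC is an assumption made here so that the result does not depend on the
local timezone.

```python
import calendar
from email.utils import parsedate_tz


def rfc_date_to_epoch(value):
    """Convert an RFC 2822 date string to seconds since the Unix epoch."""
    parsed = parsedate_tz(value)
    if parsed is None:
        raise ValueError(f"unparsable date: {value!r}")
    # parsed[9] is the zone offset in seconds, or None for "-0000" and for
    # a missing zone; both are treated as UTC here (an assumption).
    offset = parsed[9] or 0
    return calendar.timegm(parsed[:9]) - offset


# The two dates that appear in the timing run above.
tested = rfc_date_to_epoch("Thu, 14 Sep 2003 14:38 -0400")
printed = rfc_date_to_epoch("Fri, 12 Sep 2003 04:25:57 -0000")
print(tested - printed)  # seconds between the two dates
```

With both dates reduced to integers, the threshold checks become plain
subtractions and comparisons.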
Professional Software Engineering
2003-09-12 19:17:27 UTC
At 08:39 2003-09-12 -0400, R A Lichtensteiger wrote:
[originally offlist, but we've agreed to pop it back onlist]
I wrote some recipes like that a while ago and found that the one that
nails future dates is a good indicator, but that past dates were almost
always a false indicator.
How does your experience compare?
I score date problems as "spammish", and as a result, a funky date in and
of itself isn't enough to identify something as junk, so even false
positives aren't a problem - it's all taken in conjunction with other
characteristics of the message. This allows me to be a bit more arbitrary
about my use of the filter - it doesn't have to be 100% because I'm really
not likely to lose legitimate email because of it.

I also score for INVALID date formats (which typically seem to have some
bogus text describing what the timezone is) - they seem to almost
universally be spam, though I merely score them with a higher spammishness
score.

Messages < 200K sec BEFORE reception tend to be list-delayed and twits with
erratic clocks, but I have an 18H threshold there anyway (yes, less than 3D
or 5D - but as I said, I'm using it as an indicator, not an
absolute). Bugtraq for instance seems to frequently have 140Ksec+ delays
(that list strips incoming Received: headers, so it's difficult to
determine exactly where the delay was inserted, but it isn't critical
because the single characteristic isn't enough to flag it as spam).

Very LARGE lags in the clock seem to be indicative of spam:

SPAM: +100+100 Date is suspicious at 121651249 seconds {312 00:00:49}
BEFORE reception

SPAM: +100+100 Date is suspicious at 121651249 seconds {312 00:00:49}
BEFORE reception

Curiously, both of those are from _SEPARATE_ messages from the same spammer
and are messages sent at different times.

I threshold advanced clocks at +2H, since it seems most legit mail which
has an advanced clock skew is under about 5K seconds (about 1.5 hours),
which can sometimes be attributed to morons having their machine set to the
wrong timezone.

Excepting the low thresholds, pretty much any advancement of the clock is a
consistent indicator of spam. Just reviewing filtered messages since the
beginning of this month, I see that a clock in excess of +2H has been spam
in every instance except for one, which was a bugtraq message
("SRT2003-09-11-1120 - setgid man MANPL overflow"). Because the date
characteristic is merely contributory, that message was NOT classified as
spam. All the others, however, suffered from MULTIPLE spam characteristics,
for example:

SPAM: +125 Single received header for foreign sender
SPAM: +135 Advisory - relayed through backup MX
SPAM: +300 Foreign character set encoding (Windows-1250) in body.
SPAM: +100+100 Date is suspicious at 2678343 seconds {030 23:59:03} AFTER
reception
SPAM: +75 Advisory - no non-list cleartext recipient matching X-Envelope-To
SPAM: +249+58 Subject Scoring match 58
SPAM: +(249*0.75) text/html ONLY
SPAM: +249 Abundance of triggers
SPAM: Advisory - spammishness is 1577.75
SPAM: spammishness exceeds threshold of 249
INFO: SpamFilter v03.05.00 SBS 20030517/1243
From ***@web2mail.com Tue Sep 9 22:57:55 2003
Subject: Do YOU know how to earn lot of money on gold rate change?
Folder: gzip -9fc >> spam.gz 2440


If a message is more than 18H BEFORE or 2H AFTER reception, I add 100 to my
spammishness. If it's >72H out, I add an additional 100.
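
The scoring rule just stated can be written down compactly. The sketch below
is illustrative Python, not the procmail recipe itself. The function and
parameter names are hypothetical, the sign convention (negative for BEFORE
reception, positive for AFTER) is chosen here for convenience, and the 72H
rule is assumed to apply in either direction.

```python
HOUR = 3600


def date_spammishness(skew_seconds, before_limit=18 * HOUR,
                      after_limit=2 * HOUR, far_limit=72 * HOUR):
    """Score a Date: skew; negative = BEFORE reception, positive = AFTER."""
    score = 0
    if skew_seconds < -before_limit or skew_seconds > after_limit:
        score += 100                      # outside the tolerated window
        if abs(skew_seconds) > far_limit:
            score += 100                  # assumed to apply in either direction
    return score
```

Under these assumptions, the skews from the log excerpts above (2678343
seconds AFTER and 121651249 seconds BEFORE reception) each score 200, which
matches the "+100+100" shown in the log lines.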


Overall, what I have has been working wonderfully for me - just 5 spams so
far this month have actually gotten past my filters, and three of those
were some eBay scam received nearly concurrently (for which my
spewhosts filter has been updated - a filter which adds a score based on
whether the message appears to have passed through a mailserver associated
with the domain of the From: address, used to flag potential forgeries).

In fact, of the two other spams I received, both of them would now be
tagged because I expanded some subject keyword filters (adding prostitute
and underwear), recently narrowed the advanced clock threshold (from +18H
to +2H), and bumped up the scoring for invalid date formats.

I also recently modified the recipes to allow for a list skew of 24H if a
LISTNAME variable has been defined, so there's an automatic allowance for
delays on discussion lists (which in my system already get a boost to their
allowed spammishness threshold), which sharply reduces the number of
entries in my logfile when handling lists such as bugtraq (I have a spam
report emailed daily, and that includes messages which were spammish, not
strictly tagged as spam, so I can see how close iffy messages are).

Dates are but one characteristic of my filtering, and they've been useful
thus far.

---
Sean B. Straw / Professional Software Engineering

Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.