Like everyone else connected to the Internet, picoParsec (the
Starving Artist) has been increasingly assailed by spam (UCE – Unsolicited
Commercial E-mail), much of it sent to e-mail addresses scraped from his
web pages.
In an attempt to mitigate the problem, or at least reduce the
torrent, he has taken several active approaches toward improving the situation,
and sincerely hopes that you will in turn play your part to help him out.
The three main approaches he has adopted to date are:
Defeating e-Mail Address Scrapers: finding ways to prevent address-scraping web robots from
harvesting a valid e-mail address from any of the pages within the compass
of the picoParsec family of web sites.
Using a Web Form: bypassing e-mail altogether in favour of an interactive
web page.
Using Encryption: eliminating all casual unsolicited communications by
requiring the use of a public key to encode all communications with picoParsec
(the Starving Artist).
(1) Defeating e-Mail Address Scrapers
Because e-mail remains both the preferred and the most flexible
means of communication over the Internet, the primary approach has been to find
ways of eliminating, or at least significantly reducing, the harvesting of valid
e-mail addresses from across the web sites within the picoParsec domain.
Alas, it is impractical, as well as irritating to correspondents, to keep changing
contact e-mail addresses, so the emphasis needs must be on foiling the scrapers.
Thus far, the steps that he has taken to defeat e-mail
address-scraping robots fall into four general classes:
Funnelling all communications
through a single contact page, removing all traces of any
e-mail address from every other page. This is why any “contact”
link on any page within the picoParsec family of web sites leads to the
sole active contact page.
Obfuscating the encoding
of e-mail addresses within the HTML coding of any page on which an address
appears, ideally only the singular contact page. This is done
either with in-line character encoding or by using JavaScript to dynamically
generate the e-mail addresses as the page is rendered.
Replacing HTML
encoding of any e-mail address with an image (a JPEG) of the e-mail address.
This requires readers to transcribe the address by hand when they want to send
an e-mail.
Adding very aggressive spam filters at the receiving end, requiring readers
to ensure that a special text token is included in the subject line of any e-mail
they wish to send to picoParsec (the Starving Artist). Note: provided that JavaScript is enabled, the token can be
supplied automatically.
Each step has advantages and disadvantages, and the steps are not mutually
exclusive. More detailed discussions of each of these techniques follow.
Step 1: Funnelling All Communications
The huge advantage of funnelling all requests for communication
through a single page is that this page becomes the sole place that any valid
e-mail address appears. On every other page in the Starving Artist’s
Garratt there is simply a link to the singular contact page,
giving address-scraping robots nothing to harvest.
The obvious disadvantage is that it adds one level of indirection
to the communications process, which may irritate some readers. However, this can
be mitigated to some degree by using scripting to automatically fill in the subject
line from the invocation context. Funnelling also allows the presentation of a choice of
possible communication mechanisms, which empowers the paranoid (among whom the
Starving Artist includes himself) to choose to encrypt their e-mail.
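One way the contact page might fill in the subject line from the invocation context is sketched below. It assumes, hypothetically, that each “contact” link passes the name of the originating page in a `from` query parameter (for example `contact.html?from=gallery`); the parameter name, page names and subject texts are illustrative assumptions, not the site's actual scheme. The `PPS:` prefix matches the subject-line token described in Step 4.

```javascript
// Hypothetical sketch: derive a context-specific subject line from a "from"
// query parameter added by the linking page (e.g. contact.html?from=gallery).
// The parameter name and the mapping below are illustrative assumptions.
const CONTEXT_SUBJECTS = new Map([
  ["gallery", "Comments on the gallery"],
  ["about", "A question for the Starving Artist"],
]);

function subjectFromContext(pageUrl) {
  const from = new URL(pageUrl).searchParams.get("from");
  // Unknown or missing context falls back to a generic subject.
  const topic = CONTEXT_SUBJECTS.get(from) ?? "General enquiry";
  // Prefix with the PPS: token so the message survives the subject filter.
  return "PPS: " + topic;
}

// In a browser this would be called as subjectFromContext(window.location.href).
console.log(subjectFromContext("https://picoparsec.com/contact.html?from=gallery"));
// → PPS: Comments on the gallery
console.log(subjectFromContext("https://picoparsec.com/contact.html"));
// → PPS: General enquiry
```

Using a `Map` rather than a plain object avoids accidentally matching inherited property names such as `constructor`.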
It is perhaps worth noting that the contact page
is the only page in the Starving Artist’s Garratt on which there
is any scripting, and that is limited to a very few snippets of verifiably benign
JavaScript.
This is a step that the Starving Artist has already taken,
not least because it allows him to concentrate all his experiments relating to
foiling address scraping on the single contact page.
Step 2: Obfuscating the Encoding of Addresses
The second part of the solution is the
insertion of a small, benign chunk of JavaScript to dynamically generate the e-mail
address on each page where it is displayed, in an attempt to foil scraping
or harvesting of e-mail addresses. This little piece of code is utterly
harmless, but if you choose not to enable JavaScript in your browser you will not
see the Starving Artist’s e-mail address, nor will you have a
clickable dynamic link.
If you cannot see the e-mail address where it obviously should be,
a little further down the page, or if your browser throws up some sort
of scripting-related error message, this is almost certainly the
reason.
Should this happen to you, and you are not prepared to enable
JavaScript in your browser, please take note of the next part of the
solution.
Here is the dynamically-generated e-mail address:
(assuming that you have JavaScript enabled in your browser)
For reference, here is the JavaScript in question:
<script type="text/javascript">
<!-- Begin
var unam = "WebAdmin";
var dom2 = "picoparsec";
var dom1 = "com";
var subj = "subject=Contacting%20the%20picoParsec%20Administrator";
var titl = "title=\"Send e-mail to the picoParsec Administrator\"";
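// Assemble the mailto: link from separate fragments so the complete
// address never appears as a single literal string in the page source.
document.write('<p><a href=\"mailto:');
document.write(unam + '@' + dom2 + '.' + dom1 + '\?');
// The space after the closing quote separates the href and title attributes.
document.write(subj + '\" ' + titl + '>');
document.write(unam + '@' + dom2 + '.' + dom1 +
'<\/a><\/p>');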
// End -->
</script>
Essentially, it is just four document.write statements to
construct the string representing the mailto: URL
that, when you click on the resultant link, will invoke your default e-mail
application with both the To: and the
Subject: already filled in.
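For comparison, the same assembly can be written as a small function that builds the `mailto:` URL from separate fragments and lets `encodeURIComponent()` produce the `%20` escapes that the script above writes by hand. This is a sketch, not the code running on the page; the function name and the `contact` element id in the commented browser snippet are hypothetical.

```javascript
// Sketch: build a mailto: URL from separate fragments so that the complete
// address never appears as one literal string in the page source.
function buildMailtoHref(user, domain, tld, subject) {
  const address = user + "@" + domain + "." + tld;
  // encodeURIComponent() turns spaces into %20 (and escapes &, ?, # etc.).
  return "mailto:" + address + "?subject=" + encodeURIComponent(subject);
}

const href = buildMailtoHref("WebAdmin", "picoparsec", "com",
                             "Contacting the picoParsec Administrator");
console.log(href);
// → mailto:WebAdmin@picoparsec.com?subject=Contacting%20the%20picoParsec%20Administrator

// In a browser, the link could be created with DOM methods instead of
// document.write(), for example:
//   const a = document.createElement("a");
//   a.href = href;
//   a.title = "Send e-mail to the picoParsec Administrator";
//   a.textContent = href.slice("mailto:".length, href.indexOf("?"));
//   document.getElementById("contact").appendChild(a);
```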
Here is the Starving Artist’s contact e-mail address
presented using obfuscated HTML in-line character encoding, representing the
second part of the solution. Using this technique obviates
the need for any JavaScript at all. However, there is a suspicion that some HTML
scrapers may be sufficiently sophisticated to see through this type of
character-encoding obfuscation.
mailTo:
For reference, here is the in-line coding in question:
Yes, I have shown the key pieces in plain text to make it easier
to understand what is going on. In reality, they could be obfuscated too.
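In-line character encoding of this kind can be generated mechanically. The sketch below assumes decimal numeric character references (`&#64;` for `@`, and so on); the actual page may use a different mix, such as hexadecimal references or only partial encoding, and the function name is illustrative.

```javascript
// Sketch: encode every character of a string as an HTML decimal character
// reference (&#NNN;). A browser renders the result as ordinary text, but
// the raw page source contains no literal "@" or readable address.
function toCharEntities(text) {
  let out = "";
  for (const ch of text) {
    out += "&#" + ch.codePointAt(0) + ";";
  }
  return out;
}

const encoded = toCharEntities("WebAdmin@picoparsec.com");
// Browsers decode character references inside attribute values too,
// so the encoded href still works as a mailto: link.
console.log('<a href="' + toCharEntities("mailto:") + encoded + '">' + encoded + "</a>");
```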
The Starving Artist plans to run some careful comparison
experiments to determine if either of these two techniques, JavaScript or in-line
character encoding obfuscation, is vulnerable to sophisticated scraping.
Stay tuned for an announcement of the results of the experiments.
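A comparison experiment of this kind could start from something like the following hypothetical sketch. It contrasts a naive scraper, which runs a simple address regular expression over the raw HTML, with a slightly more sophisticated one that decodes numeric character references first. Neither is modelled on any particular real harvester, and neither executes JavaScript; the regular expression and all names are assumptions chosen for illustration.

```javascript
// Hypothetical scraper comparison. Neither scraper executes JavaScript.
const ADDRESS_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Naive scraper: search the raw HTML for anything shaped like an address.
function naiveScrape(html) {
  return html.match(ADDRESS_RE) || [];
}

// "Sophisticated" scraper: decode numeric character references first,
// both decimal (&#64;) and hexadecimal (&#x40;), then search.
function decodingScrape(html) {
  const decoded = html
    .replace(/&#(\d+);/g, (_, n) => String.fromCodePoint(Number(n)))
    .replace(/&#x([0-9a-f]+);/gi, (_, h) => String.fromCodePoint(parseInt(h, 16)));
  return naiveScrape(decoded);
}

// Three test pages containing the same address in different forms.
const plainPage = "<p>Contact: WebAdmin@picoparsec.com</p>";
const encodedPage = "<p>" + Array.from("WebAdmin@picoparsec.com",
  (c) => "&#" + c.charCodeAt(0) + ";").join("") + "</p>";
const scriptPage = `var unam = "WebAdmin"; var dom2 = "picoparsec"; var dom1 = "com";
document.write(unam + '@' + dom2 + '.' + dom1);`;

for (const [name, page] of [["plain", plainPage], ["encoded", encodedPage], ["script", scriptPage]]) {
  console.log(name, naiveScrape(page), decodingScrape(page));
}
```

Under these assumptions, the plain address is found by both scrapers, the entity-encoded address only by the decoding scraper, and the script-assembled address by neither. A scraper that actually executed the page's JavaScript (for example, one driving a headless browser) would be a separate case to test.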
Step 3: Replacing HTML Address Encoding with an Image
The third part of the solution is to
replace the in-line dynamic generation of the e-mail address with the display of a
totally graphic representation. For obvious reasons this cannot be a dynamic link,
so of necessity you will have to transcribe the displayed address by hand. Such a
graphic presentation of the address should be visible immediately below:
In most well-behaved browsers, if JavaScript has been disabled, the
dumb graphic will be rendered. This should be visible immediately below, and
should you so wish you can experiment by enabling or disabling JavaScript and then
reloading the page, which will either render the graphic or will use JavaScript
to dynamically create the active link.
Step 4: Adding Very Aggressive Spam Filters
The fourth part of the solution is to
introduce extremely aggressive spam filters on incoming e-mails at the destination
mail server. Any incoming e-mail that does not include one
of the two following text strings somewhere in the subject line will probably end up
being filtered out, meaning that the Starving Artist will never see it.
PPS:
(in upper case, with the colon)
or
picoParsec
(in any case, including mixed)
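The subject-line rule can be modelled as a simple predicate. This is a simplified sketch of the rule as stated above, not the mail server's actual filter configuration (which is not described here, and which may weigh other factors); the function name is illustrative.

```javascript
// Sketch of the subject-line rule: accept a message if the subject contains
// "PPS:" (upper case, with the colon) or "picoParsec" in any mix of cases.
function subjectPassesFilter(subject) {
  if (subject.includes("PPS:")) {
    return true;
  }
  return subject.toLowerCase().includes("picoparsec");
}

console.log(subjectPassesFilter("PPS: question about a painting")); // true
console.log(subjectPassesFilter("pps: question"));                   // false
console.log(subjectPassesFilter("Hello PICOPARSEC"));                // true
console.log(subjectPassesFilter("Cheap watches!!!"));                // false
```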
A context-specific subject is automatically supplied whenever
the e-mail address is generated dynamically. If you are hand-transcribing
the e-mail address, please enter a subject that contains one or other
of the above text strings, as well as anything else you wish. (When
the in-basket is particularly full, the Starving Artist has
been known to cherry-pick solely on the basis of the message subject...)
(2) Using a Web Form
Bypassing e-mail altogether in favour of an interactive web page.
(3) Using Encryption
Eliminating all casual unsolicited communications by
requiring the use of a public key to encode all communications with the
Starving Artist.