an intentional error?
    
while looking up email addresses for some faculty in my department, I noticed an interesting anomaly. if I simply click on a faculty member's email address, it opens my email client with a new email properly addressed. however, if I right-click, select "Copy Email Address," then paste it into the To field of a new email, there is a leading "%20", the URL (percent) encoding for a space. I initially thought this might be just a typo, that there was an extra space in the mailto link, but the characters appear on all the email addresses. perhaps this is some sort of counter-spam effort.
    
    
    
  
  looking closer at the HTML source, it does seem to be counter-spam. for example, the link to email Richard Taylor shows up as follows:
<a href="'mailto:%20taylor@ics.uci.edu'">taylor&#64;ics.uci.edu</a>
to the average human eye, this looks mostly like a bunch of mumbo-jumbo, but effectively it's encoding the mailto link using HTML numeric character references (such as &#64; for "@"), swapping the letters for their ASCII code points. thus, if someone looks directly at the HTML source, as most spam harvesters do, it doesn't look like an email address. even if someone does try to automatically harvest it, there's an extra space in there. however, when the browser renders it, everything looks normal, and you can even click on the address and your email client will automatically remove the extra leading space for you.
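to make the encoding concrete, here is a minimal Python sketch of how a page might produce this kind of obfuscated link. the obfuscate helper is a hypothetical name of my own, and encoding every character is an assumption; I don't know which tool or exact scheme the department page actually uses.

```python
from html import unescape


def obfuscate(text):
    """Replace each character with its decimal numeric character reference.

    Hypothetical helper for illustration; not the department's actual tool.
    """
    return "".join(f"&#{ord(ch)};" for ch in text)


address = "taylor@ics.uci.edu"

# Assumed scheme: encode "mailto:" and the address, with a literal "%20"
# (a percent-encoded space) wedged in between as an extra decoy.
encoded = obfuscate("mailto:") + "%20" + obfuscate(address)
print(encoded)

# A browser, or html.unescape here, turns the references back into text.
# The "%20" survives this step because it is URL encoding, not HTML encoding.
print(unescape(encoded))
```

the raw source contains no "@" and no readable address, yet one standard-library call restores both.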
what I don't quite understand is how or whether this actually impedes spam harvesters. if a browser can render the above coding into a meaningful email address, why can't an email-harvesting bot do the same? do most harvesting bots just go for low-hanging fruit rather than trying to decode obfuscated email addresses? is it just a matter of adding one more layer of resistance? or is there something intrinsically difficult about having a bot resolve the above HTML to a meaningful email address?
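as a rough test of the "can't a bot do the same?" question, here is a minimal Python sketch of what a harvester would need to do: decode the HTML character references, decode the percent-escapes, then look for address-shaped strings. the harvest function, the regular expression, and the sample markup are all my own assumptions for illustration; the sample imitates the style of the link above and is not the page's actual source.

```python
import re
from html import unescape
from urllib.parse import unquote

# Deliberately simple pattern for address-shaped strings; real addresses
# can be more exotic than this, so treat it as an approximation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest(raw_html):
    """Return the unique email-like strings found in raw HTML.

    Hypothetical sketch: undo HTML character references first, then
    percent-encoding, then scan the resulting plain text.
    """
    decoded = unquote(unescape(raw_html))
    return sorted(set(EMAIL_RE.findall(decoded)))


# Assumed sample markup, written to resemble the obfuscated link.
sample = (
    '<a href="&#109;&#97;&#105;&#108;&#116;&#111;&#58;%20taylor&#64;ics.uci.edu">'
    "taylor&#64;ics.uci.edu</a>"
)

print(harvest(sample))
```

at least for this style of obfuscation, decoding takes two standard-library calls, which suggests the protection only works against bots that scan raw source without decoding it.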
Labels: email, sousveillance, spam




1 Comment:
You should try it and report back :)
My bet is that you can just grab the rendered page with something like curl or parse the HTML with Hpricot just fine.
By Admin, at Tuesday, January 13, 2009 8:14:00 AM