TextMate has this nifty feature that automatically encodes an email address in a hex format so that spammers can't easily scrape the addresses from your site. I'm working on a client site right now and wanted to build this component into the site, so that anywhere an email address is mentioned in a post or page it will be automatically converted into the hex encoded version.

Here is an example of the formatted email address we're going for: phgrw@phg ...

The first step is writing a PHP function that does the actual conversion from a normal string into this format. Here is my stab at this, ported from some nifty Python I found elsewhere online. I sure wish PHP had a lambda ability; this function could be a single line.

<?php
function email_encode($email) {
    $hexed = array();
    for ($i=0; $i < strlen($email); $i++) { 
        $hexed[] = sprintf("&#x%s;", dechex(ord($email[$i])));
    }
    return implode("", $hexed);
}
?>

This takes the string and loops over each character. The Python original leans on yet another beautiful feature of that language: strings can be treated as arrays. You can do something similar in PHP by accessing a specific character with $email[$i], but you can't run a string through array constructs such as foreach. Each character is converted to its ordinal value with ord(), then into hex with dechex(), and then sprintf()'d into the "&#x" and ";" wrapper characters, which make up the HTML numeric character reference that a web browser renders as the original character. For the record, I freaking LOVE sprintf() and use it everywhere. It's the closest thing to Python's incredibly elegant string interpolation technique.
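For what it's worth, PHP 5.3 and later do have anonymous functions, so the loop can be collapsed into a single expression. The following is only a sketch of that idea, not a replacement for the function above; the name email_encode_compact() is made up for this example.

```php
<?php
// Hypothetical one-expression variant of email_encode(); assumes PHP 5.3+
// for the anonymous function.
function email_encode_compact($email) {
    // str_split('') returns a non-empty array on older PHP versions, so bail out early.
    if ($email === '') {
        return '';
    }
    // Split into single bytes, map each one to its hex character reference, and join.
    return implode('', array_map(function ($char) {
        return sprintf('&#x%s;', dechex(ord($char)));
    }, str_split($email)));
}

echo email_encode_compact('a@b'), "\n"; // &#x61;&#x40;&#x62;
```

Running the result back through html_entity_decode() should give the original string, which is a quick way to sanity-check either version.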

The next step is applying this function to email addresses found inside of content. Initially I had the idea of doing this on the save_post hook, but then you wouldn't be able to edit the email address again unless a reverse conversion was made on a pre_edit sort of hook. The better idea is of course to do this as the content is being displayed. Fortunately, there is a simple hook for doing that: the_content, a filter that is applied when the aptly named the_content() function is called inside of the Loop.

The first step is to register the filter using add_filter():

<?php
/* This would be inside of the functions.php file
 * in this case, I'm calling a function inside of my PHGRW class 
 */
add_filter('the_content', 'PHGRW::wp_content_hook');
?>
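The string 'PHGRW::wp_content_hook' is a PHP callable that names a static method, so the hook needs to be declared static inside the class for this registration to work cleanly. The sketch below illustrates that with plain PHP and no WordPress loaded; the class body is a stand-in I made up for the example, not the real PHGRW class.

```php
<?php
// Stand-in class for illustration only; the real PHGRW class is not shown in this post.
class PHGRW {
    // Public and static, so it is reachable through the 'PHGRW::wp_content_hook' string.
    public static function wp_content_hook($content) {
        return $content; // the real filter would transform $content here
    }
}

// WordPress invokes registered filters as callables; call_user_func() does
// roughly the same thing here without WordPress.
var_dump(is_callable('PHGRW::wp_content_hook'));
echo call_user_func('PHGRW::wp_content_hook', 'Hello'), "\n";
```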

The next step is of course writing the filter itself. It searches the content for an email address, sends that address through the email_encode() function we created, and replaces the original string with the encoded one before returning the content to be displayed on the site. My code for doing this is below:

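<?php
/* Registered above as PHGRW::wp_content_hook, so this lives inside the PHGRW class;
 * email_encode() is called through my Template class.
 */
function wp_content_hook($content) {
    // preg_match() stops at the first address it finds
    if (preg_match('#[^\W][a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+)*\@[a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+)*\.[a-zA-Z]{2,4}#i', $content, $email_address)) {
        // swap every occurrence of that address for its hex-encoded form
        $content = str_replace($email_address[0], Template::email_encode($email_address[0]), $content);
    }

    return $content;
}
?>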

This does a simple regex search over the content for an email address. If one is found, the str_replace() function swaps every occurrence of that address for the encoded one. Finally, the content is returned regardless of whether it was modified. One caveat: preg_match() stops at the first match, so a post that mentions several different addresses will only have the first one encoded. This could probably be re-written using preg_replace() or, more likely, preg_replace_callback() to catch them all, but the code that I have here does the job quite well for content with a single address.
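As a rough sketch of the preg_replace_callback() rewrite mentioned above, the version below encodes every address in the content in one pass. It uses standalone functions (email_encode() as defined earlier, plus an invented encode_emails_in_content()) instead of my PHGRW and Template classes, and it assumes PHP 5.3+ for the anonymous function. The pattern is copied unchanged from the filter.

```php
<?php
// Same conversion as email_encode() earlier in the post, repeated so this runs on its own.
function email_encode($email) {
    $hexed = array();
    for ($i = 0; $i < strlen($email); $i++) {
        $hexed[] = sprintf("&#x%s;", dechex(ord($email[$i])));
    }
    return implode("", $hexed);
}

// Hypothetical replacement for the filter body; the name is invented for this sketch.
function encode_emails_in_content($content) {
    // Pattern copied as-is from wp_content_hook().
    $pattern = '#[^\W][a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+)*\@[a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+)*\.[a-zA-Z]{2,4}#i';
    // The callback runs once per match, so every address gets replaced.
    return preg_replace_callback($pattern, function ($match) {
        return email_encode($match[0]);
    }, $content);
}

$html = 'Write to bob@example.com or amy@example.org.';
echo encode_emails_in_content($html), "\n";
```

Because each match is handed straight to the callback, there is no separate str_replace() step, and a second or third address is handled the same way as the first.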