Removing SPAM Comments From Old WordPress Exports

When it came time to migrate my blog from an older installation of WordPress to the latest version on WordPress.com, I had a problem. The content export was 20 MB, but the limit was 15 MB. The cause was obvious: SPAM comments had filled up the old installation (even when marked as SPAM, they are still exported), and until recently there was no way to get rid of them entirely other than modifying the database directly. I had hoped to clean them up from the new system, but I couldn’t get past square one.

To resolve this, I put together this XSLT to remove SPAM comments from your WordPress exports:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:wp="http://wordpress.org/export/1.0/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns="http://www.w3.org/1999/xhtml">
  <!-- Drop every comment flagged as spam: an empty template emits nothing. -->
  <xsl:template match="wp:comment[wp:comment_approved = 'spam']">
  </xsl:template>
  <!-- Write post bodies back out unescaped so the embedded HTML stays as markup. -->
  <xsl:template match="content:encoded">
    <xsl:copy>
      <xsl:value-of select="." disable-output-escaping="yes" />
    </xsl:copy>
  </xsl:template>
  <!-- Copy every other element, with its attributes, unchanged. -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
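
To confirm that the transform did its job, you can count the comments by status before and after running it. The following is only a rough sketch in Python using the standard library, not part of the stylesheet above. The function name `count_comment_statuses` and the sample data are made up for illustration, and the namespace URI is assumed to be the same `export/1.0` one the stylesheet declares (your export may use a different version number).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: same wp namespace URI as the stylesheet; adjust it to match
# the xmlns:wp declaration at the top of your own export file.
WP_URI = "http://wordpress.org/export/1.0/"
WP_NS = {"wp": WP_URI}


def count_comment_statuses(xml_text):
    """Return a Counter of wp:comment_approved values in an export string."""
    root = ET.fromstring(xml_text)
    statuses = Counter()
    # iter() walks the whole tree, so comments under every item are found.
    for comment in root.iter("{%s}comment" % WP_URI):
        status = comment.findtext("wp:comment_approved", default="", namespaces=WP_NS)
        statuses[status.strip()] += 1
    return statuses


# Tiny hand-written stand-in for a real export (hypothetical data).
SAMPLE = """<rss xmlns:wp="http://wordpress.org/export/1.0/"><channel><item>
  <wp:comment><wp:comment_approved>1</wp:comment_approved></wp:comment>
  <wp:comment><wp:comment_approved>spam</wp:comment_approved></wp:comment>
  <wp:comment><wp:comment_approved>spam</wp:comment_approved></wp:comment>
</item></channel></rss>"""

if __name__ == "__main__":
    print(count_comment_statuses(SAMPLE))
```

Run against the original export, the count under `spam` shows how many comments are inflating the file; run against the transformed output, that key should no longer appear.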

Unfortunately, the export wouldn’t even parse as XML at first, so I had to go through by hand and fix some malformed comment HTML (or, where the comment was SPAM, just delete it) before the transform would run. Once that was done, the result worked perfectly for the import. After that it was just a matter of searching within WordPress to clean up old blog links and some HTML that didn’t look quite right in the new template.
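
Hunting for the malformed spots is easier if something tells you where the parser gave up. Here is a minimal sketch, again in Python with only the standard library, that reports the line and column of the first well-formedness error. The helper name `first_parse_error` and the broken sample are hypothetical, and this is just one way to do it, not necessarily how the fixes described above were found.

```python
import xml.etree.ElementTree as ET


def first_parse_error(xml_text):
    """Return (line, column, message) for the first XML error, or None if it parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple from the parser.
        line, column = err.position
        return line, column, str(err)
    return None


# Hypothetical fragment: the <b> tag on line 2 is never closed.
BROKEN = "<rss>\n<item><b>unclosed bold</item>\n</rss>"
WELL_FORMED = "<rss>\n<item><b>closed bold</b></item>\n</rss>"

if __name__ == "__main__":
    print(first_parse_error(BROKEN))
    print(first_parse_error(WELL_FORMED))
```

The parser stops at the first problem it hits, so the routine is to fix that spot, rerun, and repeat until the function returns `None`.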

I hope someone else finds this useful!
