When the time came to migrate my blog from an older installation of WordPress to the latest version on WordPress.com, I hit a problem. The content export was 20 MB, but the limit was 15 MB. The cause was obvious: spam comments had filled up the old installation (even when marked as spam, they are still exported), and until recently there was no way to get rid of them entirely other than modifying the database. I had hoped to delete them from the new system, but I couldn’t get past square one.
To resolve this, I put together this XSLT stylesheet to remove spam comments from a WordPress export:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:wp="http://wordpress.org/export/1.0/"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- Drop every comment marked as spam: an empty template emits nothing. -->
  <xsl:template match="//wp:comment[wp:comment_approved = 'spam']">
  </xsl:template>

  <!-- Write post bodies out unescaped so the HTML inside them is preserved. -->
  <xsl:template match="//content:encoded">
    <xsl:copy>
      <xsl:value-of select="." disable-output-escaping="yes" />
    </xsl:copy>
  </xsl:template>

  <!-- Copy every other element, with its attributes, and recurse into its children. -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*" />
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
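If you want to see how much of an export is spam, or confirm that none is left after running the stylesheet, a short script can count comments by their approval status. This is only a minimal Python sketch and was not part of the original fix: the function name is made up for illustration, the namespace URI is copied from the stylesheet, and it assumes the file already parses as XML.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace copied from the stylesheet; other export versions may use a different URI.
WP_NS = "http://wordpress.org/export/1.0/"


def count_comments_by_status(source):
    """Count wp:comment elements by their wp:comment_approved value.

    `source` is a file path or a binary file-like object (hypothetical helper).
    """
    counts = Counter()
    # iterparse streams the file, so a large export is not built into one big tree.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == f"{{{WP_NS}}}comment":
            status = elem.findtext(f"{{{WP_NS}}}comment_approved", default="")
            counts[status.strip()] += 1
            elem.clear()  # release the comment's children once counted
    return counts


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a real export file.
    sample = (
        b'<rss xmlns:wp="http://wordpress.org/export/1.0/"><channel><item>'
        b"<wp:comment><wp:comment_approved>spam</wp:comment_approved></wp:comment>"
        b"<wp:comment><wp:comment_approved>1</wp:comment_approved></wp:comment>"
        b"</item></channel></rss>"
    )
    print(count_comments_by_status(io.BytesIO(sample)))
```

Run against the transformed file, the count for 'spam' should come back as zero.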
Unfortunately, the export would not even parse as XML at first, so I had to go through by hand and fix some malformed comment HTML (or, in the case of spam, just delete it) before the transform would run. Once that was done, the result imported perfectly. After that, it was just a matter of searching within WordPress to clean up old blog links and some HTML that didn’t look quite right in the new template.
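Finding the malformed spots by hand is tedious, so a small helper that reports where the parser first gives up may save some time. The following is a rough Python sketch using the standard library's expat bindings; the function name is hypothetical, and it is a suggestion rather than what I actually used.

```python
import xml.parsers.expat


def first_parse_error(data):
    """Return (line, column, message) for the first XML error in `data`, or None.

    `data` is the document as bytes or str (hypothetical helper).
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk of input
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message
    return None


if __name__ == "__main__":
    # A deliberately broken fragment: <b> is never closed.
    print(first_parse_error(b"<a>\n<b>broken</a>"))
```

To use it on a real export, read the file in binary mode and pass the bytes in, fix the reported line, and repeat until it returns None.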
I hope someone else finds this useful!