Tools to Convert Movable Type to Pebble

The last major task in migrating to Pebble was to convert all of my content out of Movable Type. The code I used is included below.

I was a little hopeful at first, with all these standard XMLRPC APIs, that there might be a straight conversion, perhaps having to map some properties, but it’s a long way from that. In fact, I’d have settled for one of them being able to access all the available data, but as far as I can tell that’s not the case.

I mostly focused on the MetaWeblog API, since that appeared to be more complete than Blogger, and only those two seem to be supported by Pebble at this time. However, it seems that this only lets you retrieve a subset of the entries, and Movable Type’s documentation wasn’t much help on this.

So instead I exported the MT content from the admin interface, which gives you everything in the database in a simple text format, and got to work importing that into Pebble.
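As a rough illustration of that text format: single-line fields are written as `NAME: value`, and lines of dashes separate multi-line fields and entries. The sketch below splits one such line using the same regular expression as the converter further down. The class name `ExportFieldSketch`, the `parseField` helper, and the sample values are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExportFieldSketch
{
    // Same pattern the converter below uses for "NAME: value" lines.
    private static final Pattern FIELD_PATTERN = Pattern.compile( "([^:]+):(.*)" );

    /** Returns {name, value}, or null if the line is not a field line. */
    public static String[] parseField( String line )
    {
        Matcher m = FIELD_PATTERN.matcher( line );
        if ( !m.matches() )
        {
            return null;
        }
        // Only the first colon splits the line, so times such as 09:15:00 stay in the value.
        return new String[] { m.group( 1 ), m.group( 2 ).trim() };
    }

    public static void main( String[] args )
    {
        // Hypothetical sample lines, not taken from a real export.
        String[] sample = { "AUTHOR: someone", "TITLE: A hypothetical post", "DATE: 01/31/2006 09:15:00 PM", "-----" };
        for ( String line : sample )
        {
            String[] f = parseField( line );
            System.out.println( f == null ? "(not a field) " + line : f[0] + " => " + f[1] );
        }
    }
}
```

Unlike the converter, this version checks whether the pattern matched before reading the groups, so a separator line returns null instead of throwing.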

Again, I hoped to use XMLRPC and did so in the code below with some success, but a number of fields couldn’t be stored, nor could I find a method to add a comment (as the API is understandably designed for blog editing clients more so than full backup and conversion).

I found in the end that the best way to go ahead was to parse the MT content into a little model, then write that model out into Pebble’s XML format, which is very straightforward. You can then lay that out over the blogs directory and hit “rebuild index” in the utilities menu of Pebble.
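One thing to watch when writing XML by hand like this: any text that is not wrapped in CDATA (titles, excerpts, author names) will produce a malformed file if it contains `&` or `<`. A minimal sketch of an escaping helper follows. The class and method names are hypothetical and are not part of Pebble or of the converter below; only the `title` element name is taken from the converter's output.

```java
public class XmlEscapeSketch
{
    /** Escapes the characters that cannot appear literally in XML element text. */
    public static String escape( String text )
    {
        if ( text == null )
        {
            // Avoid writing the literal string "null" for missing fields.
            return "";
        }
        // The ampersand must be replaced first so the other entities are not escaped twice.
        return text.replace( "&", "&amp;" ).replace( "<", "&lt;" ).replace( ">", "&gt;" );
    }

    /** Builds a simple element such as <title>...</title> with escaped content. */
    public static String element( String name, String text )
    {
        return "<" + name + ">" + escape( text ) + "</" + name + ">";
    }

    public static void main( String[] args )
    {
        System.out.println( element( "title", "Maven & <plugins>" ) );
    }
}
```

In the converter this would mean writing `w.println( XmlEscapeSketch.element( "title", e.getTitle() ) )` rather than concatenating the raw title.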

I still found a couple of issues doing this: there was some obscure NullPointerException in the rendering filter that appeared to be caused by the filename not matching the publish date (because of the timezone, I was sometimes off by one).
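To show how the off-by-one can arise, the sketch below formats one instant as a `yyyy/MM/dd` directory path in two time zones. The instant and the class name `DayDirectorySketch` are made up for the example, and I am assuming that the directory has to match the date as seen in the blog's own time zone, which is what I observed rather than something I have checked in the Pebble source.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class DayDirectorySketch
{
    /** Formats an instant as a yyyy/MM/dd directory path in the given time zone. */
    public static String dayDirectory( long epochMillis, String zoneId )
    {
        SimpleDateFormat f = new SimpleDateFormat( "yyyy/MM/dd", Locale.ENGLISH );
        f.setTimeZone( TimeZone.getTimeZone( zoneId ) );
        return f.format( new Date( epochMillis ) );
    }

    public static void main( String[] args )
    {
        // 2006-02-01 02:00:00 UTC, which is still the evening of 31 January in New York.
        long instant = 1138759200000L;
        System.out.println( dayDirectory( instant, "UTC" ) );              // 2006/02/01
        System.out.println( dayDirectory( instant, "America/New_York" ) ); // 2006/01/31
    }
}
```

This is why the converter below sets the time zone explicitly on both the directory format and the entry date format instead of relying on the JVM default.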

Even once I’d straightened that out, I had a lot of problems with particular entries, but they seemed to go away after several iterations of deleting the indexes, adding the entries one by one, indexing, and occasionally restarting when it got confused. The Pebble code has changed since, so hopefully this won’t be an issue next time around.

Ideally, this would be built straight into Pebble and added using the blog API. Something for my abundant free time no doubt 🙂 Still, the code is here in case anyone wants to volunteer…

The code posted here is very rough: a simple main app with no error checking that will convert an MT export into a Pebble blogs data directory (sans metadata), to be overlaid onto an already created blog.

Hope this helps someone else!

pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>home</groupId>
  <artifactId>mt2mwla</artifactId>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
      <version>1.1</version>
    </dependency>
    <dependency>
      <groupId>xmlrpc</groupId>
      <artifactId>xmlrpc</artifactId>
      <version>2.0.1</version>
    </dependency>
    <dependency>
      <groupId>commons-codec</groupId>
      <artifactId>commons-codec</artifactId>
      <version>1.2</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.5</source>
          <target>1.5</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

src/main/java/home/App.java:

package home;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.StringReader;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Converts a Movable Type export file into a Pebble blog data directory.
 */
@SuppressWarnings(
    {"UseOfObsoleteCollectionType", "CollectionDeclaredAsConcreteClass", "UseOfSystemOutOrSystemErr", "ReturnOfDateField", "AssignmentToDateFieldFromParameter"})
public class App
{
    private App()
    {
    }

    private enum PublishStatus
    {
        Draft, Publish
    }

    private static final String MULTI_LINE_FIELD_SEPARATOR = "-----";

    private static final String ENTRY_SEPARATOR = "--------";

    private static final Pattern FIELD_PATTERN = Pattern.compile( "([^:]+):(.*)" );

    private static class Comment
    {
        private Date date;

        private String ip;

        private String email;

        private String url;

        private String author;

        private String text;

        public void setDate( Date date )
        {
            this.date = date;
        }

        public void setIp( String ip )
        {
            this.ip = ip;
        }

        public void setEmail( String email )
        {
            this.email = email;
        }

        public void setUrl( String url )
        {
            this.url = url;
        }

        public void setAuthor( String author )
        {
            this.author = author;
        }

        public void setText( String text )
        {
            this.text = text;
        }

        public Date getDate()
        {
            return date;
        }

        public String getIp()
        {
            return ip;
        }

        public String getUrl()
        {
            return url;
        }

        public String getEmail()
        {
            return email;
        }

        public String getAuthor()
        {
            return author;
        }

        public String getText()
        {
            return text;
        }
    }

    private static class Entry
    {
        private String body;

        private String extendedBody;

        private String excerpt;

        private String author;

        private List<Comment> comments = new ArrayList<Comment>();

        private String title;

        private Date date;

        private boolean convertBreaks;

        private Set<String> categories = new LinkedHashSet<String>();

        private PublishStatus status;

        private boolean allowComments;

        private boolean allowPings;

        private String keywords;

        public String getAuthor()
        {
            return author;
        }

        public void setBody( String body )
        {
            this.body = body;
        }

        public void setExtendedBody( String extendedBody )
        {
            this.extendedBody = extendedBody;
        }

        public void setExcerpt( String excerpt )
        {
            this.excerpt = excerpt;
        }

        public void setAuthor( String author )
        {
            this.author = author;
        }

        public void addComment( Comment comment )
        {
            comments.add( comment );

        }

        public void setTitle( String title )
        {
            this.title = title;
        }

        public void setDate( Date date )
        {
            this.date = date;
        }

        public void setConvertBreaks( boolean convertBreaks )
        {
            this.convertBreaks = convertBreaks;
        }

        public void addCategory( String field )
        {
            categories.add( field );
        }

        public void setStatus( PublishStatus status )
        {
            this.status = status;
        }

        public void setAllowComments( boolean allowComments )
        {
            this.allowComments = allowComments;
        }

        public void setAllowPings( boolean allowPings )
        {
            this.allowPings = allowPings;
        }

        public void setKeywords( String keywords )
        {
            this.keywords = keywords;
        }

        public String getTitle()
        {
            return title;
        }

        public Date getDate()
        {
            return date;
        }

        public PublishStatus getStatus()
        {
            return status;
        }

        public String getBody()
        {
            return body;
        }

        public Set<String> getCategories()
        {
            return categories;
        }

        public boolean isConvertBreaks()
        {
            return convertBreaks;
        }

        public String getExcerpt()
        {
            return excerpt;
        }

        public boolean isAllowComments()
        {
            return allowComments;
        }

        public boolean isAllowPings()
        {
            return allowPings;
        }

        public Iterable<Comment> getComments()
        {
            return comments;
        }
    }

    public static void main( String[] args )
        throws IOException, ParseException
    {
        DateFormat importDateFormat = new SimpleDateFormat( "MM/dd/yyyy hh:mm:ss a", Locale.US );
        importDateFormat.setTimeZone( TimeZone.getTimeZone( "America/New_York" ) );

        File file = new File( args[0] );

        BufferedReader reader = new BufferedReader( new FileReader( file ) );
        String line = reader.readLine();
        boolean inField = false;
        String field = "";
        List<Entry> entries = new ArrayList<Entry>();
        Entry entry = new Entry();
        String fieldName = null;
        while ( line != null )
        {
            if ( inField )
            {
                if ( MULTI_LINE_FIELD_SEPARATOR.equals( line ) )
                {
                    if ( "BODY".equals( fieldName ) )
                    {
                        if ( entry.isConvertBreaks() )
                        {
                            field = convertBreaks( field );
                        }

                        entry.setBody( field );
                    }
                    else if ( "EXTENDED BODY".equals( fieldName ) )
                    {
                        if ( entry.isConvertBreaks() )
                        {
                            field = convertBreaks( field );
                        }

                        entry.setExtendedBody( field );
                    }
                    else if ( "EXCERPT".equals( fieldName ) )
                    {
                        entry.setExcerpt( field );
                    }
                    else if ( "KEYWORDS".equals( fieldName ) )
                    {
                        // Hmm, not in the MT docs
                        entry.setKeywords( field );
                    }
                    else if ( "COMMENT".equals( fieldName ) )
                    {
                        entry.addComment( parseComment( field, importDateFormat ) );
                    }
                    else
                    {
                        // TODO: if this were generic, we would parse PING as well
                        throw new IllegalArgumentException( "Bad multi-line field: " + fieldName );
                    }

                    line = reader.readLine();
                    if ( line.trim().length() > 0 )
                    {
                        Matcher m = FIELD_PATTERN.matcher( line );
                        m.find();
                        fieldName = m.group( 1 );
                        field = "";
                    }
                    else
                    {
                        inField = false;
                    }
                }
                else
                {
                    field += line + '\n';
                }
            }
            else
            {
                if ( MULTI_LINE_FIELD_SEPARATOR.equals( line ) )
                {
                    inField = true;

                    line = reader.readLine();
                    Matcher m = FIELD_PATTERN.matcher( line );
                    m.find();
                    fieldName = m.group( 1 );
                    field = "";
                }
                else
                {
                    if ( ENTRY_SEPARATOR.equals( line ) )
                    {
                        entries.add( entry );
                        entry = new Entry();
                    }
                    else if ( line.length() > 0 )
                    {
                        Matcher m = FIELD_PATTERN.matcher( line );
                        m.find();
                        fieldName = m.group( 1 );
                        field = m.group( 2 ).trim();

                        if ( "AUTHOR".equals( fieldName ) )
                        {
                            entry.setAuthor( field );
                        }
                        else if ( "TITLE".equals( fieldName ) )
                        {
                            entry.setTitle( field );
                        }
                        else if ( "DATE".equals( fieldName ) )
                        {
                            entry.setDate( importDateFormat.parse( field ) );
                        }
                        else if ( "PRIMARY CATEGORY".equals( fieldName ) )
                        {
                            entry.addCategory( field );
                        }
                        else if ( "CATEGORY".equals( fieldName ) )
                        {
                            entry.addCategory( field );
                        }
                        else if ( "STATUS".equals( fieldName ) )
                        {
                            entry.setStatus( PublishStatus.valueOf( field ) );
                        }
                        else if ( "ALLOW COMMENTS".equals( fieldName ) )
                        {
                            entry.setAllowComments( "1".equals( field ) );
                        }
                        else if ( "ALLOW PINGS".equals( fieldName ) )
                        {
                            entry.setAllowPings( "1".equals( field ) );
                        }
                        else if ( "CONVERT BREAKS".equals( fieldName ) )
                        {
                            entry.setConvertBreaks( "1".equals( field ) || "__default__".equals( field ) );
                        }
                        else
                        {
                            // TODO: if this were generic, we would parse NO ENTRY as well
                            throw new IllegalArgumentException( "Bad field: " + fieldName );
                        }
                    }
                }
            }

            line = reader.readLine();
        }

        System.out.println( "Creating entries" );
//        createPostsViaXmlRpc( entries );
        createPostsData( entries, new File( file.getParentFile(), "pebble-data" ) );
    }

    private static String convertBreaks( String field )
        throws IOException
    {
        BufferedReader reader = new BufferedReader( new StringReader( field ) );
        String line = reader.readLine();
        String paragraph = "";
        String content = "";
        boolean inParagraph = false;
        while ( line != null )
        {
            // a bit of a hack: lines between <p> and </p> are already marked up,
            // so they are left unwrapped
            if ( line.contains( "<p>" ) )
            {
                inParagraph = true;
            }
            if ( line.contains( "</p>" ) )
            {
                inParagraph = false;
            }

            if ( line.length() == 0 )
            {
                // a blank line ends the current paragraph
                if ( inParagraph )
                {
                    content += paragraph;
                }
                else
                {
                    content += "<p>" + paragraph + "</p>";
                }
                paragraph = "";
            }
            else if ( paragraph.length() > 0 )
            {
                // a single newline inside a paragraph becomes a line break
                paragraph += "<br />" + line;
            }
            else
            {
                paragraph += line;
            }

            line = reader.readLine();
        }
        if ( paragraph.length() > 0 )
        {
            if ( inParagraph )
            {
                content += paragraph;
            }
            else
            {
                content += "<p>" + paragraph + "</p>";
            }
        }
        return content;
    }

    private static Comment parseComment( String field, DateFormat importDateFormat )
        throws IOException, ParseException
    {
        BufferedReader reader = new BufferedReader( new StringReader( field ) );
        String line = reader.readLine();
        Comment comment = new Comment();
        boolean commentStarted = false;
        String text = "";
        while ( line != null )
        {
            if ( !commentStarted )
            {
                String fieldName = null;
                String value;

                Matcher m = FIELD_PATTERN.matcher( line );
                if ( m.find() )
                {
                    fieldName = m.group( 1 );
                    value = m.group( 2 ).trim();
                }
                else
                {
                    value = line;
                }
                if ( "AUTHOR".equals( fieldName ) )
                {
                    comment.setAuthor( value );
                }
                else if ( "EMAIL".equals( fieldName ) )
                {
                    comment.setEmail( value );
                }
                else if ( "URL".equals( fieldName ) )
                {
                    comment.setUrl( value );
                }
                else if ( "IP".equals( fieldName ) )
                {
                    comment.setIp( value );
                }
                else if ( "DATE".equals( fieldName ) )
                {
                    comment.setDate( importDateFormat.parse( value ) );
                }
                else
                {
                    commentStarted = true;
                    text = line + "\n";
                }
            }
            else
            {
                text += line + "\n";
            }
            line = reader.readLine();
        }
        comment.setText( convertBreaks( text ) );

        return comment;
    }

/*
    private static void createPostsViaXmlRpc( List<Entry> entries )
        throws XmlRpcException, IOException
    {
//        DateFormat metaWebLogAPIDateFmt = new SimpleDateFormat( "yyyy-MM-dd'T'HH:mm:ss", Locale.US );

        XmlRpcClient xmlrpc = new XmlRpcClient( "http://blogs.maven.org/xmlrpc/" );

        for ( Entry e : entries )
        {
            Vector<Object> params = new Vector<Object>();
            params.add( "testimport" ); // your blog ID
            params.add( "..." ); // username
            params.add( "..." ); // password

            Map<String, Object> hashtable = new Hashtable<String, Object>();
            hashtable.put( "title", e.getTitle() );
            hashtable.put( "description", e.getBody() );
            // not used by pebble
//            hashtable.put( "dateCreated", metaWebLogAPIDateFmt.format( e.getDate() ) );
            hashtable.put( "pubDate", e.getDate() );
            // Can't use author, it uses username
            Vector<String> categories = new Vector<String>( e.getCategories() );
            hashtable.put( "categories", categories );

            params.add( hashtable );
            params.add( e.getStatus().equals( PublishStatus.Publish ) );

            String postID = (String) xmlrpc.execute( "metaWeblog.newPost", params );

            System.out.println( postID );
        }
    }
*/

    private static void createPostsData( List<Entry> entries, File directory )
        throws IOException
    {
        DateFormat fileFormat = new SimpleDateFormat( "yyyy/MM/dd", Locale.ENGLISH );
        fileFormat.setTimeZone( TimeZone.getTimeZone( "America/New_York" ) );
        DateFormat dateFormat = new SimpleDateFormat( "dd MMM yyyy HH:mm:ss:S Z", Locale.ENGLISH );
        dateFormat.setTimeZone( TimeZone.getTimeZone( "America/New_York" ) );
        for ( Entry e : entries )
        {
            Date date = e.getDate();
            File xmlFile =
                new File( directory, e.getAuthor() + "/" + fileFormat.format( date ) + "/" + date.getTime() + ".xml" );
            xmlFile.getParentFile().mkdirs();

            PrintWriter w = new PrintWriter( new OutputStreamWriter( new FileOutputStream( xmlFile ), "UTF-8" ) );
            w.println( "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" );
            w.println( "<blogEntry>" );
            w.println( "<title>" + e.getTitle() + "</title>" );
            w.println( "<subtitle></subtitle>" );
            w.println( "<excerpt>" + e.getExcerpt() + "</excerpt>" );
            w.println( "<body><!&#91;CDATA&#91;" + e.getBody() + "&#93;&#93;></body>" );
            w.println( "<date>" + dateFormat.format( e.getDate() ) + "</date>" );
            w.println(
                "<state>" + ( e.getStatus().equals( PublishStatus.Publish ) ? "published" : "unpublished" ) + "</state>" );
            w.println( "<author>" + e.getAuthor() + "</author>" );
            w.println( "<staticName/>" );
            w.println( "<commentsEnabled>" + e.isAllowComments() + "</commentsEnabled>" );
            w.println( "<trackBacksEnabled>" + e.isAllowPings() + "</trackBacksEnabled>" );
            for ( String c : e.getCategories() )
            {
                if ( c.trim().length() > 0 )
                {
                    w.println( "<category>/" + c.toLowerCase() + "</category>" );
                }
            }
            w.println( "<tags></tags>" );

            for ( Comment c : e.getComments() )
            {
                w.println( "<comment>" );
                w.println( "<title>Re: " + e.getTitle() + "</title>" );
                w.println( "<body><!&#91;CDATA&#91;" + c.getText() + "&#93;&#93;></body>" );
                w.println( "<author>" + c.getAuthor() + "</author>" );
                w.println( "<email>" + c.getEmail() + "</email>" );
                w.println( "<website>" + c.getUrl() + "</website>" );
                w.println( "<ipAddress>" + c.getIp() + "</ipAddress>" );
                w.println( "<date>" + dateFormat.format( c.getDate() ) + "</date>" );
                w.println( "<state>approved</state>" );
                w.println( "</comment>" );
            }

            w.println( "</blogEntry>" );

            w.close();
        }

    }

}
2 responses to “Tools to Convert Movable Type to Pebble”

  1. An alternative is to write an app that uses the internal Pebble APIs to create the content for you … something like the MovableTypeImporter. If you run into anything else, I’m more than happy to help out with this kind of stuff. Just drop me an e-mail. 🙂
