From Wordpress to Confluence with Python
During the last 15 months of my tenure at Sensory Networks, I had a Wordpress blog that was only available on the internal network. Wordpress (i.e. Apache httpd and MySQL) ran on my Fedora Core 4 workstation; this was fine at the time but meant problems when I decided to leave. The blog contained over 200 posts, some of which actually contained information useful to other people in the company. These posts needed to be imported into Sensory’s Confluence Wiki.
I spent some quality time with Google trying to find a tool to do this. I found Ryan Lee’s Blogger API Client (BAC) plugin for Wordpress, and Confluence’s Blogging RPC Plugin. The nice sysadmins at Sensory installed the Confluence plugin, and I managed to get the BAC plugin working with my Wordpress installation.
Unfortunately the BAC plugin had a few limitations:
- I still needed to write code to convert from HTML to Confluence markup.
- The BAC plugin works by posting to another blog when you hit the “Publish” button in Wordpress. I wanted more of a batch conversion tool.
- The Blogger API is quite limited; for example, it does not provide a way to attach a file to a post.
I decided to write my own script. I borrowed the regular expressions in this PHP script to do the markup conversion, and used Python’s XML-RPC library to talk to Wordpress via its MetaWeblog API and to Confluence via its Remote API.
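For anyone curious what the XML-RPC side looks like, here is a minimal sketch rather than the actual script. It uses Python 3's `xmlrpc.client` (the module was called `xmlrpclib` in Python 2). The function names, the blog id of 1 and the example URLs in the comments are my own assumptions for illustration; `metaWeblog.getRecentPosts` and the `confluence1.login` / `storeBlogEntry` / `logout` calls are the remote methods as I understand them, so check them against your own server versions.

```python
import xmlrpc.client  # only needed to build the real ServerProxy objects


def fetch_posts(wordpress, blog_id, user, password, limit=10):
    """Return the most recent posts as a list of MetaWeblog structs (dicts)."""
    return wordpress.metaWeblog.getRecentPosts(blog_id, user, password, limit)


def to_blog_entry(post, space_key, convert=lambda html: html):
    """Build the struct handed to Confluence's storeBlogEntry.

    `convert` is a hypothetical hook for the HTML-to-wiki-markup step;
    by default the post body is passed through unchanged.
    """
    return {
        "space": space_key,
        "title": post["title"],
        "content": convert(post["description"]),
    }


def copy_posts(wordpress, confluence, wp_login, cf_login, space_key, limit=10):
    """Copy recent Wordpress posts into a Confluence space; return the count."""
    token = confluence.confluence1.login(*cf_login)
    # Assumption: a single-blog Wordpress install, addressed as blog id 1.
    posts = fetch_posts(wordpress, 1, wp_login[0], wp_login[1], limit)
    for post in posts:
        confluence.confluence1.storeBlogEntry(token, to_blog_entry(post, space_key))
    confluence.confluence1.logout(token)
    return len(posts)


# Example wiring (placeholder URLs and credentials, not run here):
# wordpress = xmlrpc.client.ServerProxy("http://localhost/wordpress/xmlrpc.php")
# confluence = xmlrpc.client.ServerProxy("http://wiki.example.com/rpc/xmlrpc")
# copy_posts(wordpress, confluence, ("wpuser", "secret"), ("cfuser", "secret"), "BLOG")
```

Passing the two proxies in as arguments keeps the copying logic separate from the network setup, which makes it easy to try out against stand-in objects before pointing it at a real wiki.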
I also extended the markup conversion capabilities from the original PHP script. Tables and lists are supported, although list nesting will be lost. Images attached to the Wordpress post are fetched with wget and then attached to the Confluence post. The image handling is a little dodgy but worked well enough for my purposes.
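To give a flavour of the regular-expression approach, here is a small sketch of an HTML-to-Confluence-markup converter. These are not the rules from the PHP script or from my script; they are a handful of illustrative patterns I'm assuming would cover simple posts. Tables are left out, every list becomes a bullet list, and nesting is flattened. Images are rewritten to refer to an attachment by file name, on the assumption that the file gets attached to the page separately.

```python
import posixpath
import re
from urllib.parse import urlparse


def _image_markup(match):
    """Turn an <img> tag into !filename!, i.e. a reference to an attachment."""
    filename = posixpath.basename(urlparse(match.group(1)).path)
    return "!" + filename + "!"


FLAGS = re.I | re.S

# (pattern, replacement) pairs, applied in order. Illustrative only.
RULES = [
    (re.compile(r"<(?:b|strong)>(.*?)</(?:b|strong)>", FLAGS), r"*\1*"),
    (re.compile(r"<(?:i|em)>(.*?)</(?:i|em)>", FLAGS), r"_\1_"),
    (re.compile(r"<h([1-6])(?:\s[^>]*)?>(.*?)</h\1>", FLAGS), r"h\1. \2\n"),
    (re.compile(r'<a\s+[^>]*href="([^"]*)"[^>]*>(.*?)</a>', FLAGS), r"[\2|\1]"),
    (re.compile(r'<img\s+[^>]*src="([^"]*)"[^>]*>', FLAGS), _image_markup),
    # Every list item becomes a top-level bullet, so nesting is lost.
    (re.compile(r"<li(?:\s[^>]*)?>", FLAGS), r"\n* "),
    (re.compile(r"</li>", FLAGS), ""),
    (re.compile(r"<(?:ul|ol)(?:\s[^>]*)?>", FLAGS), ""),
    (re.compile(r"</(?:ul|ol)>", FLAGS), r"\n"),
    (re.compile(r"<p(?:\s[^>]*)?>(.*?)</p>", FLAGS), r"\1\n\n"),
]


def html_to_confluence(html):
    """Apply each rule in turn, then tidy up the surplus blank lines."""
    text = html
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

For example, `html_to_confluence('<p>Hello <b>world</b></p>')` gives `Hello *world*`. Regular expressions are a blunt tool for HTML, so anything beyond well-behaved blog markup will probably need extra rules or a real parser.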
The script can be downloaded here. I’m not planning to make it work any better since I no longer have any use for it, but if you want to use it and have problems with it, let me know. I don’t mind improving the script if it will help somebody out.
Postscript: Further Googling has revealed that there is a Perl module, HTML::WikiConverter, that can convert from HTML to quite a few different Wiki markups.