From College Publisher to WordPress: My own daily WTF

Up until June 2011, The Signpost (Weber State University’s student news organization) was hosted by College Publisher, a content management system designed for university newspapers like The Signpost. In June 2011, the paper made the jump to WordPress.

The first order of business was to import past stories from College Publisher into our new WordPress-hosted site. Easy, right? Get a SQL dump from CP, format the columns for the WP database, upload the new SQL file. Bada-boom bada-bing. If only…

After we waited nearly two weeks for a response, College Publisher provided us with a temporary FTP site that contained the archived file export:

Oh, Jupiter! Why?!

Working with large Excel files can be difficult; working with a nearly 4 GB Excel file can be damn near impossible, especially on an office computer with 4 GB of memory. After closing all non-essential processes, I was able to open the file, only to find 60,000 rows, many of which contained classified ads that were stored with stories, incomplete records, and other corrupted gobbledygook.

Sorting by story size and filtering by some keywords allowed me to weed out 20,000 bogus records, bringing the Excel file down to about 40k rows. Upon further inspection, nearly all of the remaining stories had duplicate headlines. A typical duplicate set contained 4 records, only one of which was the complete, correct story.

40k records, with 4 duplicates per unique story, leaves 10k unique stories: more than one individual could feasibly sort through in a reasonable amount of time.

The eventual solution? I created SQL insert scripts from the Excel file and imported them into a MySQL database. Then I whipped up a PHP application that allows searching by article title, author, and body content. Duplicates are displayed to the user, along with a Reddit-esque voting mechanism for voting correct stories up and bad ones down. As prior authors search for their old work, they improve the system, “crowdsourcing” the work of filtering out bad data. To date, 7,810 records (18%) have been voted up or down.
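For the curious, here is roughly how the votes can be used to pick one keeper per headline. This is only a sketch in Python, not the PHP that actually runs the site; the field names (`id`, `headline`, `up`, `down`) and the sample headlines are made up for illustration, and the "must have a positive score" rule is an assumption.

```python
from collections import defaultdict


def pick_best_records(records):
    """Return {normalized headline: id of the highest-scored duplicate}."""
    groups = defaultdict(list)
    for record in records:
        # Normalize case and whitespace so near-identical headlines group together.
        key = " ".join(record["headline"].lower().split())
        groups[key].append(record)

    best = {}
    for headline, duplicates in groups.items():
        winner = max(duplicates, key=lambda r: r["up"] - r["down"])
        # Assumption: only trust a winner that readers have actually voted up.
        if winner["up"] - winner["down"] > 0:
            best[headline] = winner["id"]
    return best


# Hypothetical sample data: one duplicate set plus one story nobody has voted on.
records = [
    {"id": 1, "headline": "Wildcats win opener", "up": 0, "down": 2},
    {"id": 2, "headline": "Wildcats Win Opener ", "up": 3, "down": 0},
    {"id": 3, "headline": "Wildcats win opener", "up": 0, "down": 1},
    {"id": 4, "headline": "Parking fees rise", "up": 0, "down": 0},
]

print(pick_best_records(records))
```

Stories without a positive score stay out of the result, so they remain in the queue until someone votes on them.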

Thanks, College Publisher!

Oh, and as far as importing the stories into WordPress? I guess we can tackle that in another few years when the archives have been filtered.

Installing WordPress on a Windows Server (not through Gallery)

I administer a Windows server that runs multiple instances of WordPress. Today I needed to install another instance, but the Gallery Installer through IIS kept failing. I resorted to installing WordPress by hand and wrote up this guide to the manual process. Hopefully this is what you are looking for!

This guide makes two assumptions:

  1. PHP is already installed on your server (PHP Install Guide)
  2. MySQL is already installed on your server (MySQL Install Guide)

1. Download the latest stable release of WordPress

The latest release is available for download at http://www.wordpress.org/download. Choose the .zip package.

2. Extract the .zip file

Locate and extract the .zip file (named “wordpress-[versionnumber].zip”) to the location of your choice. Typically this will be your “inetpub > wwwroot” directory. Rename the default folder “WordPress” to your site name.
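If you would rather script this step than click through Explorer, the extract-and-rename can be sketched in a few lines of Python. The function name `extract_wordpress` is hypothetical, and the sketch assumes the archive unpacks into a single top-level folder named "wordpress".

```python
import zipfile
from pathlib import Path


def extract_wordpress(zip_path, web_root, site_name):
    """Extract the WordPress archive into web_root and rename the folder."""
    web_root = Path(web_root)
    target = web_root / site_name
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(web_root)

    # Assumption: the archive contains one top-level folder called "wordpress".
    extracted = web_root / "wordpress"
    extracted.rename(target)
    return target
```

A call might look like `extract_wordpress("wordpress-[versionnumber].zip", r"C:\inetpub\wwwroot", "mysite")`, where "mysite" stands in for your own site name.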

3. Add site in IIS

In IIS, right-click on your server name in the tree view, and choose “Add Web Site…”. Enter your site name and path into the Add Site window. The important inputs here are the path and site bindings (domain name).

4. Create & Configure Database

Open the MySQL command-line tool, and log in using the admin credentials. First, create a new database for WordPress to use:

Next, add the admin user to the new database:

Note: the quote marks are part of the command.
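As a minimal sketch of the two statements this step describes, the Python below builds them for a given database and user so they can be pasted into the MySQL prompt. The database name `wordpress_db`, the user `wp_admin`, and the host `localhost` are hypothetical placeholders, and the grant assumes the user account already exists on the server.

```python
import re


def build_setup_sql(db_name, user, host="localhost"):
    """Return the CREATE DATABASE and GRANT statements for a new WordPress site."""
    # Keep names to letters, digits and underscores so they are safe to embed.
    for value in (db_name, user):
        if not re.fullmatch(r"[A-Za-z0-9_]+", value):
            raise ValueError(f"unexpected characters in {value!r}")
    return [
        f"CREATE DATABASE `{db_name}`;",
        # The single quotes around the user and host are part of the statement.
        f"GRANT ALL PRIVILEGES ON `{db_name}`.* TO '{user}'@'{host}';",
    ]


for statement in build_setup_sql("wordpress_db", "wp_admin"):
    print(statement)
```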

5. Set up wp-config.php

Wp-config.php is the file that contains all the critical settings for WordPress including database connection information. Locate wp-config-sample.php and rename it to wp-config.php.

Open wp-config.php using your IDE of choice, and make the following changes:
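The changes in question are the database settings. The sample file ships with the placeholders `database_name_here`, `username_here` and `password_here`, and each one needs to be swapped for the values from step 4. Here is a hedged sketch of that substitution in Python; the function name and the example values are hypothetical, and it assumes the password contains no single quotes or backslashes, which would need escaping in PHP.

```python
def fill_wp_config(sample_text, db_name, db_user, db_password):
    """Replace the placeholder database settings from wp-config-sample.php."""
    replacements = {
        "database_name_here": db_name,
        "username_here": db_user,
        "password_here": db_password,
    }
    for placeholder, value in replacements.items():
        sample_text = sample_text.replace(placeholder, value)
    return sample_text


# A trimmed-down stand-in for the relevant lines of the sample file.
sample = (
    "define('DB_NAME', 'database_name_here');\n"
    "define('DB_USER', 'username_here');\n"
    "define('DB_PASSWORD', 'password_here');\n"
)

print(fill_wp_config(sample, "wordpress_db", "wp_admin", "s3cret"))
```

Editing the three values by hand works just as well; the point is only that nothing else in the file has to change for a basic install.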

6. Final Configuration

Browse to the new WordPress installation. If your domain is not already pointing to the site (bound in step 3), the easiest way to open your site is to add an entry to the Windows hosts file. The hosts file is named “hosts” and is located at “Windows > System32 > drivers > etc”. Open this file using Notepad (right-click, run as administrator), and add the following entry:
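A hosts entry is one line: an IP address, some whitespace, then the domain. As a sketch, assuming the site is served from the same machine (127.0.0.1) and using the hypothetical domain `mysite.example`, the following Python appends the line only when the domain is not already listed. It works on the file's text rather than the file itself, so nothing is written to disk.

```python
def add_hosts_entry(hosts_text, domain, ip="127.0.0.1"):
    """Return hosts_text with an entry for domain appended if it is missing."""
    for line in hosts_text.splitlines():
        # Ignore comments, then look for the domain among the host names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and domain in fields[1:]:
            return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{ip}\t{domain}\n"


current = "# sample hosts file\n127.0.0.1\tlocalhost\n"
print(add_hosts_entry(current, "mysite.example"))
```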

Once this is saved, browse to the new WordPress installation using your browser of choice, and follow the on-screen prompts to create a user, name your site, etc.