Interesting interview question #1

During an interview (for a software developer position) with a major employer in the state of Utah, I was asked an interesting programming question:

Create a function that, given an integer parameter, will return the middle three digits

I mentally wiped the sweat away from my forehead: whew — not computing bigO notation,  inventing a new search algorithm, or performing my favorite song!

I verbally walked through my thought process:

There will be a need for some looping to trim off the first and last digit of the integer each iteration — perhaps a while loop with the condition that the value is greater than 99? No, there is a better way: I’ll trim the digits recursively.

I asked for some feedback on my thought process, then set out writing some code. I came up with something like this:

“Nice solution,” the interviewer replied. “How will your function handle 4501?” Uh-oh: It will trim 4 and 1, and return 50. “Hmm… What about 40251?” Darn: It will trim 4 and 5, recursing on 025, which will be interpreted as 25 and fail.

So, as it turns out, programming questions in interviews might not be about creating the elegant, recursive solution — but about asking the right questions. Thorough requirements gathering.  Lesson learned.

After the interview, I couldn’t get the question out of my mind, and spent some time giving it a proper solution: handling input less than 999, error-ing on even values (finding the middle three of an even digit value is a somewhat undefined behavior), and polishing it off with some unit testing.

See a live version here

 

From College Publisher to WordPress: My own daily WTF

Up until June 2011, The Signpost (Weber State University’s student news organization), was hosted by College Publisher — A content management system designed for university newspapers like The Signpost. In June 2011, the paper made the jump to WordPress.

The first order of business was to import past stories from College Publisher into our new WordPress hosted site. Easy right? Get a SQL dump from CP, format the columns for the WP database, upload the new SQL file. Bada-boom bada-bing. If only…

After waiting nearly two weeks for a response from College Publisher, they provided us with a temporary FTP site which contained the archived file export:

Oh, Jupiter! Why?!
Oh, Jupiter! Why?!

 

Working with large Excel files can be difficult, working with nearly 4 GB excel files can be damn near impossible — especially when working on a office computer with 4 GB of of memory. After closing all non-essential processes, I was able to open up the file — only to find 60,00 rows — many of which contained classified ads that were stored with stories, incomplete records, and other corrupted gobbledygook.

Sorting by story size, and filtering my some key words, allowed me to filter out 20,000 bogus records — bringing the Excel file down to about 40k. Upon further inspection, nearly all of these remaining stories contained duplicate headlines. A typical duplicate set contained 4 records, one of which being the whole correct story.

40k records, with 4 duplicates per unique story leaves 10k unique stories — more stories than would be feasible for one individual to sort through in a reasonable amount of time.

The eventual solution? I created SQL insert scripts from the Excel file and imported them into a MySql database. Then I whipped up a PHP  application that allows searching by article title, author, and body content. Duplicates are displayed to the user, along with a Reddit-esque voting mechanism for voting correct stories up, and bad ones down. As prior authors search for their old work, they improve the system — “crowd sourcing” the work or filtering out bad data. To date, 7,810 records have been voted up or down — 18%.

Thanks, College Publisher!

Oh, and as far as importing the stories into WordPress? I guess we can tackle that in another few years when the archives have been filtered.