Hoist by My Own Petard

The former incarnation of this website was a homegrown, godawful mess of a CMS that I cobbled together using PHP and MySQL. It did a so-so job of separating content from presentation.

Not that I bothered to get a backup of the actual MySQL database before abandoning the site. I think it was one of those things that got lost in the shuffle of preparations for the move back east.

I did manage to grab a more-or-less complete mirror of the old site with wget, which let me brute-force my way through the old static HTML files (thankfully generated from a somewhat thoughtfully commented template) and kluge the contents into Movable Type's import format.

I took a closer look at my archive files tonight and realized that all of the old photo pages are intact as well. The gears started turning, and here it is 1:21 AM and I'm up to my elbows in a procedural PHP script to process the old pages into something I can reincorporate into my new site without too much trouble. My plan of attack was something like this:

  1. Find a text pattern distinguishing the photo pages and retrieve a comprehensive list using grep
  2. For each file in that list, read the contents into memory and use regexes to extract the title, photo filename, and description data
  3. Read the EXIF data from the photo file to get the photo date and time
  4. Generate thumbnails using GD
  5. Assemble the resulting data into a Movable Type import file
  6. Upload all the photos and thumbnails
  7. Run the import
  8. Rebuild and revel in how L337 I am
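
A rough sketch of steps 2 and 5, written in Python here rather than PHP and not the actual script. The sample markup, the template comments, and the names `extract_photo_fields` and `to_mt_entry` are all made up for illustration, since the real template isn't shown. The entry layout follows Movable Type's plain-text import format as I understand it: metadata lines, `-----` between sections, and `--------` closing each entry.

```python
import re
from html import escape, unescape

# Hypothetical page markup; the real template comments and tags are assumptions.
SAMPLE_PAGE = """<html><head><title>Sunset at the pier</title></head>
<body>
<!-- photo -->
<img src="photos/pier_sunset.jpg" alt="">
<!-- description -->
<p class="description">Taken from the end of the pier &amp; facing west.</p>
<!-- /description -->
</body></html>"""

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
PHOTO_RE = re.compile(r'<img\s+src="([^"]+\.jpe?g)"', re.IGNORECASE)
DESC_RE = re.compile(r"<!-- description -->\s*<p[^>]*>(.*?)</p>", re.DOTALL)


def extract_photo_fields(html_text):
    """Return title, photo filename, and description, or None if a pattern fails."""
    title = TITLE_RE.search(html_text)
    photo = PHOTO_RE.search(html_text)
    desc = DESC_RE.search(html_text)
    if not (title and photo and desc):
        return None
    return {
        "title": unescape(title.group(1).strip()),  # plain text for the TITLE line
        "photo": photo.group(1),
        "description": desc.group(1).strip(),  # left as HTML for the entry body
    }


def to_mt_entry(fields, date_string):
    """Format one entry for a Movable Type import file."""
    body = '<img src="{0}" alt="{1}" />\n{2}'.format(
        fields["photo"], escape(fields["title"], quote=True), fields["description"]
    )
    return (
        "TITLE: {0}\n"
        "DATE: {1}\n"
        "-----\n"
        "BODY:\n"
        "{2}\n"
        "-----\n"
        "--------\n"
    ).format(fields["title"], date_string, body)
```

Concatenating the output of `to_mt_entry` for every page that matched gives a single import file. Pages where `extract_photo_fields` returns `None` are worth logging, because they probably don't follow the template.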

The only catch is that the FileDateTime info in the headers was set to the date upon which I copied them to my desktop from whatever old CD-ROM I found them on, not the actual dates of the photos.

Fortunately the dates are contained in the photo pages themselves - a bit more regex hacking will take care of that. I don't have the times, so by miraculous coincidence each one will appear to have been taken at 12:00 PM precisely.
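
The date fix might look something like this in Python. The "Taken: March 5, 2003" wording is an assumption about how the old pages wrote their dates, and `photo_timestamp` is a made-up name. The missing time is pinned to noon and the result is formatted the way Movable Type's `DATE:` field expects (MM/DD/YYYY hh:mm:ss AM|PM).

```python
import re
from datetime import datetime, time

# Assumed date wording on the old photo pages, e.g. "Taken: March 5, 2003".
DATE_RE = re.compile(r"Taken:\s*([A-Z][a-z]+ \d{1,2}, \d{4})")


def photo_timestamp(html_text):
    """Return the page's date as an import-ready timestamp fixed at noon, or None."""
    match = DATE_RE.search(html_text)
    if match is None:
        return None
    day = datetime.strptime(match.group(1), "%B %d, %Y").date()
    # No time of day is recorded on the pages, so every photo gets 12:00 PM.
    noon = datetime.combine(day, time(12, 0))
    return noon.strftime("%m/%d/%Y %I:%M:%S %p")
```

Month names are parsed with `%B`, which assumes the default English locale.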

Some of the thumbnails are missing, so I'm now in the process of downloading the ImageMagick binaries for Darwin... I could probably put together a PHP script to do it with GD in the time the download will take, but I don't feel like burning the mental energy.
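
Once ImageMagick is installed, filling in the gaps could be as simple as the sketch below. It works out which photos lack a thumbnail and builds one `convert ... -thumbnail` command for each. The `_thumb` naming convention, the `photos/` and `thumbs/` directories, and the 150x150 bounding box are all assumptions, not how the old site was actually laid out.

```python
from pathlib import PurePosixPath


def thumb_name(photo_name):
    """Hypothetical naming convention: pier.jpg -> pier_thumb.jpg."""
    path = PurePosixPath(photo_name)
    return path.stem + "_thumb" + path.suffix


def convert_commands(photo_names, existing_thumbs, size="150x150",
                     photo_dir="photos", thumb_dir="thumbs"):
    """Build one ImageMagick command per photo that has no thumbnail yet."""
    have = set(existing_thumbs)
    commands = []
    for name in sorted(photo_names):
        target = thumb_name(name)
        if target in have:
            continue  # thumbnail survived the mirror; nothing to do
        commands.append([
            "convert", photo_dir + "/" + name,
            "-thumbnail", size,  # fits inside the box, keeping the aspect ratio
            thumb_dir + "/" + target,
        ])
    return commands
```

Each command is a plain argument list, so it can be handed to `subprocess.run` one at a time or printed out and checked first.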

The biggest problem of the whole process will probably be uploading all these photos to the server.

Contact

Andy Chase
(978) 297-6402
andychase [at] gmail.com
GPG/PGP Public Key