LGF Technical Update - Database Conversion Hijinx
It took about 9 hours to read through 24,745 flat files and import 3,625,599 comments into MySQL, but it’s done. I imported all of them up to about 7 days ago, so that when I’m ready to switch the Blog Engine to MySQL entirely (and it’s not very close yet) I’ll only have to shut down comments for a short time while the remaining stuff is moved over.
I’ve been running some tests on query speed with 3.6 million records, and so far I’m very pleasantly surprised. At first, a search for all comments belonging to a post somewhere in the middle of the stack was taking more than 14 seconds; but I realized I hadn’t indexed the entry_id field of the comments (the field that shows which post the comment belongs to). After adding that index (which took over a minute to process), the same query now executes in well under a hundredth of a second.
On the application design front, I now have to decide if I’ll switch to a completely dynamic method of displaying comments, meaning that each time a visitor loads a page, the comments for that post are pulled out of the database, formatted, and displayed. The current system rebuilds a flat file containing the HTML version of the page, every time a comment is posted, and then that page is loaded and displayed.
UPDATE at 3/16/07 1:39:29 pm:
After looking at logs and discovering that the majority of page requests at LGF are for the front page, I’m also experimenting with a five-minute cache of that front page. One effect of this cache is that the comment counters will not be updated each time you load the page; it’s possible that there will be more comments than the counters indicate.
Comments pages aren’t cached, so that you can see any comments as soon as they’re posted.
UPDATE at 3/16/07 1:44:14 pm:
Hah! The caching makes our online visitor count go way down. Interesting. I’ve disabled caching while I ponder this one.
UPDATE at 3/16/07 2:47:17 pm:
Just needed to move the caching code a little bit further into the startup process. It’s now caching the front page for five minutes again.
UPDATE at 3/16/07 3:53:24 pm:
I’ve been testing the effects of the cache with the Linux ‘top’ command, and the load on the server definitely decreases when caching is enabled.
So I’ve also enabled caching for all individual entry pages without comments (the pages that have ‘&only’ appended to their URLs). These pages will change much less often, so they’re being cached for 15 minutes. If we get a link from a high-traffic site to one of the entry pages, this should (might?) help us to stay online, since the flood of visitors will be served pages without a lot of the dynamic overhead.