Tech Note: Bottlenecks Caused By Bottom Feeders
Running a site like LGF with a custom-designed content management system is a constant process of finding and opening bottlenecks. If you’re lucky, the bottlenecks result from an easily addressed database design problem.
Tonight I discovered a bandwidth bottleneck that negatively affected every page served by the hamsters in our web server: our “blacklist” of spammer websites that try to insert their URL’s into our logs. These spammers are the true bottom feeders of the Internet spam world, but their bone-headed persistence makes them just enough of a significant pain in the ass that special measures are called for.
When the MySQL table for our blacklist was originally designed, we accessed it with a query that used the LIKE operator to search for matches to the spammer list, like this:
SELECT badref FROM badrefs WHERE badref LIKE '$searchterm' LIMIT 1
Today Stinky Beaumont and I realized that this check for a blacklisted spammer was beginning to take way too much time, because the LIKE operator doesn’t scale well when the table gets larger. Thankfully, MySQL provides a much better option: a FULLTEXT lookup. So I re-indexed the blacklist table, and changed the code that checks for blacklisted referrers to this:
SELECT badref FROM badrefs WHERE MATCH(badref) AGAINST('$searchterm') LIMIT 1
Bam. Things are noticeably snappier. MySQL’s generic “LIKE” operator is fine for smaller searches, but once you have thousands of possible search targets, the FULLTEXT features of MySQL are a much better option.
We now return you to our regularly scheduled Incomprehensible Techie Open Thread.