December, 2006 Archive
December 29th, 2006 by chris in Regular Stuff
LibraryThing is growing, in many ways, but I’m going to focus on the technological aspects of that growth.
When I started back in June, our MySQL server was using MyISAM tables - a single server keeping the entire site running. That day, we were being hammered by the Wall Street Journal and had 89,108 visitors. For two months I was killing myself trying to keep the site running while simultaneously attempting to build confidence at a new job - two extremely challenging tasks that should never be combined. I was losing this uphill battle, and very nearly gave up on LT completely… then Tim had to take off for a few days and I knew this was my chance to do some major retooling without him noticing that he couldn’t get work done. That was August 30th; we had 78,061 visitors that day. Yesterday, nearly four months later, we had 166,108 visitors. For perspective, that’s just over 25% of the total for the entire month of February.
The database was 5.3GB in size when I started, today it is just about to cross the 25GB mark. We have added five million cataloged works, eight million tags, 82,000 users, and numerous additions to the site that put extra strain on the database servers. So, now here’s the inside scoop about how we’re dealing with our growth. I’m going to do something unusual and split this with a “More” link, so…
Read the rest of this entry »
»
December 28th, 2006 by chris in Regular Stuff
Not many people read this blog. I don’t think it’s only because the content is dry, but because nobody really knows it’s here. I don’t go on a mad publicity junket or try to put myself on digg or anything, I don’t even tell my friends or family that this exists, mostly because it’s not about me personally. I also don’t update this enough, although it’s not for lack of thought - it’s just that I have enough to do at LibraryThing to keep two people plenty busy. For those of you who do read this, you may actually enjoy today’s entry, so please read on!
I’ve been trying to come up with ways to spice up my blog. I want people to link to me, because according to Technorati, nobody does! So, starting after the 1st of the year, I’m going to begin posting fixes to actual problems that we’ve had at LT. Eventually, I think I’d like to turn it into a more community-based resource for programmers, sysadmins, net admins, and other technical professionals to post real-world problems and solutions, but make them searchable and easy to access from the outside world. As useful as Google is for finding solutions to problems, I often have to go to a ton of different sites just to find one useful answer - mostly going through sites with people who are asking a question and not getting a response. The idea here is to accept any question, but only publish the ones that have answers. I just don’t see a reason to post redundant unanswered questions - I want this to become a public resource for even non-technical people so they can solve problems on their own without having to contact the help desk.
Some common problems that people may have to solve would be MySQL error 1062 during replication. Or the Grub bootloader losing its mind after a BIOS upgrade on a Tyan-based Opteron server. Or how to integrate InnoDB and Lucene to create a nice-working full-text search engine without having to compromise the integrity of your data. How about why certain transparent PNGs don’t work on Flickr? These problems, while neither difficult nor particularly unusual, are often hard to solve because the information needed isn’t available or has become obscured in a sea of flaming, one-upmanship, and general ego massaging. I don’t want any of that. As Joe Friday used to say, “Just the facts, ma’am”.
Worth a shot? Initially my stuff will just get posted to this blog in its own category with byline links in between the main stories. Once I have enough of them, I’ll merge them into their own site. If you contribute some, I’ll start sooner :-) but in the meantime, any contributions will be posted in the manner I mentioned. I’ll also provide message group areas and other community features such as personal blogging tools - so other people can post dry stuff like my blog! - but the core Q&A portion of the site will be moderated. Well, that’s it for now, what do you think?
»
December 27th, 2006 by chris in Regular Stuff
Finally, a nice framework to keep developers from jumping ship to Rails.
I’m not a big fan of using frameworks for these high-level languages because I already feel pretty detached from my roots as it is. Adding another even higher-level layer just seems to distance me further than I ever wanted to be, but I have to say, the Zend Framework has made me change my tune a little. When I was beginning to work with Lucene, I was afraid I was going to have to complicate the LibraryThing site code by adding Java directly to the mix. This would never do because Tim doesn’t program Java and learning it would be way more of a hindrance than a help. Let’s face it, we have a business to run here, and to that end, we have to produce a working product that we actually know how to manage. Zend has managed to obscure the Java portion of Lucene enough that even if something were to happen to me, Tim would be able to figure it out in a very short period of time. Even though he doesn’t like object-oriented code, ZF is well documented and its developers are easy to reach.
To that end, I think I may have found the first project that LibraryThing would be able to contribute back to on a regular basis. I have made numerous adjustments to the framework, most notably in Zend_Search_Lucene and Zend_Filter_Input where some interesting items were either left out or implemented strangely. We also may be able to contribute some extended documentation in the form of usage examples for high-usage/availability applications. With all of that said, I need to disclose that they haven’t even made a formal release yet - the code is pre-1.0 - but it’s remarkably stable even for a released library.
I am looking to try out the database abstraction layer to see how it compares to PHP’s own PEAR libraries. And, in other things related to LT, I’d like to try the Service portion of the framework for its direct support of Amazon and Google (among other) features. It also has direct support for Flickr, which could be an interesting place to experiment with user-contributed images. Of course, that’s not a promise of a new feature or function, I’m just thinking out loud. Believe me, using the framework in production for normal site operation would require me jumping a lot of human hurdles that wouldn’t be worth it in the end.
Another thing I like is that it’s integrated neatly into Zend’s own IDE: the Zend Studio. The tight integration of Zend’s products has really helped PHP a lot more than people give them credit for, but at the same time, I think that products like ActiveState’s Komodo are more feature-complete and independent-developer oriented than Zend’s products. I only wish I could turn off font smoothing (anti-aliasing) in Komodo’s editor environment, and get it to read the code I’m working in to provide code hints. But, I digress. As much as I hate sounding like I’m pimping things, it should be known that I am always more than open to trying something new and possibly better than what I’m using.
The only major concern I have here is that it seems a little expensive to run at the moment. A lot seems to go into the loading and management of the framework, and its memory requirements are significant. I don’t see how a site the size of LibraryThing could use it continuously as a major portion of its operating site. As it is right now, I only use the Zend_Search_Lucene functions from a scheduled execution script, and its memory requirements and startup times are less than stellar.
»
December 26th, 2006 by chris in Regular Stuff
I don’t work at a job that most people (myself included) would consider “tip worthy”. Heck, forget “most people”, I don’t think anyone would think my job deserves a gratuity of any kind. Marie’s job is totally tip worthy, but outside of the initial bunch of holiday gratuities, she basically got stiffed. I’m not talking about occasional clients, either, I’m talking about regulars who use her service daily and call at the last minute (literally!) with scheduling changes. A lot of times, these people won’t even call to cancel an appointment, which means time that could have been freed up for a deserving client is now becoming a black hole: most of these (wealthy) clients refuse to pay.
So, imagine my surprise when I got up Christmas day and found that I had received a thank you card in the form of what could only be interpreted as a gratuity. A LibraryThing member sent me an Amazon gift certificate. I was floored. I’m still floored. I sent that member a thank you note, but it just wasn’t enough to express my gratitude to this person for thinking of me. I have never met this person in real life, I won’t mention their real name or their LT user ID, but I just want them to know how much I appreciate their gift. Thank you.
»
December 21st, 2006 by chris in Regular Stuff
You know, I’ve been asked often about LT’s infrastructure. I finally decided to write it out, but I posted it in an LT message board thread instead of here where it belongs, so I’m going to repost it here because it’s something I’ve been wanting to talk about for a long time.
LibraryThing consists of 7 servers, but they’re not all used for serving pages. All our servers run Linux (mixed between Fedora Core 5 and FC 6), all of our clients run MacOS, and I have VM installs of Windows Vista (RC), Windows XP, Ubuntu, and FC6 that run on my MacBook Pro. Those are used solely for testing.
We have one server with a good-sized hardware RAID-5 (10K SAS drives) that comes in very handy for keeping our data integrity high, 8GB RAM, and redundant power supplies to redundant UPS.
Another server is built for speed and has six 15k RPM SAS drives on a hardware RAID-0, 16GB memory, Dual 3GHz Woodcrest CPUs, and also the same redundant power as the first server.
We have two smaller servers that act as database slaves for performing API functions, “thinking”, and backing each other up. They keep old MyISAM versions of our database for some remaining FULLTEXT queries that I’m currently working to replace with Lucene. Until then, I have some pretty constant maintenance to do on those machines, and still need to keep a pager running for those middle-of-the-night breakdowns that constantly happen.
Yet another machine is nothing more than a live backup slave. It’s basically a 286 running DOS compared to the other servers. All it does (and can do) is store a backup copy of the master database so in case of emergency we always have something standing by for data recovery. I can’t send queries at it because it’ll hang there for hours. It’s really pretty sad that only 6 years ago that would have been the fastest machine on the planet.
There are two other machines in the process of being re-purposed, each with 4GB memory, dual Opterons, and 4-drive SATA RAIDs. At one time, those machines used to be LibraryThing! Now, they’re going to be … well… something we haven’t decided on yet. Right now all they do is consume electricity and take up space in the rack.
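For anyone who wants the rundown above in one place, here is a rough sketch of the same inventory as data. The role labels ("integrity", "speed", and so on) are names made up for this sketch and are not what the machines are actually called; the hardware notes are copied from the descriptions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    role: str    # label invented for this sketch, not a real hostname
    count: int   # how many machines fit this description
    notes: str   # hardware and duties as described in the post


SERVERS = [
    Server("integrity", 1, "hardware RAID-5 on 10K SAS, 8GB RAM, redundant power"),
    Server("speed", 1, "six 15k SAS drives on RAID-0, 16GB RAM, dual 3GHz Woodcrest"),
    Server("myisam-slave", 2, "API functions, 'thinking', remaining FULLTEXT queries"),
    Server("backup-slave", 1, "live copy of the master, never queried"),
    Server("unassigned", 2, "4GB RAM, dual Opterons, 4-drive SATA RAID"),
]


def total_servers(servers):
    """Count every machine in the rack."""
    return sum(s.count for s in servers)


def in_use(servers):
    """Count the machines that currently have a job (all but the idle pair)."""
    return sum(s.count for s in servers if s.role != "unassigned")


if __name__ == "__main__":
    print(total_servers(SERVERS), "servers,", in_use(SERVERS), "in use")
```

The totals line up with the count given earlier: seven machines, two of which are waiting to be re-purposed.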
Our server software environment is highly customized. I do a lot of builds by hand of our core services and make many changes to libraries and other stuff. I tend to blog about these things over on my personal site (yes, right here!), but I haven’t been doing a very good job keeping up with it recently because I’ve been so busy!
In the past I have been in charge of much larger networks, many more servers, and OS mixes that still make my head spin to this day. I can still very easily maintain a Windows Server network (AD or not) but am much happier to be back to an all-Linux playground.
»
December 16th, 2006 by chris in Regular Stuff
You’re never going to believe this… not in a million years… I snapped this photo earlier today when I was out walking my dogs. Yes, today, the 15th of December, and I was in my home town of South Portland, Maine (43.661N -70.255W). I wish I had a newspaper with me because nobody is going to believe this, but here’s what I found:
There were a few more nearby, but I had to chase my dogs and forgot to take pictures when I was back. I have no doubt they’ll still be around tomorrow and I’ll definitely go out and take more photos with my own camera. I used Marie’s little point-n-shoot to take this, but I want to use my nice Rebel XT (350D) to get a good high-res version.
The bottom line? Who says the Bush policy on global warming is all bad? If you ignore the Earth’s warnings, drive your car around to your mailbox, and burn tires in your back yard, you get pretty little flowers that you can make tea out of.
»
December 13th, 2006 by chris in Regular Stuff
Disclaimer: I work for a company with the word “Beta” in the site’s masthead.
Not to sound like Jerry Seinfeld, but what’s the deal with “beta” websites? I don’t see beta as being a very good descriptive term for most of the sites that have the label attached to them. It connotes that the site isn’t quite ready for prime time, but they’re still trying to make money on it anyway. You know, I like working for a work in progress, but the term needs to go. It’s become so overused that the meaning of the term beta has been diminished severely. An example would be LibraryThing itself: we’re not really beta anymore. We have a working, vibrant, diverse, and dedicated community of members, recognition from the press, and a viable product that has more inherent value to it than many final-version pieces of software I personally have used. It’s alive, growing, evolving*, and learning. Sounds familiar doesn’t it? Beta is an idea, a proof-of-concept, a theory being played out.
I think a new term may be in order for sites such as this. Sites like LT are market leaders, trend setters, businesses in their own right, and worthy of a much more appropriate distinction. Beta is just a synonym for untested, unready for public consumption. Certain LibraryThing features may not be completely tested, but labeling the entire site that way is either a cop-out, or inappropriate. Now, I don’t mean to unfairly pick on LT, it’s just the easiest one to point a finger at because I have access to it from all sides. Calling LibraryThing “alive” like I did before may be stepping across a line, but saying that it has its own presence and sense of being is appropriate.
Personally, I like to blame Google :-) I think they pioneered the whole “beta” labeling for the mainstream and a lot of non-programmers picked up on this because it sounded cool. Before anyone says I’m taking a dig at Tim, I want it to be clear: I’m not! A lot of people who had great ideas and no money learned to make these ideas come to fruition on their own, which I admire greatly. In some cases, they learned their programming by themselves, out of books, from websites, etc. without any prior experience in the industry. Google didn’t have this excuse. I know not to label something as a beta until it’s really being tested, but a lot of these really bright people who made “Web 2.0” sites did not. An example would be that my Search code for LT is in alpha, it’s not being formally tested, and it’s certainly not ready for outside eyes! Heck, it’s barely ready for inside eyes. Beta phase will be when I open it to a select few people to assist me in testing and searching out some of the final bugs. Then, we go to release, when “real people” begin using it and any remaining bugs that may have slipped through testing will get squashed.
By the way, the alpha is going well, but I’ve found some shortcomings that may force me to re-think how I handle the indexing and take it out of real-time mode.
*unless you live in Kansas.
»
December 12th, 2006 by chris in Regular Stuff
I have been doing everything I could to stay away from using Java. I even make fun of the commodity Java programmers who live near me. But, I finally found a use for it that’s made me sing a different tune: Lucene. Woweewow! I’m not done learning what I want to know, but if early indications mean anything, then I think I may have finally solved a long-standing issue at LibraryThing: Full-text search for users, groups, and the entire site in general.
I suppose a little back story* would be in order here, or this just wouldn’t be one of my entries. A few months ago, during the thick of the worst database issues LT has had, I had proposed switching away from MyISAM to InnoDB in MySQL. That presented the distinct challenge of maintaining full-text searching on live, ever-changing data but also having a database that wasn’t going to die every few minutes. At the time, I just wanted the easy win, so I converted two machines to Inno, and left two machines as ISAM, that way I could do the bulk of the work on the Inno tables and do anything that required some of the ISAM features on the other slaves. This is, of course, a huge waste of computing power. The way I saw it, I had to make a call, and the call was to go for reliability so that we would still be around in 6 months to make this faster and more scalable. Well, here we are!
So, a few ideas got bandied about the table over at the LT world headquarters. Tim had his solutions, I had mine. I threw away a couple of mine pretty quickly, and then dissed the Lucene solution as being too much of a pain and probably too fallible. Then I played with it some more on a larger scale and found out that the more I threw at it, the better it performed (relatively**) so I had to make a decision. I got online, sent Tim an email, told him that I think I may have been wrong about something and that I strongly recommend we take a serious look at using Lucene on the site. Since he was busy trying to live the impossible dream of writing his own full-text parser in PHP, I think I did him a favor. I’ve been where he is, and I know what he was going through: I didn’t envy him. He was trying to roll out in 3 days what would really have taken 3 months to write. “Agile” development isn’t always careful development. Our members have very high expectations of our product, and I don’t think it would be good to give them any less than our best effort to make the best possible experience for them.
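The split described above boils down to a routing decision per query. Here is a minimal sketch in Python rather than the site's PHP, assuming read queries only; the host names and the round-robin choice are invented for illustration and say nothing about how LT actually picks a machine. The one firm fact it leans on is that MySQL full-text queries use MATCH ... AGAINST, which at this point only MyISAM tables support.

```python
import itertools
import re

# Hypothetical host names; the post only says two slaves run InnoDB
# and two were left on MyISAM.
INNODB_HOSTS = ["inno1", "inno2"]
MYISAM_HOSTS = ["isam1", "isam2"]

# MySQL full-text queries are written as MATCH (cols) AGAINST (expr).
_FULLTEXT = re.compile(
    r"\bMATCH\s*\(.*?\)\s*AGAINST\s*\(", re.IGNORECASE | re.DOTALL
)

# Assumption for this sketch: plain round-robin within each pool.
_inno_cycle = itertools.cycle(INNODB_HOSTS)
_isam_cycle = itertools.cycle(MYISAM_HOSTS)


def pick_host(sql: str) -> str:
    """Return the host that should run this read query."""
    if _FULLTEXT.search(sql):
        return next(_isam_cycle)   # needs a FULLTEXT index, so MyISAM only
    return next(_inno_cycle)       # everything else goes to InnoDB


if __name__ == "__main__":
    print(pick_host("SELECT id FROM books WHERE MATCH(title) AGAINST('programming')"))
    print(pick_host("SELECT id FROM books WHERE id = 42"))
```

It also shows why the arrangement wastes hardware: two full copies of the data exist only to answer the one kind of query the other two cannot.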
So, tonight I’m working on writing some wrapper functions to handle searches, updates, inserts, and deletes on the Lucene indexes. I use the Zend Framework to actually interface with Lucene through PHP, but it’s a development release and not the easiest to navigate. At the very least I can wrap this up into a neat little package so we can deploy quickly even if the initial development was a little slower than I would have liked. To be fair, I have put in an easy 60 hours since last Friday on this problem. It’s like when I was addicted to Tetris: even when I wasn’t playing, I was still thinking (even dreaming) about it. Don’t tell me you haven’t done the same thing, maybe not with Tetris, but you know it’s happened before!
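To give a feel for the shape of such a wrapper, here is a small sketch of the four operations. It is written in Python with a toy in-memory inverted index standing in for Lucene, so none of the names below come from Zend_Search_Lucene or from the real wrapper; SearchIndex and its methods are made up for illustration. One detail does carry over: Lucene has no in-place edit, so an update is a delete followed by an add.

```python
from collections import defaultdict


class SearchIndex:
    """Toy in-memory stand-in for a full-text index (not Lucene, not Zend)."""

    def __init__(self):
        self._docs = {}                     # doc_id -> original text
        self._postings = defaultdict(set)   # term -> ids of docs containing it

    @staticmethod
    def _terms(text):
        # Deliberately naive analysis: lowercase and split on whitespace.
        return set(text.lower().split())

    def insert(self, doc_id, text):
        if doc_id in self._docs:
            raise KeyError(f"document {doc_id!r} is already indexed")
        self._docs[doc_id] = text
        for term in self._terms(text):
            self._postings[term].add(doc_id)

    def delete(self, doc_id):
        text = self._docs.pop(doc_id)       # raises KeyError if unknown
        for term in self._terms(text):
            self._postings[term].discard(doc_id)
            if not self._postings[term]:
                del self._postings[term]    # drop terms with no documents left

    def update(self, doc_id, text):
        # An update is a delete followed by an insert.
        self.delete(doc_id)
        self.insert(doc_id, text)

    def search(self, query):
        """Return the sorted ids of documents containing every query term."""
        terms = self._terms(query)
        if not terms:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(hits)


if __name__ == "__main__":
    index = SearchIndex()
    index.insert(1, "PHP programming")
    index.insert(2, "Java programming")
    print(index.search("programming"))
```

Keeping the surface this small is the point of a wrapper: callers only ever see insert, update, delete, and search, so the engine underneath can change without touching the site code.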
The bottom line: let’s see how it does. In the future I’ll document the wrapper I wrote and maybe even put it online if Tim goes for it. There’s nothing proprietary about it, so I don’t think it’ll be a problem, and it will hopefully help lower the barrier for entry to the Lucene world for other site admins without the time to learn entirely new APIs. If you would like to learn about it, I recommend Lucene in Action because it really has some pretty slick examples and speaks plainly about a very complex subject.
A couple of the next entries I have coming up are: The Entry About Six Months, and The Entry About Zend Framework.
*Think “LOST” for nerds.
** Relatively speaking, because it took .32 seconds to find “programming” with 20,000 records and .61 seconds to find “programming” with 2.5 million records. I’ll take that hit, thank you.
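To put those two timings in perspective, here is the arithmetic as a few lines of Python. The figures are the ones quoted in the footnote; the variable names are just labels for this sketch.

```python
# Figures from the footnote: seconds to find "programming" at each index size.
small_records, small_seconds = 20_000, 0.32
large_records, large_seconds = 2_500_000, 0.61

data_growth = large_records / small_records   # how much bigger the index got
time_growth = large_seconds / small_seconds   # how much slower the search got

print(f"{data_growth:.0f}x the records, {time_growth:.2f}x the time")
```

The index grew 125-fold while the search time did not even double, which is the "relatively" in question.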
»