
IBM.com's Man in the Hot Seat

By: Christine Canabou
Wed Dec 19, 2007 at 9:14 AM
Keeping IBM's Web site up and running is David Leip's nightmare-inducing responsibility. But the company's Webmaster sleeps better knowing he's built the site to keep going and going and going.

When David Leip first road-tested an early version of the Internet as a computer-science grad student at the University of Guelph, he was intrigued but skeptical. He even remembers thinking to himself, "Gee, it's a pity that nothing will ever come of this cool technology." That was 10 years ago, when the entire Web was text-based and could be surfed in a couple of hours.

Today, as corporate Webmaster for IBM, Leip, 34, runs one of the world's largest Web sites. Last year, IBM.com generated $9.2 billion, or more than 10% of IBM's $88.4 billion revenue. The site, which has grown to 4.5 million pages and is available in 31 languages in 59 countries, is a powerhouse version of that "cool technology" Leip glimpsed years ago.

Just last Monday, traffic to the site hit a record 861,000 home-page views, thanks to promotions surrounding the 20th anniversary of the PC. And yet, despite the heavy traffic, IBM.com has maintained a nearly flawless Web presence so far this year -- 99.998% uptime, to be exact. The site was down for a total of nine minutes last April due to what Leip calls a "silly human error." (Web sites that achieve "best-in-class" levels of service availability incur less than 9 hours of unplanned downtime and less than 12 hours of planned downtime a year, according to the Gartner Group. Gartner estimates that fewer than 5% of mission-critical services today reach that level of service.)
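To put those figures side by side, here is a minimal sketch of the arithmetic. The article does not say what window the 99.998% figure was measured over, so the 365-day period below is an assumption, and the function name `uptime_percent` is made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def uptime_percent(downtime_minutes: float, period_days: float = 365) -> float:
    """Return uptime as a percentage of the measurement period.

    period_days defaults to 365, which is an assumption; the article
    does not state the window behind its 99.998% figure.
    """
    period_minutes = period_days * MINUTES_PER_DAY
    return 100.0 * (1.0 - downtime_minutes / period_minutes)


# Nine minutes of downtime, the outage quoted in the article.
nine_minute_outage = uptime_percent(9)

# Nine hours of unplanned downtime, the "best-in-class" ceiling quoted above.
best_in_class_ceiling = uptime_percent(9 * 60)

print(f"Nine minutes down: {nine_minute_outage:.3f}% uptime")
print(f"Nine hours down:   {best_in_class_ceiling:.3f}% uptime")
```

Under that assumption, nine minutes of downtime works out to roughly 99.998% uptime, while a full nine hours of unplanned downtime would still leave a site at about 99.897%.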

But Leip will be the first to tell you that the Web wasn't always such a hit. When Leip joined IBM as a software developer in the Toronto office in 1992, the Web was barely on IBM's radar screen. It fell to Leip and a small team of Internet evangelists to launch a grassroots effort to spread the Web gospel throughout the company.

Now, nine years later, Leip manages 22 Webmasters worldwide. And while he acknowledges that the position requires superb technical skills, he also believes that having a sense of humor is critical -- if for no other reason than to keep him and his team sane in the face of unrelenting performance pressure.

"It's a high-profile job within the company," Leip says. "That can be a good thing and a bad thing, because you take a fair amount of heat when there's a problem." Sure, the first-ever outage to the IBM home page was stressful, he says, but that was nothing compared to the time Lou Gerstner, IBM's CEO, contacted him to point out a glitch that he had discovered while surfing the site on his new BlackBerry. "In a big company like IBM," says Leip, "a call from the CEO isn't a common occurrence."

Leip's concern with site performance can be, well, a 24-7 obsession. "Sometimes, I have these horrible dreams that the site has been down for hours at a time. They're the equivalent of that childhood nightmare when you go to school in your pajamas," he says, laughing. "Then you wake up, look at your clock, and breathe a sigh of relief that it was only a dream."

"Continuous availability is not just a nice thing to have," he says. "We have customers who expect to be able to get to the site any time of the day. What makes the Web generally wonderful is that you aren't limited to calling an 800-number between 9 AM and 5 PM."

For Leip and his crack team of Webmasters, running a world-class site starts with a relentless commitment to keeping the site up and running -- no matter what it takes. "Moving traffic to IBM.com is more important than ever," Leip says. "If it can be done via the Web, most customers want to do it via the Web."

Here are Leip's tips for keeping a busy site up and running 99.998% of the time.

Build It to Keep Going and Going and Going

From the beginning, you have to build your system architecture with a high level of availability in mind. The system should be designed to run in a distributed environment, where you have multiple servers or processors within a location. That avoids having a single point that can fail. It also allows you to keep the site running while you deal with unplanned hardware failures and planned outages or maintenance. The customer isn't disturbed by changes to the site, because the architecture is engineered to accommodate those changes.
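A rough way to see why removing the single point of failure matters: if each server is up some fraction of the time and servers fail independently, the site is down only when every server is down at once. The sketch below assumes exactly that independence, which real hardware sharing power or network gear will not fully satisfy, and the 99% per-server figure is a hypothetical number, not one from IBM.

```python
def combined_availability(per_server: float, servers: int) -> float:
    """Availability of a pool that stays up while at least one server is up.

    Assumes servers fail independently of one another (a simplification).
    per_server is a fraction between 0 and 1.
    """
    if servers < 1:
        raise ValueError("need at least one server")
    chance_all_down = (1.0 - per_server) ** servers
    return 1.0 - chance_all_down


# Hypothetical per-server availability of 99%.
for count in (1, 2, 3):
    print(count, f"{combined_availability(0.99, count):.6f}")
```

Each added server multiplies the chance of a total outage by the chance that one more machine is down, which is why a modest amount of redundancy buys a large improvement.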

Even more reliable is a distributed environment across multiple locations, which are mirrored in real time. That ensures that the site will continue to run, even if a catastrophe strikes. If a location shuts down because of a natural disaster, such as a fire or an earthquake, IBM.com continues to serve the world.

Multiple locations also mean better performance -- faster load times -- on average. IBM.com users are served from the location closest to them. If the infrastructure nearest you goes offline for some reason, you're automatically routed to the next-closest location. IBM.com has three server facilities.
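The routing rule Leip describes, closest location first and next-closest if that one is offline, can be sketched in a few lines. This is an illustration only, not IBM's implementation: the `Location` class, the site names, and the distances are all hypothetical, and a real system would measure network proximity and health rather than take them as given.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Location:
    name: str            # hypothetical site label
    distance_km: float   # stand-in for "closeness" to the user
    healthy: bool = True


def route(locations: List[Location]) -> Optional[Location]:
    """Pick the closest healthy location, or None if every location is down."""
    candidates = [loc for loc in locations if loc.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda loc: loc.distance_km)


# Three hypothetical facilities, mirroring the count mentioned in the article.
sites = [
    Location("site_a", 120.0),
    Location("site_b", 900.0),
    Location("site_c", 4200.0),
]

print(route(sites).name)    # closest site while everything is healthy

sites[0].healthy = False    # simulate the nearest facility going offline
print(route(sites).name)    # traffic falls through to the next-closest site
```

The user never chooses a site; the selection is recomputed from whatever is healthy at the moment, so a failure at one facility changes the answer without any visible interruption.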

