I’m sitting in the “War Room” at PayPal’s headquarters in San Jose, California, talking to Sri Shivananda, the company’s VP of global platform and infrastructure. Despite the evocative name, it’s just a conference room. But one of its walls is actually a giant window—and what it reveals on the other side of the glass is anything but ordinary.
Behind the glass is a room whose other walls are plastered with enormous displays, dense with numbers and graphs. PayPal employees sit at rows of desks, staring at additional information on PC monitors. There’s data everywhere, being updated in real time. Even observing it from a distance–and not having a clue what most of it means–is an intense experience.
“This,” Shivananda says as he gestures at the window, “is basically the pulse of PayPal.”
More specifically, it’s the company’s command center–the facility where it monitors whether the PayPal service is functioning properly on a millisecond-by-millisecond basis. Just to state the utterly obvious, there’s real money at stake: With 173 million active customers, PayPal processes an average of $8,773 in payments every second.
Shivananda, whose responsibilities include managing the San Jose command center as well as similar outposts in Scottsdale, Arizona, and Chennai, India, assumed his current role only last July. But he worked at eBay for 15 years, where he helped merge the eBay and PayPal platforms after the e-commerce site bought the payment service in 2002. Much more recently, he spent nine months breaking them up again in the wake of eBay’s 2014 announcement that it would spin PayPal out into an independent public company. “It was a triple-MBA in change management, and a double-MBA in crisis management,” Shivananda says.
The split provided PayPal with an opportunity to reassess how it monitored its service’s status. At the time, the command center kept tabs on about 150 different data signals. That might sound like a lot, but in a company with tens of thousands of servers and 1,500 discrete software applications, it captured only a shallow portrait of how PayPal was faring at any given moment.
“A server has a voice,” Shivananda explains. “An application has a voice. An operating system has a voice. And they are telling you things. They are telling you their own health, and that translates into about 350,000 signals. Per second.”
The traditional, people-intensive command-center model could never scale up to deal with hundreds of thousands of data points. Being smarter about which signals to watch in the command center is part of the solution. So is teaching software bots to identify and solve problems on their own.
For all the ways in which PayPal is rethinking its response to technical snafus that can affect the service, the job still involves human beings making collaborative decisions on the fly. As problems are identified, the company’s response is managed by a team of technical duty officers, who occupy the back row of desks within the command center. Shivananda likens these TDOs to Gene Kranz, the legendary NASA flight director whose achievements included guiding the Apollo 13 astronauts to safety.
Also on the premises: specialists such as systems administrators, database engineers, and security experts. On a normal day, these staffers may be tending to other tasks as they sit in the room. But if a problem arises, they’re there to spring into action. “If you go to Mission Control at NASA, you have jet propulsion and telemetry and you’ll actually have a doctor sitting there in the back row,” Shivananda says. “It’s very similar to that.”
Another frame of reference that may spring to mind–especially if you’re a Star Trek aficionado–is the bridge of the USS Enterprise. It certainly occurred to PayPal: The company commissioned a custom-built replica of Captain Kirk’s chair and put it right up front.
What’s on the command center’s seven giant screens is based on input from the staffers themselves. It includes everything from a snapshot of the payment processing that’s going on at eBay–still a key PayPal partner even though the two companies are no longer a single entity–to breakouts of how the service is interacting with all the credit-card processors it supports.
Everything is organized with logic and legibility in mind. “The cognitive load on an engineer is minimal,” Shivananda says. “They don’t have to think where to look. They know that just like English goes from left to right, these signals go from left to right.” In the case of really serious malfunctions, the system may even pipe a verbal alert into the room via loudspeakers.
PayPal’s multiple command centers provide the company with business continuity in worst-case scenarios–a virtue that isn’t just theoretical, given that Chennai has recently been hit by devastating floods, which shut down its command center for a time. They also allow for coverage that’s both 24/7 and global. The service has a peak hour of activity–Shivananda doesn’t want to reveal what it is, for competitive reasons–and it kicks in multiple times a day as regions around the world wake up. “When you look at the chart, you’ll see a spike for Europe, then a spike for the U.S.,” he says. “And Asia kind of blends in.”
Shivananda starts to enumerate the things that can go wrong during any given day: “There are bugs in a software update that cause errors. A team is doing maintenance on a load balancer. A database crashes. An ISP can’t take traffic. Any one of those is a chance for an error to occur.”
As PayPal rebuilt itself for a post-eBay future, it took a fresh look at the vital statistics it was monitoring for signs of trouble. “Over four months, we did a lot of analysis to figure out which are the most high-value signals,” Shivananda says. “Even though we run the science across all 350,000 signals a second, the high-value signals, we put up on the walls. Everything else goes on an alert mechanism.”
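To make that split concrete, here is a minimal Python sketch of the idea, not PayPal’s actual system. It assumes each signal carries a hypothetical `value_score` (how informative the analysis judged it to be) and a hypothetical `alert_threshold`. The top-ranked signals go to the wall displays, and everything else is only surfaced when its reading crosses its threshold. All names and numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    value_score: float      # hypothetical ranking from the up-front analysis
    reading: float          # latest measured value
    alert_threshold: float  # hypothetical level at which to raise an alert


def route(signals, wall_slots):
    """Split signals into wall-display signals and triggered background alerts."""
    ranked = sorted(signals, key=lambda s: s.value_score, reverse=True)
    wall = ranked[:wall_slots]        # highest-value signals are always visible
    background = ranked[wall_slots:]  # the rest are watched silently
    alerts = [s for s in background if s.reading > s.alert_threshold]
    return wall, alerts


# Made-up example data.
signals = [
    Signal("payments_per_second", 0.99, 8700.0, 20000.0),
    Signal("card_processor_errors", 0.95, 2.0, 50.0),
    Signal("disk_usage_host_a", 0.20, 97.0, 90.0),
    Signal("cache_hit_ratio_miss", 0.10, 3.0, 25.0),
]
wall, alerts = route(signals, wall_slots=2)
```

In this toy run, the two highest-scoring signals land on the wall, and only the disk-usage signal, which has crossed its threshold, produces an alert.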
The company is also beginning to automate some aspects of addressing common problems, in instances where it is confident that applying the same fix in the same way to the same type of flaw will work every time. “If I can repeat it consistently and achieve the same outcome, I’ll put it in a bot,” Shivananda says.
Already, PayPal has created bots that are capable of patching servers on the network which have out-of-date software. “The future of the ecosystem is not visual indications,” says Shivananda. “It’s basically the science that runs says ‘This application is throwing more errors, please roll back.’ And after that, it’ll go ‘I know the signal. I know what to do. I’ll just do it.’”
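The “more errors, please roll back” scenario can be sketched in a few lines of Python. This is an illustrative guess at the shape of such a bot, not a description of PayPal’s software. The class name `RollbackBot`, the rolling-window baseline, and the spike `factor` are all assumptions. The bot compares each new error-rate reading against the average of recent readings and triggers a caller-supplied rollback action when the reading spikes.

```python
from collections import deque
from statistics import mean


class RollbackBot:
    """Hypothetical bot: roll back when the error rate spikes above baseline."""

    def __init__(self, rollback, window=5, factor=3.0):
        self.rollback = rollback              # action to take, supplied by the caller
        self.history = deque(maxlen=window)   # recent "normal" readings
        self.factor = factor                  # how large a jump counts as a spike

    def observe(self, error_rate):
        """Record one reading; return True if a rollback was triggered."""
        full = len(self.history) == self.history.maxlen
        # Only judge spikes once a full window of baseline readings exists.
        if full and error_rate > self.factor * max(mean(self.history), 1e-9):
            self.rollback()
            self.history.clear()  # start a fresh baseline after acting
            return True
        self.history.append(error_rate)
        return False


# Made-up readings: steady errors, then a sudden jump.
actions = []
bot = RollbackBot(lambda: actions.append("rollback"), window=3, factor=3.0)
results = [bot.observe(rate) for rate in [1.0, 1.0, 1.0, 1.2, 9.0]]
```

Here the first four readings only build and maintain the baseline; the fifth is well over three times the recent average, so the bot invokes the rollback action once. A real system would need safeguards this sketch omits, such as confirming that the rollback actually reduced errors.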
There’s lots more to come. “Imagine an army of software bots that will use these signals, understand the situation, know what action to take, and just take the action,” he says. “Even before a human realized that there was an issue.”
Even if bots end up doing much of the heavy lifting of keeping PayPal up and running, there’s still room to improve the human element of the process. For one thing, the San Jose command center I visited will soon be moving to a new room nearby, with higher-resolution screens capable of crisply displaying even more information, plus other technical upgrades.
Beyond that, Shivananda sees a day when the command center could go virtual, with staffers using a VR headset such as Oculus Rift to analyze PayPal’s vital signs. “The Oculus could be a virtual interface to a lot more information than what you can put on seven displays, including possibly head nods that will take you to a certain level of detail,” he muses.
No matter what sort of technology PayPal comes up with, the fact that it has 350,000 data points a second to consider means that it’s not going to run out of opportunities to make the service more robust any time soon. “That data has all the signals,” Shivananda says. “The question is, are we listening to it?”