It’s shocking to be reminded of how little people understand software. After the Nasdaq stock exchange paused trading for three hours on August 22, an incident now being called the "flash freeze," wild rumors flew through the press, speculating on what had caused the problem. At first, Nasdaq tried to point fingers at various trading companies, arguing that the issues stemmed from someone else’s network, not its own. In a press release last night, the group finally took responsibility for the freeze and offered a pretty decent explanation.
So what’s the deal with the comments that Nasdaq CEO Robert Greifeld gave to major newspapers? He told the Wall Street Journal: "We know there's going to be issues with software. We know there's going to be bugs." And he was quoted in the New York Times as saying, "We certainly understand that code has this nefarious way of working and then not working."
Ah, yes. Nothing to see here. Just that crazy code being crazy as usual. To be fair, there are probably only a couple of people on staff at Co.Labs who could even begin to understand the code underlying Nasdaq trading. I’m not one of them. And given Greifeld's history in financial technology, he probably understands more than he's letting on. But there's a significant disconnect between his explanations, which come across as either ignorant or deliberately dumbed down, and the press release, which makes sense and should be understandable to most people. One part of the network sent too many messages to the Securities Information Processor (SIP), the system that consolidates quote and trade data for Nasdaq-listed stocks, and that traffic exceeded the system’s capacity:
In January 2013, a regularly scheduled systems capacity test showed the SIP system was capable of handling approximately 500,000 messages per second across 50 of the SIP system's ports. . . On August 22, the SIP received more than 20 connect and disconnect sequences from NYSE Arca . . . Available capacity was further eroded as the SIP received a stream of quotes for inaccurate symbols from NYSE Arca, and generated quote rejects. Both of these actions served to degrade the system below the tested capacity of 10,000 messages per per-port, per second . . . [and] exceeded the SIP's planned capacity, which caused its failure and then revealed a latent flaw in the SIP's software code. This latent flaw prevented the system's built-in redundancy capabilities from failing over cleanly, and delayed the return of system messages.
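The release's figures boil down to simple arithmetic: 500,000 messages per second spread over 50 ports is the 10,000-messages-per-port budget it mentions, and every quote reject the SIP has to generate spends part of that budget. The Python sketch below is only an illustration of that headroom math, not of how the SIP actually works. It assumes traffic is split evenly across ports, treats generated rejects as consuming the same per-port budget as incoming quotes (our simplification), ignores the connect and disconnect sequences entirely, and uses made-up load numbers; the function names are hypothetical.

```python
# Figures from Nasdaq's release: ~500,000 messages/second across 50 SIP ports.
TESTED_TOTAL_MSGS_PER_SEC = 500_000
PORTS = 50


def per_port_capacity(total: float, ports: int) -> float:
    """Split aggregate tested throughput evenly across ports (an assumption)."""
    return total / ports


def headroom(capacity: float, inbound: float, rejects: float) -> float:
    """Per-port capacity left after counting inbound quotes and generated rejects.

    A negative result means the port is over its tested budget.
    Hypothetical simplification: rejects cost the same as inbound messages.
    """
    return capacity - (inbound + rejects)


cap = per_port_capacity(TESTED_TOTAL_MSGS_PER_SEC, PORTS)
print(cap)  # 10000.0

# Hypothetical load, not from the release: 9,000 quotes/s plus 1,500 rejects/s.
print(headroom(cap, inbound=9_000, rejects=1_500))  # -500.0
```

In this toy model, a port that would be comfortably under budget on valid quotes alone tips over once rejected quotes are added, which is the kind of erosion the release describes.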
From a developer’s perspective, the outage raises an important question about continuity of service. Obviously, users will only stay loyal to a service they feel they can rely on, especially where investments are concerned. But the Nasdaq outage, which was resolved within a few hours and contributed to only minor economic losses, shows that "going down" may not be a death knell so long as the problem is resolved quickly and responsibly and doesn’t happen often. Downtime matters more in trading, where outages add unpredictable risk, but the Economist points out that Amazon and Google both had outages in the last few weeks that were quickly forgotten: "Zero tolerance of failure, which applies to airlines, bridges and tunnels is not so vital for electronic operators and financial firms."
That means tech and financial companies shouldn’t be afraid to admit error, and should instead build trust by explaining what happened and what they learned. To Nasdaq’s credit, after dodging blame for a while, it issued a press release whose explanation of the outage makes sense.
So why is Nasdaq’s CEO describing code as an untamable beast? Greifeld might be trying to reach a broader audience, or he might be trying to obscure the causes of the outage with vague language. Either way, descriptions like these suggest that companies aren’t in control of their own systems and that tech literacy isn’t important. In fact, computer and code illiteracy is becoming an increasingly noticeable problem in the U.S. and worldwide.
Technology moves so quickly and humans so slowly that we’re still catching up with the realization that most workers today need at least a basic understanding of computer systems to be competitive. If computer literacy is going to become a priority in education, society needs to view code as productive, applicable, and conquerable. When downtime happens, leaders should be able to explain why. You don’t have to devote your life to analyzing Shakespeare to be able to say that you know how to read.
[Image: Flickr user Valsts Kanceleja]