It's still unclear exactly how the National Security Agency (NSA) is carrying out digital surveillance on us. But we know one thing for sure: the government is collecting a whole lot of data. And privacy concerns aside, the real challenge for the NSA is not so much collecting that information but figuring out how to use it to help keep the country safe--and doing it against the clock while minimizing mistakes. "This is a big-data challenge," says Viktor Mayer-Schönberger, the Oxford Internet Institute's professor of Internet governance and regulation. "You have lots and lots of noise with a potential signal buried inside, but it's hard to differentiate the two."
With the national-security establishment still a black box--albeit a leaking one--it's worth looking at how private companies are grappling with those big-data challenges. From logistics firms trying to keep millions of packages moving on schedule to airlines trying to predict delays, businesses are working to make sense of their own vast data sets. And they're turning to specialized data-consulting firms that have the expertise to pull the signal from the noise through something called complex event processing. "There's so much data flowing around now and a huge need to analyze it," says Matt Quinn, the CTO of Silicon Valley enterprise-software firm the Information Bus Co. (TIBCO). "That's led to companies like us developing the technology to take advantage of it."
How does it work? Say you're a financial firm looking to detect fraud. You may have as many as 300,000 transactions per second, each of which can be considered an event. And each of those events has countless data points that go along with it--the size of the transaction, its type, its location. Out of that overflowing stream of data, complex event processing tries to pull out recognizable patterns that can alert you to aberrations. And it has to happen in near real time--a fraud alert that goes off days after the theft will do little to prevent loss. "What you try to do is correlate those events into something larger," says Quinn.
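The pattern-matching the paragraph above describes can be made concrete with a toy sketch. The rules below (a burst of transactions, or transactions from two cities inside one short sliding window) are illustrative assumptions, not any particular vendor's product; real complex-event-processing engines run far richer rule sets over far larger streams.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass
class Transaction:
    card_id: str
    amount: float
    city: str
    timestamp: float  # seconds since epoch

WINDOW = 60.0    # illustrative look-back window, in seconds
BURST_LIMIT = 3  # flag more than this many events per window

class FraudDetector:
    """Toy complex-event-processing rules: flag a card that produces a
    burst of transactions, or transactions from more than one city,
    inside a single short sliding window."""

    def __init__(self):
        self.recent = defaultdict(deque)  # card_id -> recent events

    def process(self, tx: Transaction):
        q = self.recent[tx.card_id]
        q.append(tx)
        # Expire events that have fallen out of the window.
        while q and tx.timestamp - q[0].timestamp > WINDOW:
            q.popleft()
        alerts = []
        if len(q) > BURST_LIMIT:
            alerts.append(f"burst: {len(q)} transactions in {WINDOW:.0f}s")
        if len({t.city for t in q}) > 1:
            alerts.append("impossible travel: multiple cities in window")
        return alerts
```

The key design point, per Quinn's description, is that each event is cheap to handle on its own; the value comes from correlating it with the recent history kept in the window, so alerts fire in near real time rather than days later.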
Sifting for patterns and trying to predict the future aren't new for businesses or governments. What's different is the sheer scale of the data being collected in a connected world--and the computing power available to mine that information. In 2008, engineers at the NSA began developing Accumulo, a program that allows the agency to store and analyze vast amounts of data across thousands of computer servers. How much data isn't known, though the NSA is building a sprawling $2 billion computer center in Utah. Accumulo could help the NSA make real-time connections among the data points it collects--say, linking phone calls in the U.S. to terrorist chatter overseas. "There is definitely the technology in place that allows you to search that information very quickly and draw correlations about it," says Joseph Turian, an analyst for GigaOM Research and president of the consulting firm MetaOptimize.
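The kind of correlation described above--linking records from one dataset to records in another that share an identifier and fall close together in time--amounts to a windowed join. A minimal sketch, with invented field names and a one-hour window chosen purely for illustration (this is not how Accumulo itself works internally):

```python
from collections import defaultdict

def correlate(domestic_calls, overseas_events, window=3600.0):
    """Pair domestic call records with overseas events that share a
    phone number and occur within `window` seconds of each other.
    Both inputs are lists of (number, timestamp) tuples -- an
    illustrative schema, not a real intelligence data format."""
    by_number = defaultdict(list)
    for number, ts in overseas_events:
        by_number[number].append(ts)
    matches = []
    for number, ts in domestic_calls:
        for other_ts in by_number.get(number, []):
            if abs(ts - other_ts) <= window:
                matches.append((number, ts, other_ts))
    return matches
```

At the scale the article describes, the same logic would be distributed: a system like Accumulo keeps the records sorted by key across thousands of servers, so lookups like the `by_number` index here stay fast even over enormous datasets.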