IBM is making one of the biggest investments in big data software in history. As of this week, the tech giant has committed more than 3,500 researchers and developers to the Apache Spark project, along with hundreds of millions of dollars in funding and resources. The open-source Apache Spark initiative makes it possible to analyze massive amounts of data in near-real time, and is used in everything from health care to retail to energy utilities. It is a transformative move for a host of industries, and one that also positions IBM as the vendor of choice for companies using Spark.
“We believe strongly in the power of open source as the basis to build value for clients, and are fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way,” IBM Analytics general manager Beth Smith said in a press release. “Our clients will benefit as we help them embrace Spark to advance their own data strategies to drive business transformation and competitive differentiation.”
The Spark initiative is a bit more complicated than a typical cash infusion. It includes building the company’s analytics and e-commerce platforms around Spark, integrating Spark into its Watson health care product, making IBM’s machine learning technology open-source, offering Spark cloud services, committing more than 3,500 employees to Spark research projects, and educating more than 1 million engineers and data scientists through open online courses.
Spark was originally developed at the University of California, Berkeley, and many of the researchers who first created it moved on to a startup called Databricks, which is working closely with IBM. The Spark pivot means that IBM is on track to become a dominant vendor in the big data industry.