Defeat Your Job Hunt Forever By Learning These Big Data Tools

Data science is changing so fast that companies are facing a shortage of experts.

[Image: Flickr user Beraldo Leal]

Sylvana Coche, an SAP reseller, had a company of 50 employees before downsizing to five. Stripped down to just the core management team, her company couldn’t afford to keep its technical services personnel on its payroll. The problem: Her tech employees’ skills were in such high demand that she couldn’t keep them as full-time employees.


“These days, the skill sets are: the cloud, big data, and mobility,” Coche says. And if you can call yourself an expert in one of those technologies, she says, then you belong to a very small community, one whose skills have become a hot commodity. And thanks to the open-source nature of the projects that underpin the data processing software Hadoop, those developers are all too easy to find (and compete over).

Because of the shortage of skilled workers in big data, companies that provide these services are now burdened with trying to attract and retain tech-savvy employees. Working with common data-management technologies–like Hadoop and SAP’s data management platform HANA–requires a skill set that only a handful of people in the world have. If a company wants to hire a good techie who is well versed in these big data solutions, then it needs to find a way into this exclusive society. Hadoop and HANA experts have their pick of the lot.

So what kind of expert are these companies searching for, exactly?

The Lowdown On Star Developers

“You usually know the good ones by name, and they usually know your company’s reputation,” says Coche, referring to developers who focus on Hadoop and HANA. SAP relies on data technologies like these because searching for a specific piece of data in these tools is quicker than putting the same search term into SAP. SAP users are better off dumping their data into Hadoop and HANA and then performing their searches. Coche needs these experts to help her clients manage these data processes.

Hadoop distributes large data processing tasks across clusters of servers. Its design grew out of two Google technologies, the Google File System and the MapReduce programming model, both of which Google built to process data at the scale of its search engine. Users can easily expand a cluster by adding processing nodes. The New York Times and Facebook use Hadoop to process their data, and there are countless other users. HANA, on the other hand, is an in-memory database built for fast queries on structured data, while Hadoop is designed to store and batch-process very large volumes of data in almost any format.
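
The idea of splitting one large job into pieces that separate machines can handle is easier to see in miniature. What follows is a minimal sketch in plain Python of the MapReduce pattern, not Hadoop’s actual API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample text are assumptions made up for illustration, and everything runs in a single process. On a real cluster, each chunk would be handed to a different server.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single total."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle, and reduce over a list of text chunks.

    Hypothetical driver: each chunk stands in for the slice of data
    that one node in a cluster would process independently.
    """
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample_chunks = ["big data tools", "big data jobs", "data science"]
    print(word_count(sample_chunks))
```

Because each call to map_phase looks only at its own chunk, the chunks can be processed in any order or all at once, which is why adding more nodes lets a cluster take on more data.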

Coche started her consulting company, Gravity Pro Consulting, after having steadily built up a career around SAP over the years. She has been able to secure important clients, like Southern California Edison and other utilities, while maintaining relationships with her star data software experts. They come to work for her on special projects for her clients.


“It’s better to be flexible and have a flexible workforce,” Coche says, when it comes to managing a small business. The market value of these tech experts’ skills outpaced the resources of her company.

Hadoop, specifically, has only about 25 committers, the contributors trusted to write changes directly into the project’s code, around the world. They work constantly on improving Hadoop from within various organizations. Arun Murthy, one of the cofounders of the Hadoop development company Hortonworks, says Hortonworks employs most of them.

“You don’t even have to be known as one of the big names of Hadoop, but for example, if you’re a committer of Hadoop, your market value today goes up by a factor of two or three,” says Murthy. Putting Hadoop on your resume gives you clout in the industry, with the freedom to work where you choose, even if your reputation hasn’t yet skyrocketed. But staying loyal to one company has its upside, he says.

“On the converse side of this, you definitely have much more mobility if you are an employee,” says Murthy.

It’s A Marketplace For Open-Source Experts

There are several implementations of Hadoop out there, but Hortonworks focuses on developing the open-source version that the Apache Software Foundation supports. Intel and IBM also have implementations, but big Hadoop clients have informally chosen Hortonworks’ version as the industry standard. And this keeps Hadoop experts around.

“The Apache Foundation is like the Switzerland of open source,” says Murthy. When tech industry giants give money to support Hortonworks’ work on Hadoop, they do it through the Apache Foundation, in the spirit of impartiality. It just turns into a pool of money that constantly fuels Hortonworks’ work. “Everyone likes Hortonworks’ model,” he says.


The industry trusts Hortonworks so much that HP recently invested $50 million in Hortonworks’ implementation of Hadoop. And SAP happens to be one of Hortonworks’ customers, complementing its HANA with Hortonworks’ Hadoop. Murthy says that this leverage in the industry helps Hortonworks attract the best committers, but he realizes that they can easily move from one company to another.

“What open source does is it opens the doors to talent which is significantly wider beyond the boundaries of the organization. What that does is it allows the cream of the crop to rise to the top,” Murthy says.

Murthy’s own work on open-source projects that used PHP opened up his professional prospects. He recalls a period in his career when he would receive phone calls from recruiters asking him to join companies that were using PHP. One of those was Facebook, in its early stages.

He joined Yahoo in 2001 to manage operations and support there. During that time, he started working on Hadoop, when Yahoo became the first global company to use it in production. Ten years later, he started Hortonworks with 24 other ex-Yahoo engineers. When we interviewed him, Murthy was recovering from the previous night’s celebrations, a toast to Hortonworks’ third birthday and a milestone for him as a cofounder. Today, the company has grown to 400 people.

Still, he needs to sell Hortonworks to his star techies by stressing his team’s stronger points. Most of Hortonworks’ team members have been working on Hadoop for years, with so much talent cultivated in-house that the third and fourth generations of Hadoop coders are now emerging at Hortonworks. The true Hadoop experts are at Hortonworks, he says.

To boot, a lot of employees can become shareholders in the company, and Hortonworks’ projects span a variety of industries, rather than centering on one or two, which could happen at other companies. But the biggest selling point is that his developers get to work on the open-source version of Hadoop, not a proprietary one.


“More importantly, because everything we do is open, we’re not limiting the potential of your marketability,” says Murthy. “It just opens up the market if you’re an individual, if you’re doing open source.”

And he doesn’t worry too much that his developers will leave after he has helped them build up these skill sets in-house. In fact, Hortonworks mentors young engineers to become experts in its tech, and some have become the best-known names in Hadoop circles. Hortonworks hopes to keep replicating that.