Our Blog

Top 5 Takeaways from Strata


By Stephen Sklarew

I attended the Strata Hadoop conference in San Jose two weeks ago. I got lucky: the rain clouds parted for the conference, and only a few sessions were crowded. On Wednesday, the exhibit hall opened and the keynotes began. A growing wave of energy and excitement followed, which made it difficult to sift through all the buzz. Here are my top 5 takeaways:

1.   AI everywhere

Almost all the keynote presentations focused on applications of Artificial Intelligence rather than Big Data. From Coursera in education, to Microsoft’s CRISPR-ML in genetics, to the National Basketball Association in professional sports, organizations of every type are applying AI at an accelerating rate. It reminded me of the first internet wave in the late 90s, when the World Wide Web stopped being just an academic thing.

I couldn’t stop thinking about how quickly all our jobs are going to change given this surge in automation. I recently picked up a book called “What To Do When Machines Do Everything,” which is an insightful read on this very topic. I highly recommend it.

2.   Hadoop isn’t in the limelight anymore

While Hadoop came up in a number of sessions, it was discussed only as a “distributed file infrastructure.” MapReduce was written off as a slow, antiquated technology. There was much more conversation around Spark, and Google’s TensorFlow drew large crowds.

In light of Hadoop’s decreasing importance in the space, O’Reilly renamed the conference to the “Strata Data Conference”. Goodbye Hadoop, you had the limelight for a little while…

3.   So many options it’s overwhelming, and there’s only more to come

Walking the exhibit hall was like going to Disney World at peak season. People were everywhere, buzzing around like bees. While many vendors were pitching their products, most fell into three categories:

  • Big Data storage technologies
  • Streaming technologies
  • “Everything you need to do Big Data and Data Science” platforms

The two most interesting products I found were Trifacta and Domino. Trifacta is a “data wrangling” tool that lets you explore data without having to use spreadsheets. Domino is a data science collaboration tool that’s like the “GitHub for Data Science.” It allows disparate Data Science teams to manage and run all their experiments in one place, compare results, and deploy their models to cloud environments.

There was a lot of talk about Kafka and all the things it will be able to do very soon. And UC Berkeley’s AMPLab (now called “RISELab”) is about to release an alpha version of its new distributed computing framework, “Ray,” which may supplant Spark one day.

4.   All data science and technology, little user experience

Only one session in the entire conference addressed the importance of user experience in AI systems. Unfortunately, it was the one session I didn’t get a chance to attend. Nevertheless, it amazed me that the “AI is the new UI” viewpoint hadn’t yet taken hold at Strata. User input and output experiences are critical to full AI adoption, and almost no one was talking about them.

5.   Everyone is hiring

Someone wheeled in an old-fashioned cork board for companies to post their open data science and data engineering jobs. At first, only a few flyers were posted. By the end of the conference, it was survival of the fittest: the latest arrivals pinned their flyers on top of the others, and the board was a mess. Clearly, demand for these skills far exceeds supply. So, if you want to hire a data engineer or data scientist (or keep the ones you have), get in line and be ready to pay for it.

The bottom line

Like the mobile wave before it, AI is an incredibly exciting but confusing space. It has much broader application to our personal and professional lives than mobile did, and many believe it’s the next commercial revolution.

If you don’t have a strategy for leveraging your data to optimize costs or build new revenue streams, you may soon be a fish out of water. And if you don’t have a productive data engineering and data science team that keeps up with the latest platforms and algorithms, your data strategy will likely fall short.