People have been talking about the need for more ‘analysis’ and insight in big data, which is obviously important, because we’ve been in the ‘collection’ phase with big data until now. But the innovation in the big data world that I’m most excited about is the ‘prediction’ phase — the ability to process the information we’ve collected, learn patterns, and predict unknowns based on what we’ve already seen.
Machine learning is to big data as human learning is to life experience: We interpolate and extrapolate from past experiences to deal with unfamiliar situations. Machine learning with big data will duplicate this behavior, at massive scales.
Where business intelligence before was about past aggregates ("How many red shoes have we sold in Elante Mall, Chandigarh?"), it will now demand predictive insights ("How many red shoes will we sell in Elante Mall, Chandigarh?"). An important implication of this is that machine learning will not be an activity in and of itself; it will be a property of every application. Every application, in every use case imaginable, should and will become inherently more intelligent as the machine implicitly learns patterns in the data and derives insights. It will be like having an intelligent, experienced human assistant in everything we do.
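To make the aggregate-versus-prediction distinction concrete, here is a minimal sketch with made-up monthly sales figures: the BI-style answer is a sum over the past, while the predictive answer fits a least-squares trend and extrapolates one month forward.

```python
# Hypothetical monthly sales figures for one store (illustrative only).
monthly_sales = [100, 110, 125, 130]

# BI-style aggregate: "How many have we sold?"
total_sold = sum(monthly_sales)

# Predictive insight: fit y = a + b*t by least squares, extrapolate.
n = len(monthly_sales)
ts = list(range(n))
t_mean = sum(ts) / n
y_mean = sum(monthly_sales) / n
b = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, monthly_sales)) / \
    sum((t - t_mean) ** 2 for t in ts)
a = y_mean - b * t_mean
next_month = a + b * n  # forecast: "How many will we sell?"
```

The same data answers both questions; only the second one requires a model.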
The key here is more automated applications, where big data drives what the application does without user intervention.
And that brings us to the computation of big data...
Think of big data and machine learning as three steps (and phases of companies that have come out of this space): collect, analyze, and predict. These steps have been disconnected until now, because we’ve been building the ecosystem from the bottom up — experimenting with various architectural and tool choices — and building a set of practices around that.
The early Hadoop stack is an example of collecting and storing big data. It allows easier data processing across a large cluster of cheap commodity servers. But Hadoop MapReduce is a batch-oriented system, and doesn’t lend itself well to interactive applications, real-time operations like stream processing, or other, more sophisticated computations.
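As a rough sketch of the batch model: MapReduce expresses a job as a map phase that emits key-value pairs and a reduce phase that aggregates them. A single-process Python analogue of word counting (not the actual Hadoop API) looks like this:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Reduce: sum the counts for each key.
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

lines = ["red shoes sold", "Red shoes in stock"]
word_counts = reduce_phase(map_phase(lines))
# word_counts["red"] == 2, word_counts["shoes"] == 2
```

The batch character is visible even here: no answer is available until the reduce phase has consumed the entire map output, which is why the model fits periodic jobs better than interactive queries.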
For predictive analytics, we need an infrastructure that’s much more responsive to human-scale interactivity: What’s happening today that may influence what happens tomorrow? A lot of iteration needs to occur on a continual basis for the system to get smart, for the machine to “learn” — explore the data, visualize it, build a model, ask a question, get an answer back, bring in other data, and repeat the process.
The more real-time and granular we can get, the more responsive, and more competitive, we can be.
Compare this to the old world of “small-data” business intelligence, where it was sufficient to have a small application engine that sat on top of a database. Now, we’re processing a thousand times more data, so to keep up the speed at that scale, we need a data engine that's in-memory and parallel. And for big data to unlock the value of machine learning, we're deploying it at the application layer. Which means "big data" needs "big compute".
This is where Apache Spark [or SAP Hana] comes in. Because it's an in-memory, big-compute part of the stack, it can be up to a hundred times faster than Hadoop MapReduce. It also offers interactivity, since it's not limited to the batch model. Spark runs everywhere (including Hadoop), and turns the big data processing environment into a real-time data capture and analytics environment.
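The in-memory advantage largely comes down to loading the working set once and reusing it across queries, rather than re-reading storage for every job. A toy simulation of that difference, in plain Python rather than the Spark API:

```python
def load_dataset():
    # Stand-in for an expensive read from distributed storage.
    load_dataset.reads += 1
    return list(range(1000))
load_dataset.reads = 0

# Batch style: every query re-reads the data from storage.
total = sum(load_dataset())
maximum = max(load_dataset())
batch_reads = load_dataset.reads            # two reads for two queries

# In-memory style: load once, keep the data cached, reuse it.
cached = load_dataset()
total2, maximum2 = sum(cached), max(cached)
cached_reads = load_dataset.reads - batch_reads  # one read for two queries
```

In Spark the cached dataset additionally stays partitioned across the cluster's memory, so iterative algorithms and interactive queries hit RAM instead of disk on every pass.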
We've invested in every level of the big data/big compute ecosystem, and this remains an exciting, active space for innovation. Because big data computing is no longer the sole province of government agencies and big companies. Even though the early applications tend to show up in industries where data scientists have typically worked, machine learning as a property of all applications — especially when coupled with an accessible user interface — is democratizing who, what, and where this kind of real-time computing and learning can happen … and what great new companies can be built on top of it.
My belief is every application will be re-constituted to take advantage of this trend. And thanks to big data and big compute innovations, we finally have the ingredients to really make this happen. We’re at the threshold of a significant acceleration in machine intelligence that can benefit businesses and society at large.
Ref: http://a16z.com/2015/01/22/machine-learning-big-data/