Rajesh's Tech Blog

Wednesday, September 21, 2022

What matters when analyzing data

A good read: https://www.thehindu.com/opinion/op-ed/beyond-the-statistical-soundbites-why-data-matter/article65917243.ece?homepage=true

Tuesday, July 3, 2018

The dream of being an AI powerhouse

First, if the government is serious about AI solutions powering agriculture or healthcare, it must collect and digitise data better under its existing programs.

https://www.thehindu.com/opinion/op-ed/the-dream-of-being-an-ai-powerhouse/article24305644.ece?homepage=true

Thursday, June 15, 2017

Big data, Privacy, Big dangers

It raises eyebrows, and we still have a lot to say about it ;-)

Our neighbour China understood this threat and encouraged the formation of large Internet companies such as Baidu and Alibaba, thereby reducing its reliance on the services of Internet giants such as Google and Amazon. We understand that we are not economically matched with China (at this time), so it would be better to take steps now to encourage big data technology companies to set up their data centres in India, by providing appropriate subsidies such as cheap power, real estate, and network bandwidth, with a pre-defined agreement to keep the data within specified geographical boundaries. It would also be better to start research and development in big data science and data centre technology at our academic and research institutions. This will help India's youth understand the pros and cons of the access they give these big Internet giants, and think out of the box to make the Start-up India dream come true.

Ref: Big data, big dangers

Enjoy Programming!

Wednesday, May 17, 2017

AI (Artificial Intelligence) and its influence... Wait and watch!!!

As an IT professional, I'm always keen to follow developments in IT, and I'm sure that AI (Artificial Intelligence) is going to be the big thing in the future, along with big data and IoT (Internet of Things).

Good read: AI the next big thing

Enjoy Programming!

Friday, January 30, 2015

“big data” and “machine learning” as connected activities

People have been talking about the need for more ‘analysis’ and insight in big data, which is obviously important, because we’ve been in the ‘collection’ phase with big data until now. But the innovation in the big data world that I’m most excited about is the ‘prediction’ phase: the ability to process the information we’ve collected, learn patterns, and predict unknowns based on what we’ve already seen.

Machine learning is to big data as human learning is to life experience: We interpolate and extrapolate from past experiences to deal with unfamiliar situations. Machine learning with big data will duplicate this behavior, at massive scales.

Where business intelligence used to be about past aggregates ("How many red shoes have we sold in Elante mall, Chandigarh?"), it will now demand predictive insights ("How many red shoes will we sell in Elante mall, Chandigarh?"). An important implication is that machine learning will not be an activity in and of itself; it will be a property of every application. Every application, in every use case imaginable, should and will become inherently more intelligent as the machine implicitly learns patterns in the data and derives insights. It will be like having an intelligent, experienced human assistant in everything we do.
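To make the difference between a past aggregate and a predictive insight concrete, here is a minimal sketch using hypothetical monthly red-shoe sales for one store (the numbers are made up for illustration). The aggregate answers "how many have we sold?", while an ordinary least-squares trend line, one of the simplest predictive models, extrapolates to "how many will we sell next month?". The function name `fit_trend` is hypothetical; a real system would use far more data and features.

```python
from typing import List, Tuple

# Hypothetical monthly sales of red shoes at one store (illustrative numbers only)
monthly_sales: List[int] = [120, 135, 150, 160, 178, 190]


def fit_trend(ys: List[int]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * t + intercept, with t = 0, 1, 2, ..."""
    n = len(ys)
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    return slope, mean_y - slope * mean_t


# Business-intelligence question: "How many red shoes have we sold?"
total_sold = sum(monthly_sales)

# Predictive question: "How many red shoes will we sell next month?"
slope, intercept = fit_trend(monthly_sales)
next_month = len(monthly_sales)  # time index of the month after the last observation
forecast = slope * next_month + intercept

print(f"Sold so far: {total_sold}")
print(f"Forecast for next month: {forecast:.1f}")
```

With these made-up numbers the aggregate is 933 pairs and the straight-line trend forecasts roughly 204 pairs for the next month.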

The key here is more automated apps, where big data drives what the application does with no user intervention.

And computation of big data...

Think of big data and machine learning as three steps (and phases of companies that have come out of this space): collect, analyze, and predict. These steps have been disconnected until now, because we’ve been building the ecosystem from the bottom up — experimenting with various architectural and tool choices — and building a set of practices around that.

The early Hadoop stack is an example of collecting and storing big data. It allows easier data processing across a large cluster of cheap commodity servers. But Hadoop MapReduce is a batch-oriented system, and doesn’t lend itself well towards interactive applications; real-time operations like stream processing; and other, more sophisticated computations.
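The batch-oriented MapReduce model can be illustrated with the classic word-count job. The sketch below imitates the map, shuffle, and reduce stages in plain Python inside a single process; it is not Hadoop's Java API, and the function names are hypothetical. In Hadoop, the framework performs the shuffle and runs many map and reduce tasks across the cluster. Notice that `run_job` needs the complete input before it returns any result, which is what makes the model a poor fit for interactive or streaming work.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values emitted for the same key together."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run the whole batch: every line is mapped before any reduce starts."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(run_job(["Red shoes", "red boots", "shoes"]))
```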

For predictive analytics, we need an infrastructure that’s much more responsive to human-scale interactivity: What’s happening today that may influence what happens tomorrow? A lot of iteration needs to occur on a continual basis for the system to get smart, for the machine to “learn” — explore the data, visualize it, build a model, ask a question, an answer comes back, bring in other data, and repeat the process.

The more real-time and granular we can get, the more responsive, and more competitive, we can be.
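The continual loop described above (build a model, get an answer, bring in other data, repeat) can be sketched with an online model that updates itself with each new observation instead of being retrained in one big batch. The example below is a minimal, hypothetical one-feature linear model trained by stochastic gradient descent; the class name, the learning rate, and the simulated noise-free stream are all assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import cycle, islice


@dataclass
class OnlineLinearModel:
    """Hypothetical model y ≈ w * x + b that learns one observation at a time."""
    w: float = 0.0
    b: float = 0.0
    learning_rate: float = 0.05  # assumed value; real data needs tuning

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def update(self, x: float, y: float) -> None:
        # One stochastic gradient descent step on the squared error 0.5 * (prediction - y) ** 2
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


# Simulated stream of 1,000 observations that follow y = 2x + 1 (no noise)
xs = cycle([0.0, 0.5, 1.0, 1.5, 2.0])
stream = ((x, 2 * x + 1) for x in islice(xs, 1000))

model = OnlineLinearModel()
for x, y in stream:
    # Learn from the newest observation; predict() can be called at any time in between
    model.update(x, y)

print(f"w = {model.w:.3f}, b = {model.b:.3f}, prediction at x=3: {model.predict(3.0):.2f}")
```

Each update is cheap, so the model can keep learning as data arrives; after this simulated stream it should end up close to w = 2 and b = 1.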

Compare this to the old world of “small-data” business intelligence, where it was sufficient to have a small application engine that sat on top of a database. Now, we’re processing a thousand times more data, so to keep up the speed at that scale, we need a data engine that’s in-memory and parallel. And for big data to unlock the value of machine learning, we’re deploying it at the application layer. This means “big data” needs “big compute”.

This is where Apache Spark [or SAP HANA] comes in. Because Spark is an in-memory, big-compute part of the stack, it can be up to a hundred times faster than Hadoop MapReduce for workloads that fit in memory. It also offers interactivity, since it's not limited to the batch model. Spark runs in many environments (including on Hadoop clusters) and turns the big data processing environment into a real-time data capture and analytics environment.
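The idea of an in-memory, parallel engine can be sketched in a few lines: the data is split into partitions that stay in memory, each query is evaluated on every partition concurrently, and the partial results are merged. This is only a toy illustration of the pattern, not Spark's or SAP HANA's API; the records, store names, and function names are hypothetical. Threads keep the sketch portable, although in CPython they will not speed up CPU-bound work, which is why real engines spread partitions across processes and machines.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from typing import List, Tuple

Record = Tuple[str, str, int]  # (store, colour, pairs sold), hypothetical schema

# Hypothetical sales records, already split into partitions that stay in memory
partitions: List[List[Record]] = [
    [("Elante", "red", 3), ("Elante", "blue", 2)],
    [("Elante", "red", 5), ("Other", "red", 1)],
    [("Other", "blue", 4), ("Elante", "red", 2)],
]


def count_in_partition(partition: List[Record], store: str, colour: str) -> int:
    """Partial aggregate computed independently on one partition."""
    return sum(pairs for s, c, pairs in partition if s == store and c == colour)


def query(pool: ThreadPoolExecutor, store: str, colour: str) -> int:
    """Run one query on all partitions concurrently, then merge the partial counts."""
    task = partial(count_in_partition, store=store, colour=colour)
    return sum(pool.map(task, partitions))


# The partitions are loaded once and reused by every interactive query
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    red_at_elante = query(pool, "Elante", "red")
    blue_at_other = query(pool, "Other", "blue")

print(f"Red pairs at Elante: {red_at_elante}, blue pairs at Other: {blue_at_other}")
```

The design point is that the second query reuses the same in-memory partitions instead of re-reading the data from disk, which is the main reason in-memory engines suit iterative and interactive work.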

We've invested in every level of the big data/big compute ecosystem, and this remains an exciting, active space for innovation, because big data computing is no longer the sole province of government agencies and big companies. Even though the early applications tend to show up in industries where data scientists have typically worked, machine learning as a property of all applications, especially when coupled with an accessible user interface, is democratizing who, what, and where this kind of real-time computing and learning can happen, and what great new companies can be built on top of it.

I believe every application will be reconstituted to take advantage of this trend. And thanks to big data and big compute innovations, we finally have the ingredients to really make this happen. We’re at the threshold of a significant acceleration in machine intelligence that can benefit businesses and society at large.

Ref: http://a16z.com/2015/01/22/machine-learning-big-data/

Tuesday, April 1, 2014

Big Data & Hadoop

1. Hadoop requires a working Java installation; Java 1.6 (aka Java 6) is recommended for running Hadoop.

2. Download a Hadoop release from an Apache mirror, for example:

http://apache.mirrors.hoobly.com/hadoop/common/hadoop-0.23.9/

3. After downloading, extract the Hadoop .tar.gz file with the following command (substitute the file name of the release you downloaded):

$ sudo tar xzf hadoop-1.0.3.tar.gz

4. After extracting Hadoop, follow the steps in the following tutorial to set it up as a single-node cluster:

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

5. After installing Hadoop, you can download a NoSQL database such as MongoDB or Cassandra; MongoDB is available from:

http://www.mongodb.org/downloads

6. If you downloaded MongoDB on a Windows machine, you can transfer it to the Linux machine using WinSCP.

7. After transferring MongoDB, install it as described in the following link:

8. After installing MongoDB, download the Mongo Hadoop connector from:

https://github.com/mongodb/mongo-hadoop/releases

9. Finally, install the Mongo Hadoop connector so that Hadoop jobs can use MongoDB as an input source and output destination.

Once Hadoop is set up locally, its web interfaces are available at:

http://localhost:50070 (NameNode)
http://localhost:50090 (SecondaryNameNode)
http://localhost:8088 (ResourceManager, i.e. the YARN cluster UI)
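Once everything is installed, a small Python script can sanity-check the setup: it reads the Java major version (the requirement in step 1) and probes the three web UIs listed above. This is a hedged sketch: the function names are hypothetical, and the ports are the ones given in this post. Hadoop 3.x moved the NameNode UI to port 9870 and the SecondaryNameNode UI to port 9868, so adjust the URLs for newer releases.

```python
import http.client
import re
import shutil
import subprocess
import urllib.error
import urllib.request
from typing import Optional

# Web UI addresses from this post (Hadoop 2.x-era default ports)
HADOOP_UIS = {
    "NameNode": "http://localhost:50070",
    "SecondaryNameNode": "http://localhost:50090",
    "ResourceManager (cluster)": "http://localhost:8088",
}


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` output, e.g. 'java version "1.6.0_45"' -> 6."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Up to Java 8, versions were reported as 1.x, so "1.6" means Java 6
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major() -> Optional[int]:
    """Return the major version of the `java` found on PATH, or None if it is missing."""
    if shutil.which("java") is None:
        return None
    # `java -version` prints its banner to stderr, not stdout
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr)


def check_ui(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server at `url` answers with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return False


if __name__ == "__main__":
    print(f"Java major version: {installed_java_major()}")
    for name, url in HADOOP_UIS.items():
        status = "up" if check_ui(url) else "down"
        print(f"{name:<28}{url:<28}{status}")
```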

Enjoy Programming!!!

Thursday, December 5, 2013

Continuous integration - White paper from Zend

As programmers, it's always in our favor to learn from the experiences of others and to maintain quality through planned effort. I found this white paper from Zend really useful on continuous integration, which is at the core of the agile software development approach. You can access it at the following link:

https://drive.google.com/file/d/0ByBsxd4DRmCWdHhLbElFb3c4TFk/edit?usp=sharing

Feel free to share your experiences.

Enjoy programming!!!