Why is big data becoming so popular?

Published at NLPPeople.

Big data is becoming a popular trend in both research and business, so it is worth discussing the reasons for this popularity.

What is big data?

The classic and best-known definition of ‘big data’ describes it as being characterised by three ‘Vs’:

  • volume;
  • velocity;
  • variety.

Big data is a massive amount of varied data that is constantly changing and being updated. For example, records of user activity in a social network, or logs of how users interact with an information retrieval engine, can be considered big data. Such data can also be drawn from the enormous number of clinical trials of new medications or from information about natural disasters all over the world.

There are many different ways to define big data, but the definition is not the main topic of this article; a large number of alternative definitions have been collected elsewhere.

So why now?

A lot of research dealing with big data is appearing now, and plenty of startups have started offering to extract value from your data and provide useful insights. Surprisingly, the methods that allow data to be processed automatically, such as the statistical techniques of regression, classification and clustering, have been known for many decades. So what are the reasons behind this spike in the field's popularity? Why is it happening now?

Several explanations of this trend can be identified.

1. Technological progress

Technological progress has led to rapid advances in computer hardware, and computing power has increased considerably. Computing power that was previously accessible only to large companies is becoming available to a much wider market.

This progress has also driven down the price of hardware, and computing power can now even be rented at an acceptable rate. Previously, one would have needed expensive mainframes to process a large amount of data. Today one can rent capacity on Amazon Elastic Compute Cloud (Amazon EC2) to speed up computation drastically, at an affordable price and for only as long as it is needed. Storage is also becoming less and less of a problem thanks to the appearance of cloud storage and falling prices for storage devices.

2. Development of infrastructure

The infrastructure for analysing big data is developing rapidly. Software tools and libraries for many programming languages are becoming accessible to anyone who wants to explore this field.

Figure: Big data landscape 2.0

This chart shows a set of tools already available for data analysis. Some of them are free and can be used by anyone interested. The advent of NoSQL databases, which have become an important addition to classical relational databases, was also a powerful spur to the popularity of big data.

The open-source project Hadoop has also made distributed computing much easier to implement.
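To give a rough idea of the programming model that Hadoop popularised, here is a minimal sketch of a MapReduce-style word count in plain Python. It runs on a single machine and does not use Hadoop itself; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample documents are assumptions made up for illustration. In a real cluster the framework would distribute the map and reduce steps across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    """Run the three phases in sequence on a list of text documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    docs = ["big data is big", "data is growing"]
    print(word_count(docs))
```

The point of the split is that each document can be mapped independently and each word can be reduced independently, which is what makes the work easy to spread over many machines.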

3. Accessibility of data

Rapid progress is being made here too, and data can now be collected from a wide variety of sources. It is relatively easy to install sensors in all kinds of devices: data can be collected from fridges and toasters, to say nothing of cars and phones.

Businesses have started to realise that data can be used to predict the needs of customers accurately, which can in turn increase profits significantly. Data is invaluable, and the ability to use it correctly can be a crucial factor in the success or failure of a business.

The amount of data is growing rapidly, and we need to think about how it should be processed and what kind of information to extract. When processing data, one needs to ask questions and define what to look for. However, enormous amounts of data bring a new challenge: understanding what the right questions are and what information it is feasible to obtain.

At present we are not limited to one type of data: it can come in different forms, such as video, text, images and links between people. Because the amount of data is growing so rapidly, automatic processing is becoming essential, which in turn is driving more interest in techniques such as machine learning, natural language processing and distributed computing.
