Author: Kaibo Liu
Editor: Kaibo Liu
In this article, I will give a brief introduction to Big Data analytics. The goal is to provide a basic understanding for the general public, without requiring prior training in this field.
If I had to explain the term “Big Data analytics” in one sentence, I would say it refers to the scientific process of moving from being data-rich to being decision-smart.
1. What is Big Data?
Big Data is often characterized by the following three “Vs” of challenges:
- Volume: The size of the data is huge
- Velocity: The speed of data collection is fast
- Variety: The types of data are diverse
Recently, some applications also put forward an additional “V” challenge in Big Data:
- Veracity: The quality of the data is unreliable
To understand why Big Data scenarios arise nowadays, I recommend taking a look at the following video:
2. Among the four “Vs”, which one is the most challenging?
Although different researchers may have different opinions on this question, I personally find that the last two “Vs” create the greatest challenges, based on my research experience. For the first “V”, it is possible to partition the data into small pieces and analyze each piece sequentially. Or, if the constraint is the hardware, we can seek better computational resources to handle the large volume of data. Regarding the second “V”, it is possible to apply sampling techniques, either over the spatial domain (i.e., analyze only part of the data in each data frame) or over the temporal domain (i.e., analyze only one data frame out of every n data frames). However, for the last two “Vs”, there is often a lack of generic approaches. Significant effort has to be made to handle heterogeneous data types (i.e., the third “V”) and to clean the data to improve and rescue its quality (i.e., the fourth “V”) before any analytics technique can be applied. Almost all data analytics algorithms assume that the data is clean and ready to be used directly as input for analysis. Unfortunately, this assumption is often violated in practice, and pre-processing the data before any analytics technique makes sense is indeed time-consuming.
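As a minimal sketch of the partitioning and sampling ideas mentioned for the first two “Vs”, the Python snippet below computes a mean one chunk at a time and thins a stream of data frames. The function names (chunked, mean_by_chunks, temporal_sample, spatial_sample) and the toy data are hypothetical and chosen only for illustration. Real Big Data systems would typically rely on dedicated tools rather than plain lists.

```python
from itertools import islice


def chunked(values, chunk_size):
    """Yield successive lists holding at most chunk_size items each."""
    iterator = iter(values)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def mean_by_chunks(values, chunk_size):
    """Volume: compute the mean while holding only one chunk in memory."""
    total = 0.0
    count = 0
    for chunk in chunked(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else float("nan")


def temporal_sample(frames, n):
    """Velocity: keep only one data frame out of every n data frames."""
    return frames[::n]


def spatial_sample(frame, step):
    """Velocity: keep only every step-th value inside a single data frame."""
    return frame[::step]


# Toy data (an assumption for illustration): 9 frames with 6 readings each.
frames = [[t * 10 + i for i in range(6)] for t in range(9)]

overall_mean = mean_by_chunks(range(1, 11), chunk_size=3)
kept_frames = temporal_sample(frames, n=3)
thinned_frame = spatial_sample(frames[0], step=2)
```

Here the mean is accumulated piece by piece, so the full dataset never needs to sit in memory at once, while the two sampling helpers trade some information for a lower processing load.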
3. What is data analytics?
Data analytics can be classified into three categories:
- Descriptive analytics: Digest the dataset with clear visualization and summary
- Predictive analytics: Predict the future behavior of interest
- Prescriptive analytics: Make smart decisions based on the predictive results
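To make the three categories concrete, here is a toy Python sketch that walks a short demand history through each of them. Everything in it is an assumption for illustration: the demand numbers, the straight-line trend used as the predictive model, and the simple reorder rule used as the prescriptive step. It is not meant as a recommended method.

```python
from statistics import mean


def describe(history):
    """Descriptive: summarize what has already happened."""
    return {"mean": mean(history), "min": min(history), "max": max(history)}


def predict_next(history):
    """Predictive: fit a least-squares straight line and extend it one step.

    Needs at least two observations, otherwise the slope is undefined.
    """
    n = len(history)
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(history)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


def prescribe(forecast, stock_on_hand):
    """Prescriptive: decide how many units to order to cover the forecast."""
    return max(0, round(forecast) - stock_on_hand)


# Hypothetical daily demand for one product over five days.
history = [10, 12, 14, 16, 18]

summary = describe(history)
forecast = predict_next(history)
order_quantity = prescribe(forecast, stock_on_hand=5)
```

The point of the sketch is the flow: the summary describes the past, the forecast predicts the next value, and the order quantity is a decision that depends on that prediction.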
Any data analytics effort needs to go through these three steps, from descriptive analytics to prescriptive analytics. To be valid and effective within a company, data analytics needs to involve at least three kinds of people with strong communication and collaboration skills:
- Business experts: who set the business objective and provide domain knowledge
- Information technology experts: who manage the database
- Data analysis experts: who understand data mining, statistics, and operations research (OR) techniques
Here, one interesting question is what the difference is between data mining and data analytics. Generally speaking, I think these two terms differ in purpose. Data analytics is a more objective-oriented process that aims to make smart decisions. In many cases, it already has a clear goal before the data is analyzed. In contrast, data mining focuses on identifying undiscovered patterns and establishing hidden relationships embedded in the dataset. In that case, the decision process is not necessarily emphasized.
4. Why do I care about Big Data analytics?
We are drowning in Big Data, but starving for data analytics techniques! According to CNBC news, data analyst is likely to be the sexiest job of the 21st century. Rob Bearden, CEO of Hortonworks, also said: “The desire on the enterprise side to find truly qualified data scientists has resulted in almost open headcount. It’s probably the biggest imbalance of supply and demand that I’ve ever seen in my career. … The talent pool is, at best, probably 20 percent of the demand.”
We are lucky to be witnesses of the Big Data revolution. I expect that how the world functions will change significantly in the near future because of this enabling technology. Please take a few minutes to watch the following video to better understand this revolutionary opportunity: