Datafloq

What I Always Wanted To Know About Big Data* (*but was afraid to ask)

Ramesh Dontha / 5 min read.
February 20, 2017

When I first heard the term Big Data a few years ago, I didn't think much of it. Soon after, Big Data started coming up in many conversations with my tech friends. So I started asking a very simple question: "What is Big Data?" I put that question to various folks and never got the same answer twice. "Oh, it's a lot of data." "It's a variety of data." "It's how fast the data is piling up." Really? I thought to myself, but I was afraid to ask more questions. As none of it made much sense to me, I decided to dig into it myself. Obviously, my first stop was Google.

When I typed "Big Data" into Google at the time, this is what showed up.

[Image: Big Data]

Ahh, it all made sense right away. None of the people I was talking to really knew much about Big Data; they were talking about it because everyone else was.

In this series of articles on Big Data, my target audience is people who come across the term Big Data but don't live and breathe it on a daily basis in their regular jobs. If you are one of those practitioners making a living off Big Data, these articles may be rudimentary for you. Decide where you are on that spectrum, and then either continue reading or leave now. I'll not be offended.

So what Really is Big Data?

I turned to my trusted old friend Wikipedia and it said:

Big Data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing application.

So Wikipedia's definition focuses on the volume of data and the complexity of processing it. A good start, but it doesn't say what volume threshold makes data Big Data. Is it 100 GB? A petabyte? And what are the "on-hand database management tools"? Do they mean relational database systems from Oracle, IBM, and the like? Most likely, but the definition doesn't say.

Then I turned to O'Reilly Media, which everyone told me was the one that made Big Data popular. As per O'Reilly Media:

Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the structures of your database architectures. To gain value from this data, you must choose an alternative way to process it.

To some extent, the Wikipedia and O'Reilly definitions are similar in that both refer to processing capacity and conventional database systems, but O'Reilly Media adds a new twist by mentioning "too big" and "moves too fast". Hmm, animals like elephants and cheetahs started running through my head.

My next stop was Doug Laney of Gartner, who is credited with the 3 Vs of Big Data. Gartner's definition of Big Data is:

High-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.

Gartner is referring to the size of the data (volume), the speed at which it is being generated (velocity), and the different types of data (variety). This seemed to align with the combined definitions from Wikipedia and O'Reilly Media.

I thought I was getting somewhere.

"Not so fast," said Mike Gualtieri of Forrester, who argued that the 3 Vs mentioned by Gartner are just measures of data and insisted that Forrester's definition is more actionable. That definition is:

Big Data is the frontier of a firm's ability to store, process, and access (SPA) all the data it needs to operate effectively, make decisions, reduce risks, and serve customers.

Let us try to digest this together. Forrester seems to be saying that Big Data is any data beyond a firm's current reach (i.e. its frontier) to store (large volumes of data), process (which needs innovative processing), and access (which needs new ways of accessing that data). So the question is: What is the frontier? Who defines the frontier?

I kept searching for those answers. I looked at McKinsey's definition:


Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.

Well, this is similar to all of the above, but still not specific enough for me to decide when data becomes Big Data.

Then I came across an article from the University of Wisconsin that gave some specificity to the volume. The article said that some have defined big data as an amount that exceeds a petabyte – one million gigabytes. Thank you, Wisconsin, for clearing that up. But the same article also mentioned the numerous Vs that are being added to the 3 Vs that Gartner originally came up with.
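To make that petabyte figure concrete, here is a small arithmetic sketch. It assumes the decimal (SI) units, where a gigabyte is 10^9 bytes and a petabyte is 10^15 bytes; the binary units (gibibyte and pebibyte) are shown only for comparison. The helper name exceeds_petabyte is my own and does not come from the Wisconsin article.

```python
# Decimal (SI) units: the "one million gigabytes" figure uses these.
GIGABYTE = 10**9    # bytes
PETABYTE = 10**15   # bytes

# Binary units, for comparison only.
GIBIBYTE = 2**30    # bytes
PEBIBYTE = 2**50    # bytes


def exceeds_petabyte(size_gb: float) -> bool:
    """Return True if a size given in (decimal) gigabytes is above one petabyte."""
    return size_gb * GIGABYTE > PETABYTE


print(PETABYTE // GIGABYTE)   # gigabytes in a petabyte: 1000000
print(PEBIBYTE // GIBIBYTE)   # gibibytes in a pebibyte: 1048576
print(exceeds_petabyte(100))  # the "Is it 100 GB?" case: False
```

So by this threshold, the 100 GB I wondered about earlier falls short by a factor of ten thousand.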

IBM added Veracity, which refers to the quality of data. See the picture below with the 4 Vs.

[Image: big data infographic]

And other Vs kept getting added to the Big Data definition by other people:

Variability refers to the changing nature of data.

Visualization refers to the art of turning data science into visual stories via graphs and other charts, to transform data into information, insight, knowledge, etc.

Value refers to the fact that businesses need to turn all this data into valuable decisions.

So what did I learn?

Even though there is no single, universally accepted definition of Big Data, there are some common concepts that almost all of the definitions converge on:

  • Big Data is data of large volume (by one definition, more than 1 petabyte).
  • Big Data is data that is not of a single type, i.e. it is a variety of structured data, unstructured data, etc.
  • Big Data is data that is being generated at a much faster rate than in the past, from all kinds of sources including social media.
  • Big Data is data that requires newer ways to store, process, analyze, visualize, and integrate.
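For readers who think in code, the four points above can be sketched as a simple checklist. This is only an illustration under stated assumptions: the DatasetProfile fields, the big_data_signals function, and the 1 TB/day velocity threshold are all hypothetical names and numbers of my own, since none of the definitions above gives a figure for velocity. Only the 1 petabyte volume threshold comes from the article.

```python
from dataclasses import dataclass

PETABYTE = 10**15  # bytes, decimal definition (one million gigabytes)


@dataclass
class DatasetProfile:
    """Hypothetical description of a dataset, for illustration only."""
    size_bytes: int
    data_types: set            # e.g. {"structured", "unstructured"}
    growth_bytes_per_day: int
    fits_existing_tools: bool  # can current tools store, process, and access it?


def big_data_signals(profile, velocity_threshold=10**12):
    """Check a dataset against the four common concepts.

    velocity_threshold (1 TB/day by default) is an assumed placeholder,
    not a figure taken from any of the quoted definitions.
    """
    return {
        "volume": profile.size_bytes > PETABYTE,
        "variety": len(profile.data_types) > 1,
        "velocity": profile.growth_bytes_per_day > velocity_threshold,
        "needs_new_tools": not profile.fits_existing_tools,
    }


clickstream = DatasetProfile(
    size_bytes=2 * PETABYTE,
    data_types={"structured", "unstructured"},
    growth_bytes_per_day=5 * 10**12,
    fits_existing_tools=False,
)
print(big_data_signals(clickstream))
```

The function returns each signal separately instead of a single yes/no answer, because the definitions disagree on which of these criteria matter most.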

Hope this article was helpful. If it was, please Like it here, Share it with your network, and feel free to Follow me. If you want to stay in touch, please Connect here on LinkedIn or follow me on Twitter @rkdontha1.

So what other Big Data questions do you have but are afraid to ask? Go ahead and comment below or send me a note. Look forward to hearing from you!

Sources:

Wikipedia

O’Reilly Media

Gartner’s Doug Laney

Forrester’s Mike Gualtieri

McKinsey definition and University of Wisconsin article

Categories: Big Data
Tags: Big Data, questions, variety, velocity, volume

About Ramesh Dontha

Ramesh Dontha is Managing Partner at Digital Transformation Pro, a management consulting and training organization focusing on Big Data, Data Strategy, Data Analytics, Data Governance/Quality, and related data management practices. For more than 15 years, Ramesh has put together successful strategies and implementation plans to meet or exceed business objectives and deliver business value. His personal passion is to demystify the intricacies of data-related technologies and the latest technology trends and make them applicable to business strategies and objectives. Ramesh can be reached on LinkedIn, on Twitter (@rkdontha1), or via email: rkdontha AT DigitalTransformationPro.com
