Datafloq

Data and Technology Insights

3 Statistics Denial Myths on the Volume of Big Data

Randy Bartlett / 5 min read.
September 9, 2015

“Raw data is a viscous fluid containing information”

Myth #7: Big Data Volume (Or Large N) contains complete information

Myth #8: Big Data Volume (Or Large N) speaks for itself

Myth #9: Big Data Volume (Or Large N) replaces sampling and other statistics, so much information

The first misunderstanding about Big Data is that there is a data rush. To be more accurate, we are in an information rush offering mind-boggling potential. To benefit the most, we need to filter the misleading promotional hype regarding something that has been around since astronomers started mapping the universe: Big Data.

We define Big Data as follows: ‘Big Data is reached at the edge of our capabilities to manage (IT) or analyze (statistics) the Volume, Velocity, and Variety of the data.’ At this point, one of the Vs becomes part of the problem. Other experts, such as Diego Kuonen, include a fourth V, Veracity. The information inside the data has its own Vs, and this information is what matters. In this blog, we debunk three myths about the volume (or large N) aspect of Big Data.

Myth: Big Data Contains Complete Information

According to promotional hype, Large N/Big Data Volume contains complete information. Not so. Many seek the rapture of making definitive proclamations in a deterministic universe. However, uncertainty complicates matters.

Analyzable data has two dimensions: variables and observations. Increasing the number of variables or observations increases the space for strong information. It does not guarantee complete information, or even more information. Even if we have all of the variables and all of the observations, we can expect uncertainty with the numbers.

Four common sources of uncertainty (inferential, missing values, measurement error, and surrogate variables) are explained in the May/June 2015 issue of Analytics Magazine, https://goo.gl/Wod3gk.

The value of Big Data is its information content, which will not be complete.
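
As a rough illustration of why more observations do not remove uncertainty, the sketch below simulates one of the sources named above, measurement error. Everything in it is an assumption made for the example: the instrument, its bias of 0.5 units, the noise level, and the helper names `mean_with_interval` and `simulate` are all hypothetical. It shows that a larger N narrows the interval around the estimate but does nothing about the bias.

```python
import math
import random

def mean_with_interval(values):
    """Return the sample mean and the half-width of an approximate 95% interval."""
    n = len(values)
    mean = math.fsum(values) / n
    variance = math.fsum((x - mean) ** 2 for x in values) / (n - 1)
    half_width = 1.96 * math.sqrt(variance / n)
    return mean, half_width

def simulate(n, true_mean=10.0, bias=0.5, noise_sd=2.0, seed=0):
    """Simulate n readings from a hypothetical instrument that reads `bias` too high."""
    rng = random.Random(seed)
    readings = [rng.gauss(true_mean, noise_sd) + bias for _ in range(n)]
    return mean_with_interval(readings)

# The interval shrinks as n grows, yet it closes in on the biased value,
# not on the true value of 10.0.
for n in (100, 10_000, 200_000):
    mean, half_width = simulate(n)
    print(f"n={n:>7,}  estimate={mean:.3f} +/- {half_width:.3f}  (true value 10.0)")
```

At the largest N the interval is very tight and still excludes the true value. Volume made the answer more precise, not more correct; detecting the bias takes knowledge of how the data were measured.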


Myth: Big Data Speaks For Itself

Promotional hype has proclaimed that Large N (Big Data: Volume) allows data to speak for itself, without the intermediation of a priori assumptions. The hype is that bigness facilitates self-explanatory visualizations, which require no interpretive interventions by experts, whose views, it is claimed, are biased by pre-conceived or overly theoretical notions about how the world works. In fact, we have seen many examples of these visualizations, and they are usually more biased, confusing, misleading, and uninformative than graphs generated in times of old, i.e., BBDH (Before Big Data Hype).

All data requires interpretation, domain knowledge, and an understanding of the underlying assumptions. The value of Big Data is its information content, which does not speak for itself.
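
One well-known way in which data fail to speak for themselves is Simpson's paradox, where a pooled comparison points one way and every subgroup points the other. The sketch below is an illustration only: the counts are invented for the example, and the treatment labels, severity groups, and function names are hypothetical.

```python
# Hypothetical counts, invented for illustration: (successes, trials)
outcomes = {
    ("A", "mild"): (90, 100),
    ("A", "severe"): (300, 500),
    ("B", "mild"): (400, 500),
    ("B", "severe"): (50, 100),
}

def pooled_rate(treatment):
    """Success rate for a treatment, ignoring case severity."""
    wins = sum(s for (t, _), (s, _n) in outcomes.items() if t == treatment)
    total = sum(n for (t, _), (_s, n) in outcomes.items() if t == treatment)
    return wins / total

def stratum_rate(treatment, severity):
    """Success rate for a treatment within one severity group."""
    successes, trials = outcomes[(treatment, severity)]
    return successes / trials

for treatment in ("A", "B"):
    print(treatment, "pooled:", pooled_rate(treatment))
    for severity in ("mild", "severe"):
        print(treatment, severity + ":", stratum_rate(treatment, severity))
```

Pooled, B looks better (0.75 against 0.65). Within both the mild and the severe group, A is better (0.9 against 0.8, and 0.6 against 0.5). The numbers alone do not say which comparison to trust; that call rests on knowing that A was given mostly to severe cases, which is domain knowledge.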

Myth: Big Data Replaces Statistics

During the height of the Big Data hype, some statistics deniers gleefully proclaimed that the advent of Big Data spelled the end of statistics and statisticians, and by implication everyone using statistics to analyze data: econometricians, sociologists, physicists, and other quants. This ‘Amish view’ is that we no longer need statistics because we have Large N/Big Data … and we have them, these ‘Amish visionaries,’ with no expertise in statistics.

We rely upon statistics to address uncertainty with the numbers. More data may contain more information, yet this will not resolve the uncertainty. The first step in analyzing Large N/Big Data is to use statistics (science) to reduce the data without losing information. This is a good trick and it facilitates the next step, using statistics to extract information from the data, as always.

The value of Big Data is its information content, which requires statistics to extract it.
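
The article does not say which reduction technique it has in mind. One textbook reading of "reduce the data without losing information" is the sufficient statistic: under an assumed normal model, the count, sum, and sum of squares carry everything the raw values say about the mean and variance. The sketch below works under that assumption; the simulated data and the helper names `reduce_chunk`, `combine`, and `mean_and_variance` are hypothetical.

```python
import math
import random

def reduce_chunk(chunk):
    """Collapse a chunk of raw values into (count, sum, sum of squares)."""
    return (len(chunk), math.fsum(chunk), math.fsum(x * x for x in chunk))

def combine(a, b):
    """Merge two summaries; the raw values are no longer needed."""
    return tuple(x + y for x, y in zip(a, b))

def mean_and_variance(summary):
    """Recover the sample mean and variance from a three-number summary."""
    n, total, total_sq = summary
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance

# Simulated stand-in for a large dataset, processed in chunks.
rng = random.Random(42)
data = [rng.gauss(50.0, 5.0) for _ in range(100_000)]
chunks = [data[i:i + 10_000] for i in range(0, len(data), 10_000)]

summary = (0, 0.0, 0.0)
for chunk in chunks:
    summary = combine(summary, reduce_chunk(chunk))

mean, variance = mean_and_variance(summary)
print(f"n={summary[0]}  mean={mean:.4f}  variance={variance:.4f}")
```

Here 100,000 values reduce to three numbers that give the same mean and variance as the full data, up to floating-point rounding. The reduction is lossless only relative to the assumed model; checking whether that model fits is itself statistical work. The sum-of-squares formula can also lose precision when the mean is large relative to the spread, so a production version would likely use a more stable update.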

Close

We are in an information rush that has the potential to accelerate almost every aspect of the human endeavor. To benefit the most, we need to filter the misleading promotional hype regarding Big Data.

The value of Big Data is its information content, which will not be complete; does not speak for itself; and requires statistics to extract it. Statistics addresses the uncertainty in all data.

We sure could use Deming, right now. Many of us who embrace the explicit rigorous logic and protocols of these tenets of data analysis hang out in the new LinkedIn group, About Data Analysis. Come see us.


The entire Statistical Denial series can be found on Datafloq.

  • Statistical Denial 1, Blog 1: Essays On Statistics Denial
  • Statistical Denial 2: Statistics Debacles & The Coming Flood Of Statistical Malfeasance
  • Statistical Denial 3: Applied Statistics Is A Way Of Thinking, Not Just A Toolbox
  • Statistical Denial 4: Five Forces Pushing Statistics Expertise Out of Data Analysis
  • Statistical Denial 5, Myth 1: Traditional Techniques Straw Man
  • Statistical Denial 6, Myth 2: Why Statisticians Not Only Practice Within Traditional Statistics
  • Statistical Denial 7, Myth 3: Are Data Mining and Machine Learning Distinct from Statistics?
  • Statistical Denial 8, Myth 4: Why Prediction Is / Is Not Part of Statistics
  • Statistical Denial 9, Myth 5 and 6: Why Statistical Significance Does Work For Big Data
  • Statistical Denial 10, Myths 7, 8 and 9: 3 Statistics Denial Myths on the Volume of Big Data
  • Statistical Denial 11, Myths 10 and 11: Debunking the Myth that None of Statistics Works for Big Data
  • Statistical Denial 12, Myth 12: Publications Straw Man
  • Statistical Denial 13, Myths 13 and 14: Minimizing The Profession

Categories: Technical
Tags: analytics, Big Data, big data analytics, big data news, big data strategy, myths, statistics

About Randy Bartlett

Randy Bartlett, Ph.D. CAP PSTAT is a statistician/statistical data scientist with 20+ years of practice experience analyzing and reviewing data analysis; and leading business analytics teams. He is currently a Business Analytics Leader at Blue Sigma Analytics. He provides services for everything from strategic consulting for business analytics to data analysis and data management. His services are reflected by his book, workshops, and presentations.

He designed 'A Practitioner's Guide to Business Analytics' (McGraw-Hill, 2013) (https://tinyurl.com/jx8rcru) to be the foremost reference on how corporations can better implement business analytics in this era of Big Data and the Internet of Things. He discusses strategic topics, including culture, organization, planning, and leadership for business analytics, in Chapters 1-6 and in Day I of his workshop. For tactics, he discusses Statistical Qualifications, Diagnostics, and Review; and Data Collection, Software, and Management in Chapters 7-12 and during Day II of his workshop. He previously contributed to the Encyclopedia for Research Design and writes blogs (including a series on Statistical Denial), case studies (AIG, AstraZeneca, big pharma, Google Flu Trends, et al.), and articles (two in Analytics Magazine).

Specialties: Leading quants; addressing Big Data; making and supporting analytics-based decisions; performing statistical review; evaluating datasets and software needs; and re-organizing analytics teams.
