Datafloq

Analyzing Big Data Will Require More Statistics

Randy Bartlett / 3 min read.
March 18, 2015

I keep saying that the sexy job in the next 10 years will be statisticians.

Hal Varian, Ph.D., Chief Economist at Google

Big Data (Volume, Velocity, & Variety) represents a paradigm shift in approach and methodology for certain applications, yet not in statistical thinking or statistical assumptions. These, we will need in abundance. First, we will expose the growing myth that Big Data implies complete information. At its best it addresses information gaps; otherwise, Big Data can be redundant. Second, we will clarify the value proposition of statistical tools in making inferences from high-volume data. For many business applications, the foremost challenge with Big Data is reducing it to an analyzable size using statistical techniques that retain the information.

Our global community has low statistical literacy, much lower than its mathematical literacy. Among the business press and Big Data vendors, many portray Big Data as complete information. WRONG. This embodies the resurgence of an ancient data myth: larger datasets and censuses always provide more accurate and reliable results than smaller (statistical) samples. WRONG. Congress still struggles with this one every ten years, when it asks why it is necessary to augment the U.S. Census with a sample.
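The myth can be checked with a small simulation. The following is a minimal sketch, not an analysis of any real census: the population (normally distributed around 50), the cutoff of 45 that stands in for coverage bias, and the sample size of 1,000 are all illustrative assumptions. It compares a very large dataset that systematically misses part of the population with a small simple random sample.

```python
import random
import statistics


def compare_big_biased_vs_small_random(seed=0, population_size=200_000):
    """Compare estimation error of a large biased dataset and a small random sample."""
    rng = random.Random(seed)

    # Hypothetical population: one numeric attribute, true mean near 50.
    population = [rng.gauss(50, 10) for _ in range(population_size)]
    true_mean = statistics.fmean(population)

    # "Big" dataset with coverage bias: values at or below 45 are never recorded.
    big_biased = [x for x in population if x > 45]

    # Small simple random sample drawn from the whole population.
    small_random = rng.sample(population, 1_000)

    return {
        "true_mean": true_mean,
        "big_biased_n": len(big_biased),
        "big_biased_error": abs(statistics.fmean(big_biased) - true_mean),
        "small_random_n": len(small_random),
        "small_random_error": abs(statistics.fmean(small_random) - true_mean),
    }


if __name__ == "__main__":
    for name, value in compare_big_biased_vs_small_random().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the biased dataset is more than a hundred times larger than the random sample, yet its mean lands much further from the truth. Volume shrinks sampling error; it does nothing about bias.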

Solving problems with complete information is appealing to those who want to think only in a deterministic manner and make definitive proclamations. This mindset forgoes accepting uncertainty in the numbers, and with it the benefits of statistics. However, there are four common sources of uncertainty in the numbers. The first comes from using one group of data, such as the past, to infer about another group, such as the future. The second comes from missing observations, and the third comes from measuring the observations (measurement error). The fourth occurs when we lack a variable we need and must make do with one or more surrogate variables. These surrogates do not contain the same information, which creates error. Hence, high volume does not complete the information, so we cannot bypass statistics.
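The fourth source of uncertainty lends itself to a short illustration. The sketch below assumes a hypothetical setup: an outcome that depends on a variable with a true slope of 2.0, and a surrogate that equals the wanted variable plus independent noise. All names and parameter values are invented for the example. It fits an ordinary least-squares slope against the wanted variable and against the surrogate.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def slopes_with_and_without_surrogate(n, seed=0):
    """Return (slope using the wanted variable, slope using a noisy surrogate)."""
    rng = random.Random(seed)
    # Hypothetical wanted variable and an outcome with true slope 2.0.
    wanted = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [2.0 * x + rng.gauss(0, 0.5) for x in wanted]
    # Surrogate: the wanted variable observed with independent noise.
    surrogate = [x + rng.gauss(0, 1) for x in wanted]
    return ols_slope(wanted, outcome), ols_slope(surrogate, outcome)


if __name__ == "__main__":
    for n in (1_000, 100_000):
        true_fit, surrogate_fit = slopes_with_and_without_surrogate(n)
        print(f"n={n}: wanted={true_fit:.3f} surrogate={surrogate_fit:.3f}")
```

With these assumed variances, the slope estimated from the surrogate settles near 1.0 rather than 2.0, and adding observations only makes the wrong answer more precise. This is the sense in which high volume does not complete the information.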

The value proposition of statistics applied to Big Data is extensive, going well beyond the four sources of uncertainty above. To clarify how data analysis will always involve statistical thinking, statistical techniques, and statistical assumptions, let us take a closer look. For all data, quants apply three tool boxes: mathematics, statistics, and algorithms. Mathematical tools, which are coated in logic and wrapped around algorithms, address complete numbers. Statistical tools, coated in logic and mathematics and wrapped around algorithms, address incomplete information. The coatings provide rigor. For complete numbers, we can deduce; for incomplete information, we must infer. Algorithmic tools (logic, heuristics, and optimization) work in both the complete and incomplete domains, yet they work differently. The three tool boxes have strong interdependencies, and we refer to quants as those professionals who employ all three.



We do not want unrefined high-volume data! We want the decision-affecting information that it contains. After reducing the data, we can employ proven mathematical, statistical, and algorithmic tools. Furthermore, the analytics problem is only a part of the business problem to be solved within certain constraints: Timeliness, Client Expectation, Accuracy, Reliability, and Cost. The bulk of Big Data can, in fact, turn out to be an impediment, and this grows more apparent as the analytics problems become more complex. Reducing the variables and observations, while retaining the wanted information, saves Time, improves Accuracy and Reliability, and lowers Costs.
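One way to reduce observations, offered here only as an illustration since the article does not name a specific technique, is reservoir sampling. It draws a uniform random sample of fixed size from a stream in a single pass, without holding the full data in memory. The function name and parameters below are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return a uniform random sample of up to k items from an iterable, in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # replacing a uniformly chosen current member.
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Illustrative stream: the integers 0..199,999 stand in for high-volume records.
    sample = reservoir_sample(range(200_000), k=1_000, seed=1)
    print(len(sample), sum(sample) / len(sample))
```

The reduced sample can then be handed to the usual statistical tools. Whether a simple random sample retains the wanted information depends on the question; rare events, for instance, may call for stratified or weighted sampling instead.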

Conclusion

It is going to take quants to deliver the real promises of Big Data. An understanding of statistics is necessary to assess how to analyze Big Data and to properly lead and organize the analytics resources handling it. As Deming said, "The nonstatistician cannot always recognize a statistical problem when he sees one." We should expect depictions of Big Data that are devoid of any understanding of statistics.

Even if we can get by without addressing statistical errors or reducing the data, we still need statistical thinking, statistical assumptions, and statistical techniques. We need to combine our knowledge about the business problem with a mastery of techniques from all three tool boxes.

Our generation of quants spent our time wanting more data. The next generation will fully experience wanting less.

We sure could use Deming, right now.

Categories: Big Data
Tags: big data analytics, big data quality, decision-makers, insights, statistical, statistics

About Randy Bartlett

Randy Bartlett, Ph.D. CAP PSTAT is a statistician/statistical data scientist with 20+ years of practical experience analyzing data, reviewing data analysis, and leading business analytics teams. He is currently a Business Analytics Leader at Blue Sigma Analytics. He provides services for everything from strategic consulting for business analytics to data analysis and data management. His services are reflected in his book, workshops, and presentations.

He designed 'A Practitioner's Guide to Business Analytics' (McGraw-Hill, 2013) (https://tinyurl.com/jx8rcru) to be the foremost reference on how corporations can better implement business analytics in this era of Big Data and the Internet of Things. He discusses strategic topics, including culture, organization, planning, and leadership for business analytics, in Chapters 1-6 and in Day I of his workshop. For tactics, he discusses Statistical Qualifications, Diagnostics, and Review; and Data Collection, Software, and Management in Chapters 7-12 and during Day II of his workshop. He previously contributed to the Encyclopedia for Research Design and writes blogs (including a series on Statistical Denial), case studies (AIG, AstraZeneca, big pharma, Google Flu Trends, et al.), and articles (two in Analytics Magazine).

Specialties: Leading quants; addressing Big Data; making and supporting analytics-based decisions; performing statistical review; evaluating datasets and software needs; and re-organizing analytics teams.
