Datafloq

Data and Technology Insights


Are Data Mining and Machine Learning Distinct from Statistics?

Randy Bartlett
July 24, 2015

“I almost feel that folks in data science [excluding statisticians] are suddenly realizing that this kind of work is not new and are desperately looking for ways to justify a distinction.” (Thomas Speidel)

“[Machine Learning is simply a] loose confederation of themes in statistical inference (and decision-making).” (Michael Jordan)

“Without a grounding in statistics, a Data Scientist is a Data Lab Assistant.” (Martyn Jones)

Myth #3: Data mining, machine learning, Big Data analysis, business analytics, and data science are distinct from statistics

Repackaging statistics with complementary fields has the potential to create new synergies. For example, econometrics is the marriage of economics and statistics. This repackaging has been extremely successful, surpassing Six Sigma’s mixed results, as we discussed in Blog 2. Econometrics has embraced statistics. Applied econometricians have helped develop best practice, and some identify as applied statisticians. They have stayed with the science.

Straddling Distinct Applications

The interests of the ‘promotional industrial complex’ are to sell things, things like books, magazines, conferences, workshops, new degree programs, software, advertisement space, and newly anointed ‘experts.’  New things sell better.  Promotional interests are not married to protecting the integrity of statistics or best practice. 

The IT part of the promotional industrial complex has started using the terms Data Mining, Machine Learning, and Data Science to include both data analysis AND data management. In the field, we are problem-oriented, not tool-oriented. This makes repackaging data analysis with data management comparable to packaging addition problems with sorting problems and calling it ‘Add-Sort Science.’

In some circles, the point of this repackaging is less about finding synergies and more about expanding IT and giving IT more missions. The next unbelievably bad idea being shopped around is that data analysis is somehow a data management problem. If it were, then we should be better at applying statistics to reporting and data collection.

Data analysis and data management are distinct applications, separated by differences in culture, software, objectives, and thinking. Data management emphasizes efficiency in storing and accessing data, while statistics is about extracting information in the presence of uncertainty.

Any repackaging of data analysis with data management that removes statistics expertise from the data analysis is a bad idea. However, bad ideas can happen and even linger. A popular trend in the 1960s was for corporations to merge into conglomerates, which provided no synergies and made no economic sense: without the mergers, shareholders could invest in each company separately and realize the same return. Even so, these conglomerates persisted for a decade, and such mergers still happen. The merger of data analysis with data management does not have to make sense, and it can last a long time without making sense.



In the case of Machine Learning, academia emphasizes how this set of tools works, namely its facility for iterative learning. In the field, we take a problem-based view. Any Machine Learning tool that solves statistics problems will necessarily make statistical assumptions and require statistical thinking. Such tools are part of statistics, i.e., Statistical Machine Learning. We provide a clarifying problem-based view of statistics in the May/June 2015 issue of Analytics Magazine, https://goo.gl/Wod3gk.

The Venn diagram in Figure 1 illustrates two areas of application for Machine Learning: data analysis and data management. From an applied perspective, there is no overlap.

Figure 1: Two applications of Machine Learning

The same relationship holds for Data Mining and Data Science. In the field, we should be problem-based, and that means splitting these problems:

  • Machine Learning = Statistical ML + IT ML,
  • Data Mining = Statistical DM + IT DM, and
  • Data Science = Statistical DS + IT DS.

What Is Wanted

We want to keep the statistics expertise on the data analysis.  This means embracing specialization, even if this does not help the promotional industrial complex to sell things.  Large corporations need separate teams for data analysis and data management.  Business managers should be playing chess, not checkers. 

Consumers of data analysis should look to statistical certifications, such as the PSTAT, to ensure that Statistical Qualifications are brought to bear.

Close

There is a flood of statistical malfeasance on its way.  Wise consumers of data analysis want to avoid removing statistics expertise from their data analysis. 

Repackaging data analysis/statistics with data management/IT will not provide further synergies in the field.  It will just sell things. 

In the field, we are problem-based. We want to split Data Science, Data Mining, and Machine Learning to match our business problems: Statistical DS, DM, & ML versus IT DS, DM, & ML.

We sure could use Deming right now. Many of us who consume or produce data analysis hang out in the new LinkedIn group About Data Analysis. Come see us.


The entire Statistical Denial series can be found on Datafloq.  

  • Statistical Denial 1, Blog 1: Essays On Statistics Denial
  • Statistical Denial 2: Statistics Debacles & The Coming Flood Of Statistical Malfeasance
  • Statistical Denial 3: Applied Statistics Is A Way Of Thinking, Not Just A Toolbox
  • Statistical Denial 4: Five Forces Pushing Statistics Expertise Out of Data Analysis
  • Statistical Denial 5, Myth 1: Traditional Techniques Straw Man
  • Statistical Denial 6, Myth 2: Why Statisticians Not Only Practice Within Traditional Statistics
  • Statistical Denial 7, Myth 3: Are Data Mining and Machine Learning Distinct from Statistics?
  • Statistical Denial 8, Myth 4: Why Prediction Is / Is Not Part of Statistics
  • Statistical Denial 9, Myth 5 and 6: Why Statistical Significance Does Work For Big Data
  • Statistical Denial 10, Myths 7, 8 and 9: 3 Statistics Denial Myths on the Volume of Big Data
  • Statistical Denial 11, Myths 10 and 11: Debunking the Myth that None of Statistics Works for Big Data
  • Statistical Denial 12, Myth 12: Publications Straw Man
  • Statistical Denial 13, Myths 13 and 14: Minimizing The Profession

Categories: Artificial Intelligence
Tags: applications, Big Data, big data strategy, data mining, machine learning, myths, statistics

About Randy Bartlett

Randy Bartlett, Ph.D., CAP, PSTAT, is a statistician/statistical data scientist with 20+ years of practical experience analyzing data, reviewing data analysis, and leading business analytics teams. He is currently a Business Analytics Leader at Blue Sigma Analytics. He provides services ranging from strategic consulting for business analytics to data analysis and data management. His services are reflected in his book, workshops, and presentations.

He designed 'A Practitioner's Guide to Business Analytics' (McGraw-Hill, 2013) (https://tinyurl.com/jx8rcru) to be the foremost reference on how corporations can better implement business analytics in this era of Big Data and the Internet of Things. He discusses strategic topics, including culture, organization, planning, and leadership for business analytics, in Chapters 1-6 and in Day I of his workshop. For tactics, he discusses Statistical Qualifications, Diagnostics, and Review, and Data Collection, Software, and Management in Chapters 7-12 and during Day II of his workshop. He previously contributed to the Encyclopedia of Research Design and writes blogs (including a series on Statistical Denial), case studies (AIG, AstraZeneca, big pharma, Google Flu Trends, et al.), and articles (two in Analytics Magazine).

Specialties: leading quants; addressing Big Data; making and supporting analytics-based decisions; performing statistical review; evaluating datasets and software needs; and reorganizing analytics teams.
