Datafloq

Data and Technology Insights

Data science and Big Data: Definitions and Common Myths

Francesco Corea / 5 min read.
October 28, 2016

Big data is one of the most common buzzwords of recent years. There are many ways to define it, which is probably why it remains such a difficult concept to grasp.

Some describe big data as a dataset larger than a certain threshold, e.g., over a terabyte (Driscoll, 2010), while others see it as any dataset that crashes conventional analytical tools such as Microsoft Excel. Better-known works instead identify big data as data that display the features of Variety, Velocity, and Volume (Laney, 2001; McAfee and Brynjolfsson, 2012; IBM, 2013; Marr, 2015). All of these definitions are true to some extent, although I think they are incomplete.

The first class of definitions is partial, since it relates to a purely technological issue, i.e., the computational need exceeds the analytical power available in a single tool or machine. It does not explain, however, why big data emerged a few years ago and not back in the Nineties.

The second view is instead too constraining, since it assumes that all the features have to be satisfied before one can talk about big data. It also seems to identify the causes that gave rise to big data (i.e., a huge amount of fast and diverse new data sources) rather than its characteristics.

There are many other definitions that could be used (Dumbill, 2013; De Mauro et al., 2015), but my personal definition is the following: data science is an innovative approach that combines new technologies and processes to extract valuable insights from low-value data that do not fit, for any reason, into conventional database systems (i.e., big data).

Data are quickly becoming a new form of capital, a different coin, and an innovative source of value. It is extremely important to learn how to channel the power of big data into an efficient strategy to manage and grow a business. A well-designed data strategy is becoming fundamental to every business, regardless of the actual size of the datasets used. However, in order to establish a data framework that works, a few misconceptions need to be clarified:

i) More data means higher accuracy

Not all data are good-quality data, and tainting a dataset with dirty data could compromise the final product. It is similar to a blood transfusion: if an incompatible blood type is used, the outcome can be catastrophic for the whole body. Secondly, there is always the risk of overfitting the model to the data without deriving any further insight: "if you torture the data enough, nature will always confess" (Coase, 2012). In all applications of big data, you want to avoid striving for perfection: too many variables increase the complexity of the model without necessarily increasing its accuracy or efficiency.

More data always implies higher costs, and not necessarily higher accuracy. These costs include higher maintenance costs, both for physical storage and for model retention; greater difficulty in making decisions and interpreting the results; and more burdensome data collection and time-opportunity costs. Undoubtedly, the data used do not have to be conventional or used in a standard way (this is where the real gain is locked in), and they may challenge conventional wisdom, but they have to be proven and validated.

In summary, smart data strategies always start by analyzing internal datasets before integrating them with public or external sources. Do not store and process data just for the sake of having them because, with the amount of data being generated daily, the noise increases faster than the signal (Silver, 2013). Pareto's 80/20 rule applies: 80% of the phenomenon can probably be explained by 20% of the data owned.
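The point about dirty data can be illustrated with a small simulation. The sketch below is only a toy example under stated assumptions: the function name `estimation_error`, the true mean of 10, and the bias of 5 added to the "dirty" records are all hypothetical and chosen purely for illustration. It estimates a mean first from a small clean sample and then from a dataset five times larger that has been tainted with biased records.

```python
import random
import statistics


def estimation_error(n_clean, n_dirty, true_mean=10.0, bias=5.0, seed=0):
    """Absolute error of the sample mean when dirty records are mixed in.

    All parameters are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    # Clean records scatter around the true value.
    clean = [rng.gauss(true_mean, 1.0) for _ in range(n_clean)]
    # Dirty records are systematically shifted (e.g., a miscalibrated source).
    dirty = [rng.gauss(true_mean + bias, 1.0) for _ in range(n_dirty)]
    return abs(statistics.fmean(clean + dirty) - true_mean)


if __name__ == "__main__":
    small_clean = estimation_error(n_clean=200, n_dirty=0)
    large_dirty = estimation_error(n_clean=200, n_dirty=800)
    print(f"200 clean records:             error = {small_clean:.3f}")
    print(f"200 clean + 800 dirty records: error = {large_dirty:.3f}")
```

In this setup the larger dataset gives the worse estimate, because the extra records add bias rather than information.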

ii) If you want to do big data, you have to start big

A good practice before investing heavily in technology and infrastructure for big data is to start with a few high-value problems that validate whether big data can be of any value to your organization. Once the proof of concept demonstrates the impact of big data, the process can be scaled up.

iii) Data equals Objectivity

The interpretation of data is the quintessence of its value to business. Ultimately, different types of data could provide different insights to different observers due to relative problem frameworks or interpretation abilities (i.e., the framing effect).

Let's also not forget that people are affected by a wide range of behavioral biases that may invalidate the objectivity of the analysis. The most common ones among both scientists and managers are: apophenia (finding patterns where there are none at all), narrative fallacy (the need to fit a pattern to a series of disconnected facts), confirmation bias (the tendency to use only information that confirms some priors, with the corollary that the search for evidence will eventually end with evidence being discovered), and selection bias (the propensity to always use the same types of data, possibly those that are best known). A final interesting big data curse worth pointing out is becoming known as Hathaway's effect: it appeared that when the actress Anne Hathaway was mentioned positively in the news, the stock price of Warren Buffett's Berkshire Hathaway increased. This suggests that some correlations are either spurious or completely meaningless and groundless.
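How easily such groundless correlations arise can be shown with a minimal sketch. It assumes nothing about the actual Hathaway data: the helper names `pearson` and `strongest_chance_correlation`, the series length of 20, and the 1,000 candidate series are all hypothetical. The code compares one random target series against many unrelated random series and reports the strongest correlation found purely by chance.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_points=20, n_candidates=1000, seed=42):
    """Largest |correlation| between a random target and unrelated random series."""
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        # Each candidate is generated independently of the target.
        candidate = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    print(f"1 candidate:     {strongest_chance_correlation(n_candidates=1):.2f}")
    print(f"1000 candidates: {strongest_chance_correlation(n_candidates=1000):.2f}")
```

The more candidate series are searched, the stronger the best match tends to look, even though every series is pure noise. This is the mechanism behind apophenia when large datasets are mined without a prior hypothesis.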

iv) Your data will reveal the whole truth to you

Data on their own are meaningless if you do not pose the right questions first. To adapt what Deep Thought says in The Hitchhiker's Guide to the Galaxy, big data can provide the final answer to life, the universe, and everything, as soon as the right question is asked. This is where human judgment comes in: posing the right question and interpreting the results are still competences of the human brain.



References:

Coase, R. H. (2012). Essays on economics and economists. University of Chicago Press.

De Mauro, A., Greco, M., & Grimaldi, M. (2015). What is big data? A consensual definition and a review of key research topics. AIP Conference Proceedings, 1644, 97–104.

Driscoll, M. E. (2010). How much data is big data? [Msg 2]. Message posted to https://www.quora.com/How-much-data-is-Big-Data.

Dumbill, E. (2013). Making sense of big data. Big Data, 1(1), 1–2.

IBM. (2013). The Four Vs of Big Data. Retrieved from https://www.ibmbigdatahub.com/infographic/four-vs-big-data.

Laney, D. (2001). 3D Data Management: Controlling Data Volume, Velocity, and Variety.

Marr, B. (2015). Big data: using smart big data, analytics and metrics to make better decisions and improve performance, (p. 256). Wiley.

McAfee, A., & Brynjolfsson, E. (2012). Big data: the management revolution. Harvard Business Review, 90(10), 660.

Silver, N. (2013). The Signal and the Noise: The Art and Science of Prediction. Penguin.


About Francesco Corea

Editor at Cyber Tales. Complexity scientist and data strategist, Francesco is a strong supporter of an interdisciplinary research approach, and he wants to foster the interaction of different sciences in order to bring to light hidden connections. He is a former Anthemis Fellow, IPAM Fellow, and a PhD graduate at LUISS University. His topics of interests are big data and AI, and he focuses on fintech, medtech, and energy verticals.
