Data Quality: You’re Measuring It Wrong

Barr Moses / 4 min read.
July 17, 2020

This article was first posted on Towards Data Science.

One of our customers recently posed this question:

I would like to set up an OKR for ourselves [the data team] around data availability. I’d like to establish a single KPI that would summarize availability, freshness, quality.

What’s the best way to do this?

I can’t tell you how much joy this request brought me. As someone who is obsessed with data availability (yeah, you read that right: instead of sheep, I dream about null values and data freshness these days), this is a dream come true.

Why does this matter?

If you’re in data, you’re either currently working on a data quality project or you just wrapped one up. It’s the law of bad data: there’s always more of it.

Traditional methods of measuring data quality are often time and resource-intensive, spanning several variables, from accuracy (a no-brainer) and completeness, to validity and timeliness (in data, there’s no such thing as being fashionably late). But the good news is there’s a better way to approach data quality.

Data downtime (periods of time when your data is partial, erroneous, missing, or otherwise inaccurate) is an important measurement for any company striving to be data-driven. It might sound cliché, but it’s true: we work hard to collect, track, and use data, but so often we have no idea if the data is actually accurate. In fact, companies frequently end up having excellent data pipelines, but terrible data. So what’s all this hard work to set up a fancy data architecture worth if, at the end of the day, we can’t actually use the data?

Measuring data downtime with this simple formula will help you determine the reliability of your data, giving you the confidence necessary to use it or lose it.

So you want a KPI for it?

Overall, data downtime is a function of:


  • Number of data incidents (N): this factor is not always in your control, given that you rely on data sources external to your team, but it’s certainly a driver of data uptime.
  • Time-to-detection (TTD): in the event of an incident, how quickly are you alerted? In extreme cases, this quantity can be measured in months if you don’t have the proper methods for detection in place. Silent errors made by bad data can result in costly decisions, with repercussions for both your company and your customers.
  • Time-to-resolution (TTR): following a known incident, how quickly were you able to resolve it?

By this method, a data incident refers to a case where a data product (e.g., a Looker report) is incorrect, which could be a result of a number of root causes, including:

  • All/parts of the data are not sufficiently up-to-date
  • All/parts of the data are missing/duplicated
  • Certain fields are missing/incorrect

Here are some examples of things that are not a data incident:

  • A planned schema change that does not break any downstream data
  • A table that stops updating as a result of an intentional change to the data system (deprecation)
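
To make that definition concrete, below is a minimal Python sketch of the kinds of automated checks that could flag the incident types above: stale data, duplicated rows, and missing fields. The table name, column names, and thresholds ("orders", "loaded_at", "order_id", and so on) are illustrative assumptions, not anything prescribed by this article, and timestamps are assumed to be stored as ISO-format text.

import sqlite3
from datetime import datetime, timedelta

def check_freshness(conn, table, ts_column, max_age_hours=24):
    """Incident if the newest row is older than the freshness threshold."""
    (latest,) = conn.execute(f"SELECT MAX({ts_column}) FROM {table}").fetchone()
    if latest is None:
        return True  # no rows at all: treat as an incident
    age = datetime.utcnow() - datetime.fromisoformat(latest)
    return age > timedelta(hours=max_age_hours)

def check_duplicates(conn, table, key_column):
    """Incident if a column that should be unique has duplicate values."""
    (dupes,) = conn.execute(
        f"SELECT COUNT(*) - COUNT(DISTINCT {key_column}) FROM {table}"
    ).fetchone()
    return dupes > 0

def check_null_rate(conn, table, column, max_null_rate=0.01):
    """Incident if too many values in a required field are missing."""
    total, nulls = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) "
        f"FROM {table}"
    ).fetchone()
    return total > 0 and (nulls or 0) / total > max_null_rate

# Hypothetical usage; all names here are made up for the example.
conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id, customer_id, loaded_at)")
incident = (
    check_freshness(conn, "orders", "loaded_at")
    or check_duplicates(conn, "orders", "order_id")
    or check_null_rate(conn, "orders", "customer_id")
)

Each check returns True when it would open a data incident; running checks like these on a schedule is what turns silent errors into a measurable time-to-detection.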

Bringing this all together, I’d propose the right KPI for data downtime is:

Data downtime = Number of data incidents (N) × (Time-to-Detection (TTD) + Time-to-Resolution (TTR))
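
To make the arithmetic concrete, here is a small sketch of how this KPI could be computed from per-incident timestamps. Summing (TTD + TTR) across incidents is equivalent to N × (average TTD + average TTR); the Incident record shape below is an assumption for illustration, not part of the original article.

from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Incident:
    occurred_at: datetime   # when the data actually went bad
    detected_at: datetime   # when the team was alerted
    resolved_at: datetime   # when the data product was fixed

def data_downtime(incidents):
    """Data downtime = sum over incidents of (TTD + TTR),
    i.e. N x (average TTD + average TTR)."""
    total = timedelta()
    for i in incidents:
        ttd = i.detected_at - i.occurred_at   # time-to-detection
        ttr = i.resolved_at - i.detected_at   # time-to-resolution
        total += ttd + ttr
    return total

# Example: one incident caught within an hour, one that sat silent for two days.
incidents = [
    Incident(datetime(2020, 7, 1, 8), datetime(2020, 7, 1, 9),
             datetime(2020, 7, 1, 12)),
    Incident(datetime(2020, 7, 5, 0), datetime(2020, 7, 7, 0),
             datetime(2020, 7, 7, 6)),
]
print(data_downtime(incidents))  # 2 days, 10:00:00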

(If you want to take this KPI a step further, you could also categorize incidents by severity and weight uptime by level of severity, but for simplicity’s sake, we’ll save that for a later post.)
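
(That severity-weighted variant is saved for later, but the gist is a one-line change: scale each incident’s (TTD + TTR) by a severity weight before summing. A minimal sketch, reusing the Incident record above; the severity labels and weights are purely illustrative assumptions.)

# Hypothetical severity weights; both labels and values are made up.
SEVERITY_WEIGHT = {"sev1": 3.0, "sev2": 2.0, "sev3": 1.0}

def weighted_downtime_hours(incidents_with_severity):
    """Sum severity-weighted (TTD + TTR), in hours.
    Expects (Incident, severity_label) pairs."""
    total = 0.0
    for incident, severity in incidents_with_severity:
        ttd = incident.detected_at - incident.occurred_at
        ttr = incident.resolved_at - incident.detected_at
        total += SEVERITY_WEIGHT[severity] * (ttd + ttr).total_seconds() / 3600
    return total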

With the right combination of automation, advanced detection, and seamless resolution, you can minimize data downtime by reducing TTD and TTR. There are even ways to reduce N, which we’ll discuss in future posts (spoiler: it’s about getting the right visibility to prevent data incidents in the first place).

Measuring data downtime is the first step in understanding your data’s quality and, from there, ensuring its reliability. With fancy algorithms and business metrics flying all over the place, it’s easy to overcomplicate how we measure this. Sometimes, the simplest way is the best way.

If you want to learn more, reach out to Barr Moses.


About Barr Moses

CEO and Co-Founder of Monte Carlo Data. Lover of data observability and action movies. #datadowntime
