Data integrity – A concern of data-driven organizations

Javeria Gauhar / 7 min read.
January 15, 2022

In the InsideView Alignment Report 2020, more than 70% of revenue leaders rank data management as the highest priority. Although many organizations have implemented systems for data collection and analysis, their biggest concern remains maintaining the integrity of their data.

The term ‘data integrity’ is sometimes used to describe a process and sometimes a state of data. Either way, it refers to data being accurate, valid, and consistent across all data sources.

In layman’s terms, data integrity means data that your team can trust, can feel assured is protected, and can use for any purpose without worrying about data quality.

These aspects are extremely important, especially for data analysts who integrate data from multiple sources to derive useful insights and retain customers.

Types of data integrity

Data integrity has various aspects, but at a high level, it can be divided into two types: physical and logical. Both types define a number of methods and constraints that enforce integrity in datasets.

Physical data integrity

Physical data integrity relates to protecting data against external or physical calamities, such as power outages, natural disasters, hackers, etc. These problems make it impossible for users to access data from the database, and are usually triggered by human error, storage failures, security breaches, malware, etc.

Logical data integrity

Logical data integrity relates to how the data is stored and modelled within the database, and the logical constraints implemented to keep the data accurate, valid, and consistent across multiple sources.

Logical data integrity is further divided into four types:

Entity integrity

Entity integrity means uniquely identifying each entity in your database. This helps avoid duplicate records, since every new record must have a unique identifier. These identifiers, called primary keys in relational databases, cannot be null and are usually referenced in other datasets to prevent data duplication. For example, in a customer database, the SSN can be used as the unique identifier that ensures the entity integrity of the dataset.

In the absence of uniquely identifying attributes, complex data matching and fuzzy matching algorithms are required to match data accurately and find out which records belong to the same entity.
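A primary key constraint is how a relational database enforces entity integrity. A minimal sketch using Python's built-in sqlite3 module (table and column names are illustrative):

```python
import sqlite3

# The PRIMARY KEY constraint guarantees every customer has a
# unique, non-null identifier; a duplicate SSN is rejected.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (ssn TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customers VALUES ('123-45-6789', 'Alice')")

duplicate_rejected = False
try:
    # Second record with the same SSN violates entity integrity.
    conn.execute("INSERT INTO customers VALUES ('123-45-6789', 'Bob')")
except sqlite3.IntegrityError:
    duplicate_rejected = True

print("Duplicate rejected:", duplicate_rejected)
```

The database refuses the second insert, so only the first record survives.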

Referential integrity

Referential integrity refers to the use of foreign keys in a relational database. Foreign keys are created to refer to an existing entity in another table. Relating records in this way avoids duplicating entries and reuses information from an existing table. For example, an employee database can keep employee information in one table and job role information in another, with a foreign key in the employee table referencing the relevant job role.

An important thing to note here is that a table’s primary keys are unique and non-null, but multiple records can share the same foreign key (as multiple people can have the same job role in the organization).
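The employee/job-role example above can be sketched with a foreign key constraint (table names are illustrative; note that SQLite only enforces foreign keys when the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE roles (role_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("""CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    name TEXT,
    role_id INTEGER REFERENCES roles(role_id))""")

conn.execute("INSERT INTO roles VALUES (1, 'Analyst')")
# Two employees may share the same foreign key (same job role).
conn.execute("INSERT INTO employees VALUES (1, 'Alice', 1)")
conn.execute("INSERT INTO employees VALUES (2, 'Bob', 1)")

orphan_rejected = False
try:
    # Referencing a role that does not exist breaks referential integrity.
    conn.execute("INSERT INTO employees VALUES (3, 'Eve', 99)")
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The two valid inserts succeed, while the record pointing at a non-existent role is rejected.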

Domain integrity

Domain integrity means correct (domain-specific) values are used in each column of the database. For example, in an employee database that stores address information, the column Country can have a list of possible values; any value that does not fall in that list is incorrect and must be updated to a standard format (which can be done through address standardization).
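In a relational database, a list of allowed column values like this is typically enforced with a CHECK constraint. A minimal sketch (the country codes are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint restricts Country to its domain of valid values.
conn.execute("""CREATE TABLE employees (
    emp_id INTEGER PRIMARY KEY,
    country TEXT CHECK (country IN ('US', 'GB', 'DE')))""")

conn.execute("INSERT INTO employees VALUES (1, 'GB')")

out_of_domain_rejected = False
try:
    # 'Germany' is not in the standard format the domain expects.
    conn.execute("INSERT INTO employees VALUES (2, 'Germany')")
except sqlite3.IntegrityError:
    out_of_domain_rejected = True
```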

User-defined integrity

When users define their own custom rules or constraints on a column, it is termed user-defined integrity. For example, if a user defines that the lead source in a prospects database can be Google Adwords, Website, or Cold Call, then any value outside of these three is invalid.
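The same rule can also live in application code rather than the database schema. A minimal sketch of a user-defined validator for the lead source example:

```python
# User-defined rule: the lead source must be one of three known channels.
ALLOWED_LEAD_SOURCES = {"Google Adwords", "Website", "Cold Call"}

def validate_lead_source(value: str) -> bool:
    """Return True only if the value satisfies the custom constraint."""
    return value in ALLOWED_LEAD_SOURCES

print(validate_lead_source("Website"))    # True
print(validate_lead_source("Billboard"))  # False
```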

Threats to data integrity

The terms data integrity and data security are often used interchangeably, but they are not the same. Data security measures are performed to attain data integrity. Maintaining data integrity is a complex task, and data security is only one way to achieve it, since a number of things pose threats to data integrity. A few of them are highlighted below:

Human error

Roughly 400 of every 10,000 entries in a database are inaccurate due to human error. This is a significant number, and unique identifiers, integrity constraints, and other validation checks can all be undermined by human mistakes.



Inconsistencies in data formats

Without proper data formats and types defined, values within the same column are stored using different patterns and formats, which leads to inconsistencies in the database. To prevent such inconsistencies, it is important to define validation patterns and the correct data types.
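A validation pattern can be as simple as a regular expression that every value in a column must satisfy before being stored. A minimal sketch, assuming the agreed format is ISO dates:

```python
import re

# Agreed-upon pattern for the column: ISO dates (YYYY-MM-DD).
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")

values = ["2022-01-15", "15/01/2022", "2022-03-01"]
# Flag values stored in a different format before they enter the database.
invalid = [v for v in values if not DATE_PATTERN.match(v)]
print(invalid)  # ['15/01/2022']
```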

Integration error

While integrating data from multiple sources, data integrity is often compromised. The reason is the difference in data structures, validation checks, and integrity constraints across sources. One source may save Phone Number as a char data type with a maximum length of 15, while another saves it as a number data type with a maximum length of 13.
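One common remedy is to normalize such values to a single canonical form during integration. A minimal sketch for the phone-number example (the canonical form chosen here, a bare digit string, is an assumption):

```python
def normalize_phone(raw) -> str:
    """Reduce a phone number from any source to a canonical digit string.
    Handles both char-style values (with separators) and numeric values."""
    return "".join(ch for ch in str(raw) if ch.isdigit())

source_a = "+1 (555) 010-4477"   # char-type value with separators
source_b = 15550104477           # number-type value from another source
print(normalize_phone(source_a) == normalize_phone(source_b))  # True
```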

Internal privacy breaches

This usually happens when your data lands in the wrong hands, whether an employee misusing the company’s data repository or hackers trying to break through your firewall to get to the data. In either case, securing the data from such privacy breaches is an important task.

Signs of data integrity

To understand whether your data has integrity, you need to look for the following signs:

Accessibility

Is your data present in the right place and accessible whenever needed? If there is no proper or easy access to your data, then your data might be at risk of losing its integrity. Fast, optimized retrieval of data from the database is a key sign that the data’s integrity is being maintained.

Validity

Do the values of a column in your dataset have the same data type and format? Validity is easily assessed by noticing how many values in your database do not conform to the appropriate validation checks, for example, a record’s creation date having the value 21.21.21. The day and year 21 make sense, but the month is invalid.
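A date like 21.21.21 is caught by parsing rather than pattern matching, since the pattern is right but the value is not. A minimal sketch, assuming a DD.MM.YY column format:

```python
from datetime import datetime

def is_valid_date(value: str) -> bool:
    """Parse a DD.MM.YY creation date; parsing fails for impossible dates."""
    try:
        datetime.strptime(value, "%d.%m.%y")
        return True
    except ValueError:
        return False

print(is_valid_date("21.12.21"))  # True
print(is_valid_date("21.21.21"))  # False: there is no 21st month
```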

Completeness

Does your database contain a lot of null values? If your dataset does not have a record of certain values, it is better to choose a generic non-null term (such as Not provided or N/A) rather than leaving the column values empty. This will help you understand whether the values are missing, incomplete, or were deliberately left empty.
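Replacing empty values with an explicit sentinel can be done in one pass over the records. A minimal sketch (the record shape and the N/A sentinel are illustrative):

```python
records = [
    {"name": "Alice", "phone": "555-0147"},
    {"name": "Bob", "phone": None},        # value missing
]

# Replace empty values with an explicit sentinel so missing
# data is visible rather than silently null.
for rec in records:
    for key, value in rec.items():
        if value is None or value == "":
            rec[key] = "N/A"

print(records[1]["phone"])  # N/A
```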

Uniqueness

Do your dataset records uniquely identify entities? This is seen by assessing whether all records in the database reflect a unique identity and one entity’s information does not span multiple records. If your dataset contains duplicates, you will need to employ data matching algorithms to identify which records belong to the same entity. If duplicates are non-exact, you may require a combination of fuzzy matching algorithms to compute match confidence levels and make a decision.
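A rough sense of how a match confidence level is computed can be sketched with the standard library's difflib; real fuzzy matching combines several such measures, and the 0.85 decision threshold here is an assumption:

```python
from difflib import SequenceMatcher

def match_confidence(a: str, b: str) -> float:
    """Similarity score between two name strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# A typo-level difference still yields a high confidence score.
score = match_confidence("Jonathan Smith", "Jonthan Smith")
is_same_entity = score >= 0.85  # hypothetical decision threshold
print(is_same_entity)
```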

Scenarios where data integrity becomes crucial

Now that we have covered the basics of data integrity, let’s discuss the real-world scenarios where data integrity plays a significant role.

Masking personally identifiable information

A common practice to hide personally identifiable information is to mask actual data with dummy data. This process is used extensively in healthcare and other government institutions to protect individual privacy. If data integrity is not maintained across the dataset, it can be very difficult to recover the actual data from the dummy data, since the original data was inaccurate to begin with.

Ensuring compliance with data standards

Compliance standards, such as HIPAA, GDPR, etc., state the importance of data integrity. For example, GDPR Article 5(1) states that personal data should be:

Accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay;

This clearly shows how crucial maintaining data integrity in your database is for complying with the necessary standards.

Driving business intelligence

Reliable data insights are the biggest benefit of capturing data and keeping it clean and standardized. Data analysts spend 80% of their time managing data and correcting mistakes, and only 20% actually analyzing and deriving insights from it. Organizations are increasingly employing systems that process their data and produce descriptive insights, but they still find it difficult to trust these insights. The reason goes back to how the data is captured, structured, and related across tables in the database.

Conclusion: data integrity produces reliable insights

In this article, we covered basic and advanced aspects of data integrity and mentioned a few scenarios where data integrity becomes crucial. Although sustaining the integrity of your data seems like a resource- and time-intensive initiative, it saves you time in the long run, as your data-driven insights become more reliable, accurate, and actionable.

Categories: Big Data
Tags: Big Data, data-driven

About Javeria Gauhar

Javeria Gauhar is an experienced B2B/SaaS writer specializing in writing for Data Ladder. She is also a programmer with 2 years of experience in developing, testing, and maintaining enterprise software applications.
