Datafloq

6 Reasons We Need Open and Vendor-Neutral Metadata Services

Ben Lorica / 3 min read.
October 19, 2015

As I spoke with friends leading up to Strata + Hadoop World NYC 2015, one topic continued to come up: metadata. It's a topic that data engineers and data management researchers have long thought about because it has significant effects on the systems they maintain and the services they offer. I've also been having more and more conversations about applications made possible by metadata collection and analysis.

At the recent Strata + Hadoop World, U.C. Berkeley professor and Trifacta co-founder Joe Hellerstein outlined the reasons why the broader data industry should rally to develop open and vendor-neutral metadata services. He made the case that improvements in metadata collection and sharing can lead to interesting applications and capabilities within the industry.

Below are some of the reasons why Hellerstein believes the data industry should start focusing more on metadata:

Improved Data Analysis: Metadata-on-Use

"You will never know your data better than when you are wrangling and analyzing it." - Joe Hellerstein

A few years ago, I observed that context-switching due to using multiple frameworks created a lag in productivity. Today's tools have improved to the point that someone using a single framework like Apache Spark can get many of their data tasks done without having to employ other programming environments. But outside of tracking in detail the actions and choices analysts make, as well as the rationales behind them, today's tools still do a poor job of capturing how people interact and work with data.

Enhanced Interoperability: Metadata-on-Use

If you've read the recent O'Reilly report Mapping Big Data or played with the accompanying demo, then you've seen the breadth of tools and platforms that data professionals have to contend with. Recreating a complex data pipeline means knowing the details (e.g., version, configuration parameters) of each component involved in a project. With a view to reproducibility, metadata in a persistent (stored) protocol that cuts across vendors and frameworks would come in handy.
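
To make the idea of a stored, vendor-neutral record more concrete, here is a minimal Python sketch. It is not a format proposed by Hellerstein or by any vendor: the class names (ComponentRecord, PipelineRecord), the field names, and the sample values are all hypothetical, and JSON is assumed only because most frameworks can read it.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ComponentRecord:
    """Hypothetical record of one tool used in a pipeline."""
    name: str
    version: str
    config: dict = field(default_factory=dict)


@dataclass
class PipelineRecord:
    """Hypothetical record of a whole pipeline and its components."""
    pipeline: str
    components: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() converts nested dataclasses to plain dicts, so the
        # result is ordinary JSON that any framework can parse.
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    @staticmethod
    def from_json(text: str) -> "PipelineRecord":
        raw = json.loads(text)
        return PipelineRecord(
            pipeline=raw["pipeline"],
            components=[ComponentRecord(**c) for c in raw["components"]],
        )


# Illustrative values only; they do not describe a real project.
record = PipelineRecord(
    pipeline="clickstream-sessions",
    components=[
        ComponentRecord("spark", "1.5.1", {"spark.executor.memory": "4g"}),
        ComponentRecord("kafka", "0.8.2", {"num.partitions": 8}),
    ],
)
restored = PipelineRecord.from_json(record.to_json())
```

Because the record round-trips through plain JSON, a different tool could read the same file and learn which versions and parameters it would need to recreate the pipeline.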


Comprehensive Interpretation of Results

Behind every report and model (whether physical or quantitative) are assumptions, code, and parameters. The types of models used in a project determine what data will be gathered and, conversely, models depend heavily on the data that is used to build them. So, proper interpretation of results needs to be accompanied by metadata that focuses on factors that inform data collection and model building.

Reproducibility

As I noted above, the settings (version, configuration parameters) of each tool involved in a project are essential to the reproducibility of complex data pipelines. This usually means only documenting projects that yield a desired outcome. Using scientific research as an example, Hellerstein noted that having a comprehensive picture is often just as important. This entails gathering metadata for settings and actions in projects that succeeded as well as projects that failed.
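
One way to picture "metadata for projects that failed" is a run log that stores the settings of every attempt, whatever its outcome. The Python sketch below assumes a hypothetical RunRecord structure and record_run helper; neither comes from the talk or from an existing library.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """Hypothetical log entry: the settings used and how the run ended."""
    run_id: str
    settings: dict
    succeeded: bool
    note: str = ""


def record_run(log, run_id, settings, step):
    """Run `step(settings)` and append a record whether or not it fails."""
    try:
        step(settings)
        rec = RunRecord(run_id, dict(settings), True)
    except Exception as exc:
        # Failed runs are kept, with the error text as a note.
        rec = RunRecord(run_id, dict(settings), False, note=repr(exc))
    log.append(rec)
    return rec


def toy_step(settings):
    # Stand-in for a real pipeline stage; rejects a bad parameter.
    if settings["sample_rate"] <= 0:
        raise ValueError("sample_rate must be positive")


log = []
record_run(log, "run-1", {"sample_rate": 0.1}, toy_step)
record_run(log, "run-2", {"sample_rate": 0.0}, toy_step)
failed = [r.run_id for r in log if not r.succeeded]
```

Here the failed attempt stays in the log next to the successful one, so a later reader can see which settings were tried and rejected.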

Data Governance Policies By the People, For the People

Governance usually refers to policies that govern important items including the access, availability, and security of data. Rather than adhering to policies that are dictated from above, metadata can be used to develop a governance policy that is based on consensus and collective intelligence. A sandbox where users can explore and annotate data could be used to develop a governance policy that is fueled by observing, learning, and iterating.

Time Travel and Simulations

Comprehensive metadata services lead to capabilities that many organizations aspire to have: the ability to quickly reproduce data pipelines opens the door to what-if scenarios. If the right metadata is collected and stored, then models and simulations can fill in any gaps where data was not captured, perform realistic recreations, and even conduct alternate histories (recreations that use different settings).
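
As a rough illustration of an "alternate history", the Python sketch below replays a recorded run with some settings overridden. The replay function, the toy model, and the numbers are hypothetical and stand in for a real pipeline and its stored metadata.

```python
from typing import Any, Callable, Dict, Optional


def replay(
    recorded: Dict[str, Any],
    run: Callable[[Dict[str, Any]], Any],
    overrides: Optional[Dict[str, Any]] = None,
) -> Any:
    """Re-run `run` with the recorded settings, optionally changing some."""
    # Build a new dict so the stored record itself is never modified.
    settings = {**recorded, **(overrides or {})}
    return run(settings)


def toy_model(settings):
    # Stand-in for a real pipeline or simulation.
    return settings["base_demand"] * (1 + settings["growth"])


recorded = {"base_demand": 100, "growth": 0.5}
original = replay(recorded, toy_model)                      # recreation
alternate = replay(recorded, toy_model, {"growth": 0.25})   # what-if
```

The recreation and the what-if differ only in the overridden setting, which is only possible because the original settings were stored in the first place.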

Public domain image on article and category pages by NASA on Wikimedia Commons.

Categories: Big Data
Tags: analysis, Big Data, Metadata, policies, results, simulation

About Ben Lorica

Ben Lorica is the Chief Data Scientist and Director of Content Strategy for Data at O'Reilly Media. He has applied Business Intelligence, Data Mining, Machine Learning and Statistical Analysis in a variety of settings including Direct Marketing, Consumer and Market Research, Targeted Advertising, Text Mining, and Financial Engineering. His background includes stints with an investment management company, internet startups, and financial services.
