Consider this: Aligning Big Data

Martyn Jones / 8 min read.
January 14, 2015

In order to bring some semblance of simplicity, coherence and integrity to the Big Data debate, I am sharing an evolving model for pervasive information architecture and management.

This is an overview of the realignment and placement of Big Data into a more generalised architectural framework, an architecture that integrates data warehousing (DW 2.0), business intelligence and statistical analysis.

The model is currently referred to as the DW 3.0 Information Supply Framework, or DW 3.0 for short.

A Recap

In a previous piece, Data Made Simple, Even Big Data, I looked at three broad-brush classes of data: Enterprise Operational Data, Enterprise Process Data and Enterprise Information Data. The following diagram is taken from that piece:

Fig. 1 Data Made Simple

In simple terms, the three classes of data can be defined as follows:

Enterprise Operational Data: This is data used in applications that support the day-to-day running of an organisation's operations.

Enterprise Process Data: This is measurement and management data collected to show how the operational systems are performing.

Enterprise Information Data: This is primarily data collected from internal and external data sources, the most significant source typically being Enterprise Operational Data.

These three classes form the underlying basis of DW 3.0.

The Overall View

The following diagram illustrates the overall framework:

Fig. 2 DW 3.0 Information Supply Framework

There are three main elements within this diagram: Data Sources, Core Data Warehousing (the Inmon architecture and process model) and Core Statistics.

Data Sources: This element covers all the current sources, varieties and volumes of data available that may be used to support the processes of challenge identification, option definition and decision making, including statistical analysis and scenario generation.

Core Data Warehousing: This is a suggested evolution path for the DW 2.0 model. It faithfully extends the Inmon paradigm to include not only unstructured and complex data but also the information and outcomes derived from statistical analysis performed outside the Core Data Warehousing landscape.

Core Statistics: This element covers the core body of statistical competence, especially, but not only, with regard to evolving data volumes, data velocity, data quality and data variety.

The focus of this piece is on the Core Statistics element. Mention will also be made of how the three elements provide useful synergies.

Core Statistics

The following diagram focuses on the Core Statistics element of the model:

Fig. 3 DW 3.0 Core Statistics

What this diagram seeks to illustrate is the flow of data and information through the process of data acquisition, statistical analysis and outcome integration.

What this model also introduces is the concept of the Analytics Data Store. This is arguably the most important aspect of this architectural element.

Data Sources

For the sake of simplicity there are three explicitly named data sources in the diagram (of course there can be more, and the Enterprise Data Warehouse or its dependent Data Marts may also act as data sources), but for the purpose of this blog piece I have limited the number to three: Complex data, Event data and Infrastructure data.

Complex Data: This is unstructured data, or data with a highly complex structure, contained in documents and other complex data artefacts, such as multimedia documents.

Event Data: This is an aspect of Enterprise Process Data, typically at a fine-grained level of abstraction. Here are the business process logs, the internet web activity logs and other similar sources of event data. The volumes generated by these sources tend to be higher than those of other data, and they are the volumes currently associated with the term Big Data, covering as it does the masses of information generated by tracking even the most minor piece of behavioural data from, for example, someone casually surfing a web site.

Infrastructure Data: This aspect includes data which could well be described as signal data: continuous, high-velocity streams of potentially highly volatile data that might be processed through complex event correlation and analysis components.
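
As a rough illustration of the kind of reduction such a component might perform, here is a minimal sketch in Python of collapsing a continuous signal stream into per-sensor, per-window aggregates before anything is shipped further. The field names (sensor_id, timestamp, value) and the window size are illustrative assumptions, not part of the model itself.

from collections import defaultdict

# Minimal sketch of windowed reduction of a high-velocity signal stream.
# Field names (sensor_id, timestamp, value) and the window size are
# illustrative assumptions, not part of the DW 3.0 model.

WINDOW_SECONDS = 60

def reduce_stream(events):
    """Collapse raw signal events into one aggregate per sensor per window."""
    windows = defaultdict(lambda: {"count": 0, "total": 0.0, "peak": float("-inf")})
    for event in events:
        window_start = int(event["timestamp"]) // WINDOW_SECONDS * WINDOW_SECONDS
        key = (event["sensor_id"], window_start)
        agg = windows[key]
        agg["count"] += 1
        agg["total"] += event["value"]
        agg["peak"] = max(agg["peak"], event["value"])
    for (sensor_id, window_start), agg in sorted(windows.items()):
        yield {
            "sensor_id": sensor_id,
            "window_start": window_start,
            "count": agg["count"],
            "mean": agg["total"] / agg["count"],
            "peak": agg["peak"],
        }

if __name__ == "__main__":
    raw = [
        {"sensor_id": "pump-1", "timestamp": 1000, "value": 3.2},
        {"sensor_id": "pump-1", "timestamp": 1010, "value": 3.9},
        {"sensor_id": "pump-2", "timestamp": 1015, "value": 0.4},
    ]
    for summary in reduce_stream(raw):
        print(summary)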

The Revolution Starts Here

Here I will backtrack slightly to highlight some guiding principles behind this architectural element.


Without a business imperative there is no business reason to do it: What does this mean? Well, it means that for every significant action or initiative, even a highly speculative initiative, there must be a tangible and credible business imperative to support that initiative. The difference is as clear as that found between the Sage of Omaha and Santa Claus.

All architectural decisions are based on a full and deep understanding of what needs to be achieved and of all of the available options: For example, rejecting the use of a high-performance database management product must be done for sound reasons, even if that sound reason is cost. It should not be based on technical opinions such as "I don't like the vendor much". If a flavour of Hadoop makes absolute sense then use it; if Exasol or Oracle or Teradata makes sense, then use them. You have to be technology agnostic, but not a dogmatic technology fundamentalist.

That statistics and non-traditional data sources are fully integrated into future Data Warehousing landscape architectures: Building even more corporate silos, whether through action or omission, will lead to greater inefficiencies, greater misunderstanding and greater risk.

The architecture must be coherent, usable and cost-effective: If not, what's the point, right?

That no technology, technique or method is discounted: We need to be able to cost-effectively incorporate any relevant existing or emerging technology into the architectural landscape.

Reduce early and reduce often: Massive volumes of data, especially at high speed, are problematic. Reducing those volumes, even if we cannot theoretically reduce the speed, is absolutely essential. I will elaborate on this point and the following one separately.

That only the data that is required is sourced, and that only the data that is required is forwarded: Again, this harks back to the need for clear business imperatives, tied to the good sense of only shipping data that needs to be shipped.

Reduce Early, Reduce Often

Here I expand on the theme of early data filtering, reduction and aggregation. We may be generating increasingly massive amounts of data, but that doesn't mean we need to hoard all of it in order to get some value from it.

In simplistic data terms this is about putting the initial ET in ETL (Extract and Transform) as close to the data generators as possible. It's the concept of the database adapter, but in reverse.
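
To make that idea a little more concrete, here is a minimal sketch of what such a generator-side adapter might do, assuming a hypothetical raw web-event record: the initial Extract and Transform runs next to the source, and only the fields that are actually needed are forwarded. The field names and the junk rule are assumptions for illustration only.

from typing import Optional

# Minimal sketch of a generator-side "database adapter in reverse": the
# initial Extract and Transform runs next to the data generator, so only
# the fields that are actually needed ever leave the source system.
# The raw record layout and field names are illustrative assumptions.

NEEDED_FIELDS = ("session_id", "page", "timestamp")

def extract_transform(raw_record: dict) -> Optional[dict]:
    """Return a compact record for shipping, or None if it adds no value."""
    # Extract: keep only the fields the downstream analysis actually uses.
    record = {field: raw_record.get(field) for field in NEEDED_FIELDS}
    if not record["session_id"]:
        return None  # events with no session context are dropped at source
    # Transform: normalise values before the record is forwarded.
    record["page"] = (record["page"] or "").lower().rstrip("/")
    return record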

Let's look at a scenario.

A corporation wants to carry out some speculative analysis on the many terabytes of internet web-site activity log data being generated and collected every minute of every day.

They are shipping massive log files to a distributed platform on which they can run data mapping and reduction.

Then they can analyse the resulting data.

The problem they have, as with many web sites that were developed by hackers, designers and stylists, and not by engineers, architects and database experts, is that they are lumbered with humungous and unwieldy artefacts such as massive log files of verbose, obtuse and zero-value-adding data.

What do we need to do to remove this challenge?

We need to rethink internet logging and then we need to redesign it.

  • We need to be able to tokenise log data in order to reduce the massive data footprint created by badly designed and verbose data (a minimal sketch of this follows the list).
  • We need the dual option of being able to continuously send data to an Event Appliance that can be used to reduce data volumes on an event-by-event and session-by-session basis.
  • If we must use log files, then many small log files are preferable to fewer massive log files, and more log cycles are preferable to fewer log cycles. We must also maximise the benefits of parallel logging. Time-bound and volume-bound session logs are also worth considering, and in more depth.
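
The tokenisation point above can be sketched very simply: verbose, endlessly repeated strings such as user agents, URLs and referrers are replaced by short integer tokens, and the token dictionary is shipped once rather than on every log line. The field names are illustrative assumptions.

# Minimal sketch of log tokenisation: verbose, highly repetitive strings
# (user agents, URLs, referrers) are replaced by short integer tokens,
# and the dictionary is shipped once instead of on every log line.
# The field names are illustrative assumptions.

class Tokeniser:
    def __init__(self):
        self.dictionary = {}  # verbose string -> integer token

    def token_for(self, value: str) -> int:
        if value not in self.dictionary:
            self.dictionary[value] = len(self.dictionary)
        return self.dictionary[value]

def tokenise_entry(entry: dict, tokeniser: Tokeniser,
                   verbose_fields=("user_agent", "url", "referrer")) -> dict:
    """Return a copy of the log entry with verbose fields replaced by tokens."""
    compact = dict(entry)
    for field in verbose_fields:
        if field in compact:
            compact[field] = tokeniser.token_for(compact[field])
    return compact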

So now we are getting log data to the point of use either via log files, including log files produced by an Event Appliance (as part of a toolkit of Analytic Data Harvesting Adapters), or via messages sent by that appliance to a reception point.

Once that data has been transmitted (by conventional file transfer/sharing or messaging) we can then move to the next step: ET(A)L, Extract, Transform, Analyse and Load.

For log files we would typically employ ET(A)L, but for messages we do not, of course, need the E, the extract, as this is a direct connection.

Again, ET(A)L is another form of reduction, which is why the analysis aspect is included: to ensure that the data that gets through is the data that is needed, and that junk with no recognisable value gets cleaned out early and often.
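
A minimal sketch of an ET(A)L job stream for log files follows, with the Analyse step acting as the junk filter described above. The log layout, the junk rule and the load target are assumptions for illustration; a real job stream would plug in whatever reduction and enrichment rules the business imperative calls for.

import csv

# Minimal sketch of an ET(A)L job for log files: Extract, Transform,
# Analyse (filter out zero-value records) and Load. The log layout,
# the junk rule and the load target are illustrative assumptions.

JUNK_PATHS = {"/healthcheck", "/favicon.ico"}

def extract(path):
    """Extract: read raw log rows from a delimited log file."""
    with open(path, newline="") as handle:
        yield from csv.DictReader(handle)

def transform(row):
    """Transform: normalise only the fields the analysis actually needs."""
    return {
        "session_id": row.get("session_id", "").strip(),
        "path": row.get("path", "").lower(),
        "status": int(row.get("status", 0) or 0),
    }

def analyse(record):
    """Analyse: keep only records that carry recognisable value."""
    return bool(record["session_id"]) and record["path"] not in JUNK_PATHS

def load(records, target):
    """Load: hand the surviving records to the Analytics Data Store loader."""
    for record in records:
        target(record)

def etal_job(path, target):
    load((r for r in map(transform, extract(path)) if analyse(r)), target)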

The Analytics Data Store

The ADS (which can be a distributed data store on a Cloud somewhere) supports the data requirements of statistical analysis. Here the data is organised, structured, integrated and enriched to meet the ongoing and occasionally volatile needs of the statisticians and data scientists focusing on data mining. Data in the ADS can be accumulative or completely refreshed. It can have a short life span or a significantly long one.

The ADS is the logistics centre for analytics data. Not only can it be used to feed data into the statistical analysis process, but it can also provide persistent long-term storage for analysis outcomes and scenarios, and for future analysis, hence the ability to write back.

The data and information in the ADS may also be augmented with data derived from the data warehouse, and the ADS may also benefit from having its own dedicated Data Mart specifically designed for this purpose.

The results of statistical analysis on the ADS data may also provide feedback used to tune the data reduction, filtering and enrichment rules further upstream, whether in smart data analytics adapters, complex event and discrimination adapters, or in ET(A)L job streams.
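
As a rough sketch of the write-back idea, using SQLite purely as a stand-in for whatever platform actually hosts the ADS: analysis outcomes are persisted alongside the analysis data so that they remain available for future analysis and for tuning the upstream rules. The table and column names, and the example outcome values, are assumptions for illustration.

import json
import sqlite3

# Rough sketch of ADS write-back, using SQLite purely as a stand-in for
# whatever platform hosts the Analytics Data Store. Table and column
# names, and the example outcome values, are illustrative assumptions.

def write_back(conn: sqlite3.Connection, run_id: str, outcome: dict) -> None:
    """Persist an analysis outcome so it can feed future analysis and the
    tuning of upstream reduction, filtering and enrichment rules."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analysis_outcomes ("
        "run_id TEXT, outcome_json TEXT)"
    )
    conn.execute(
        "INSERT INTO analysis_outcomes (run_id, outcome_json) VALUES (?, ?)",
        (run_id, json.dumps(outcome)),
    )
    conn.commit()

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    write_back(connection, "web-behaviour-2015-01",
               {"segments_found": 4, "note": "illustrative outcome only"})
    print(connection.execute("SELECT * FROM analysis_outcomes").fetchall())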

That's all, folks.

This has necessarily been a very brief and high-level view of what I currently label DW 3.0.

The model doesn't seek to define statistics or how statistical analysis is to be applied, which has been done more than adequately elsewhere, but to show how statistics can be accommodated in an extended DW 2.0 architecture, without the need to come up with almost reactionary and ill-fitting solutions to problems that can be solved in better and more effective ways through good sense, sound engineering principles and the judicious application of appropriate methods, technologies and techniques.

If you have questions or suggestions or observations regarding this framework, then please feel free to contact me either here or via LinkedIn mail.

Categories: Big Data
Tags: Big Data

About Martyn Jones

Martyn's range of knowledge, skills and experience spans executive management, organisational strategy, strategic business performance and information management, leadership, business analysis, business and data architectures, data management, and executive and team coaching.

Martyn has worked with and advised many of the world's best-known organisations including Adidas, Banco Santander, Bank of China, BBVA, Boston Consulting Group, British Telecom, La Caixa, Central Statistical Office (UK), Central Statistical Office of Poland, Citco, Citigroup, Credit Suisse, E.On, Eroski, European Union, Fnac, France Telecom, Hewlett Packard, Iberdrola, IBM, Iberia, Infineon, Metropolitan Police, Movistar, NCR, National Health Service (UK), Office of the Governor - State of California, Oracle, The Home Office (UK), Rolls-Royce Marine Power Operations, the Royal Navy, Shell, Swiss Life, TSB, UBS, Unisys, the United Nations and Xerox, among many others.

He currently focuses on helping clients to:

  • Create relevant, understandable and actionable information
  • Plan, manage, design, develop and deliver information supply frameworks for the timely, appropriate and adequate supply of information
  • Design, develop and deliver beneficial, tangible and usable strategic performance and information frameworks
  • Design, develop and deliver relevant and coherent performance models, indicators and metrics
  • Plan, manage, design, develop and deliver information and data analytic strategies
  • Design, develop and deliver management informational insight and dynamic feedback solutions
  • Coach teams in measuring and managing performance
  • Align people, competencies, processes and practices with strategy
  • Prepare clients for the next big thing in Information Management and Analytics
  • Help IT suppliers to better align with the needs and nature of clients and prospects
  • Help clients capitalise on tangible benefits derived from advanced information architectures and management
