Real-time stream processing: Are we doing it wrong?

David Sangma / 5 min read.
August 29, 2017

Humans to machines: the shift in data sources

Data has been growing exponentially. More data now streams through the wire than we can afford to keep on disk, from both a value and a volume perspective. This data is created by everything we deal with on a daily basis. When humans were the dominant creators of data, there was naturally less of it to handle, and its value persisted for longer. That still holds true today for the data humans create.

However, humans are no longer the dominant creators of data. Machines, sensors and devices took over long ago, and the data they create arrives at such speed that roughly 90% of all the data created since the dawn of civilization was generated in the last two years. This data tends to have a limited shelf life as far as value is concerned: its value decreases rapidly with time. If it is not processed as soon as possible, it may no longer be useful for ongoing business and operations. Naturally, we need a different thought process and approach to deal with it.

Why stream analytics is the key to future analytics

Since more of this data is streaming in from all kinds of sources, huge value could be created for users and businesses if it were combined and analyzed. At the same time, given the perishable nature of the data, it is imperative that it be analyzed and used as soon as it is created.

More and more use cases are emerging that need to be tackled to push the boundaries and achieve new goals. These use cases demand collecting data from different sources, joining it across different layers, and correlating and processing it across different domains, all in real time. The future of analysis is less about understanding what happened and more about what is happening or what may happen.

Use cases

Let's analyze some of the use cases. Consider an e-commerce platform integrated with a real-time streaming platform. By combining and processing different data in real time, it could infer the intent or behavior of a user and present a personalized offer or piece of content. This could significantly increase the conversion rate and reduce the erosion of customer engagement. It could also support better campaign management, yielding better results for the same spend.
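
As a toy illustration of reacting to user behavior as events stream in, the sketch below keeps a per-user "intent" score and triggers an offer when a threshold is crossed. The event names, weights, threshold and offer rule are purely hypothetical assumptions for the example, not part of any particular product.

```python
from collections import defaultdict
from typing import Optional

# Illustrative event weights -- assumptions for the sketch, not real values.
EVENT_WEIGHTS = {
    "view_product": 1.0,
    "add_to_cart": 5.0,
    "remove_from_cart": -3.0,
    "abandon_checkout": 8.0,
}

intent = defaultdict(float)  # per-user running intent score


def on_event(user_id: str, event_type: str) -> Optional[str]:
    """Update the user's intent score as each clickstream event arrives
    and return a personalized offer if one seems warranted."""
    intent[user_id] += EVENT_WEIGHTS.get(event_type, 0.0)
    if event_type == "abandon_checkout" and intent[user_id] > 10:
        return "10% discount on the abandoned cart"
    return None
```

In a real deployment the same scoring logic would run inside the streaming platform, keyed by user, rather than in a single process.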

Think of a small or mid-size data center (DC), which typically houses many different kinds of devices and machines, each generating volumes of data every moment. DCs typically use many different static tools for different kinds of data in different silos. These tools not only prevent the DC from having a single view of the entire facility, they also behave like BI tools: issues are not identified predictively or in real time, and firefighting becomes the norm. With a converged, integrated stream-analytics platform, a DC could have a single view of the whole environment along with real-time monitoring of events and data, so that issues are caught before they create bigger problems. A security breach could be seen or predicted well before the damage is done, and analyzing and forecasting bandwidth usage in near real time would allow better resource planning and provisioning.

The entire IoT is based on the premise that everything can generate data and interact with other things in real time to achieve larger goals. This requires a real-time streaming analytics framework to be in place to ingest all sorts of unstructured data from disparate sources, monitor it in real time and take action as required after identifying either known patterns or anomalies.
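
As a minimal sketch of the "flag anomalies as readings arrive" part, the following keeps running statistics per sensor and marks readings that deviate strongly from that sensor's own history. The z-score threshold and warm-up count are illustrative assumptions; a production pipeline would run logic like this inside a streaming framework rather than a single Python process.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations (Welford's online algorithm)

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self) -> float:
        return (self.m2 / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0


stats = defaultdict(RunningStats)  # one running profile per sensor


def on_reading(sensor_id: str, value: float, threshold: float = 3.0) -> bool:
    """Return True if this reading looks anomalous for this sensor."""
    s = stats[sensor_id]
    # Only flag after a warm-up period of 30 readings (an assumption).
    is_anomaly = s.n > 30 and s.stddev() > 0 and abs(value - s.mean) > threshold * s.stddev()
    s.update(value)
    return is_anomaly
```

For example, on_reading("pump-7", 98.6) would return True once that value falls well outside the range the sketch has learned for that particular sensor.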

AI and predictive analytics assume that data is collected and processed in real time; otherwise, the impact of AI is limited to understanding what has already happened. With the growth in data and data types, it is not prudent to rely solely on what has been learnt in hindsight; the demand is for reacting to new things as they are seen or felt. We have also learnt from experience that a model trained on older data often struggles to handle newer data with acceptable accuracy. Here too, a real-time streaming platform becomes a required part rather than a nice-to-have piece.



Limitations with existing tools or platforms

There are two broad categories into which we can slot the options available in the market: the appliance model, and a series of open-source tools that must be assembled into a platform. The former costs several million dollars up front; the latter requires dozens of consultants for several months to build. Time to market, cost, ease of use and the lack of unified options are the major drawbacks. However, there are bigger issues that neither option addresses when it comes to stream processing, and here we need a new approach to solve the problems. We cannot apply older tools to newer, forward-looking problems; otherwise the result remains a patchwork that will not scale to the needs of the hour.

Challenges with Stream Processing

Here are the basic high-level challenges when it comes to dealing with a stream of data and processing it in real time:

  • Dealing with high volumes of unstructured data
  • Avoiding multiple copies of the data across different layers
  • Keeping the flow of data through the system optimal
  • Partitioning the application across resources
  • Processing streaming data in real time
  • Data storage
  • Remaining predictive rather than only a forensic or BI tool
  • Ease of use
  • Time to market
  • Deployment model

Most of the options in the market suffer from these bottlenecks. Let's take a few examples.

Spark

Philosophically, Spark follows the MapReduce model, although in a much more efficient manner. However, it still deals in batches: Spark processes micro-batches of a given size at a given batch interval. This creates several problems when it comes to stream processing; in fact, its approach is the antithesis of stream processing, as the points below and the sketch after them illustrate:

  • Whether a batch is micro or macro does not matter; it is still a batch. Depending on the speed of the data, a macro batch and a micro batch can hold the same number of events. The concept of a batch does not align with stream processing, where processing every single event matters
  • Processing starts only when the batch is full, which breaks the premise of processing each event as it arrives
  • Stream processing happens within a moving or sliding window, and windowing with batches is not possible
  • When the batch processing time exceeds the batch interval, the backlog only grows; coupled with persisted data sets, this only aggravates the situation
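
For illustration, here is a minimal sketch of the micro-batch model described above, using the classic Spark Streaming (DStream) API of that era. The 2-second batch interval and the socket source are assumptions made only for the example.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="MicroBatchSketch")
# Every micro-batch covers a fixed 2-second interval (an assumption here).
ssc = StreamingContext(sc, batchDuration=2)

# Events arriving on this socket are grouped into 2-second batches before any
# processing logic runs -- there is no per-event processing path.
lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()

ssc.start()
ssc.awaitTermination()
```

Even with a small batch interval, every event waits for its batch boundary before anything happens to it, which is exactly the structural mismatch the list above describes.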

Kafka + Spark + Cassandra

This model typically uses five or more distributed verticals, each containing many nodes. This greatly increases network hops and data copies, which in turn increases latency. Scaling such a system is not trivial, because each layer has different, dynamic requirements. Further, the cost of adding new processing logic is significantly higher than in a simple BI tool, where things can be handled from a dashboard. Finally, it requires a large team and substantial resources, which increases cost. Such a stack can hardly be deployed where sub-second latency is desirable.
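
As a rough illustration of the assembly involved, here is a hedged sketch of just one seam in such a stack: a Kafka consumer writing events into Cassandra. The topic, keyspace, table and column names are hypothetical, and the Spark processing layer that would normally sit in between is omitted; each extra layer adds another hop of this kind.

```python
import json

from kafka import KafkaConsumer          # pip install kafka-python
from cassandra.cluster import Cluster    # pip install cassandra-driver

consumer = KafkaConsumer(
    "events",                            # hypothetical topic name
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

session = Cluster(["localhost"]).connect("analytics")  # hypothetical keyspace

insert = session.prepare(
    "INSERT INTO events_by_user (user_id, ts, payload) VALUES (?, ?, ?)"
)

# Every event already makes two network hops (Kafka -> consumer -> Cassandra)
# before it is even queryable, which is where the extra latency comes from.
for msg in consumer:
    event = msg.value
    session.execute(insert, (event["user_id"], event["ts"], json.dumps(event)))
```

Each additional vertical (Spark jobs, a serving layer, a dashboard) repeats this pattern of hops and copies, which is where the latency and operational cost accumulate.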

Kinesis

At best, AWS Kinesis is the equivalent of Kafka: a distributed, partitioned messaging layer. Users still have to assemble the other layers (processing, storage, visualization, etc.) themselves.

Final word

There are a few solutions and workarounds for achieving stream processing in the current landscape, but they may not be the optimal approach for current and future needs. We need a scalable solution that can process streaming data with low latency and is also easy to use. Only then will innovations in IoT, AI and ML be ushered in the right direction.

Categories: Internet Of Things
Tags: apache spark, internet of things, platform, real-time analytics

About David Sangma

Works with IQLECT in big data analytics. Trying to make real-time big data easy and available for all.
