Datafloq

Data and Technology Insights

Data Analytics Platforms

Several tools are available that effectively act as a Data as a Platform offering. They allow data analytics to be performed as a complete package.

Hadoop

Hadoop is the best-known open source big data tool at the moment. It supports data-intensive distributed applications that run in parallel on large clusters of commodity hardware. It is licensed under the Apache v2 license. A Hadoop cluster is reliable and extremely scalable, and it works according to the MapReduce computational model. Hadoop is written in the Java programming language and is used and developed by a global community of contributors.

Apache Spark

Apache Spark is an open-source cluster computing framework originally developed in the AMPLab at UC Berkeley. It is easy for developers to use: applications can be written in Java, Python or Scala. According to the project, programs can run up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Spark comes with several libraries: Spark SQL, Spark Streaming, the MLlib machine learning library, and GraphX. It scales to thousands of nodes and is fault-tolerant.

Storm

Storm, which was open-sourced by Twitter and is now an Apache project, is a real-time distributed computation system. It does for real-time processing what Hadoop does for batch processing: it provides a set of general primitives for performing real-time analyses. Storm is easy to use and can be used with any programming language. It is very scalable and fault-tolerant.
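The contrast with batch processing can be sketched in a few lines of plain Python. This is only an illustration of tuple-at-a-time processing, not Storm's API: Storm calls its sources spouts and its processing steps bolts, but the function names below (`sentence_source`, `split_words`, `running_count`, `run_topology`) are hypothetical, and everything runs in one process rather than on a cluster.

```python
from collections import Counter


def sentence_source(sentences):
    """Hypothetical source: yields one item at a time, like an unbounded feed."""
    for sentence in sentences:
        yield sentence


def split_words(stream):
    """Processing step: turn each incoming sentence into a stream of words."""
    for sentence in stream:
        for word in sentence.split():
            yield word.lower()


def running_count(stream):
    """Processing step: keep running state and emit an updated count per word."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(sentences):
    # The steps are chained lazily: each item flows through the whole chain
    # before the next one is read, so results appear incrementally instead
    # of after a complete batch has been collected.
    return list(running_count(split_words(sentence_source(sentences))))


if __name__ == "__main__":
    for update in run_topology(["storm storm", "Storm data"]):
        print(update)
```

Each emitted pair is an up-to-date count at the moment the word arrived, which is the kind of result a batch job would only deliver once the full input had been processed.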

MapReduce

MapReduce was originally developed by Google but has since been adopted by many big data tools, Hadoop among them. It is a software framework and programming model that can process vast amounts of data in parallel across a large system of computer nodes. MapReduce libraries have been written in many programming languages, so the model can be used from most of them. MapReduce can work with both structured and unstructured data.
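The model itself is small enough to show with a word count, the usual introductory example. The following is a minimal single-process sketch of the map, shuffle and reduce phases; the function names are chosen for illustration and do not belong to Hadoop or any other MapReduce library.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster each map call could run on a different node;
    # here the calls simply run one after another in a single process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data tools", "big data platforms"]))
```

Because each map call looks at one record and each reduce call looks at one key, both phases can be spread over many nodes without the functions themselves changing.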

HPCC Systems

HPCC stands for ‘high performance computing cluster’ and was developed by LexisNexis Risk Solutions. It is an alternative to Hadoop that claims to offer ‘superior performance’. Both a free and a paid version are available. It works with structured and unstructured data and scales from one node to thousands, offering high-performance, parallel big data processing.

Hortonworks

Hortonworks provides a pure open source Hadoop distribution. Apache Hadoop is the core component of the Hortonworks architecture, which allows users to capture, process and share data at any scale and in any format in a simple and cost-effective manner.

Dremel

Dremel is an interactive ad-hoc query system developed by Google. It offers analysis of read-only nested data. The system is extremely scalable, to thousands of CPUs and petabytes of data. By combining multi-level execution trees with a columnar data layout, it can run queries over massive, trillion-row tables in a matter of seconds.
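The two ideas named here, a columnar layout and a multi-level execution tree, can be sketched roughly as follows. This is a simplified illustration and not Google's implementation: it assumes flat records that all share the same fields (Dremel itself handles nested data), at least one partition, and hypothetical names such as `to_columns`, `leaf_aggregate` and `tree_average`.

```python
def to_columns(rows):
    """Convert row-oriented records into a columnar layout (one list per field)."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def leaf_aggregate(partition, field):
    """Leaf level: scan only the requested column, return a partial (sum, count)."""
    values = partition[field]
    return sum(values), len(values)


def combine(partials):
    """Intermediate or root level: merge partial aggregates from the level below."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count


def tree_average(partitions, field, fan_in=2):
    """Aggregate bottom-up through a tree in which each node merges `fan_in` children."""
    level = [leaf_aggregate(p, field) for p in partitions]
    while len(level) > 1:
        level = [combine(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    total, count = level[0]
    return total / count


if __name__ == "__main__":
    partitions = [
        to_columns([{"latency": 10, "region": "eu"}, {"latency": 20, "region": "us"}]),
        to_columns([{"latency": 30, "region": "eu"}]),
        to_columns([{"latency": 40, "region": "us"}]),
    ]
    print(tree_average(partitions, "latency"))
```

The leaf step never touches the `region` column, which is the point of the columnar layout, and only small partial results travel up the tree rather than the raw rows.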

Apache Drill

Apache Drill is an Apache project, based on Dremel, that offers a distributed system for interactive analysis of large-scale datasets. It started in the Apache Incubator, and its goal is to become a massively scalable platform that can process petabytes of data in seconds across up to 10,000 servers.

Greenplum HD

Greenplum HD allows users to start with big data analytics without the need to build an entirely new project. Greenplum HD is offered as software or can be used in a pre-configured Data Computing Appliance Module. It consists of a complete data analysis platform and combines Hadoop and the Greenplum database into a single Data Computing Appliance.

SAMOA

SAMOA is a platform for mining big data streams. It is a distributed streaming machine learning (ML) framework that contains a programming abstraction for distributed streaming ML algorithms.
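What sets a streaming ML algorithm apart is that the model is updated one example at a time and never revisits earlier data. The sketch below shows that pattern with a simple online perceptron in plain Python. It is not SAMOA's API and it is not distributed; the class `OnlinePerceptron` and the helper `train_on_stream` are hypothetical names used only for illustration.

```python
class OnlinePerceptron:
    """Hypothetical streaming classifier: sees each example once, then discards it."""

    def __init__(self, n_features, learning_rate=1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_one(self, x):
        score = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if score > 0 else 0

    def learn_one(self, x, y):
        # Update the model only when the current prediction is wrong.
        error = y - self.predict_one(x)
        if error != 0:
            step = self.learning_rate * error
            self.weights = [w + step * xi for w, xi in zip(self.weights, x)]
            self.bias += step


def train_on_stream(model, stream):
    """Test on each example first, then train on it; return the running accuracy."""
    correct = 0
    seen = 0
    for x, y in stream:
        correct += int(model.predict_one(x) == y)
        model.learn_one(x, y)
        seen += 1
    return correct / seen if seen else 0.0


if __name__ == "__main__":
    # Toy stream: the label is 1 when the first feature is larger than the second.
    stream = [([2, 0], 1), ([0, 2], 0), ([3, 1], 1), ([1, 3], 0)]
    model = OnlinePerceptron(n_features=2)
    print(train_on_stream(model, stream), model.weights, model.bias)
```

Memory use stays constant however long the stream runs, because only the weights are kept. A distributed framework additionally has to split this work across nodes, which the sketch does not attempt.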

IKANOW

IKANOW focuses on developing products that enable the uninhibited fusion and analysis of big data using open source technology. The company has created an open source analytics platform.

Copyright © 2023 Datafloq