Apache Spark

Apache Spark is an open-source, general-purpose engine for large-scale data processing, originally developed in the AMPLab at UC Berkeley. It is an in-memory distributed computing engine that scales to thousands of nodes and runs in a variety of environments, such as standalone clusters, Hadoop YARN, and Kubernetes. This lets users and developers build models quickly, iterate faster, and analyze data across their organization.

Spark’s distinguishing feature is its Resilient Distributed Datasets (RDDs). An RDD is a collection of objects partitioned across a cluster and stored in memory or on disk; if a partition is lost, Spark automatically rebuilds it from its lineage, the sequence of transformations that produced it. Spark's in-memory primitives can deliver up to 100 times faster performance than the two-stage, disk-based MapReduce paradigm for some workloads. Spark therefore addresses several limitations of MapReduce, such as the cost of writing intermediate results to disk between stages.
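To make the idea of lineage-based recovery concrete, here is a toy, single-process sketch in plain Python. It is not Spark's implementation or API: `ToyRDD`, `from_list`, `partition`, and `lose_partition` are hypothetical names chosen for illustration. Each dataset records how to rebuild each of its partitions from its parent, so a lost in-memory partition can be recomputed rather than restored from a replica. For simplicity the toy keeps every computed partition in memory, whereas Spark keeps an RDD in memory only when it is persisted (for example with `cache()` or `persist()`).

```python
import math
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy, single-process model of RDD lineage (hypothetical; not Spark's API)."""

    def __init__(self, compute: Callable[[int], List[int]], num_partitions: int,
                 parent: Optional["ToyRDD"] = None) -> None:
        self._compute = compute                      # recipe for (re)building partition i
        self.num_partitions = num_partitions
        self.parent = parent                         # lineage: the dataset this one derives from
        self._in_memory: Dict[int, List[int]] = {}   # partitions currently held in memory
        self.computations = 0                        # how many times a partition was (re)built

    @staticmethod
    def from_list(data: List[int], num_partitions: int) -> "ToyRDD":
        # Split the input into contiguous chunks, one per partition.
        size = math.ceil(len(data) / num_partitions)
        chunks = [data[i * size:(i + 1) * size] for i in range(num_partitions)]
        return ToyRDD(lambda i: list(chunks[i]), num_partitions)

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # A transformation does no work yet; it only records how to derive
        # each of its partitions from the parent's partition.
        parent = self
        return ToyRDD(lambda i: [f(x) for x in parent.partition(i)],
                      self.num_partitions, parent=parent)

    def partition(self, i: int) -> List[int]:
        # Serve the partition from memory, or rebuild it from lineage if missing.
        if i not in self._in_memory:
            self.computations += 1
            self._in_memory[i] = self._compute(i)
        return self._in_memory[i]

    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions) for x in self.partition(i)]

    def lose_partition(self, i: int) -> None:
        # Simulate a failed node: the in-memory copy of partition i is gone.
        self._in_memory.pop(i, None)


if __name__ == "__main__":
    base = ToyRDD.from_list([1, 2, 3, 4, 5, 6], num_partitions=2)
    squares = base.map(lambda x: x * x)
    print(squares.collect())      # [1, 4, 9, 16, 25, 36]
    squares.lose_partition(1)     # simulated node failure
    print(squares.collect())      # same result; partition 1 rebuilt from base
    print(squares.computations)   # 3: two initial builds plus one rebuild
```

Only the lost partition is recomputed; the surviving partition is served from memory. In real Spark, if the parent dataset is not persisted, recovery walks further back along the lineage, possibly to the original input.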

Spark lets data scientists and developers work together on a unified platform. It lets developers execute Python or Scala code across a cluster instead of on a single machine. Users can load data into a cluster’s memory and query it repeatedly without re-reading it from disk. This makes Spark well suited to machine learning algorithms, which typically make many passes over the same dataset.
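The "load once, query repeatedly" pattern can be sketched with a toy model in plain Python. This is an illustration under stated assumptions, not Spark code: `ToyCluster`, `load`, and `query` are hypothetical names, and threads merely stand in for worker nodes (CPython's global interpreter lock means there is no real CPU parallelism here; the point is the structure). The data is read from its source once, split into per-worker partitions held in memory, and every later query runs the same function on each partition before combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class ToyCluster:
    """Hypothetical stand-in for a cluster: each thread plays a worker node."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers
        self.loads = 0                        # how many times the source was read
        self._memory: List[List[int]] = []    # partitions held "in cluster memory"

    def load(self, read_source: Callable[[], List[int]]) -> None:
        # Read the source once and split it into one partition per worker.
        self.loads += 1
        data = read_source()
        self._memory = [data[i::self.workers] for i in range(self.workers)]

    def query(self, per_partition: Callable[[List[int]], int],
              combine: Callable[[List[int]], int]) -> int:
        # Run the same function on every in-memory partition in parallel,
        # then merge the partial results on the "driver".
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            partials = list(pool.map(per_partition, self._memory))
        return combine(partials)


if __name__ == "__main__":
    cluster = ToyCluster(workers=4)
    cluster.load(lambda: list(range(1, 101)))
    print(cluster.query(sum, sum))                                       # 5050
    print(cluster.query(lambda p: sum(1 for x in p if x % 2 == 0), sum))  # 50
    print(cluster.query(max, max))                                       # 100
    print(cluster.loads)                                                 # 1
```

Three different queries run against the same in-memory partitions while the source is read only once. In Spark itself, the comparable step is calling `cache()` or `persist()` on an RDD or DataFrame before running several actions on it.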


Website: https://spark.apache.org/
