Before diving deep into how Apache Spark works, let's understand the jargon of Apache Spark. Job: a piece of code which reads some input from HDFS or a local filesystem, performs some computation on the data, and writes some output data. Stages: jobs are divided into stages. Stages are classified as map or reduce stages (it's easier to understand if you have worked on Hadoop and want to … [Read more...] about Apache Spark – A Basic Understanding
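The map/reduce stage split described above can be sketched without Spark itself. The following is a plain-Python analogue of a word-count job, assuming a small in-memory dataset in place of an HDFS read; in real Spark the same two phases would run as distributed tasks.

```python
from collections import defaultdict

# A tiny in-memory "input" standing in for data read from HDFS or local disk.
lines = ["spark makes big data simple", "big data needs spark"]

# Map stage: each record is transformed independently into (word, 1) pairs.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle + reduce stage: pairs are grouped by key and their counts summed.
counts = defaultdict(int)
for word, n in mapped:
    counts[word] += n
```

After the reduce stage, `counts["spark"]` is 2 and `counts["simple"]` is 1 — each key's total across all input records, which is exactly what Spark's shuffle boundary between stages produces.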
25 Must Know Big Data Terms To Impress Your Date
Big Data can be intimidating! If you are new to Big Data, please read "What is Big Data?" to get started. With the basic concepts under your belt, let's focus on some key terms to impress your date, boss, or family. So let's get going with this list. Algorithm: a mathematical formula or statistical process used to perform an analysis of data. How is an algorithm related … [Read more...] about 25 Must Know Big Data Terms To Impress Your Date
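To make the "Algorithm" entry concrete, here is a minimal sketch of one such statistical process — computing the arithmetic mean and population standard deviation of a dataset. The function name and the sample values are illustrative, not from the original article.

```python
import math

def mean_and_stddev(values):
    """A simple statistical algorithm: arithmetic mean and
    population standard deviation of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Population variance: average squared deviation from the mean.
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, math.sqrt(variance)

m, s = mean_and_stddev([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5.0, stddev 2.0
```

The same recipe — a fixed sequence of arithmetic steps applied to data — is what "algorithm" means throughout the rest of the glossary.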
Real-Time Kafka Data Ingestion into HBase via PySpark
Streaming data is becoming an essential part of every data integration project nowadays — if not a core requirement, then second nature. The advantages gained from real-time data streaming are many. To name a few: real-time analytics and decision making, better resource utilization, data pipelining, support for microservices, and much more. Python has many modules out there … [Read more...] about Real-Time Kafka Data Ingestion into HBase via PySpark
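A minimal sketch of such an ingestion pipeline, assuming the `kafka-python` and `happybase` modules and hypothetical topic, table, and field names (`sensor-events`, `sensor_readings`, `id`/`ts`/`temp`) — none of which come from the original article. The message-to-row transform is kept as a pure function so it can be reasoned about separately from the Kafka and HBase wiring.

```python
import json

def event_to_hbase_row(raw_bytes):
    """Turn a JSON Kafka message into an HBase row key and column map.
    The field names ('id', 'ts', 'temp') are illustrative assumptions."""
    event = json.loads(raw_bytes)
    row_key = f"{event['id']}-{event['ts']}"
    columns = {b"d:temp": str(event["temp"]).encode()}
    return row_key.encode(), columns

def run_pipeline():
    # Hypothetical wiring: requires a reachable Kafka broker and an
    # HBase Thrift server, so it is defined here but not executed.
    from kafka import KafkaConsumer   # assumption: kafka-python package
    import happybase                  # assumption: happybase package
    consumer = KafkaConsumer("sensor-events",
                             bootstrap_servers="localhost:9092")
    table = happybase.Connection("localhost").table("sensor_readings")
    for msg in consumer:
        key, cols = event_to_hbase_row(msg.value)
        table.put(key, cols)
```

For example, the message `b'{"id": "s1", "ts": 100, "temp": 21.5}'` maps to row key `b"s1-100"` with column `d:temp` set to `b"21.5"`. Keeping the transform pure makes it easy to unit-test the pipeline without a running broker.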
How to Overcome Big Data Analytics Limitations With Hadoop
Hadoop is an open source project developed under the Apache Software Foundation; its 1.0 release arrived back in 2011. That initial version had a variety of bugs, so a more stable release followed in August. Hadoop is a great tool for big data analytics because it is highly scalable, flexible, and cost-effective. However, there are also some challenges that big data analytics professionals need to be aware of. The good … [Read more...] about How to Overcome Big Data Analytics Limitations With Hadoop
Five Patterns of Big Data Integration
As reliance on Hadoop and Spark grows for data management, processing and analytics, data integration strategies should evolve to exploit big data platforms in support of digital business, Internet of Things (IoT) and analytics use cases. While Hadoop is used for batch data processing, Spark supports low-latency processing. Integration leaders should understand the various … [Read more...] about Five Patterns of Big Data Integration