Five Patterns of Big Data Integration
As reliance on Hadoop and Spark grows for data management, processing and analytics, data integration strategies should evolve to exploit big data platforms in support of digital business, Internet of Things (IoT) and analytics use cases. While Hadoop is used for batch data processing, Spark supports low-latency processing. Integration leaders should understand the various …
Introduction to Apache Spark with Examples and Use Cases
I first heard of Spark in late 2013 when I became interested in Scala, the language in which Spark is written. Some time later, I did a fun data science project trying to predict survival on the Titanic. This turned out to be a great way to get further introduced to Spark concepts and programming. I highly recommend it for any aspiring Spark developers looking for a place to …
Why is Apache Spark Becoming More Popular?
2015 may be known as the year Apache Spark really came into its own. Spark was first launched back in 2009, and its popularity has been steadily growing ever since. However, in just the past year, Spark has exploded onto the scene. Perhaps this shouldn't be a surprise, since the signs were there beforehand. After all, at the tail end of 2014, Apache Spark finally surpassed Hadoop …
How Big Data is Affecting Online Dating in China
Finding a spouse can be a difficult challenge these days. Dating just isn't what it used to be, which is why so many people get frustrated by the experience. Nowhere is this felt more acutely than in China, where even though there are hundreds of millions of single people looking for the right one, success in the dating world can be tough to come by. Like in the United …
Spark Reaches for the Holy Grail: Federated Queries
Hallowed Ground: Data Federation
Data federation technology is software that lets end users aggregate data from disparate sources and formats through virtual database objects. The benefits of this technology include increased availability and reliability as well as improved access times for BI and data analysis. The major data warehouse players: IBM, …
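The excerpt above defines data federation as querying disparate sources and formats through virtual objects rather than first consolidating them. The minimal sketch below illustrates that idea with Python's standard library instead of Spark; the SQLite table, the CSV and JSON contents, and the `federated_totals` function are all hypothetical stand-ins, not the article's method or any Spark API. In Spark itself, a comparable approach is to load each source into a DataFrame, register it with `createOrReplaceTempView`, and join the views in a single SQL query.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample sources, each in a different format.
# Source 1: a relational table (in-memory SQLite standing in for a warehouse).
def make_customer_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
    conn.commit()
    return conn

# Source 2: CSV text (standing in for flat files on HDFS or object storage).
ORDERS_CSV = "customer_id,amount\n1,30.0\n2,12.5\n1,7.5\n"

# Source 3: JSON text (standing in for a document store or REST API).
REGIONS_JSON = '[{"customer_id": 1, "region": "EU"}, {"customer_id": 2, "region": "APAC"}]'


def federated_totals(conn, orders_csv, regions_json):
    """A hypothetical 'virtual view': each source is read only when the
    query runs, and nothing is copied into a central store beforehand."""
    # Look up customer names from the relational source.
    names = dict(conn.execute("SELECT id, name FROM customers"))
    # Look up customer regions from the JSON source.
    regions = {r["customer_id"]: r["region"] for r in json.loads(regions_json)}
    # Sum order amounts per customer from the CSV source.
    totals = {}
    for row in csv.DictReader(io.StringIO(orders_csv)):
        cid = int(row["customer_id"])
        totals[cid] = totals.get(cid, 0.0) + float(row["amount"])
    # Join the three sources on customer id; tuples sort by name first.
    return sorted((names[cid], regions[cid], total) for cid, total in totals.items())


if __name__ == "__main__":
    conn = make_customer_db()
    for name, region, total in federated_totals(conn, ORDERS_CSV, REGIONS_JSON):
        print(name, region, total)
```

Running it prints one line per customer (`Ada EU 37.5`, then `Lin APAC 12.5`). A production federation engine would typically also push filters down to each source instead of reading every row; this sketch leaves that out for brevity.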