The massive growth that social media networks have undergone in the past few years requires different tools to manage the big data streams pouring in and out of those networks. Several open source tools help social media networks cope with this explosive data growth.

Apache Kafka

Kafka is a distributed publish-subscribe messaging system. It supports persistent messaging, high throughput and parallel data loads into Hadoop. It can handle and process all activity stream data (views, searches, etc.). Current users include LinkedIn, where it powers LinkedIn Newsfeed and LinkedIn Today, as well as Twitter, Foursquare and many others. You can view the possible applications here.
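
To make the idea of persistent publish-subscribe messaging more concrete, here is a minimal in-memory sketch in Python. It is not Kafka's API. The `TopicLog` class, its methods and the subscriber names are hypothetical and exist only to illustrate the principle: messages are appended to a log and retained, and every subscriber reads from its own position in that log, so one activity stream can feed both a newsfeed and a batch load into Hadoop.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log for one topic (hypothetical, not Kafka's API)."""

    def __init__(self):
        self.messages = []               # retained messages, in publish order
        self.offsets = defaultdict(int)  # next offset to read, per subscriber

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self.messages.append(message)
        return len(self.messages) - 1

    def poll(self, subscriber, max_messages=10):
        """Return unread messages for a subscriber and advance its offset."""
        start = self.offsets[subscriber]
        batch = self.messages[start:start + max_messages]
        self.offsets[subscriber] = start + len(batch)
        return batch


# Publish a few activity events (made-up examples of views and searches).
log = TopicLog()
for event in ["view:profile/42", "search:kafka", "view:job/7"]:
    log.publish(event)

# Two independent subscribers read the same stream at their own pace.
newsfeed_batch = log.poll("newsfeed")
hadoop_batch = log.poll("hadoop-loader", max_messages=2)
hadoop_rest = log.poll("hadoop-loader")
```

Because the log keeps the messages instead of deleting them on delivery, a slow subscriber such as the batch loader does not hold up a fast one, and each can pick up where it left off.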

ThinkUp

ThinkUp is an online application that allows users to capture all of their activity on social networks. It presents graphs and charts in a single dashboard to help users understand the activity in a social network, and it can show conversations in rich visualizations. It is easy to install and use; the only thing needed is a website that can run PHP applications.

Corona

Corona is a scheduling framework developed by Facebook to separate cluster resource management from job coordination. It replaces the Hadoop MapReduce scheduling framework at Facebook, which had reached its limits because of the vast quantities of data that Facebook has to deal with. A Corona MapReduce cluster consists of a cluster manager, a Corona Job tracker and a Proxy Job tracker. Facebook has also released Corona as an open source tool, and it has been dubbed the next version of MapReduce.
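
The separation of cluster resource management from job coordination can be illustrated with a small Python sketch. This is a toy model, not Corona's code or API. The class names, methods, node names and slot counts are all assumptions made for illustration. The point is only the split of responsibilities: one cluster manager knows which nodes have free slots, while each job has its own tracker that asks for slots and decides which task runs where.

```python
class ClusterManager:
    """Tracks free slots per node; knows nothing about tasks (hypothetical)."""

    def __init__(self, slots_per_node):
        self.free = dict(slots_per_node)

    def request(self, count):
        """Grant up to `count` slots and return the node of each granted slot."""
        granted = []
        for node in sorted(self.free):
            while self.free[node] > 0 and len(granted) < count:
                self.free[node] -= 1
                granted.append(node)
        return granted

    def release(self, node):
        """Return one slot on `node` to the free pool."""
        self.free[node] += 1


class JobTracker:
    """Coordinates the tasks of a single job (hypothetical)."""

    def __init__(self, tasks, manager):
        self.pending = list(tasks)
        self.manager = manager
        self.assignments = {}

    def schedule(self):
        """Ask the manager for slots and place pending tasks on them."""
        for node in self.manager.request(len(self.pending)):
            self.assignments[self.pending.pop(0)] = node
        return dict(self.assignments)

    def finish(self, task):
        """Mark a task as done and give its slot back to the manager."""
        self.manager.release(self.assignments.pop(task))


# Made-up cluster: three slots in total, shared by two jobs.
manager = ClusterManager({"node-a": 2, "node-b": 1})
job1 = JobTracker(["map-0", "map-1"], manager)
job2 = JobTracker(["map-0", "map-1"], manager)

first = job1.schedule()    # job1 takes both slots on node-a
second = job2.schedule()   # only one slot is left, so one task waits
job1.finish("map-0")       # a slot on node-a becomes free again
third = job2.schedule()    # job2 places its waiting task
```

In this model the cluster manager stays simple because it never inspects tasks, and each job's coordination work lives in its own tracker rather than in one central scheduler.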
