Big Data Search Tools

Big data calls for search systems built for its scale, and several open source tools are designed for the job. A big data search tool needs to handle both structured and unstructured data and run many queries concurrently, ideally in real time.


Apache Lucene

Lucene is a search library released under the Apache license that offers high-performance, scalable indexing. According to the project, it indexes almost 100 GB per hour on commodity hardware and needs very little memory. It supports ranked search, field searching, date-range searching and multiple-index searching. It offers many query types and is built entirely in Java.
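To make the ideas of indexing, field searching and ranked search concrete, here is a minimal sketch in Python. It is a toy inverted index, not Lucene's API or its actual scoring: the class name `TinyIndex`, the sample documents and the simple TF-IDF formula are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps (field, term) -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.doc_count = 0

    def add(self, doc_id, fields):
        """Index a document given as {field_name: text}."""
        self.doc_count += 1
        for field, text in fields.items():
            for term in text.lower().split():
                bucket = self.postings[(field, term)]
                bucket[doc_id] = bucket.get(doc_id, 0) + 1

    def search(self, field, query):
        """Return doc ids ranked by a simple TF-IDF score, best first."""
        scores = defaultdict(float)
        for term in query.lower().split():
            bucket = self.postings.get((field, term), {})
            if not bucket:
                continue
            # Terms that appear in fewer documents weigh more.
            idf = math.log(1 + self.doc_count / len(bucket))
            for doc_id, tf in bucket.items():
                scores[doc_id] += tf * idf
        # Highest score first; ties broken by doc id for stable output.
        return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample documents, invented for this sketch.
index = TinyIndex()
index.add(1, {"title": "big data search", "body": "search tools for big data"})
index.add(2, {"title": "cooking at home", "body": "search for a good recipe"})
index.add(3, {"title": "search search search", "body": "nothing else"})

title_hits = index.search("title", "search")   # field search on "title"
body_hits = index.search("body", "big data")   # field search on "body"
```

Restricting a query to one field is what keeps document 2 out of `title_hits`, even though its body contains the word "search". A production library adds analyzers, compressed on-disk segments and far more refined scoring on top of this basic structure.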

Apache Solr

Solr is a standalone enterprise search server built in Java on top of Lucene and released under the Apache license. It runs as a full-text search server and offers features such as faceted search, dynamic clustering, near real-time indexing and geospatial search. It is scalable and fault tolerant, and it powers the search features of many large websites.
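Faceted search means returning, alongside the matching documents, a count of how many matches fall under each value of a chosen field (the numbers shown next to filters on a shopping site). The sketch below illustrates only the concept in plain Python. It does not call Solr, and the `faceted_search` function and product records are hypothetical.

```python
from collections import Counter

# Hypothetical product records, invented for this illustration.
DOCS = [
    {"name": "trail running shoe", "brand": "Acme", "color": "red"},
    {"name": "road running shoe", "brand": "Acme", "color": "blue"},
    {"name": "running jacket", "brand": "Zenith", "color": "red"},
    {"name": "hiking boot", "brand": "Zenith", "color": "brown"},
]


def faceted_search(docs, keyword, facet_fields):
    """Return matching docs plus, per facet field, a count of each value."""
    hits = [d for d in docs if keyword in d["name"].split()]
    facets = {f: Counter(d[f] for d in hits) for f in facet_fields}
    return hits, facets


hits, facets = faceted_search(DOCS, "running", ["brand", "color"])
for field, counts in facets.items():
    print(field, dict(counts))
```

The facet counts are computed over the matching documents only, so they change with every query. That is what lets a user narrow results step by step.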


Elasticsearch

Elasticsearch is an open source search engine built on top of Apache Lucene and backed by a company founded in Amsterdam. Documents are indexed as JSON over HTTP, and a cluster can scale out to hundreds of machines. Elasticsearch is schema-less and can search across multiple indices. Indices are broken down into shards, which are distributed across the nodes. This allows fast operation and easy rebalancing and rerouting.
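The idea of splitting an index into shards and spreading them over nodes can be sketched in a few lines of Python. This is a simplified illustration, not Elasticsearch's implementation: CRC32 stands in for the real hash function, round-robin placement stands in for the real allocator, and the function and node names are hypothetical.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard by hashing the document id (CRC32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def assign_shards(num_shards, nodes):
    """Spread shards over nodes round-robin; returns {shard: node}."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


NUM_SHARDS = 6
two_nodes = assign_shards(NUM_SHARDS, ["node-a", "node-b"])
three_nodes = assign_shards(NUM_SHARDS, ["node-a", "node-b", "node-c"])

# The document-to-shard mapping depends only on the id and the shard count,
# so it stays the same when nodes are added or removed.
shard = shard_for("user-42", NUM_SHARDS)
print("user-42 lives in shard", shard)
print("with two nodes it is served by", two_nodes[shard])
print("with three nodes it is served by", three_nodes[shard])
```

Because a document always maps to the same shard, growing the cluster only means moving whole shards to new nodes. No document has to be re-hashed, which is why rebalancing stays cheap.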
