Datafloq News

What is a Data Lake and What Are the Benefits?

A data lake is a central location in which to store all your data, regardless of its source or format. It is typically, although not always, built using Hadoop. The data can be structured or unstructured. You can then use a variety of storage and processing tools (typically tools in the extended Hadoop ecosystem) to extract value quickly and inform key organizational decisions.

Because of the growing variety and volume of data, data lakes are an emerging and powerful architectural approach. This is especially true as enterprises turn to mobile and cloud-based applications and the Internet of Things (IoT) as right-time delivery channels for big data.

Data Lake versus EDW

The differences between enterprise data warehouses (EDWs) and data lakes are significant. An EDW is fed data from a broad variety of enterprise applications. Naturally, each application's data has its own schema, so the data must be transformed to conform to the EDW's own predefined schema. The EDW is designed to collect only data that has been quality-controlled and conforms to an enterprise data model. As a result, it can answer only a limited number of questions.
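To make the EDW approach concrete, here is a minimal Python sketch of "schema-on-write": each source's records are mapped into one predefined schema before loading, and anything that cannot be made to conform is rejected. The source names ("webshop", "pos", "crm"), their field names, `EDW_SCHEMA`, `to_edw_row` and `load` are all hypothetical. A real EDW would do this in an ETL tool or SQL rather than in Python dictionaries.

```python
from datetime import date

# Hypothetical predefined enterprise schema: column name -> required Python type.
EDW_SCHEMA = {"customer_id": int, "order_date": date, "amount_usd": float}


def to_edw_row(source: str, record: dict) -> dict:
    """Transform one source record into the EDW schema (schema-on-write).

    Each (hypothetical) source application names its fields differently,
    so each needs its own hand-written mapping.
    """
    if source == "webshop":
        row = {
            "customer_id": int(record["cust"]),
            "order_date": date.fromisoformat(record["ts"][:10]),
            "amount_usd": float(record["total"]),
        }
    elif source == "pos":
        row = {
            "customer_id": int(record["customer"]["id"]),
            "order_date": date(record["year"], record["month"], record["day"]),
            "amount_usd": record["cents"] / 100,
        }
    else:
        # No mapping means the data cannot enter the warehouse at all.
        raise ValueError(f"no mapping defined for source {source!r}")
    # Enforce the predefined schema before the row is written.
    for column, expected in EDW_SCHEMA.items():
        if not isinstance(row[column], expected):
            raise ValueError(f"{column} is not {expected.__name__}")
    return row


def load(batch):
    """Load a batch of (source, record) pairs; count the rows that fail to conform."""
    loaded, rejected = [], 0
    for source, record in batch:
        try:
            loaded.append(to_edw_row(source, record))
        except (KeyError, TypeError, ValueError):
            rejected += 1
    return loaded, rejected


batch = [
    ("webshop", {"cust": "42", "ts": "2024-03-01T10:15:00Z", "total": "19.99"}),
    ("pos", {"customer": {"id": 7}, "year": 2024, "month": 3, "day": 2, "cents": 500}),
    ("webshop", {"cust": "43", "total": "5.00"}),  # missing timestamp: rejected
    ("crm", {"note": "called customer"}),           # no mapping: rejected
]
rows, rejected = load(batch)
print(len(rows), rejected)
```

Only the columns in `EDW_SCHEMA` survive the load. Anything else, such as the CRM note, never reaches the warehouse. This illustrates why the EDW can answer only the questions its schema anticipated.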

Data lakes, on the other hand, are fed information in its native form. Little or no processing is performed to adapt its structure to an enterprise schema. The biggest advantage of data lakes is flexibility: because the data remains in its native format, a far greater, and timelier, stream of data is available for analysis.
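By contrast, a data lake keeps each payload exactly as it arrives and applies structure only when someone reads it ("schema-on-read"). The Python sketch below is a toy, in-memory stand-in for a lake. `RawLake`, its `ingest` and `read` methods, the parser functions and the sample payloads are all hypothetical. A real lake would keep the bytes in HDFS or object storage rather than in a Python list.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class RawLake:
    """Toy data lake: stores payloads untouched, tagged only with source and format."""
    objects: list = field(default_factory=list)

    def ingest(self, source: str, fmt: str, payload: bytes) -> None:
        # No transformation and no validation on write: the native bytes are kept.
        self.objects.append({"source": source, "format": fmt, "payload": payload})

    def read(self, fmt: str, parse):
        """Apply a reader-supplied parser at query time (schema-on-read)."""
        for obj in self.objects:
            if obj["format"] == fmt:
                yield from parse(obj["payload"])


def parse_json_lines(payload: bytes):
    """Hypothetical reader for newline-delimited JSON."""
    for line in payload.decode("utf-8").splitlines():
        if line.strip():
            yield json.loads(line)


def parse_csv(payload: bytes):
    """Hypothetical reader for CSV with a header row."""
    yield from csv.DictReader(io.StringIO(payload.decode("utf-8")))


lake = RawLake()
lake.ingest("webshop", "jsonl", b'{"cust": "42", "total": "19.99"}\n{"cust": "43", "total": "5.00"}\n')
lake.ingest("sensors", "csv", b"device,temp_c\nA1,21.5\nB2,19.0\n")
lake.ingest("support", "text", b"Customer 42 asked about delivery times.")

# Different readers impose different structures on the same store, at read time.
# The unstructured text is kept as-is until someone decides how to use it.
order_total = sum(float(r["total"]) for r in lake.read("jsonl", parse_json_lines))
temps = [float(r["temp_c"]) for r in lake.read("csv", parse_csv)]
avg_temp = sum(temps) / len(temps)
print(round(order_total, 2), avg_temp)
```

Deferring parsing lets new sources land immediately, whatever their format. The trade-off is that each reader, not the loader, becomes responsible for interpreting and checking the data.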

Some of the benefits of a data lake include:

Key Attributes of a Data Lake

To be classified as a data lake, a big data repository should exhibit three key characteristics:

Image: Zaloni

Data lakes are becoming increasingly central to enterprise data strategies. They best address today's data realities: much greater data volumes and varieties, higher expectations from users, and the rapid globalization of economies.
