Datafloq

Data and Technology Insights

Hadoop

Apache Hadoop is an open-source software framework, written in Java, for distributed storage and distributed processing of very large data sets (Big Data) on computer clusters built from commodity hardware. All the modules in Hadoop are designed on the fundamental assumption that hardware failures (of individual machines or of whole racks of machines) are commonplace and should therefore be handled automatically in software by the framework.

The core of Apache Hadoop consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce. Hadoop splits files into large blocks (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x and later) and distributes the blocks among the nodes in the cluster. To process the data, Hadoop MapReduce transfers code (specifically JAR files) to the nodes that hold the required data, and the nodes then process that data in parallel. This approach takes advantage of data locality: because computation runs where the data already resides, the data can be processed faster and more efficiently than with a more conventional supercomputer architecture that relies on a parallel file system, where computation and data are connected via high-speed networking.

The base Apache Hadoop framework is composed of the following modules:

  • Hadoop Common: libraries and utilities needed by the other Hadoop modules;
  • Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
  • Hadoop YARN: a resource-management platform responsible for managing compute resources in clusters and using them to schedule users’ applications; and
  • Hadoop MapReduce: a programming model for large-scale data processing.

Since 2012, the term “Hadoop” often refers not just to the base modules above but also to the collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Spark, and others.

Apache Hadoop’s MapReduce and HDFS components were inspired by Google’s papers on MapReduce and the Google File System. The Hadoop framework itself is mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts. For end users, MapReduce code is commonly written in Java, but any programming language can be used with “Hadoop Streaming” to implement the “map” and “reduce” parts of the user’s program. Other related projects expose higher-level user interfaces.

Prominent corporate users of Hadoop include Facebook and Yahoo. Hadoop can be deployed in traditional onsite datacenters as well as in the cloud; for example, it is available on Microsoft Azure, Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3), Google App Engine and IBM Bluemix cloud services. Apache Hadoop is a registered trademark of the Apache Software Foundation.
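As a rough illustration of how a file is divided into blocks, the following Python sketch computes the block layout for a given file size. The function names `block_sizes` and `block_count` are hypothetical helpers written for this example, not part of any Hadoop API, and the 128 MB default is simply one of the two default block sizes mentioned above.

```python
MB = 1024 * 1024  # bytes in one mebibyte


def block_sizes(file_size_bytes, block_size_bytes=128 * MB):
    """Return the size of each block a file of the given size would occupy.

    Hypothetical helper for illustration: every block is full except
    possibly the last one, which holds only the remaining bytes.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    full_blocks, remainder = divmod(file_size_bytes, block_size_bytes)
    sizes = [block_size_bytes] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def block_count(file_size_bytes, block_size_bytes=128 * MB):
    """Return how many blocks the file is split into."""
    return len(block_sizes(file_size_bytes, block_size_bytes))


if __name__ == "__main__":
    # A 300 MB file with 128 MB blocks: two full blocks plus one 44 MB block.
    print([size // MB for size in block_sizes(300 * MB)])
    # The same 1 GB file needs twice as many blocks at 64 MB as at 128 MB.
    print(block_count(1024 * MB, 64 * MB), block_count(1024 * MB, 128 * MB))
```

Each of these blocks is a unit that can be stored on a different node, which is what lets the processing step run on many blocks in parallel.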
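To make the “map” and “reduce” parts more concrete, here is a minimal word-count sketch in Python in the style used with Hadoop Streaming, where the mapper emits tab-separated key/value lines and the reducer receives those lines sorted by key. The names `mapper`, `reducer` and `run_local` are assumptions made for this example, and `run_local` only simulates the map, sort and reduce sequence in a single process; it does not talk to a Hadoop cluster.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit one 'word<TAB>1' record per word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Reduce step: sum the counts of consecutive records sharing a key.

    Assumes the records arrive sorted by key, which is what the
    framework's sort phase provides between map and reduce.
    """
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Simulate map -> sort -> reduce locally (illustration only)."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = [
        "Hadoop stores data in HDFS",
        "Hadoop processes data with MapReduce",
    ]
    for record in run_local(sample):
        print(record)
```

On a real cluster, the mapper and reducer would typically be two separate scripts that read from standard input and write to standard output, passed to the Hadoop Streaming job through its `-mapper` and `-reducer` options together with `-input` and `-output` paths.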

Copyright © 2022 Datafloq