
Hadoop Introduction

DataFlair's Big Data Hadoop Tutorial PPT for Beginners takes you through various concepts of Hadoop. This Hadoop tutorial PPT covers:

1. Introduction to Hadoop
2. What is Hadoop
3. Hadoop History
4. Why Hadoop
5. Hadoop Nodes
6. Hadoop Architecture
7. Hadoop data flow
8. Hadoop components – HDFS, MapReduce, YARN
9. Hadoop Daemons
10. Hadoop characteristics & features

Related blog – Hadoop Introduction – A Comprehensive Guide: https://goo.gl/QadBS4

Wish to learn Hadoop and carve your career in Big Data? Contact us at info@data-flair.training, +91-7718877477, +91-9111133369, or visit our website: https://data-flair.training/




Presentation Transcript


  1. Hadoop Tutorial

  2. Agenda • Introduction to Hadoop • Hadoop nodes & daemons • Hadoop Architecture • Characteristics • Hadoop Features

  3. What is Hadoop? Hadoop – the technology that empowers Yahoo, Facebook, Twitter, Walmart and others

  4. What is Hadoop? An Open Source framework that allows distributed processing of large data-sets across the cluster of commodity hardware

  5. What is Hadoop? An Open Source framework that allows distributed processing of large data-sets across the cluster of commodity hardware Open Source • Source code is freely available • It may be redistributed and modified

  6. What is Hadoop? An open source framework that allows Distributed Processing of large data-sets across the cluster of commodity hardware Distributed Processing • Data is processed in a distributed manner on multiple nodes / servers • Multiple machines process the data independently

  7. What is Hadoop? An open source framework that allows distributed processing of large data-sets across the Cluster of commodity hardware Cluster • Multiple machines connected together • Nodes are connected via LAN

  8. What is Hadoop? An open source framework that allows distributed processing of large data-sets across the cluster of Commodity Hardware Commodity Hardware • Economic / affordable machines • Typically low-performance hardware

  9. What is Hadoop? • Open source framework written in Java • Inspired by Google's Map-Reduce programming model as well as its file system (GFS)
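To make the Map-Reduce programming model concrete, here is a minimal, framework-free sketch in Python: a map phase emits (key, value) pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The function names and the word-count example are illustrative assumptions, not Hadoop APIs.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input record
    for line in records:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group intermediate pairs by key, as the framework would
    for key, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield key, [value for _, value in group]

def reduce_phase(grouped):
    # Reduce: sum the counts for each word
    return {key: sum(values) for key, values in grouped}

lines = ["Hadoop stores big data", "Hadoop processes big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["hadoop"], counts["big"])  # 2 2
```

In real Hadoop the map and reduce functions run on different nodes and the framework performs the shuffle over the network; the data flow, however, is exactly this three-stage pipeline.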

  10. Hadoop History
  • 2002 – Doug Cutting started working on Nutch; development started as a Lucene sub-project
  • 2003 – Google published the GFS paper
  • 2004 – Google published the MapReduce paper
  • 2005 – Doug Cutting added DFS & MapReduce to Nutch
  • 2006 – Hadoop was split out of Nutch as its own Lucene sub-project
  • 2007 – The New York Times converted 4 TB of image archives over 100 EC2 instances
  • 2008 – Hadoop became a top-level Apache project and defeated a supercomputer to become the fastest system to sort a terabyte of data; Hive was launched, bringing SQL support to Hadoop
  • 2009 – Doug Cutting joined Cloudera

  11. Hadoop Components Hadoop consists of three key parts: HDFS, MapReduce and YARN

  12. Hadoop Nodes There are two types of nodes: Master Node and Slave Node

  13. Hadoop Daemons The Master Node runs the NameNode and ResourceManager daemons; the Slave Node runs the DataNode and NodeManager daemons

  14. Basic Hadoop Architecture The USER submits Work to the Master(s); the master divides the Work into Sub-Works and distributes them across the slaves (e.g. 100 slaves), where each slave processes its sub-works in parallel
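The master's division of work above can be sketched as a simple round-robin partitioner. `split_work` and the sub-work names are illustrative assumptions, not Hadoop code; a real master also tracks progress and re-assigns work.

```python
def split_work(work_items, num_slaves):
    # Master divides the Work into Sub-Works, one bucket per slave
    buckets = [[] for _ in range(num_slaves)]
    for i, item in enumerate(work_items):
        buckets[i % num_slaves].append(item)
    return buckets

work = [f"sub-work-{i}" for i in range(10)]
assignments = split_work(work, 4)
print([len(b) for b in assignments])  # [3, 3, 2, 2]
```

Each slave then processes only its own bucket, which is what lets the cluster work on all sub-works in parallel.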

  15. Hadoop Characteristics • Open Source • Distributed Processing • Fault Tolerance • Reliability • High Availability • Scalability • Economic • Easy to Use

  16. Open Source • Source code is freely available • Can be redistributed • Can be modified This makes Hadoop free, transparent, affordable and interoperable, with no vendor lock-in and a strong community

  17. Distributed Processing • Data is processed in a distributed manner across the cluster, in contrast to centralized processing • Multiple nodes in the cluster process the data independently

  18. Fault Tolerance • Failure of nodes is recovered automatically • The framework takes care of hardware failures as well as task failures
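A toy illustration of automatic recovery: the wrapper below re-runs a failed task up to a fixed number of attempts, the way Hadoop reschedules a failed task on another node without user intervention. All names here are hypothetical, not Hadoop APIs.

```python
def run_with_retries(task, attempts=3):
    # The framework transparently re-runs a failed task, up to a limit
    for attempt in range(1, attempts + 1):
        try:
            return task(attempt)
        except RuntimeError as err:
            if attempt == attempts:
                raise  # give up only after the last attempt
            print(f"attempt {attempt} failed ({err}), retrying on another node")

def flaky_task(attempt):
    # Simulated task: the "node" fails on the first attempt only
    if attempt == 1:
        raise RuntimeError("node failure")
    return "done"

print(run_with_retries(flaky_task))  # done
```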

  19. Reliability • Data is reliably stored on the cluster of machines despite machine failures • Failure of nodes doesn’t cause data loss
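Reliability despite machine failures comes from replication: each block of data is stored on several different nodes, so losing any single node loses no data. The placement scheme below (consecutive nodes starting at a block-dependent offset) is a simplification for illustration, not HDFS's actual rack-aware placement policy.

```python
def place_replicas(block_index, nodes, replication=3):
    # Store each block on `replication` distinct nodes so that
    # the failure of any one node never causes data loss
    start = block_index % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]

nodes = ["node1", "node2", "node3", "node4"]
replicas = place_replicas(5, nodes)
print(replicas)  # ['node2', 'node3', 'node4']
```

If node2 fails here, node3 and node4 still hold full copies of the block, which is why node failure does not translate into data loss.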

  20. High Availability • Data is highly available and accessible despite hardware failure • There is no downtime for the end-user application due to data unavailability

  21. Scalability • Vertical Scalability – New hardware can be added to the nodes • Horizontal Scalability – New nodes can be added on the fly

  22. Economic • No need to purchase a costly license • No need to purchase costly hardware Economic = Open Source + Commodity Hardware

  23. Easy to Use • Distributed computing challenges are handled by the framework • Clients just need to concentrate on business logic

  24. Data Locality • Move computation to data instead of data to computation • Data is processed on the nodes where it is stored (Diagram: the algorithm moves from the app servers to the storage servers holding the data)
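A data-locality-aware scheduler can be sketched as: prefer a node that already stores the task's input block, and fall back to any node only when no local one exists. The data structures and names below are illustrative assumptions, not Hadoop's scheduler API.

```python
def schedule(block, block_locations, node_load):
    # Prefer nodes that already store the block (move computation to data);
    # otherwise fall back to the least-loaded node in the cluster
    local_nodes = block_locations.get(block, [])
    candidates = local_nodes or list(node_load)
    return min(candidates, key=lambda node: node_load[node])

block_locations = {"blk-1": ["node1", "node3"]}
node_load = {"node1": 2, "node2": 0, "node3": 1}
print(schedule("blk-1", block_locations, node_load))  # node3
print(schedule("blk-9", block_locations, node_load))  # node2
```

For "blk-1" the scheduler picks node3 (a local, lightly loaded replica holder) even though node2 is idle, because shipping the algorithm to the data is cheaper than shipping the data to the algorithm.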

  25. Summary • Every day we generate 2.3 trillion GBs of data • Hadoop handles huge volumes of data efficiently • Hadoop uses the power of distributed computing • HDFS, MapReduce & YARN are the main components of Hadoop • It is highly fault tolerant, reliable & available

  26. Thank You DataFlair /DataFlairWS /c/DataFlairWS
