Big Data Hadoop Tutorial PPT for Beginners

Presentation Description

DataFlair's Big Data Hadoop Tutorial PPT for Beginners takes you through the core concepts of Hadoop: what Hadoop is, why Hadoop, Hadoop architecture, Hadoop ecosystem components, Hadoop nodes (master & slave), Hadoop daemons, and Hadoop's characteristics and features. This Hadoop tutorial PPT covers:

1. Introduction to Hadoop
2. What is Hadoop
3. Hadoop History
4. Why Hadoop
5. Hadoop Nodes
6. Hadoop Architecture
7. Hadoop Data Flow
8. Hadoop Components – HDFS, MapReduce, YARN
9. Hadoop Daemons
10. Hadoop Characteristics & Features

Related blog: Hadoop Introduction – A Comprehensive Guide: https://goo.gl/QadBS4

Wish to learn Hadoop and carve your career in Big Data? Contact us: info@data-flair.training, +91-7718877477

Presentation Transcript

Hadoop Tutorial:

Hadoop Tutorial

Agenda:

Agenda
- Introduction to Hadoop
- Hadoop nodes & daemons
- Hadoop Architecture
- Characteristics
- Hadoop Features

What is Hadoop?:

What is Hadoop? The technology that empowers Yahoo, Facebook, Twitter, Walmart and others.

What is Hadoop?:

What is Hadoop? An open source framework that allows distributed processing of large datasets across clusters of commodity hardware.

What is Hadoop?:

What is Hadoop? An Open Source framework that allows distributed processing of large datasets across clusters of commodity hardware.
Open Source:
- Source code is freely available
- It may be redistributed and modified

What is Hadoop?:

What is Hadoop? An open source framework that allows Distributed Processing of large datasets across clusters of commodity hardware.
Distributed Processing:
- Data is processed in a distributed manner on multiple nodes/servers
- Multiple machines process the data independently

What is Hadoop?:

What is Hadoop? An open source framework that allows distributed processing of large datasets across a Cluster of commodity hardware.
Cluster:
- Multiple machines connected together
- Nodes are connected via LAN

What is Hadoop?:

What is Hadoop? An open source framework that allows distributed processing of large datasets across clusters of Commodity Hardware.
Commodity Hardware:
- Economic / affordable machines
- Typically low-end hardware

What is Hadoop?:

What is Hadoop? An open source framework written in Java, inspired by Google's MapReduce programming model as well as its file system, GFS.

Hadoop History:

2002 – Doug Cutting started working on Nutch; its development began as a Lucene sub-project
2003–2004 – Google published the GFS & MapReduce papers
2005 – Doug Cutting added a distributed file system (DFS) & MapReduce to Nutch
2006 – Hadoop launched as a Lucene sub-project
2007 – The New York Times converted 4 TB of image archives over 100 EC2 instances
2008 – Hadoop became an Apache top-level project and defeated a supercomputer as the fastest system to sort a terabyte of data; Hive launched, bringing SQL support to Hadoop
2009 – Doug Cutting joined Cloudera

Hadoop Components:

Hadoop Components: Hadoop consists of three key parts – HDFS (storage), MapReduce (processing) and YARN (resource management).
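As a quick illustration of the storage layer, here is a minimal sketch (not part of the original slides) that writes and reads a small file through HDFS's Java FileSystem API. It assumes a reachable cluster whose address is picked up from the default Configuration; the path and file contents are hypothetical.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/hello.txt"); // hypothetical path

        // Write a small file; HDFS splits large files into blocks
        // and replicates each block across DataNodes
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("Hello, HDFS!".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back and print it
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}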

Hadoop Nodes:

Hadoop nodes are of two types:
- Master node
- Slave node

Hadoop Daemons:

Hadoop daemons run on the two node types:
- Master node: NameNode and ResourceManager
- Slave node: DataNode and NodeManager

Basic Hadoop Architecture:

Basic Hadoop Architecture: a user submits work to the master(s); the master splits it into many sub-works and distributes them across the slaves (e.g., 100 slaves), each of which processes its sub-work independently.
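To make the work/sub-work split concrete, here is the classic WordCount job as a minimal Java sketch, closely following the standard example that ships with Hadoop: the framework divides the input into splits, runs one map task (a sub-work) per split on the slaves, and reduce tasks aggregate the partial counts. Input and output paths are taken from the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Each map task is one "sub-work": it processes one input split independently
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE); // emit (word, 1)
            }
        }
    }

    // Reduce tasks aggregate the partial counts produced by all mappers
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each slave
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}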

Hadoop Characteristics:

Hadoop Characteristics:
- Open Source
- Distributed Processing
- Fault Tolerance
- Reliability
- High Availability
- Scalability
- Economic
- Easy to Use

Open Source:

Open Source
- Source code is freely available
- Can be redistributed
- Can be modified
Benefits: free and affordable, community-backed, transparent, interoperable, no vendor lock-in.

Distributed Processing:

Distributed Processing
- Data is processed in a distributed manner on the cluster
- Multiple nodes in the cluster process data independently
This contrasts with centralized processing, where a single machine does all the work.

Fault Tolerance:

Fault Tolerance
- Failures of nodes are recovered automatically
- The framework takes care of hardware failures as well as task failures
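HDFS provides this fault tolerance largely through block replication: each block is stored on several DataNodes (three by default), so losing a node leaves other replicas intact. The minimal sketch below, using the standard FileSystem API with a hypothetical path, reads and raises a file's replication factor.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/demo/sample.txt"); // hypothetical path

        // Current replication factor for this file
        short current = fs.getFileStatus(file).getReplication();
        System.out.println("Replication factor: " + current);

        // Ask HDFS to keep one extra copy of every block of this file;
        // losing a node then still leaves the remaining replicas intact
        fs.setReplication(file, (short) (current + 1));
    }
}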

Reliability:

Reliability
- Data is reliably stored on the cluster of machines despite machine failures
- Failure of nodes doesn't cause data loss

High Availability:

High Availability
- Data is highly available and accessible despite hardware failure
- No downtime for end-user applications due to data unavailability

Scalability:

Scalability
- Vertical scalability: new hardware can be added to existing nodes
- Horizontal scalability: new nodes can be added on the fly

Economic:

Economic
- No need to purchase a costly license
- No need to purchase costly hardware
Open Source + Commodity Hardware = Economic

Easy to Use:

Easy to Use
- Distributed-computing challenges are handled by the framework
- The client just needs to concentrate on business logic

Data Locality:

Data Locality
- Move computation to the data instead of data to the computation
- Data is processed on the nodes where it is stored
In a traditional setup, data moves from storage servers to application servers for processing; in Hadoop, the algorithm moves to the data servers that hold the data.
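The locality information the scheduler relies on is visible through the FileSystem API: every block of a file reports the hosts holding its replicas, and Hadoop tries to run each map task on one of those hosts. A minimal sketch, again with a hypothetical path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus status = fs.getFileStatus(new Path("/user/demo/big.log")); // hypothetical path

        // One entry per block, listing the DataNodes holding a replica;
        // the scheduler prefers to run each map task on one of these hosts
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.println("Block at offset " + block.getOffset()
                    + " stored on: " + String.join(", ", block.getHosts()));
        }
    }
}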

Summary:

Summary
- Every day we generate 2.3 trillion GBs of data
- Hadoop handles huge volumes of data efficiently
- Hadoop uses the power of distributed computing
- HDFS, MapReduce & YARN are the main components of Hadoop
- It is highly fault tolerant, reliable & available

Thank You:

Thank You DataFlair / DataFlairWS
