Welcome to the World of Big Data Hadoop

Agenda
• What is Big Data
• Different Kinds of Big Data
• Big Data Global Market
• Hadoop Global Job Trends
• What is Hadoop

What is Big Data
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

Types of Big Data
A traditional RDBMS deals only with structured data. A technology is needed that handles semi-structured and unstructured data as well as structured data:
• Structured data
• Semi-structured data
• Unstructured data
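As a rough illustration of the three kinds of data named on this slide, the sketch below reads one tiny sample of each. The sample records (names, cities, tags, the review sentence) are hypothetical and chosen only for illustration. CSV stands in for structured data, JSON for semi-structured data, and free text for unstructured data.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical sample).
structured_text = "id,name,city\n1,Asha,Pune\n"
structured_rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: self-describing fields that may vary per record (hypothetical sample).
semi_structured_text = '{"id": 2, "name": "Ravi", "tags": ["hadoop", "pig"]}'
semi_structured_record = json.loads(semi_structured_text)

# Unstructured: free text with no schema; here it is only split into tokens.
unstructured_text = "Loved the Hadoop class today, learned a lot about HDFS!"
unstructured_tokens = unstructured_text.split()

print(structured_rows[0]["city"])
print(semi_structured_record["tags"])
print(len(unstructured_tokens))
```

A relational table handles the first sample directly, while the second and third need a schema to be imposed at read time, which is the gap the slide points to.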

The 3V’s of Big Data

Sources of Data
• Social Media Networks: all of us are generating data
• Mobile Devices: tracking all the objects all the time
• Sensor Technology Networks: measuring all kinds of data
• Scientific Instruments: collecting all sorts of data

Where Big Data is used

Facebook Scenario
Facebook on average generates 70 thousand MB in 1 minute.
• 1 hour: 70,000 MB × 60 = 4.2 million MB
• 1 day: 4.2 million MB × 24 = 100.8 million MB ≈ 98,438 GB
• 1 week: 98,438 GB × 7 ≈ 689 thousand GB ≈ 689 TB
• 4 weeks: 689 TB × 4 ≈ 2,756 TB ≈ 2.76 PB
• 52 weeks: 689 TB × 52 ≈ 35.8 PB
And that's a lot of data.
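The arithmetic on this slide can be re-derived with a short sketch. It starts from the slide's figure of 70,000 MB per minute. The unit conversions are assumptions that mirror the slide's rounding (1024 MB per GB, then 1000 GB per TB and 1000 TB per PB), and the yearly figure is taken as 52 times the weekly figure. The function name `data_volumes` is hypothetical.

```python
# Figure taken from the slide: 70,000 MB generated per minute.
MB_PER_MINUTE = 70_000

# Assumed conversions, mirroring the slide's rounding.
MB_PER_GB = 1024
GB_PER_TB = 1000
TB_PER_PB = 1000


def data_volumes(mb_per_minute=MB_PER_MINUTE):
    """Scale a per-minute rate up to hourly, daily, weekly and yearly volumes."""
    mb_per_hour = mb_per_minute * 60
    mb_per_day = mb_per_hour * 24
    gb_per_day = mb_per_day / MB_PER_GB
    tb_per_week = gb_per_day * 7 / GB_PER_TB
    tb_per_4_weeks = tb_per_week * 4
    pb_per_52_weeks = tb_per_week * 52 / TB_PER_PB
    return {
        "mb_per_hour": mb_per_hour,
        "mb_per_day": mb_per_day,
        "gb_per_day": gb_per_day,
        "tb_per_week": tb_per_week,
        "tb_per_4_weeks": tb_per_4_weeks,
        "pb_per_52_weeks": pb_per_52_weeks,
    }


if __name__ == "__main__":
    for name, value in data_volumes().items():
        print(f"{name}: {value:,.2f}")
```

Changing the conversion constants to all-binary (1024) or all-decimal (1000) units shifts the results by a few percent, which is why the figures are best read as order-of-magnitude estimates.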

Various Bigdata Technologies

Big Data Global Market
[Chart: Big Data Implementation (Implemented Big Data vs. Yet to Implement Big Data)]
[Chart: Big Data Growth in USD Billions, 2012 to 2017]
[Chart: Filled/Vacancy by role, shown as Filled / Unfilled]
• Big Data Analyst: 50 / 50
• Big Data Architect: 43 / 57
• Big Data Engineer: 44 / 56
• Big Data Research Analyst: 31 / 69
• Big Data Visualizer: 23 / 77
• Data Scientist: 18 / 82
Sources: Dice, LinkedIn.

Hadoop Global Job Trends
[Figure: Top Hadoop Technology Companies]
More than 17,000 employees with Hadoop skills across these companies.
Sources: Dice, LinkedIn.

Hadoop Global Job Trends
[Chart: Demand for Big Data in Cities, as of February 2014]
[Chart: Salary, USD per annum, in thousands]
Sources: Dice, LinkedIn.

What is Hadoop
Hadoop was created by Doug Cutting and Mike Cafarella. It provides a reliable shared storage and analysis system, and it is designed to scale from a single server to thousands of machines with a high degree of fault tolerance.

Hadoop History

Hadoop Core Components
Core Hadoop has two main systems:
• Hadoop Distributed File System (HDFS): a distributed file system that stores large amounts of data across multiple nodes in a cluster.
• MapReduce: a distributed programming paradigm used to analyze the data stored in HDFS.

Hadoop Distributed File System (HDFS)
• A given file is broken down into blocks (default 64 MB), and the blocks are then replicated across the cluster (default replication factor 3).
• Optimized for throughput.
• HDFS allows you to put, get, and delete files.
• Follows the philosophy "Write Once and Read Multiple Times".
• Block replication provides durability, high availability, and throughput.
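The block splitting and replication described on this slide can be pictured with a toy model. This is a minimal sketch, not HDFS code. The function names `split_into_blocks` and `place_replicas`, the node names, and the round-robin placement are assumptions made for illustration. Real HDFS chooses replica locations with its own placement policy, which this sketch does not reproduce.

```python
BLOCK_SIZE_MB = 64   # default block size stated on the slide
REPLICATION = 3      # default replication factor stated on the slide


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of each block; only the last block may be smaller."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full_blocks + ([remainder] if remainder else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Toy round-robin placement: each block lands on `replication` distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(200)                      # a hypothetical 200 MB file
    placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
    print(blocks)
    for block, holders in placement.items():
        print(block, holders)
    print("raw storage used (MB):", sum(blocks) * REPLICATION)
```

Because every block lives on three different nodes in this model, losing any single node still leaves two copies of each block, which is the durability and availability benefit the slide lists.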

MapReduce Flow

MapReduce Framework
MapReduce works by breaking the processing into two phases: the Map phase and the Reduce phase.
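The two phases can be imitated in a few lines of plain Python using the classic word-count task. This is a single-process sketch of the idea, not the Hadoop API. The function names are hypothetical, and the `shuffle` step stands in for the grouping by key that the framework performs between the Map phase and the Reduce phase.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and each reduce call receives all values for its key regardless of which node produced them.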

Syllabus
• Introduction: a) Big Data, b) Hadoop
• Hadoop: a) HDFS, b) MapReduce
• Pig: a) Pig 1, b) Pig 2
• Hive: a) Hive 1, b) Hive 2
• HBase
• ZooKeeper
• Sqoop
• YARN
• Project Class

