Presentation Transcript

slide 1:

Top 20 Hadoop BigData Interview Questions And Answers

Here are the top 20 interview questions and answers on Big Data Hadoop. Review these important questions before going to the interview. For more details about Big Data Hadoop, visit IQ Online Training, an IT online training institute.

1. Define Big Data. What are the five V's of Big Data?
A. Big Data is a term used to describe a collection of data that is huge in size and yet growing exponentially with time. In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently. The five V's of Big Data are:
- Volume
- Velocity
- Variety
- Veracity
- Value
Big Data example: take the impact of social media. Statistics indicate that 500+ terabytes of new data are ingested into the databases of the social media site Facebook every day. This data is mainly generated through photo and video uploads, message exchanges, comments, and so on.

2. What is Hadoop and what are its components?
A. Apache Hadoop evolved as a solution when Big Data emerged as a problem. Apache Hadoop is a framework which provides various services and tools to store and process Big Data. It

slide 2:

helps in analyzing Big Data and making business decisions out of it, which cannot be done efficiently and effectively using traditional systems. Its components are:
- Storage unit: HDFS (NameNode, DataNode)
- Processing framework: YARN (ResourceManager, NodeManager)

3. What are HDFS and YARN?
A. HDFS (Hadoop Distributed File System) is the storage unit of Hadoop. It is responsible for storing different kinds of data as blocks in a distributed environment. It follows a master-slave topology.
- NameNode: the master node in the distributed environment. It maintains the metadata for the blocks of data stored in HDFS, such as block locations, replication factors, and so on.
- DataNode: the slave nodes that are responsible for storing data in HDFS. The NameNode manages all the DataNodes.
YARN (Yet Another Resource Negotiator) is the processing framework in Hadoop. It manages resources and provides an execution environment to the processes.
- ResourceManager: receives the processing requests and then passes the parts of each request to the corresponding NodeManagers, where the actual processing takes place. It allocates resources to applications based on their needs.
- NodeManager: installed on every DataNode and responsible for the execution of tasks on that single DataNode.

4. What are the various Hadoop daemons and their roles in a Hadoop cluster?
A. First, the HDFS daemons:
- NameNode
- DataNode, and

slide 3:

- Secondary NameNode
Then the YARN daemons:
- ResourceManager and
- NodeManager
and finally the JobHistoryServer.
NameNode: the master node that is responsible for storing the metadata of all the files and directories. It holds information about the blocks that make up each file and where those blocks are located in the cluster.
DataNode: the slave node that contains the actual data.
Secondary NameNode: periodically merges the changes with the FsImage (File system Image) present in the NameNode. It stores the modified FsImage in persistent storage, which can be used in case of a failure of the NameNode.
ResourceManager: the central authority that manages resources and schedules applications running on top of YARN.
NodeManager: runs on slave machines and is responsible for launching the application's containers, monitoring their resource usage (CPU, memory, disk, network), and reporting these to the ResourceManager.
JobHistoryServer: responsible for maintaining information about MapReduce jobs after the ApplicationMaster terminates.

5. What are active and passive "NameNodes"?
A. In an HA (High Availability) architecture, we have two NameNodes: an Active NameNode and a Passive NameNode.
- The Active NameNode is the NameNode that works and runs in the cluster.

slide 4:

- The Passive NameNode is a standby NameNode which holds the same data as the Active NameNode. The Passive NameNode replaces the Active NameNode in the cluster whenever the Active NameNode fails. Hence the cluster is never without a NameNode, and so it never fails.

6. What are the modes that Hadoop can be run in?
A. Hadoop can run in 3 modes:
- Standalone Mode: the default mode of Hadoop, in which it uses the local file system for input and output operations. This mode is mainly used for debugging, and it does not support the use of HDFS. Furthermore, in this mode no custom configuration is required for the mapred-site.xml, core-site.xml, and hdfs-site.xml files. It is much quicker compared to the other modes.
- Pseudo-Distributed Mode (Single-Node Cluster): in this mode you need to configure all three files mentioned above. In this case all daemons run on one node, and thus the Master and Slave nodes are the same machine.
- Fully Distributed Mode (Multi-Node Cluster): this is the production phase of Hadoop, in which data is distributed across several nodes of a Hadoop cluster. Separate nodes are allocated as Master and Slaves.

7. Mention the most common input formats in Hadoop.
A. The three most common input formats in Hadoop are as follows:
- Text Input Format: the default input format in Hadoop.
- Key Value Input Format: used for plain text files in which the files are broken into lines.
- Sequence File Input Format: used for reading files in sequence.
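The difference between the Text Input Format and the Key Value Input Format above can be sketched in plain Python. This is only an illustrative sketch of how each format produces (key, value) records; the function names are hypothetical, not Hadoop's actual Java API:

```python
def text_input_record(offset, line):
    """Like Text Input Format: key = byte offset of the line, value = the whole line."""
    return offset, line

def key_value_input_record(line, separator="\t"):
    """Like Key Value Input Format: text before the first separator is the key,
    the remainder of the line is the value."""
    key, _, value = line.partition(separator)
    return key, value

# A tab-separated line becomes a (key, value) pair:
print(key_value_input_record("user42\tclicked ad"))  # -> ('user42', 'clicked ad')
```

Note that when the separator is absent, the whole line becomes the key and the value is empty, mirroring the behavior of Hadoop's key-value text reader.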

slide 5:

8. What are the methods of a Reducer?
A. The 3 core methods of a Reducer are:
1. setup: used to configure various parameters such as input data size and the distributed cache.
   public void setup(Context context)
2. reduce: called once per key, with the values associated with that key.
   public void reduce(Key key, Iterable<Value> values, Context context)
3. cleanup: called only once, at the end of the task, to clean up temporary files.
   public void cleanup(Context context)

9. Name a few companies that use Hadoop.
A.
- Yahoo (one of the largest users; contributed more than 80% of the Hadoop code)
- Facebook
- Netflix
- Amazon
- Adobe
- eBay
- Hulu
- Spotify
- Rubikloud
- Twitter

10. What are the port numbers for NameNode, Task Tracker, and Job Tracker?
A. The port numbers for NameNode, Task Tracker, and Job Tracker are as follows:
- NameNode: 50070
- Job Tracker: 50030

slide 6:

- Task Tracker: 50060

11. Whenever a client submits a Hadoop job, who receives it?
A. The NameNode receives the Hadoop job; it then looks for the data requested by the client and provides the block information. The Job Tracker takes care of resource allocation for the Hadoop job to ensure timely completion.

12. Explain the different catalog tables in HBase.
A. There are two catalog tables in HBase:
- ROOT and
- META
ROOT table: tracks where the META table is.
META table: stores all the regions in the system.

13. What are the different types of tombstone markers in HBase for deletion?
A. There are 3 different types of tombstone markers in HBase for deletion:
- Family Delete Marker: marks all the columns of a column family.
- Version Delete Marker: marks a single version of a column.
- Column Delete Marker: marks all the versions of a column.

14. Differentiate between structured and unstructured data.
A.
Structured Data: data stored in traditional database systems in the form of rows and columns, for example online purchase transactions, is referred to as structured data.
Semi-Structured Data: data that can be stored only partially in traditional database systems, for example data in XML records, is referred to as semi-structured data.
Unstructured Data: unorganized and raw data that cannot be categorized as structured or semi-structured is referred to as unstructured data.

slide 7:

Examples of unstructured data: Facebook updates, tweets on Twitter, reviews, web logs, etc.

15. What are the main components of a Hadoop application?
A. Hadoop applications draw on a wide range of technologies that provide a great advantage in solving complex business problems. The core components of a Hadoop application are:
1. Hadoop Common
2. HDFS
3. Hadoop MapReduce
4. YARN
The data access components are Pig and Hive.
The data storage component is HBase.
The data integration components are Apache Flume, Sqoop, and Chukwa.
The data management and monitoring components are Ambari, Oozie, and ZooKeeper.
The data serialization components are Thrift and Avro.
The data intelligence components are Apache Mahout and Drill.

16. What are the steps involved in deploying a big data solution?
A. There are three steps involved in deploying a big data solution:
i. Data Ingestion: the first step is to extract data from the various sources, which could be an Enterprise Resource Planning system like SAP, a CRM like Salesforce or Siebel, an RDBMS like MySQL or Oracle, or log files, flat files, documents, images, or social media feeds. This data needs to be stored in HDFS. Data

slide 8:

can be ingested either through batch jobs that run, say, every 15 minutes or once each night, or through real-time streaming with latencies from 100 ms to 120 seconds.
ii. Data Storage: the next step after ingesting the data is to store it either in HDFS or in a NoSQL database like HBase. HBase storage works well for random read/write access, while HDFS is optimized for sequential access.
iii. Data Processing: the final step is to process the data using one of the processing frameworks such as MapReduce, Spark, Pig, Hive, and so on.

17. How do you define a "block" in HDFS? What is the default block size in Hadoop 1 and in Hadoop 2? Can it be changed?
A. A block is defined as the smallest location on the hard drive where data is stored. HDFS stores each file as blocks and distributes them across the Hadoop cluster. Files in HDFS are chopped into block-sized chunks, which are stored as independent units.
- Default block size in Hadoop 1: 64 MB
- Default block size in Hadoop 2: 128 MB
Yes, the block size can be changed (via the dfs.blocksize property in hdfs-site.xml).

18. How do you define "Rack Awareness" in Hadoop?
A. Rack Awareness is the algorithm by which the "NameNode" decides how blocks and their replicas are placed, based on rack definitions, to minimize network traffic between "DataNodes" in the same rack. Say we consider a replication factor of 3 (the default); the policy is that "for every block of data, two copies will exist in one rack and the third copy in a different rack". This rule is known as the "Replica Placement Policy".

"YOUR DREAM JOB IS AWAITING YOU" ENROLL NOW
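The Replica Placement Policy from question 18 can be sketched in plain Python. This is an illustrative simulation with made-up rack names, not HDFS's actual implementation: one replica goes on the writer's own rack and the remaining two go together on a single different rack, so no single rack failure loses every replica.

```python
import random

def place_replicas(writer_rack, all_racks, replication=3):
    """Sketch of the default policy for replication factor 3:
    one copy on the writer's rack, the other copies together on one other rack."""
    placements = [writer_rack]                       # first replica: local rack
    remote_rack = random.choice([r for r in all_racks if r != writer_rack])
    placements += [remote_rack] * (replication - 1)  # remaining replicas: one remote rack
    return placements

# Example: writing from rack1 in a three-rack cluster
print(place_replicas("rack1", ["rack1", "rack2", "rack3"]))
```

With this placement, a read can usually be served from the local rack, while the two co-located remote copies keep inter-rack write traffic down, which is the trade-off the answer above describes.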
