An Example Hadoop Install

Presentation Description

A practical example of how Hadoop can be installed and a cluster created using low-cost hardware.

Presentation Transcript

Apache Hadoop Install Example

Using:
- Ubuntu 12.04
- Java 1.6
- Hadoop 1.2.0
- Static DNS
- 3 machine cluster

Install Step 1

- Install Ubuntu Linux 12.04 on each machine
- Assign a host name and static IP address to each machine
- Names used here:
  - hc1nn (hadoop cluster 1 name node)
  - hc1r1m1 (hadoop cluster 1 rack 1 machine 1)
  - hc1r1m2 (hadoop cluster 1 rack 1 machine 2)
- Install the ssh daemon on each server
- Install the vsftpd (ftp) daemon on each server
- Update /etc/hosts with all hostnames on each server (see the example below)
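
A minimal /etc/hosts sketch. The hostnames come from the slides; the 192.168.1.x addresses are placeholder assumptions - substitute your own static IPs:

    127.0.0.1    localhost
    # hadoop cluster nodes (example addresses)
    192.168.1.10 hc1nn
    192.168.1.11 hc1r1m1
    192.168.1.12 hc1r1m2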

Install Step 2

- Generate ssh keys for each server under the hadoop user
- Copy the keys to the hadoop account on all servers (see the sketch below)
- Install Java 1.6 (we used OpenJDK)
- Obtain the Hadoop software from hadoop.apache.org
- Unpack the Hadoop software to /usr/local
- Now consider the cluster architecture
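
A possible command sequence, run as the hadoop user on each machine; openjdk-6-jdk is the standard package name on Ubuntu 12.04, but verify it for your release:

    # generate a passphrase-less rsa key pair
    ssh-keygen -t rsa -P ""
    # copy the public key to the hadoop account on every node
    ssh-copy-id hadoop@hc1nn
    ssh-copy-id hadoop@hc1r1m1
    ssh-copy-id hadoop@hc1r1m2
    # install OpenJDK 6
    sudo apt-get install openjdk-6-jdk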

Install Step 3

- Start with three single-machine installs of Hadoop
- Then cluster the Hadoop machines

Install Step 4

- Ensure auto (passwordless) ssh:
  - from the name node (hc1nn) to both data nodes
  - from each machine to itself
- Create a symbolic link named hadoop pointing to /usr/local/hadoop-1.2.0
- In the hadoop user's .bashrc on each machine, set HADOOP_HOME and JAVA_HOME (see the sketch below)
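
A minimal sketch of the link and the .bashrc lines; the JAVA_HOME path is an assumption for 64-bit OpenJDK 6 on Ubuntu 12.04 - check where your JDK actually lives:

    # link so that config and scripts can use a version-independent path
    sudo ln -s /usr/local/hadoop-1.2.0 /usr/local/hadoop

    # append to the hadoop user's ~/.bashrc
    export HADOOP_HOME=/usr/local/hadoop
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64  # assumed path
    export PATH=$PATH:$HADOOP_HOME/bin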

Install Step 5

- Create the Hadoop tmp directory on all servers:

    sudo mkdir -p /app/hadoop/tmp
    sudo chown hadoop:hadoop /app/hadoop/tmp
    sudo chmod 750 /app/hadoop/tmp

- Set up conf/core-site.xml (on all servers), as shown on the next slide

Install Step 5 (continued)

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/app/hadoop/tmp</value>
      <description>A base for other temporary directories.</description>
    </property>

    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:54310</value>
      <description>The name of the default file system. A URI whose scheme
      and authority determine the FileSystem implementation. The uri's scheme
      determines the config property (fs.SCHEME.impl) naming the FileSystem
      implementation class. The uri's authority is used to determine the
      host, port, etc. for a filesystem.</description>
    </property>

Install Step 6

- Set up conf/mapred-site.xml (on all servers):

    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:54311</value>
      <description>The host and port that the MapReduce job tracker runs at.
      If "local", then jobs are run in-process as a single map and reduce
      task.</description>
    </property>

Install Step 7

- Set up conf/hdfs-site.xml (on all servers):

    <property>
      <name>dfs.replication</name>
      <value>1</value>
      <description>Default block replication. The actual number of
      replications can be specified when the file is created. The default is
      used if replication is not specified in create time.</description>
    </property>

Install Step 8

- Format the Hadoop file system (on all servers):

    hadoop namenode -format

  Don't do this on a running HDFS - you will lose all data!
- Now start Hadoop (on all servers):

    $HADOOP_HOME/bin/start-all.sh

- Check that Hadoop is running with:

    sudo netstat -plten | grep java

  You should see ports like 54310 and 54311 in use.
- All good? Stop Hadoop on all servers:

    $HADOOP_HOME/bin/stop-all.sh

Install Step 9

- Now set up the cluster - do this on all servers
- Set the $HADOOP_HOME/conf/masters file to contain: hc1nn
- Set the $HADOOP_HOME/conf/slaves file to contain: hc1r1m1, hc1r1m2, hc1nn
- We will be using the name node as a data node as well (see the sketch below)
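
One way to write the two files (a sketch; the heredoc style is a choice, the contents come from the slide):

    echo "hc1nn" > $HADOOP_HOME/conf/masters
    cat > $HADOOP_HOME/conf/slaves <<EOF
    hc1r1m1
    hc1r1m2
    hc1nn
    EOF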

Install Step 10

On all machines:
- Change conf/core-site.xml: fs.default.name = hdfs://hc1nn:54310
- Change conf/mapred-site.xml: mapred.job.tracker = hc1nn:54311
- Change conf/hdfs-site.xml: dfs.replication = 3

The changed property elements are shown below.
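
The changed elements, in the same XML format as the earlier slides (values taken from the list above):

    <!-- conf/core-site.xml -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://hc1nn:54310</value>
    </property>

    <!-- conf/mapred-site.xml -->
    <property>
      <name>mapred.job.tracker</name>
      <value>hc1nn:54311</value>
    </property>

    <!-- conf/hdfs-site.xml -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>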

Install Step 11

- Now reformat the HDFS on hc1nn:

    hadoop namenode -format

- On the name node, start HDFS:

    $HADOOP_HOME/bin/start-dfs.sh

- On the name node, start Map Reduce:

    $HADOOP_HOME/bin/start-mapred.sh
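
A quick sanity check (an addition, not from the slides): jps, which ships with the JDK, lists the running Java daemons on each node:

    # on hc1nn expect NameNode, SecondaryNameNode and JobTracker,
    # plus DataNode and TaskTracker since it doubles as a data node
    jps
    # on hc1r1m1 and hc1r1m2 expect DataNode and TaskTracker
    jps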

Install Step 12

- Run a test Map Reduce job
- I have data in /tmp/gutenberg
- Load the data into HDFS:

    hadoop dfs -copyFromLocal /tmp/gutenberg /usr/hadoop/gutenberg

- List the data in HDFS:

    hadoop dfs -ls /usr/hadoop/gutenberg
    Found 18 items
    -rw-r--r--  3 hadoop supergroup  674389 2013-07-30 19:31 /usr/hadoop/gutenberg/pg20417.txt
    -rw-r--r--  3 hadoop supergroup  674389 2013-07-30 19:31 /usr/hadoop/gutenberg/pg20417.txt1
    ...............
    -rw-r--r--  3 hadoop supergroup  834980 2013-07-30 19:31 /usr/hadoop/gutenberg/pg5000.txt4
    -rw-r--r--  3 hadoop supergroup  834980 2013-07-30 19:31 /usr/hadoop/gutenberg/pg5000.txt5

Install Step 13

- Run the Map Reduce job (the paths match where the data was loaded in Step 12):

    cd $HADOOP_HOME
    hadoop jar hadoop*examples*.jar wordcount /usr/hadoop/gutenberg /usr/hadoop/gutenberg-output

- Check the output:

    13/07/30 19:34:13 INFO input.FileInputFormat: Total input paths to process : 18
    13/07/30 19:34:13 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/07/30 19:34:14 INFO mapred.JobClient: Running job: job_201307301931_0001
    13/07/30 19:34:15 INFO mapred.JobClient: map 0% reduce 0%
    13/07/30 19:34:26 INFO mapred.JobClient: map 11% reduce 0%
    13/07/30 19:34:34 INFO mapred.JobClient: map 16% reduce 0%
    13/07/30 19:34:35 INFO mapred.JobClient: map 22% reduce 0%
    13/07/30 19:34:42 INFO mapred.JobClient: map 33% reduce 0%
    13/07/30 19:34:43 INFO mapred.JobClient: map 33% reduce 7%
    13/07/30 19:34:48 INFO mapred.JobClient: map 44% reduce 7%
    13/07/30 19:34:52 INFO mapred.JobClient: map 44% reduce 14%
    13/07/30 19:34:54 INFO mapred.JobClient: map 55% reduce 14%
    13/07/30 19:35:01 INFO mapred.JobClient: map 66% reduce 14%
    13/07/30 19:35:02 INFO mapred.JobClient: map 66% reduce 18%
    13/07/30 19:35:06 INFO mapred.JobClient: map 72% reduce 18%
    13/07/30 19:35:07 INFO mapred.JobClient: map 77% reduce 18%
    13/07/30 19:35:08 INFO mapred.JobClient: map 77% reduce 25%
    13/07/30 19:35:12 INFO mapred.JobClient: map 88% reduce 25%

Install Step 13 (continued)

    13/07/30 19:35:17 INFO mapred.JobClient: map 88% reduce 29%
    13/07/30 19:35:18 INFO mapred.JobClient: map 100% reduce 29%
    13/07/30 19:35:23 INFO mapred.JobClient: map 100% reduce 33%
    13/07/30 19:35:27 INFO mapred.JobClient: map 100% reduce 100%
    13/07/30 19:35:28 INFO mapred.JobClient: Job complete: job_201307301931_0001
    13/07/30 19:35:28 INFO mapred.JobClient: Counters: 29
    13/07/30 19:35:28 INFO mapred.JobClient: Job Counters
    13/07/30 19:35:28 INFO mapred.JobClient: Launched reduce tasks=1
    13/07/30 19:35:28 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=119572
    13/07/30 19:35:28 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    13/07/30 19:35:28 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    13/07/30 19:35:28 INFO mapred.JobClient: Launched map tasks=18
    13/07/30 19:35:28 INFO mapred.JobClient: Data-local map tasks=18
    13/07/30 19:35:28 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=61226
    13/07/30 19:35:28 INFO mapred.JobClient: File Output Format Counters
    13/07/30 19:35:28 INFO mapred.JobClient: Bytes Written=725257
    13/07/30 19:35:28 INFO mapred.JobClient: FileSystemCounters
    13/07/30 19:35:28 INFO mapred.JobClient: FILE_BYTES_READ=6977160
    13/07/30 19:35:28 INFO mapred.JobClient: HDFS_BYTES_READ=17600721
    13/07/30 19:35:28 INFO mapred.JobClient: FILE_BYTES_WRITTEN=14994585
    13/07/30 19:35:28 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=725257
    13/07/30 19:35:28 INFO mapred.JobClient: File Input Format Counters
    13/07/30 19:35:28 INFO mapred.JobClient: Bytes Read=17598630
    13/07/30 19:35:28 INFO mapred.JobClient: Map-Reduce Framework

Install Step 14

- Check the job output:

    hadoop dfs -ls /usr/hadoop/gutenberg-output
    Found 3 items
    -rw-r--r--  3 hadoop supergroup       0 2013-07-30 19:35 /usr/hadoop/gutenberg-output/_SUCCESS
    drwxr-xr-x  - hadoop supergroup       0 2013-07-30 19:34 /usr/hadoop/gutenberg-output/_logs
    -rw-r--r--  3 hadoop supergroup  725257 2013-07-30 19:35 /usr/hadoop/gutenberg-output/part-r-00000

- Now get the results out of HDFS:

    hadoop dfs -cat /usr/hadoop/gutenberg-output/part-r-00000 > /tmp/hrun/cluster_run.txt
    head -10 /tmp/hrun/cluster_run.txt
    "(Lo)cra" 6
    "1490 6
    "1498," 6
    "35" 6
    "40," 6
    "A 12
    "AS-IS". 6
    "A_ 6
    "Absoluti 6
    "Alack! 6

Install Step 15

Congratulations - you now have:
- a working HDFS cluster
- with three data nodes
- and one name node
- tested via a Map Reduce job

Detailed install instructions are available from our site shop.

Contact Us

Feel free to contact us at:
  www.semtech-solutions.co.nz
  info@semtech-solutions.co.nz

- We offer IT project consultancy
- We are happy to hear about your problems
- You can pay for just the hours that you need to solve your problems