Hadoop is supported on GNU/Linux platforms and their flavors, so we have to install a Linux operating system to set up the Hadoop environment. If you have an OS other than Linux, you can install VirtualBox and run Linux inside it.

In this tutorial we will be using CentOS 7.0 as the operating system.

Before installing Hadoop into the Linux environment, we need to set up Linux with ssh (Secure Shell). Follow the steps given below to set up the Linux environment.

Repeat these steps on all nodes:

Installing Java

Java is the main prerequisite for Hadoop. First of all, verify that Java exists on your system using the command “java -version”. The syntax of the command is given below.

$ java -version

If Java is already installed, you will see output similar to the following.

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

If Java is not installed on your system, follow the steps given below to install it.

Step 1

Download Java (jdk-8u131-linux-x64.tar.gz) from the following link:
http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz

$ wget http://download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.tar.gz

Then jdk-8u131-linux-x64.tar.gz will be downloaded into your system.

Step 2

Generally you will find the downloaded file in the Downloads folder. Verify it and extract the jdk-8u131-linux-x64.tar.gz file using the following commands.

$ cd Downloads/
$ ls
jdk-8u131-linux-x64.tar.gz
$ tar zxf jdk-8u131-linux-x64.tar.gz

Step 3

To make Java available to all users, you have to move it to the location “/usr/local/”. Switch to the root user and type the following commands.

$ su
password:
# mv jdk1.8.0_131 /usr/local/
# exit

Step 4

To set up the PATH and JAVA_HOME variables, add the following lines to the ~/.bashrc file.

export JAVA_HOME=/usr/local/jdk1.8.0_131
export PATH=$PATH:$JAVA_HOME/bin

Now apply the changes to the currently running shell.

$ source ~/.bashrc
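To confirm that the variables are set in the current shell, you can echo JAVA_HOME and re-run the version check (this assumes the JDK was moved to /usr/local/jdk1.8.0_131 as in Step 3):

$ echo $JAVA_HOME
/usr/local/jdk1.8.0_131
$ java -version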

Creating a User

Open the root user using su.

Add a user using useradd username.

Switch to the newly created user using su username.

# su
# useradd hduser
# su hduser
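Note that on CentOS useradd does not set a password, so set one for hduser before trying to log in as that user:

# passwd hduser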

Mapping the Nodes

First of all, we have to edit the hosts file in the /etc/ folder on all nodes and specify the IP address of each system followed by its host name.

# vi /etc/hosts

Enter the following lines in the /etc/hosts file, replacing each address with the actual IP address of that node.

192.168.1.xxx hadoop-master
192.168.1.xxx hadoop-slave-1
192.168.56.xxx hadoop-slave-2
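To verify the mapping, you can ping each node by its new hostname from every machine (assuming the IP addresses you entered are correct and reachable):

$ ping -c 1 hadoop-master
$ ping -c 1 hadoop-slave-1
$ ping -c 1 hadoop-slave-2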

Passwordless Login Through ssh

Then we need to set up passwordless ssh login. For this, we need to configure key-based login.

Set up ssh on every node so that the nodes can communicate with one another without being prompted for a password.

# su hduser
$ ssh-keygen -t rsa
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@hadoop-master
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@hadoop-slave-1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub hduser@hadoop-slave-2

Note: the .ssh folder should have permission 700, authorized_keys should have 644, and hduser's home directory should have 755, on both master and slaves. (This is very important, as I wasted a lot of time trying to figure this out.)
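The following commands set these permissions; run them as hduser on every node (the home directory /home/hduser is an assumption, adjust if yours differs):

$ chmod 755 /home/hduser
$ chmod 700 ~/.ssh
$ chmod 644 ~/.ssh/authorized_keys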


Downloading Hadoop

Download and extract Hadoop 2.8.0 from the Apache Software Foundation using the following commands.

$ su
password:
# cd /usr/local
# wget http://archive.apache.org/dist/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz
# tar xzf hadoop-2.8.0.tar.gz
# mv hadoop-2.8.0  hadoop
# chown -R hduser hadoop
# exit
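As a quick sanity check, you can ask the extracted distribution for its version. The full path is used because HADOOP_HOME is not set yet; this also assumes JAVA_HOME is set in the current shell as described above:

$ /usr/local/hadoop/bin/hadoop version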

Configuring Hadoop

  • Set $HADOOP_HOME in bashrc as:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
  • Create a directory named hadoop_data in the /opt folder and a directory named dfs in $HADOOP_HOME
  • Inside dfs create a directory called name, and inside name create a directory named data
  • The permissions for name and dfs should be 777
  • Make sure that the hadoop_data folder in /opt is owned by hduser and its permissions are 777; the commands after this list sketch the whole directory setup
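A rough sketch of the directory setup described above, run as root (the paths match the configuration files below):

# mkdir /opt/hadoop_data
# chown hduser /opt/hadoop_data
# chmod 777 /opt/hadoop_data
# mkdir -p /usr/local/hadoop/dfs/name/data
# chown -R hduser /usr/local/hadoop/dfs
# chmod -R 777 /usr/local/hadoop/dfs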
Your core-site.xml file should look like:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop_data</value>
    <description>directory for hadoop data</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:54311</value>
    <description>data to be put on this URI</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:54311</value>
    <description>Use HDFS as file storage engine</description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

Your hdfs-site.xml file should look like:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/dfs/name/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/dfs/name</value>
    <final>true</final>
  </property>
</configuration>

Your mapred-site.xml should look like:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop-master:9001</value>
  </property>
</configuration>
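Note that mapred.job.tracker is a Hadoop 1.x property. On Hadoop 2.x with YARN, the standard way to make MapReduce jobs run on YARN is to also add the following property inside the same mapred-site.xml configuration block (shown here as a suggested addition; the steps below do not depend on it):

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>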
Your yarn-site.xml should look like:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-master:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-master:8033</value>
  </property>
</configuration>

Now, set JAVA_HOME in $HADOOP_HOME/etc/hadoop/hadoop-env.sh so that it points to the JDK installed earlier.

export JAVA_HOME=/usr/local/jdk1.8.0_131

Next, on the master node, list the slave hostnames in the $HADOOP_HOME/etc/hadoop/slaves file:

hadoop-slave-1
hadoop-slave-2

Remove the localhost entry from this file.
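At this point the cluster is configured. As a minimal sanity check (run as hduser on the master; note that formatting the NameNode erases any existing HDFS data), you can format HDFS, start the daemons, and run jps, which should list NameNode and ResourceManager on the master and DataNode and NodeManager on the slaves:

$ hdfs namenode -format
$ /usr/local/hadoop/sbin/start-dfs.sh
$ /usr/local/hadoop/sbin/start-yarn.sh
$ jps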

If you face any problem with the above steps, do leave your query in the comment section below.
