Setting Up Hadoop

#1

Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is an Apache project, sponsored by the Apache Software Foundation.

This is a guide on how to install Hadoop on a Cloud9 workspace.

The first step is to create a workspace. Once your workspace is up and ready, visit the Hadoop download page and copy the link to the latest build. At the time of writing, the latest stable build was 2.6.0.

Once you’ve copied the full URL of the Hadoop tar file, go back to your workspace and download the file with wget in the terminal:

wget http://mirror.cogentco.com/pub/apache/hadoop/common/current/hadoop-2.6.0.tar.gz

This will start the download. Note that the file is around 186 MB, so it might take a couple of minutes.

Once the download finishes, go ahead and extract the tar file by running the following command:

tar xvf hadoop-2.6.0.tar.gz

This will create a new hadoop-2.6.0 folder within your workspace.

Next we need to set the JAVA_HOME environment variable in Hadoop’s configuration file. The file to edit is hadoop-2.6.0/etc/hadoop/hadoop-env.sh; just double-click it in the workspace file tree to open it in the editor.

Replace the line that sets JAVA_HOME with:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/

and also add the HADOOP_PREFIX variable:

export HADOOP_PREFIX=/home/ubuntu/workspace/hadoop-2.6.0
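
If you’re not sure the OpenJDK path above matches your workspace image (the exact directory name can vary), you can check from the terminal first. This is just a sanity check, not an official step:

ls /usr/lib/jvm/                # list the installed JVM directories
readlink -f "$(which java)"     # resolve the real path of the default java binary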

After the file has been saved, try starting Hadoop using the following commands:

cd hadoop-2.6.0/
bin/hadoop

It should print Hadoop’s usage message, listing the available commands. If you see that output, Hadoop has been installed correctly.
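
As an extra check, you can also print the version of the build you just unpacked:

bin/hadoop version    # the first line of the output should report Hadoop 2.6.0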

For further information on how to run Hadoop as a single-node cluster, please visit the Hadoop help page.
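
If you want to go on to pseudo-distributed mode, the single-node guide boils down to two small config files. Here is a minimal sketch based on the 2.6.0 documentation, assuming the hadoop-2.6.0 folder from this tutorial:

cd ~/workspace/hadoop-2.6.0

# point the default filesystem at a local HDFS namenode (value from the 2.6.0 single-node guide)
cat > etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# keep block replication at 1, since there is only a single datanode
cat > etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF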


#2

I was setting up Hadoop on a workspace in the so-called ‘pseudo-distributed’ mode. One of the requirements for this, beyond the setup you describe above, is the ability to SSH to localhost without a passphrase. I keep getting a read-only error when I try to update the authorized keys:

ubuntu@joramun-hadoop-test-3019904:~$ ssh localhost
Permission denied (publickey).
ubuntu@joramun-hadoop-test-3019904:~$ sudo cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
bash: /home/ubuntu/.ssh/authorized_keys: Read-only file system
ubuntu@joramun-hadoop-test-3019904:~$
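
For reference, this is the passphraseless-SSH recipe from the Hadoop single-node guide that I was following; the append to authorized_keys is the step that fails on the read-only file system:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa            # generate a key with an empty passphrase
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys     # fails here: ~/.ssh is on a read-only mount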


#3

Hello

I have the same problem. Has anyone managed to install Hadoop and can help?

Many thanks,
John


#4

Same problem for me.
I can’t teach on Cloud9 because I can’t start HDFS :frowning:


#5

For those of you still looking for a way to start HDFS on Cloud9, you can use the deprecated commands:

hadoop namenode
hadoop datanode

This also has the advantage of starting both daemons in interactive mode (you can see the logs).
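
In a bit more detail, here is a sketch assuming the hadoop-2.6.0 folder and pseudo-distributed config from the tutorial above; format HDFS once, then run each daemon in its own terminal tab:

cd ~/workspace/hadoop-2.6.0

bin/hdfs namenode -format    # one-time: format the HDFS namenode

# terminal tab 1:
bin/hadoop namenode          # runs the namenode in the foreground (prints a deprecation warning but still works)

# terminal tab 2:
bin/hadoop datanode          # runs the datanode in the foreground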