This article is a guide to setting up an HBase cluster. The cluster runs on local CentOS virtual machines using VirtualBox. I use this to have a local environment for development and testing.
Prerequisites
This setup guide assumes you have gone through the Hadoop Setup Guide and the Zookeeper Setup Guide.
It assumes you are using the following software versions.
- MacOS 10.11.3
- Vagrant 1.8.5
- Java 1.8.0
- Zookeeper 3.4.8
- Hadoop 2.7.3
- HBase 1.2.2
Here are the steps I used:
-
First, create a workspace.
mkdir -p ~/vagrant_boxes/hbase
cd ~/vagrant_boxes/hbase
-
Next, create a new vagrant box. I’m using a minimal CentOS vagrant box.
vagrant box add "CentOS 6.5 x86_64" https://github.com/2creatives/vagrant-centos/releases/download/v6.5.3/centos65-x86_64-20140116.box
-
We are going to create a vagrant box with the packages we need. So, first we initialize the vagrant box.
vagrant init -m "CentOS 6.5 x86_64" hbase_base
-
Next, change the Vagrantfile to the following:
Vagrant.configure(2) do |config|
  config.vm.box = "CentOS 6.5 x86_64"
  config.vm.box_url = "hbase_base"
  config.ssh.insert_key = false
end
-
Now, install HBase and its dependencies.
vagrant up
vagrant ssh
sudo yum install java-1.8.0-openjdk-devel
sudo yum install wget
wget -P ~ https://www.apache.org/dist/hbase/stable/hbase-1.2.2-bin.tar.gz
gunzip -c ~/*gz | tar xvf -
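To sanity-check the install before moving on, you can confirm Java is on the PATH and that the archive unpacked where expected (the path below assumes it extracted to ~/hbase-1.2.2).
java -version
ls ~/hbase-1.2.2/bin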
-
Open up your ~/.bash_profile and append the following lines.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
export PATH=$PATH:$JAVA_HOME/bin
export HBASE_HOME=~/hbase-1.2.2
export PATH=$PATH:$HBASE_HOME/bin
export HBASE_CONF_DIR=$HBASE_HOME/conf
-
Source the profile.
source ~/.bash_profile
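As a quick check that the profile took effect, the variables should resolve and the hbase script should report its version.
echo $JAVA_HOME
echo $HBASE_HOME
hbase version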
-
Create a ~/.ssh/config file to disable host key checking for SSH. Since these are DEV servers, this is OK. Put StrictHostKeyChecking on its own indented line under the Host entry.
Host *
    StrictHostKeyChecking no
-
Now run these commands to finish the password-less authentication.
chmod 600 ~/.ssh/config
ssh-keygen -f ~/.ssh/id_rsa -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
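To verify the password-less setup before cloning the VM, an SSH loop back to the same machine should succeed without prompting for a password (the first connection silently records the host key because of the config above).
ssh localhost hostname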
-
In /etc/hosts, remove any lines starting with 127.0.*, and add the following lines.
192.168.50.11 zoo1.example.com
192.168.50.12 zoo2.example.com
192.168.50.13 zoo3.example.com
192.168.50.14 zoo4.example.com
192.168.50.15 zoo5.example.com
192.168.50.21 hdfs-namenode.example.com
192.168.50.22 hdfs-datanode1.example.com
192.168.50.23 hdfs-datanode2.example.com
192.168.50.24 hdfs-datanode3.example.com
192.168.50.25 hdfs-datanode4.example.com
192.168.50.31 hbase-master.example.com
192.168.50.32 hbase-region1.example.com
192.168.50.33 hbase-region2.example.com
192.168.50.34 hbase-region3.example.com
192.168.50.35 hbase-region4.example.com
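The other HBase VMs don't exist yet, so you can't ping them, but you can at least confirm the names resolve from the hosts file (getent reads /etc/hosts directly).
getent hosts hbase-master.example.com
getent hosts hbase-region1.example.com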
-
In ~/hbase-1.2.2/conf/hbase-env.sh, append the following lines to the bottom of the file.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
export HBASE_MANAGES_ZK=false
-
Edit ~/hbase-1.2.2/conf/hbase-site.xml to contain the following:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hdfs-namenode.example.com:9000/hbase</value>
  </property>
  <property>
    <name>hbase.master.hostname</name>
    <value>hbase-master.example.com</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zoo1.example.com,zoo2.example.com,zoo3.example.com,zoo4.example.com,zoo5.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/tmp/zookeeper</value>
  </property>
</configuration>
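One thing to watch: the hbase.rootdir URL has to match the NameNode address and port that HDFS is actually listening on, i.e. the fs.defaultFS value from the Hadoop Setup Guide. Port 9000 here is an assumption carried over from that guide; as a reference, the matching core-site.xml entry would look something like this:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hdfs-namenode.example.com:9000</value>
</property>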
-
In ~/hbase-1.2.2/conf/regionservers, remove localhost and add the following lines:
hbase-region1.example.com
hbase-region2.example.com
hbase-region3.example.com
hbase-region4.example.com
-
The docs say you can create a "backup-masters" file in the conf directory, but I had a problem starting my cluster when I did. So, I skipped this step.
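If you do want to try it, the file is just a list of hostnames, one per line, of nodes that should run standby masters. A hypothetical example (reusing one of the region hosts purely for illustration) would be:
hbase-region1.example.com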
-
Exit the SSH session and copy the VM for the other hbase nodes.
exit
vagrant halt
vagrant package
vagrant box add hbase ~/vagrant_boxes/hbase/package.box
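You can confirm the packaged box was registered by listing the local boxes; both the original CentOS box and the new hbase box should show up.
vagrant box list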
-
Edit the Vagrantfile to look like the following. This will create 5 HBase nodes for us using the new hbase box.
Vagrant.configure("2") do |config|
  config.vm.define "hbase-master" do |node|
    node.vm.box = "hbase"
    node.vm.box_url = "hbase-master.example.com"
    node.vm.hostname = "hbase-master.example.com"
    node.vm.network :private_network, ip: "192.168.50.31"
    node.ssh.insert_key = false
    # Change hostname
    node.vm.provision "shell", inline: "hostname hbase-master.example.com", privileged: true
  end

  (1..4).each do |i|
    config.vm.define "hbase-region#{i}" do |node|
      node.vm.box = "hbase"
      node.vm.box_url = "hbase-region#{i}.example.com"
      node.vm.hostname = "hbase-region#{i}.example.com"
      node.vm.network :private_network, ip: "192.168.50.3#{i+1}"
      node.ssh.insert_key = false
      # Change hostname
      node.vm.provision "shell", inline: "hostname hbase-region#{i}.example.com", privileged: true
    end
  end
end
-
Bring the new Vagrant VMs up.
vagrant up --no-provision
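Once they finish booting, all five machines should be reported as running.
vagrant status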
-
Start HBase. For some reason, I can't start HBase from the provisioner. So, I SSH in and start it up.
vagrant provision
vagrant ssh hbase-master
~/hbase-1.2.2/bin/start-hbase.sh
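If the master doesn't come up, the logs directory is the first place to look. The exact log file name depends on the user and hostname, so the name below is only an assumption based on the defaults in this guide.
ls ~/hbase-1.2.2/logs/
tail -n 100 ~/hbase-1.2.2/logs/hbase-vagrant-master-hbase-master.example.com.log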
To test the cluster:
-
Log into the Master Server and run 'jps' on the command line. You should see at least these two processes.
jps
Jps
HMaster
-
Log into one of the Region Servers and run 'jps' on the command line. You should see at least these two processes.
jps
Jps
HRegionServer
-
Go to http://192.168.50.31:16010/ and you should see all of the Region Servers running.
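If you'd rather check from the command line on the Mac, a quick curl against the master UI port at least confirms the web server is answering (it won't list the Region Servers for you).
curl -sI http://192.168.50.31:16010/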
-
From the Master Server, start the HBase shell.
vagrant ssh hbase-master
sudo ~/hbase-1.2.2/bin/hbase shell
-
At the command prompt, you should be able to create a table.
create 'test', 'cf'
-
And you should be able to list the table.
list
-
And you should be able to put data into the table.
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'
-
And you should be able to view all the data in the table.
scan 'test'
-
Or just get one row.
get 'test', 'row1'
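When you're done experimenting, you can clean up from the same shell. A table has to be disabled before it can be dropped.
disable 'test'
drop 'test'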