Installation of Hortonworks Data Platform (HDP)

Steps to Install Hortonworks (HDP) on 2 Nodes Using CentOS

  1. Pre-requisites
    1. Hadoop can be installed on the following operating systems
      1. Red Hat Enterprise Linux (RHEL) v6.x
      2. Red Hat Enterprise Linux (RHEL) v5.x (deprecated)
      3. CentOS v6.x
      4. CentOS v5.x (deprecated)
      5. Oracle Linux v6.x
      6. Oracle Linux v5.x (deprecated)
      7. SUSE Linux Enterprise Server (SLES) v11, SP1 and SP3
      8. Ubuntu Precise v12.04
    2. Ensure that each node has a fully qualified domain name (full hostname).
      Command to check the hostname: "hostname -f"
    3. Ensure that you have the following Linux software packages:
      1. yum and rpm (RHEL/CentOS/Oracle Linux)
      2. scp, curl, unzip, tar, and wget
      3. OpenSSL (v1.01, build 16 or later)
      4. python v2.6
      Ensure that Java is installed. Run the command "yum install java-1.7.0-openjdk"
    4. Database & Memory requirements - Ambari requires a relational database to store information about the cluster configuration and topology. If you install HDP Stack with Hive or Oozie, they also require a relational database.
      1. Ambari: By default, Ambari will install an instance of PostgreSQL on the Ambari Server host.
      2. Hive: By default (on RHEL/CentOS/Oracle Linux 6), Ambari will install an instance of MySQL on the Hive Metastore host.
      3. Oozie: By default, Ambari will install an instance of Derby on the Oozie Server host. You can also use an existing instance of PostgreSQL, MySQL, or Oracle. For the Ambari database, if you use an existing Oracle database, make sure the Oracle listener runs on a port other than 8080 to avoid a conflict with the default Ambari port.
      Also ensure that you have at least 8 GB of RAM on each host, and that at least 1 GB of RAM remains available.
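The prerequisite checks above can be scripted. The sketch below is our own addition, not part of the official procedure: it reports any missing client tools from the list and warns when "hostname -f" does not return a fully qualified name (the is_fqdn helper is a hypothetical name):

```shell
#!/bin/sh
# Minimal pre-flight check for the prerequisites listed above (a sketch).

# is_fqdn NAME -> exit 0 if NAME looks like a fully qualified domain name
# (at least one dot; labels of letters, digits, and interior hyphens only)
is_fqdn() {
    echo "$1" | grep -Eq '^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?)+$'
}

# Report any missing client tools from the prerequisite list
for tool in scp curl unzip tar wget; do
    command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done

# Warn when `hostname -f` is not an FQDN
if is_fqdn "$(hostname -f)"; then
    echo "hostname OK: $(hostname -f)"
else
    echo "WARNING: hostname is not fully qualified; fix /etc/hosts"
fi
```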
  2. Hosts Preparation:

      Host Name        IP Address
      Node 1
      Node 2

    On NODE1:

    1. Checking the Kernel Version & CentOS Release Version

      Step 1:
      Ensure the installed CentOS is 64-bit using the command

      # uname -r

      (on a 64-bit system the kernel release string ends in x86_64)

      Ensure the installed operating system is CentOS 6.7 using the command

      # cat /etc/redhat-release
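A scripted version of these two checks, as a sketch: "uname -m" is the most direct 64-bit test (it prints x86_64), and the release file on CentOS is spelled /etc/redhat-release.

```shell
# Quick architecture and OS-release check for a CentOS node.
arch="$(uname -m)"
if [ "$arch" = "x86_64" ]; then
    echo "64-bit kernel OK"
else
    echo "WARNING: architecture is $arch, not x86_64"
fi

if [ -f /etc/redhat-release ]; then
    cat /etc/redhat-release     # e.g. "CentOS release 6.7 (Final)"
else
    echo "WARNING: /etc/redhat-release not found (not a RHEL/CentOS host?)"
fi
```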

  3. Disabling iptables & SELinux

    Configuring iptables: During cluster installation we need to ensure that connections between the cluster hosts are open, without any restrictions.

    To achieve this, turn off iptables (the firewall).

    The easiest way is to disable the iptables services, as follows:

    # service iptables stop
    # service ip6tables stop
    # chkconfig iptables off
    # chkconfig ip6tables off

    Disable SELinux and PackageKit. For this, the following file needs to be changed:

    # vi /etc/sysconfig/selinux

    To permanently disable SELinux, set SELINUX=disabled.
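The edit can also be scripted. On a real node you would run the sed below against /etc/sysconfig/selinux as root, and additionally run "setenforce 0" to turn SELinux off immediately, without a reboot. The sketch demonstrates the substitution on a throwaway copy of the file:

```shell
# Demonstrate the SELINUX=disabled edit on a temporary sample file
# (substitute /etc/sysconfig/selinux on a real node, as root).
cfg="$(mktemp)"
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"

# Rewrite the SELINUX= line to disable SELinux permanently
sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"

grep '^SELINUX=' "$cfg"    # prints: SELINUX=disabled
rm -f "$cfg"
```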

  4. Adding Host Names to Cluster Nodes

    Step 4:
    Ensure that the nodes have fully qualified domain names (full hostnames).
    Use the command "hostname -f" to check the full hostname or FQDN.

    Use the /etc/hosts file to add both node hostnames, hdp1 and hdp2.
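For example, the /etc/hosts entries might look like the following. The IP addresses and the example.com domain are placeholders; substitute the real addresses and domain of your two nodes:

```
192.168.1.101   hdp1.example.com   hdp1
192.168.1.102   hdp2.example.com   hdp2
```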

    On NODE2:

    Follow steps 1 to 4 on Node2 as shown above.

    Once all the steps are completed, ensure that the nodes can communicate with each other by running the command below on both nodes, in each direction.

    On Node1 issue the command


    On Node 2 issue the command
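A common connectivity check (assuming the hdp1/hdp2 names added to /etc/hosts above) is a simple ping in each direction:

```shell
# On node1 (hdp2 is the example name from /etc/hosts):
ping -c 3 hdp2

# On node2, the reverse:
ping -c 3 hdp1
```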

  5. SSH Configuration On Cluster Nodes

    Create password-less SSH. To have Ambari Server automatically install Ambari Agents on all your cluster hosts, you must set up password-less SSH connections between the Ambari Server host and all other hosts in the cluster. The Ambari Server host uses SSH public key authentication to remotely access and install the Ambari Agent.

    Generate public and private SSH keys on the Ambari Server host with the command "ssh-keygen". The keys are created under .ssh/ (private key .ssh/id_rsa).

    Copy the SSH public key to the root account on your target hosts:

    # ssh-copy-id -i $HOME/.ssh/
    # ssh-copy-id -i $HOME/.ssh/

    Test the SSH connection to the host; it should not prompt for a password.


    and also connect to node2 using ssh without password
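Putting the node1 side together, the sequence is roughly as follows. The ssh-copy-id paths above are truncated; they would normally point at the public key file, and hdp1/hdp2 are the example hostnames from /etc/hosts, so adjust both to your setup:

```shell
# Generate the key pair (accept the defaults; keys land in ~/.ssh/)
ssh-keygen -t rsa

# Install the public key on both hosts' root accounts
ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@hdp1
ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@hdp2

# Should log in and run the command without a password prompt
ssh root@hdp2 hostname
```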


    The same needs to be performed on Node2:

    # ssh-keygen -t rsa
    # ssh-copy-id -i $HOME/.ssh/
    # ssh-copy-id -i $HOME/.ssh/

    Test the SSH connection to the host; it should not prompt for a password.


    And also connect to node1 using ssh without a password.

  6. Installation of httpd & ntp Packages

    Step 7:
    Configure an HTTP server. Install httpd on both node1 and node2 using the command

    # yum install httpd

    The HTTP server is required for web-browser access.

    Enable NTP on Node1 and on Node2:

    # yum install ntp
    # yum install ntp
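Installing the package alone is not quite enough on CentOS 6: the daemon should also be started and enabled at boot, since clock skew between nodes causes problems in a Hadoop cluster. A likely follow-up on both nodes:

```shell
# Start the NTP daemon now and enable it across reboots
service ntpd start
chkconfig ntpd on
```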
  7. Download Ambari and HDP Repositories for Cluster Installation

    Step 8:

    Setting up a local repository: If your cluster is behind a firewall that prevents or limits Internet access, you can install Ambari and a Stack using local repositories; otherwise, you can access the Hortonworks repositories over the Internet.

    Obtaining the repositories:

    Ambari repository: If you do not have Internet access for setting up the Ambari repository, use the link appropriate for your OS family to download a tarball that contains the software. With Internet access, download the repo file directly:

    # wget -nv -O /etc/yum.repos.d/ambari.repo

    HDP Stack repositories: If you do not have Internet access to set up the Stack repositories, use the link appropriate for your OS family to download a tarball that contains the HDP Stack version you plan to install. With Internet access:

    # wget -nv -O /etc/yum.repos.d/HDP.repo

  8. Installation and Configuration of Ambari Server

    Step-9: Installing Ambari
    Issue the command "yum install ambari-server" on node1.

    Once Ambari is installed, we need to run the setup using the command:

    #ambari-server setup

    Select a JDK version to download. Enter 1 to download Oracle JDK 1.7. By default, Ambari Server setup downloads and installs Oracle JDK 1.7 along with the accompanying Java Cryptography Extension (JCE) Policy Files.

    Select n at "Enter advanced database configuration" to use the default, embedded PostgreSQL database for Ambari. The default PostgreSQL database name is ambari. The default user name and password are ambari/bigdata. Otherwise, to use an existing PostgreSQL, MySQL, or Oracle database with Ambari, select y.

    Once the setup is completed, start the Ambari Server by running the following commands on the Ambari Server host.

    To start the server:

    #ambari-server start

    To stop the Ambari Server:

    #ambari-server stop

    To know status:

    #ambari-server status

    Installing, Configuring, and Deploying an HDP Cluster

    We will use the Ambari Install Wizard running in your browser to install, configure, and deploy our cluster.

    Log In to Apache Ambari

    After starting the Ambari service, open Ambari Web using a web browser: point it to http://<ambari-server-host>:8080, where <ambari-server-host> is the name of your Ambari Server host.

    Log in to the Ambari Server using the default user name/password: admin/admin.

    You can change these credentials later.
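As an optional sanity check from the shell, the standard Ambari REST API can confirm that the server is answering and that the credentials work. Replace <ambari-server-host> with the name of your Ambari Server host; -u passes the default admin/admin credentials:

```shell
# Lists the clusters Ambari knows about (empty until one is deployed)
curl -u admin:admin http://<ambari-server-host>:8080/api/v1/clusters
```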

  9. Post Installation using Ambari Server UI (User Interface)


  10. Step-11: From the Ambari Welcome page, press "Launch Install Wizard" to create a new cluster.

  11. Step-12: In "Name your cluster", type a name for the cluster you want to create. The name must not contain white space or special characters.

    Give the cluster a name of your own and press "Next".

  12. Step-13: The Service Stack (the Stack) is a coordinated and tested set of HDP components. Use a radio button to select the Stack version you want to install. To install an HDP 2.x stack, select the HDP 2.2, HDP 2.1, or HDP 2.0 radio button.

    Press "Next".

  13. Step-14: Provide the fully qualified domain names of the hosts here for cluster creation.

    We also need to access the private key file created earlier between the hosts.

    To retrieve it, open node1 with WinSCP and copy the id_rsa file (the private key) to your desktop.

    In order to build up the cluster, the install wizard prompts you for general information about how you want to set it up. You need to supply the FQDN of each of your hosts. The wizard also needs to access the private key file you created earlier. Using the host names and key file information, the wizard can locate, access, and interact securely with all hosts in the cluster. If you want to let Ambari automatically install the Ambari Agent on all your hosts using SSH, select Provide your SSH Private Key and either use the Choose File button in the Host Registration Information section to find the private key file that matches the public key you installed earlier on all your hosts or cut and paste the key into the text box manually.

    Then press “Register and Confirm”.

  14. Step-15: Confirm Hosts

    Here the wizard checks SSH authentication with the provided private key.

  15. Step-16: Choose Services you want to deploy

  16. Step-17:
    The Ambari install wizard assigns the master components for selected services to appropriate hosts in your cluster and displays the assignments in Assign Masters.

    HiveServer2, the Hive Metastore, and WebHCat should be hosted on the same machine. Here we hosted them on node1.

  17. Step-18:
    The Ambari installation wizard assigns the slave components (DataNodes, NodeManagers, and RegionServers) to appropriate hosts in your cluster. It also attempts to select hosts for installing the appropriate set of clients.

    All the clients were installed on node2.

  18. Step-19:
    Customize Services: here we need to provide usernames and passwords in the required fields.

    We also need to provide a password and an email ID for the Nagios service.

    Review all the settings and services before starting the installation.

  19. Step-20:
    Start the installation by clicking the Deploy button.

  20. Step-21:

    Here, the dashboard shows that all services are up and running.

    This completes our Hadoop Server Cluster deployment.

    Note: Difference between single-node and multi-node installation

    In a single-node HDP installation, all Hadoop daemons run on a single node, each in a separate JVM. In a multi-node installation, the Hadoop daemons are distributed across multiple nodes. For example, here we have 2 nodes, and the daemons are distributed between them based on our hardware requirements.
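One way to see this split on a running cluster is "jps" (shipped with the JDK), which lists the Java daemons on the node where it is run; the set of names differs between the two nodes:

```shell
# Run on each node. Master daemons (e.g. NameNode, ResourceManager)
# appear on node1, worker daemons (e.g. DataNode, NodeManager) on node2;
# the exact split depends on the assignments chosen in the wizard.
jps
```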



