Introduction to Apache Ambari
Apache Ambari provides a highly interactive dashboard that allows administrators to visualize the progress and status of every application running on the Hadoop cluster.
Its flexible and scalable user interface allows a range of tools such as Pig, MapReduce, and Hive to be installed on the cluster, and administers their performance in a user-friendly fashion. Some of its key features are:
- Instantaneous insight into the health of the Hadoop cluster using pre-configured operational metrics
- User-friendly configuration with an easy, step-by-step installation guide
- Installation available through the Hortonworks Data Platform (HDP)
- Dependency and performance monitoring by visualizing and analyzing jobs and tasks
- Authentication, authorization, and auditing through Kerberos-enabled Hadoop clusters
- Flexible and adaptive technology that fits well in the enterprise environment
How is Ambari different from ZooKeeper?
This description may be confusing, as ZooKeeper performs a similar kind of task. Looked at closely, however, there is a significant difference between the tasks performed by these two technologies. The following comparison will give you a clearer idea:
| Basis of Difference | Apache Ambari | Apache ZooKeeper |
| --- | --- | --- |
| Basic task | Monitoring, provisioning, and managing the Hadoop cluster | Maintaining configuration information, naming, and synchronizing the cluster |
| Nature | Web interface | Open-source server |
| Status maintenance | Status maintained through APIs | Status maintained through znodes |
Therefore, while these tasks may seem similar from a bird's-eye view, the two technologies actually perform different tasks on the same Hadoop cluster, making it agile, responsive, scalable, and fault-tolerant. As an Apache Ambari administrator, you will create and manage Ambari users and groups. You can also import users and groups from LDAP systems into Ambari.
How did Apache Ambari come into existence?
The genesis of Apache Ambari traces back to the emergence of Hadoop, when its distributed and scalable computing took the world by storm. More and more technologies were incorporated into the existing infrastructure. As Hadoop matured, it gradually became difficult to maintain the cluster's many nodes and applications simultaneously. That is when this technology came into the picture to make distributed computing easier.
Currently, it is one of the leading projects run under the Apache Software Foundation.
Installing Apache Ambari
To build up the cluster, the install wizard needs general information about how you want to set it up, for which you should supply the FQDN (fully qualified domain name) of each of your hosts.
Additionally, the wizard needs access to the private key file you created in Set Up Password-less SSH. It uses this key to locate all the hosts in the system and to access and interact with them securely.
1. Use the Target Hosts text box to enter your list of host names, one per line.
2. Select Provide your SSH Private Key if you want Ambari to automatically install the Ambari Agent on all your hosts using SSH. In the Host Registration Information section, use the Choose File button to find the private key file that matches the public key you installed earlier on all your hosts. Alternatively, you can cut and paste the key into the text box manually.
3. Select Perform manual registration if you do not wish to have Ambari automatically install the Ambari Agents.
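Since the wizard expects one fully qualified domain name per line in the Target Hosts box, it can help to sanity-check the list before pasting it in. The helper below is a hypothetical sketch (the function name and the sample hosts are illustrative, not part of Ambari):

```python
import re

# Hypothetical helper: validate a pasted list of target hosts (one FQDN per
# line) before entering it into the wizard's Target Hosts text box.
FQDN_RE = re.compile(
    r"^(?=.{1,253}$)([a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+"
    r"[a-zA-Z]{2,63}$"
)

def validate_target_hosts(text: str) -> list[str]:
    """Return the cleaned host list; raise if any entry is not an FQDN."""
    hosts = [line.strip() for line in text.splitlines() if line.strip()]
    bad = [h for h in hosts if not FQDN_RE.match(h)]
    if bad:
        raise ValueError(f"not fully qualified: {bad}")
    return hosts

print(validate_target_hosts("master1.example.com\nworker1.example.com\n"))
```

A short host name like `worker1` would be rejected here, which mirrors the wizard's requirement for fully qualified names.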
Apache Ambari architecture
Ambari provides intuitive REST APIs that automate operations in the Hadoop cluster. Its consistent and secure interface allows fairly efficient operational control, and its easy, user-friendly interface efficiently diagnoses the health of the Hadoop cluster using an interactive dashboard.
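As a concrete illustration, the sketch below builds an authenticated request to `/api/v1/clusters`, Ambari's standard top-level REST endpoint. The server address and the `admin`/`admin` credentials are placeholders (assumptions, not your real deployment values), and the request is only constructed, not sent:

```python
import base64
import urllib.request

AMBARI = "http://ambari.example.com:8080"   # assumption: default port 8080
USER, PASSWORD = "admin", "admin"           # assumption: default credentials

def ambari_request(path: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated Ambari REST API request."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    req = urllib.request.Request(AMBARI + path)
    req.add_header("Authorization", "Basic " + token)
    # Ambari requires this header on state-changing calls; harmless on GETs.
    req.add_header("X-Requested-By", "ambari")
    return req

req = ambari_request("/api/v1/clusters")
print(req.full_url)
# Sending it would be: urllib.request.urlopen(req) -- needs a live Ambari server.
```

The same pattern applies to any other endpoint, such as per-service status or host resources.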
To have a better understanding of how Ambari works, let’s look at the detailed architecture of Ambari, in the following diagram:
Apache Ambari follows a master/slave architecture where the master node instructs the slave nodes to perform certain actions and report back the state of every action. The master node is responsible for keeping track of the state of the infrastructure. To do this, the master node uses a database server, which can be configured during setup time.
At its core, Apache Ambari consists of the following applications:
- Ambari Server
- Ambari Agent
- Ambari Web UI
1. Ambari server
The entry point for all administrative activities on the master server is known as the Ambari Server. It is a shell script that internally invokes the Python code, ambari-server.py, and routes all requests to it.
The Ambari Server consists of several entry points that are available when different parameters are passed to the ambari-server program, such as:
- Daemon management
- Software upgrade
- Software setup
- LDAP (Lightweight Directory Access Protocol) / PAM (Pluggable Authentication Module) / Kerberos management
- Ambari backup and restore
- Miscellaneous options
2. Ambari Agent
The Ambari Agent runs on all the nodes that we want to manage with Ambari. The agent periodically sends heartbeats to the master node; through it, the Ambari Server executes many of its tasks on the servers.
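The heartbeat idea can be illustrated with a toy model (this is not Ambari's actual protocol or code, just a minimal sketch of the pattern): the master records the last heartbeat time per host and flags hosts that have gone quiet for longer than a timeout.

```python
# Toy illustration of agent heartbeating (not Ambari's real implementation):
# each agent periodically reports in, and the master tracks the last
# heartbeat per host so it can flag hosts that stop responding.
class Master:
    def __init__(self, timeout: float = 10.0):
        self.timeout = timeout              # seconds before a host is "lost"
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, host: str, now: float) -> None:
        self.last_seen[host] = now

    def lost_hosts(self, now: float) -> list[str]:
        return [h for h, t in self.last_seen.items() if now - t > self.timeout]

master = Master(timeout=10.0)
master.heartbeat("worker1.example.com", now=0.0)
master.heartbeat("worker2.example.com", now=5.0)
# At t=12, worker1 has been silent for 12s (> timeout) and is flagged.
print(master.lost_hosts(now=12.0))
```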
3. Ambari web interface
The web interface is one of the most powerful features of the Ambari application. The web application is exposed by the Ambari Server running on the master host on port 8080 and is protected by authentication. Once you log in to the web portal, you can control and view all aspects of your Hadoop cluster.
Ambari supports multiple RDBMSs (Relational Database Management Systems) to keep track of the state of the entire Hadoop infrastructure; you choose the database you want to use when you set up Ambari for the first time.
At the time of writing, Ambari supports the following databases:
- MySQL or MariaDB
- Embedded PostgreSQL
- Microsoft SQL Server
- SQL Anywhere
- Berkeley DB
Big data developers prefer this technology as it is quite handy and comes with a step-by-step guide allowing easy installation on the Hadoop cluster. Its pre-configured key operational metrics provide a quick look into the health of the Hadoop core, i.e., HDFS and MapReduce, along with additional components such as Hive, HBase, and HCatalog. Ambari sets up a centralized security system by incorporating Kerberos and Apache Ranger into the architecture. Its RESTful APIs expose monitoring information and integrate with operational tools. Its user-friendliness and interactivity have brought it into the ranks of the top ten open-source technologies for the Hadoop cluster.
Features of Apache Ambari
Following are some of the features of Ambari. Read on to understand how the tool is used in the big data arena.
Platform independent – Apache Ambari runs on Windows, macOS, and many other platforms, as its architecture supports any hardware and software system. Other platforms where Ambari runs include Ubuntu, SLES, and RHEL. Components that depend on a platform, such as yum, rpm packages, and Debian packages, ought to be plugged in through well-defined interfaces.
Pluggable component – Any current Ambari application can be customized. Specific tools and technologies ought to be encapsulated by pluggable components. The goal of pluggability does not, however, extend to standardizing communication between components.
Version management and upgrade – Ambari maintains its own versions, so there is no need for external tools such as Git. Upgrading any Ambari application, or Ambari itself, is fairly easy.
Extensibility – We can extend the functionality of existing Ambari applications by adding different view components.
Failure recovery – Assume something goes wrong while you are working on an Ambari application; the system should recover from it gracefully. Windows users can relate: if a power outage interrupts your work on a Word file, MS Word offers an autosaved version of the document when you next run it.
Security – The Ambari application comes with robust security and can sync with LDAP over Active Directory.
Benefits of using Apache Ambari
This is given with respect to the Hortonworks Data Platform (HDP). Ambari eliminates the manual tasks that used to be needed to watch over Hadoop operations. It gives a simple, secure platform for provisioning, managing, and monitoring HDP deployments. Ambari is an easy-to-use Hadoop management UI, solidly backed by REST APIs.
It provides numerous benefits like:
Installation, configuration, and management are greatly simplified
Ambari can efficiently create Hadoop clusters at scale. Its wizard-driven approach lets configuration be automated to suit the environment so that performance is optimal. Master, slave, and client components are assigned when configuring services; the wizard is also used to install, start, and test the cluster.
Configuration blueprints give recommendations to those seeking a hands-free approach. The blueprint of an ideal cluster is stored, and how it is provisioned is clearly traced. This is then used to automate the creation of successive clusters without any user interaction. Blueprints also preserve best practices and ensure they are applied across different environments.
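A blueprint is a JSON document registered with the Ambari REST API. The sketch below shows a minimal, illustrative layout with one master host group and one worker group; the blueprint name, stack version, and cardinalities are placeholders, not recommendations:

```python
import json

# Minimal, illustrative Ambari blueprint: a "master" host group running the
# HDFS NameNode and a "workers" group running DataNodes. The names and the
# stack version below are assumptions for the example.
blueprint = {
    "Blueprints": {
        "blueprint_name": "small-cluster",
        "stack_name": "HDP",
        "stack_version": "2.6",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [{"name": "NAMENODE"}],
        },
        {
            "name": "workers",
            "cardinality": "3",
            "components": [{"name": "DATANODE"}],
        },
    ],
}

# Registering it is a POST to /api/v1/blueprints/<name>; a separate
# cluster-creation template then maps real hosts onto the host groups.
print(json.dumps(blueprint, indent=2))
```

Because the blueprint names host groups rather than concrete machines, the same document can be replayed to stamp out identical clusters in different environments.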
Ambari also provides a rolling-upgrade feature: running clusters can be updated on the go with maintenance releases and feature-bearing releases, so there is no unnecessary downtime. When rolling upgrades are not possible for very large clusters, express upgrades are used instead; these do involve downtime, but far less than a manual upgrade. Both rolling and express upgrades are free of manual steps.
Centralized security and administration
Ambari, which is among the components of the Hadoop ecosystem, greatly reduces the complexity of cluster security configuration and administration. The tool also helps with the automated setup of advanced security constructs such as Kerberos and Ranger.
Complete visibility to cluster health
Through this tool, you can monitor your cluster's health and availability. An easily customized web-based dashboard has metrics that give status information for each service in the cluster, such as HDFS, YARN, and HBase. The tool also helps with gathering and visualizing critical operational metrics for troubleshooting and analysis. Ambari predefines alerts, which integrate with existing enterprise monitoring tools and monitor cluster components and hosts at specified check intervals. Through the browser interface, users can browse, search, and filter alerts for their clusters, and can view and modify alert properties and the alert instances associated with a given alert definition.
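Alerts are also reachable over the REST API. The sketch below only constructs the relevant URLs; the server address and the cluster name `mycluster` are placeholder assumptions, while the `alert_definitions` and `alerts` endpoints are Ambari's standard resources:

```python
cluster = "mycluster"                         # placeholder cluster name
base = "http://ambari.example.com:8080"       # placeholder server address

# Alert definitions (what can fire, with thresholds and check intervals):
defs_url = f"{base}/api/v1/clusters/{cluster}/alert_definitions"
# Current alert instances for the cluster:
alerts_url = f"{base}/api/v1/clusters/{cluster}/alerts"

print(defs_url)
print(alerts_url)
```

A GET against either URL (with the same Basic-auth headers as any other Ambari API call) returns JSON that external monitoring tools can poll.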
Metrics visualization and dashboarding
Ambari provides a scalable, low-latency storage system for Hadoop component metrics. Picking the Hadoop metrics that truly matter requires considerable expertise and an understanding of how the components work with each other. Grafana, a leading graph and dashboard builder that simplifies reviewing metrics, is included with Ambari Metrics in HDP.
Extensibility and customization
Ambari lets developers work on Hadoop gracefully in their enterprise setup. Ambari leverages a large, innovative community, which improves the tool and eliminates vendor lock-in. REST APIs, along with Ambari Stacks and Views, allow extensive flexibility in customizing an HDP implementation.
Ambari Stacks wrap a lifecycle-control layer to rationalize operations over a broad set of services. This gives Ambari a consistent approach to managing different types of services: install, start, configure, status, stop. When provisioning, the Stacks technology rationalizes the cluster install experience across a set of services. Stacks also provide a natural extension point for operators to plug in newly created services that can run alongside Hadoop.
Third parties can plug in their views through Ambari Views. A view is an application deployed into the Ambari container, where it offers UI capabilities that can be plugged in to provide custom visualization, management, and monitoring features.
How is recovery achieved in Ambari?
Recovery happens in Ambari in two ways:
Based on actions: Here, every action is persisted, and after a restart the master checks for pending actions and reschedules them. The cluster state is persisted in the database, and the master rebuilds the state machines on restart. There is a race condition: the master may crash after an action completes but before recording its completion, so special care is taken that actions are idempotent. The master restarts any actions that are not marked as complete, or that are marked as failed, in the DB. These persisted actions can be seen in the redo logs.
Based on desired state: The master persists the desired state of the cluster, and on restart it tries to bring the live cluster back to that desired state.
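The desired-state approach can be sketched as a simple reconciliation loop (a toy model, not Ambari's actual code; the component names and states are illustrative):

```python
# Toy model of desired-state recovery: after a restart, the master compares
# the persisted desired state with the live state and computes the actions
# needed to converge. Real Ambari tracks far richer state machines.
def recovery_actions(desired: dict[str, str], live: dict[str, str]) -> list[str]:
    actions = []
    for component, state in desired.items():
        if live.get(component) != state:
            actions.append(f"{component}: {live.get(component, 'UNKNOWN')} -> {state}")
    return actions

desired = {"NAMENODE": "STARTED", "DATANODE": "STARTED"}
live = {"NAMENODE": "STARTED", "DATANODE": "INSTALLED"}
print(recovery_actions(desired, live))
```

Because the comparison is against the persisted desired state rather than a log of pending actions, this style of recovery is naturally idempotent: re-running it after convergence produces no further actions.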
Scope of Apache Ambari
Apache Ambari has seen tremendous growth over the last year, gaining immense popularity among existing big data technologies. Bigger companies are increasingly turning to this technology to manage their huge clusters in a better fashion, which made it spiral upward in the technology pecking order in 2016.
Big data innovators like Hortonworks are working on Ambari to make it scale seamlessly beyond 2,000-3,000 nodes. Hortonworks recently released the latest version, Ambari 2.4, aiming to simplify Hadoop cluster management by reducing troubleshooting time, improving operational efficiency, gaining more visibility, etc. There is certainly much more to come for this technology in the near future.
Who should learn Apache Ambari?
- Hadoop administrators
- Database professionals
- Mainframe and Hadoop testing professionals
- DevOps Professionals
How will Apache Ambari help in your career growth?
With the increasing popularity of big data and analytics, professionals with a good grasp of Ambari or related technologies have a greater chance of grabbing lucrative career opportunities in this area. The graph below clearly shows that the daily rate of jobs available for professionals in this technology increased dramatically over the last three months of 2016.
Therefore, learning Ambari will certainly be a good choice for building a career, as a huge skill gap is going to form in the coming years, and knowledge of the right technology will be your token for success.