Introduction to Apache Ambari
Apache Ambari is an open-source tool for provisioning, managing, and monitoring Hadoop clusters. It provides a highly interactive dashboard that lets administrators visualize the progress and status of every application running on the Hadoop cluster.
Its flexible and scalable user interface allows a range of tools, such as Pig, MapReduce, and Hive, to be installed on the cluster and lets administrators manage their performance in a user-friendly fashion. Some of the key features of this technology are:
- Instantaneous insight into the health of the Hadoop cluster using preconfigured operational metrics
- User-friendly configuration providing an easy step-by-step guide for installation
- Installation of Apache Ambari is possible through Hortonworks Data Platform (HDP)
- Monitoring dependencies and performances by visualizing and analyzing jobs and tasks
- Authentication, authorization, and auditing by installing Kerberos-based Hadoop clusters
- Flexible and adaptive technology fitting perfectly in the enterprise environment
How is Ambari different from ZooKeeper?
The description above may seem confusing, since ZooKeeper performs similar kinds of tasks. On closer inspection, however, the two technologies do quite different work. The following comparison gives a clearer idea:
Basis of Difference | Apache Ambari | Apache ZooKeeper |
Basic Task | Monitoring, provisioning, and managing Hadoop clusters | Maintaining configuration information, naming, and synchronizing clusters |
Nature | Web interface | Open-source server |
Status Maintenance | Status maintained through APIs | Status maintained through znodes |
Therefore, although these tasks may seem similar from a distance, the two technologies perform different jobs on the same Hadoop cluster, and together they help make it agile, responsive, scalable, and fault-tolerant. As an Apache Ambari administrator, you create and manage Ambari users and groups. You can also import users and groups from LDAP systems into Ambari.
How did Apache Ambari come into existence?
The genesis of Apache Ambari traces back to the emergence of Hadoop, when its distributed and scalable computing took the world by storm. Since Hadoop's inception, more and more technologies have been incorporated into its infrastructure. Gradually, Hadoop became overloaded, and it grew difficult to maintain multi-node clusters and applications simultaneously. That is when Apache Ambari came into the picture to make distributed computing easier.
Currently, it is one of the leading projects running under the Apache Software Foundation.
Installing Apache Ambari
To build the cluster, the Install Wizard needs some general information about it, starting with the fully qualified domain name (FQDN) of each of your hosts.
Additionally, the wizard needs access to the private key file you created in Set Up Passwordless SSH. It uses this key to locate all the hosts in the system and to access and interact with them securely.
1. Enter the list of hostnames, one per line, in the Target Hosts text box.
2. Select Provide Your SSH Private Key if you want Ambari to automatically install the Ambari Agent on all your hosts using SSH. In the Host Registration Information, you can use the Choose File button to find the private key file matching the public key installed earlier on all your hosts. Alternatively, you can copy and paste the key into the text box manually.
3. Select Perform Manual Registration if you do not wish Ambari to automatically install the Ambari Agent.
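Before pasting a host list into the Target Hosts text box, it can help to check it offline. The following is a minimal Python sketch and not part of Ambari: the function name `parse_target_hosts` is hypothetical, and the rule that a fully qualified name needs at least two dot-separated labels is an assumption made for illustration.

```python
import re

# One DNS label: 1-63 characters, letters/digits/hyphens,
# with no leading or trailing hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def parse_target_hosts(text):
    """Split host-list text (one name per line) into (valid, rejected)."""
    valid, rejected = [], []
    for line in text.splitlines():
        host = line.strip()
        if not host:
            continue  # ignore blank lines
        labels = host.split(".")
        # Assumption: a name with a single label (e.g. "node2") is not
        # fully qualified, so it is rejected.
        looks_qualified = (
            len(host) <= 253
            and len(labels) >= 2
            and all(_LABEL.match(label) for label in labels)
        )
        if looks_qualified:
            valid.append(host.lower())
        else:
            rejected.append(host)
    return valid, rejected
```

Anything in the rejected list is worth fixing before it reaches the wizard; the sketch checks syntax only and does not confirm that a name resolves.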
Apache Ambari Architecture
Ambari provides intuitive REST APIs that automate operations in the Hadoop cluster. Its consistent and secure interface gives efficient operational control, and its interactive dashboard makes it easy to diagnose the health of the Hadoop cluster.
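As a rough illustration of how a script might address the REST API, the Python sketch below only builds a request object and does not send it. The host name `ambari.example.com`, the credentials, and the helper name `build_ambari_request` are hypothetical. The `/api/v1` prefix, HTTP Basic authentication, the `X-Requested-By` header, and port 8080 as the default are assumptions about a typical installation and should be checked against the documentation for your Ambari version.

```python
import base64
import urllib.request


def build_ambari_request(host, path, user, password, method="GET", port=8080):
    """Build (but do not send) a request for an Ambari-style REST endpoint."""
    url = f"http://{host}:{port}/api/v1/{path.lstrip('/')}"
    request = urllib.request.Request(url, method=method)

    # HTTP Basic authentication: base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")

    # Assumption: the server expects this header on modifying requests;
    # sending it on every request keeps the helper simple.
    request.add_header("X-Requested-By", "ambari")
    return request


# Hypothetical host and credentials, for illustration only.
request = build_ambari_request("ambari.example.com", "/clusters", "admin", "secret")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the call; that step is left out so the sketch runs without a server.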
To have a better understanding of how Ambari works, let’s look at the detailed architecture of Apache Ambari in the following diagram:
Apache Ambari follows a master–slave architecture in which the master node instructs the slave nodes to perform certain actions and to report back the state of every action. The master node is responsible for keeping track of the state of the infrastructure. To do this, it uses a database server, which can be configured during setup.
Core Components of Apache Ambari
- Ambari Server
- Ambari Agent
- Ambari Web UI
- Database
1. Ambari Server
Ambari Server is the entry point for all administrative activities on the master server. It is a shell script that internally uses Python code, ambari-server.py, and routes all requests to it.
Ambari Server exposes several entry points, selected by the parameters passed to the Ambari Server program. They are:
- Daemon management
- Software upgrade
- Software setup
- LDAP (Lightweight Directory Access Protocol)/PAM (Pluggable Authentication Modules)/Kerberos management
- Ambari backup and restore
- Miscellaneous options
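The grouping above can be pictured as a simple lookup from a command word to its category. This Python sketch is illustrative only and is not how ambari-server.py is implemented. The command names are recalled from the `ambari-server` command-line tool and are assumptions here, so check them against the help output of your installed version; the function name `category_for` is hypothetical.

```python
# Assumed command names, grouped by the categories listed above.
ENTRY_POINTS = {
    "daemon management": ["start", "stop", "restart", "status"],
    "software upgrade": ["upgrade"],
    "software setup": ["setup"],
    "LDAP/PAM/Kerberos management": ["setup-ldap", "sync-ldap", "setup-security"],
    "backup and restore": ["backup", "restore"],
}


def category_for(command):
    """Return the category a command belongs to, or the catch-all bucket."""
    for category, commands in ENTRY_POINTS.items():
        if command in commands:
            return category
    return "miscellaneous options"


print(category_for("restart"))
print(category_for("sync-ldap"))
```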
2. Ambari Agent
Ambari Agent runs on all the nodes that you want to manage with Ambari. This program periodically sends heartbeats to the master node. By using Ambari Agent, Ambari Server executes many tasks on the servers.
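The heartbeat idea can be sketched in a few lines of Python. This is not Ambari's implementation: the class name `HeartbeatMonitor`, the 30-second timeout, and the injectable clock are all assumptions chosen to keep the example small and testable.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat per host and flag hosts that went quiet."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        self.timeout = timeout_seconds  # assumed value, not an Ambari default
        self._clock = clock
        self._last_seen = {}

    def record(self, host):
        """Called whenever an agent on `host` reports in."""
        self._last_seen[host] = self._clock()

    def lost_hosts(self):
        """Hosts whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            host for host, seen in self._last_seen.items()
            if now - seen > self.timeout
        )
```

A server built this way would call `record` on each incoming heartbeat and poll `lost_hosts` periodically to decide which nodes need attention.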
3. Ambari Web User Interface
Ambari Web UI is one of the most powerful features of Apache Ambari. The web application is deployed by the Ambari Server program, which runs on the master host and exposes it on port 8080. The application is protected by authentication. Once you log in to the web portal, you can view and control all aspects of your Hadoop cluster.
4. Database
Ambari supports multiple relational database management systems (RDBMS) for keeping track of the state of the entire Hadoop infrastructure. You can choose the database you want to use during the setup of Ambari. Ambari supports the following databases at the time of writing:
- PostgreSQL
- Oracle
- MySQL or MariaDB
- Embedded PostgreSQL
- Microsoft SQL Server
- SQL Anywhere
- Berkeley DB
Big Data developers favor this technology because it is handy and comes with a step-by-step guide that makes installation on the Hadoop cluster easy. Its preconfigured key operational metrics provide a quick look into the health of the Hadoop core, i.e., HDFS and MapReduce, along with additional components such as Hive, HBase, and HCatalog. Ambari sets up a centralized security system by incorporating Kerberos and Apache Ranger into the architecture. The RESTful APIs expose monitoring information and allow integration with existing operational tools. Its user-friendliness and interactivity have earned it a place among the top 10 open-source technologies for the Hadoop cluster.
Features of Apache Ambari
The following are some of the features of Ambari. Read on to understand how the tool is used in Big Data.
Platform independent: Apache Ambari is designed to architecturally support any hardware and operating system; the platforms named include Windows, Mac, Ubuntu, SLES, and RHEL. Components that inherently depend on a platform, such as those dealing with Yum, RPM packages, or Debian packages, should be pluggable behind well-defined interfaces.
Pluggable component: Any current Ambari application can be customized. Specific tools and technologies should be encapsulated by pluggable components. The goal of pluggability does not extend to inter-component standardization.
Version management and upgrade: Ambari itself maintains versions and hence there is no need for external tools like Git. It is fairly easy to upgrade any Ambari application, or Ambari itself.
Extensibility: You can extend the functionality of the existing Ambari applications by simply adding different view components.
Failure recovery: Suppose you are working on an Ambari application and something goes wrong. The system should then recover gracefully. If you are a Windows user, you can relate to this well: you may have been working on a Word file when a sudden power outage switched your system off. When you turn the system back on and start MS Word, an autosaved version of the document is waiting for you.
Security: Apache Ambari comes with robust security, and it can synchronize with LDAP and Active Directory.
Benefits of Using Apache Ambari
The following is described with respect to Hortonworks Data Platform (HDP). Ambari eliminates the manual tasks that were once needed to watch over Hadoop operations. It gives a simple and secure platform for provisioning, managing, and monitoring HDP deployments. Ambari is an easy-to-use Hadoop management UI and is solidly backed by REST APIs. The benefits of using Apache Ambari are described below.
Simplified installation, configuration, and management of the Hadoop cluster: Ambari can efficiently create Hadoop clusters at scale. Its wizard-driven approach automates configuration according to the environment so that performance is optimal. The wizard covers assigning master, slave, and client components, configuring services, and then installing, starting, and testing the cluster.
Configuration blueprints give recommendations to those seeking a hands-on approach. The blueprint of an ideal cluster is stored. How it is provisioned is clearly traced. This is then used to automate the creation of successive clusters without any user interaction. Blueprints also preserve and ensure the application of best practices across different environments.
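To make the blueprint idea concrete, the Python sketch below assembles two JSON documents: a blueprint describing host groups and their components, and a cluster template that maps real hosts onto those groups. The field names (`Blueprints`, `stack_name`, `stack_version`, `host_groups`, `cardinality`, `components`, `blueprint`, `fqdn`) follow the shape of Ambari blueprint documents as I understand them and should be verified against the blueprint documentation for your version. The helper names, the blueprint name, the host names, and the stack version are hypothetical.

```python
import json


def make_blueprint(stack_name, stack_version, host_groups):
    """Describe which components each host group should run."""
    return {
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
        "host_groups": [
            {
                "name": group,
                "cardinality": str(spec["cardinality"]),
                "components": [{"name": component} for component in spec["components"]],
            }
            for group, spec in host_groups.items()
        ],
    }


def make_cluster_template(blueprint_name, hosts_by_group):
    """Map concrete hosts onto the host groups of a stored blueprint."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": host} for host in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }


blueprint = make_blueprint(
    "HDP", "2.5",  # illustrative stack name and version
    {
        "master": {"cardinality": 1, "components": ["NAMENODE", "RESOURCEMANAGER"]},
        "worker": {"cardinality": 2, "components": ["DATANODE", "NODEMANAGER"]},
    },
)
template = make_cluster_template(
    "demo-blueprint",
    {"master": ["m1.example.com"], "worker": ["w1.example.com", "w2.example.com"]},
)
print(json.dumps(blueprint, indent=2))
print(json.dumps(template, indent=2))
```

Because the blueprint holds no host names, the same document can be reused for successive clusters, with only the template changing.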
Ambari provides a rolling upgrade feature with which running clusters can be updated on the go with maintenance releases and feature-bearing releases, so there is no unnecessary downtime. For large clusters where rolling upgrades are not possible, express upgrades are used instead. An express upgrade does involve downtime, but less than a manual upgrade would. Both rolling and express upgrades are automated and free of manual upgrade steps.
Centralized security and application: Ambari, itself one of the components of the Hadoop ecosystem, greatly reduces the complexity of cluster security configuration and administration. The tool also helps automate the setup of advanced security constructs such as Kerberos and Ranger.
Complete visibility into your cluster's health: Through this tool, you can monitor your cluster's health and availability. An easily customized web-based dashboard shows metrics that give status information for each service in the cluster, such as HDFS, YARN, and HBase. The tool also helps with gathering and visualizing critical operational metrics for troubleshooting and analysis. Ambari predefines alerts that monitor cluster components and hosts at the specified check intervals, and these alerts can be integrated with existing enterprise monitoring tools. Through the browser interface, users can browse, search, and filter alerts for their clusters. They can also view and modify alert properties and alert instances.
Metrics visualization and dashboarding: Ambari provides a scalable low-latency storage system for Hadoop component metrics. Picking the Hadoop metrics that truly matter requires considerable expertise and an understanding of how the components work with each other. Grafana, a leading graph and dashboard builder that simplifies the metrics reviewing process, is included with Ambari Metrics in HDP.
Extensibility and customization: Ambari lets developers work with Hadoop gracefully in their enterprise setup. Ambari leverages a large, innovative community that keeps improving the tool, and it eliminates vendor lock-in. REST APIs, along with Ambari Stacks and Views, allow extensive flexibility for customizing an HDP implementation.
Ambari Stacks wraps the life cycle control layer used to rationalize operations over a broad set of services. This gives Ambari a consistent approach to managing different types of services through operations such as install, start, configure, status, and stop. When provisioning, Stacks rationalizes the cluster install experience across a set of services. Stacks also gives operators a natural extension point for plugging in newly created services that can run alongside Hadoop.
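The consistent life cycle that Stacks applies to every service can be sketched as a small Python class plus a dispatcher. This is a self-contained imitation for illustration, not Ambari's actual script framework: `ExampleService`, `run_command`, and the state flags are hypothetical names, and only the five operations named above are modeled.

```python
LIFECYCLE_COMMANDS = ("install", "configure", "start", "status", "stop")


class ExampleService:
    """A hypothetical service that answers the common life cycle commands."""

    def __init__(self, name):
        self.name = name
        self.installed = False
        self.configured = False
        self.running = False

    def install(self):
        self.installed = True

    def configure(self):
        if not self.installed:
            raise RuntimeError(f"{self.name} must be installed before it is configured")
        self.configured = True

    def start(self):
        if not self.configured:
            self.configure()  # make sure configuration is in place first
        self.running = True

    def status(self):
        return "RUNNING" if self.running else "STOPPED"

    def stop(self):
        self.running = False


def run_command(service, command):
    """Dispatch one life cycle command, rejecting anything outside the set."""
    if command not in LIFECYCLE_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    return getattr(service, command)()
```

Because every service answers the same small set of commands, a management layer can drive a new service without knowing anything service-specific.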
Third parties can plug in their views through Ambari Views. A view is an application that is deployed into an Ambari container where it offers UI capabilities to be plugged in to give out custom visualization, management, and monitoring features.
How is recovery achieved in Ambari?
Recovery in Ambari can happen in two ways. Let’s look into them:
Based on actions: Here, every action is persisted, and after a restart the master checks for pending actions and reschedules them. The cluster state is also persisted in the database, and the master rebuilds its state machines on restart. There is a race condition in which an action completes but the master crashes before recording its completion. For this reason, special care is taken that actions are idempotent. On restart, the master reruns every action that is not marked as complete, or is marked as failed, in the database. These persisted actions can be viewed as a redo log.
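A minimal Python sketch of action-based recovery follows. It keeps the log in memory, where a real system would use the database, and the names `RedoLog`, `persist`, `mark`, and `recover` as well as the status strings are hypothetical. The point it illustrates is why idempotence matters: an action that finished just before a crash is rerun harmlessly.

```python
class RedoLog:
    """In-memory stand-in for a persisted log of actions."""

    def __init__(self):
        # action_id -> {"run": callable, "status": str}
        self._actions = {}

    def persist(self, action_id, run):
        """Record an action before it is executed."""
        self._actions[action_id] = {"run": run, "status": "PENDING"}

    def mark(self, action_id, status):
        self._actions[action_id]["status"] = status

    def recover(self):
        """After a restart, rerun every action not recorded as completed."""
        rerun = []
        for action_id, entry in self._actions.items():
            if entry["status"] != "COMPLETED":
                entry["run"]()  # safe only because actions are idempotent
                entry["status"] = "COMPLETED"
                rerun.append(action_id)
        return rerun


# Idempotent actions: adding a host to a set twice changes nothing.
registered_hosts = set()
log = RedoLog()
log.persist("register-h1", lambda: registered_hosts.add("h1"))
log.persist("register-h2", lambda: registered_hosts.add("h2"))

# Simulate the race: the first action ran, but the crash came before
# its completion was recorded. The second action was marked failed.
registered_hosts.add("h1")
log.mark("register-h2", "FAILED")

print(log.recover())
print(sorted(registered_hosts))
```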
Based on the desired state: The master persists the desired state of the cluster, and after a restart it tries to bring the live state of the cluster in line with the desired state.
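Desired-state recovery amounts to comparing two maps and acting on the differences. The Python sketch below shows only the comparison step; the function name `plan_reconciliation`, the component names, and the state labels are illustrative assumptions rather than Ambari's internal model.

```python
def plan_reconciliation(desired, live):
    """Return {component: target_state} for every component whose live
    state differs from (or is missing relative to) the desired state."""
    return {
        component: target
        for component, target in desired.items()
        if live.get(component) != target
    }


desired = {"NAMENODE": "STARTED", "DATANODE": "STARTED", "HIVE_SERVER": "STARTED"}
live = {"NAMENODE": "STARTED", "DATANODE": "INSTALLED"}

# Components to act on after a restart: DATANODE is in the wrong state
# and HIVE_SERVER is missing from the live view.
print(plan_reconciliation(desired, live))
```

Unlike the action-based approach, nothing about past operations needs to be replayed here: the plan is derived entirely from where the cluster is and where it should be.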
Scope of Apache Ambari
Apache Ambari has seen tremendous growth over the last year and has gained immense popularity among Big Data technologies. Bigger companies are increasingly turning to it to manage their huge clusters better, which has driven its growth further.
Big Data innovators like Hortonworks are working on Ambari to make it scale seamlessly to more than 2,000 or 3,000 nodes. Hortonworks recently released Ambari 2.4, the latest version, which aims to simplify Hadoop cluster management by reducing troubleshooting time, improving operational efficiency, and providing more visibility. There is certainly much more to come in this technology in the near future.
Who should learn Apache Ambari?
- Hadoop Administrators
- Database Professionals
- Mainframe and Hadoop Testing Professionals
- DevOps Professionals
How will Apache Ambari help in your career growth?
With the increasing popularity of Big Data Analytics, professionals who have a good grasp of Ambari or its related technologies have a greater chance of landing lucrative career opportunities in this area. The graph below clearly shows that the percentage of jobs available daily for professionals in this technology has increased.
Therefore, learning Apache Ambari will certainly be the best decision you make for enriching your career!