Hadoop YARN Architecture

Last Updated on June 13, 2020 by Editorial Team

Author(s): Vivek Chaudhary


YARN stands for Yet Another Resource Negotiator. YARN became part of the Hadoop ecosystem with the advent of Hadoop 2.x, and it brought major architectural changes to Hadoop.

YARN

YARN manages resources in the cluster environment. That's it? Didn't we have a resource manager before Hadoop 2.x? Of course we did: before Hadoop 2.x, resources were managed by the JobTracker.

So what is the JobTracker?

The JobTracker (JT) used to both manage cluster resources and drive MapReduce job execution, i.e., the data processing. The JT configured and monitored every running task. If a task failed, it allocated a new slot so the task could start again. When tasks completed, it released their resources and cleaned up the memory.
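To make the "one component does everything" point concrete, here is a toy Python model of a JobTracker. The class `MiniJobTracker` and its methods are hypothetical illustrations, not Hadoop's actual API; the only point is that slot bookkeeping, monitoring, retries, and cleanup all live in a single object.

```python
from dataclasses import dataclass, field


@dataclass
class MiniJobTracker:
    """Hypothetical toy model of a Hadoop 1.x JobTracker: one object owns
    slot bookkeeping, task assignment, monitoring, retries, and cleanup."""
    free_slots: int
    running: dict = field(default_factory=dict)   # task_id -> attempt number
    completed: list = field(default_factory=list)

    def assign(self, task_id: str, attempt: int = 1) -> bool:
        # Resource management: hand out a slot if one is free.
        if self.free_slots == 0:
            return False
        self.free_slots -= 1
        self.running[task_id] = attempt
        return True

    def on_task_failed(self, task_id: str) -> bool:
        # Monitoring + rescheduling: free the failed slot, then retry in a new slot.
        attempt = self.running.pop(task_id)
        self.free_slots += 1
        return self.assign(task_id, attempt + 1)

    def on_task_succeeded(self, task_id: str) -> None:
        # Cleanup: release the slot once the task completes.
        self.running.pop(task_id)
        self.free_slots += 1
        self.completed.append(task_id)


jt = MiniJobTracker(free_slots=2)
jt.assign("map-0")
jt.assign("map-1")
jt.on_task_failed("map-0")        # map-0 is retried as attempt 2
jt.on_task_succeeded("map-1")
print(jt.running, jt.free_slots, jt.completed)   # {'map-0': 2} 1 ['map-1']
```

Every responsibility in this sketch is a method on the same object, which is exactly the pressure point described in the drawbacks below.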

The JT performed a lot of tasks, and this approach had some drawbacks. I haven't worked on Hadoop 1.x, but I have tried to list some of them below.

Drawbacks of the above approach:

  1. A single component, the JobTracker, performs many activities such as resource management, job scheduling, job monitoring, and rescheduling jobs, which puts a lot of pressure on that one component.
  2. The JobTracker is a single point of failure: it is not highly available, so if the JT fails, all running jobs are halted and have to be restarted.
  3. Static resource allocation: map and reduce slots are predefined and reserved, so they can't be used for other applications even when they are sitting idle.
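Drawback 3 can be illustrated with a small simulation. This is a hypothetical model, assuming a node with two map slots and two reduce slots and task names prefixed with their type; it is not how Hadoop represents slots internally.

```python
def schedule_static(map_slots: int, reduce_slots: int,
                    pending: list[str]) -> tuple[list[str], list[str]]:
    """Toy model of Hadoop 1.x fixed slots: map tasks may only use map slots
    and reduce tasks may only use reduce slots. Returns (running, waiting)."""
    free = {"map": map_slots, "reduce": reduce_slots}
    running, waiting = [], []
    for task in pending:
        kind = task.split("-")[0]      # hypothetical naming: "map-3", "reduce-1"
        if free[kind] > 0:
            free[kind] -= 1
            running.append(task)
        else:
            waiting.append(task)       # idle slots of the other kind cannot help
    return running, waiting


running, waiting = schedule_static(2, 2, ["reduce-0", "reduce-1", "reduce-2", "reduce-3"])
print(running)   # ['reduce-0', 'reduce-1']
print(waiting)   # ['reduce-2', 'reduce-3'], even though both map slots are idle
```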

These are some of the major drawbacks of the Hadoop 1.x way of working.

So the next question that comes to mind is: how does YARN solve these problems? YARN separates the resource management layer from the data processing layer.

MapReduce 1 vs MapReduce 2

In MapReduce 1, all resource management and processing was done by the JobTracker. With the release of Hadoop 2.x, the two layers were split, and YARN became the resource management layer. Hadoop 2.x leaves data processing to frameworks such as MapReduce, Spark, and Tez, while YARN takes care of resource negotiation.

Hadoop 2.x decoupled the monolithic MapReduce component into separate components, which increased the capabilities of the whole ecosystem and resulted in higher availability and higher scalability.

YARN and its components

YARN comprises two main components: the Resource Manager and the Node Manager.

Detailed Architecture:

Let’s understand the different components:

Resource Manager

YARN follows a master-slave architecture, and the Resource Manager (RM) runs on the master node. The RM is the ultimate authority that manages resources such as RAM, CPU, and network bandwidth across the different applications. It maintains the list of running applications and the list of available resources.

The Resource Manager has two components: the Scheduler and the Application Manager (called the ApplicationsManager in the Hadoop documentation).

  1. Scheduler:

The Scheduler handles resource allocation on behalf of the Resource Manager. It allocates resources to the various applications, such as MapReduce or Spark jobs, subject to the availability of resources.

The Scheduler is purely responsible for resource allocation and is not involved in any other activity, such as monitoring or tracking the status of jobs.
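The Scheduler's "allocate only" role can be sketched as a single placement pass. This is a simplified first-fit model written for illustration, assuming resources are expressed as memory in MB plus virtual cores; the `Resource` class and `allocate` function are hypothetical and do not reproduce YARN's FIFO, Capacity, or Fair schedulers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores

    def minus(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)


def allocate(free: dict[str, Resource],
             requests: list[tuple[str, Resource]]) -> dict[str, str]:
    """Toy allocation pass: place each request, in arrival order, on the first
    node with enough free memory and vcores. Requests that fit nowhere are left
    out and wait for a later pass. There is no monitoring, status tracking, or
    restarting here: the scheduler only allocates."""
    placements: dict[str, str] = {}
    for request_id, ask in requests:
        for node, available in free.items():
            if ask.fits_in(available):
                free[node] = available.minus(ask)   # reserve resources on that node
                placements[request_id] = node
                break
    return placements


free = {"node1": Resource(4096, 4), "node2": Resource(2048, 2)}
requests = [
    ("app-1", Resource(3072, 2)),
    ("app-2", Resource(2048, 2)),
    ("app-3", Resource(2048, 1)),   # no node has 2048 MB left, so it waits
]
placements = allocate(free, requests)
print(placements)   # {'app-1': 'node1', 'app-2': 'node2'}
```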

  2. Application Manager:

The Application Manager launches the application-specific Application Master on a slave node.

Note: the Application Manager and the Application Master are different components.

The Application Manager negotiates the first container for launching the Application Master and relaunches the Application Master's container if it fails.

In a nutshell: when the Resource Manager accepts a new application submission (MapReduce, Spark, etc.), one of the first decisions the Scheduler makes is to select a container in which to launch that application's Application Master, and the Application Manager then launches it.
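The launch-and-relaunch behavior described above can be sketched as a small retry loop. This is a hypothetical illustration: `launch_application_master`, the `launch` callback, and the container ID format are made up. The default `max_attempts=2` is chosen to mirror the default of YARN's `yarn.resourcemanager.am.max-attempts` setting.

```python
from typing import Callable


def launch_application_master(app_id: str, launch: Callable[[str], bool],
                              max_attempts: int = 2) -> tuple[bool, list[str]]:
    """Toy Application Manager: obtain a container for the application's AM,
    launch it, and relaunch it in a fresh container if it fails, up to
    max_attempts times. Returns (succeeded, containers_used)."""
    containers_used: list[str] = []
    for attempt in range(1, max_attempts + 1):
        # Stand-in for the Scheduler choosing a container; the ID format is made up.
        container_id = f"{app_id}_am_{attempt:02d}"
        containers_used.append(container_id)
        if launch(container_id):          # the AM started successfully
            return True, containers_used
    return False, containers_used         # all attempts failed: the application fails


# An AM that crashes in its first container and succeeds in the second.
ok, used = launch_application_master("app-7", lambda cid: not cid.endswith("_01"))
print(ok, used)   # True ['app-7_am_01', 'app-7_am_02']
```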

Node Manager

Before Hadoop 2.x, there was a fixed number of slots for executing map and reduce tasks. Since Hadoop 2.x, the concept of slots has been replaced by the dynamic creation and allocation of resource containers.

A container is a collection of resources such as CPU, RAM, disk, and network I/O on a single node, somewhat like a small server carved out of that machine.
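A rough way to see why containers use a node better than fixed slots: the hypothetical functions below count how many tasks fit on one node, assuming memory is the only resource and, in the slot-based case, an 8 GB node cut into four 2 GB slots. Real YARN also rounds requests up to a minimum allocation and accounts for CPU, which this sketch ignores.

```python
def tasks_run_with_slots(node_mb: int, slot_mb: int, asks_mb: list[int]) -> int:
    """Hadoop 1.x style: the node is pre-cut into equal slots. Each task takes
    a whole slot, and a task bigger than a slot cannot run on this node."""
    slots = node_mb // slot_mb
    eligible = [ask for ask in asks_mb if ask <= slot_mb]
    return min(len(eligible), slots)


def tasks_run_with_containers(node_mb: int, asks_mb: list[int]) -> int:
    """YARN style (simplified): carve a container of exactly the requested
    size for each task, in order, while memory remains."""
    used = count = 0
    for ask in asks_mb:
        if used + ask <= node_mb:
            used += ask
            count += 1
    return count


asks = [3072] + [1024] * 5   # one 3 GB task and five 1 GB tasks
print(tasks_run_with_slots(8192, 2048, asks))   # 4 (the 3 GB task never fits a slot)
print(tasks_run_with_containers(8192, asks))    # 6 (all tasks fit exactly in 8 GB)
```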

The Node Manager is the per-machine framework agent responsible for containers: it monitors their resource usage (CPU, RAM, disk, etc.) and reports it back to the Scheduler in the Resource Manager. A Node Manager runs on each slave node.

The Node Manager also checks the health of its node on a schedule. If a health check fails, the Node Manager marks the node as unhealthy and reports this back to the Resource Manager.
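The periodic health check can be sketched as follows. The "a line starting with `ERROR` means unhealthy" rule follows the convention documented for YARN's node health script; everything else here (the function names, the bad-local-directory count, the report dictionary) is a hypothetical simplification of what the real Node Manager sends to the Resource Manager.

```python
def node_is_healthy(script_output: str, bad_local_dirs: int = 0) -> bool:
    """Toy periodic health check: unhealthy if the health script printed a
    line starting with "ERROR", or if any local directory has gone bad."""
    script_ok = not any(line.startswith("ERROR") for line in script_output.splitlines())
    return script_ok and bad_local_dirs == 0


def health_report(node_id: str, script_output: str, bad_local_dirs: int = 0) -> dict:
    # Hypothetical report shape; the real Node Manager uses its own protocol with the RM.
    return {"node": node_id, "healthy": node_is_healthy(script_output, bad_local_dirs)}


report1 = health_report("node1", "all checks passed")
report2 = health_report("node2", "ERROR disk /data1 is read-only")
print(report1)   # {'node': 'node1', 'healthy': True}
print(report2)   # {'node': 'node2', 'healthy': False}
```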

Resource Manager + Node Manager = Computation Framework

Application Master

The Application Master is application-specific (there is one per application) and is launched by the Application Manager.

The Application Master negotiates resources from the Resource Manager and works with the Node Managers to execute and monitor tasks. It is responsible for the whole lifecycle of the application.

The Application Master sends resource requests to the Resource Manager, asking for containers to run the application's tasks. On receiving a request, the Resource Manager validates the resource requirements, checks for available resources, and grants containers that satisfy the request.
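The validate, check, and grant steps can be sketched as one function. This is a hypothetical model: `grant_containers` is not a YARN API, and `MAX_ALLOCATION_MB = 8192` is an assumed cap chosen to mirror the default of YARN's `yarn.scheduler.maximum-allocation-mb` setting.

```python
MAX_ALLOCATION_MB = 8192   # assumed per-container cap (mirrors yarn.scheduler.maximum-allocation-mb default)


def grant_containers(ask_mb: int, num_containers: int, free_mb: int) -> int:
    """Toy Resource Manager handling of an AM resource request:
    1) validate the request, 2) check what is available, 3) grant what fits.
    Returns how many containers are granted now; the rest stay pending."""
    if ask_mb <= 0 or ask_mb > MAX_ALLOCATION_MB:
        raise ValueError(f"invalid request: {ask_mb} MB per container")
    return min(num_containers, free_mb // ask_mb)


print(grant_containers(2048, 5, free_mb=7000))   # 3 granted, 2 still pending
```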

Once a container is granted, the Application Master asks the corresponding Node Manager to launch it and run the application-specific task.

The Application Master monitors the progress of the application and its tasks. If a task fails, it requests a new container to relaunch the task and reports the failure.
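The monitoring-and-retry behavior can be sketched as a loop over task attempts. This is a hypothetical, sequential simplification (a real Application Master runs tasks in parallel across containers); `run_with_retries` and the `run` callback are made up, and the default `max_attempts=4` is chosen to mirror the default of MapReduce's `mapreduce.map.maxattempts` setting.

```python
from typing import Callable


def run_with_retries(tasks: list[str], run: Callable[[str, int], bool],
                     max_attempts: int = 4) -> dict[str, list[int]]:
    """Toy Application Master monitoring loop. Each task attempt runs in a new
    (imaginary) container; a failed attempt is recorded and retried until it
    succeeds or max_attempts is reached. Returns task -> failed attempt
    numbers, i.e. the failures the AM would report."""
    failures: dict[str, list[int]] = {}
    for task in tasks:
        failures[task] = []
        for attempt in range(1, max_attempts + 1):
            if run(task, attempt):
                break
            failures[task].append(attempt)
    return failures


# Hypothetical run: "map-1" fails on its first two attempts, then succeeds.
flaky = {("map-1", 1), ("map-1", 2)}
report = run_with_retries(["map-0", "map-1"], lambda t, a: (t, a) not in flaky)
print(report)   # {'map-0': [], 'map-1': [1, 2]}
```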

When the application finishes executing, the Application Master shuts itself down and releases its container, which marks the completion of the execution.

That covers YARN and its various components.

Summary:

· What is YARN
· Hadoop pre-2.x vs. post-2.x comparison
· How YARN fixed the pre-2.x issues
· Components such as the Resource Manager, Node Manager, and Application Master, and their functionality
· How the Application Master works


Hadoop YARN Architecture was originally published in Towards AI - Multidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
