
Certpine's Hadoop Administration training helps you gain the expertise needed to maintain large and complex Hadoop clusters through planning, installation, configuration, monitoring, and tuning. You will also learn to implement security using Kerberos and to work with Hadoop v2 features through real-time use cases.

  • 128K+ satisfied learners
Weekend $314
Weekdays

Training Features

Instructor-led Sessions

24 Hours of Online Live Instructor-led Classes. Weekend class: 8 sessions of 3 hours each. Weekday class: 12 sessions of 2 hours each.


Real-life Case Studies

Live project based on any of the selected use cases, involving Industry concepts of Hadoop Administration.


Assignments

Each class includes practical assignments, to be completed before the next class, that help you apply the concepts taught in class.


Lifetime Access

You get lifetime access to the Learning Management System (LMS), which hosts the presentations, quizzes, installation guide, and class recordings.


24 x 7 Expert Support

Our expert support team is available 24x7 to resolve any technical queries you may have during the course.


Certification

Certpine certifies you in the Hadoop Administration course based on a project reviewed by our expert panel.


Forum

We have a community forum for all our customers that further facilitates learning through peer interaction and knowledge sharing.


Course Description

Hadoop Administration training from Certpine gives participants expertise in all the steps necessary to operate and maintain a Hadoop cluster, from planning, installation, and configuration through load balancing, security, and tuning. The training provides hands-on preparation for the real-world challenges faced by Hadoop administrators. The course curriculum follows the Apache Hadoop distribution.

During the Hadoop Administration Online training, you'll master:

i) Hadoop Architecture, HDFS, Hadoop Cluster and Hadoop Administrator's role

ii) Plan and Deploy a Hadoop Cluster

iii) Load Data and Run Applications

iv) Configuration and Performance Tuning

v) How to Manage, Maintain, Monitor and Troubleshoot a Hadoop Cluster

vi) Cluster Security, Backup and Recovery 

vii) Insights on Hadoop 2.0, Name Node High Availability, HDFS Federation, YARN, MapReduce v2

viii) Oozie, HCatalog/Hive, and HBase Administration and a Hands-On Project

Big Data & Hadoop Market is expected to reach $99.31B by 2022, growing at a CAGR of 42.1% from 2015 (Forbes).

McKinsey predicts that by 2018 there will be a shortage of 1.5M data experts (McKinsey Report).

The average salary of Big Data Hadoop developers is $110K (Payscale salary data).

The Hadoop Administration course is best suited to professionals with IT Admin experience such as:

i) Linux / Unix Administrator

ii) Database Administrator

iii) Windows Administrator

iv) Infrastructure Administrator

v) System Administrator

You can also check out our blog on the Top 5 Hadoop Admin Tasks.

Having Cloud Computing skills is a highly preferred learning path after the Hadoop Administration training. Check out the upgraded AWS Course details.

This course only requires basic Linux knowledge. Certpine also offers a complimentary course on "Linux Fundamentals" to all Hadoop Administration course participants.

 

 

Project

For your practical work, we will help you set up a virtual machine on your system; the VM installation requires 8 GB of RAM. You can also create an account with AWS EC2 and use 'Free tier usage' eligible servers to create your Hadoop cluster on AWS EC2. This is the most preferred option, and Certpine provides a step-by-step procedure guide, available on the LMS. Additionally, our 24/7 expert support team will be available to assist you with any queries.

Towards the end of the course, you will get an opportunity to work on a live project that uses the different Hadoop ecosystem components together in a Hadoop implementation to solve big data problems.
 
1. Set up a minimum 2-node Hadoop cluster (a startup sketch follows below):
Node 1: NameNode, JobTracker, DataNode, TaskTracker
Node 2: Secondary NameNode, DataNode, TaskTracker
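
A minimal startup sketch, assuming an Apache Hadoop 1.x tarball install with passwordless SSH between the nodes (the hostnames node1 and node2 are placeholders):

    # On node1 (master): declare the worker nodes and the secondary namenode host
    printf "node1\nnode2\n" > $HADOOP_HOME/conf/slaves    # datanodes / tasktrackers
    echo "node2" > $HADOOP_HOME/conf/masters              # secondary namenode

    # One-time format, then start the HDFS and MapReduce daemons
    $HADOOP_HOME/bin/hadoop namenode -format
    $HADOOP_HOME/bin/start-dfs.sh       # namenode on node1, datanodes on both
    $HADOOP_HOME/bin/start-mapred.sh    # jobtracker on node1, tasktrackers on both

    # Verify the daemons running on each node
    jps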
 
2. Create a simple text file and copy it to HDFS (see the sketch below).
Find out which node the file went to.
Find out on which DataNodes the output blocks are written.
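
One way to do this, with illustrative file and path names; fsck reports the DataNodes holding each block of the file:

    echo "hello hadoop" > sample.txt
    hadoop fs -put sample.txt /user/hadoop/sample.txt

    # Lists every block of the file and the datanode(s) storing its replicas
    hadoop fsck /user/hadoop/sample.txt -files -blocks -locations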
 
3. Create a large text file and copy it to HDFS with a block size of 256 MB, keeping all other files at the default block size, and observe how block size impacts performance (see the sketch below).
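
A sketch of the per-file block-size override (the property is dfs.block.size in Hadoop 1.x and dfs.blocksize in 2.x; the value is in bytes, so 256 MB = 268435456):

    hadoop fs -D dfs.block.size=268435456 -put bigfile.txt /user/hadoop/big/

    # Confirm the block count and size; compare job timings against default-sized files
    hadoop fsck /user/hadoop/big/bigfile.txt -files -blocks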
 
4. Set a space quota of 200 MB on a projects directory and copy a 70 MB file into it with replication=2 (see the sketch below).
Identify why the system does not let you copy the file.
How would you solve this problem without increasing the space quota?
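
The copy fails because HDFS checks the space quota against a worst-case allocation of one full block per replica: with the default 128 MB block size and replication=2, the very first block reserves 256 MB, which already exceeds the 200 MB quota even though the file is only 70 MB. Writing with replication=1 (or a smaller block size) makes the write fit. A sketch, with an illustrative directory name:

    hadoop dfsadmin -setSpaceQuota 200m /projects

    # Fails: 128 MB block x 2 replicas = 256 MB reserved > 200 MB quota
    hadoop fs -D dfs.replication=2 -put file70mb /projects/

    # Succeeds without raising the quota: only one replica's block is reserved
    hadoop fs -D dfs.replication=1 -put file70mb /projects/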
 
5. Configure rack awareness and copy a file to HDFS (see the sketch below).
Find its rack distribution and identify the command used for it.
Find out how to change the replication factor of an existing file.
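
A sketch, assuming a topology script at an illustrative path (the property goes in core-site.xml and is named topology.script.file.name in Hadoop 1.x, net.topology.script.file.name in 2.x):

    #   <property>
    #     <name>topology.script.file.name</name>
    #     <value>/etc/hadoop/rack-topology.sh</value>
    #   </property>

    # After restarting HDFS, copy a file and inspect its rack placement
    hadoop fs -put sample.txt /user/hadoop/
    hadoop fsck /user/hadoop/sample.txt -files -blocks -racks

    # Change the replication factor of an existing file (-w waits for completion)
    hadoop fs -setrep -w 3 /user/hadoop/sample.txt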
 
The final certification project is based on real-world use cases, as follows:
 
Problem Statement 1 (a command sketch follows the steps):
1. Set up a Hadoop cluster, either single-node or 2-node, with all daemons (NameNode, DataNode, JobTracker, TaskTracker and a Secondary NameNode) running in the cluster, with block size = 128 MB.
2. Note down the Namespace ID of the cluster, and create a directory with a namespace quota of 10 and a space quota of 100 MB.
3. Use the distcp command to copy the data to the same cluster or a different cluster, and create the list of DataNodes participating in the cluster.
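
A sketch of the commands involved (directory names, the NameNode URI, and storage paths are placeholders; the 128 MB block size corresponds to dfs.block.size = 134217728 in hdfs-site.xml):

    # The cluster's Namespace ID is recorded in the namenode's VERSION file
    grep namespaceID /data/dfs/name/current/VERSION

    # Directory with a namespace quota of 10 names and a 100 MB space quota
    hadoop fs -mkdir /certproject
    hadoop dfsadmin -setQuota 10 /certproject
    hadoop dfsadmin -setSpaceQuota 100m /certproject

    # Copy the data with distcp (same cluster or a different one)
    hadoop distcp hdfs://node1:8020/certproject hdfs://node1:8020/certproject_copy

    # The report lists all datanodes participating in the cluster
    hadoop dfsadmin -report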
 
Problem Statement 2 (a command sketch follows the steps):
1. Save the namespace of the NameNode without using the Secondary NameNode, ensuring that the edits file is merged, without stopping the NameNode daemon.
2. Set up an include file so that no other nodes can talk to the NameNode.
3. Set the cluster re-balancer threshold to 40%.
4. Set the map and reduce slots to 4 and 2 respectively for each node.
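
A sketch of the corresponding admin commands and MRv1 configuration properties (file paths are placeholders):

    # 1. Checkpoint the namespace without the secondary namenode
    hadoop dfsadmin -safemode enter
    hadoop dfsadmin -saveNamespace      # merges the edits log into a fresh fsimage
    hadoop dfsadmin -safemode leave

    # 2. Include file: only hosts listed in dfs.hosts may register with the namenode
    #   <property><name>dfs.hosts</name><value>/etc/hadoop/include</value></property>
    hadoop dfsadmin -refreshNodes

    # 3. Re-balance with a 40% threshold
    hadoop balancer -threshold 40

    # 4. Per-node MRv1 slots, in mapred-site.xml:
    #   mapred.tasktracker.map.tasks.maximum    = 4
    #   mapred.tasktracker.reduce.tasks.maximum = 2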

Curriculum

Learning Objectives - In this module, you will understand what big data and Apache Hadoop are. You will also learn how Hadoop solves big data problems, about Hadoop cluster architecture, its core components and ecosystem, Hadoop's data loading and reading mechanisms, and the role of a Hadoop cluster administrator.

Topics - Introduction to big data, limitations of existing solutions, Hadoop architecture, Hadoop components and ecosystem, data loading & reading from HDFS, replication rules, rack awareness theory, Hadoop cluster administrator: Roles and responsibilities.

Learning Objectives - In this module, you will understand different Hadoop components, understand working of HDFS, Hadoop cluster modes, configuration files, and more. You will also understand the Hadoop 2.0 cluster setup and configuration, setting up Hadoop Clients using Hadoop 2.0 and resolve problems simulated from real-time environment.

Topics - Hadoop server roles and their usage, Hadoop installation and initial configuration, deploying Hadoop in a pseudo-distributed mode, deploying a multi-node Hadoop cluster, Installing Hadoop Clients, understanding the working of HDFS and resolving simulated problems.
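
A minimal pseudo-distributed sketch for Apache Hadoop 2.x (localhost and port 9000 are conventional placeholders; the XML properties go in the files named in the comments):

    # core-site.xml:
    #   <property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>
    # hdfs-site.xml:
    #   <property><name>dfs.replication</name><value>1</value></property>

    hdfs namenode -format    # one-time format of the namenode
    start-dfs.sh             # namenode, datanode, secondary namenode
    start-yarn.sh            # resourcemanager, nodemanager
    jps                      # verify that all daemons are up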

Learning Objectives – In this module, you will understand the working of the secondary namenode, work with a Hadoop distributed cluster, enable rack awareness, use the maintenance mode of a Hadoop cluster, add or remove nodes from your cluster in both ad-hoc and recommended ways, and understand the MapReduce programming model and schedulers in the context of a Hadoop administrator.

Topics - Understanding secondary namenode, working with Hadoop distributed cluster, Decommissioning or commissioning of nodes, understanding MapReduce, understanding schedulers and enabling them.
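
For instance, the recommended way to decommission a DataNode is through an exclude file rather than simply killing the daemon; a sketch with placeholder hostnames and paths (Hadoop 2.x syntax):

    # hdfs-site.xml names an exclude file listing nodes to retire:
    #   <property><name>dfs.hosts.exclude</name><value>/etc/hadoop/excludes</value></property>
    echo "node3" >> /etc/hadoop/excludes
    hdfs dfsadmin -refreshNodes    # namenode begins re-replicating the node's blocks
    hdfs dfsadmin -report          # node shows "Decommission in progress", then "Decommissioned"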

Learning Objectives - In this module, you will understand day-to-day cluster administration tasks: balancing data in a cluster, protecting data by enabling trash, attempting a manual failover, creating backups within or across clusters, safeguarding your metadata, performing metadata recovery or a manual NameNode failover, and restricting the usage of HDFS in terms of count and volume of data.

Topics – Key admin commands like Balancer, Trash, Import Checkpoint, Distcp, data backup and recovery, enabling trash, namespace count quota or space quota, manual failover or metadata recovery.
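
As an example, trash is enabled with a single core-site.xml property (the value is the retention period in minutes):

    #   <property><name>fs.trash.interval</name><value>1440</value></property>

    hdfs dfs -rm /user/hadoop/sample.txt           # moved into the user's .Trash directory
    hdfs dfs -rm -skipTrash /user/hadoop/old.txt   # bypasses trash, deletes permanently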

Learning Objectives - In this module, you will gather insights around cluster planning and management, learn about the various aspects one needs to remember while planning a setup of a new cluster, capacity sizing, understanding recommendations and comparing different distributions of Hadoop, understanding workload and usage patterns and some examples from the world of big data.

Topics - Planning a Hadoop 2.0 cluster, cluster sizing, hardware, network and software considerations, popular Hadoop distributions, workload and usage patterns, industry recommendations.

Learning Objectives:
     Get to know Hadoop cluster monitoring and security concepts. You will also learn how to secure a Hadoop cluster with Kerberos.
Topics:
  • Monitoring Hadoop Clusters
  • Hadoop Security System Concepts
  • Securing a Hadoop Cluster With Kerberos (see the configuration sketch after this list)
  • Common Misconfigurations
  • Overview on Kerberos
  • Checking log files to understand Hadoop clusters for troubleshooting
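
A minimal sketch of switching a cluster from simple authentication to Kerberos, assuming a working KDC (the realm, principal, and hostname are placeholders):

    # core-site.xml:
    #   <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
    #   <property><name>hadoop.security.authorization</name><value>true</value></property>

    kinit hdfs/node1.example.com@EXAMPLE.COM   # obtain a Kerberos ticket
    klist                                      # verify the ticket
    hdfs dfs -ls /                             # HDFS commands now authenticate via Kerberos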

Learning Objectives: 
    In this module, you will learn about Cloudera Hadoop 2.x and its various features.
Topics:
  • Visualize Cloudera Manager
  • Features of Cloudera Manager
  • Build Cloudera Hadoop cluster using CDH
  • Installation choices in Cloudera
  • Cloudera Manager Vocabulary
  • Cloudera terminologies
  • Different tabs in Cloudera Manager
  • What is HUE?
  • Hue Architecture
  • Hue Interface
  • Hue Features

Learning Objectives:
     Get to know the working and installation of Hadoop ecosystem components such as Pig and Hive.
Topics:
  • Explain Hive
  • Hive Setup
  • Hive Configuration
  • Working with Hive
  • Setting Hive in local and remote metastore mode (see the sketch after this list)
  • Pig setup
  • Working with Pig
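
A sketch of the two metastore modes (hostnames, the port, and the MySQL backend are illustrative):

    # hive-site.xml, remote metastore mode: clients connect over Thrift
    #   <property><name>hive.metastore.uris</name><value>thrift://metastore-host:9083</value></property>
    # Local/embedded mode instead uses a direct JDBC connection to the metastore DB:
    #   <property><name>javax.jdo.option.ConnectionURL</name>
    #             <value>jdbc:mysql://db-host/metastore</value></property>

    hive --service metastore &    # start the metastore service (remote mode)
    hive -e 'show databases;'     # quick smoke test from a client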

Learning Objectives:
     In this module, you will learn about the working and installation of HBase and Zookeeper.
Topics:
  • What is NoSQL Database
  • HBase data model
  • HBase Architecture
  • MemStore, WAL, BlockCache
  • HBase Hfile
  • Compactions
  • HBase Read and Write
  • HBase balancer and hbck
  • HBase setup (see the sketch after this list)
  • Working with HBase
  • Installing Zookeeper
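
A minimal distributed-mode configuration sketch (the HDFS URI and ZooKeeper quorum hostnames are placeholders):

    # hbase-site.xml:
    #   <property><name>hbase.rootdir</name><value>hdfs://node1:9000/hbase</value></property>
    #   <property><name>hbase.cluster.distributed</name><value>true</value></property>
    #   <property><name>hbase.zookeeper.quorum</name><value>zk1,zk2,zk3</value></property>

    start-hbase.sh                 # starts the HMaster and region servers
    echo "status" | hbase shell    # quick health check
    hbase hbck                     # region consistency check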

Learning Objectives:
     In this module, you will get to know about Apache Oozie which is a server-based workflow scheduling system to manage Hadoop jobs.
Topics:
  • Oozie overview
  • Oozie Features
  • Oozie workflow, coordinator and bundle
  • Start, End and Error Node
  • Action Node
  • Join and Fork
  • Decision Node
  • Oozie CLI (see the example after this list)
  • Install Oozie
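
A sketch of common Oozie CLI calls (the server URL, job.properties, and the job ID are illustrative):

    # Submit and start a workflow in one step
    oozie job -oozie http://localhost:11000/oozie -config job.properties -run

    # Inspect a running or finished job, and list recent jobs
    oozie job -oozie http://localhost:11000/oozie -info 0000001-200101000000000-oozie-W
    oozie jobs -oozie http://localhost:11000/oozie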

Learning Objectives:
     Learn about the different data ingestion tools such as Sqoop and Flume.
Topics:
  • Types of Data Ingestion
  • HDFS data loading commands
  • Purpose and features of Sqoop
  • Perform operations like Sqoop import, export, and Hive import (see the examples after this list)
  • Sqoop 2
  • Install Sqoop
  • Import data from RDBMS into HDFS
  • Flume features and architecture
  • Types of flow
  • Install Flume
  • Ingest Data From External Sources With Flume
  • Best Practices for Importing Data
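
A sketch of typical Sqoop import and export invocations (connection details, table names, and paths are placeholders):

    # Import a table from MySQL into HDFS
    sqoop import \
      --connect jdbc:mysql://db-host/sales \
      --username dbuser -P \
      --table orders \
      --target-dir /user/hadoop/orders \
      --num-mappers 2

    # Export HDFS data back into an RDBMS table
    sqoop export \
      --connect jdbc:mysql://db-host/sales \
      --username dbuser -P \
      --table orders_summary \
      --export-dir /user/hadoop/orders_summary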

Learning Objectives - In this module, you will understand the Hadoop MapReduce framework and the working of MapReduce on data stored in HDFS. You will also learn about YARN concepts in MapReduce.

Topics - MapReduce Use Cases, Traditional Way vs. MapReduce Way, Why MapReduce, Hadoop 2.x MapReduce Architecture, Hadoop 2.x MapReduce Components, YARN MR Application Execution Flow, YARN Workflow, Anatomy of a MapReduce Program, Demo on MapReduce.

Learning Objectives - In this module, you will learn advanced MapReduce concepts such as Counters, Distributed Cache, MRUnit, Reduce Join, Custom Input Format, and Sequence Input Format, and how to deal with complex MapReduce programs (see the example after the topics list).

Topics - Counters, Distributed Cache, MRUnit, Reduce Join, Custom Input Format, Sequence Input Format.
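
For example, a side file can be shipped to every task through the distributed cache with the generic -files option; this assumes the job's driver uses Hadoop's GenericOptionsParser (the jar, class, and paths are placeholders):

    # Each task finds lookup.txt in its local working directory
    hadoop jar app.jar MainClass -files /local/lookup.txt /input /output

    # Built-in and custom counters are printed on job completion and can be
    # queried afterwards:
    mapred job -status <job_id>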

FAQs

1866-357-4555 (toll-free number)

"You will never miss a lecture at Certpine You can choose either of the two options:

  • View the recorded session of the class available in your LMS.
  • You can attend the missed session, in any other live batch."

Post-enrolment, LMS access is provided to you instantly and remains available for a lifetime. You will be able to access the complete set of previous class recordings, PPTs, PDFs, and assignments. Moreover, access to our 24x7 support team is granted instantly as well. You can start learning right away.

Yes, the access to the course material will be available for lifetime once you have enrolled into the course.

We have a limited number of participants in each live session to maintain quality standards. So, unfortunately, participation in a live class without enrolment is not possible. However, you can go through the sample class recording; it will give you a clear insight into how the classes are conducted, the quality of the instructors, and the level of interaction in a class.

All the instructors at Certpine are practitioners from the industry with a minimum of 10-12 years of relevant IT experience. They are subject matter experts and are trained by Certpine to provide an awesome learning experience to the participants.

To help you in this endeavor, we have added a resume builder tool to your LMS. You can now create a winning resume in just 3 easy steps. You will have unlimited access to these templates across different roles and designations. All you need to do is log in to your LMS and click on the "create your resume" option.

Just give us a call at +91 88808 62004 or email us at sales@certpine.co. Our US toll-free number is 1800 275 9730.
