Big Data Hadoop Training Institute in Gurgaon and Delhi NCR

Course Info

Course Highlights

All training is delivered by industry experts who work on the Big Data Hadoop platform.
Backup classes if you miss a session.
Theory and practical training, with case studies for a better understanding of the concepts.
Complete course material at no extra cost.
Free doubt-clearing sessions after the training is complete.
Resume building by experts.
Feedback forms completed by candidates after every class to maintain the highest quality standards.

Introduction to Hadoop and Big Data

Introduction to Big Data
Introduction to Hadoop
Business problems / Challenges with Big Data
Scenarios where Hadoop is used
Overview of batch processing and real-time data analytics using Hadoop
Hadoop distributions – Apache, Cloudera, Hortonworks
Hadoop versions – Hadoop 1.x and Hadoop 2.x
Hadoop services – HDFS, MapReduce, YARN
Introduction to Hadoop ecosystem components (Hive, HBase, Pig, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Spark)

Cluster setup (Hadoop 1.x)

Install a Linux VM for the Hadoop cluster using Oracle VirtualBox
Preparing nodes for Hadoop and VM settings
Install Java and configure passwordless SSH across nodes
Basic Linux commands
Hadoop 1.x single-node deployment
Hadoop Daemons – NameNode, JobTracker, DataNode, TaskTracker, Secondary NameNode
Hadoop configuration files and starting the daemons
Important Hadoop web UI URLs and logs
Run HDFS and Linux commands
Hadoop 1.x multi-node deployment
Run sample jobs in Hadoop single-node and multi-node clusters

HDFS Concepts

HDFS Design Goals
Understand blocks and how to configure the block size
Block replication and replication factor
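As a quick worked example of blocks and replication, the Python sketch below computes how many blocks and block replicas a file produces. It assumes the Hadoop 2.x default block size of 128 MB (`dfs.blocksize`; Hadoop 1.x defaulted to 64 MB) and the default replication factor of 3 (`dfs.replication`); the function name `hdfs_footprint` and the file sizes are illustrative only.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size, block_size=128 * MB, replication=3):
    """Return (blocks, block replicas, raw bytes) for one file stored in HDFS.

    The final block only holds the leftover bytes, so raw usage is
    file_size * replication rather than blocks * block_size * replication.
    """
    blocks = -(-file_size // block_size)      # ceiling division; an empty file has 0 blocks
    return blocks, blocks * replication, file_size * replication


if __name__ == "__main__":
    for size_mb in (1024, 300):
        blocks, replicas, raw = hdfs_footprint(size_mb * MB)
        print(f"{size_mb} MB file: {blocks} blocks, {replicas} replicas, {raw // MB} MB raw storage")
```

A 1 GB file therefore becomes 8 blocks and 24 stored replicas, while a 300 MB file becomes 3 blocks (128 + 128 + 44 MB). Halving the block size doubles the number of blocks the NameNode has to track, which is one reason large files and large blocks suit HDFS.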

MapReduce Concepts

Introduction to MapReduce
MapReduce Architecture
Understanding the concept of Mappers & Reducers
Anatomy of a MapReduce program
Phases of a MapReduce program
Data types in Hadoop MapReduce
Driver, Mapper and Reducer classes
InputSplit and RecordReader
InputFormat and OutputFormat in Hadoop
Combiner and Partitioner concepts
Running and Monitoring MapReduce jobs
Writing your own MapReduce job using MapReduce API
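To tie the mapper, combiner, partitioner and reducer topics together, here is a plain-Python simulation of a word-count job's data flow. It is not the Hadoop MapReduce API: `mapper`, `reducer`, `partition` and `run_job` are hypothetical names. The partition rule follows the formula used by Hadoop's default `HashPartitioner`, `(hash & Integer.MAX_VALUE) % numReduceTasks`, but `simple_hash` is an illustrative stand-in, so real partition numbers depend on the key type's `hashCode()`.

```python
from collections import defaultdict


def mapper(line):
    """Map: emit (word, 1) for each word in one input line (lower-casing is an illustrative choice)."""
    for word in line.lower().split():
        yield word, 1


def reducer(key, values):
    """Reduce: sum the counts for one key. Summing is associative, so it can double as the combiner."""
    yield key, sum(values)


def simple_hash(key):
    """Hypothetical deterministic 32-bit string hash (31 * h + char), standing in for hashCode()."""
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def partition(key, num_reducers):
    """HashPartitioner-style rule: (hash & Integer.MAX_VALUE) % number of reduce tasks."""
    return (simple_hash(key) & 0x7FFFFFFF) % num_reducers


def run_job(splits, num_reducers=2, use_combiner=True):
    """Simulate map -> (combine) -> partition/shuffle -> sort -> reduce for a word count."""
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    records_shuffled = 0
    for split in splits:                          # one map task per input split
        map_output = defaultdict(list)
        for line in split:
            for key, value in mapper(line):
                map_output[key].append(value)
        if use_combiner:                          # local aggregation inside the map task
            map_output = {k: [total for _, total in reducer(k, vs)] for k, vs in map_output.items()}
        for key, values in map_output.items():    # route each key to exactly one reducer
            shuffled[partition(key, num_reducers)][key].extend(values)
            records_shuffled += len(values)
    outputs = []
    for bucket in shuffled:                       # each reducer processes its keys in sorted order
        outputs.append([pair for key in sorted(bucket) for pair in reducer(key, bucket[key])])
    return outputs, records_shuffled


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["The fox"]]
    for use_combiner in (False, True):
        outputs, shuffled = run_job(splits, use_combiner=use_combiner)
        print(f"combiner={use_combiner}: {shuffled} records shuffled -> {outputs}")
```

The final counts are identical with or without the combiner; only the number of records sent across the shuffle changes (9 without, 8 with, for this tiny input). Hadoop's classic WordCount example makes the same choice of reusing its summing reducer as the combiner, which is safe only because addition is associative and commutative.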

Cluster setup (Hadoop 2.x)

Hadoop 1.x Limitations
Design Goals for Hadoop 2.x
Introduction to Hadoop 2.x
Introduction to YARN
Components of YARN – ResourceManager, NodeManager, ApplicationMaster
Deprecated configuration properties and their Hadoop 2.x replacements
Hadoop 2.x single-node deployment
Hadoop 2.x multi-node deployment

HDFS High Availability and Federation

Introduction to HDFS Federation
Understand Nameservice ID and block pools
Introduction to HDFS High Availability
NameNode failure handling in Hadoop 1.x (the NameNode as a single point of failure)
Concept of Active and Standby NameNodes
Configuring JournalNodes and avoiding the split-brain scenario
Manual failover, and automatic failover using ZooKeeper and the ZKFC (ZKFailoverController)
HDFS haadmin commands
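The JournalNode topic rests on majority arithmetic: an edit is accepted only when a majority of JournalNodes acknowledge it, and the Hadoop HA documentation notes that N JournalNodes tolerate at most (N - 1) / 2 failures, which is why odd counts such as 3 or 5 are recommended. The Python sketch below models only that arithmetic (the function names are hypothetical); the real Quorum Journal Manager protocol also uses epoch numbers to fence a stale writer, which is not modelled here.

```python
def majority(n):
    """Smallest number of nodes that is a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many of n nodes can fail while a majority is still reachable: (n - 1) // 2."""
    return (n - 1) // 2


def edit_committed(acks, n):
    """An edit-log write is treated as durable only once a majority of n JournalNodes ack it."""
    return acks >= majority(n)


def both_sides_have_majority(n, side_a):
    """After a network partition into side_a and n - side_a nodes, can both halves reach a majority?"""
    return side_a >= majority(n) and (n - side_a) >= majority(n)


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(f"{n} JournalNodes: majority={majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Because two disjoint groups can never both hold a majority, at most one NameNode can keep writing edits after a network split; adding a fourth JournalNode to three raises the majority from 2 to 3 without tolerating any extra failure. The same majority rule applies to a ZooKeeper ensemble.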

YARN – Yet Another Resource Negotiator

YARN Architecture
YARN Components – ResourceManager, NodeManager, JobHistoryServer, Application Timeline Server, MRApplicationMaster
YARN application execution flow
Running and monitoring YARN applications
Understand and configure the Capacity and Fair Schedulers in YARN
Define and configure queues
JobHistory Server / Application Timeline Server
YARN REST API
Writing and executing YARN applications
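For the Capacity Scheduler topic, it helps to see how nested queue percentages turn into guaranteed cluster shares. In the Capacity Scheduler, each `yarn.scheduler.capacity.<queue-path>.capacity` value is a percentage of the parent queue, and sibling capacities must add up to 100. The Python sketch below multiplies percentages down a hypothetical queue tree (`QUEUES`, with made-up queue names and a 100 GB cluster); it is an arithmetic illustration, not a YARN client.

```python
# Hypothetical queue tree. Each number is a queue's capacity as a percentage of its
# parent, the way yarn.scheduler.capacity.<queue-path>.capacity is expressed.
QUEUES = {
    "root": {"engineering": 60, "marketing": 40},
    "root.engineering": {"etl": 70, "adhoc": 30},
}


def absolute_capacities(tree, parent="root", parent_share=100.0):
    """Return {queue path: guaranteed % of the whole cluster} for every queue under parent."""
    children = tree.get(parent, {})
    total = sum(children.values())
    if children and total != 100:
        # Capacity Scheduler requires sibling capacities to add up to 100%.
        raise ValueError(f"capacities under {parent} sum to {total}, expected 100")
    shares = {}
    for name, percent in children.items():
        path = f"{parent}.{name}"
        share = parent_share * percent / 100
        shares[path] = share
        shares.update(absolute_capacities(tree, path, share))
    return shares


def guaranteed_memory_mb(cluster_memory_mb, tree):
    """Translate absolute percentages into a guaranteed memory figure per queue."""
    return {q: cluster_memory_mb * s / 100 for q, s in absolute_capacities(tree).items()}


if __name__ == "__main__":
    for queue, mb in guaranteed_memory_mb(102400, QUEUES).items():
        print(f"{queue}: {mb:.0f} MB guaranteed")
```

Here `root.engineering.etl` is guaranteed 60% × 70% = 42% of the cluster. Capacity is a guaranteed share rather than a hard cap: a queue can borrow idle resources up to its `maximum-capacity` setting.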

Apache ZooKeeper

Introduction to Apache ZooKeeper
ZooKeeper standalone installation
ZooKeeper clustered installation
Understand znodes and ephemeral nodes
Manage znodes using the Java API
ZooKeeper four-letter-word commands
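Ephemeral and sequential znodes are the building blocks of ZooKeeper's documented leader-election recipe: every candidate creates an EPHEMERAL_SEQUENTIAL child under an election znode, the lowest sequence number leads, and each follower watches only its immediate predecessor. The in-memory Python class below (`FakeElectionZnode`, a hypothetical name) simulates that recipe; it is not a ZooKeeper client. Real sequence suffixes are 10-digit zero-padded counters kept by the parent and are only guaranteed to increase, whereas this sketch simply counts creations.

```python
class FakeElectionZnode:
    """In-memory stand-in for a ZooKeeper parent znode such as /election (not a real client)."""

    def __init__(self):
        self._next_seq = 0        # simplification: real ZooKeeper keeps this counter on the parent
        self._children = {}       # child znode name -> session id that created it

    def create_ephemeral_sequential(self, prefix, session_id):
        """Mimic an EPHEMERAL_SEQUENTIAL create: the name gets a 10-digit, zero-padded suffix."""
        name = f"{prefix}{self._next_seq:010d}"
        self._next_seq += 1
        self._children[name] = session_id
        return name

    def close_session(self, session_id):
        """Ephemeral znodes are removed automatically when the owning session ends."""
        self._children = {n: s for n, s in self._children.items() if s != session_id}

    def _ordered(self):
        return sorted(self._children, key=lambda name: int(name[-10:]))

    def leader(self):
        """Election recipe: the session owning the lowest sequence number leads."""
        ordered = self._ordered()
        return self._children[ordered[0]] if ordered else None

    def watch_target(self, session_id):
        """A follower watches only its immediate predecessor, which avoids a herd effect."""
        ordered = self._ordered()
        positions = [i for i, name in enumerate(ordered) if self._children[name] == session_id]
        if not positions or positions[0] == 0:
            return None
        return self._children[ordered[positions[0] - 1]]


if __name__ == "__main__":
    election = FakeElectionZnode()
    for session in ("client-a", "client-b", "client-c"):
        print(election.create_ephemeral_sequential("n_", session))
    print("leader:", election.leader())
    election.close_session("client-a")        # the leader's session expires
    print("new leader:", election.leader())
```

When the leader's session ends, its ephemeral znode disappears and only the next candidate in line is notified, so leadership moves without every client re-reading the whole child list.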

Apache Hive

Introduction to Hive
Hive Architecture
Components – Metastore, HiveServer2, Beeline, Hive CLI, Hive Web Interface
Installation and configuration
Metastore service
DDLs and DMLs
SQL – Select, Filter, Join, Group By
Partitions and buckets in Hive
Hive User Defined Functions
Introduction to HCatalog
Install and configure HCatalog services
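Partitions and buckets are easiest to picture as a storage layout. Hive stores each partition in nested `column=value` subdirectories under the table directory (by default under `/user/hive/warehouse`, the value of `hive.metastore.warehouse.dir`), and a bucketed table spreads rows across a fixed number of files by hashing the bucketing column. The Python sketch below assumes a simplified hash (an INT value serves as its own hash); Hive's actual hash depends on the column type and Hive version. The `sales` table and its rows are hypothetical.

```python
from collections import defaultdict

WAREHOUSE = "/user/hive/warehouse"   # default value of hive.metastore.warehouse.dir


def partition_dir(table, partition_cols, row):
    """Each partition lives in nested key=value subdirectories under the table directory."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([WAREHOUSE, table, *parts])


def bucket_for(value, num_buckets):
    """Simplified bucketing (assumption): an INT value serves as its own hash, then modulo."""
    return (value & 0x7FFFFFFF) % num_buckets


def layout(table, rows, partition_cols, bucket_col, num_buckets):
    """Group rows by (partition directory, bucket number), i.e. by the file they would land in."""
    files = defaultdict(list)
    for row in rows:
        target = (partition_dir(table, partition_cols, row), bucket_for(row[bucket_col], num_buckets))
        files[target].append(row)
    return dict(files)


if __name__ == "__main__":
    rows = [
        {"dt": "2024-01-01", "country": "IN", "user_id": 7},
        {"dt": "2024-01-01", "country": "IN", "user_id": 11},
        {"dt": "2024-01-02", "country": "US", "user_id": 8},
    ]
    for (directory, bucket), grouped in layout("sales", rows, ["dt", "country"], "user_id", 4).items():
        print(directory, "bucket", bucket, [r["user_id"] for r in grouped])
```

A query that filters on `dt` only needs to read the matching partition directories (partition pruning), while bucketing on `user_id` guarantees that all rows for the same user land in the same bucket, which helps sampling and bucketed joins.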

Apache Pig

Introduction to Pig
Pig installation
Accessing Pig Grunt shell
Pig Data Types
Pig commands
Pig Relational Operators
Pig User Defined Functions
Configure Pig to use HCatalog

Apache Sqoop

Introduction to Sqoop
Sqoop Architecture and Installation
Import data into HDFS using Sqoop
Import all tables with Sqoop
Import tables directly into Hive
Export data from HDFS to a relational database
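A Sqoop import runs in parallel: it picks a split column (the primary key by default, or the one given with `--split-by`), runs a bounding query for that column's MIN and MAX, and hands each of the map tasks (4 by default, set with `--num-mappers` / `-m`) one contiguous range. The Python sketch below uses a simplified even split; Sqoop's own splitter may place boundaries slightly differently, and the generated SQL text is illustrative. The `orders` table and `order_id` column are hypothetical.

```python
def split_ranges(min_val, max_val, num_mappers=4):
    """Divide the inclusive range [min_val, max_val] into contiguous chunks, one per mapper.

    Simplified even split (assumption); Sqoop's own splitter may choose slightly
    different boundaries, but the idea of one contiguous range per mapper is the same.
    """
    if min_val > max_val:
        raise ValueError("min_val must not exceed max_val")
    total = max_val - min_val + 1
    chunks = min(num_mappers, total)          # never hand a mapper an empty range
    size, extra = divmod(total, chunks)
    ranges, low = [], min_val
    for i in range(chunks):
        high = low + size - 1 + (1 if i < extra else 0)
        ranges.append((low, high))
        low = high + 1
    return ranges


def mapper_queries(table, split_col, ranges):
    """Illustrative per-mapper SQL; this is not the exact text Sqoop generates."""
    return [f"SELECT * FROM {table} WHERE {split_col} >= {lo} AND {split_col} <= {hi}"
            for lo, hi in ranges]


if __name__ == "__main__":
    # Suppose the bounding query returned MIN(order_id) = 1 and MAX(order_id) = 1000.
    for query in mapper_queries("orders", "order_id", split_ranges(1, 1000)):
        print(query)
```

Splitting by value range means the mappers are only balanced when the split column is evenly distributed; if most rows sit in one part of the key range, one mapper does most of the work, which is why choosing a good `--split-by` column matters.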

Apache Flume

Introduction to Flume
Flume Architecture and Installation
Define a Flume agent – Source, Channel and Sink
Flume Use Cases

Apache Oozie

Introduction to Oozie
Oozie Architecture
Oozie server installation and configuration
Design Workflows, Coordinator Jobs, Bundle Jobs in Oozie

Apache HBase

Introduction to HBase
HBase Architecture
HBase components – HBase Master and RegionServers
HBase installation and configuration
Create sample tables and run queries on HBase
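Underneath the HBase architecture topics is one idea: HBase keeps rows sorted by row key in byte order and splits a table into regions, each serving a contiguous key range, with the first region starting at the empty key. Clients look up which region (and therefore which RegionServer) holds a key via the `hbase:meta` table. The Python sketch below mimics only the key-to-region lookup with `bisect`; the region start keys are hypothetical, and `find_region` and `padded_key` are illustrative names.

```python
from bisect import bisect_right


def find_region(start_keys, row_key):
    """Index of the region whose [start key, next start key) range holds row_key.

    start_keys must be sorted byte strings; HBase's first region starts at the empty key,
    so every row key belongs to exactly one region.
    """
    if not start_keys or start_keys[0] != b"":
        raise ValueError("the first region must start at the empty key b''")
    return bisect_right(start_keys, row_key) - 1


def padded_key(prefix, number, width=10):
    """Zero-pad numeric IDs so byte order matches numeric order (b'row10' < b'row9' otherwise)."""
    return f"{prefix}{number:0{width}d}".encode()


if __name__ == "__main__":
    # Hypothetical table pre-split into four regions.
    start_keys = [b"", b"g", b"n", b"t"]
    for key in (b"apple", b"grape", b"melon", b"zebra"):
        print(key, "-> region", find_region(start_keys, key))
```

Because ordering is purely lexicographic on bytes, row-key design matters: unpadded numbers sort as `row10` before `row9`, and monotonically increasing keys such as timestamps send every new write to the last region.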


Apache Spark / Storm / Kafka

Real-time data analytics
Introduction to Spark / Storm / Kafka

Cluster Monitoring and Management tools

Cloudera Manager
Apache Ambari
Ganglia
JMX monitoring and JConsole
Hadoop User Experience (HUE)

Copyright © 2008 CTC | All Rights Reserved