Course Info
Course Highlights
- All training is delivered by industry experts who work on the Big Data Hadoop platform.
- Backup classes in case you miss a session.
- Theory and practical training, with case studies for a better understanding of the concepts.
- Complete course material at no extra cost.
- Free doubt-clearing session after the training is complete.
- Resume building with experts.
- Feedback forms completed by candidates after every class to maintain the highest quality standards.
Course Curriculum
- Introduction to Big Data
- Introduction to Hadoop
- Business problems and challenges with Big Data
- Scenarios where Hadoop is used
- Overview of batch processing and real-time data analytics using Hadoop
- Hadoop distributions – Apache, Cloudera, Hortonworks
- Hadoop versions – Hadoop 1.x and Hadoop 2.x
- Hadoop services – HDFS, MapReduce, YARN
- Introduction to Hadoop ecosystem components (Hive, HBase, Pig, Sqoop, Flume, ZooKeeper, Oozie, Kafka, Spark)
- Installing a Linux VM for the Hadoop cluster using Oracle VirtualBox
- Preparing nodes for Hadoop and configuring VM settings
- Install Java and configure passwordless SSH across nodes
- Basic Linux commands
- Hadoop 1.x Single node deployment
- Hadoop Daemons – NameNode, JobTracker, DataNode, TaskTracker, Secondary NameNode
- Hadoop configuration files and starting the daemons
- Important web UI URLs and logs for Hadoop
- Run HDFS and Linux commands
- Hadoop 1.x multi-node deployment
- Run sample jobs in Hadoop single and multi-node clusters
- HDFS Design Goals
- Understand Blocks and how to configure block size
- Block replication and replication factor
- Introduction to MapReduce
- MapReduce Architecture
- Understanding the concept of Mappers & Reducers
- Anatomy of a MapReduce program
- Phases of a MapReduce program
- Data types in Hadoop MapReduce
- Driver, Mapper and Reducer classes
- InputSplit and RecordReader
- InputFormat and OutputFormat in Hadoop
- Concepts of Combiner and Partitioner
- Running and Monitoring MapReduce jobs
- Writing your own MapReduce job using MapReduce API
- Hadoop 1.x Limitations
- Design Goals for Hadoop 2.x
- Introduction to Hadoop 2.x
- Introduction to YARN
- Components of YARN – ResourceManager, NodeManager, ApplicationMaster
- Deprecated configuration properties in Hadoop 2.x
- Hadoop 2.x Single node deployment
- Hadoop 2.x multi-node deployment
- Introduction to HDFS Federation
- Understand Nameservice ID and block pools
- Introduction to HDFS High Availability
- Failover mechanisms in Hadoop 1.x
- Concept of Active and StandBy NameNode
- Configuring JournalNodes and avoiding split-brain scenarios
- Manual failover, and automatic failover in HA using ZooKeeper and the ZKFailoverController (ZKFC)
- HDFS haadmin commands
- YARN Architecture
- YARN Components – ResourceManager, NodeManager, JobHistoryServer, Application Timeline Server, MapReduce ApplicationMaster
- YARN Application execution flow
- Running and Monitoring YARN Applications
- Understand and configure Capacity/Fair Schedulers in YARN
- Define and configure queues
- JobHistory Server / Application Timeline Server
- YARN REST API
- Writing and executing YARN applications
- Introduction to Apache ZooKeeper
- ZooKeeper standalone installation
- ZooKeeper clustered installation
- Understanding znodes and ephemeral znodes
- Managing znodes using the Java API
- ZooKeeper four-letter-word commands
- Introduction to Hive
- Hive Architecture
- Components – Metastore, HiveServer2, Beeline, Hive CLI, Hive Web Interface
- Installation and configuration
- Metastore service
- DDL and DML statements
- HiveQL – SELECT, filtering (WHERE), JOIN, GROUP BY
- Partitions and buckets in Hive
- Hive User Defined Functions
- Introduction to HCatalog
- Install and configure HCatalog services
- Introduction to Pig
- Pig installation
- Accessing Pig Grunt shell
- Pig Data Types
- Pig commands
- Pig Relational Operators
- Pig User Defined Functions
- Configure Pig to use HCatalog
- Introduction to Sqoop
- Sqoop Architecture and Installation
- Importing data into HDFS using Sqoop
- Importing all tables with Sqoop
- Importing tables directly into Hive
- Exporting data from HDFS to relational databases
- Introduction to Flume
- Flume Architecture and Installation
- Defining a Flume agent – Source, Channel and Sink
- Flume Use Cases
- Introduction to Oozie
- Oozie Architecture
- Oozie server installation and configurations
- Designing workflows, coordinator jobs and bundle jobs in Oozie
- Introduction to HBase
- HBase Architecture
- HBase components – HBase Master and RegionServers
- HBase installation and configurations
- Create sample tables and queries on HBase
- Real-time data analytics
- Introduction to Spark / Storm / Kafka
- Cloudera Manager
- Apache Ambari
- Ganglia
- JMX monitoring and JConsole
- Hadoop User Experience (HUE)
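The MapReduce topics in the curriculum (mappers, combiner, partitioner, shuffle/sort and reducers) can be previewed with a small local word-count simulation. This is a plain-Python sketch, not the Hadoop MapReduce API: all function names are hypothetical, each input split is modeled as a list of text lines, and the partitioner is a simple deterministic stand-in for Hadoop's default HashPartitioner.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Map phase: emit (word, 1) for every whitespace-separated token.
    for word in line.lower().split():
        yield word, 1


def combiner(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    # Local pre-aggregation of one map task's output, shrinking the data
    # that has to cross the (simulated) shuffle.
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def partitioner(key: str, num_reducers: int) -> int:
    # Hypothetical stand-in for HashPartitioner. Python's built-in hash() is
    # salted per process for strings, so a byte sum is used to stay deterministic.
    return sum(key.encode("utf-8")) % num_reducers


def reducer(key: str, values: Iterable[int]) -> Tuple[str, int]:
    # Reduce phase: sum all partial counts received for one key.
    return key, sum(values)


def run_job(splits: List[List[str]], num_reducers: int = 2) -> Dict[str, int]:
    # One list of (key -> values) per reducer, filled during the shuffle.
    partitions: List[Dict[str, List[int]]] = [
        defaultdict(list) for _ in range(num_reducers)
    ]
    for split in splits:  # one simulated map task per input split
        map_output = [pair for line in split for pair in mapper(line)]
        for word, count in combiner(map_output):
            partitions[partitioner(word, num_reducers)][word].append(count)

    result: Dict[str, int] = {}
    for partition in partitions:
        # Sort phase: each reducer sees its keys in sorted order.
        for word in sorted(partition):
            key, total = reducer(word, partition[word])
            result[key] = total
    return result


if __name__ == "__main__":
    splits = [["big data big ideas"], ["hadoop stores big data"]]
    print(run_job(splits))
```

Removing the combiner call would give the same totals here, because summing is associative; the combiner only reduces how many intermediate pairs reach the reducers, which is why it suits aggregations like word count.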