Hadoop Course in Ameerpet:

Hadoop Course Overview:

The Hadoop Development course teaches learners the skills needed to set up a Hadoop cluster, store Big Data using the Hadoop Distributed File System (HDFS), and process and analyze Big Data using Map Reduce programming or other tools in the Hadoop ecosystem. Attend Hadoop training sessions led by a real-time expert.

Hadoop Training Course Prerequisites

Basic Unix Commands
Core Java (OOP Concepts, Collections, Exceptions) for Map Reduce Programming
SQL Query knowledge for Hive Queries

Hadoop Course System Requirements

Any Linux distribution (e.g., Ubuntu, CentOS, Fedora, Red Hat Linux) with at least 4 GB RAM and 100 GB of disk space
Java 1.6+
OpenSSH server & client
MySQL database
Eclipse IDE
VMware (to run Linux alongside Windows)

Hadoop Course Content

Introduction to Hadoop

High Availability
Scaling
Advantages and Challenges

Introduction to Big Data

What Is Big Data
Big Data Opportunities and Challenges
Characteristics of Big Data

Introduction to Hadoop

Hadoop Distributed File System
Comparing Hadoop & SQL
Industries using Hadoop
Data Locality
Hadoop Architecture
Map Reduce & HDFS
Using the Hadoop single node image (Clone)
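Data locality means the scheduler tries to run a map task on a node that already stores the task's input block, then on a node in the same rack as a replica, and only as a last resort on a node in another rack. The plain-Python sketch below illustrates that preference order under simplifying assumptions: `pick_node` and the node and rack names are hypothetical, and real schedulers add queues, resource checks and locality delays on top of this idea.

```python
from typing import Dict, List, Optional, Tuple


def pick_node(block_replicas: List[str],
              free_nodes: List[str],
              rack_of: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: choose a node for a map task.

    Preference order: node-local, then rack-local, then off-rack.
    Returns (node, locality level), or None if no node is free.
    """
    if not free_nodes:
        return None
    # 1. Node-local: a free node already holds a replica of the input block.
    for node in free_nodes:
        if node in block_replicas:
            return node, "node-local"
    # 2. Rack-local: a free node shares a rack with some replica.
    replica_racks = {rack_of[r] for r in block_replicas}
    for node in free_nodes:
        if rack_of[node] in replica_racks:
            return node, "rack-local"
    # 3. Off-rack: the block has to be read across racks.
    return free_nodes[0], "off-rack"


if __name__ == "__main__":
    rack_of = {"dn1": "rackA", "dn2": "rackA", "dn3": "rackB", "dn4": "rackC"}
    replicas = ["dn1", "dn3"]  # nodes holding copies of the split's block
    print(pick_node(replicas, ["dn4", "dn3"], rack_of))  # ('dn3', 'node-local')
    print(pick_node(replicas, ["dn4", "dn2"], rack_of))  # ('dn2', 'rack-local')
    print(pick_node(replicas, ["dn4"], rack_of))         # ('dn4', 'off-rack')
```

Moving the computation to the data is cheaper than moving a whole block over the network, which is why node-local placement is tried first.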

Hadoop Distributed File System (HDFS)

HDFS Design & Concepts
Blocks, Name nodes and Data nodes
HDFS High-Availability and HDFS Federation
Hadoop DFS: The Command-Line Interface
Basic File System Operations
Anatomy of a File Read and a File Write
Block Placement Policy and Modes
Configuration Files in Detail
Metadata, FS Image, Edit Log, Secondary Name Node and Safe Mode
How to add a new Data Node and decommission a Data Node dynamically (without stopping the cluster)
FSCK Utility (Block Report)
How to override default configuration at system level and Programming level
HDFS Federation
ZooKeeper Leader Election Algorithm
Exercise and small use case on HDFS
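To make the HDFS block and replica-placement topics concrete, here is a small plain-Python sketch. It assumes a 128 MB block size (the default in Hadoop 2.x and later; Hadoop 1.x defaulted to 64 MB) and a replication factor of 3. The placement function is a simplified, deterministic reading of the default rack-aware policy (first replica on the writer's node, second on a different rack, third on another node in the second replica's rack); `split_into_blocks` and `place_replicas` are hypothetical names, not HDFS APIs.

```python
from typing import Dict, List

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = 128 * MB) -> List[int]:
    """Return the length in bytes of each block a file of file_size bytes occupies."""
    full_blocks, remainder = divmod(file_size, block_size)
    # The last block holds only the leftover bytes; it is not padded to block_size.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(writer: str, rack_of: Dict[str, str]) -> List[str]:
    """Simplified rack-aware placement for replication factor 3.

    Assumes the writer is itself a Data Node, at least two racks exist, and the
    second replica's rack has another node. Real HDFS picks randomly among
    eligible nodes and also weighs free space and load; this sketch takes the
    first eligible node in sorted order so the result is reproducible.
    """
    first = writer  # replica 1: the writer's own node
    nodes = sorted(rack_of)
    # replica 2: any node on a different rack than replica 1
    second = next(n for n in nodes if rack_of[n] != rack_of[first])
    # replica 3: a different node on the same rack as replica 2
    third = next(n for n in nodes if rack_of[n] == rack_of[second] and n != second)
    return [first, second, third]


if __name__ == "__main__":
    blocks = split_into_blocks(300 * MB)
    print([b // MB for b in blocks])     # [128, 128, 44]
    print(sum(blocks) * 3 // MB)         # 900 (raw MB stored with replication factor 3)
    racks = {"dn1": "rack1", "dn2": "rack1", "dn3": "rack2", "dn4": "rack2"}
    print(place_replicas("dn1", racks))  # ['dn1', 'dn3', 'dn4']
```

Spreading replicas across two racks lets the data survive the loss of a whole rack, while keeping two replicas on one rack limits cross-rack write traffic.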

Map Reduce

Map Reduce Functional Programming Basics
Map and Reduce Basics
How Map Reduce Works
Anatomy of a Map Reduce Job Run
Legacy Architecture -> Job Submission, Job Initialization, Task Assignment, Task Execution, Progress and Status Updates
Job Completion, Failures
Shuffling and Sorting
Input Splits, Record Reader, Partitioner, Types of Partitioners & Combiner
Optimization Techniques -> Speculative Execution, JVM Reuse and Number of Slots
Types of Schedulers and Counters
Comparisons between Old and New API at code and Architecture Level
Getting Data from an RDBMS into HDFS Using Custom Data Types
Distributed Cache and Hadoop Streaming (Python, Ruby and R)
YARN
Sequence Files and Map Files
Enabling Compression Codecs
Map-Side Join with Distributed Cache
Types of I/O Formats: MultipleOutputs, NLineInputFormat
Handling Small Files Using CombineFileInputFormat
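The map, combine, partition, shuffle/sort and reduce phases listed above can be simulated in memory to see how data flows between them. This is a teaching sketch in plain Python, not the Hadoop API: every function name is hypothetical, and `zlib.crc32` stands in for `key.hashCode()` inside a formula shaped like Hadoop's `HashPartitioner` (`(hash & Integer.MAX_VALUE) % numReduceTasks`), so actual Hadoop partition numbers will differ.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def map_phase(line):
    """Mapper: emit (word, 1) for each word in a line."""
    for word in line.split():
        yield word.lower(), 1


def combine(pairs):
    """Combiner: sum counts per key on the map side to shrink shuffle traffic."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def partition(key, num_reducers):
    """Route a key to a reducer; crc32 is a deterministic stand-in for hashCode()."""
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def run_job(splits, num_reducers=2):
    """Run one map task per input split, then shuffle, sort and reduce."""
    reducer_inputs = [[] for _ in range(num_reducers)]
    for split in splits:
        mapped = [kv for line in split for kv in map_phase(line)]
        for key, value in combine(mapped):
            reducer_inputs[partition(key, num_reducers)].append((key, value))
    outputs = []
    for pairs in reducer_inputs:
        pairs.sort(key=itemgetter(0))  # sort phase: identical keys become adjacent
        for key, group in groupby(pairs, key=itemgetter(0)):
            outputs.append((key, sum(v for _, v in group)))  # reduce phase
    return outputs


if __name__ == "__main__":
    splits = [["the quick brown fox", "the lazy dog"], ["The dog barks"]]
    result = dict(run_job(splits))
    print(result["the"], result["dog"])  # 3 2
```

Because every occurrence of a key is routed to the same reducer, each word appears exactly once in the final output, regardless of how many mappers emitted it.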

Map Reduce Programming – Java Programming

Hands-on “Word Count” in Map Reduce in standalone and pseudo-distributed mode
Sorting files using Hadoop; Configuration API discussion
Emulating “grep” for searching inside a file in Hadoop
DBInputFormat
Job Dependency API discussion
InputFormat API discussion, Split API discussion
Custom Data type creation in Hadoop
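The Word Count exercise is written in Java in this course, but since the outline also covers Hadoop Streaming with Python, here is a hedged sketch of the same job as a Streaming mapper and reducer. Streaming passes input lines on stdin and treats the text before the first tab as the key; the reducer relies on the framework delivering its input sorted by key. The single-file `map`/`reduce` command-line switch is an assumption of this sketch, not a Streaming convention.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum counts for each word; input is sorted, so equal words are adjacent."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Hypothetical layout: run as `wordcount.py map` or `wordcount.py reduce`.
    if len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
        step = mapper if sys.argv[1] == "map" else reducer
        for out in step(sys.stdin):
            print(out)
```

The job can be checked locally with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`, where `sort` plays the role of the shuffle. On a cluster, the same script would be passed to the `hadoop-streaming` JAR through its `-input`, `-output`, `-mapper` and `-reducer` options; the JAR's location depends on the installation.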

NoSQL

ACID in RDBMS and BASE in NoSQL
CAP Theorem and Types of Consistency
Types of NoSQL Databases in Detail
Columnar Databases in Detail (HBase and Cassandra)
TTL, Bloom Filters and Compaction
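HBase and Cassandra keep a Bloom filter per on-disk file (HFile or SSTable) so that a read can skip files that definitely do not contain a requested key. The sketch below shows the data structure itself in plain Python; it is not either database's implementation, and the hashing scheme (double hashing derived from one SHA-256 digest) is an illustrative choice.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive k bit positions from one digest via double hashing: h1 + i * h2.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


if __name__ == "__main__":
    bf = BloomFilter()
    for row_key in ("row-001", "row-002"):
        bf.add(row_key)
    print(bf.might_contain("row-001"))  # True
    print(bf.might_contain("row-999"))  # usually False; True would be a false positive
```

A negative answer lets the store skip a disk read entirely, while an occasional false positive only costs one unnecessary lookup, which is why the trade-off suits read paths well.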

HBase

HBase Installation and Concepts
HBase Data Model and Comparison between RDBMS and NoSQL
Master & Region Servers
HBase Operations (DDL and DML) through Shell and Programming, and HBase Architecture
Catalog Tables
Block Cache and Sharding
Splits
Data Modeling (Sequential, Salted, Promoted and Random Keys)
Java APIs and REST Interface
Client-Side Buffering: Processing 1 Million Records Using Client-Side Buffering
HBase Counters
Enabling Replication and HBase Raw Scans
HBase Filters
Bulk Loading and Coprocessors (Endpoints and Observers, with Programs)
Real-World Use Case Combining HDFS, Map Reduce and HBase
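Sequential row keys (for example, timestamps) send every new write to the same HBase region, creating a hotspot. Salting prefixes each key with a bucket number so writes spread across regions. The sketch below uses a deterministic variant (a hash-derived prefix rather than a random one) so a point read can recompute the prefix; the bucket count, the MD5 hash and the `NN-` prefix format are illustrative assumptions, and `salted_key` is a hypothetical helper, not an HBase API.

```python
import hashlib
from collections import Counter


def salted_key(row_key: str, num_buckets: int = 4) -> str:
    """Prefix row_key with a deterministic bucket number (assumes < 100 buckets)."""
    bucket = int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16) % num_buckets
    return f"{bucket:02d}-{row_key}"


if __name__ == "__main__":
    keys = [f"event-{i:06d}" for i in range(1000)]  # sequential, time-ordered keys
    spread = Counter(salted_key(k)[:2] for k in keys)
    print(spread)  # rows per salt prefix; expect a roughly even spread
```

The trade-off is on the read side: a range scan over the original key order now needs one scan per bucket prefix (four here), with the results merged by the client. Pre-splitting the table at the bucket boundaries lets each prefix start out in its own region.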