Hadoop Course in Ameerpet:
Hadoop Course Overview:
The Hadoop Development course teaches learners the skill set needed to set up a Hadoop cluster, store Big Data using Hadoop (HDFS), and analyze Big Data using Map Reduce programming or other Hadoop ecosystem tools. Attend Hadoop training delivered by a real-time expert.
Hadoop Training Course Prerequisites
- Basic Unix Commands
- Core Java (OOPS Concepts, Collections, Exceptions) for Map Reduce Programming
- SQL Query knowledge for Hive Queries
Hadoop Course System Requirements
- Any Linux flavor OS (e.g. Ubuntu/CentOS/Fedora/Red Hat Linux) with 4 GB RAM (minimum) and 100 GB HDD
- Java 1.6+
- Open-SSH server & client
- MySQL Database
- Eclipse IDE
- VMware (to run a Linux OS alongside Windows)
Hadoop Course Content
Introduction to Hadoop
- High Availability
- Scaling
- Advantages and Challenges
Introduction to Big Data
- What is Big data
- Big Data Opportunities and Challenges
- Characteristics of Big data
Introduction to Hadoop
- Hadoop Distributed File System
- Comparing Hadoop & SQL
- Industries using Hadoop
- Data Locality
- Hadoop Architecture
- Map Reduce & HDFS
- Using the Hadoop single node image (Clone)
Hadoop Distributed File System (HDFS)
- HDFS Design & Concepts
- Blocks, Name nodes and Data nodes
- HDFS High-Availability and HDFS Federation
- Hadoop DFS: The Command-Line Interface
- Basic File System Operations
- Anatomy of File Read and File Write
- Block Placement Policy and Modes
- More detailed explanation about Configuration files
- Metadata, FS image, Edit log, Secondary Name Node and Safe Mode
- How to add a new Data Node and decommission a Data Node dynamically (without stopping the cluster)
- FSCK Utility (Block Report)
- How to override default configuration at the system level and at the programming level (see the sketch after this list)
- ZOOKEEPER Leader Election Algorithm
- Exercise and small use case on HDFS
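Sample (not part of the official curriculum): a minimal sketch of the HDFS Java FileSystem API used for the basic file system operations above, which also shows how a default configuration value can be overridden at the programming level. The NameNode URI (hdfs://localhost:9000) and the paths are assumptions; adjust them to your cluster's core-site.xml.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsBasicOps {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Override default configuration at the programming level
            conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address
            conf.set("dfs.replication", "1");                  // sensible for a single-node setup

            FileSystem fs = FileSystem.get(conf);

            // Basic file system operations: write, read, list
            Path file = new Path("/user/demo/sample.txt");     // hypothetical path
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeUTF("hello hdfs");
            }
            try (FSDataInputStream in = fs.open(file)) {
                System.out.println(in.readUTF());
            }
            for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
                System.out.println(status.getPath() + " blockSize=" + status.getBlockSize());
            }
            fs.close();
        }
    }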
Map Reduce
- Map Reduce Functional Programming Basics
- Map and Reduce Basics
- How Map Reduce Works
- Anatomy of a Map Reduce Job Run
- Legacy Architecture -> Job Submission, Job Initialization, Task Assignment, Task Execution, Progress and Status Updates
- Job Completion, Failures
- Shuffling and Sorting
- Splits, Record reader, Partition, Types of partitions & Combiner
- Optimization Techniques -> Speculative Execution, JVM Reuse and Number of Slots (see the driver sketch after this list)
- Types of Schedulers and Counters
- Comparisons between Old and New API at code and Architecture Level
- Getting the data from RDBMS into HDFS using Custom data types
- Distributed Cache and Hadoop Streaming (Python, Ruby and R)
- YARN
- Sequential Files and Map Files
- Enabling Compression Codecs
- Map side Join with distributed Cache
- Types of I/O Formats: Multiple Outputs, NLine Input Format
- Handling small files using Combine File Input Format
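To make a few of the topics above concrete, here is a minimal job driver sketch (new mapreduce API) that disables speculative execution, compresses map output with a codec, and uses NLine Input Format. It runs an identity map/reduce pass; the property and class names are standard Hadoop 2.x ones, while the input and output paths come from the command line. This is an illustrative sketch, not the course's reference code.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TuningDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setBoolean("mapreduce.map.speculative", false);    // disable speculative execution
            conf.setBoolean("mapreduce.map.output.compress", true); // compress map (shuffle) output
            conf.setClass("mapreduce.map.output.compress.codec",
                    GzipCodec.class, CompressionCodec.class);

            Job job = Job.getInstance(conf, "tuning-sketch");
            job.setJarByClass(TuningDriver.class);
            job.setMapperClass(Mapper.class);     // identity mapper
            job.setReducerClass(Reducer.class);   // identity reducer
            job.setNumReduceTasks(2);
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);

            // NLine Input Format: each split feeds N input lines to one mapper
            job.setInputFormatClass(NLineInputFormat.class);
            NLineInputFormat.setNumLinesPerSplit(job, 1000);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }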
Map Reduce Programming – Java Programming
- Hands-on “Word Count” in Map Reduce in standalone and pseudo-distributed mode (see the example after this list)
- Sorting files using Hadoop Configuration API discussion
- Emulating “grep” for searching inside a file in Hadoop
- DB Input Format
- Job Dependency API discussion
- Input Format API and Split API discussion
- Custom Data type creation in Hadoop
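The hands-on “Word Count” program typically looks like the classic example below (new mapreduce API), with the reducer reused as a combiner; it runs unchanged in standalone and pseudo-distributed mode. Input and output directories are taken from the command line.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            protected void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE); // emit (word, 1) for every token
                }
            }
        }

        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get(); // total count per word
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // reducer doubles as map-side combiner
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }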
NOSQL
- ACID in RDBMS and BASE in NoSQL
- CAP Theorem and Types of Consistency
- Types of NoSQL Databases in detail
- Columnar Databases in Detail (HBASE and CASSANDRA)
- TTL, Bloom Filters and Compaction (TTL and Bloom filters are sketched below)
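As a preview of the HBase module, a short sketch of how TTL and a Bloom filter are set on a column family, using the HBase 2.x builder-style admin API. The table name "events" and column family "cf" are made-up examples.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
    import org.apache.hadoop.hbase.regionserver.BloomType;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TtlBloomDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {
                admin.createTable(TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("events"))       // hypothetical table
                    .setColumnFamily(ColumnFamilyDescriptorBuilder
                        .newBuilder(Bytes.toBytes("cf"))
                        .setTimeToLive(7 * 24 * 3600)              // TTL: cells expire after 7 days
                        .setBloomFilterType(BloomType.ROW)         // row-level Bloom filter
                        .build())
                    .build());
            }
        }
    }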
HBase
- HBase Installation, Concepts
- HBase Data Model and Comparison between RDBMS and NOSQL
- Master & Region Servers
- HBase Operations (DDL and DML) through Shell and Programming, and HBase Architecture
- Catalog Tables
- Block Cache and Sharding
- Splits
- Data Modeling (Sequential, Salted, Promoted and Random Keys)
- Java APIs and REST Interface
- Client-Side Buffering and processing 1 million records using client-side buffering (see the sketch at the end of this section)
- HBase Counters
- Enabling Replication and HBase RAW Scans
- HBase Filters
- Bulk Loading and Coprocessors (Endpoints and Observers with programs)
- Real-world use case combining HDFS, Map Reduce and HBase
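A condensed sketch of the client-side buffering exercise above, using the standard HBase client API (BufferedMutator) to write one million records and read one back. The table "demo" with column family "cf" is assumed to already exist; names and row-key format are illustrative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.BufferedMutator;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseBufferedWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            TableName table = TableName.valueOf("demo");   // hypothetical table with family "cf"
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 BufferedMutator mutator = conn.getBufferedMutator(table)) {
                // Client-side buffering: puts accumulate locally and are flushed in batches
                for (int i = 0; i < 1_000_000; i++) {
                    Put put = new Put(Bytes.toBytes(String.format("row%08d", i)));
                    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(i));
                    mutator.mutate(put);
                }
                mutator.flush(); // push any remaining buffered mutations
            }
            // Read one record back to verify the load
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table t = conn.getTable(table)) {
                Result r = t.get(new Get(Bytes.toBytes("row00000042")));
                System.out.println(Bytes.toInt(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"))));
            }
        }
    }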