Mr. Akash A
Experience: 10 Years

Big Data and Hadoop

Course Overview:

In this training, you will learn what Big Data is, the limitations of traditional solutions for Big Data problems, and how Hadoop addresses them. The course covers the Hadoop ecosystem, Hadoop architecture, HDFS, the anatomy of file read and write, and how MapReduce works. You will also learn to store data in ways that allow for efficient processing and analysis, and gain the skills you need to store, manage, process, and analyze massive amounts of unstructured data to create an appropriate data lake.

Course Content:

Introduction:

Understanding Big Data.

♦ What is Big Data?

♦ Big Data characteristics

Hadoop Distributions:

♦ Cloudera

♦ MapR

♦ Hortonworks

♦ Amazon

Introduction to Apache Hadoop.

♦ Flavors of Hadoop: BigInsights, Google Query, etc.

Hadoop Ecosystem components:

Understanding Hadoop Cluster

Hadoop Core Components.

♦ NameNode.

♦ ResourceManager / JobTracker.

♦ NodeManager / TaskTracker.

♦ DataNode.

♦ SecondaryNameNode.

HDFS Architecture

♦ Why 64MB?

♦ Why Block?

♦ Why replication factor 3?
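
The block-size and replication questions above come down to simple arithmetic. The following is a minimal Python sketch, assuming a hypothetical 1 GB file, the 64 MB block size named above, and a replication factor of 3. The function name `hdfs_storage_footprint` is illustrative only and is not part of any Hadoop API.

```python
import math

def hdfs_storage_footprint(file_size_mb, block_size_mb=64, replication=3):
    """Return (blocks, total block replicas, raw storage in MB) for one file."""
    # A file is split into fixed-size blocks; the last block may be smaller.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is stored `replication` times across the cluster.
    replicas = blocks * replication
    # The last block is not padded, so raw usage is file size times replication.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb

print(hdfs_storage_footprint(1024))  # (16, 48, 3072)
```

A larger block size means fewer blocks per file, and therefore less block metadata for the NameNode to track.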

Rack Awareness.

♦ Network Topology.

♦ Assignment of Blocks to Racks and Nodes.

♦ Block Reports

♦ Heartbeat

♦ Block Management Service.

Anatomy of File Write.

Anatomy of File Read.

Hadoop Federation and High Availability

Map Reduce Overview

Cluster Configuration overview

♦ core-default.xml

♦ hdfs-default.xml

♦ mapred-default.xml

♦ yarn-site.xml

♦ hadoop-env.sh

♦ slaves

♦ masters

Why Map Reduce?

Use cases where Map Reduce is used.

Parts of Map Reduce

Shuffle, Sort and Merge phases

HDFS Practicals (HDFS Commands)

Cloudera Distribution of Hadoop (CDH) – VM Setup

Map Reduce Failure Scenarios

Speculative Execution

Input File Formats

Output File Formats
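
As a rough illustration of the map, shuffle/sort, and reduce phases listed above, here is a minimal in-memory word count in plain Python. It is a sketch of the idea only: it does not use the Hadoop API, the function names are hypothetical, and a real job would distribute each phase across the cluster.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1

def shuffle_and_sort(pairs):
    """Shuffle/sort: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {key: sum(values) for key, values in grouped}

lines = ["big data and hadoop", "hadoop stores big data"]
counts = reduce_phase(shuffle_and_sort(map_phase(lines)))
print(counts)  # {'and': 1, 'big': 2, 'data': 2, 'hadoop': 2, 'stores': 1}
```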

Map Reduce Advanced Concepts:

♦ Joins

♦ Multi outputs

♦ Counters

♦ Distributed Cache

Hadoop 2.X (YARN):

♦ YARN Architecture

♦ Hadoop Classic vs YARN

Sqoop:

♦ Sqoop Architecture

♦ Import and Export

♦ Sqoop Hive/HBase Import

♦ Sqoop Practicals

Hive:

♦ Hive Background.

♦ What is Hive?

♦ Pig Vs Hive

♦ Where to Use Hive?

♦ Hive Architecture

♦ Metastore

♦ Hive execution modes.

♦ External, Managed, Native and Non-native tables.

♦ Hive Partitions:

Dynamic Partitions

Static Partitions

♦ Hive Data Model

♦ Hive Data Types

Primitive

Complex

♦ Queries:

Create Managed Table

Load Data

Insert overwrite table

Insert into Local directory.

Insert Overwrite table select.

♦ Joins

Inner Joins

Outer Joins

Skew Joins

♦ Multi-table Inserts

♦ Multiple files, directories, table inserts.

♦ SerDe.

♦ UDF

♦ Hive Practicals

♦ Hive Optimization Techniques and Best Practices

Pig:

♦ Why do we need Pig?

♦ Why was Pig created?

♦ Why go for Pig when Map Reduce is there?

♦ Pig use cases.

♦ Pig built-in operators

♦ Operators:

Load, Store, Dump, Filter

Distinct, Group, CoGroup

Join, Foreach Generate

Limit, Order, Cross

Union, Split

♦ Dump Vs Store

♦ Data Types

Complex

Bag, Tuple, Atom, Map

Primitives

Integer, Float, Chararray

Bytearray, Double

♦ Diagnostic Operators

Describe

Explain

Illustrate

♦ UDFs.

Filter Function

Eval Function

Macros

Demo

♦ Storage Handlers.

♦ Pig Practicals and Use Cases.

♦ Pig Debugging using Explain and Illustrate commands

♦ Pig Stats.

NoSQL and HBase:

♦ Introduction to NoSQL Databases.

♦ NoSQL Landscape

♦ Introduction to HBase

♦ HBase vs RDBMS

♦ Create Table on HBase using HBase shell

♦ Write Files to HBase.

♦ Major Components of HBase.

HBase Master.

HRegionServer.

HBase Client.

Zookeeper.

Region.

♦ HBase Practicals

♦ Row Key Design

Spark:

♦ History of "Big Data" & Apache Spark

♦ Introduction to the Spark Shell and the training environment

♦ Intro to Spark DataFrames and Spark SQL

♦ Introduction to RDDs

♦ Lazy Evaluation

♦ Transformations and Actions

♦ Data Sources: reading from Parquet, HDFS, and your local file system

♦ Spark's Architecture

♦ Programming with Accumulators and Broadcast variables

♦ Debugging and tuning Spark jobs using Spark's admin UIs

♦ Memory & Persistence

♦ Advanced programming with RDDs (understanding the shuffle phase, partitioning, etc.)
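
The lazy evaluation and transformations-versus-actions topics above can be previewed with an analogy in plain Python. The sketch below uses Python generators rather than the Spark API, so it only illustrates the idea: building the pipeline (the "transformations") does no work, and work happens only when a result is demanded (the "action"). The names `numbers` and `log` are hypothetical.

```python
log = []

def numbers():
    """A source that records every element it actually reads."""
    for n in range(1, 6):
        log.append(f"read {n}")
        yield n

# "Transformations": building the pipeline runs nothing yet.
evens = (n for n in numbers() if n % 2 == 0)
doubled = (n * 10 for n in evens)
steps_before_action = len(log)  # still 0: no element has been read

# "Action": consuming the pipeline triggers the work.
result = list(doubled)
print(steps_before_action, result, len(log))  # 0 [20, 40] 5
```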

We assure a 100% job guarantee and placement. Contact us for a free demo.
