Big Data and Hadoop Trainers

Big Data and Hadoop

Course Overview:

In this training, you will learn what Big Data is, the limitations of traditional solutions to Big Data problems, how Hadoop solves those problems, the Hadoop ecosystem, Hadoop architecture, HDFS, the anatomy of file reads and writes, and how MapReduce works. You will also learn how to store data in ways that allow for efficient processing and analysis, and gain the skills you need to store, manage, process, and analyze massive amounts of unstructured data to create an appropriate data lake.

Course Content:

Introduction:

Understanding Big Data.

What is Big Data?

Big Data characteristics

Hadoop Distributions:

Cloudera

MapR

Hortonworks

Amazon

Introduction to Apache Hadoop.

Flavors of Hadoop: BigInsights, Google BigQuery, etc.

Hadoop Eco-system components:

Understanding Hadoop Cluster

Hadoop Core Components.

NameNode.

ResourceManager / JobTracker.

NodeManager / TaskTracker.

DataNode.

SecondaryNameNode.

HDFS Architecture

Why a 64 MB block size?

Why Block?

Why replication factor 3?

Rack Awareness.

Network Topology.

Assignment of Blocks to Racks and Nodes.

Block Reports

Heartbeat

Block Management Service.

Anatomy of File Write.

Anatomy of File Read.

Hadoop Federation and High Availability

MapReduce Overview

Cluster Configuration overview

core-default.xml

hdfs-default.xml

mapred-default.xml

yarn-site.xml

hadoop-env.sh

slaves

masters

Why MapReduce?

Use cases where MapReduce is used.

Parts of MapReduce

Shuffle, Sort and Merge phases

HDFS Practicals (HDFS Commands)

Cloudera Distribution of Hadoop (CDH) – VM Setup

MapReduce Failure Scenarios

Speculative Execution

Input File Formats

Output File Formats

MapReduce Advanced Concepts:

Joins

Multiple outputs

Counters

Distributed Cache

Hadoop 2.x (YARN):

YARN Architecture

Hadoop Classic vs YARN

Sqoop:

Sqoop Architecture

Import and Export

Sqoop Hive/HBase Import

Sqoop Practicals

Hive:

Hive Background.

What is Hive?

Pig vs Hive

Where to Use Hive?

Hive Architecture

Metastore

Hive execution modes.

External, Managed, Native and Non-native tables.

Hive Partitions:

Dynamic Partitions

Static Partitions

Hive Data Model

Hive Data Types

Primitive

Complex

Queries:

Create Managed Table

Load Data

Insert overwrite table

Insert into Local directory.

Insert Overwrite table select.

Joins

Inner Joins

Outer Joins

Skew Joins

Multi-table Inserts

Multiple files, directories, table inserts.

SerDe.

UDF

Hive Practicals

Hive Optimization Techniques and Best Practices

Pig:

Why is Pig needed?

Why was Pig created?

Why use Pig when MapReduce exists?

Pig use cases.

Pig built-in operators

Operators:

Load, Store, Dump, Filter

Distinct, Group, CoGroup

Join, Foreach Generate

Limit, Order, Cross

Union, Split

Dump vs Store

Data Types

Complex

Bag, Tuple, Atom, Map

Primitives.

Integer, Float, Chararray

Bytearray, Double

Diagnostic Operators

Describe

Explain

Illustrate

UDFs.

Filter Function

Eval Function

Macros

Demo

Storage Handlers.

Pig Practicals and Use Cases.

Pig Debugging using Explain and Illustrate commands

Pig Stats.

Introduction to NoSQL Databases.

NoSQL Landscape

Introduction to HBase

HBase vs RDBMS

Create Table on HBase using HBase shell

Write Files to HBase.

Major Components of HBase.

HBase Master.

HRegionServer.

HBase Client.

ZooKeeper.

Region.

HBase Practicals

Row Key Design

Description

History of "Big Data" & Apache Spark

Introduction to the Spark Shell and the training environment

Intro to Spark DataFrames and Spark SQL

Introduction to RDDs

Lazy Evaluation

Transformations and Actions

Data Sources: reading from Parquet, HDFS, and your local file system

Spark's Architecture

Programming with Accumulators and Broadcast variables

Debugging and tuning Spark jobs using Spark's admin UIs

Memory & Persistence

Advanced programming with RDDs (understanding the shuffle phase, partitioning, etc.)

We offer a 100% job guarantee and placement. Contact us for a free demo.
