Experience: 10 Years
In this training, you will learn what Big Data is, the limitations of traditional solutions to Big Data problems, how Hadoop solves those problems, the Hadoop ecosystem, Hadoop architecture, HDFS, the anatomy of file reads and writes, and how MapReduce works. You will also learn ways of storing data that allow for efficient processing and analysis, and gain the skills you need to store, manage, process, and analyze massive amounts of unstructured data to build an appropriate data lake.
Introduction:
Understanding Big Data.
♦ What is Big Data?
♦ Big Data characteristics
Hadoop Distributions:
♦ Cloudera
♦ MapR
♦ Hortonworks
♦ Amazon (EMR)
Introduction to Apache Hadoop.
♦ Flavors of Hadoop: IBM BigInsights, Google BigQuery, etc.
Hadoop Ecosystem Components:
Understanding the Hadoop Cluster
Hadoop Core Components:
♦ NameNode.
♦ ResourceManager / JobTracker.
♦ NodeManager / TaskTracker.
♦ DataNode.
♦ SecondaryNameNode.
HDFS Architecture
♦ Why a 64 MB block size (128 MB by default in Hadoop 2.x)?
♦ Why blocks?
♦ Why a replication factor of 3?
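The block-size and replication questions come down to simple arithmetic. The plain-Python sketch below (not Hadoop code) splits a file into fixed-size blocks the way HDFS does and computes the raw cluster storage consumed at a given replication factor. The 200 MB file is an illustrative assumption; 64 MB was the Hadoop 1.x default block size and 128 MB is the Hadoop 2.x default for `dfs.blocksize`. Larger blocks mean fewer blocks per file, which keeps NameNode metadata small and amortizes disk seeks over long sequential reads.

```python
from typing import List

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Return the size in bytes of each HDFS-style block a file occupies."""
    if file_size == 0:
        return []
    full, rest = divmod(file_size, block_size)
    blocks = [block_size] * full
    if rest:
        # The last block stores only the remaining bytes; it is not padded.
        blocks.append(rest)
    return blocks


def raw_storage(file_size: int, replication: int = 3) -> int:
    """Total bytes consumed across the cluster when every block is replicated."""
    return file_size * replication


if __name__ == "__main__":
    size = 200 * MB  # illustrative file size (assumption)
    for bs in (64 * MB, 128 * MB):
        blocks = split_into_blocks(size, bs)
        print(f"block size {bs // MB} MB -> {len(blocks)} blocks, last = {blocks[-1] // MB} MB")
    print(f"raw storage at replication 3: {raw_storage(size) // MB} MB")
```

For the 200 MB file this gives 4 blocks (the last holding 8 MB) at 64 MB, 2 blocks (the last holding 72 MB) at 128 MB, and 600 MB of raw storage at replication 3.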
Rack Awareness.
♦ Network Topology.
♦ Assignment of Blocks to Racks and Nodes.
♦ Block Reports
♦ Heartbeats
♦ Block Management Service.
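A simplified model of the default replica placement for replication factor 3, as described in the HDFS architecture guide: the first replica goes on the writer's node when the writer is a DataNode, the second on a node in a different (remote) rack, and the third on a different node in that same remote rack. This is a Python sketch, not Hadoop's `BlockPlacementPolicyDefault`; the rack and node names are hypothetical, and it ignores free space, node load, and the case where the writer is not a DataNode. It also assumes the chosen remote rack has at least two nodes.

```python
import random
from typing import Dict, List


def place_replicas(topology: Dict[str, List[str]], writer: str, rng: random.Random) -> List[str]:
    """Choose 3 DataNodes for a new block.

    topology: hypothetical rack name -> DataNode names.
    writer: the writing client's node, assumed to be a DataNode in topology.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # Replica 1: the writer's own node, so the first copy needs no network hop.
    first = writer
    # Replica 2: a node on another rack, so the block survives a whole-rack failure.
    remote_racks = [rack for rack in topology if rack != rack_of[first]]
    remote_rack = rng.choice(remote_racks)
    second = rng.choice(topology[remote_rack])
    # Replica 3: a different node on the same remote rack, limiting cross-rack traffic.
    third = rng.choice([n for n in topology[remote_rack] if n != second])
    return [first, second, third]


if __name__ == "__main__":
    topology = {  # hypothetical 3-rack cluster
        "rack1": ["dn1", "dn2", "dn3"],
        "rack2": ["dn4", "dn5", "dn6"],
        "rack3": ["dn7", "dn8"],
    }
    print(place_replicas(topology, "dn2", random.Random(7)))
```

Every placement spans exactly two racks, so losing one rack never loses the block, while the write pipeline sends only one copy across the inter-rack link.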
Anatomy of File Write.
Anatomy of File Read.
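During a file read, the NameNode returns the DataNodes holding each block ordered by proximity to the client, and the client reads from the closest one. The sketch below models the distance measure commonly described for Hadoop's network topology: each node is addressed as `/datacenter/rack/node`, and the distance between two nodes is the sum of their hops to the closest common ancestor (0 for the same node, 2 for the same rack, 4 for different racks in one data center, 6 across data centers). The paths are hypothetical; in a real cluster the rack part of each path comes from the configured topology mapping.

```python
from typing import List


def distance(a: str, b: str) -> int:
    """Hops between two nodes given as '/datacenter/rack/node' paths."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Each side climbs to the closest common ancestor; add both climbs.
    return (len(pa) - common) + (len(pb) - common)


def order_replicas(client: str, replicas: List[str]) -> List[str]:
    """Sort replica locations so the reader tries the closest DataNode first."""
    return sorted(replicas, key=lambda r: distance(client, r))


if __name__ == "__main__":
    client = "/d1/r1/n1"  # hypothetical client location
    print(order_replicas(client, ["/d2/r3/n7", "/d1/r2/n4", "/d1/r1/n2"]))
```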
Hadoop Federation and High Availability
MapReduce Overview
Cluster Configuration Overview
♦ core-default.xml
♦ hdfs-default.xml
♦ mapred-default.xml
♦ yarn-site.xml
♦ hadoop-env.sh
♦ slaves
♦ masters
Why MapReduce?
Use cases where MapReduce is used.
Parts of MapReduce
Shuffle, Sort and Merge phases
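The map, shuffle/sort, and reduce phases can be traced end to end with the classic word count. The Python sketch below runs everything in one process: the mapper emits `(word, 1)`, the shuffle groups values by key, the sort hands keys to the reducer in order, and the reducer sums. It is a model of the data flow, not Hadoop code; in a real job, map output is partitioned by key, sorted and spilled on the map side, then fetched and merged by each reducer.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Shuffle groups all values by key; sort presents keys to the reducer in order."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine every value seen for one key."""
    return key, sum(values)


def word_count(lines: List[str]) -> List[Tuple[str, int]]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return [reduce_phase(key, values) for key, values in shuffle_and_sort(mapped)]


if __name__ == "__main__":
    print(word_count(["Big data", "big Hadoop data", "Hadoop"]))
```

This prints `[('big', 2), ('data', 2), ('hadoop', 2)]`.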
HDFS Practicals (HDFS Commands)
Cloudera Distribution of Hadoop (CDH): VM Setup
MapReduce Failure Scenarios
Speculative Execution
Input File Formats
Output File Formats
MapReduce Advanced Concepts:
♦ Joins
♦ Multiple Outputs
♦ Counters
♦ Distributed Cache
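Joins, counters, and the distributed cache often meet in one pattern: a map-side (replicated) join, where a small lookup table is shipped to every mapper and held in memory while the large dataset streams past, so the join needs no reduce phase. The Python sketch below models only that logic. The order and customer records and the counter names are hypothetical, a dict stands in for the cached file, and a `collections.Counter` stands in for job counters. In Hadoop's Java API the small file would typically be registered with `Job.addCacheFile(...)`, loaded in `Mapper.setup()`, and counted through `context.getCounter(...)`.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Order = Tuple[int, int, float]  # (order_id, customer_id, amount), hypothetical schema


def map_side_join(
    orders: Iterable[Order],
    customers_by_id: Dict[int, str],
    counters: Counter,
) -> Iterator[Tuple[int, str, float]]:
    """Inner-join streamed orders against an in-memory customer table.

    customers_by_id stands in for a small file the distributed cache would
    copy to every mapper node; counters stands in for Hadoop job counters.
    """
    for order_id, customer_id, amount in orders:
        name = customers_by_id.get(customer_id)
        if name is None:
            counters["UNMATCHED_ORDERS"] += 1  # hypothetical counter name
            continue
        counters["JOINED_ORDERS"] += 1  # hypothetical counter name
        yield order_id, name, amount


if __name__ == "__main__":
    customers = {1: "alice", 2: "bob"}
    orders = [(100, 1, 25.0), (101, 3, 10.0), (102, 2, 7.5)]
    counters: Counter = Counter()
    print(list(map_side_join(orders, customers, counters)), dict(counters))
```

The pattern only works while the lookup table fits in each mapper's memory; when both inputs are large, a reduce-side join is the usual alternative.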
Hadoop 2.x (YARN):
♦ YARN Architecture
♦ Hadoop Classic vs YARN
Sqoop:
♦ Sqoop Architecture
♦ Import and Export
♦ Sqoop Hive/HBase Import
♦ Sqoop Practicals
Hive:
♦ Hive Background.
♦ What is Hive?
♦ Pig vs Hive
♦ Where to Use Hive?
♦ Hive Architecture
♦ Metastore
♦ Hive execution modes.
♦ External, Managed, Native and Non-native tables.
♦ Hive Partitions:
Dynamic Partitions
Static Partitions
♦ Hive Data Model
♦ Hive Data Types
Primitive
Complex
♦ Queries:
Create Managed Table
Load Data
Insert Overwrite Table
Insert Overwrite Local Directory
Insert Overwrite Table Select
♦ Joins
Inner Joins
Outer Joins
Skew Joins
♦ Multi-table Inserts
♦ Multiple files, directories, table inserts.
♦ SerDe.
♦ UDF
♦ Hive Practicals
♦ Hive Optimization Techniques and Best Practices
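Static and dynamic partitions differ in where the partition value comes from: a static insert names it in the statement (for example `PARTITION (country='US')`), while a dynamic insert takes it from a column of each row, with the partition column listed last in the SELECT. Either way, Hive stores each partition as a `column=value` subdirectory under the table's directory. The Python sketch below models only that routing for a dynamic insert. The `sales` table, its rows, and the warehouse path are hypothetical; real Hive also escapes special characters in partition values and, by default, requires `hive.exec.dynamic.partition.mode=nonstrict` when every partition column is dynamic.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def dynamic_partition_paths(table_dir: str, rows: List[Row], partition_col: str) -> Dict[str, List[Row]]:
    """Route each row to a Hive-style 'col=value' partition directory."""
    layout: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        path = f"{table_dir}/{partition_col}={row[partition_col]}"
        # The partition value lives in the directory name, not in the data file.
        layout[path].append({k: v for k, v in row.items() if k != partition_col})
    return dict(layout)


if __name__ == "__main__":
    rows = [  # hypothetical rows for a hypothetical 'sales' table
        {"id": 1, "country": "US"},
        {"id": 2, "country": "IN"},
        {"id": 3, "country": "US"},
    ]
    for path, data in dynamic_partition_paths("/user/hive/warehouse/sales", rows, "country").items():
        print(path, data)
```

Because queries that filter on `country` only need to read the matching subdirectories, this layout is also the basis of partition pruning.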
Pig:
♦ Why is Pig needed?
♦ Why was Pig created?
♦ Why use Pig when MapReduce already exists?
♦ Pig use cases.
♦ Pig built-in operators
♦ Operators:
LOAD, STORE, DUMP, FILTER
DISTINCT, GROUP, COGROUP
JOIN, FOREACH ... GENERATE
LIMIT, ORDER, CROSS
UNION, SPLIT
♦ DUMP vs STORE
♦ Data Types
Complex
Bag, Tuple, Atom, Map
Primitives
Integers, Float, Chararray
Bytearray, Double
♦ Diagnostic Operators
Describe
Explain
Illustrate
♦ UDFs.
Filter Function
Eval Function
Macros
Demo
♦ Storage Handlers.
♦ Pig Practicals and Use Cases.
♦ Pig Debugging using Explain and Illustrate commands
♦ Pig Stats.
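COGROUP, listed among the Pig operators, is easiest to understand by its output shape: one record per key, holding a bag of matching tuples from each input relation. The Python model below is a sketch, not Pig code: relations are lists of tuples, bags are lists, and the key is the first field by assumption. As with Pig's default COGROUP, a key present in only one relation still appears, with an empty bag for the other. An inner JOIN can be thought of as that result with each pair of bags flattened into their cross product, which drops keys with an empty bag.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Relation = Sequence[Tuple[Any, ...]]


def cogroup(left: Relation, right: Relation, key_index: int = 0) -> List[Tuple[Any, list, list]]:
    """Return (group_key, left_bag, right_bag) for every key in either relation."""
    bags: Dict[Any, Tuple[list, list]] = defaultdict(lambda: ([], []))
    for t in left:
        bags[t[key_index]][0].append(t)
    for t in right:
        bags[t[key_index]][1].append(t)
    # Sorted by key only so the output is deterministic; Pig bags are unordered.
    return [(key, lb, rb) for key, (lb, rb) in sorted(bags.items(), key=lambda kv: kv[0])]


def inner_join(left: Relation, right: Relation, key_index: int = 0) -> List[Tuple[Any, ...]]:
    """Flatten each key's bags into their cross product; empty bags yield nothing."""
    return [l + r for _, lb, rb in cogroup(left, right, key_index) for l in lb for r in rb]


if __name__ == "__main__":
    owners = [("alice", "dog"), ("bob", "cat"), ("alice", "fish")]  # hypothetical data
    ages = [("alice", 30), ("carol", 25)]
    for record in cogroup(owners, ages):
        print(record)
    print(inner_join(owners, ages))
```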
♦ Introduction to NoSQL Databases.
♦ NoSQL Landscape
♦ Introduction to HBase
♦ HBase vs RDBMS
♦ Create a Table in HBase using the HBase Shell
♦ Write Files to HBase.
♦ Major Components of HBase.
HBase Master.
HRegionServer.
HBase Client.
ZooKeeper.
Region.
♦ HBase Practicals
♦ Row Key Design
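HBase stores rows sorted by row key, so key design decides both how writes spread across regions and which scans are cheap. The Python sketch below builds one common style of key, under stated assumptions: a salt prefix derived from a stable hash of the user ID spreads writes over a fixed number of buckets instead of piling sequential keys onto one region, and a reversed timestamp (Java's `Long.MAX_VALUE` minus the event time) makes a user's newest rows sort first. The `salt|user_id|reversed_ts` layout, the bucket count, and the field names are hypothetical, and the `|` separator assumes user IDs never contain `|`.

```python
import zlib

MAX_TS = 2**63 - 1  # Java Long.MAX_VALUE, a common base for reversed timestamps


def row_key(user_id: str, ts_millis: int, buckets: int = 16) -> bytes:
    """Build a salted, time-reversed row key (hypothetical scheme)."""
    # crc32 is stable across runs, unlike Python's salted built-in hash() for str.
    salt = zlib.crc32(user_id.encode()) % buckets
    reversed_ts = MAX_TS - ts_millis
    # Zero-padding keeps lexicographic byte order equal to numeric order.
    return f"{salt:02d}|{user_id}|{reversed_ts:019d}".encode()


if __name__ == "__main__":
    for ts in (1_000, 2_000):
        print(row_key("user42", ts))
```

The trade-off: a salted table needs one scan per bucket to read a key range, in exchange for avoiding a single hot region under sequential writes.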
Spark:
♦ History of "Big Data" & Apache Spark
♦ Introduction to the Spark Shell and the training environment
♦ Intro to Spark DataFrames and Spark SQL
♦ Introduction to RDDs
♦ Lazy Evaluation
♦ Transformations and Actions
♦ Data Sources: reading from Parquet, HDFS, and your local file system
♦ Spark's Architecture
♦ Programming with Accumulators and Broadcast variables
♦ Debugging and tuning Spark jobs using Spark's admin UIs
♦ Memory & Persistence
♦ Advanced programming with RDDs (understanding the shuffle phase, partitioning, etc.)
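Lazy evaluation means transformations such as `map` and `filter` only record a chain of steps, and the work happens when an action such as `collect` or `count` is called. The class below is a toy, single-machine Python model of that idea, not the PySpark API (which needs a running Spark context); its method names are chosen to echo Spark's. Each action re-runs the whole chain, which is the recomputation that `cache()`/`persist()` avoids in real Spark. The source is assumed to be re-iterable, such as a list.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]


class LazyDataset:
    """Toy model of Spark-style lazy evaluation (not the PySpark API)."""

    def __init__(self, source: Iterable[Any], steps: Tuple[Step, ...] = ()):
        self._source = source
        self._steps = steps

    # Transformations: record a step and return a new dataset; nothing runs yet.
    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def _run(self) -> Iterator[Any]:
        data: Iterator[Any] = iter(self._source)
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: execute the whole recorded chain from the source.
    def collect(self) -> List[Any]:
        return list(self._run())

    def count(self) -> int:
        return sum(1 for _ in self._run())


if __name__ == "__main__":
    ds = LazyDataset([1, 2, 3, 4]).map(lambda x: x * x).filter(lambda v: v % 2 == 0)
    print(ds.collect(), ds.count())
```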
We offer a 100% job guarantee and placement. Contact us for a free demo.
Copyright © 2017 - Developed by Infihive Consulting Services LLC