Peakin

 

  1. Introduction to Big Data
    • Big Data Characteristics
    • Sources of Big Data
    • Typical Data Flow

 

  1. Hadoop Ecosystem Components
    • HDFS
    • MapReduce
    • Hive
    • Pig
    • Sqoop
    • YARN
    • Oozie

 

  1. Hadoop Framework
    • Hadoop System
    • Master-Slave Architecture
    • Distribution
    • Fault Tolerance
    • Scalability
    • Parallel Processing

 

  1. Hadoop Architecture
    • NameNode
    • DataNode
    • JobTracker
    • TaskTracker
    • Secondary NameNode

 

  1. HDFS Architecture
    • Block
    • Input Split
    • Block Representation
    • High Availability
    • Rack Awareness

 

  1. UNIX & HDFS Commands

 

  1. MapReduce
    • MapReduce Flow
    • MapReduce Execution
    • Speculative Execution
    • Input Formats
    • Distributed Cache
    • Combiner
    • Partitioner
    • Compression Techniques
    • Counters
    • Optimization Techniques in MapReduce

 

  1. Pig
    • Input and Output for Pig
    • Execution Modes
    • Explanation of 20+ Pig Relational Operators
    • UDFs in Pig
    • Optimization Techniques in Pig
    • Replicated Joins
    • Skewed Joins
    • Merge Joins

 

  1. Hive
    • Metastore
    • Managed Tables
    • External Tables
    • Loading Data
    • Data Types
    • Hive Query Language
    • File Formats
    • Partitioning
    • Bucketing
    • Vectorization
    • Script Mode
    • Advanced Hive Commands
    • UDFs in Hive
    • Optimization Techniques in Hive

 

  1. Sqoop
    • Metastore
    • Import
    • Incremental Import
    • Query-Based Import
    • Conditional Import (WHERE Clause)
    • Export
    • Sqoop Jobs
    • Optimization Techniques in Sqoop
    • Code Generation (codegen)
    • SQL Evaluation (eval)

 

  1. Oozie
    • Oozie Flow
    • Oozie Components
    • Scheduling Jobs

 

  1. YARN
    • YARN Architecture
    • YARN Flow