Course Content

  • 3V (Volume-Variety-Velocity) characteristics
  • Structured and Unstructured Data
  • Applications and use cases of Big Data
  • Hadoop History and concepts
  • Ecosystem
  • Distributions
  • High-level Architecture
  • Concepts (Distributed storage, horizontal scaling, replication, rack awareness)
  • Architecture
  • NameNode (function, storage, file system metadata, and block reports)
  • Navigating HDFS UI
  • Secondary NameNode
  • DataNode
  • Communications / heartbeats
  • Block manager / balancer
  • Health check / safemode
  • Read / write path
  • Command-line interaction with HDFS
  • File systems abstractions
  • Reading / writing files using Java API

  • MapReduce concepts
  • Daemons: JobTracker / TaskTracker
  • Phases: driver, mapper, shuffle/sort, and reducer
  • First MapReduce Job
  • MapReduce Programs (Word Count, Word Co-occurrence, Average Word Length)
  • Counters
  • Combiners
  • Partitioners
  • MapReduce Configuration
  • Job Config
  • MapReduce Types and Formats
  • Sorting
  • Optimizing MapReduce
  • Hive Introduction
  • Environment and Configuration
  • Hive Tables
  • Metadata
  • HiveQL (DDL & DML Operations)
  • External vs Managed Tables
  • Partitions & Buckets
  • User Defined Functions
  • Pig Basics
  • Loading Data Files
  • Pig vs MapReduce
  • Data Types
  • Pig Latin Language Constructs
  • LOAD, STORE, DUMP, SPLIT, etc.
  • User Defined Functions
  • Sqoop Basics
  • Importing data from MySQL
  • Exporting data to MySQL
  • Introduction to HBase
  • Architecture
  • Configuration
  • HBase vs RDBMS
  • HBase shell
  • Read path / write path
  • Schema design