Duration: 3 Days
Course Overview
This three-day hands-on course is suitable for anyone who wants to understand the concepts and technologies involved in exploiting big data with the Hadoop Ecosystem. Attendees will learn how to set up and write applications for Hadoop, Pig, Hive and Impala.
Course Content
Big Data
• What is big data?
• Technical challenges
• Structured, semi-structured and unstructured data
• Big data storage
• NoSQL
Hadoop
• What is Hadoop?
• The Hadoop Ecosystem
• Hadoop versus relational databases
• Mapping and reducing
• Writing MapReduce scripts
• Combining and partitioning
• Hadoop streaming
• Installing and configuring Hadoop
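As an illustration of the mapping and reducing topics above, the following is a minimal sketch of a word count written in the map/sort/reduce style that Hadoop streaming scripts typically follow. It runs locally in plain Python and does not call Hadoop; the function names `mapper`, `reducer` and `word_count` are hypothetical and chosen only for this example.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: sum the counts for each run of identical keys."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    # sorted() stands in for the shuffle-and-sort step that Hadoop
    # performs between the map and reduce phases (an assumption of
    # this local sketch, not a Hadoop API call).
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["big data", "Big Hadoop data"]
    for word, count in word_count(sample).items():
        print(f"{word}\t{count}")
```

In a real streaming job the mapper and reducer would usually be separate scripts that read from standard input and write tab-separated key/value lines to standard output, with the framework handling the sort between them.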
Pig
• What is Pig?
• Preprocessing data
• Using the Pig shell Grunt
• Loading data and schemas
• Generating relations
• Displaying and storing results
• Designing Pig scripts
Hive
• What is Hive?
• Creating the data warehouse
• Mapping structure onto stored data
• Hive Query Language (HiveQL)
Impala
• What is Impala?
• Impala architecture
• Using the Impala shell
• Impala SQL
Case Study
Develop a MapReduce application from scratch using one or more of the tools covered in the course.