Introduction to Big Data with Hadoop
Duration: 3 days
You Will Learn How To:
Integrate Big Data components to create an appropriate Data Lake
Select the correct Big Data stores for disparate data sets
Process large data sets using Hadoop to extract value
Query large data sets in near real time with Pig and Hive
Plan and implement a Big Data strategy for your organization
Who Should Attend:
This introductory Big Data course is ideal for anyone, including managers, programmers,
architects, and administrators, who wants a foundational overview of the key components of
Big Data and how they can be integrated to provide suitable solutions for their organization.
No programming experience is required. Programmers should be aware that the exercises in
this course are intended to give attendees high-level exposure to the capabilities of Big Data
software tools and techniques, not a deep dive.
Course Detail:
Introduction to Big Data
Defining Big Data
The four dimensions of Big Data: volume, velocity, variety, veracity
Introducing the Storage, MapReduce and Query Stack
Delivering business benefit from Big Data
Establishing the business importance of Big Data
Addressing the challenge of extracting useful data
Integrating Big Data with traditional data
Storing Big Data
Analyzing your data characteristics
Selecting data sources for analysis
Eliminating redundant data
Establishing the role of NoSQL
Overview of Big Data stores
Data models: key-value, graph, document, column-family (see the column-family sketch after the list below)
Hadoop Distributed File System
HBase
Hive
Cassandra
Hypertable
Amazon S3
BigTable
DynamoDB
MongoDB
Redis
Riak
Neo4j
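For a concrete flavour of the column-family model used by stores such as HBase, Cassandra, and BigTable, here is a minimal sketch that writes and reads one cell through the HBase Java client. The table, column family, qualifier, and values are illustrative placeholders, and the exact client API can vary between HBase releases.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("customers"))) {

            // Write one cell: row key "cust-001", column family "profile", qualifier "email".
            Put put = new Put(Bytes.toBytes("cust-001"));
            put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("email"),
                    Bytes.toBytes("jane@example.com"));
            table.put(put);

            // Read the same cell back by row key.
            Result result = table.get(new Get(Bytes.toBytes("cust-001")));
            byte[] email = result.getValue(Bytes.toBytes("profile"), Bytes.toBytes("email"));
            System.out.println("email = " + Bytes.toString(email));
        }
    }
}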
Selecting Big Data stores
Choosing the correct data stores based on your data characteristics
Moving code to data
Implementing polyglot data store solutions
Aligning business goals to the appropriate data store
Processing Big Data
Integrating disparate data stores
Mapping data to the programming framework
Connecting and extracting data from storage
Transforming data for processing
Subdividing data in preparation for Hadoop MapReduce
Employing Hadoop MapReduce
Creating the components of Hadoop MapReduce jobs
Distributing data processing across server farms
Executing Hadoop MapReduce jobs (see the word-count sketch below)
Monitoring the progress of job flows
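To preview what the hands-on MapReduce exercises look like, here is a minimal word-count job written against the Hadoop Java MapReduce API. The class names and input/output paths are illustrative placeholders, and details may differ slightly depending on the Hadoop release used in class.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every word in each input line.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer (also used as the combiner): sum the counts emitted for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));     // e.g. an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // output directory must not already exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A job like this is typically packaged into a JAR and submitted with the hadoop jar command, after which its progress can be monitored through the cluster's web UI.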
The building blocks of Hadoop MapReduce
Distinguishing Hadoop daemons
Investigating the Hadoop Distributed File System (see the HDFS sketch below)
Selecting appropriate execution modes: local, pseudo-distributed, and fully distributed
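As an illustration of programmatic access to HDFS, the following sketch opens and prints a text file through Hadoop's FileSystem API. The NameNode address and file path are placeholders; in a pseudo-distributed lab setup they would point at the local single-node cluster.

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");   // placeholder NameNode address
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/data/sample.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);   // print each line of the HDFS file
            }
        }
    }
}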
Handling streaming data
Comparing real-time processing models
Leveraging Storm to extract live events
Lightning-fast processing with Spark and Shark
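As a taste of the Spark material, the following sketch counts error lines in a log file using the Spark Java API in local mode. The application name, master setting, and input path are placeholders, and the course exercises may use a different part of the Spark stack instead.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkErrorCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ErrorCount").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // textFile is lazy; nothing is read until an action such as count() runs.
            JavaRDD<String> lines = sc.textFile("hdfs://localhost:9000/logs/app.log");   // placeholder path
            long errors = lines.filter(line -> line.contains("ERROR")).count();
            System.out.println("error lines: " + errors);
        } finally {
            sc.stop();   // release the local Spark context
        }
    }
}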
Tools and Techniques to Analyze Big Data
Abstracting Hadoop MapReduce jobs with Pig
Communicating with Hadoop in Pig Latin (see the sketch below)
Executing commands using the Grunt Shell
Streamlining high–level processing
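For a preview of Pig Latin, the following sketch embeds three Pig Latin statements in a Java program using Pig's PigServer class; the same statements can be typed interactively at the Grunt shell. The input file, field layout, and output directory are illustrative placeholders.

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigExample {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer(ExecType.LOCAL);   // LOCAL for testing; MAPREDUCE on a cluster

        // Pig Latin: load tab-separated page views and count hits per URL.
        pig.registerQuery("views = LOAD 'pageviews.tsv' AS (url:chararray, user:chararray);");
        pig.registerQuery("grouped = GROUP views BY url;");
        pig.registerQuery("counts = FOREACH grouped GENERATE group AS url, COUNT(views) AS hits;");

        // Storing the result is what triggers the underlying job(s).
        pig.store("counts", "pageview_counts");
        pig.shutdown();
    }
}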
Performing ad hoc Big Data querying with Hive
Persisting data in the Hive Metastore
Performing queries with HiveQL (see the sketch below)
Investigating Hive file formats
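As a flavour of ad hoc HiveQL querying, the following sketch runs a single aggregate query through the HiveServer2 JDBC driver. The connection URL, table, and column names are placeholders, and the credentials depend on how the cluster is secured.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; recent JDBC versions register it automatically.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        String url = "jdbc:hive2://localhost:10000/default";   // placeholder host, port, and database
        try (Connection con = DriverManager.getConnection(url, "", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT country, COUNT(*) AS orders FROM sales GROUP BY country")) {
            while (rs.next()) {
                System.out.println(rs.getString("country") + "\t" + rs.getLong("orders"));
            }
        }
    }
}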
Creating business value from extracted data
Mining data with Mahout
Visualizing processed results with reporting tools
Querying in real time with Impala
Developing a Big Data Strategy
Defining a Big Data strategy for your organization
Establishing your Big Data needs
Meeting business goals with timely data
Evaluating commercial Big Data tools
Managing organizational expectations
Enabling analytic innovation
Focusing on business importance
Framing the problem
Selecting the correct tools
Achieving timely results
Implementing a Big Data Solution
Selecting suitable vendors and hosting options
Balancing costs against business value
Keeping ahead of the curve