01 04 2024 3M Big Data Analytics
Revised Edition
Trainer Name
Course Title
BIG DATA ANALYTICS
Objectives and Expectations
(i) Employable skills and hands-on practice for Big Data Analytics
This is a special course designed to address youth unemployment.
The course aims to empower students with the right skillset that would help
them get Big Data Analyst jobs in the industry. The course offers a broad,
cross-disciplinary learning experience for students looking to pursue
careers in the relevant industry.
In this course, students are introduced to key aspects of the design process,
from research/strategy, creative brief development, and campaign
development to teamwork, presentation, and content creation, so that they
can enter the relevant market as strong candidates for beginner- to
intermediate-level jobs.
Main Expectations:
In short, the course under reference should be delivered by professional
instructors in such a robust hands-on manner that the trainees are
comfortably able to employ their skills for earning money (through wage/self-
employment) at its conclusion.
This course thus clearly goes beyond the domain of the traditional training
practices in vogue and underscores an expectation that a market-centric
approach will be adopted as the main driving force while delivering it. The
instructors should therefore be experienced enough to be able to identify the
training needs for the possible market roles available out there. Moreover,
they should also know the strengths and weaknesses of each trainee to
prepare them for such market roles during/after the training.
iii. A module on Work Place Ethics has also been included to highlight
the importance of good and positive behaviour in the workplace, in
line with best practices elsewhere in the world. An outline of such
qualities has been given in the Appendix to this document.
Its importance should be conveyed in a format that is attractive and
interesting for the trainees, such as PPT slides and short video
documentaries. Needless to say, if the training provider puts their
heart and soul into these otherwise non-technical components, the
image of the Pakistani workforce would undergo a positive
transformation in the local as well as international job markets.
To maintain the interest and motivation of the trainees throughout the
course, modern techniques such as the following would be employed as an
additional training tool wherever possible (these are explained in the
subsequent section on Training Methodology):
Motivational Lectures
Success Stories
Case Studies
Lastly, evaluation of the competencies acquired by the
trainees will be done objectively at various stages of the training, and a
proper record of the same will be maintained. Suffice it to say that for such
evaluations, practical tasks would be designed by the training providers to
gauge the problem-solving abilities of the trainees.
Learning Outcomes of the course
By the end of this course, students will be able to develop skills to convert bulk
information into knowledge, and to assist the business managers in taking
data-driven decisions.
MODULES
● Institute/work ethics
Hour 4
● Success stories
Hour 1
Hour 1
Task 1
Hour 2 Big Data Characteristics
Introduction to
Big Data and Big Day 3
Week 1 Data Analytics Hour 3 Details may be
seen at
Hour 4 Use Cases Annexure-I
10 Vs of Big Data
Hour 4
Hour 1
Day 5 Hour 3
III)
Hour 2
Details may be
seen at
Hour 3 Annexure-I
Types of Big Data
Hour 4
Day 2 (Hours 1-3): Types of Data Lakes
Day 3 (Hours 1-4): Categorization of Big Data Analytics
Day 4 (Hours 1-4): Overview of NoSQL databases
Day 5 (Hours 1-4): Case study/visit to a software house/data setup etc.
Week 3
Module: NoSQL databases; Apache Hadoop Ecosystem
Task 3: details may be seen at Annexure-I
Day 1 (Hours 2-4): Hands on NoSQL Databases
Day 2 (Hours 1-4): Overview of Apache Hadoop Ecosystem
Day 3 (Hours 1-4): Hadoop 2; Hands on Hadoop 2
Day 4 (Hours 1-4): YARN; Hands on YARN
Day 5 (Hours 1-4): HDFS; Setting up Hadoop clusters
Week 4
Module: MapReduce: Theory and Hands-on MapReduce
Task 4: details may be seen at Annexure-I
Day 1 (Hours 2-4): MapReduce: Theory and Hands-on
Day 2 (Hours 1-4): Hands on MapReduce
Day 3 (Hours 1-4): Apache Spark with Apache Kafka
Day 4 (Hours 1-4): Hands-on Practice with Apache Spark
Day 5 (Hours 1-4): Apache Hive
Week 5
Module: Apache Spark with Apache Kafka, Apache Hive, Apache HBase and Apache Cassandra
Task 5: details may be seen at Annexure-I
Day 1 (Hours 1-4): Apache HBase
Day 2 (Hours 1-4): Apache Cassandra
Day 3 (Hours 1-4): Hands-on Activity
Day 5 (Hours 1-4): Motivational Lecture
Week 6
Module: Apache Presto and Apache Drill
Task 6: details may be seen at Annexure-I
Day 1 (Hours 1-4): Apache Presto
Day 2 (Hours 1-4): Apache Drill
Day 3 (Hours 1-4): Hands on Apache Presto and Apache Drill
Day 4 (Hours 1-4): Hands on Apache Presto and Apache Drill
Day 5 (Hours 1-4): Motivational Lecture
Week 7
Module: Document NoSQL with MongoDB; Graph NoSQL with Neo4J
Task 7: details may be seen at Annexure-I
Day 1 (Hours 1-4): NoSQL; Hands on NoSQL
Day 2 (Hours 1-4): NoSQL with MongoDB; Hands on
Day 3 (Hours 1-4): Graph NoSQL with Neo4J
Day 4 (Hours 1-4): Hands on Graph NoSQL with Neo4J
Day 5 (Hours 1-4): Case study/visit to a software house/data setup etc.
Task 8
Day 1 (Hours 1-4): Client Connection; Cluster Initialization
Day 4 (Hours 1-3): Data Manipulation; Getting Started with Redis
Day 5 (Hours 1-4): Assignment on Redis
Week 9
Module: Large-Scale Supervised Learning
Task 9: details may be seen at Annexure-I
Day 1 (Hours 1-4): Introduction to Supervised learning
Day 3 (Hours 1-4): Regularization
Day 4 (Hours 1-4): Support Vector Machine (SVM) and the kernel trick
Day 5 (Hours 1-4): Outlier Detection; Spark ML library
Week 10
Module: Large-Scale Unsupervised Learning
Task 10: details may be seen at Annexure-I
Day 1 (Hours 1-4): Introduction to Unsupervised learning
Day 2 (Hours 1-4): K-means / K-medoids
Day 3 (Hours 1-4): Gaussian Mixture Models
Day 4 (Hours 1-4): Dimensionality Reduction
Day 5 (Hours 1-4): Spark MLlib for Unsupervised Learning
Week 11
Module: Large Scale Text Mining
Task 11: details may be seen at Annexure-I
Day 1 (Hours 1-4): Latent Semantic Indexing
Day 2 (Hours 1-4): Topic models
Day 3 (Hours 1-4): Latent Dirichlet Allocation
Day 4 (Hours 1, 2 and 4)
Day 5 (Hours 1-4): Projects
Week 12
Module: Final Project
Task 12 (Final Project): details may be seen at Annexure-I
Day 1 (Hours 1-4): Final Project
Day 3 (Hours 1-4): Final Project
Day 4 (Hours 1-4): Final Project Presentation
Day 5 (Hours 1-4): Final Project Presentation
Annexure-I
1. Explore Job Market (Week 1):
Make a presentation on the job market for the Big Data profession.
2. Data Ingestion (Week 2):
Ingest data from various sources such as CSV files, databases, or streaming
data sources into Hadoop HDFS using tools like Apache Sqoop or Apache
Kafka.
3. Data Processing (Week 3):
Write a MapReduce program to process the ingested data, such as performing
data cleaning, filtering, aggregation, or transformation tasks. Alternatively,
use Apache Spark to process the data using RDDs (Resilient Distributed
Datasets) or DataFrames.
4. Data Analysis (Week 4):
Use Apache Hive or Apache Pig to write SQL-like queries or data processing
scripts for analyzing the data.
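The map, shuffle, and reduce steps named in the Data Processing task can be tried out on a laptop before moving to a cluster. The following is a minimal single-process sketch in plain Python, not the Hadoop or Spark API; the sample records, the column names `city` and `amount`, and all function names are assumptions made for illustration only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data standing in for ingested records.
RAW_CSV = """city,amount
Lahore,120
Karachi,80
Lahore,
Karachi,45
Multan,abc
Multan,30
"""


def map_phase(rows):
    """Emit (city, amount) pairs, skipping rows with a missing or non-numeric amount."""
    for row in rows:
        try:
            yield row["city"], float(row["amount"])
        except (TypeError, ValueError):
            continue  # data cleaning: drop malformed records


def shuffle_phase(pairs):
    """Group all values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate the values of each key into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(csv_text):
    """Run the three phases in sequence on CSV text."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return reduce_phase(shuffle_phase(map_phase(rows)))


totals = run_job(RAW_CSV)
print(totals)
```

Here `totals` maps each city to the sum of its valid amounts, and the two malformed rows are dropped in the map phase. On a real cluster the framework performs the shuffle itself and distributes the map and reduce functions across nodes.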
7. Optimization (Week 7):
Optimize the performance of data processing jobs by tuning parameters such as
block size, replication factor, or JVM settings. Implement partitioning,
caching, or indexing strategies to improve query performance in Apache Hive
or Apache Spark SQL.
8. Real-time Processing (Week 8):
Implement real-time data processing using Apache Storm or Apache Flink to
analyze streaming data as it arrives. Perform continuous computations,
windowing, or event processing on the streaming data.
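The windowing mentioned in the Real-time Processing task can be illustrated without a streaming engine. The sketch below is plain Python, not Apache Storm or Apache Flink code; it assumes events arrive as hypothetical `(timestamp, key)` tuples and counts them in fixed, non-overlapping (tumbling) windows. The function name and the sample click events are illustrative assumptions.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) tuples.
    Returns {window_start: {key: count}} ordered by window start.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Every timestamp falls into exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# Hypothetical click events: (seconds since start, page name)
events = [
    (1, "home"), (4, "cart"), (9, "home"),
    (12, "home"), (19, "cart"), (21, "home"),
]
result = tumbling_window_counts(events, window_seconds=10)
print(result)
```

A streaming engine would apply the same grouping continuously to an unbounded stream and would also have to handle late or out-of-order events, which this sketch ignores.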
Annexure-II
Work ethic is a standard of conduct and values for job performance. The modern definition of what
constitutes good work ethics often varies. Different businesses have different expectations. Work
ethic is a belief that hard work and diligence have a moral benefit and an inherent ability, virtue, or
The following ten work ethics are defined as essential for student success:
1. Attendance:
Be at work every day possible; plan your absences and don’t abuse leave time. Be punctual
every day.
2. Character:
Honesty is the single most important factor having a direct bearing on the final success of
an individual, corporation, or product. Complete assigned tasks correctly and promptly.
Look to improve your skills.
3. Team Work:
The ability to get along with others, including those you don’t necessarily like. The ability to
carry your weight and help others who are struggling. Recognize when to speak up with an
idea and when to compromise by blending ideas together.
4. Appearance:
Dress for success and put your best foot forward. Maintain personal hygiene and good
manners, and remember that the first impression of who you are can last a lifetime.
5. Attitude:
Listen to suggestions, be positive, and accept responsibility. If you make a mistake, admit it.
Value workplace safety rules and precautions for personal and co-worker safety. Avoid
unnecessary risks. Be willing to learn new processes, systems, and procedures in light of
changing responsibilities.
6. Productivity:
Do the work correctly; quality and timeliness are prized. Get along with fellows; cooperation
is the key to productivity. Help out whenever asked, and do extra without being asked. Take
pride in your work and do things the best you know how. Eagerly focus energy on
accomplishing tasks, also referred to as demonstrating ownership.
7. Organizational Skills:
Make an effort to improve and learn ways to better yourself. Manage your time: utilize time and
resources to get the most out of both. Take an appropriate approach to social interactions
at work. Maintain focus on work responsibilities.
8. Communication:
Written communication: being able to correctly write reports and memos.
Verbal communication: being able to communicate one on one or to a group.
9. Cooperation:
Follow institute rules and regulations; learn and follow expectations. Get along with fellows;
cooperation is the key to productivity. Be able to welcome and adapt to changing work
situations and the application of new or different skills.
10. Respect:
Work hard and to the best of your ability. Carry out orders; do what’s asked the first time.
Show respect: accept and acknowledge an individual’s talents and knowledge. Respect
diversity in the workplace, including showing due respect for different perspectives,
opinions, and suggestions.