Uploaded by

itamar.nakar

Instructions for Big Data Assignment

1 About
This assignment is part of the Big Data course.

2 Mission
Students should build and demo a big data use case using at least one of the technologies
covered in the course’s topics. Additional relevant technologies may be used at the
students’ discretion.

3 Course’s relevant topics
The Big Data course covers the following main topics, any of which can be used for the assignment:
• Datatypes (Structured, Semi-Structured, Unstructured)
• Docker, VMs
• RDBMS Model (SQL DBs), Intro to NoSQL DBs
• ACID, CAP models
• OLTP, OLAP, ETL, ELT
• Apache Hadoop and ecosystem
• Hadoop Distributed File System (HDFS)
• MapReduce
• Apache Hadoop ecosystem
• Apache ZooKeeper
• Apache Hadoop
• Apache Sqoop (Extract, Load), Apache Pig (Transform)
• Hadoop ETL – Apache Oozie
• Apache Spark
• Batch vs. Streaming
• RESTful API
• Apache Kafka (Message Broker)
• Document DBs
o Elasticsearch
• Graph DBs

4 Project
The students will work in pairs.
4.1 Definitions
Define and build an ETL:
• Find one or more relevant datasets; for example, you can use Kaggle
• Make sure at least part of the data is semi-structured or unstructured
• Use at least one of the following technologies (the more the merrier)
o NoSQL DB, such as Elasticsearch
o Distributed file system, such as HDFS
o Object store, such as AWS S3
o Data tables, such as Apache Iceberg
o Streaming technology, such as Kafka
• It is possible to enrich the data from the previous bullet with data from NFS or an SQL DB
• Perform data transformation
o You can use any logic or query language for the transformations
• Load the data to any data repository
• Present the results
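
The ETL steps above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not a required design: the JSON records, the field names, the `customers` and `purchases` tables and the city values are all made up for the example, and an in-memory SQLite database stands in for both the SQL DB used for enrichment and the target data repository. In a real project, the technologies listed above (Elasticsearch, HDFS, S3, Iceberg, Kafka) would take those roles.

```python
import json
import sqlite3

# Hypothetical semi-structured input: one JSON document per line.
# Some records lack fields (no "age", no "category"), which is typical
# of semi-structured data.
RAW_EVENTS = """\
{"customer_id": 1, "age": 23, "purchase": {"amount": 120.5, "category": "shoes"}}
{"customer_id": 2, "purchase": {"amount": 40.0}}
{"customer_id": 1, "age": 23, "purchase": {"amount": 15.0, "category": "food"}}
"""


def extract(raw_text):
    """Extract: parse JSON lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in raw_text.splitlines() if line.strip()]


def enrich(events, conn):
    """Enrich: attach each customer's city from a SQL table."""
    cities = dict(conn.execute("SELECT customer_id, city FROM customers"))
    return [{**e, "city": cities.get(e["customer_id"], "unknown")} for e in events]


def transform(events):
    """Transform: flatten nested records into rows, filling in defaults."""
    rows = []
    for e in events:
        purchase = e.get("purchase", {})
        rows.append((
            e["customer_id"],
            e.get("city", "unknown"),
            purchase.get("category", "unknown"),
            float(purchase.get("amount", 0.0)),
        ))
    return rows


def load(rows, conn):
    """Load: write the flattened rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(customer_id INTEGER, city TEXT, category TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run all steps and return total spend per city (the presented result)."""
    conn = sqlite3.connect(":memory:")
    # Made-up reference data used only for the enrichment step.
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Haifa"), (2, "Tel Aviv")])
    load(transform(enrich(extract(RAW_EVENTS), conn)), conn)
    return conn.execute(
        "SELECT city, SUM(amount) FROM purchases GROUP BY city ORDER BY city"
    ).fetchall()


if __name__ == "__main__":
    for city, total in run_etl():
        print(city, total)
```

Keeping each step in its own function makes it easier to swap one stage (for example, reading from Kafka instead of a string) without touching the others.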

4.2 Presentation
The students will give a 5-minute presentation and a demo (when possible), leaving up to 3
minutes for questions.

4.3 Guidelines
You may use any LLM you prefer (such as ChatGPT or others) for assistance. In fact, using an
LLM is recommended where possible, for example to:
• Create code
• Get ideas
• Create the presentation
• Etc.

4.4 Optional use cases
ChatGPT’s answer to the prompt “write big data use cases”:

“Big data has revolutionized numerous industries by enabling more informed
decision-making and uncovering insights that were previously unattainable. Here are some
prominent use cases of big data across various sectors:

### 1. **Healthcare**
- **Predictive Analytics:** Predict patient admissions, disease outbreaks, and the
progression of diseases.
- **Personalized Medicine:** Tailor treatments based on individual genetic profiles and
health data.
- **Operational Efficiency:** Optimize hospital operations and reduce wait times through
data analysis.

### 2. **Finance**
- **Fraud Detection:** Identify and prevent fraudulent transactions in real-time using
pattern recognition.
- **Risk Management:** Assess and mitigate financial risks through predictive modeling.
- **Customer Insights:** Analyze customer behavior to offer personalized financial products
and services.

### 3. **Retail**
- **Customer Personalization:** Provide personalized shopping experiences and
recommendations based on customer data.
- **Inventory Management:** Optimize stock levels and supply chain logistics to reduce
costs and improve efficiency.
- **Market Basket Analysis:** Understand purchasing patterns to drive cross-selling and
upselling strategies.

### 4. **Telecommunications**
- **Network Optimization:** Analyze network usage to enhance performance and reduce
downtime.
- **Churn Prediction:** Identify customers at risk of leaving and implement retention
strategies.
- **Customer Experience:** Improve customer service through sentiment analysis and
targeted marketing.

### 5. **Manufacturing**
- **Predictive Maintenance:** Predict equipment failures before they occur, reducing
downtime and maintenance costs.
- **Quality Control:** Monitor production processes to ensure product quality and
compliance.
- **Supply Chain Optimization:** Enhance supply chain visibility and efficiency through
data-driven decision-making.

### 6. **Energy**
- **Smart Grids:** Optimize energy distribution and reduce wastage through real-time data
analysis.
- **Predictive Maintenance:** Monitor and maintain infrastructure to prevent outages and
reduce costs.
- **Energy Consumption:** Analyze consumption patterns to promote energy efficiency and
cost savings.

### 7. **Transportation and Logistics**
- **Route Optimization:** Determine the most efficient routes to reduce fuel consumption
and delivery times.
- **Fleet Management:** Monitor and manage vehicle fleets for better performance and
cost-efficiency.
- **Predictive Maintenance:** Anticipate and prevent vehicle breakdowns through data
analysis.

### 8. **Entertainment and Media**
- **Content Recommendations:** Provide personalized content suggestions to users based
on their viewing history.
- **Audience Analysis:** Understand audience preferences and behavior to tailor content
and advertising.
- **Content Creation:** Use data insights to create content that resonates with target
audiences.

### 9. **Government**
- **Public Safety:** Predict and prevent crime through data analysis and predictive policing.
- **Resource Allocation:** Optimize the allocation of public resources and services based on
demand patterns.
- **Citizen Engagement:** Enhance communication and engagement with citizens through
data-driven strategies.

### 10. **Education**
- **Personalized Learning:** Tailor educational content and experiences to individual
student needs and learning styles.
- **Predictive Analytics:** Identify students at risk of dropping out and implement
intervention strategies.
- **Operational Efficiency:** Improve administrative processes and resource management
through data analysis.

### 11. **Agriculture**
- **Precision Farming:** Use data to optimize planting, watering, and harvesting processes.
- **Yield Prediction:** Predict crop yields based on weather, soil, and historical data.
- **Supply Chain Management:** Enhance the efficiency of the agricultural supply chain
from farm to market.

### 12. **Insurance**
- **Risk Assessment:** Evaluate risk more accurately for underwriting and pricing insurance
policies.
- **Fraud Detection:** Identify and prevent fraudulent claims through data analysis.
- **Customer Insights:** Personalize insurance products and services based on customer
data.

These use cases highlight the transformative potential of big data across various industries,
leading to enhanced efficiency, improved decision-making, and better outcomes overall.”

5 Example use case
Kaggle and other dataset repositories offer many datasets, and you can use a dataset from
any source you choose. If the dataset comes with a relevant challenge, you may take on that
challenge and implement a solution (even if the solution is not your own). The assignment is
acceptable as long as you understand the challenge, build the solution, and can hold an
informed conversation about the topic.

For example:
Dataset:
https://www.kaggle.com/datasets/datazng/shopping-mall-customer-data-segmentation-analysis/data
1. Create a Kafka topic and load the data into it
2. Using Apache Spark:
   a. Read the data from the topic
   b. Apply logic: compute KPIs according to the required insights
   c. Write the results to a local directory
3. Present the conclusions
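
The logic of step 2 can be prototyped in plain Python before wiring up Kafka and Spark. The sketch below is a stand-in, not a Kafka or Spark implementation: a list of JSON strings plays the role of the topic, and the standard library does the aggregation that Spark would do in the real pipeline. The field names (`gender`, `age`, `annual_income`, `spending_score`) are assumptions and may not match the dataset's actual columns, and the records are invented for illustration.

```python
import json
import statistics
import tempfile
from collections import defaultdict
from pathlib import Path

# Invented messages standing in for records published to a Kafka topic.
# Field names are assumptions, not the dataset's confirmed schema.
TOPIC_MESSAGES = [
    json.dumps({"gender": "Female", "age": 22, "annual_income": 30000, "spending_score": 80}),
    json.dumps({"gender": "Female", "age": 47, "annual_income": 72000, "spending_score": 60}),
    json.dumps({"gender": "Male", "age": 35, "annual_income": 54000, "spending_score": 40}),
    json.dumps({"gender": "Male", "age": 29, "annual_income": 41000, "spending_score": 50}),
    json.dumps({"gender": "Male", "age": 61, "annual_income": 95000, "spending_score": 90}),
]


def read_topic(messages):
    """Step 2a stand-in: deserialize each message value from JSON."""
    return [json.loads(message) for message in messages]


def compute_kpis(records):
    """Step 2b: customer count and average spending score per gender."""
    scores = defaultdict(list)
    for record in records:
        scores[record["gender"]].append(record["spending_score"])
    return {
        gender: {
            "customers": len(values),
            "avg_spending_score": statistics.mean(values),
        }
        for gender, values in sorted(scores.items())
    }


def write_results(kpis, out_dir):
    """Step 2c: write the KPIs to a JSON file in a local directory."""
    out_path = Path(out_dir) / "kpis.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(kpis, indent=2))
    return out_path


if __name__ == "__main__":
    kpis = compute_kpis(read_topic(TOPIC_MESSAGES))
    with tempfile.TemporaryDirectory() as out_dir:
        print(write_results(kpis, out_dir).read_text())
```

Once the KPI logic is agreed on, `read_topic` and `compute_kpis` are the parts to replace with a Spark job reading from the Kafka topic; `write_results` shows the shape of the output to expect.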

Example insights (in this case, you can choose one or more of the following):
Demographic Analysis of Shopping Behavior: Insights and Recommendations
• Income-Related Spending Trends: Targeting high-income customers could optimize
marketing efforts towards premium products or services.
• Gender-Based Spending Differences: Gender appears to have minimal impact on
spending scores, suggesting gender-neutral marketing strategies may be effective.
• Age-Related Spending Patterns: Younger customers exhibit higher spending score
variability than older customers. Tailoring marketing strategies to
resonate with different age groups could enhance conversion rates.
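
An insight such as "spending score variability by age" comes down to a grouped standard deviation. The sketch below shows one possible way to compute it; the age buckets and the sample `(age, spending_score)` pairs are hypothetical and chosen only to make the arithmetic easy to follow, so the output says nothing about the real dataset.

```python
import statistics
from collections import defaultdict


def age_group(age):
    """Bucket an age into a coarse group (the boundaries are an arbitrary choice)."""
    if age < 30:
        return "under_30"
    if age < 50:
        return "30_to_49"
    return "50_plus"


def spending_variability(customers):
    """Population standard deviation of spending score per age group.

    `customers` is an iterable of (age, spending_score) pairs.
    """
    groups = defaultdict(list)
    for age, score in customers:
        groups[age_group(age)].append(score)
    return {group: statistics.pstdev(scores) for group, scores in groups.items()}


# Invented sample, not taken from the Kaggle dataset.
SAMPLE = [(22, 90), (25, 10), (28, 50), (41, 55), (45, 45), (63, 48), (67, 52)]

if __name__ == "__main__":
    for group, spread in spending_variability(SAMPLE).items():
        print(f"{group}: {spread:.2f}")
```

A larger value for a group means its customers' spending scores are more spread out, which is the kind of evidence needed to back a claim about variability.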
