Nile University | Spring 2025 | CSCI461 Introduction to Big Data
Assignment #1
------------------------------------------------------------------------------------------------------------------
INSTRUCTIONS: (READ ALL POINTS CAREFULLY)
- The assignment MUST be done by a team of ONLY 3-5 members.
- The assignment deadline is March 20, 2025 @ 11:45 PM.
- Assignment discussions start in the week of March 23, 2025. [Discussion slots
will be announced for each TA.]
- The assignment's total grade is 10 marks, plus 1 bonus mark. [The distribution is
specified in the assignment requirements below.]
- Submission will be on Moodle. (More info in the Deliverables section below.)
- One member should fill out the following form to submit team information:
https://forms.office.com/r/SRuEeju4HT . (The form deadline is the same as the
assignment's.)
- Form submission is very important; teams that do not fill out the form will receive -5.
- Any submission after the deadline will incur -2 from the assignment's total
grade.
- CHEATING in the assignment results in a ZERO.
- All members MUST be present in the discussion, and all members MUST
understand every implemented part of the project.
ASSIGNMENT REQUIREMENTS:
- Start by creating a directory on your local machine named bd-a1/.
- Download and place the dataset in the bd-a1/ directory [Choose any simple dataset].
- Inside the bd-a1/ directory, create a Dockerfile that does the following (a sketch follows this list):
- Specify the base image as Ubuntu. [0.5 MARK]
- Install the following packages in the Dockerfile: Python3, Pandas, NumPy,
Seaborn, Matplotlib, scikit-learn, and SciPy. [1 MARK]
- Create a directory inside the container at /home/doc-bd-a1/. [0.5 MARK]
- Move the dataset file to the container. [0.5 MARK]
- Open the bash shell upon container startup. [0.5 MARK]
- Note: Install any additional modules or libraries you anticipate needing within the
container.
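A minimal sketch of such a Dockerfile, assuming ubuntu:22.04 as the base and a dataset named dataset.csv (both are placeholders; adapt them to your own choices):

    # Base image (the 22.04 tag is an assumption; any recent Ubuntu works)
    FROM ubuntu:22.04

    # Install Python 3, pip, and the required libraries
    RUN apt-get update && apt-get install -y python3 python3-pip && \
        pip3 install pandas numpy seaborn matplotlib scikit-learn scipy

    # Create the working directory inside the container
    RUN mkdir -p /home/doc-bd-a1

    # Copy the dataset into the container (dataset.csv is a placeholder name)
    COPY dataset.csv /home/doc-bd-a1/

    WORKDIR /home/doc-bd-a1

    # Open a bash shell on container startup
    CMD ["/bin/bash"]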
- Within the container's /home/doc-bd-a1/ directory (after building the image and starting a
container), create the following files (sketches of each file follow the Notes below):
- load.py: Design this file to dynamically read the dataset file by accepting the file
path as a user-provided argument. [0.5 MARK]
- dpre.py: This file should perform Data Cleaning, Data Transformation, Data
Reduction, and Data Discretization steps. Save the resulting data frame as a new
CSV file named res_dpre.csv. [2 MARKS]
- eda.py: Conduct exploratory data analysis, generating at least 3 insights without
visualizations. Save these insights as text files named eda-in-1.txt, eda-in-2.txt,
and so on. [1 MARK]
- vis.py: Create a single visualization and save it as vis.png. [0.5 MARK]
- model.py: Implement the K-means algorithm on your data, using the columns you
deem suitable for K-means and setting k=3. Save the number of records in each
cluster as a text file named k.txt. [1 MARK]
- final.sh: Compose a simple bash script on your local machine to copy the output
files generated by dpre.py, eda.py, vis.py, and model.py from the container to
bd-a1/service-result/ on your local machine. Finally, the script should stop the
container. [1 MARK]
Notes:
Each Python file that updates the data frame should invoke the next Python file
and pass it the path of the saved data frame. The next file then reads that CSV
back into a data frame and continues processing.
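The sketches below illustrate one way to satisfy the file specifications above, including the chaining described in the Notes; every dataset-specific name (columns, file names, the subprocess hand-off) is a placeholder choice, not the required design. First, load.py reads the user-provided path and hands it to the next stage:

    # load.py: minimal sketch of the entry point of the pipeline
    import sys
    import subprocess

    import pandas as pd

    path = sys.argv[1]       # dataset path supplied by the user
    df = pd.read_csv(path)   # verify the file loads as a data frame

    # Pass the path to the next stage, as required by the Notes
    subprocess.run(["python3", "dpre.py", path], check=True)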
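dpre.py could cover the four preprocessing steps like this; each step shown is one placeholder example of its category, and the column choices assume your dataset has numeric columns:

    # dpre.py: sketch of cleaning, transformation, reduction, and discretization
    import sys
    import subprocess

    import pandas as pd

    df = pd.read_csv(sys.argv[1])
    num_cols = df.select_dtypes("number").columns

    # Cleaning: drop duplicates, fill missing numeric values with the column mean
    df = df.drop_duplicates()
    df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

    # Transformation: min-max scale the first numeric column (placeholder choice)
    col = num_cols[0]
    df[col + "_scaled"] = (df[col] - df[col].min()) / (df[col].max() - df[col].min())

    # Reduction: random sampling as one simple reduction technique
    df = df.sample(frac=0.9, random_state=0)

    # Discretization: bin a numeric column into 4 integer-labeled bins
    df[col + "_bin"] = pd.cut(df[col], bins=4, labels=False)

    df.to_csv("res_dpre.csv", index=False)
    subprocess.run(["python3", "eda.py", "res_dpre.csv"], check=True)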
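eda.py might compute simple text-only summaries; the three insights shown are placeholders for whatever your data actually supports:

    # eda.py: sketch that writes at least 3 text-only insights
    import sys
    import subprocess

    import pandas as pd

    df = pd.read_csv(sys.argv[1])
    num = df.select_dtypes("number")

    insights = [
        f"The dataset has {len(df)} rows and {len(df.columns)} columns.",
        "Column means:\n" + num.mean().to_string(),
        "Pairwise correlations:\n" + num.corr().round(3).to_string(),
    ]
    for i, text in enumerate(insights, start=1):
        with open(f"eda-in-{i}.txt", "w") as f:
            f.write(text)

    subprocess.run(["python3", "vis.py", sys.argv[1]], check=True)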
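vis.py needs only one figure; a histogram of the first numeric column is a placeholder choice:

    # vis.py: sketch that saves a single visualization as vis.png
    import sys
    import subprocess

    import matplotlib
    matplotlib.use("Agg")  # headless backend: no display inside the container
    import matplotlib.pyplot as plt
    import pandas as pd

    df = pd.read_csv(sys.argv[1])
    col = df.select_dtypes("number").columns[0]
    df[col].plot(kind="hist", title=col)
    plt.savefig("vis.png")

    subprocess.run(["python3", "model.py", sys.argv[1]], check=True)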
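model.py can run scikit-learn's KMeans with k=3; clustering on all numeric columns is a placeholder choice, so pick the columns that suit your dataset:

    # model.py: sketch of K-means with k=3; cluster sizes go to k.txt
    import sys

    import pandas as pd
    from sklearn.cluster import KMeans

    df = pd.read_csv(sys.argv[1])
    X = df.select_dtypes("number").dropna()

    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

    # Number of records in each cluster
    counts = pd.Series(km.labels_).value_counts().sort_index()
    with open("k.txt", "w") as f:
        f.write(counts.to_string())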
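Finally, final.sh runs on the local machine, copies the outputs out, and stops the container; bd-a1-container is a placeholder container name:

    #!/bin/bash
    # final.sh: copy the pipeline outputs from the container, then stop it
    mkdir -p bd-a1/service-result
    for f in res_dpre.csv eda-in-1.txt eda-in-2.txt eda-in-3.txt vis.png k.txt; do
        docker cp bd-a1-container:/home/doc-bd-a1/"$f" bd-a1/service-result/
    done
    docker stop bd-a1-container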
To execute your project, perform the following steps (a command sketch follows this list):
- After creating the Dockerfile, build it to produce an image.
- Run the container using the generated image.
- Inside the container, create the Python files as specified.
- Initiate the pipeline using the command (inside the container): python3 load.py
<dataset-path>
- The pipeline will generate several files and figures conforming to the prescribed
outputs. These will be relocated from the container to bd-a1/service-result/ on
your local machine using the bash script.
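A sketch of the Docker commands for these steps; the image name, container name, and dataset file name are placeholders:

    # Build the image from the Dockerfile in bd-a1/
    docker build -t bd-a1-image .

    # Run a container with an interactive bash shell
    docker run -it --name bd-a1-container bd-a1-image

    # Inside the container, after creating the Python files:
    python3 load.py /home/doc-bd-a1/dataset.csv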
- README file showing the execution of the project, all Docker commands used,
etc. [1 MARK]
BONUS:
- Push the Docker image to Docker Hub. [0.5 MARK]
- Push all your files to a GitHub repo. [0.5 MARK]
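A sketch of the bonus pushes, assuming you have already run docker login; <username> and the repository URL are placeholders:

    # Docker Hub: tag the local image under your account, then push it
    docker tag bd-a1-image <username>/bd-a1-image
    docker push <username>/bd-a1-image

    # GitHub: commit the project files and push (branch name may differ on your setup)
    git init && git add . && git commit -m "bd-a1 assignment"
    git remote add origin https://github.com/<username>/bd-a1.git
    git push -u origin main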
DELIVERABLES:
- ALL TEAM MEMBERS should submit all files (Dockerfile, Python files, bash script,
result files, README file, and bonus files, if any) as ONE ZIP file on Moodle.
- You don't have to attach the dataset.