Course Code Course Title L T P C
MACSE502 Programming for Data Science 3 0 2 4
Pre-requisite NIL Syllabus Version 1.0
Course Objectives
1. Master Python data structures and object-oriented programming for data analysis
and web development.
2. Dive deep into the Pandas library for data processing, modelling and visualization.
3. Understand Scala functional programming concepts, data structures, object-
oriented programming and exception handling.
4. Explore Scala libraries, Apache Spark architecture and its functions for big data
processing and analytics.
5. Learn criteria for selecting the appropriate programming language.
Course Outcomes
1. Demonstrate proficiency in implementing complex data structures and user-
defined data structures in Python.
2. Analyse and manipulate data using advanced features of Pandas.
3. Evaluate the use of Scala’s case classes, companion objects, and traits in
building robust applications.
4. Design and deploy scalable data processing pipelines using Scala libraries and
Apache Spark.
5. Select the most suitable programming language based on the project
requirements.
6. Apply Python and Scala programming skills to design, develop, and deploy real-
world applications.
Module:1 Python Data Structures 8 hours
Conditional and Branching Statements, Built-in data structures: List, Tuple, Dictionary,
Set, User-defined data structures: Stack, Queue, Priority Queue, String handling
methods, Exception Handling, Object-Oriented Concepts, APIs and Data Collection,
Simple APIs and REST APIs: HTTP Requests, File Handling: Read/Write, Frameworks
and Libraries: NLTK, ChatterBot
Module:2 Python Libraries 9 hours
Pandas: Series, DataFrame, Handling Missing Values, Built-in functions, Data
Operations, Filtering Data in a DataFrame, Data Extraction, Working with Text Data,
Merging DataFrames; Data Mining: Scrapy, Beautiful Soup; Data Processing and
Modelling: NumPy, SciPy, Pandas, Keras, Scikit-Learn, PyTorch, TensorFlow,
XGBoost; Data Visualization: Matplotlib, Seaborn, Bokeh, Plotly, Folium.
Module:3 Scala Data Structures and Object-Oriented Programming 12 hours
Expanded Function Format, Variables and Strings, Getting User Input, Numbers,
Variable Types, Operators, Booleans; Data Structures: Arrays, Lists, Tuples, Sets,
HashSet, Maps; Functional Combinators: map; Scala Objects and Classes, Anonymous
Objects, Singleton and Companion Objects, Case Classes, Constructors, Method
Overloading, the this Keyword, Inheritance, Field Overriding, Final, Abstract Classes,
Traits, Trait Mixins, Access Modifiers, Scala Arrays, REPL
Module:4 Scala Libraries and Spark Basics 8 hours
Scala Libraries: Breeze, Saddle; Exception Handling; Apache Spark Architecture,
Spark for Big Data, Apache Spark Components.
Module:5 Programming Language Selection Criteria 6 hours
Size of the Deployment: Data, Resources and Load; Security; Skill Set; Targeted
Platform; Elasticity of a Language; Time to Production; Performance; Support and
Community; Purpose; Programmer Experience; Ease of Development and
Maintenance; Efficiency; Availability of an IDE; Error Checking and Diagnosis
Module:6 Contemporary Issues 2 hours
Total Lecture hours: 45 hours
Textbooks
1 Alvaro Fuentes, Become a Python Data Analyst, Packt Publishing, 2018
2 Bharti Motwani, Data Analytics using Python, Wiley, 2020
3 Jules S. Damji et al., Learning Spark: Lightning-Fast Data Analytics, Second
Edition, Shroff/O'Reilly, 2020
4 Data Science and Machine Learning using Python, MGH, 2022
Reference Books
1. Tome, E., Bhattacharjee, R. and Radford, D. Data Engineering with Scala
and Spark: Build streaming and batch pipelines that process massive
amounts of data using Scala. Packt Publishing Ltd. (2024)
2. Perrin, J.-G. Spark in Action, Second Edition: Covers Apache Spark 3 with
Examples in Java, Python, and Scala. Manning. (2020)
3. McKinney, W. Python for Data Analysis: Data Wrangling with pandas, NumPy,
and Jupyter, Third Edition. O'Reilly. (2022)
4. https://www.coursera.org/learn/python-for-applied-data-science-ai
5. https://www.datacamp.com/blog/top-python-libraries-for-data-science
6. https://www.udemy.com/course/completescala3/?couponCode=IND21PM
7. https://www.aalpha.net/blog/factors-to-consider-when-choosing-a-programming-language/
Mode of Evaluation: Quiz, Assignment, Design Project, Case Study, Seminar, CAT
and FAT
List of Experiments (Indicative; illustrative code sketches for selected experiments follow the list)
1. Basic Data Manipulation with Pandas: Create a DataFrame with columns
Name, Age, and City containing data for five individuals. Perform the following
operations: select only the Name and Age columns, filter rows where Age is
greater than 25, add a new column Country with a default value, and sort the
DataFrame by Age in descending order. Finally, calculate the average age of
the individuals.
2. Data Cleaning and Preprocessing with Pandas: Create a DataFrame with
some missing values in columns Name, Age, and City. Perform the following
operations: fill missing values in the Age column with the mean age, drop
rows where Name or City is missing, and convert the Age column to integer
type. Finally, normalize the Age column using min-max scaling.
3. Creating and Manipulating Arrays: Using NumPy, create a 2D array of shape
(3, 4) with random integers between 0 and 10. Perform the following
operations: calculate the mean and standard deviation of the entire array, slice
the array to get the first two rows and last two columns, reshape the array to
shape (4, 3), and perform element-wise multiplication with another array of the
same shape.
4. Feature Engineering, Exploratory Data Analysis: Create a DataFrame with
columns Feature1, Feature2, and Target containing random data. Perform the
following operations: create a new feature that is the logarithm of Feature1,
bin Feature2 into three categories (low, medium, high), and calculate the
correlation matrix of the DataFrame. Finally, create a scatter plot of Feature1
vs. Feature2 colored by Target, and interpret any visible patterns.
5. Scrape data from a webpage and store it in a structured format like CSV or
JSON.
6. Interactive Bar Chart with Plotly: Create an interactive bar chart showing the
population of different cities.
7. Interactive Scatter Plot with Plotly: Create an interactive scatter plot showing
the relationship between house size and price.
8. Interactive Line Plot with Bokeh: Create an interactive line plot showing the
daily temperatures over a week using Bokeh.
9. Interactive Bar Chart with Bokeh: Create an interactive bar chart showing the
sales figures of different products using Bokeh.
10. Interactive Scatter Plot with Bokeh: Create an interactive scatter plot showing
the relationship between petal length and petal width from the Iris dataset
using Bokeh.
11. Visualizing Clusters with scikit-learn and Plotly: Perform K-means clustering
on the Iris dataset and visualize the clusters using an interactive 3D scatter
plot in Plotly.
12. Visualizing PCA with scikit-learn and Plotly: Perform Principal Component
Analysis (PCA) on the Iris dataset and visualize the first two principal
components using an interactive 2D scatter plot in Plotly.
13. Introduction to arrays in Scala: Create an array of integers with the values 2,
5, 9, 14, 20. Write a function that takes this array and returns the sum of its
elements. Next, create an array of strings with the names of five different
fruits. Write a function that concatenates all elements of this array, separated
by commas. Finally, iterate over the array of integers and print each element
to the console.
14. Understanding lists, sets, and tuples in Scala: Create a list of the first five
prime numbers. Write a function that takes this list and returns a new list with
each element squared. Then, create a set of unique characters from the string
"hello world" and write a function that takes two sets and returns their
intersection. Create a tuple with three elements: an integer, a string, and a
boolean, then access and print each element.
15. Handling collisions and resolving conflicts in HashMaps: Create a HashMap
with keys representing student names and values representing their grades,
then insert multiple entries including a duplicate key with a different grade.
Write a function to merge two HashMaps, resolving conflicts by taking the
higher grade. Finally, write a function to handle collisions by chaining, using
lists to store multiple values for a single key.
16. Exploring advanced functional combinators beyond the basic set: Create a list
of integers from 1 to 10, then use filter to create a new list with only even
numbers. Use map to create a new list where each element is multiplied by 3.
Use flatMap to create a list of tuples where each integer is paired with its
square. Finally, use foldLeft to calculate the product of all elements in the list.
17. Case Classes: Create a case class Person with fields name, age, and city.
Create a list of Person objects representing five different individuals. Write a
function to filter out people older than 30. Next, write a function that groups
people by their city. Then, write a function that transforms each Person object
into a string in the format "Name is Age years old and lives in City". Finally,
write a function to sort the list of people by age in ascending order.
18. Use Saddle to load and manipulate a dataset of fruit prices and quantities,
filtering for apples and calculating their average price. You will then visualize
the price trend over time using Vegas. This exercise covers basic data
manipulation with Saddle and interactive visualization with Vegas.
19. Use the Breeze library to calculate the total revenue for each fruit type by
multiplying price and quantity from a dataset. You will then create a bar chart
to visualize these revenues using Plotly.scala. This exercise introduces
numerical computations with Breeze and dynamic visualizations with
Plotly.scala.
20. Analyse fruit prices using Spire for precise statistics and Saddle for data
manipulation. You will compute the mean and variance of prices for each fruit
and visualize these trends over time using Vegas. This exercise demonstrates
the use of Spire for numerical precision, advanced data manipulation with
Saddle, and effective visualization with Vegas.
Total hours: 30 hours
Mode of Evaluation: Continuous Assessments and FAT
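Indicative Code Sketches for Selected Experiments
The sketches below are non-authoritative starting points for some of the Python experiments listed above. All data values, names, and URLs are placeholders chosen for illustration and are not prescribed by the syllabus.
Experiment 1 (basic data manipulation with Pandas): a minimal sketch assuming a hand-made table of five individuals.

    import pandas as pd

    # Five placeholder individuals
    df = pd.DataFrame({
        "Name": ["Asha", "Ben", "Chitra", "Dev", "Elena"],
        "Age": [24, 31, 27, 22, 35],
        "City": ["Chennai", "Delhi", "Mumbai", "Pune", "Kolkata"],
    })

    name_age = df[["Name", "Age"]]                       # select only Name and Age
    older_than_25 = df[df["Age"] > 25]                   # filter rows where Age > 25
    df["Country"] = "India"                              # add a column with a default value
    df_sorted = df.sort_values("Age", ascending=False)   # sort by Age, descending
    print(df_sorted)
    print("Average age:", df["Age"].mean())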
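Experiment 2 (data cleaning and preprocessing with Pandas): a sketch using the same placeholder columns, with missing values inserted by hand.

    import pandas as pd

    df = pd.DataFrame({
        "Name": ["Asha", None, "Chitra", "Dev", "Elena"],
        "Age":  [24.0, 31.0, None, 22.0, 35.0],
        "City": ["Chennai", "Delhi", "Mumbai", None, "Kolkata"],
    })

    df["Age"] = df["Age"].fillna(df["Age"].mean())        # fill missing Age with the mean
    df = df.dropna(subset=["Name", "City"])               # drop rows missing Name or City
    df["Age"] = df["Age"].astype(int)                     # convert Age to integer
    # min-max scaling of Age into [0, 1]
    df["Age_scaled"] = (df["Age"] - df["Age"].min()) / (df["Age"].max() - df["Age"].min())
    print(df)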
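Experiment 3 (creating and manipulating arrays with NumPy): a sketch seeded for repeatability; the element-wise product uses a second random array of the same shape.

    import numpy as np

    rng = np.random.default_rng(0)                 # seeded so the run is repeatable
    arr = rng.integers(0, 11, size=(3, 4))         # random integers between 0 and 10

    print("mean:", arr.mean(), "std:", arr.std())
    print(arr[:2, -2:])                            # first two rows, last two columns
    reshaped = arr.reshape(4, 3)                   # reshape (3, 4) -> (4, 3)
    other = rng.integers(0, 11, size=(3, 4))
    print(arr * other)                             # element-wise multiplication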
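Experiment 4 (feature engineering and exploratory data analysis): a sketch with synthetic data; Feature1 is kept positive so the logarithm is defined, and the bin labels follow the problem statement.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    df = pd.DataFrame({
        "Feature1": rng.uniform(1, 100, 200),       # positive values so log is defined
        "Feature2": rng.normal(50, 15, 200),
        "Target":   rng.integers(0, 2, 200),
    })

    df["Feature1_log"] = np.log(df["Feature1"])                    # log-transformed feature
    df["Feature2_bin"] = pd.cut(df["Feature2"], bins=3,
                                labels=["low", "medium", "high"])  # three equal-width bins
    print(df[["Feature1", "Feature2", "Target"]].corr())           # correlation matrix

    plt.scatter(df["Feature1"], df["Feature2"], c=df["Target"])    # colour points by Target
    plt.xlabel("Feature1")
    plt.ylabel("Feature2")
    plt.show()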
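Experiment 5 (web scraping to CSV): a sketch with requests and Beautiful Soup; the URL and the CSS selectors describe a hypothetical page layout and must be adapted to the page actually scraped.

    import csv
    import requests
    from bs4 import BeautifulSoup

    url = "https://example.com/products"          # placeholder URL; replace with the real page
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")

    rows = []
    for item in soup.select("div.item"):          # assumed markup: <div class="item"><h2>...</h2><span class="price">...</span></div>
        rows.append({
            "title": item.select_one("h2").get_text(strip=True),
            "price": item.select_one("span.price").get_text(strip=True),
        })

    with open("scraped.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "price"])
        writer.writeheader()
        writer.writerows(rows)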
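Experiments 6 and 7 (interactive charts with Plotly): a sketch using Plotly Express with made-up population and housing values; each call opens an interactive figure.

    import plotly.express as px

    # Experiment 6: bar chart of city populations (placeholder figures, in millions)
    cities = ["Chennai", "Delhi", "Mumbai", "Pune"]
    population = [7.1, 16.8, 12.4, 3.1]
    px.bar(x=cities, y=population,
           labels={"x": "City", "y": "Population (millions)"}).show()

    # Experiment 7: scatter plot of house size vs price (synthetic values)
    size_sqft = [800, 1200, 1500, 2000, 2600]
    price = [40, 62, 75, 105, 140]
    px.scatter(x=size_sqft, y=price,
               labels={"x": "Size (sq ft)", "y": "Price"}).show()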
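Experiment 8 (interactive line plot with Bokeh): a sketch with placeholder daily temperatures; the same pattern of figure() plus glyph calls carries over to the Bokeh bar and scatter experiments.

    from bokeh.plotting import figure, show

    days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    temps = [31, 33, 32, 30, 29, 28, 30]          # placeholder temperatures in °C

    p = figure(x_range=days, title="Daily temperature over a week",
               x_axis_label="Day", y_axis_label="Temperature (°C)")
    p.line(x=days, y=temps, line_width=2)
    p.scatter(x=days, y=temps, size=8)
    show(p)                                       # renders an interactive HTML plot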
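Experiments 11 and 12 (clustering and PCA on the Iris dataset with scikit-learn and Plotly): a sketch in which the 3D view uses three of the four Iris features and the PCA scatter is coloured by species; both choices are illustrative, not prescribed.

    import plotly.express as px
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    iris = load_iris(as_frame=True)
    X = iris.data                                  # the four numeric Iris features

    # Experiment 11: K-means cluster labels shown on three of the four features
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    px.scatter_3d(X, x="sepal length (cm)", y="sepal width (cm)",
                  z="petal length (cm)", color=labels.astype(str)).show()

    # Experiment 12: first two principal components, coloured by species
    pcs = PCA(n_components=2).fit_transform(X)
    px.scatter(x=pcs[:, 0], y=pcs[:, 1],
               color=iris.target_names[iris.target],
               labels={"x": "PC1", "y": "PC2"}).show()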
Recommended by Board of Studies
Approved by Academic Council No. Date