COMSATS University Islamabad, Wah Campus
Terminal Examinations Fall 2020
Department of Computer Science
Program/Class: BSSE-7 Date: 13th January, 2021
Subject: Data Warehousing and Data Mining Instructor: Mamuna Fatima
Total Time Allowed: 3 hrs Maximum Marks: 50
Student Name: _______________________________ Registration #: _______________________________
Instructions:
Attempt all questions.
BE PRECISE AND DON’T COPY.
Attempt the paper in the sequence given in the question paper. Make sure to write
the actual question number on the answer sheet.
Each question must be attempted on an A4 sheet (HAND-WRITTEN) with the
following personal information at the top of each sheet:
STUDENT NAME, REGISTRATION NUMBER, CNIC NUMBER,
CONTINUATION SHEET NUMBER (E.G., 1/4, 2/4 …), SIGNATURE.
Submit one PDF file on CUOnline on time. Late submissions will not be
accepted, so manage your time carefully.
Question No 1 [10 marks]
10 MCQs on MS Teams; time allowed: 15 minutes
Long Questions
Question 02 [CLO 2] [10 marks]
A multinational company has stores in several regions. It would like to track profit
information across different departments (Video Sales and Video Rentals) and regions
(East, West, Central) in different years (e.g., 2011 and 2012). Design an appropriate
data warehouse schema using the star multi-dimensional model and discuss the fact
and dimension tables you would need. Would you need/recommend a snowflake
schema? Explain your views.
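For reference, the sketch below shows one possible star-schema layout for this scenario in Python (using pandas). All table and column names (fact_profit, dim_department, dim_region, dim_time, the surrogate keys) and the profit figures are illustrative assumptions, not a prescribed answer.

import pandas as pd

# Illustrative dimension tables (names, keys and members are assumptions).
dim_department = pd.DataFrame({
    "dept_id":   [1, 2],
    "dept_name": ["Video Sales", "Video Rentals"],
})
dim_region = pd.DataFrame({
    "region_id":   [1, 2, 3],
    "region_name": ["East", "West", "Central"],
})
dim_time = pd.DataFrame({
    "time_id": [1, 2],
    "year":    [2011, 2012],
})

# Fact table: one row per (department, region, year) holding the profit measure.
fact_profit = pd.DataFrame({
    "dept_id":   [1, 1, 2, 2],
    "region_id": [1, 2, 1, 3],
    "time_id":   [1, 1, 2, 2],
    "profit":    [120.0, 95.5, 80.0, 60.25],   # made-up figures for illustration
})

# Example roll-up: total profit per region and year, the kind of query the
# fact/dimension split is meant to support.
report = (fact_profit
          .merge(dim_region, on="region_id")
          .merge(dim_time, on="time_id")
          .groupby(["region_name", "year"])["profit"].sum())
print(report)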
Question 03 [CLO 4] [10 marks]
Given the data in the following table, apply Hierarchical Agglomerative Clustering
using average linkage. Use Euclidean distance as the distance measure and draw the
dendrogram. Clearly show all steps and calculations. (A reference Python sketch is
given after the substitution rules below.)
S# x y
1 0.40 0.53
2 0.A2 0.38
3 0.35 0.32
4 0.26 0.C9
5 0.B8 0.41
6 0.45 0.30
Where,
A= Use (reg_no_last2digits % 5) + 1
B= Use (reg_no_last2digits % 5) + 2
C= Use (reg_no_last2digits % 5) + 3
Instruction: No marks will be awarded if the numbers used are not derived from your
own registration number.
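For reference, a minimal Python sketch of average-link agglomerative clustering with Euclidean distance, assuming SciPy and Matplotlib are available. reg_last2 is a placeholder you must replace with the last two digits of your own registration number; A, B and C are then derived exactly as defined above and substituted into the coordinates 0.A2, 0.C9 and 0.B8.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist, squareform

reg_last2 = 7  # placeholder: replace with the last two digits of YOUR registration number
A = (reg_last2 % 5) + 1
B = (reg_last2 % 5) + 2
C = (reg_last2 % 5) + 3

# Points from the table; A, B, C are substituted into the marked digit positions.
points = np.array([
    [0.40,             0.53],
    [float(f"0.{A}2"), 0.38],
    [0.35,             0.32],
    [0.26,             float(f"0.{C}9")],
    [float(f"0.{B}8"), 0.41],
    [0.45,             0.30],
])

# Pairwise Euclidean distance matrix (the starting point of the hand calculation).
print(squareform(np.round(pdist(points, metric="euclidean"), 3)))

# Average-link agglomerative clustering and the corresponding dendrogram.
Z = linkage(points, method="average", metric="euclidean")
dendrogram(Z, labels=[f"S{i}" for i in range(1, 7)])
plt.show()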
Question 04 [CLO 4] [10 marks]
Apply the Decision Tree (DT) algorithm to the training data in the table below and
clearly show all steps and calculations of Entropy and Information Gain. Draw the
decision tree after completing the calculations. (A reference Python sketch of the
entropy and gain calculations is given after part (c).)
S# Holiday Weather Paper Picnic (Category)
1 Yes Rainy Easy No
2 Yes Rainy Difficult No
3 Yes Rainy Difficult Yes
4 Yes Sunny Difficult Yes
5 Yes Sunny Easy Yes
6 Yes Sunny Easy No
7 Yes Rainy Difficult No
8 Yes Sunny Difficult Yes
9 No Sunny Difficult No
10 No Rainy Difficult No
11 No Sunny Easy No
a) Which attribute would information gain choose as the root of the tree?
b) Draw the decision tree that would be constructed by recursively applying
information gain to select roots of sub-trees, as in the Decision-Tree-Learning
algorithm.
c) Generate decision rules from the decision tree.
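For reference, a minimal Python sketch that computes the class entropy and the information gain of each attribute on the table above; it only cross-checks the hand calculations required in parts (a) to (c).

from math import log2
from collections import Counter

# Training data from the table: (Holiday, Weather, Paper) -> Picnic.
data = [
    ("Yes", "Rainy", "Easy",      "No"),
    ("Yes", "Rainy", "Difficult", "No"),
    ("Yes", "Rainy", "Difficult", "Yes"),
    ("Yes", "Sunny", "Difficult", "Yes"),
    ("Yes", "Sunny", "Easy",      "Yes"),
    ("Yes", "Sunny", "Easy",      "No"),
    ("Yes", "Rainy", "Difficult", "No"),
    ("Yes", "Sunny", "Difficult", "Yes"),
    ("No",  "Sunny", "Difficult", "No"),
    ("No",  "Rainy", "Difficult", "No"),
    ("No",  "Sunny", "Easy",      "No"),
]
attributes = ["Holiday", "Weather", "Paper"]

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(rows, attr_index):
    """Gain = H(class) - weighted sum of H(class | attribute value)."""
    base = entropy([r[-1] for r in rows])
    remainder = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r[-1] for r in rows if r[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

for i, name in enumerate(attributes):
    print(f"Gain({name}) = {information_gain(data, i):.4f}")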
Question 05 [CLO 4] [10 marks]
Run the FP-Growth algorithm on the following transactional database with minimum
support equal to 50%. Mine the FP-tree and extract the set of frequent patterns.
Show the step-by-step execution. (A cross-checking Python sketch is given after the
table.)
TID ITEMS
T1 {A,B,C,E}
T2 {B,D,E,F}
T3 {A,B,C,D,F,G}
T4 {A,B,C,D,E,G,H}
T5 {A,C,D,E,H}
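For reference, a minimal Python sketch that mines the same transactions at 50% minimum support. It assumes the third-party mlxtend library is installed; its fpgrowth routine can be used to cross-check the frequent patterns obtained from the hand-built FP-tree.

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Transactional database from the table.
transactions = [
    ["A", "B", "C", "E"],
    ["B", "D", "E", "F"],
    ["A", "B", "C", "D", "F", "G"],
    ["A", "B", "C", "D", "E", "G", "H"],
    ["A", "C", "D", "E", "H"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent patterns at 50% minimum support (i.e. appearing in at least 3 of 5 transactions).
patterns = fpgrowth(onehot, min_support=0.5, use_colnames=True)
print(patterns.sort_values("support", ascending=False))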
Good Luck