BHARATI VIDYAPEETH COLLEGE OF ENGINEERING,
NAVI MUMBAI
Department of Computer Engineering
CLASS -TE SEM-V
Subject : Data Warehousing and Mining
UT-I Question Bank
1. Construction of Snowflake and Star Schema
Given a problem statement, construct both a Star Schema and a Snowflake Schema with
appropriate dimension tables and a fact table.
2. Explain the differences between OLTP and OLAP.
Explain the key differences between Online Transaction Processing (OLTP) and Online
Analytical Processing (OLAP) with suitable examples.
3. Explain various ETL operations.
Discuss the Extract, Transform, Load (ETL) process in detail, and explain its role in data
warehousing.
4. Explain the architecture of a data warehouse.
Describe the typical architecture of a data warehouse and its main components.
5. Discuss various issues and applications of Data Mining.
Elaborate on the major issues in data mining and highlight its real-world applications.
6. Short note on data preprocessing and phases of data cleaning (handling missing
data and noisy data).
Describe the steps involved in data preprocessing and explain different phases of data cleaning.
7. Dimensionality reduction and data discretization.
Discuss the methods used for dimensionality reduction and explain the concept of data
discretization.
8. Numerical problem based on Decision Tree.
Solve a numerical problem using a given dataset to construct a Decision Tree.
9. Short note on classification and clustering accuracy.
Briefly explain how the accuracy of classification and clustering algorithms is measured and
compared.
10. Write a short note on data pruning.
Define data pruning and explain its role in decision tree construction and model optimization.
Assignment 01
1. What are the basic building blocks of a data warehouse?
2. Compare OLTP and OLAP.
3. Differentiate between star schema and snowflake schema. Design a star schema.
4. Differentiate between top-down and bottom-up approaches for building a data warehouse.
5. Discuss data visualization techniques.
6. Explain issues in data mining.
7. Explain data pre-processing.
8. Explain the steps involved in data mining when viewed as a process of knowledge
discovery.
9. Explain the decision tree-based classification approach with an example. Discuss
metrics for evaluating classifier performance.
Assignment 02
1. What are the different types of data handled in cluster analysis? Give examples.
2. Explain agglomerative hierarchical clustering with an example dendrogram.
3. What is market basket analysis? Give one real-world application.
4. Explain the concept of an association rule with an example and define support and
confidence.
5. Describe the steps of the Apriori algorithm for frequent itemset generation.
6. Compare web content mining, web structure mining, and web usage mining in tabular
form.
Module wise Question Bank
Module-1
1. Define a data warehouse and explain its key characteristics.
2. Draw and explain a typical data warehouse architecture.
3. Differentiate between a data warehouse and a data mart with examples.
4. Compare E-R modeling and dimensional modeling in the context of data
warehousing.
5. What is an information package diagram? Explain its use in dimensional modeling.
6. Differentiate between star schema, snowflake schema, factless fact table, and fact
constellation schema with neat diagrams.
7. What is meant by updating dimension tables? Explain slowly changing dimensions
(SCD) with types.
8. List and briefly describe the major steps in the ETL process.
9. Compare OLTP and OLAP systems in terms of purpose, design, and usage.
10. Explain slice, dice, roll-up, drill-down, and pivot operations in OLAP with
examples.
Module-2
1. What are data mining task primitives? Give examples for each type.
2. Draw and explain the architecture of a data mining system.
3. List and explain the main steps of the KDD (Knowledge Discovery in Databases)
process.
4. What are the major issues in data mining? Explain any four in detail.
5. Give at least five applications of data mining in different domains.
6. List the types of attributes in data mining and give one example of each.
7. Explain statistical description of data using mean, median, mode, variance, and
histogram.
8. Describe at least three data visualization techniques used in data mining.
9. Explain the steps of data preprocessing, including cleaning, integration,
transformation, reduction, and discretization.
10. What is concept hierarchy generation? Explain its role in data discretization.
Module-3
1. Define classification in data mining and give two real-life applications.
2. Explain the basic concepts of decision tree induction.
3. Draw and explain the working of a decision tree using a small dataset example.
4. Describe the Naïve Bayesian classification algorithm with an example.
5. What are accuracy and error measures in classification? Explain any two.
6. Explain the holdout method for evaluating the accuracy of a classifier.
7. What is random subsampling? How does it differ from the holdout method?
8. Describe the process of k-fold cross-validation with an example.
9. Explain the bootstrap method for classifier evaluation.
10. Compare cross-validation and bootstrap in terms of advantages and limitations.
Module-4
1. What are the different types of data handled in cluster analysis? Give examples.
2. Explain the Euclidean distance and Manhattan distance measures used in clustering.
3. Describe the k-means clustering algorithm with steps.
4. What are the limitations of the k-means algorithm?
5. Explain the k-medoids clustering method and compare it with k-means.
6. Differentiate between partitional and hierarchical clustering methods.
7. Explain agglomerative hierarchical clustering with an example dendrogram.
8. Explain divisive hierarchical clustering and compare it with agglomerative.
9. What is the role of a proximity (similarity/dissimilarity) matrix in hierarchical
clustering?
10. Compare k-means, k-medoids, agglomerative, and divisive methods in a tabular
format.
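For reference on question 2 of Module-4, the two distance measures can be sketched as follows. This is a minimal illustration, not a prescribed answer; the function names and the sample points are assumptions chosen for the example.

```python
import math

def euclidean(p, q):
    # Straight-line distance: square root of the sum of squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # City-block distance: sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical sample points, chosen only for illustration.
p, q = (1, 2), (4, 6)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```

For the same pair of points, the Manhattan distance is never smaller than the Euclidean distance.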
Module-5
1. What is market basket analysis? Give one real-world application.
2. Define frequent itemset and closed itemset with examples.
3. Explain the concept of an association rule with an example and define support and
confidence.
4. Describe the steps of the Apriori algorithm for frequent itemset generation.
5. How are association rules generated from frequent itemsets?
6. List and explain at least three techniques for improving the efficiency of Apriori.
7. Explain the concept of frequent pattern mining without candidate generation
(FP-Growth method).
8. What are multilevel association rules? Give an example.
9. Explain multidimensional association rules with a suitable example.
10. Compare Apriori and FP-Growth in terms of working and efficiency.
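For reference on question 3 of Module-5, support and confidence can be computed as in the sketch below. The transaction data and the function names are hypothetical, made up only to illustrate the definitions.

```python
# Hypothetical market-basket transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item in the itemset.
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # support(A ∪ B) / support(A) for the rule A -> B.
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 2 of the 3 bread transactions also contain milk
```

Here the rule bread -> milk has support 0.5 (2 of 4 transactions) and confidence 2/3.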
Module-6
1. Define web mining and list its three main categories.
2. What is web content mining? Give one real-life application.
3. Explain the role of web crawlers in content mining.
4. What is a harvest system in web mining?
5. Describe the concept of a virtual web view and its importance.
6. What is personalization in the context of web content mining? Give an example.
7. Explain web structure mining and the working of the PageRank algorithm.
8. Describe the CLEVER algorithm and how it differs from PageRank.
9. What is web usage mining? List any two techniques used for it.
10. Compare web content mining, web structure mining, and web usage mining in
tabular form.
Refer to university question papers for numerical problems.