DM-Final Revision No Answers

Uploaded by

amira hafez
Revision

Identify the options below that a data warehouse can include.


a. Database table
b. Online data
c. Flat files
d. All of the above
Answer: d
Where is data warehousing used?
a. Transaction system
b. Logical system
c. Decision support system
d. None
Answer: c
Data warehouse is generally updated in real-time.
a. True
b. False
Answer: b
Choose the incorrect property of the data warehouse.
a. Collection from heterogeneous sources
b. Subject oriented
c. Time variant
d. Volatile
Answer: d
Where can the data be updated?
a. Informational environment
b. Data warehouse environment
c. Operational environment
d. Data mining environment
Answer: c

The time horizon in Data warehouse is usually __________
a. 1-2 years.
b. 3-4 years.
c. 5-6 years.
d. 5-10 years.
Answer: d
Identify the operation which can be performed in the data warehouse.
a. Alter
b. Modify
c. Scan
d. Read/write
Answer: c
Subject Oriented: Focuses on a specific area or subject such as sales, customers, or
inventory.
a. True
b. False
Answer: a
A data warehouse is built by ……. data from various sources of data such as a
mainframe and a relational database.
a. Integrating
b. Time-variation
c. Subject orienting
d. Non-volatilizing
Answer: a
Identify the correct option which defines Datamart.
a. A subgroup of data warehouse
b. Another type of data warehouse
c. Not related to data warehouse
d. None
Answer: a

A data mart is designed to optimize the performance for well-defined and
predictable uses.
a. True
b. False
Answer: a
__________describes the data contained in the data warehouse.
a. Relational data.
b. Operational data.
c. Metadata.
d. Informational data.
Answer: c
The figure represents __________ Data warehouse
a. Star schema.
b. Snowflake schema.
c. Fact constellation.
d. None of the above
Answer: b
Fact tables are which of the following?
a. Completely denormalized
b. Partially denormalized
c. Completely normalized
d. Partially normalized
Answer: c
In a ___________, multiple fact tables share dimension tables.
a. Star schema.
b. Snowflake schema.
c. Fact constellation.
d. Star-snowflake schema
Answer: c

Star schema is suited to online transaction processing, and therefore is generally
used in operational systems, operational data stores, or an EDW.
a. True
b. False
Answer: b
The drawbacks of the snowflake schema are time-consuming joins and slow report
generation.
a. True
b. False
Answer: a
___________ is a good alternative to the star schema.
a. Star schema.
b. Snowflake schema.
c. Fact constellation.
d. Star-snowflake schema
Answer: c
Concept hierarchy is important to determine dimensions and help apply OLAP
operations
a. True
b. False
Answer: a
________ exposes the information being captured, stored, and managed by
operational systems in data warehouse design.
a. Data source view
b. Top down view
c. Business query view
Answer: a
Drill across is an OLAP operation that involves drilling across
a. more than one data base
b. more than one fact table
c. more than one data warehouse
Answer: b
Drill through operation involves drilling through
a. the bottom level of the cube to its back-end relational tables
b. the top level of the cube to its back-end relational tables
c. all levels of the cube
Answer: a
Choosing a business process to model, e.g., orders, invoices, etc. is the ________ step
in the data warehouse design process
a. First
b. Second
c. Third
d. Fourth
Answer: a
Determining which operations should be performed on the available cuboids is the
first step of
a. OLAP
b. Efficient Processing OLAP Queries
c. OLAM
Answer: b
The data from the operational environment enter the ________ of the data
warehouse.
a. Current detail data
b. Older detail data
c. Lightly Summarized data
d. Highly summarized data
Answer: a
Which architecture involves mini warehouses with limited scope?
a. Generic Two-Level Architecture
b. Independent Data Mart

c. Dependent Data Mart
d. Logical Data Mart
Answer: b
In a data warehouse, concept hierarchies define relationships between:
a. Data tables and data sources
b. Users and access permissions
c. High-level and low-level attributes within a dimension
d. Measures and dimensions
Answer: c
Which of the following statements about OLAP operations is TRUE?
a. Drill-down navigates to a lower level of detail by adding dimensions.
b. Drill-up removes dimensions from the analysis.
c. Both drill-down and drill-up involve moving to a higher level of
summarization.
d. OLAP operations are limited to pre-defined views of the data.
Answer: a
Data warehouses are operational databases used for daily transactions
a. True
b. False
Answer: b
Data warehouses are used for real-time transaction processing
a. True
b. False

Answer: b
The top-down approach to data warehouse design is always the best option.
a. True
b. False
Answer: b
Data warehouses are useful for decision making in various industries, including
finance and retail.
a. True
b. False
Answer: a
A data warehouse serves as a sound part of a plan-execute-assess "closed-loop"
feedback system for enterprise management.
a. True
b. False
Answer: a
The business query view in data warehouse design represents the information
stored inside the data warehouse
a. True
b. False
Answer: b
The top-down approach to data warehouse design starts with experiments and
prototypes.
a. True
b. False
Answer: b
What does the bottom-up approach to data warehouse design start with?
a. Overall design and planning
b. Experiments and prototypes
c. Business analysis framework

d. End-user viewpoint
Answer: b
What does the top-down approach to data warehouse design start with?
a. Overall design and planning
b. Experiments and prototypes
c. Business analysis framework
d. End-user viewpoint
Answer: a
In the combined approach, an organization can exploit the planned and strategic
nature of the top-down approach while retaining the rapid implementation and
opportunistic application of the bottom-up approach.
a. True
b. False
Answer: a
The bottom-up approach is useful in the early stage of business modeling and
technology development.
a. True
b. False
Answer: a
What is the primary purpose of a data warehouse?
a. Supporting online transaction processing
b. Enhancing customer relationship management
c. Facilitating management's decision-making process
d. Managing social media campaigns
Answer: c
Which characteristic of data warehousing focuses on organizing data around
subjects like sales, product, and customer?
a. Subject-oriented
b. Integration
c. Time variant

d. Nonvolatile
Answer: a
What is the primary difference between operational database data and data
warehouse data in terms of time horizon?
a. Operational database data focuses on decision making.
b. Data warehouse data provides information from a historical perspective.
c. Operational database data has a longer time horizon.
d. Data warehouse data contains inconsistencies regarding naming convention.
Answer: b
What is the characteristic of nonvolatility in data warehousing?
a. Data are stored in a read-only format and do not change over time.
b. Data are organized around subjects like sales, product, and customer.
c. Data warehouse data contains inconsistencies regarding naming convention.
d. Data are updated regularly to ensure consistency.
Answer: a
Which of the following characteristics describe a data warehouse?
a. Distributed, real-time, and volatile
b. Integrated, time-variant, and subject-oriented
c. Fragmented, static, and operation-oriented
d. Dynamic, transactional, and volatile
Answer: b
Data warehousing involves integrating multiple heterogeneous sources to ensure
inconsistency.

a. True
b. False
Answer: b
Data in operational databases provide information from a historical perspective.
a. True
b. False
Answer: b
Nonvolatility in data warehousing means that data once recorded cannot be
updated.
a. True
b. False
Answer: a
Time variant characteristic of data warehousing implies that every key structure in
the data warehouse contains an element of time.
a. True
b. False
Answer: a
OLAP allows users to explore data from various perspectives by drilling down and
rolling up.
a. True
b. False
Answer: a
Data warehousing only involves building the data warehouse itself.
a. True
b. False
Answer: b
What is the primary function of Online Analytical Processing (OLAP) systems?
a. Real-time data processing
b. Multidimensional data analysis
c. Data cleansing and transformation
d. User authentication and authorization
Answer: b
What is the primary role of Data Warehouse Systems?
a. Online transaction processing (OLTP)
b. Data analysis and decision making

c. Data storage and retrieval
d. System administration
Answer: b
What is the term used to describe systems that organize and present data in various
formats to accommodate diverse user needs?
a. Online Transaction Processing (OLTP) systems
b. Data mining systems
c. Online Analytical Processing (OLAP) systems
d. Data integration systems
Answer: c
What do concept hierarchies define in the context of dimensions?
a. Sequential analysis techniques
b. Data storage structures
c. Mappings from low-level to higher-level concepts
d. Mappings from higher-level to low-level concepts
e. Data visualization techniques
Answer: c
What are the key components of data warehousing?
a. Build Data Warehouse, Data Analysis, and Data Storage
b. Data Storage, Data Presentation, and Data Mining
c. Build Data Warehouse, Online Analysis Processing (OLAP), and
Presentation
d. Data Collection, Data Cleaning, and Data Transformation
Answer: c
Data warehouse systems primarily support transaction processing
a. True
b. False
Answer: b
Data warehouse requires two operations in data accessing
a. loading of data

b. Access of data
c. None of the above
d. A & B
Answer: d
OLAP systems allow users to analyze data only in predetermined ways.
a. True
b. False
Answer: b
Drill Up operation in OLAP involves moving from lower-level summaries to
higher-level summaries.
a. True
b. False
Answer: a
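
Worked example (roll-up and drill-down). The sketch below illustrates the two operations on a tiny, made-up sales table. The city-to-country hierarchy, the sample rows, and the function names are all assumptions chosen for illustration; they do not come from the course material.

```python
from collections import defaultdict

# Hypothetical concept hierarchy for a location dimension: city -> country.
CITY_TO_COUNTRY = {"Cairo": "Egypt", "Giza": "Egypt", "Paris": "France"}

# Hypothetical detail-level facts: (city, quarter, units_sold).
sales = [
    ("Cairo", "Q1", 10),
    ("Giza", "Q1", 5),
    ("Paris", "Q1", 7),
    ("Cairo", "Q2", 3),
]

def roll_up(facts):
    """Aggregate city-level facts to the country level (drill-up)."""
    totals = defaultdict(int)
    for city, quarter, units in facts:
        totals[(CITY_TO_COUNTRY[city], quarter)] += units
    return dict(totals)

def drill_down(facts, country):
    """Return the city-level rows that sit behind one country (drill-down)."""
    return [row for row in facts if CITY_TO_COUNTRY[row[0]] == country]

by_country = roll_up(sales)
egypt_detail = drill_down(sales, "Egypt")
```

Rolling up replaces the low-level concept (city) with the higher-level one (country); drilling down goes back to the detailed rows.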
Converting data from different sources into a common format for processing is
called ________.
a. selection.
b. preprocessing.
c. transformation.
d. interpretation.
Answer: c
Getting information from a database is called ____ (reading).
a. Extracting
b. Transforming
c. Loading
d. None
Answer: a
Why do we need a data mart?
a. To speed up the queries by reducing the volume of data to be scanned.
b. To segment data into different hardware platforms.
c. To structure data in a form suitable for a user access tool.

d. All mentioned above
Answer: d
Metadata contains at least _________.
a. the structure of the data.
b. the algorithms used for summarization.
c. the mapping from the operational environment to the data warehouse.
d. all of the above.
Answer: d
Which of the following is a primary goal of a data warehouse?
a. Real-time transaction processing
b. Data security and access control
c. Data integration and consolidation
d. High availability and fault tolerance
Answer: c
What is the primary purpose of data extraction in the ETL process?
a. To clean and rectify errors in the data
b. To convert data from legacy formats to warehouse formats
c. To gather data from multiple, heterogeneous, and external sources
d. To sort, summarize, and consolidate data
Answer: c
Which step in the ETL process is responsible for detecting and rectifying errors in
the data?
a. Data extraction
b. Data cleaning
c. Data transformation
d. Load
Answer: b
During which ETL step is data converted from legacy or host formats to warehouse
formats?
a. Data extraction

b. Data cleaning
c. Data transformation
d. Load
Answer: c
What is the main function of the 'Load' step in the ETL process?
a. Gathering data from various sources
b. Detecting and correcting errors in data
c. Converting data formats
d. Sorting, summarizing, consolidating, computing views, checking integrity,
and building indices and partitions
Answer: d
Which ETL step involves propagating updates from data sources to the warehouse?
a. Data extraction
b. Data cleaning
c. Data transformation
d. Refresh
Answer: d
What does ETL stand for?
a. Extract, Transfer, and Load
b. Extract, Transform, and Load
c. Extract, Translate, and Load
d. Extract, Transmit, and Load
Answer: b
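
Worked example (ETL steps). The following is a minimal in-memory sketch of extraction, cleaning, transformation, and load, assuming two made-up sources (a CSV export and a list of records). The source data, field names, and the cleaning rule are illustrative assumptions, not a real warehouse pipeline.

```python
import csv
import io

# Hypothetical sources: a CSV export and records from another system.
CSV_SOURCE = "id,name,amount\n1, alice ,10.5\n2,BOB,\n"
API_SOURCE = [{"id": 3, "name": "Carol", "amount": "7"}]

def extract():
    """Gather rows from multiple heterogeneous sources."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows.extend(API_SOURCE)
    return rows

def clean(rows):
    """Detect and rectify errors: here, drop rows with a missing amount."""
    return [r for r in rows if str(r["amount"]).strip() != ""]

def transform(rows):
    """Convert the source formats into one warehouse format."""
    return [
        {
            "id": int(r["id"]),
            "name": str(r["name"]).strip().title(),
            "amount": float(r["amount"]),
        }
        for r in rows
    ]

def load(rows, warehouse):
    """Sort the rows and append them to the in-memory warehouse table."""
    warehouse.extend(sorted(rows, key=lambda r: r["id"]))
    return warehouse

warehouse = load(transform(clean(extract())), [])
```

A real load step would also compute views and build indices and partitions; this sketch only sorts and stores the rows.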
What is a data warehouse based on?
a. Relational data model
b. Multidimensional data model
c. Hierarchical data model
d. Network data model
Answer: b
What does a data cube allow in a data warehouse?

a. Viewing data in a single dimension
b. Viewing data in multiple dimensions
c. Storing data in flat files
d. Analyzing unstructured data
Answer: b
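
Worked example (data cube). One way to picture a data cube is as the set of all group-by aggregations (cuboids) over the chosen dimensions. The sketch below builds every cuboid for three made-up dimensions; the dimension names, the facts, and the `cuboid` helper are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "item", "location")

# Hypothetical fact rows with one measure, "units".
facts = [
    {"time": "Q1", "item": "TV", "location": "Cairo", "units": 4},
    {"time": "Q1", "item": "Phone", "location": "Cairo", "units": 6},
    {"time": "Q2", "item": "TV", "location": "Giza", "units": 5},
]

def cuboid(rows, dims):
    """Sum the measure for every combination of values of `dims`."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["units"]
    return dict(totals)

# One cuboid per subset of dimensions: 2**3 = 8 cuboids in total.
cube = {
    dims: cuboid(facts, dims)
    for k in range(len(DIMENSIONS) + 1)
    for dims in combinations(DIMENSIONS, k)
}
```

The empty tuple `()` keys the apex cuboid (the grand total), and the full tuple of dimensions keys the base cuboid.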
In the data cube, what are the dimensions?
a. Raw data entries
b. Summarized data points
c. Perspectives or entities with respect to which an organization wants to keep
records
d. Database tables
Answer: c
How can dimension tables be specified in a data warehouse?
a. Only manually by users
b. Only automatically by software
c. By users or experts, or automatically based on data distributions
d. Only by database administrators
Answer: c
Which schema is viewed as a collection of stars and is hence called a galaxy
schema or fact constellation?
a. Star Schema
b. Snowflake Schema
c. Fact Constellation
d. Dimensional Schema
Answer: c
What is a characteristic of the Snowflake Schema?
a. Multiple fact tables sharing dimension tables
b. A single, large and central fact table
c. Dimension tables are denormalized
d. Dimension tables are normalized, splitting data into additional tables
Answer: d

In the Star Schema, how are the fact tables and dimension tables organized?
a. Each fact table has multiple dimension tables
b. Multiple fact tables share dimension tables
c. A single fact table and one table for each dimension
d. Multiple fact tables and multiple dimension tables are combined
Answer: c
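
Worked example (star schema). The sketch below builds a very small star schema in an in-memory SQLite database: one central fact table and one table per dimension. Table names, column names, and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- The fact table holds foreign keys to each dimension plus the measure.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    item_key INTEGER REFERENCES dim_item(item_key),
    units_sold INTEGER
);
INSERT INTO dim_time VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO dim_item VALUES (1, 'TV'), (2, 'Phone');
INSERT INTO fact_sales VALUES (1, 1, 4), (1, 2, 6), (2, 1, 5);
""")

# A typical query joins the fact table to a dimension and aggregates.
rows = conn.execute("""
    SELECT t.quarter, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_time AS t ON f.time_key = t.time_key
    GROUP BY t.quarter
    ORDER BY t.quarter
""").fetchall()
```

In a snowflake schema, a dimension such as `dim_item` would itself be normalized into further tables, which adds joins to the same query.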
Which schema does not capture hierarchies directly but makes it easy to understand
and define hierarchies?
a. Star Schema
b. Snowflake Schema
c. Fact Constellation
d. Dimensional Schema
Answer: a
What are the typical steps involved in a genetic algorithm (GA)?
a. Initialization, selection, crossover, mutation, evaluation, termination
b. Evaluation, crossover, termination, mutation, selection, initialization
c. Initialization, evaluation, selection, crossover, mutation, termination
d. Crossover, mutation, initialization, selection, evaluation, termination
Answer: c
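
Worked example (genetic algorithm). A minimal sketch of the steps on the toy OneMax problem (maximize the number of 1-bits). The problem, tournament selection, single-point crossover, and all parameter values are assumptions chosen for illustration; other selection and crossover schemes are equally valid.

```python
import random

def fitness(bits):
    """Problem-specific fitness: OneMax counts the 1-bits."""
    return sum(bits)

def select(population, rng):
    """Tournament selection of size 2."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2, rng):
    """Single-point crossover: head of p1 joined to tail of p2."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def run_ga(length=20, pop_size=30, generations=50, rate=0.02, seed=0):
    rng = random.Random(seed)
    # Initialization: a random population of bit strings.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)  # Evaluation
    for _ in range(generations):         # Termination: fixed iteration count
        if fitness(best) == length:      # ...or the optimum has been found
            break
        # Selection, crossover, and mutation produce the next generation.
        population = [
            mutate(crossover(select(population, rng),
                             select(population, rng), rng), rate, rng)
            for _ in range(pop_size)
        ]
        best = max(population + [best], key=fitness)
    return best

best = run_ga()
```

The fitness function is the only problem-dependent part; the rest of the loop stays the same for other problems with a bit-string representation.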
Which operation in a genetic algorithm involves randomly flipping bits within a
single element?
a. Crossover
b. Initialization
c. Mutation
d. Evaluation
Answer: c
What is the purpose of the mutation operator in a genetic algorithm?
a. To introduce new genetic material into the population
b. To select the best solutions for the next generation
c. To evaluate the fitness of the solutions
d. To perform the crossover operation

Answer: a
Which step in the genetic algorithm involves generating a new population from
selected individuals using genetic operators?
a. Evaluation
b. Initialization
c. Reproduction
d. Termination
Answer: c
Which genetic operator is used to combine parts of two parent solutions to create a
new solution?
a. Selection
b. Crossover
c. Mutation
d. Reproduction
Answer: b
Which components are typically required in a genetic algorithm?
a. A population size and a mutation rate
b. A crossover probability and a selection mechanism
c. A genetic representation of the solution domain and a fitness function
d. A termination criterion and a selection pressure parameter
Answer: c
Which of the following is a common terminating condition for genetic algorithms?
a. A specific number of iterations is reached
b. The algorithm has used all available memory
c. The hardware encounters a malfunction
d. The fitness function becomes more complex
Answer: a
Which operator in Genetic Algorithms combines parts of two elements to create
new offspring?
a. Selection

b. Crossover
c. Mutation
d. Initialization
Answer: b
Which step in Genetic Algorithms involves creating a mating pool for the next
generation?
a. Initialization
b. Fitness Evaluation
c. Selection
d. Crossover
Answer: c
The fitness function is always problem independent
a. True
b. False
Answer: b
What types of problems are genetic algorithms well-suited for?
a. Optimization problems with a large search space
b. Sorting and searching problems
c. Machine learning and pattern recognition problems
d. Database management problems
Answer: a
What is a fuzzy set?
a. A set with well-defined boundaries
b. A set with binary membership values
c. A set with continuous membership values between 0 and 1
d. A set with crisp and precise values
Answer: c
Which of the following is a Fuzzy Logic Operator that represents the fuzzy AND
operation?
a. Min

b. Max
c. Algebraic Product
d. Bounded Difference
Answer: a
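
Worked example (fuzzy operators). A short sketch of the standard min/max/complement operators on membership degrees. The function names and the sample degrees are illustrative assumptions.

```python
def fuzzy_and(a, b):
    """Fuzzy AND (intersection) as the minimum of two membership degrees."""
    return min(a, b)

def fuzzy_or(a, b):
    """Fuzzy OR (union) as the maximum of two membership degrees."""
    return max(a, b)

def fuzzy_not(a):
    """Fuzzy NOT (complement)."""
    return 1.0 - a

# Hypothetical membership degrees for "warm" and "humid".
warm, humid = 0.7, 0.4
warm_and_humid = fuzzy_and(warm, humid)
warm_or_humid = fuzzy_or(warm, humid)
```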
Which of the following steps uses center of gravity (COG) and maximum
methods?
a. Fuzzification.
b. Rule evaluation.
c. Defuzzification.
d. Aggregation of all outputs.
Answer: c
What is the purpose of aggregation in fuzzy logic?
a. To convert fuzzy outputs into crisp values
b. To combine the fuzzy outputs from different rules
c. To convert crisp inputs into fuzzy sets
d. To apply fuzzy rules to fuzzy input values
Answer: b
What is the purpose of defuzzification in fuzzy logic?
a. To convert fuzzy outputs into crisp values
b. To convert crisp inputs into fuzzy sets
c. To apply fuzzy rules to fuzzy input values
d. To aggregate conclusions
Answer: a
What is the purpose of fuzzification in fuzzy logic?
a. To convert crisp inputs into fuzzy sets
b. To convert fuzzy outputs into crisp values
c. To perform logical operations on fuzzy rules
d. To aggregate conclusions
Answer: a
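
Worked example (fuzzification). Fuzzification maps a crisp input to membership degrees in one or more fuzzy sets. The sketch below assumes triangular membership functions and made-up temperature sets; both are illustrative choices, not the only possible ones.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy sets for temperature in degrees Celsius.
SETS = {"cold": (-10, 0, 15), "warm": (10, 20, 30), "hot": (25, 35, 50)}

def fuzzify(x):
    """Convert a crisp value into a membership degree for every fuzzy set."""
    return {name: triangular(x, *params) for name, params in SETS.items()}

memberships = fuzzify(12.0)
```

A crisp reading of 12 degrees belongs partly to "cold" and partly to "warm", with degrees between 0 and 1, and not at all to "hot".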
Which of the following is a defuzzification method?

a. Maximum membership principle.
b. centroid method.
c. Both A&B.
d. None of the above.
Answer: c
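
Worked example (defuzzification). The sketch below applies both methods to an aggregated output membership function sampled at a few points. The sample points and the discrete approximation of the centroid are assumptions for illustration.

```python
def centroid(xs, mus):
    """Center of gravity (COG) of a sampled membership function."""
    total = sum(mus)
    if total == 0:
        raise ValueError("membership function is zero everywhere")
    return sum(x * mu for x, mu in zip(xs, mus)) / total

def max_membership(xs, mus):
    """Return the x with the highest membership (first one on ties)."""
    return max(zip(xs, mus), key=lambda pair: pair[1])[0]

# Hypothetical aggregated fuzzy output, sampled on a coarse grid.
xs = [0, 10, 20, 30, 40]
mus = [0.0, 0.5, 1.0, 0.5, 0.0]

crisp_cog = centroid(xs, mus)
crisp_max = max_membership(xs, mus)
```

For a symmetric shape like this one the two methods agree; for skewed shapes the centroid is pulled toward the heavier side.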
Which step in the fuzzy inferencing process involves determining the degree to
which each rule contributes to the final output?
a. Fuzzification
b. Inferencing
c. Composition
d. Defuzzification
Answer: b
____ is the process of performing data mining on the web.
a. Data mining
b. File mining
c. Web mining
d. None of them
Answer: c
Web data includes
a. web documents
b. hyperlinks between documents
c. usage logs of web sites
d. All the above
Answer: d
Web data sets could be very large
a. True
b. False
Answer: a
Which of the following is a web content mining Agent-Based Approach?
a. Multilevel-Databases

b. Information-Categorization
c. Web-Query Systems
d. None of the above
Answer: b
The research at the hyperlink level is called Hyperlink analysis.
a. True
b. False
Answer: a
Measuring the completeness of Web sites is one of the Web Usage Mining
applications.

a. True
b. False
Answer: b
_______ is the discovery of meaningful patterns from data generated by client-
server transactions (or) from Web server logs.
a. Web Content Mining
b. Web Structure Mining
c. Web Usage Mining
d. none of them
Answer: c
The Web Usage Mining Data Preparation phase includes:
a. Data cleaning
b. User identification
c. Transaction identification
d. All the above
Answer: d
Clustering and classification are among the pattern discovery tasks.
a. True
b. False
Answer: a
Validation and Interpretation are the Pattern Analysis Tasks
a. True
b. False
Answer: a
Which of the following is a Pattern Discovery Task?
a. Clustering and Classification
b. Interpretation
c. Association Rules
d. Both a&c
Answer: d
_______ is the process of discovering structure information from the web.
a. Web Mining
b. Web Structure Mining
c. Web Usage Mining
d. none of them
Answer: b
Web graph consists of _______
a. Web pages
b. hyperlinks
c. None of the above
d. All of the above
Answer: d
______ can be used to retrieve useful information on the web.
a. Web pages
b. hyperlinks
c. None of the above
d. Web mining
Answer: b
Web Structure Mining Contains
a. PageRank
b. Hubs and Authorities
c. Web pages
d. All of the above
e. A&B
Answer: e
The structure of typical web graph consists of Web pages as nodes, and hyperlinks
as edges connecting between two related pages
a. True
b. False
Answer: a
Web structure mining can be performed at the
a. document level
b. hyperlink level
c. web page level
d. A&B
e. All of the above
Answer: d
_______contains links to highly important pages.
a. Authoritative pages
b. Hub pages
c. None of the above
d. All of the above
Answer: b
Hubs are defined as the best source for the requested information.
a. True
b. False
Answer: b
Which pages are the best source for requested information?
a. Authoritative pages
b. Hub pages

c. None of the above
d. All of the above
Answer: a
Finding the relevance of each web page is one of the Web Structure Mining
applications.
a. True
b. False
Answer: a
Sequential pattern discovery finds correlations among pages accessed together by a
client.
a. True
b. False
Answer: b
Which of the following helps in restructuring a Web site?
a. Sequential Patterns
b. Association Rules
c. Clustering and Classification
d. None of the above
e. A&B
Answer: b
Which of the following are example applications of Sequential Patterns?
a. e-commerce marketing strategies
b. ads
c. recommendations
d. All of the above
e. B&C
Answer: e
Which of the following are example applications of Association Rules?
a. e-commerce marketing strategies
b. ads

c. recommendations
d. None of the above
e. B & C
Answer: a
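
Worked example (association rules on page visits). Association rules are commonly scored with support and confidence; those two measures are standard but are not named in this sheet, so treat them as an assumption here. The sessions and page paths below are made up.

```python
# Hypothetical user sessions: the set of pages visited in each one.
sessions = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/about"},
    {"/products", "/cart"},
]

def support(pages):
    """Fraction of sessions that contain every page in `pages`."""
    return sum(pages <= s for s in sessions) / len(sessions)

def confidence(antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"/products", "/cart"})
rule_confidence = confidence({"/products"}, {"/cart"})
```

A rule such as "/products implies /cart" ignores visit order; a sequential pattern would additionally require that "/products" was visited before "/cart".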
Interpretation is used to eliminate the irrelevant rules or patterns and to extract the
interesting rules or patterns from the output of the pattern discovery process.
a. True
b. False
Answer: b
Web data are _________.
a. semi-structured
b. structured
c. unstructured
d. All of the above
e. A&C
Answer: e
Which of the following statements is true about web data sets for data mining?
a. They are typically small, in the range of kilobytes to megabytes.
b. They can be easily analyzed on a single personal computer.
c. Their size (tens to hundreds of terabytes) necessitates large server farms for
processing.
d. They are inherently well-organized and require minimal pre-processing.
Answer: c
Which of the following statements is NOT true about Web Mining?
a. Web Mining is the process of performing data mining on the web by
extracting web documents and discovering patterns from them.
b. Web Mining can be performed efficiently on a single server regardless of the
size of the data sets.
c. Web data sets can be very large, ranging from tens to hundreds of terabytes.
d. Proper organization of hardware and software is necessary to mine multi-
terabyte data sets in Web Mining.

Answer: b

What does the HITS algorithm in Web Structure Mining identify?


a. The most frequently accessed web pages
b. The best sources of information and pages that link to these sources.
c. The clickstream data of users
d. The structure of a website
Answer: b
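
Worked example (HITS). A compact sketch of the hub/authority iteration on a tiny made-up link graph. The graph, the iteration count, and the Euclidean normalization are assumptions for illustration.

```python
def hits(links, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ()))
                for p in pages}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score: sum of authority scores of the pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# Hypothetical graph: h1 and h2 link out, a and b only receive links.
links = {"h1": ["a", "b"], "h2": ["a"], "a": [], "b": []}
hub, auth = hits(links)
```

Here "a" ends up as the strongest authority (the best source) and "h1" as the strongest hub (the page linking to the best sources).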
What is Web Usage Mining mainly concerned with?
a. Extracting useful patterns from web server logs and user interactions
b. Organizing web documents into structured formats
c. Analyzing the structure of the web graph
d. Creating hyperlinks between related web pages
Answer: a
What does the term 'PageRank' refer to in Web Structure Mining?
a. A method to classify web pages based on their content
b. A technique to analyze user behavior on a web page
c. A system to prioritize web pages based on their importance and the number
of pages linking to them
d. A type of database used for storing web data
Answer: c
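
Worked example (PageRank). A minimal power-iteration sketch on a made-up three-page graph. The damping factor of 0.85, the iteration count, and the handling of pages without out-links are common conventions assumed here, not details taken from this sheet.

```python
def pagerank(links, damping=0.85, iterations=100):
    """links maps each page to the list of pages it links to."""
    pages = sorted(set(links) | {p for t in links.values() for p in t})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # A page with no out-links spreads its rank over all pages.
            targets = links.get(p) or pages
            share = damping * rank[p] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

# Hypothetical graph: both a and b link to c, and c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
rank = pagerank(links)
```

Page "c" ranks highest because two pages link to it, and "a" outranks "b" because it receives a link from the important page "c".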
Which technique involves transforming unstructured web data into structured
formats for easier analysis?
a. Web Content Mining
b. Agent-Based Approach
c. Database Approaches
d. Web Structure Mining
Answer: c
What is Web Content Mining?

a. The process of discovering useful information from web contents or
documents
b. The method of storing web data
c. The analysis of web usage logs
d. The organization of web servers
Answer: a
Which of the following is NOT an issue related to Web Mining?
a. Web data sets can be very large
b. Need for large farms of servers
c. Easy organization of hardware and software
d. Difficulty in finding relevant information
Answer: c
Information gain is a technique used for adding more features to the dataset.
a. True
b. False
Answer: b
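
Worked example (information gain). Information gain scores how much a feature reduces the entropy of the class labels, so low-scoring features can be dropped. The tiny spam/ham data set and the feature names below are made up for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy left after splitting."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

labels = ["spam", "spam", "ham", "ham"]
has_offer = [1, 1, 0, 0]  # hypothetical word feature that predicts the label
has_the = [1, 0, 1, 0]    # hypothetical word feature that carries no signal

gain_offer = information_gain(has_offer, labels)
gain_the = information_gain(has_the, labels)
```

A feature selection step would keep "offer" and discard "the".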
Which of the following cannot be used for reducing the number of features?
a. Frequency based
b. Information gain
c. mutual information
d. cross entropy
e. All of the above
Answer: a
Stemming helps in reducing the number of features by converting words to their
base or root forms.
a. True
b. False
Answer: a
Which of the following reduces words to their morphological roots?
a. Frequency based

b. Information gain
c. mutual information
d. Stemming
e. All of the above
Answer: d
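
Worked example (stemming). The sketch below is a deliberately naive suffix stripper, meant only to show how several word forms collapse into one feature. It is not the Porter stemmer or any other published algorithm, and the suffix list is an assumption.

```python
# Hypothetical suffix list, checked in order; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

tokens = ["connected", "connecting", "connects", "connect"]
stems = {naive_stem(t) for t in tokens}
```

Four distinct tokens become a single feature, which is how stemming shrinks the feature space.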
In a Boolean feature model, the frequency of a word in a document is considered.
a. True
b. False
Answer: b
The Bag of Words considers the sequence in which words occur in a document.
a. True
b. False
Answer: b
The Bag of Words model ignores the sequence in which words occur in a
document.
a. True
b. False
Answer: a
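
Worked example (Bag of Words and the Boolean model). A small sketch showing that a bag of words keeps counts but not order, while a Boolean feature model keeps only presence or absence. The documents, the vocabulary, and the whitespace tokenization are illustrative assumptions.

```python
from collections import Counter

def bag_of_words(text):
    """Word counts of a document; word order is discarded."""
    return Counter(text.lower().split())

def boolean_features(text, vocabulary):
    """Presence or absence of each vocabulary term; frequency is ignored."""
    words = set(text.lower().split())
    return {term: term in words for term in vocabulary}

# Two hypothetical documents with the same words in a different order.
doc1 = "data mining finds patterns in data"
doc2 = "in data patterns finds mining data"

same_bag = bag_of_words(doc1) == bag_of_words(doc2)
presence = boolean_features(doc1, ["data", "web"])
```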
Bag of Words represents unstructured documents.
a. True
b. False
Answer: a
In a multilevel database, what type of information is stored at the lowest level?
a. Structured information
b. Unstructured information
c. Semi-structured information
d. High-level abstractions
Answer: c

What is the primary characteristic of data stored at the highest level in a multilevel
database?
a. Raw, unprocessed data
b. Data organized into relations and objects
c. Semi-structured information
d. Detailed transactional data
Answer: b
Which of the following is NOT typically associated with multilevel databases?
a. Lowest level containing semi-structured information
b. Highest level containing detailed transactional data
c. High level containing generalizations from lower levels
d. Organization of data into relations and objects at the highest level
Answer: b

Thank you for your efforts
