Multidimensional Data Model
A multidimensional data model is a model that allows data to be organized and viewed
in multiple dimensions, such as product, time and location.
Features of multi-dimensional data model
• It allows users to ask analytical questions that span multiple dimensions, which helps
them identify market or business trends.
• OLAP (online analytical processing) and data warehousing use multidimensional
databases.
• It represents data in the form of data cubes. Data cubes allow data to be modeled and
viewed from many dimensions and perspectives.
• It is defined by dimensions and facts and is represented by a fact table. Facts are numerical
measures, and the fact table contains the names of the facts (the measures) together with
keys to each of the related dimension tables.
Multidimensional Data Representation
Working on a Multidimensional Data Model
The following stages should be followed by every project for building a Multi Dimensional Data
Model:
Stage 1: Assembling data from the client
In the first stage, correct and complete data is collected from the client. Typically,
software professionals explain to the client the range of data that can be obtained with the
selected technology, and then collect the complete data in detail.
Stage 2: Grouping different segments of the system
In the second stage, all the data is identified and classified into the respective sections it
belongs to, which makes the model easier to apply step by step.
Stage 3: Identifying the different dimensions
The third stage is the basis on which the design of the system rests. In this stage, the main
factors are identified from the user's point of view. These factors are known as "dimensions".
Stage 4: Preparing the factors and their respective qualities
In the fourth stage, the factors identified in the previous stage are used to identify their
related qualities. These qualities are known as "attributes" in the database.
Stage 5: Finding the facts among the factors listed previously and their qualities
In the fifth stage, the facts are separated from the factors collected so far. These facts play
a significant role in the structure of a multidimensional data model.
Stage 6: Building the schema to hold the data
In the sixth stage, a schema is built on the basis of the information collected in the
previous stages.
Example to Understand Multidimensional Data Model
1. Let us take the example of a firm. The revenue of a firm can be analyzed on the basis of
different factors such as the geographical location of the firm's workplace, the products of the
firm, the advertisements run, the time taken to develop a product, etc.
Example 1
2. Let us take the example of a factory in Bangalore whose product sales are recorded per quarter.
The data is represented in the table given below:
2D factory data
In the presentation above, the factory's sales for Bangalore are shown with respect to two
dimensions: time, which is organized into quarters, and item, which is sorted according to the
kind of item sold. The facts here are represented in rupees (in thousands).
Now, if we want to view the sales data along a third dimension, it can still be represented as
a two-dimensional table. Let us consider the data according to item, time and location (such as
Kolkata, Delhi and Mumbai). Here is the table:
3D data representation as 2D
Conceptually, this data can be represented in three dimensions, as shown in the image
below:
3D data representation
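The three-dimensional view described above can be sketched in a few lines of Python. This is only an illustration: the item names and sales figures are hypothetical (they are not the values from the tables in these notes), and the helper names slice_by_location and total_by are assumptions made for this sketch.

```python
from collections import defaultdict

# Hypothetical sales facts in rupees (thousands). Each key is one cell of the
# cube, addressed by (location, quarter, item). Values are illustrative only.
cube = {
    ("Delhi", "Q1", "Egg"): 100,
    ("Delhi", "Q1", "Milk"): 200,
    ("Delhi", "Q2", "Egg"): 150,
    ("Mumbai", "Q1", "Egg"): 120,
    ("Mumbai", "Q2", "Milk"): 180,
    ("Kolkata", "Q1", "Milk"): 90,
}

def slice_by_location(cube, location):
    """Fix the location dimension and return a 2D table {(quarter, item): sales}."""
    return {(q, i): v for (loc, q, i), v in cube.items() if loc == location}

def total_by(cube, axis):
    """Sum the measure over the other dimensions (axis 0=location, 1=quarter, 2=item)."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[axis]] += value
    return dict(totals)

delhi = slice_by_location(cube, "Delhi")   # one 2D table per location
by_quarter = total_by(cube, 1)             # sales per quarter across all locations and items
```

Fixing one location yields a two-dimensional quarter-by-item table like the one for a single city, while stacking those tables for all locations gives the three-dimensional cube.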
Features of multidimensional data models
• Measures: Measures are numerical values like sales or revenue that can be analyzed. They
are stored in fact tables in a multidimensional model.
• Dimensions: Dimensions are descriptive attributes like time, location, or product that give
context to measures. They are stored in dimension tables.
• Cubes: Cubes organize data into multiple dimensions, linking measures and dimensions
for fast and flexible analysis.
• Aggregation: Aggregation summarizes data (e.g., total sales by month), allowing users to
view data at different levels of detail.
• Drill-down: View data in more detail (e.g., from year → month).
• Roll-up: View data in summary (e.g., from day → quarter).
These help explore data across levels.
• Hierarchies: Hierarchies arrange dimensions into levels (e.g., Year > Quarter > Month >
Day), supporting drill-down and roll-up.
• OLAP (Online Analytical Processing): OLAP tools allow quick analysis of large data sets
using cubes, hierarchies, and aggregation for complex queries.
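Roll-up and drill-down along a hierarchy can be illustrated with a small Python sketch. The daily figures are hypothetical, and roll_up is a helper name assumed for this example; real OLAP tools perform the same aggregation over much larger cubes.

```python
from collections import defaultdict

# Hypothetical daily facts: (year, quarter, month, day) -> units sold.
daily_sales = {
    (2024, "Q1", "Jan", 5): 10,
    (2024, "Q1", "Jan", 20): 15,
    (2024, "Q1", "Feb", 3): 7,
    (2024, "Q2", "Apr", 11): 12,
    (2024, "Q2", "May", 2): 8,
}

# Levels of the time hierarchy, from coarsest to finest.
LEVELS = ("year", "quarter", "month", "day")

def roll_up(facts, level):
    """Aggregate the facts to the given level by summing away all finer levels."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for key, value in facts.items():
        totals[key[:depth]] += value
    return dict(totals)

by_year = roll_up(daily_sales, "year")        # most summarized view
by_quarter = roll_up(daily_sales, "quarter")  # drill down one level from year
by_month = roll_up(daily_sales, "month")      # drill down one more level
```

Moving from by_month to by_quarter is a roll-up, and moving in the opposite direction is a drill-down. Both read the same facts at a different level of the hierarchy.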
Advantage and Disadvantage of Data Model
Advantage | Disadvantage
Easy to handle | Requires skilled professionals
Simple to maintain | Complex structure
Better performance than relational databases | System performance drops if cache fails
More intuitive data representation (multi-viewed) | Dynamic and harder to design
Handles complex systems and applications well | Longer path to final output
Schemas for multidimensional data
A schema is a logical description of the entire database. It includes the name and description of
records of all record types, including all associated data items and aggregates. Much like a
database, a data warehouse also needs to maintain a schema. A database uses the relational
model, while a data warehouse uses the Star, Snowflake, and Fact Constellation schemas. This
chapter discusses the schemas used in a data warehouse.
Star Schema
• Each dimension in a star schema is represented by only one dimension table.
• This dimension table contains the set of attributes.
• The following diagram shows the sales data of a company with respect to the four
dimensions, namely time, item, branch, and location.
• There is a fact table at the center. It contains the keys to each of the four dimensions.
• The fact table also contains the measures, namely dollars sold and units sold.
Note − Each dimension has only one dimension table and each table holds a set of attributes. For
example, the location dimension table contains the attribute set {location_key, street, city,
province_or_state, country}. This constraint may cause data redundancy. For example, the
cities "Vancouver" and "Victoria" are both in the Canadian province of British Columbia, so the
entries for these cities repeat the same values in the attributes province_or_state and
country.
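The star schema described above can be sketched with Python's built-in sqlite3 module. The location attributes and the two measures come from the text; the table names, the columns of the time, item and branch tables, and all inserted rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension (columns other than location's are assumed).
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           province_or_state TEXT, country TEXT);
-- Central fact table: one key per dimension plus the two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
-- Hypothetical rows; note the repeated province and country values.
INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'Laptop', 'Acme');
INSERT INTO branch_dim VALUES (1, 'B1');
INSERT INTO location_dim VALUES
    (1, '1 Main St', 'Vancouver', 'British Columbia', 'Canada'),
    (2, '2 Bay St',  'Victoria',  'British Columbia', 'Canada');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 1000.0, 2),
    (1, 1, 1, 2,  500.0, 1),
    (2, 1, 1, 1,  750.0, 3);
""")

# Sales per province: a single join from the fact table to one dimension table.
rows = conn.execute("""
    SELECT l.province_or_state, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.province_or_state
""").fetchall()
```

Because every dimension is one join away from the fact table, queries stay short. The cost is the repeated 'British Columbia' and 'Canada' values in location_dim, which is the redundancy the note describes.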
Snowflake Schema
• Some dimension tables in the snowflake schema are normalized.
• The normalization splits the data into additional tables.
• For example, the item dimension table of the star schema is normalized and split into two
dimension tables, namely the item table and the supplier table.
• The item dimension table now contains the attributes item_key, item_name, type, brand,
and supplier_key.
• The supplier key is linked to the supplier dimension table. The supplier dimension table
contains the attributes supplier_key and supplier_type.
Note − Due to normalization in the snowflake schema, redundancy is reduced. The schema
therefore becomes easier to maintain and saves storage space.
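A minimal sqlite3 sketch of the normalized item dimension follows. The attribute names are the ones listed above; the table names and all rows are hypothetical, and the fact table is cut down to a single dimension key to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Supplier details live in their own table instead of being repeated per item.
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item_dim (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    type         TEXT,
    brand        TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);
-- Reduced fact table (assumption: only the item dimension is shown).
CREATE TABLE sales_fact (
    item_key     INTEGER REFERENCES item_dim(item_key),
    dollars_sold REAL,
    units_sold   INTEGER
);
INSERT INTO supplier_dim VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item_dim VALUES
    (1, 'Laptop', 'computer', 'Acme', 1),
    (2, 'Phone',  'mobile',   'Acme', 1),
    (3, 'Tablet', 'mobile',   'Zeta', 2);
INSERT INTO sales_fact VALUES (1, 1000.0, 2), (2, 400.0, 1), (3, 300.0, 1);
""")

# Reaching supplier_type now takes two joins: fact -> item -> supplier.
rows = conn.execute("""
    SELECT s.supplier_type, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN item_dim     AS i ON i.item_key = f.item_key
    JOIN supplier_dim AS s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
```

Each supplier_type value is stored once, but the query needs an extra join compared with a single denormalized item table.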
Fact Constellation Schema
• A fact constellation has multiple fact tables. It is also known as galaxy schema.
• The following diagram shows two fact tables, namely sales and shipping.
• The sales fact table is the same as that in the star schema.
• The shipping fact table has five dimensions, namely item_key, time_key, shipper_key,
from_location, and to_location.
• The shipping fact table also contains two measures, namely dollars cost and units shipped.
• It is also possible to share dimension tables between fact tables. For example, time, item,
and location dimension tables are shared between the sales and shipping fact table.
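The idea of two fact tables sharing dimension tables can be sketched with sqlite3 as follows. Only the time and item dimensions are modeled to keep the example short. The shipping measure names dollars_cost and units_shipped, the table names, and all rows are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables shared by both fact tables.
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- Two fact tables referencing the same dimensions.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    dollars_sold REAL, units_sold INTEGER
);
CREATE TABLE shipping_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    shipper_key INTEGER, from_location INTEGER, to_location INTEGER,
    dollars_cost REAL, units_shipped INTEGER
);
INSERT INTO time_dim VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO item_dim VALUES (1, 'Laptop');
INSERT INTO sales_fact VALUES (1, 1, 1000.0, 2), (1, 1, 500.0, 1), (2, 1, 750.0, 3);
INSERT INTO shipping_fact VALUES (1, 1, 10, 1, 2, 40.0, 3), (2, 1, 10, 1, 2, 25.0, 2);
""")

# Aggregate each fact table separately, then line the results up on the
# shared time dimension. Joining the raw fact tables directly would
# multiply rows and inflate the sums.
rows = conn.execute("""
    SELECT t.quarter, s.units_sold, sh.units_shipped
    FROM time_dim AS t
    JOIN (SELECT time_key, SUM(units_sold) AS units_sold
          FROM sales_fact GROUP BY time_key) AS s ON s.time_key = t.time_key
    JOIN (SELECT time_key, SUM(units_shipped) AS units_shipped
          FROM shipping_fact GROUP BY time_key) AS sh ON sh.time_key = t.time_key
    ORDER BY t.quarter
""").fetchall()
```

Sharing time_dim is what makes it possible to compare sales and shipping for the same quarter without duplicating the dimension data.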
Schema Definition
A multidimensional schema can be defined using the Data Mining Query Language (DMQL). Its two
primitives, cube definition and dimension definition, can be used to define data
warehouses and data marts.
Difference between Star Schema and Snowflake Schema
The Star Schema and Snowflake Schema are two approaches to data warehouse design. In the
Star Schema, a central fact table is connected to dimension tables, forming a star-like structure.
This design is simpler and faster for querying. On the other hand, the Snowflake Schema
normalizes dimension tables into multiple related tables, resembling a snowflake. While it
reduces data redundancy, it can make queries more complex. The Star Schema prioritizes query
speed and simplicity, while the Snowflake Schema focuses on data normalization and storage
efficiency.
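The redundancy trade-off can be made concrete with a short Python sketch that normalizes a denormalized location table by one level. The rows and the helper name snowflake are illustrative assumptions, not part of any library.

```python
# Hypothetical rows of a star-schema (denormalized) location dimension:
# (location_key, city, province_or_state, country)
star_location = [
    (1, "Vancouver", "British Columbia", "Canada"),
    (2, "Victoria", "British Columbia", "Canada"),
    (3, "Toronto", "Ontario", "Canada"),
]

def snowflake(rows):
    """Split the rows into a city table and a province table (one normalization step)."""
    provinces = {}  # (province, country) -> province_key
    cities = []     # (location_key, city, province_key)
    for location_key, city, province, country in rows:
        # Reuse the key if this province was already seen, else assign the next key.
        province_key = provinces.setdefault((province, country), len(provinces) + 1)
        cities.append((location_key, city, province_key))
    province_table = [(key, p, c) for (p, c), key in provinces.items()]
    return cities, province_table

cities, province_table = snowflake(star_location)

# How many times "British Columbia" is stored in each design.
star_copies = sum(1 for row in star_location if row[2] == "British Columbia")
snowflake_copies = sum(1 for row in province_table if row[1] == "British Columbia")
```

In the denormalized table the province name is stored once per city, while in the normalized form it is stored once in total. The price is that a report by province must join the city table to the province table.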
Star Schema
Star Schema is a type of multidimensional model used for data warehouses. A star schema
contains fact tables and dimension tables. Queries against it need fewer foreign-key joins than
in a snowflake schema. It forms a star structure with a central fact table connected to the
surrounding dimension tables.
Snowflake Schema
Snowflake Schema is also a type of multidimensional model used for data warehouses. A
snowflake schema contains fact tables, dimension tables and sub-dimension tables, which
together form a snowflake structure.
Difference Between Star and Snowflake Schema
Feature | Star Schema | Snowflake Schema
Structure | Central fact table connected to dimension tables | Fact table connected to normalized dimension tables
Data Normalization | Denormalized dimension tables | Normalized dimension tables
Performance | Faster query execution due to fewer joins | Slower query performance due to multiple joins
Design Complexity | Simple and easy to understand | Complex design with multiple levels of relationships
Space Usage | Uses more storage due to denormalization | Uses less storage due to normalization
Data Redundancy | Higher data redundancy | Lower data redundancy
Foreign Keys | Fewer foreign keys | More foreign keys
Use Cases | Best for large datasets and quick ad-hoc queries | Best for structured, predictable queries
Query Complexity | Low query complexity | High query complexity due to multiple joins
Maintainability | Easier to maintain due to simple design | More difficult to maintain due to complexity
Scalability | Scalable but may encounter performance issues with large data volumes | More scalable for very large data sets due to normalization
Suitability for BI Tools | Ideal for BI tools and quick reporting | Better for systems that require detailed reporting and data analysis
Data Integrity | Lower data integrity due to redundancy | Higher data integrity due to normalization
Updates and Modifications | More difficult to update due to denormalization | Easier to update as data is normalized
Learning Curve | Easier to learn and implement | More complex to learn and implement
Choosing Between Star Schema and Snowflake Schema
When selecting between Star Schema and Snowflake Schema, it's important to align our choice
with our organization's needs, data characteristics and performance expectations. Here's a quick
guide to help us decide:
1. Star Schema
• Best for Simplicity and Speed: If we need a straightforward, easy-to-implement solution
with fast query execution, the Star Schema is ideal. It works well for small to medium
datasets where quick, simple queries are essential.
• Use Case: Perfect for scenarios with fewer dimensions and limited hierarchy levels, such
as sales data warehouses in small businesses. It allows for fast data retrieval with minimal
joins, making it suitable for quick reporting and analytics.
• Storage Considerations: Suitable when redundancy isn’t a significant issue and storage
requirements are manageable.
2. Snowflake Schema
• Best for Flexibility and Data Integrity: If we need to handle large datasets with multiple
levels of hierarchy and a high degree of normalization, the Snowflake Schema offers
greater flexibility. It’s perfect for maintaining data integrity across complex datasets.
• Use Case: Ideal for large organizations dealing with large, normalized datasets or those
with frequent updates, like customer or inventory management systems. It minimizes
redundancy and improves storage efficiency.
• Storage Considerations: Snowflake is more storage-efficient due to its normalized
structure, making it a great choice for scenarios with complex, high-volume data.
Which Schema is Right for You?
• If simplicity and speed are our priorities, the Star Schema is a better fit.
• If we need to handle complex data with frequent updates while minimizing storage, the
Snowflake Schema is more suitable.