CSC4316 – Data Management II
Assignment – In-Memory Data Modelling and Analysis
This Assignment is an individual piece of work. All Assignments will be screened for evidence of Academic Misconduct
(e.g. Collusion and Plagiarism).
Hand-in Date: Tuesday 18 February 2025 to Friday 28 March 2025
Learning Outcomes Assessed: This assignment assesses the learning outcomes of lectures 1 and 3:
1. Identify and explain the main concepts and key components of a data warehouse.
3. Explain and analyse the key techniques of data warehousing applications and OLAP.
Deliverables: A Word document and an Excel Workbook.
Submission Method: You should submit 2 files: ONE Word document and ONE Excel Workbook (please DO NOT zip them).
The submission deadline is 4:00 PM. A 30-minute grace period will be allowed in the event that you have
any technical issues that prevent you from submitting at the deadline time.
Overall Course Grade: This Assignment contributes 40% of the overall course grade.
Your overall assignment grade will be obtained using the grid on page 3.
Questions? Contact Abdullahi Ahamad Shehu, [email protected]
Notes on Penalties, Extensions and Deferrals: Assignments submitted late without prior approval will be recorded as a
Non-Submission (NS).
If you, for genuine reasons, are unable to meet the submission date/time, please note that
Assignment Extension requests must be submitted to my email at least 24 hours before the submission date
and time.
Please read the entire Assignment specification carefully before starting the Assignment. If any aspect of what you
are being asked to do is not clear, seek advice and assistance from the Course Lecturer.
Background
A film club keeps a record of films, recommended for viewing by its members, in a single Excel spreadsheet
(Assignment.xlsx). The club recognises that this is not ideal and so they want you to organise the data in a way that is
more efficient and better supports data analysis.
To complete this Assignment, you can either:
(i) use Excel on your own machine. I will provide you with the Assignment.xlsx file.
Task 1: Data Modelling
Using Power Query and PowerPivot, reorganise the data into an in-memory, star-schema data warehouse that will
allow for an analysis of films by various dimensions, which you need to identify from the given data. You should also
add a time dimension to allow for analysis by year, quarter, month and day. Use a rolling calendar that runs from the
release date of the earliest film (21/12/1978) to the current date. (20 marks)
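To make the shape of the time dimension concrete, here is a minimal sketch in Python (not Power Query or DAX, and not a model answer). It shows one row per day with the year, quarter, month and day attributes. The function name `build_calendar` and the column names are hypothetical and chosen for illustration only.

```python
from datetime import date, timedelta


def build_calendar(start: date, end: date) -> list[dict]:
    """Return one row per day from start to end (both inclusive)."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    rows = []
    current = start
    while current <= end:
        rows.append({
            "Date": current,
            "Year": current.year,
            # Months 1-3 map to quarter 1, 4-6 to quarter 2, and so on.
            "Quarter": (current.month - 1) // 3 + 1,
            "Month": current.month,
            "Day": current.day,
        })
        current += timedelta(days=1)
    return rows


# "Rolling" here means the end date is evaluated each time the code runs.
calendar = build_calendar(date(1978, 12, 21), date.today())
```

Because the end date is recomputed on every run, the calendar grows by one row per day, which is the behaviour a rolling calendar is expected to have when the workbook is refreshed.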
Deliverables of Task 1:
1.1 Excel Workbook containing your solution. NOTE: When creating the workbook, please place it in
the same folder as Assignment.xlsx to avoid any file path issues when I open it on my end.
1.2 Word document, which should include the following:
(a) List of Tables, Columns, Primary Keys and Foreign Keys
Table Name | Number of rows | Fact or Dimension | Columns
Table 1 | … | Fact | Column 1, Column 2, …
… | … | … | …
(b) List of Relationships
Table Name | Primary Key | Table | Foreign Key
Table 1 | Column… | Table 2 | Column…
… | … | … | …
(c) A discussion of your data warehousing solution, covering its strengths and weaknesses compared with a solution that
uses Microsoft SQL Server Integration Services (SSIS). (300 words max)
Task 2: Data Analysis
2.1 To support the film club with their film analysis, you need to create the following calculated columns or measures
(as appropriate) using PowerPivot and DAX formulae. (30 marks)
1. Years since release to show the number of years (to date) since a film was released.
2. Number of PG Cert films to show the total number of PG certificate films produced.
3. Film duration to show the following values:
a. "Long" when the film’s run time is greater than 150 minutes,
b. "Medium" when the film’s run time is greater than 100 minutes but less than or equal to 150 minutes,
c. "Short" otherwise.
4. Profit to show the total box office amount (in dollars) made by films minus the total budget amount (in dollars).
5. Director Ranking to show the rank of each director (director name) by profit.
6. YTD Profit to show the Year-To-Date profit.
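The boundary cases in items 3, 4 and 6 can be checked against a small sketch. It is written in Python purely to illustrate the intended logic; it is not DAX and not a model answer. The function names and sample figures are hypothetical, and the sketch assumes that Year-To-Date means the running total from 1 January of the same calendar year.

```python
from datetime import date


def duration_band(run_time_minutes: int) -> str:
    """Classify a film by run time using the thresholds in item 3."""
    if run_time_minutes > 150:
        return "Long"
    if run_time_minutes > 100:   # 101 to 150 minutes inclusive
        return "Medium"
    return "Short"               # 100 minutes or less


def profit(box_office_dollars: float, budget_dollars: float) -> float:
    """Item 4: box office takings minus budget."""
    return box_office_dollars - budget_dollars


def ytd_profit(daily_profits: dict[date, float], as_of: date) -> float:
    """Item 6 (assumed meaning): sum of profits from 1 January of
    as_of's year up to and including as_of."""
    return sum(
        amount
        for day, amount in daily_profits.items()
        if day.year == as_of.year and day <= as_of
    )


# Made-up figures, for illustration only.
sample = {
    date(2005, 12, 31): 5.0,
    date(2006, 1, 1): 10.0,
    date(2006, 1, 2): -3.0,
    date(2006, 3, 1): 4.0,
}
running_total = ytd_profit(sample, date(2006, 1, 2))
```

Note that a run time of exactly 150 minutes falls in "Medium" and exactly 100 minutes falls in "Short", and that the 2005 figure does not carry over into the 2006 running total.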
2.2 Answer the following questions related to the above calculations (use PowerPivot tables to help you evidence
your answers): (10 marks)
1. What is the name of the actor with 10 casts?
2. What is the percentage of female casts (as a percentage of the grand total of casts)?
3. What is the ranking, in terms of films profit, of director Steven Spielberg?
Add a new worksheet inside your Excel file (call it Business Intelligence) and create the following: (20 marks)
• PivotTable 1 & PivotChart 1: include a PivotTable and its related PivotChart to show the Studios sorted in
descending order of the number of PG Certificate films produced.
• PivotTable 2: the second PivotTable should show the daily profits and year-to-date profits, filtered to the
year 2006. Use data bars to enhance the visualisation of the YTD Profit column.
Deliverables of Task 2:
2.1. Excel Workbook: Add your solution to task 2 to the same Excel workbook started in task 1.
2.2. Word document: Add your solution to task 2 to the same Word document started in task 1. You should
include the following:
(a) List of DAX Formulae for Calculated Columns (CC) and Measures (M)
Name of CC or M | CC or M | Formula
Years since release | … | …
… | … | …
(b) Answers to Questions:
Question | Answer + Evidence (screenshot) showing how the answer is obtained
Name of the actor with 10 casts | …
… | …
(c) Screenshot of PivotTable 1, PivotChart 1 and PivotTable 2.