- Docker
- Docker Compose
- Git
The project consists of the following services:
- Postgres
- Adminer
- Metabase (Analytics)
- Prometheus (Monitoring)
- Grafana (Visualization)
git clone
cd clifford-frempong-de-task/
In the root directory of the project, create a .env file. This file holds the sensitive credentials and configuration settings required for the project.
Use the structure below to configure your environment variables:
# ==========================
# API Configuration
# ==========================
API_KEY="your_api_key_here" # New York Times API Key
API_SECRET="your_api_secret_here" # New York Times API Secret (if required)
BASE_URL="https://api.nytimes.com/svc/books/v3/lists/overview.json" # NYT API Base URL
# ==========================
# Database Configuration
# ==========================
DB_HOST="localhost" # Database Host (e.g., localhost or IP address)
DB_PORT="5432" # Database Port (default for PostgreSQL is 5432)
DB_USER="your_db_username" # Database Username
DB_PASSWORD="your_db_password" # Database Password
DB_NAME="your_database_name" # Database Name
# ==========================
# Grafana Configuration
# ==========================
GF_USER="your_grafana_username" # Grafana Username
GF_PASSWORD="your_grafana_password" # Grafana Password
# ==========================
# Data Extraction Settings
# ==========================
start_date="YYYY-MM-DD" # Data extraction start date (e.g., 2021-01-01)
end_date="YYYY-MM-DD" # Data extraction end date (e.g., 2023-12-31)
# ==========================
# Marquez Settings
# ==========================
MARQUEZ_NAMESPACE="your_marquez_namespace" # Marquez namespace
A .env-example file is provided in the project for guidance. Copy it and rename it to .env:
cp .env-example .env
Then, update it with your actual credentials.
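Before starting the stack, it can help to sanity-check the file. The following is a minimal sketch, not part of the project: it parses simple KEY="value" lines, confirms that the keys from the template are filled in, and checks that the two dates are valid and in order. The helper names (`parse_env`, `check_env`) and the choice of required keys are assumptions; `API_SECRET` is left out because the template marks it as optional.

```python
import re
from datetime import date

# Keys taken from the template above; adjust if your .env differs.
REQUIRED_KEYS = [
    "API_KEY", "BASE_URL",
    "DB_HOST", "DB_PORT", "DB_USER", "DB_PASSWORD", "DB_NAME",
    "GF_USER", "GF_PASSWORD",
    "start_date", "end_date",
    "MARQUEZ_NAMESPACE",
]

# Matches KEY="value" or KEY=value, ignoring a trailing "# comment".
_LINE = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(?:"([^"]*)"|([^#\s]*))')


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and comment lines."""
    values = {}
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        match = _LINE.match(line)
        if match:
            key, quoted, bare = match.groups()
            values[key] = quoted if quoted is not None else bare
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    problems = [f"missing or empty: {k}" for k in REQUIRED_KEYS if not values.get(k)]
    if values.get("DB_PORT") and not values["DB_PORT"].isdigit():
        problems.append("DB_PORT is not a number")
    try:
        start = date.fromisoformat(values.get("start_date", ""))
        end = date.fromisoformat(values.get("end_date", ""))
        if start > end:
            problems.append("start_date is after end_date")
    except ValueError:
        problems.append("start_date/end_date must be YYYY-MM-DD")
    return problems
```

To use it, read the file and pass the text through both helpers, for example `check_env(parse_env(open(".env").read()))`. An untouched placeholder such as `YYYY-MM-DD` is reported as a date problem.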
Ensure the .env file is NOT committed to version control. The .gitignore file should already exclude .env, but double-check to prevent accidental exposure of sensitive information.
# .gitignore
.env
docker-compose up --build
docker compose ps
If all is well, every service will be running in its own container, and a load generator script will load data for 2021 to 2023 into the Postgres database.
Access the Postgres UI and confirm the presence of 7 tables, each with data in it: dim_book, dim_date, dim_list, dim_publisher, fact_book_rankings, fact_publisher_performance and load_status. Use "postgres" as both the username and password when logging in.
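The table check can also be scripted. The sketch below is an illustration only: `missing_tables` is a hypothetical helper, and the `public` schema in the query is an assumption about where the tables live. The query itself uses the standard `information_schema.tables` view and can be pasted into the Postgres UI or run through any database driver.

```python
# The seven tables listed above.
EXPECTED_TABLES = {
    "dim_book", "dim_date", "dim_list", "dim_publisher",
    "fact_book_rankings", "fact_publisher_performance", "load_status",
}

# Standard information_schema query; the 'public' schema is an assumption.
LIST_TABLES_SQL = (
    "SELECT table_name FROM information_schema.tables "
    "WHERE table_schema = 'public'"
)


def missing_tables(found):
    """Return the expected table names that are absent from `found`, sorted."""
    return sorted(EXPECTED_TABLES - set(found))
```

With a DB-API cursor, the names could be collected as `[row[0] for row in cursor.fetchall()]` after executing `LIST_TABLES_SQL`; an empty result from `missing_tables` means all seven tables exist. This does not check that the tables contain data.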

- In a browser, go to Metabase UI
- Click Let's get started.
- Complete the first set of fields asking for your email address. This information isn't crucial for anything but does have to be filled in.
- On the Add your data page, select Postgres and fill in the following information:
- Proceed past the screens until you reach your primary dashboard
- Click New
- Click SQL query
- From Select a database, select DeelCompanyDB.
- In the query editor, enter:
SELECT * FROM load_status;
- You can save the output and add it to a dashboard.
- Which book remained in the top 3 ranks for the longest time in 2022?
-- Count every top-3 ranking row per book in 2022 and keep the book with the most.
-- Note: the rows need not be consecutive, and if a book appears on several lists
-- in the same week, each list adds a row to the count.
WITH consecutive_rankings AS (
    SELECT
        b.title,
        b.author,
        fr.rank,
        d.full_date,
        COUNT(*) OVER (PARTITION BY b.book_key) AS weeks_in_top_3
    FROM fact_book_rankings fr
    JOIN dim_book b ON fr.book_key = b.book_key
    JOIN dim_date d ON fr.date_key = d.date_key
    WHERE d.year = 2022
      AND fr.rank <= 3
)
SELECT
    title,
    author,
    weeks_in_top_3,
    MIN(full_date) AS first_appearance,
    MAX(full_date) AS last_appearance
FROM consecutive_rankings
GROUP BY title, author, weeks_in_top_3
ORDER BY weeks_in_top_3 DESC
LIMIT 1;
- Which are the top 3 lists with the fewest unique books in their rankings across the entire dataset?
-- Count distinct books per list and keep the three lists with the lowest counts.
SELECT
    l.list_name,
    COUNT(DISTINCT b.book_key) AS unique_books_count
FROM fact_book_rankings fr
JOIN dim_list l ON fr.list_key = l.list_key
JOIN dim_book b ON fr.book_key = b.book_key
GROUP BY l.list_key, l.list_name
ORDER BY unique_books_count ASC
LIMIT 3;
- Publishers are ranked based on how their respective books performed on this list. For each book, a publisher gets points based on the best rank the book reached in a given period of time. The publisher gets 5 points if the book is ranked 1st, 4 for 2nd, 3 for 3rd, 2 for 4th and 1 point for 5th. Create a quarterly rank for publishers from 2021 to 2023, keeping only the top 5 for each quarter.
-- Sum points per publisher and quarter, then rank publishers within each quarter.
-- Note: as written, every top-5 ranking row contributes points, not only each book's best rank.
WITH publisher_points AS (
    SELECT
        p.publisher_key,
        p.publisher_name,
        d.year,
        d.quarter,
        d.quarter_name,
        SUM(CASE
            WHEN fr.rank = 1 THEN 5
            WHEN fr.rank = 2 THEN 4
            WHEN fr.rank = 3 THEN 3
            WHEN fr.rank = 4 THEN 2
            WHEN fr.rank = 5 THEN 1
            ELSE 0
        END) AS total_points,
        ROW_NUMBER() OVER (
            PARTITION BY d.year, d.quarter
            ORDER BY SUM(CASE
                WHEN fr.rank = 1 THEN 5
                WHEN fr.rank = 2 THEN 4
                WHEN fr.rank = 3 THEN 3
                WHEN fr.rank = 4 THEN 2
                WHEN fr.rank = 5 THEN 1
                ELSE 0
            END) DESC
        ) AS quarterly_rank
    FROM fact_book_rankings fr
    JOIN dim_book b ON fr.book_key = b.book_key
    JOIN dim_publisher p ON b.publisher_key = p.publisher_key
    JOIN dim_date d ON fr.date_key = d.date_key
    WHERE d.year BETWEEN 2021 AND 2023
      AND fr.rank <= 5
    GROUP BY p.publisher_key, p.publisher_name, d.year, d.quarter, d.quarter_name
)
SELECT
    publisher_name,
    year,
    quarter_name,
    total_points
FROM publisher_points
WHERE quarterly_rank <= 5
ORDER BY year, quarter, quarterly_rank;
- Two friends, Jake and Pete, have podcasts where they review books. Jake's team reviews the book ranked first on every list, while Pete's team reviews the book ranked third. They share books: if Jake's team wants to review a book, they first check with Pete's team before buying, and vice versa. Which team bought what book in 2023?
-- For each book ranked 1st or 3rd in 2023, keep its earliest appearance;
-- the team that needed it first (Jake for rank 1, Pete for rank 3) is treated as the buyer.
WITH team_books AS (
    SELECT
        b.title,
        b.author,
        l.list_name,
        d.full_date,
        fr.rank,
        CASE
            WHEN fr.rank = 1 THEN 'Jake'
            WHEN fr.rank = 3 THEN 'Pete'
        END AS reviewer,
        ROW_NUMBER() OVER (
            PARTITION BY b.book_key
            ORDER BY d.full_date
        ) AS first_appearance
    FROM fact_book_rankings fr
    JOIN dim_book b ON fr.book_key = b.book_key
    JOIN dim_list l ON fr.list_key = l.list_key
    JOIN dim_date d ON fr.date_key = d.date_key
    WHERE d.year = 2023
      AND fr.rank IN (1, 3)
)
SELECT
    reviewer AS purchased_by,
    title,
    author,
    list_name,
    full_date AS purchase_date
FROM team_books
WHERE first_appearance = 1
ORDER BY full_date, list_name;
Prometheus collects metrics from PostgreSQL via the Postgres Exporter. Access Prometheus at:
Check under Status → Target Health to ensure PostgreSQL metrics are being scraped.
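The same health check can be done against the Prometheus HTTP API, whose `/api/v1/targets` endpoint reports the state of every scrape target. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and `base_url` must be replaced with the address where Prometheus is exposed in your setup.

```python
import json
from urllib.request import urlopen


def fetch_targets(base_url: str) -> dict:
    """GET <base_url>/api/v1/targets and decode the JSON response."""
    with urlopen(f"{base_url.rstrip('/')}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def unhealthy_targets(payload: dict) -> list[str]:
    """Return the scrape URLs of active targets whose health is not 'up'."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [t.get("scrapeUrl", "?") for t in active if t.get("health") != "up"]
```

Calling `unhealthy_targets(fetch_targets(base_url))` returns an empty list when every active target, including the Postgres Exporter, is being scraped successfully.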

Grafana visualizes metrics collected by Prometheus. Access Grafana at:
Login Credentials:
- Username: admin
- Password: admin
- Log in to Grafana
- Navigate to Connections → Data Sources → Add Data Source
- Select Prometheus
- Go to Dashboards → Import
- Enter Dashboard ID 9628
- Set Prometheus as the data source
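As an optional alternative to the data source clicks above, Grafana also exposes an HTTP API (`POST /api/datasources`). The following sketch only builds the request and does not send it. The function name is hypothetical, and both URLs are assumptions that depend on how the containers are exposed and named in your setup.

```python
import base64
import json
from urllib.request import Request


def build_datasource_request(grafana_url, prometheus_url, user, password):
    """Build (but do not send) a POST to Grafana's /api/datasources endpoint."""
    body = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,  # address Grafana uses to reach Prometheus
        "access": "proxy",      # Grafana's backend makes the requests
    }
    # HTTP Basic authentication with the Grafana login credentials.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit it. Importing the dashboard is still done through the UI steps listed above.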
Each service runs in its own container, which isolates faults: a problem in one component is less likely to disrupt the others.
Data Lineage Tracking with OpenLineage
Metadata Management & Visualization with Marquez















