- Project Overview
- Running the App Locally
- Deployment
- API Access & Authentication
- Architecture & Code Structure
- Database Design & Migrations
- Testing & CI/CD
- Caching Strategy
- Logging & Observability
- Next Steps & Further Improvements
## Project Overview

This analytics platform:
- Loads and validates dataset configurations and associated SQL statements from a single unified JSON data source.
- Maps datasets to SQL and executes statements against a PostgreSQL database using a code-first SQLAlchemy ORM approach.
- Manages database sessions explicitly using a unit of work pattern for clear transaction boundaries.
- Caches complex aggregate query results with a TTL-based memory cache to optimize performance while avoiding memory bloat.
- Applies dependency injection via the `punq` container to keep modules decoupled and testable.
- Enforces strict API contracts with Pydantic models validating input/output schemas.
- Drives API behavior with BDD-style acceptance tests for maintainability and regression safety.
- Secures endpoints with AWS Cognito JWT authentication, validating tokens against well-known keys.
- Supports containerized deployment with Docker and automated CI/CD pipelines via GitHub Actions.
- Integrates a mock AI endpoint simulating LLM-generated SQL to dynamically add new datasets.
## Running the App Locally

To run the application locally with Docker and Poetry:

- Install dependencies:

  ```bash
  poetry install
  ```

- Prepare the application directory:

  ```bash
  mkdir -p ~/app && cd ~/app
  ```

- Build the Docker image:

  ```bash
  docker build -t analytics-platform .
  ```

- Run a PostgreSQL container (for the database):

  ```bash
  docker run --rm -p 5432:5432 \
    -e POSTGRES_USER=postgres \
    -e POSTGRES_PASSWORD=postgres \
    -e POSTGRES_DB=testdb \
    postgres
  ```

- Run the FastAPI app container (a hypothetical `.env` sketch follows these steps):

  ```bash
  docker run -d --restart unless-stopped --name analytics-platform \
    --env-file ../.env -p 8000:8000 analytics-platform
  ```

You can then access the API locally at http://localhost:8000.
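The `--env-file ../.env` flag expects an environment file one directory above `~/app`. A minimal hypothetical sketch, assuming the `asyncpg` driver for the app's async SQLAlchemy engine (only the `DATABASE_URL` variable itself is confirmed by the code):

```
# Matches the PostgreSQL container started above. From inside the app
# container, "localhost" may need to be replaced with the Docker host
# address (e.g. host.docker.internal on Docker Desktop).
DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/testdb
```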
## Deployment

The entire infrastructure is deployed using a CloudFormation IaC template located at `/deploy-template.yml`. Key deployment details:
- EC2 instance running Docker with the FastAPI application.
- PostgreSQL database hosted on AWS RDS.
- All components reside within a secure VPC with properly configured inbound and outbound rules.
- Secure SSH access configured for CI/CD pipeline integration.
- Uvicorn serves the FastAPI app on the EC2 instance, fronted by an NGINX reverse proxy handling HTTPS termination.
- AWS Cognito manages authentication via a user pool, supporting OAuth2 flows.
- Logs are streamed to AWS CloudWatch and accessible directly via SSH on the EC2 instance.
## API Access & Authentication

- Live Swagger docs available at: https://data.aphextwinning.com/
- Token generation endpoint (AWS Cognito OAuth2): https://3eef4c69-d944-4d39-837f-97f3f72f5f93.auth.eu-west-2.amazoncognito.com/oauth2/token
- Client credentials for token generation (see the example after this list):
  - Client ID: `7gkl7in37d96ijhi1901gln0nv`
  - Required scope: `api/platform_access`
  - Auth: Basic Auth with Base64 encoding of `client_id:client_secret`
- Authenticated endpoints validate JWT tokens using Cognito's `.well-known` keys.
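An illustrative sketch of the client-credentials flow in Python using `requests` (the `/metrics` path is an assumption, and the client secret is a placeholder):

```python
import requests

TOKEN_URL = (
    "https://3eef4c69-d944-4d39-837f-97f3f72f5f93"
    ".auth.eu-west-2.amazoncognito.com/oauth2/token"
)
CLIENT_ID = "7gkl7in37d96ijhi1901gln0nv"
CLIENT_SECRET = "..."  # placeholder; not published in this README

# requests' auth tuple sends the Base64-encoded client_id:client_secret
# as an HTTP Basic header, which is what Cognito's token endpoint expects.
resp = requests.post(
    TOKEN_URL,
    auth=(CLIENT_ID, CLIENT_SECRET),
    data={"grant_type": "client_credentials", "scope": "api/platform_access"},
)
resp.raise_for_status()
token = resp.json()["access_token"]

# Call a protected endpoint with the bearer token (path is illustrative).
api_resp = requests.get(
    "https://data.aphextwinning.com/metrics",
    headers={"Authorization": f"Bearer {token}"},
)
```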
## Architecture & Code Structure

- Modular codebase split into logical domains:
  - `src.core` — pure domain models (dataclasses) and protocol interfaces
  - `src.application` — services orchestrating business logic and LLM prompt generation
  - `src.infrastructure` — ORM, DB session, caching, and repository implementations
  - `src.web` — FastAPI routing, middleware, and request/response models
  - `src.crosscutting` — shared concerns like logging and error handling
- Uses Pydantic for input validation and response models.
- Dependency injection is managed via the `punq` container in `src.bootstrap` (see the sketch after this list).
- The code favors explicit unit of work patterns for database transactions to reduce boilerplate and improve clarity.
- Runtime polymorphism and duck typing improve flexibility without sacrificing static type hints or readability.
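A minimal sketch of what such a registration can look like in `src.bootstrap` (the repository classes here are illustrative, not the project's actual names):

```python
import punq


class MetricRepository:
    """Illustrative protocol-style interface, as would live in src.core."""

    def get(self, metric_id: str): ...


class SqlMetricRepository(MetricRepository):
    """Illustrative concrete implementation, as would live in src.infrastructure."""

    def get(self, metric_id: str):
        ...  # query the database here


def bootstrap() -> punq.Container:
    container = punq.Container()
    # Bind the abstraction to a concrete implementation; callers resolve
    # against the interface and stay decoupled from infrastructure details.
    container.register(MetricRepository, SqlMetricRepository)
    return container


repo = bootstrap().resolve(MetricRepository)
```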
## Database Design & Migrations

- Schema designed to mirror the provided JSON data.
- Uses Alembic for managing database migrations and version control.
- Imperative mapping with SQLAlchemy separates domain models from ORM models.
- No database-level constraints or triggers; lifecycle and business logic are handled fully in code.
- Unit of work pattern (`SqlAlchemyUnitOfWork`) controls the session lifecycle with explicit commits and implicit rollbacks:
```python
from logging import Logger

from sqlalchemy.ext.asyncio import (
    AsyncSession,
    async_sessionmaker,
    create_async_engine,
)


class SqlAlchemyUnitOfWork:
    __slots__ = "session_factory", "logger", "session"

    def __init__(self, settings: Settings, logger: Logger):
        # Settings is the application's own config object (provides DATABASE_URL).
        self.logger = logger
        engine = create_async_engine(
            settings.DATABASE_URL,
            echo=False,
            future=True,
        )
        self.session_factory = async_sessionmaker(
            bind=engine,
            expire_on_commit=False,
            class_=AsyncSession,
        )

    async def __aenter__(self):
        # A fresh session per unit of work keeps transaction scope explicit.
        self.session = self.session_factory()
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        try:
            if exc_type:
                # Any exception inside the block rolls the transaction back;
                # commits are issued explicitly by the caller.
                await self.session.rollback()
        finally:
            await self.session.close()
```
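A hypothetical caller, illustrating the explicit-commit convention (`MetricRecord` is an illustrative ORM model, not necessarily the project's):

```python
async def rename_metric(uow: SqlAlchemyUnitOfWork, metric_id: str, name: str) -> None:
    async with uow:
        metric = await uow.session.get(MetricRecord, metric_id)
        metric.name = name
        # Commit is explicit; an exception anywhere above would instead
        # trigger the rollback in __aexit__.
        await uow.session.commit()
```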
## Testing & CI/CD

- Automated regression tests written with `unittest` in BDD style, located in `tests.test_scenarios`:
```python
def test_create_metrics(self):
    scenario = CreateMetricConfigurationScenario(self.context)
    scenario \
        .given_i_have_an_app_running() \
        .when_the_create_metric_configuration_endpoint_is_called_with_metric_configuration() \
        .and_data_is_created_for_the_metric() \
        .then_the_status_code_should_be(201) \
        .then_the_metrics_should_have_been_created() \
        .then_an_info_log_indicates_endpoint_called()
```

- Tests spin up an in-memory, isolated application lifecycle for parallel execution without cross-test contamination.
- Shared state managed via concurrent-safe in-memory structures.
- GitHub Actions CI/CD pipeline (`.github/workflows/cicd.yml`) automates the following (a hypothetical workflow sketch follows this list):
  - Running regression tests on every PR.
  - Deploying code and CloudFormation changes automatically.
  - Securely injecting environment variables and CloudFormation outputs into EC2.
  - Remote Docker container restarts via SSH and SCP file transfers.
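A trimmed-down hypothetical sketch of such a workflow (job layout, secret names, and the stack name are assumptions, not the contents of the actual `cicd.yml`):

```yaml
name: CI/CD
on:
  pull_request:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install poetry && poetry install
      - run: poetry run python -m unittest discover tests

  deploy:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-region: eu-west-2
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
      # Apply infrastructure changes from the template in the repo root.
      - run: >
          aws cloudformation deploy
          --template-file deploy-template.yml
          --stack-name analytics-platform
      # Copy code to the instance and restart the container over SSH.
      - run: |
          scp -r . ec2-user@${{ secrets.EC2_HOST }}:~/app
          ssh ec2-user@${{ secrets.EC2_HOST }} "docker restart analytics-platform"
```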
## Caching Strategy

- Implemented basic in-memory caching with TTL for metric configurations and query aggregates (a sketch follows this list).
- This reduces repeated expensive computations for data unlikely to change during runtime.
- TTL is critical to prevent out-of-memory issues within the container, balancing performance and scalability.
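A minimal sketch of the idea, not the platform's actual cache class:

```python
import time
from typing import Any, Hashable


class TTLCache:
    """In-memory cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds: float = 300.0):
        self._ttl = ttl_seconds
        self._store: dict[Hashable, tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any | None:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Expired entries are evicted on access, bounding memory growth.
            del self._store[key]
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._store[key] = (time.monotonic() + self._ttl, value)
```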
## Logging & Observability

- Custom middleware captures detailed audit logs, error tracing, and request-scoped variables.
- Logs are sent to AWS CloudWatch with enhanced filtering and live observability.
- EC2 instances also allow direct log access over SSH via standard-output streaming.
- Endpoint-level logs include scoped context propagation to facilitate root-cause analysis:
```python
with logging_scope(
    operation=get_metrics.__name__,
    id=id_str,
    start_date=start_date,
    end_date=end_date,
    day_range=day_range,
):
    logger.info("Endpoint called")
    metrics = await get_metrics_service(
        _id=id_str,
        start_date=start_date,
        end_date=end_date,
        day_range=day_range,
    )
```
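`logging_scope` is a project-specific helper; a minimal sketch of how such scoped context can be implemented with `contextvars` (the log-filter wiring is an assumption):

```python
import logging
from contextlib import contextmanager
from contextvars import ContextVar

_log_context: ContextVar[dict] = ContextVar("log_context", default={})


@contextmanager
def logging_scope(**fields):
    # Layer new fields over any enclosing scope; contextvars isolate the
    # state per asyncio task, so concurrent requests don't interleave.
    token = _log_context.set({**_log_context.get(), **fields})
    try:
        yield
    finally:
        _log_context.reset(token)


class ContextFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # Attach the scoped fields to every record emitted inside the scope.
        record.scope = _log_context.get()
        return True
```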