A user-controlled automation for generating daily press briefs.
This project is a user-controlled automation (packaged to run conveniently in a Docker container) for generating daily press briefs. It expects only one essential input file:
- config.yaml, containing the newspapers (described by name and RSS feeds) that should be included in a brief,

and at least one of the following options for storing data:
- HOST_OUTPUT and BRIEF_OUTPUT to save it locally,
- DROPBOX_ACCESS_TOKEN to upload it to Dropbox.
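
The "at least one storage option" rule can be illustrated with a small check. This is only a sketch and not the project's actual code: the function name `storage_targets` and the labels `"local"` and `"dropbox"` are hypothetical, and it assumes local storage needs both HOST_OUTPUT and BRIEF_OUTPUT to be set.

```python
from typing import List, Mapping


def storage_targets(env: Mapping[str, str]) -> List[str]:
    """Return the storage options enabled by an environment-like mapping.

    Hypothetical helper: the names and labels are illustrative only.
    """
    targets = []
    # Assumption: local storage requires both ends of the volume mapping.
    if env.get("HOST_OUTPUT") and env.get("BRIEF_OUTPUT"):
        targets.append("local")
    if env.get("DROPBOX_ACCESS_TOKEN"):
        targets.append("dropbox")
    if not targets:
        raise ValueError(
            "Set HOST_OUTPUT and BRIEF_OUTPUT, or DROPBOX_ACCESS_TOKEN."
        )
    return targets
```

Calling `storage_targets(os.environ)` at startup would fail early when neither option is configured.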
Once given the above parameters, the following algorithm is used to assemble a daily brief:
- The set of newspapers provided in the config.yaml file is read.
- For each of these newspapers, the list of news items is read from that newspaper's RSS feeds.
- The news items from all the newspapers are combined into a single list. This combined list is then trimmed so that it fits on a few pages when printed (yes, that means lots of trimming!).
- A two-column PDF is generated, presenting all the news items obtained in the previous step. This PDF is made available through the HOST_OUTPUT volume for further consumption by the user.
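
The combine-and-trim steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function name `assemble_news` and the `max_items` cap are hypothetical, feeds are represented as already-fetched lists of titles, and the real trimming rule may differ.

```python
from typing import Dict, List, Tuple


def assemble_news(
    feeds: Dict[str, List[List[str]]],
    limit_per_rss: int = 4,
    max_items: int = 40,
) -> List[Tuple[str, str]]:
    """Combine news from all newspapers into one trimmed list.

    `feeds` maps a newspaper name to its RSS feeds, each feed being a
    list of news titles. `max_items` is a hypothetical overall cap.
    """
    combined = []
    for newspaper, rss_feeds in feeds.items():
        for entries in rss_feeds:
            # Keep at most `limit_per_rss` items from each feed.
            for title in entries[:limit_per_rss]:
                combined.append((newspaper, title))
    # Trim the combined list so the brief stays within a few pages.
    return combined[:max_items]
```

For example, with two feeds for one newspaper and one feed for another, `limit_per_rss=2` keeps at most two items per feed before the overall cap is applied.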
The whole configuration is defined in a .env file. Its parameters are then used to deploy the application either locally as a Docker container or to the cloud as an AWS Lambda function.
# AWS credentials
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_DEFAULT_REGION=...
# Pressbrief parameters
DROPBOX_ACCESS_TOKEN=...
HOST_OUTPUT=./output
BRIEF_OUTPUT=/output
LIMIT_PER_RSS=32
URL2QR=False

As seen above, it is also possible to configure some secondary parameters, which otherwise get sane defaults:
- LIMIT_PER_RSS -- the maximum number of news items from one RSS feed to include (step 2). Default: 4.
- URL2QR -- the flag indicating whether URLs should be converted to QR codes. Default: True.
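
As an illustration of how these defaults could be applied, here is a small sketch. The `Settings` type and `load_settings` function are hypothetical, and the accepted spellings of the boolean flag are an assumption; only the parameter names and default values come from the list above.

```python
from typing import Mapping, NamedTuple


class Settings(NamedTuple):
    limit_per_rss: int
    url2qr: bool


def load_settings(env: Mapping[str, str]) -> Settings:
    """Read the secondary parameters, falling back to the documented defaults."""
    limit = int(env.get("LIMIT_PER_RSS", "4"))
    # Assumption: the flag is written as text such as "True" or "False".
    raw_flag = env.get("URL2QR", "True")
    url2qr = raw_flag.strip().lower() in {"1", "true", "yes"}
    return Settings(limit_per_rss=limit, url2qr=url2qr)
```

Passing `os.environ` to `load_settings` would pick up the values from the .env file once Docker has injected them into the container.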
For a cloud deployment, the following parameters are required:
- AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY assigned to an IAM user with the permissions listed below:
  - AWSCloudFormationFullAccess
  - AWSLambdaFullAccess
  - IAMFullAccess
  - optional: to restrict the above permissions, create a custom policy with the actions included in the aws-policy.json file (reference: IAM JSON Policy Reference)
- AWS_DEFAULT_REGION -- the AWS region in which resources will be created.
The deployment process is performed using one of the available scripts:
- ./docker-compose.aws.yaml for AWS deployment via docker-compose,
- ./docker-compose.yaml for local deployments via docker-compose.

Both require Docker and Docker Compose to be installed.
The connection to RSS feeds is made using the Python package feedparser. The library was chosen because of its stability and regular patches.
The PDF files are generated using the Python package weasyprint, which converts an HTML page to a PDF page. Of the many available solutions, this one was chosen because of its support for CSS flexbox and columns, which made it easy to generate two-column PDF files.
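
To show how CSS columns lead to a two-column PDF, here is a hedged sketch. The helper names `build_brief_html` and `write_pdf`, the stylesheet, and the HTML layout are all assumptions made for illustration; the project's real template is not shown in this document. The only weasyprint call used is `HTML(string=...).write_pdf(...)`.

```python
from html import escape
from typing import Iterable, Tuple

# Assumption: a single CSS rule splits the page body into two columns.
PAGE_CSS = "body { column-count: 2; column-gap: 1.5em; font-family: serif; }"


def build_brief_html(items: Iterable[Tuple[str, str]]) -> str:
    """Build a minimal HTML page from (title, url) pairs."""
    rows = "\n".join(
        f"<p><b>{escape(title)}</b><br>{escape(url)}</p>" for title, url in items
    )
    return (
        f"<html><head><style>{PAGE_CSS}</style></head>"
        f"<body>\n{rows}\n</body></html>"
    )


def write_pdf(html: str, path: str) -> None:
    """Render the HTML string to a PDF file (requires weasyprint)."""
    # Imported lazily so the HTML builder works without weasyprint installed.
    from weasyprint import HTML

    HTML(string=html).write_pdf(path)
```

A call such as `write_pdf(build_brief_html(items), "/output/brief.pdf")` would then produce the file; the output path here is only an example.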
Uploading files to Dropbox is implemented using the Python package dropbox, the official Dropbox API client for the Dropbox API v2. Using it requires creating a Dropbox App, which provides an access token. Detailed instructions are available at this link.
An automated deployment to AWS is performed using a CloudFormation template which describes all the resources, roles and permissions needed to execute the application. In addition, a GitHub workflow is configured so that a deployment is triggered on every push to the master branch.
The following commands might be helpful in debugging:
- to get info about the function:
docker run \
--interactive \
--env-file .env \
pressbrief-aws \
aws lambda get-function \
--function-name Pressbrief

- to trigger the function:
docker run \
--interactive \
--env-file .env \
pressbrief-aws \
aws lambda invoke \
--function-name Pressbrief \
--invocation-type Event \
/dev/null