This is the back-end of a news website, which includes a web crawler that crawls news from the zoomit.ir website.
- The project is accessible through the links below:
- Python
- PostgreSQL
- Clone the repo:

  ```sh
  git clone https://github.com/farzanmosayyebi/TechNews
  ```
- Navigate to the `src` directory:

  ```sh
  cd src
  ```
- Install the requirements:

  ```sh
  pip install -r requirements.txt
  ```
- Apply migrations:

  Note: First, you will need to create the PostgreSQL database and set the environment variables in a file named `.env` in the `src` directory (`TechNews/src/.env`), with the following format:

  ```
  SECRET_KEY=your-secret-key
  DB_NAME=your-db-name
  DB_USER=your-db-user
  DB_PASSWORD=your-db-password
  DB_HOST=your-db-host
  DB_PORT=your-db-port
  ```

  Then run:

  ```sh
  python manage.py migrate
  ```
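For reference, the variables in `.env` have to reach Django's settings somehow; projects often use a package such as `python-dotenv` for this. A minimal stdlib-only sketch of the idea (the helper name `load_env` is illustrative, not taken from this project):

```python
import os

def load_env(path=".env"):
    """Load KEY=value lines from a .env-style file into os.environ.

    A simplified stand-in for python-dotenv: blank lines, comments,
    and lines without '=' are ignored; existing variables are kept.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

# settings.py could then read e.g. os.environ["DB_NAME"] for DATABASES.
```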
- To start the project, run the following in the `src` directory:

  ```sh
  python manage.py runserver
  ```
- URL to see the Swagger UI: `127.0.0.1:8000/swagger/`
- In the `src` directory, run:
  - Windows:

    ```sh
    python manage.py test ..\tests
    ```

  - Linux/macOS:

    ```sh
    python manage.py test ../tests
    ```
- In the `src` directory, run:

  ```sh
  python manage.py crawl --limit <number-of-items-to-scrape>
  ```

- This is a custom Django `Command` which crawls the specified number of items from the zoomit.ir website. The default number is 500.
- To crawl 50 items:

  ```sh
  python manage.py crawl --limit 50
  ```
- In the root directory of the project, run:

  ```sh
  docker compose up
  ```
  Note: You need to provide a file named `app.env` (using `--env-file`) that contains the environment variables for the project.
- About Dockerfiles: two Dockerfiles are implemented:
  - `Dockerfile.base`: the base file that only installs dependencies. The backend, celery-beat, and celery-flower containers run on the image built from this file.
  - `Dockerfile.worker`: this file additionally installs Google Chrome and the packages needed to run Selenium in the Celery workers. The celery-worker container runs on the image built from this file.
At startup, 500 news items will be crawled from zoomit.ir. After that, Celery Beat is scheduled to push crawl tasks to the message queue daily at midnight, which means that every day at midnight, 60 news items will be crawled from zoomit.ir.
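The nightly schedule described above would typically be configured through Celery Beat's `beat_schedule` with a `crontab` entry. A hypothetical config sketch (the Celery app name, task path, and schedule key are illustrative, not taken from the project):

```python
# Illustrative Celery Beat configuration, not the project's actual file
from celery import Celery
from celery.schedules import crontab

app = Celery("technews")  # hypothetical app name

app.conf.beat_schedule = {
    # Push a crawl task to the message queue every day at midnight
    "daily-crawl": {
        "task": "news.tasks.crawl",          # hypothetical task path
        "schedule": crontab(minute=0, hour=0),
        "kwargs": {"limit": 60},             # 60 items per nightly run, per the note above
    },
}
```

Beat only enqueues the task; a celery-worker container (built from `Dockerfile.worker`, which has Chrome and Selenium available) picks it up and performs the actual crawl.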
- Distributed under the MIT License. See `LICENSE` for more information.