
Tech News

About The Project

This is the back-end of a tech news website. It features a web crawler that scrapes news from the zoomit.ir website using Scrapy and Selenium, schedules daily crawls with Celery and Celery Beat, and is containerized with Docker and Docker Compose for easy deployment.

(back to top)

Built With

  • Python

  • Django

  • DjangoREST

  • Postgres

  • Swagger

  • Selenium

  • Scrapy

  • Docker

  • Celery

  • Redis

(back to top)

Getting Started

Prerequisites

  • Python

  • PostgreSQL

Installation

  1. Clone the repo

    git clone https://github.com/farzanmosayyebi/TechNews
  2. Navigate to the src directory

    cd src
  3. Install the requirements

    pip install -r requirements.txt
  4. Apply migrations

    Note: First, you will need to create the PostgreSQL database and set the environment variables in a file named .env in the src directory, using the following format (a settings sketch that reads these variables follows the installation steps):

    TechNews/src/.env:

    SECRET_KEY=your-secret-key
    DB_NAME=your-db-name
    DB_USER=your-db-user
    DB_PASSWORD=your-db-password
    DB_HOST=your-db-host
    DB_PORT=your-db-port
    

    Then run

    python manage.py migrate
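
    For reference, here is a minimal sketch of how settings.py might consume these variables, assuming the project loads src/.env with python-dotenv; the actual settings code in this repository may differ:

    # src/<project>/settings.py  (sketch; assumes python-dotenv is installed)
    import os
    from pathlib import Path
    from dotenv import load_dotenv

    BASE_DIR = Path(__file__).resolve().parent.parent
    load_dotenv(BASE_DIR / ".env")  # reads SECRET_KEY and DB_* from src/.env

    SECRET_KEY = os.environ["SECRET_KEY"]

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": os.environ["DB_NAME"],
            "USER": os.environ["DB_USER"],
            "PASSWORD": os.environ["DB_PASSWORD"],
            "HOST": os.environ["DB_HOST"],
            "PORT": os.environ["DB_PORT"],
        }
    }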

Usage

  • To start the project, run the following in the src directory:

    python manage.py runserver
  • The Swagger UI is then available at:

    http://127.0.0.1:8000/swagger/
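
  • If the Swagger UI is served with drf-yasg (an assumption; the project may use a different schema package), the /swagger/ route is typically wired up like the sketch below:

    # src/<project>/urls.py  (sketch; assumes drf-yasg provides the Swagger UI)
    from django.urls import path
    from drf_yasg import openapi
    from drf_yasg.views import get_schema_view
    from rest_framework import permissions

    schema_view = get_schema_view(
        openapi.Info(title="Tech News API", default_version="v1"),
        public=True,
        permission_classes=[permissions.AllowAny],
    )

    urlpatterns = [
        # ... plus the project's API routes
        path("swagger/", schema_view.with_ui("swagger", cache_timeout=0)),
    ]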
    

Running the tests

  • In the src directory, run

    • Windows
    python manage.py test ..\tests
    • Linux/MacOS
    python manage.py test ../tests

Running the crawler

  • In the src directory, run

    python manage.py crawl --limit <number-of-items-to-scrape>
    • This is a custom Django management command that crawls the specified number of items from the zoomit.ir website; the default is 500 (a sketch of such a command follows the example below).

    Example

    • To crawl 50 items
    python manage.py crawl --limit 50
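
    • For context, a minimal sketch of what such a management command can look like; the app path and the spider hook are illustrative assumptions, not this repository's actual source:

    # src/news/management/commands/crawl.py  (illustrative sketch)
    from django.core.management.base import BaseCommand


    class Command(BaseCommand):
        help = "Crawl news items from zoomit.ir"

        def add_arguments(self, parser):
            # --limit controls how many items are scraped; defaults to 500 as documented above
            parser.add_argument("--limit", type=int, default=500)

        def handle(self, *args, **options):
            limit = options["limit"]
            self.stdout.write(f"Crawling {limit} items from zoomit.ir ...")
            # here the Scrapy/Selenium spider would be started with the given limit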

(back to top)

Running with Docker Compose

  1. In the root directory of the project, run:

    docker compose --env-file app.env up

    Note: You need to provide a file named app.env containing the environment variables for the project; it is passed to Compose with the --env-file flag as shown above (a sample app.env follows the Dockerfile notes below).

    About the Dockerfiles:

    • Two Dockerfiles are provided:
      • Dockerfile.base: the base image that only installs the Python dependencies. The backend, celery-beat, and celery-flower containers run on the image built from this file.
      • Dockerfile.worker: additionally installs Google Chrome and the packages needed to run Selenium inside the Celery workers. The celery-worker container runs on the image built from this file.
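
    As a starting point, app.env can mirror the variables shown for src/.env; the values below are placeholders, not this project's actual configuration, and DB_HOST/DB_PORT should point at the database service defined in docker-compose.yml:

    SECRET_KEY=your-secret-key
    DB_NAME=your-db-name
    DB_USER=your-db-user
    DB_PASSWORD=your-db-password
    # DB_HOST is typically the Postgres service name from docker-compose.yml
    DB_HOST=your-db-host
    DB_PORT=5432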

(back to top)

Crawler schedule

At startup, 500 news items are crawled from zoomit.ir. After that, Celery Beat pushes a crawl task to the message queue daily at midnight, so every day at midnight 60 more news items are crawled from zoomit.ir.
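
A minimal sketch of what such a Celery Beat schedule can look like; the task path, broker URL, and module layout are illustrative assumptions rather than this project's actual code:

    # celery.py  (sketch of a daily-midnight beat schedule)
    from celery import Celery
    from celery.schedules import crontab

    app = Celery("technews", broker="redis://localhost:6379/0")

    app.conf.beat_schedule = {
        "daily-crawl": {
            "task": "news.tasks.crawl_zoomit",      # hypothetical task name
            "schedule": crontab(hour=0, minute=0),  # every day at midnight
            "kwargs": {"limit": 60},                # 60 items per run, per the schedule above
        },
    }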

License

  • Distributed under the MIT License. See LICENSE for more information.

(back to top)
