A scalable web crawler built with Scrapy to extract job listings from multiple platforms. The project includes:
- Modular spiders for targeted websites (e.g., LinkedinSpider, IndeedSpider); a minimal spider sketch follows this list.
- Data pipelines for cleaning and storing results (CSV, JSON, or databases); a pipeline sketch also appears below.
- Configurable settings (user-agent, delay, proxies) to avoid blocking.
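The spiders follow standard Scrapy conventions. The sketch below shows roughly what a modular spider in this project might look like; it is a minimal illustration that assumes the `demo_spider` name and the `-a domain` / `-a url` arguments used in the run command later in this README, and the CSS selectors are placeholders, not the project's actual code.

```python
# Minimal sketch of a modular job spider (illustrative only; selectors and
# item fields are assumptions, not the project's actual implementation).
import scrapy


class DemoJobsSpider(scrapy.Spider):
    name = "demo_spider"

    def __init__(self, domain=None, url=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Accept the same -a domain=... and -a url=... arguments used in
        # the run command shown later in this README.
        self.allowed_domains = [domain] if domain else []
        self.start_urls = [url] if url else []

    def parse(self, response):
        # Placeholder selectors: adjust them to the target site's markup.
        for job in response.css("div.job-listing"):
            yield {
                "title": job.css("h2::text").get(),
                "company": job.css(".company::text").get(),
                "link": response.urljoin(job.css("a::attr(href)").get()),
            }
        # Follow pagination, if the site exposes a "next" link.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```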
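The cleaning and storage pipelines are ordinary Scrapy item pipelines. Below is a hedged sketch of a cleaning step (whitespace stripping plus dropping incomplete items); the field names and rules are assumptions. Writing CSV or JSON output can be left to Scrapy's built-in feed exports via the `-o` flag shown in the run command below.

```python
# Hypothetical cleaning pipeline; field names are assumptions. Enable it in
# settings.py through the ITEM_PIPELINES setting.
from scrapy.exceptions import DropItem


class CleanJobPipeline:
    def process_item(self, item, spider):
        # Strip surrounding whitespace from all string fields.
        for key, value in item.items():
            if isinstance(value, str):
                item[key] = value.strip()
        # Drop listings without a title; adjust this rule as needed.
        if not item.get("title"):
            raise DropItem(f"Missing title in {item!r}")
        return item
```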
To use the jobs crawler, you need to meet the following requirements:
- Python 3.x
- Scrapy library
- Additional dependencies listed in requirements.txt (you can verify your environment as shown below)
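If you want to confirm that a suitable Python 3 interpreter is available before installing anything, you can check its version first:
python --version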
Once you have the required environment set up, you can follow these steps to install and run the tool:
- Clone the project repository
- Navigate to the project directory:
cd jobs-crawler
- Install the required dependencies:
pip install -r requirements.txt
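After the dependencies are installed, you can confirm that Scrapy is available and can see the project's spiders by running the following from the project directory:
scrapy version
scrapy list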
From the project directory, run the spider with the following command:
scrapy crawl demo_spider -a domain=example.com -a url=http://www.example.com -o links.csv
This command will start the spider and save the results in a CSV file named links.csv.
Remember to replace 'example.com' with the actual domain you want to crawl, and adjust the starting URL if needed.
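Scrapy's standard command-line flags also let you change the output format or slow a single run down; for example, the same command can write JSON and apply a two-second download delay:
scrapy crawl demo_spider -a domain=example.com -a url=http://www.example.com -o links.json -s DOWNLOAD_DELAY=2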
Please be aware that this tool sends requests to the target website and places some load on it; use it responsibly and keep request rates within reasonable limits. This tool is provided as-is, and we are not responsible for any issues caused by its use.
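One way to keep the load on target sites low is through the project's settings.py. The snippet below is a generic sketch built from standard Scrapy settings; the specific values and the user-agent string are assumptions rather than this project's defaults.

```python
# settings.py (sketch): conservative values to reduce load and blocking risk.
ROBOTSTXT_OBEY = True                  # respect robots.txt
DOWNLOAD_DELAY = 2                     # seconds between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 4     # cap parallel requests per domain
AUTOTHROTTLE_ENABLED = True            # adapt the delay to server response times
USER_AGENT = "jobs-crawler (+https://example.com/contact)"  # assumed identifier

# Proxies can be supplied per request via request.meta["proxy"], which
# Scrapy's built-in HttpProxyMiddleware picks up.
```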
If you discover any issues or opportunities for improvement, feel free to raise an Issue or submit a Pull Request. Your contributions are greatly appreciated!
This project is open-source under the Apache License 2.0.