🚀 Job Scraper Pro

A Django-based web application that scrapes job listings from multiple platforms, including Indeed and Glassdoor, and presents them in a responsive web UI.

✨ Features

🔍 Multi-Platform Job Scraping

Indeed - Scrape job listings with location-based filtering
Glassdoor - Extract job data with company insights
Extensible - Easy to add new platforms (LinkedIn support coming soon)

📊 Smart Data Management

Duplicate Prevention - Intelligent fingerprinting system prevents duplicate entries
Company Tracking - Automatic company database management
Source Attribution - Track which platform each job was found on
Real-time Updates - Live job count and progress tracking
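
The fingerprinting idea behind duplicate prevention can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes a fingerprint is a SHA-256 hash of the normalized title, company, and location, and the names `normalize` and `job_fingerprint` are hypothetical.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def job_fingerprint(title: str, company: str, location: str) -> str:
    """Return a stable SHA-256 hex digest for one job posting.

    Assumption: title, company, and location together identify a posting.
    """
    key = "|".join(normalize(part) for part in (title, company, location))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


a = job_fingerprint("Data Scientist", "Acme Corp", "New York, NY")
b = job_fingerprint("  data   scientist ", "ACME CORP", "New York, NY")
print(a == b)  # both spellings normalize to the same key
```

Storing the digest in a unique column would let the database reject a second insert of the same posting, whichever platform it came from.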

🎨 Modern User Interface

Responsive Design - Works on desktop, tablet, and mobile
Dark Theme - Beautiful gradient design with modern aesthetics
Real-time Progress - Live updates during scraping operations
Interactive Dashboard - Clean, intuitive job browsing experience

📈 Advanced Functionality

CSV Export - Download scraped data for analysis
Bulk Operations - Clear all jobs with one click
Progress Tracking - Real-time scraping progress with detailed status
Error Handling - Robust error management and user feedback
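
As a rough illustration of the CSV export, the sketch below serializes a list of job dictionaries with Python's standard `csv` module. The column names in `FIELDS` and the helper `jobs_to_csv` are assumptions made for this example; the real export may use different columns.

```python
import csv
import io

# Hypothetical column set for the export.
FIELDS = ["title", "company", "location", "source_url"]


def jobs_to_csv(jobs):
    """Serialize job dictionaries to CSV text, ignoring unknown keys."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(jobs)
    return buffer.getvalue()


sample = [
    {
        "title": "Data Scientist",
        "company": "Acme Corp",
        "location": "New York, NY",
        "source_url": "https://example.com/jobs/1",
        "description": "Not part of FIELDS, so it is skipped",
    }
]
print(jobs_to_csv(sample))
```

Values that contain commas, such as the location above, are quoted by the `csv` module, so the file opens cleanly in spreadsheet tools.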

🛠️ Technology Stack

Backend: Django 5.2.6
Web Scraping: Selenium 4.15.2, BeautifulSoup4
Database: SQLite (easily configurable for PostgreSQL/MySQL)
Frontend: HTML5, CSS3, JavaScript (Vanilla)
Browser Automation: Chrome WebDriver with anti-detection features

📋 Prerequisites

Python 3.10 or higher (required by Django 5.2)
Google Chrome browser
Virtual environment (recommended)
Windows/macOS/Linux

🚀 Quick Start

1. Clone the Repository

git clone <repository-url>
cd job_scrape

2. Set Up Virtual Environment

# Windows
python -m venv venv
venv\Scripts\activate

# macOS/Linux
python3 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Set Up Database

cd jobsuite
python manage.py makemigrations
python manage.py migrate

5. Create Admin User (Optional)

python manage.py createsuperuser

6. Run the Application

python manage.py runserver

7. Access the Dashboard

Open your browser and navigate to: http://127.0.0.1:8000/

📖 How to Use

Basic Job Scraping

1. Select Platform: Choose between Indeed and Glassdoor
2. Enter Job Role: Specify the position you're looking for (e.g., "Data Scientist", "Software Engineer")
3. Set Location: Enter the desired location (e.g., "New York, NY", "San Francisco, CA", "Remote")
4. Choose Limit: Set how many jobs to scrape (1-100)
5. Click "Run Scraper": The dashboard shows progress while the jobs are collected

Advanced Features

Real-time Progress Tracking

Live progress updates during scraping
Detailed status messages
Automatic error handling and recovery
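
One simple way to provide live progress is sketched below. It assumes each scraping run persists its state to a small JSON file named after a run ID, which the page can then poll. The helpers `write_progress` and `read_progress`, and the keys in the state dictionary, are hypothetical.

```python
import json
import tempfile
import uuid
from pathlib import Path


def write_progress(directory, run_id, done, total, message):
    """Persist the current state of one scraping run as JSON."""
    percent = round(100 * done / total) if total else 0
    state = {"done": done, "total": total, "percent": percent, "message": message}
    path = Path(directory) / f"progress_{run_id}.json"
    path.write_text(json.dumps(state), encoding="utf-8")
    return path


def read_progress(directory, run_id):
    """Return the last saved state, or None if nothing was written yet."""
    path = Path(directory) / f"progress_{run_id}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


run_id = str(uuid.uuid4())
with tempfile.TemporaryDirectory() as tmp:
    write_progress(tmp, run_id, done=5, total=20, message="Scraping page 1")
    print(read_progress(tmp, run_id)["percent"])  # 25
```

A file per run keeps concurrent scrapes from overwriting each other's status; a cache or database row would serve the same purpose.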

Data Management

View Results: Browse scraped jobs in the table interface
Export Data: Download results as CSV for further analysis
Clear Database: Reset all data with one click (with confirmation)

Platform-Specific Features

Indeed Scraping:

Location-based job filtering
Anti-detection measures
Automatic pagination handling

Glassdoor Scraping:

Company insights and ratings
Advanced job filtering
Robust error handling

🗂️ Project Structure

job_scrape/
├── jobsuite/                    # Django project
│   ├── jobs/                   # Main app
│   │   ├── scraper/           # Scraping modules
│   │   │   ├── indeed_scraper.py
│   │   │   ├── glassdoor.py
│   │   │   ├── pipeline.py
│   │   │   └── ...
│   │   ├── templates/         # HTML templates
│   │   ├── models.py          # Database models
│   │   ├── views.py           # View logic
│   │   └── forms.py           # Form definitions
│   ├── manage.py
│   └── settings.py
├── requirements.txt
└── README.md

🔧 Configuration

Database Settings

The app uses SQLite by default. To use PostgreSQL or MySQL:

Update DATABASES in jobsuite/settings.py
Install the appropriate database adapter
Run migrations
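
For PostgreSQL, the `DATABASES` setting might look like the sketch below. The environment variable names and default values are assumptions for illustration; `django.db.backends.postgresql` is Django's built-in PostgreSQL backend.

```python
import os

# Sketch of a PostgreSQL configuration for settings.py.
# The DB_* environment variable names and defaults are illustrative.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "jobsuite"),
        "USER": os.environ.get("DB_USER", "jobsuite"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}
```

With this in place, install a PostgreSQL adapter such as psycopg and run python manage.py migrate. For MySQL, the engine is `django.db.backends.mysql` and the usual adapter is mysqlclient.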

Scraping Limits

Default job limit: 20
Maximum job limit: 100
Rate limiting: Built-in delays between requests to reduce the risk of being blocked
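
The limits above could be enforced along these lines. This is a sketch under assumptions: the helper names `clamp_limit` and `polite_delay` and the delay values are made up for the example, and only the default (20) and maximum (100) come from this README.

```python
import random
import time

DEFAULT_LIMIT = 20
MAX_LIMIT = 100


def clamp_limit(requested=None):
    """Use the default when no limit is given and keep it within 1..MAX_LIMIT."""
    if requested is None:
        return DEFAULT_LIMIT
    return max(1, min(int(requested), MAX_LIMIT))


def polite_delay(base=2.0, jitter=1.5):
    """Pick a randomized pause in seconds: base plus up to `jitter`."""
    return base + random.uniform(0, jitter)


# Short delays here so the example finishes quickly.
for page in range(3):
    time.sleep(polite_delay(base=0.01, jitter=0.01))

print(clamp_limit(), clamp_limit(500), clamp_limit(0))
```

Randomizing the pause avoids a fixed request rhythm, which is a common reason automated traffic gets flagged.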

Browser Settings

Chrome WebDriver automatically managed
Anti-detection features enabled
Headless mode available for server deployment
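
A possible shape for the browser setup is shown below. The specific flags and the helper names `chrome_arguments` and `build_driver` are assumptions, not the project's actual configuration. In Selenium 4.6 and later, `webdriver.Chrome` locates a matching driver through Selenium Manager.

```python
# Flags often used to make automated Chrome look like a regular session.
# This particular selection is illustrative.
CHROME_FLAGS = [
    "--disable-blink-features=AutomationControlled",
    "--window-size=1920,1080",
]


def chrome_arguments(headless=False):
    """Return the Chrome command-line flags for one session."""
    args = list(CHROME_FLAGS)
    if headless:
        args.append("--headless=new")  # headless mode in recent Chrome versions
    return args


def build_driver(headless=False):
    """Start Chrome with the flags above (requires Selenium and Chrome)."""
    # Imported here so chrome_arguments() works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


print(chrome_arguments(headless=True))
```

On a server without a display, `build_driver(headless=True)` would be the variant to use.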

📊 Data Models

Job Model

Title: Job position title
Company: Foreign key to Company model
Location: Job location
Description: Full job description
Source URL: Original job posting URL
Sources: List of platforms where job was found
Fingerprint: Unique identifier for deduplication
Posted At: When the job was originally posted
Scraped At: When the job was scraped

Company Model

Name: Company name
Website: Company website URL
Email: Contact email (if found)
Created At: When company was added to database
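
The two field lists can be read as the following plain-Python shapes. This is a framework-free sketch using dataclasses; the actual models in models.py would use Django field types (for example a `ForeignKey` for the company), and the attribute names and defaults here are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def _now() -> datetime:
    """Current time in UTC, used as a default timestamp."""
    return datetime.now(timezone.utc)


@dataclass
class Company:
    name: str
    website: str = ""
    email: str = ""  # contact email, if one was found
    created_at: datetime = field(default_factory=_now)


@dataclass
class Job:
    title: str
    company: Company  # a foreign key in the Django model
    location: str
    description: str
    source_url: str
    fingerprint: str  # unique identifier used for deduplication
    sources: List[str] = field(default_factory=list)
    posted_at: Optional[datetime] = None
    scraped_at: datetime = field(default_factory=_now)


acme = Company(name="Acme Corp")
job = Job(
    title="Data Scientist",
    company=acme,
    location="Remote",
    description="Example posting",
    source_url="https://example.com/jobs/1",
    fingerprint="abc123",
    sources=["indeed"],
)
print(job.company.name, job.sources)
```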

🚨 Troubleshooting

Common Issues

Chrome Driver Problems:

The app automatically downloads and manages ChromeDriver
Ensure Chrome browser is installed and up-to-date

Scraping Failures:

Check your internet connection
Some platforms rate-limit automated requests
Try reducing the job limit

Database Issues:

Run python manage.py migrate to update database schema
Check file permissions for SQLite database

Memory Issues:

Reduce the job limit for large scrapes
Clear old data using the "Clear All Jobs" button

Getting Help

Check the console output for detailed error messages
Ensure all dependencies are installed correctly
Verify your Python version (3.10+ required by Django 5.2)
Check that Chrome browser is installed

🔮 Future Enhancements

LinkedIn scraping support
Scheduled scraping with cron jobs
Advanced filtering and search
Email notifications for new jobs
Job application tracking
Company analysis dashboard
API endpoints for external integration
Docker containerization

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

⚠️ Legal Notice

This tool is for educational and personal use only. Please respect the terms of service of the platforms you're scraping from. Use responsibly and consider the impact on the target websites.

🙏 Acknowledgments

Django community for the excellent framework
Selenium team for browser automation capabilities
BeautifulSoup for HTML parsing
All contributors and users of this project

Happy Job Hunting! 🎯

Built with ❤️ for job seekers and recruiters everywhere
