A Django-based web application for scraping job listings from multiple platforms, currently Indeed and Glassdoor. It pairs Selenium-driven scrapers with a responsive, dark-themed web UI.
- Indeed - Scrape job listings with location-based filtering
- Glassdoor - Extract job data with company insights
- Extensible - Easy to add new platforms (LinkedIn support coming soon)
- Duplicate Prevention - Intelligent fingerprinting system prevents duplicate entries
- Company Tracking - Automatic company database management
- Source Attribution - Track which platform each job was found on
- Real-time Updates - Live job count and progress tracking
- Responsive Design - Works on desktop, tablet, and mobile
- Dark Theme - Beautiful gradient design with modern aesthetics
- Real-time Progress - Live updates during scraping operations
- Interactive Dashboard - Clean, intuitive job browsing experience
- CSV Export - Download scraped data for analysis
- Bulk Operations - Clear all jobs with one click
- Progress Tracking - Real-time scraping progress with detailed status
- Error Handling - Robust error management and user feedback
- Backend: Django 5.2.6
- Web Scraping: Selenium 4.15.2, BeautifulSoup4
- Database: SQLite (easily configurable for PostgreSQL/MySQL)
- Frontend: HTML5, CSS3, JavaScript (Vanilla)
- Browser Automation: Chrome WebDriver with anti-detection features
- Python 3.10 or higher (required by Django 5.2)
- Google Chrome browser
- Virtual environment (recommended)
- Windows/macOS/Linux
git clone <repository-url>
cd job_scrape
# Windows
python -m venv venv
venv\Scripts\activate
# macOS/Linux
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cd jobsuite
python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser
python manage.py runserver
Open your browser and navigate to: http://127.0.0.1:8000/
- Select Platform: Choose between Indeed or Glassdoor
- Enter Job Role: Specify the position you're looking for (e.g., "Data Scientist", "Software Engineer")
- Set Location: Enter the desired location (e.g., "New York, NY", "San Francisco, CA", "Remote")
- Choose Limit: Set how many jobs to scrape (1-100)
- Click "Run Scraper": Watch the magic happen!
- Live progress updates during scraping
- Detailed status messages
- Automatic error handling and recovery
- View Results: Browse scraped jobs in the beautiful table interface
- Export Data: Download results as CSV for further analysis
- Clear Database: Reset all data with one click (with confirmation)
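The CSV export step can be pictured roughly as follows. This is a minimal sketch, not the app's actual view code: the function name `jobs_to_csv` and the column names are assumptions chosen to mirror the job fields listed in this README, and plain dictionaries stand in for Django model instances.

```python
import csv
import io

# Hypothetical column order; the real export may use different names.
CSV_COLUMNS = ["title", "company", "location", "source_url", "sources"]


def jobs_to_csv(jobs):
    """Serialize an iterable of job dicts to a CSV string."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not exported (e.g. description).
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for job in jobs:
        row = dict(job)
        # A job can come from several platforms; join them into one cell.
        row["sources"] = ";".join(job.get("sources", []))
        writer.writerow(row)
    return buffer.getvalue()
```

In a Django view, the returned string could be sent back as an HttpResponse with content type text/csv so the browser downloads it as a file.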
Indeed Scraping:
- Location-based job filtering
- Anti-detection measures
- Automatic pagination handling
Glassdoor Scraping:
- Company insights and ratings
- Advanced job filtering
- Robust error handling
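To illustrate what "automatic pagination handling" might involve, here is a hedged sketch that builds one search URL per results page. The query parameter names (q, l, start), the page size of 10, and the helper name `search_page_urls` are assumptions for illustration; they may not match what indeed_scraper.py actually does.

```python
from urllib.parse import urlencode

RESULTS_PER_PAGE = 10  # assumed page size, not taken from the project


def search_page_urls(role, location, limit, base_url="https://www.indeed.com/jobs"):
    """Yield one search URL per results page, enough to cover `limit` jobs."""
    for start in range(0, limit, RESULTS_PER_PAGE):
        query = urlencode({"q": role, "l": location, "start": start})
        yield f"{base_url}?{query}"
```

A scraper would load each URL in turn and stop early once it has collected the requested number of jobs.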
job_scrape/
├── jobsuite/                 # Django project
│   ├── jobs/                 # Main app
│   │   ├── scraper/          # Scraping modules
│   │   │   ├── indeed_scraper.py
│   │   │   ├── glassdoor.py
│   │   │   ├── pipeline.py
│   │   │   └── ...
│   │   ├── templates/        # HTML templates
│   │   ├── models.py         # Database models
│   │   ├── views.py          # View logic
│   │   └── forms.py          # Form definitions
│   ├── manage.py
│   └── settings.py
├── requirements.txt
└── README.md
The app uses SQLite by default. To use PostgreSQL or MySQL:
- Update DATABASES in jobsuite/settings.py
- Install the appropriate database adapter
- Run migrations
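As an illustration of the first step, a PostgreSQL configuration might look like the sketch below. The engine string is Django's built-in PostgreSQL backend; the environment variable names and fallback values are assumptions, not settings taken from this project.

```python
import os

# Sketch of a PostgreSQL configuration for jobsuite/settings.py.
# Environment variable names and default values are illustrative assumptions.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": os.environ.get("DB_NAME", "jobsuite"),
        "USER": os.environ.get("DB_USER", "jobsuite"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}
```

For PostgreSQL the adapter is typically psycopg; for MySQL the engine would be django.db.backends.mysql with the mysqlclient adapter. After changing the setting, run python manage.py migrate against the new database.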
- Default job limit: 20
- Maximum job limit: 100
- Rate limiting: Built-in delays to prevent blocking
- Chrome WebDriver automatically managed
- Anti-detection features enabled
- Headless mode available for server deployment
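The scraper settings above could be expressed along these lines. This is a sketch under stated assumptions: the limits 20 and 100 come from this README, while the helper names, the delay bounds, and the specific Chrome switches are illustrative and may differ from the project's code.

```python
import random
import time

DEFAULT_JOB_LIMIT = 20   # default stated in this README
MAX_JOB_LIMIT = 100      # maximum stated in this README


def clamp_job_limit(requested=None):
    """Return a job limit within 1..MAX_JOB_LIMIT, falling back to the default."""
    if requested is None:
        return DEFAULT_JOB_LIMIT
    return max(1, min(int(requested), MAX_JOB_LIMIT))


def polite_delay(min_seconds=2.0, max_seconds=5.0, sleep=time.sleep):
    """Pause for a random interval between requests; the bounds are assumptions."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


def chrome_arguments(headless=True):
    """Build Chrome command-line switches for the browser session."""
    args = [
        "--disable-blink-features=AutomationControlled",  # common anti-detection switch
        "--window-size=1920,1080",
    ]
    if headless:
        args.append("--headless=new")  # no visible window, for server deployment
    return args
```

Each string returned by `chrome_arguments` can be passed to Selenium's `Options.add_argument()` before the driver is created. Randomizing the delay, rather than sleeping a fixed interval, makes request timing look less mechanical.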
- Title: Job position title
- Company: Foreign key to Company model
- Location: Job location
- Description: Full job description
- Source URL: Original job posting URL
- Sources: List of platforms where job was found
- Fingerprint: Unique identifier for deduplication
- Posted At: When the job was originally posted
- Scraped At: When we scraped the job
- Name: Company name
- Website: Company website URL
- Email: Contact email (if found)
- Created At: When company was added to database
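The Fingerprint and Sources fields work together for deduplication. The sketch below shows one plausible scheme, assuming the fingerprint is a hash of the normalized title, company, and location; the project's actual fingerprint inputs are not documented here, and the function names and the in-memory dict standing in for the database are hypothetical.

```python
import hashlib
import re


def _normalize(value):
    """Lowercase and collapse whitespace so cosmetic differences do not matter."""
    return re.sub(r"\s+", " ", value or "").strip().lower()


def job_fingerprint(title, company, location):
    """Hash the normalized title, company, and location into a stable hex digest."""
    key = "|".join(_normalize(part) for part in (title, company, location))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def merge_job(store, job, source):
    """Insert a job keyed by fingerprint, or record an extra source for a known one."""
    fp = job_fingerprint(job["title"], job["company"], job["location"])
    existing = store.get(fp)
    if existing is None:
        store[fp] = {**job, "fingerprint": fp, "sources": [source]}
        return True   # new job stored
    if source not in existing["sources"]:
        existing["sources"].append(source)
    return False      # duplicate; only the source list was updated
```

With this approach, the same posting found on both Indeed and Glassdoor ends up as a single record whose source list names both platforms. In Django, a unique constraint on the fingerprint column would enforce the same rule at the database level.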
Chrome Driver Problems:
- The app automatically downloads and manages ChromeDriver
- Ensure Chrome browser is installed and up-to-date
Scraping Failures:
- Check your internet connection
- Some platforms may have rate limiting
- Try reducing the job limit
Database Issues:
- Run python manage.py migrate to update database schema
- Check file permissions for SQLite database
Memory Issues:
- Reduce the job limit for large scrapes
- Clear old data using the "Clear All Jobs" button
- Check the console output for detailed error messages
- Ensure all dependencies are installed correctly
- Verify your Python version (Django 5.2 requires 3.10+)
- Check that Chrome browser is installed
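Several of the checks above can be scripted. The following is a small sketch, not part of the project: the function name `check_environment` is hypothetical, and the list of Chrome executable names is an assumption that will miss some installs (for example, Chrome installed as an app bundle on macOS or under Program Files on Windows).

```python
import shutil
import sys

# Executable names to probe on PATH; an assumption that does not cover every install.
CHROME_CANDIDATES = ("google-chrome", "google-chrome-stable", "chrome", "chromium")


def check_environment(min_python):
    """Return a dict with the Python version, whether it is new enough, and a Chrome path."""
    chrome_path = next(
        (path for path in map(shutil.which, CHROME_CANDIDATES) if path), None
    )
    return {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "chrome_path": chrome_path,  # None if no candidate was found on PATH
    }
```

Calling `check_environment(min_python=(3, 10))` reflects the fact that Django 5.2 needs Python 3.10 or newer. A `chrome_path` of None only means Chrome was not found on PATH, not that it is definitely missing.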
- LinkedIn scraping support
- Scheduled scraping with cron jobs
- Advanced filtering and search
- Email notifications for new jobs
- Job application tracking
- Company analysis dashboard
- API endpoints for external integration
- Docker containerization
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is for educational and personal use only. Please respect the terms of service of the platforms you're scraping from. Use responsibly and consider the impact on the target websites.
- Django community for the excellent framework
- Selenium team for browser automation capabilities
- BeautifulSoup for HTML parsing
- All contributors and users of this project
Happy Job Hunting! 🎯
Built with ❤️ for job seekers and recruiters everywhere