Enterprise-grade website monitoring platform that detects downtime, validates content, and analyzes performance metrics at scale.
SiteSentinel delivers mission-critical website monitoring with powerful content validation, comprehensive metrics collection, and advanced performance analytics. Built for scale, it supports thousands of concurrent checks with flexible execution models and provides real-time insights through its PostgreSQL integration and interactive dashboards.
- Configurable website monitoring with customizable check intervals
- Content verification using regex patterns
- Response time tracking and HTTP status code validation
- Dual-mode scheduling with thread-based and Dask distributed computing
- Simple thread-based scheduler implementation for reliability
- PostgreSQL integration with local or remote database servers
- Comprehensive logging system with rotation
- Graceful shutdown handling
- Dask dashboard for real-time monitoring of distributed tasks
The application consists of several modular components:
- `main.py`: Entry point and orchestration
- `database.py`: PostgreSQL database integration using connection pooling
- `monitor.py`: Website availability and content checking
- `scheduler.py`: Dual-mode scheduler supporting threads and Dask distributed computing
- `validators.py`: Configuration validation
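As a rough illustration of the kind of check `monitor.py` performs per site, the sketch below fetches a URL, times the response, and applies the configured regex. The function name, field names, and return shape are assumptions for illustration, not the project's actual API.

```python
import re
import time
import urllib.request
import urllib.error

def check_website(url, regex_pattern=None, timeout=10):
    """Fetch a URL and report status, response time, and content match.

    Hypothetical sketch; the real monitor.py may differ in names and fields.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            status = response.status
        matched = bool(re.search(regex_pattern, body)) if regex_pattern else None
        return {
            "url": url,
            "status_code": status,
            "response_time": time.monotonic() - start,
            "regex_matched": matched,
            "success": matched is not False,
        }
    except (urllib.error.URLError, OSError) as exc:
        # Network errors, timeouts, and HTTP error statuses (which urllib
        # raises as HTTPError) are all treated as failed checks here.
        return {
            "url": url,
            "status_code": None,
            "response_time": time.monotonic() - start,
            "regex_matched": None,
            "success": False,
            "error": str(exc),
        }
```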
- Multiple processing and threading approaches are provided:
- The scheduler supports both a simple thread-based implementation for maximum reliability and a Dask-based distributed computing approach
- Enable Dask by setting `"use_dask": true` in config.json
- Configure the number of Dask workers with the `max_workers` setting in config.json
- When Dask is enabled, a dashboard URL is displayed at startup (typically http://localhost:8787) and logged at 10-second intervals for easy access
- Configuration supports 1000 websites with regex pattern matching
- Comprehensive test suite for scheduler, database, and website monitoring at different scales
- Connects to a PostgreSQL database (configured in config.json)
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/Health_Check_PostgreSQL.git
   cd Health_Check_PostgreSQL
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Verify PostgreSQL installation:

   ```bash
   # Check if PostgreSQL is installed
   python check_postgres.py
   ```

4. Set up PostgreSQL:

   For Linux/macOS:

   ```bash
   chmod +x setup_postgres.sh
   ./setup_postgres.sh
   ```

   For Windows:

   ```bat
   setup_postgres.bat
   ```

5. Initialize the database schema:

   ```bash
   python setup_db.py
   ```
The config.json file includes:
- Database connection parameters
- Website monitoring configurations (URLs, intervals, regex patterns)
- Application settings (worker count, timeouts, etc.)
- Scheduler options (`use_dask`: true/false for enabling distributed execution)
SiteSentinel offers two execution modes that can be configured in the config.json file:
1. **Thread-based Execution (Default)**: The standard mode using Python's built-in threading capabilities.
   - Set `"use_dask": false` in config.json
   - Offers excellent reliability and simplicity
   - Best for smaller deployments or when monitoring fewer websites
   - Runs with a configurable thread pool defined by `max_workers`

2. **Distributed Execution with Dask**: Enables parallel processing across multiple cores or even machines.
   - Set `"use_dask": true` in config.json
   - Provides a dashboard for real-time task monitoring at http://localhost:8787
   - Offers better performance for large-scale monitoring (hundreds or thousands of websites)
   - Distributes load across a configurable number of workers
   - Handles task queuing, retries, and resource management
To switch between modes, update the `use_dask` parameter in your config.json file and restart the application.
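The two modes can be pictured with a sketch like the following. The function and parameter names are assumptions; the actual scheduler.py also handles per-site intervals, retries, and graceful shutdown.

```python
from concurrent.futures import ThreadPoolExecutor

def run_checks(check_fn, urls, use_dask=False, max_workers=4):
    """Run one round of checks with either a thread pool or Dask workers.

    Illustrative sketch of the dual-mode idea, not the project's code.
    """
    if use_dask:
        # Imported lazily so the default thread mode has no hard Dask dependency.
        from dask.distributed import Client
        client = Client(n_workers=max_workers)  # dashboard served on :8787 by default
        try:
            futures = client.map(check_fn, urls)
            return client.gather(futures)
        finally:
            client.close()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(check_fn, urls))
```

With `use_dask=False` this is plain `concurrent.futures`; flipping the flag routes the same work through a local Dask cluster, mirroring the `use_dask` switch in config.json.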
When running SiteSentinel with Dask enabled, a web-based dashboard is automatically available at http://localhost:8787. This dashboard provides:
- Real-time visualization of running tasks
- Worker status and resource utilization
- Task progress and completion statistics
- Performance metrics and timing information
- Diagnostic tools for troubleshooting
The dashboard URL is displayed in the console output and logged every 10 seconds for convenient access. This powerful monitoring interface is especially valuable when scaling to thousands of websites.
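If you want to confirm the dashboard address programmatically, Dask's distributed client exposes it as `Client.dashboard_link` (a minimal sketch, assuming the optional `dask.distributed` dependency is installed):

```python
try:
    from dask.distributed import Client

    # A small local in-process cluster, just to read the dashboard URL.
    client = Client(n_workers=2, processes=False)
    dashboard = client.dashboard_link  # e.g. http://127.0.0.1:8787/status
    client.close()
except ImportError:
    dashboard = None  # Dask is not installed in this environment
```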
SiteSentinel is designed to efficiently monitor thousands of websites. Below is a sample of a configuration with 10,000 website entries.
```json
{
  "database": {
    "host": "localhost",
    "port": 5432,
    "dbname": "sitesentinel",
    "user": "postgres",
    "password": "postgres",
    "sslmode": "prefer"
  },
  "max_workers": 500,
  "retry_limit": 3,
  "connection_timeout": 10,
  "use_dask": true,
  "websites": [
    {
      "url": "https://www.google.com",
      "check_interval_seconds": 30,
      "regex_pattern": "Google Search"
    },
    {
      "url": "https://www.bing.com",
      "check_interval_seconds": 60,
      "regex_pattern": "Microsoft Bing"
    },
    {
      "url": "https://www.yahoo.com",
      "check_interval_seconds": 120,
      "regex_pattern": "Yahoo Search"
    }
  ]
}
```

(The remaining 9,997 website entries are omitted.)

When scaling to 10,000 websites:
1. **Distribution of Check Intervals**
   - Critical websites: 30-60 second intervals (~5% of sites)
   - Important websites: 120-300 second intervals (~35% of sites)
   - Standard monitoring: 600 second intervals (~60% of sites)

2. **Resource Planning**
   - Database capacity: ~10 GB for a year of monitoring history
   - Network bandwidth: ~50 requests per second at peak
   - CPU utilization: scales linearly with concurrent checks

3. **Performance Optimization**
   - Concurrent connections: configurable up to 1000 simultaneous checks
   - Database indexes: optimized for time-series queries
   - Result caching: reduces database load for frequently accessed sites

4. **Monitoring Distribution**
   - Consider distributing monitoring across multiple nodes for geographical diversity
   - Implement retry logic with exponential backoff for transient failures
   - Use separate worker pools for different check intervals
For large-scale deployments, the Dask execution mode is strongly recommended to efficiently manage the workload across multiple workers.
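The resource-planning figures can be sanity-checked with a quick calculation (the per-tier interval midpoints used here are assumptions):

```python
# Steady-state request rate implied by the check-interval distribution.
tiers = [
    (0.05, 45),   # critical: 30-60 s intervals, midpoint ~45 s
    (0.35, 210),  # important: 120-300 s intervals, midpoint ~210 s
    (0.60, 600),  # standard: 600 s intervals
]
total_sites = 10_000
rate = sum(total_sites * share / interval for share, interval in tiers)
print(f"{rate:.1f} requests/second")  # ~38 req/s steady state
```

A steady-state average near 38 requests/second is consistent with the ~50 requests/second peak figure once retries and interval alignment are accounted for.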
To start monitoring, run the application:

```bash
python src/main.py
```

The application will:
- Connect to the PostgreSQL database
- Load the website configurations
- Start monitoring each website at the specified intervals
- Store results in the database
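The per-site interval loop and the graceful-shutdown behavior can be sketched as follows (names are illustrative, not the actual main.py implementation):

```python
import threading

def monitor_loop(site, check_fn, shutdown):
    """Re-check one site at its configured interval until shutdown is requested."""
    interval = site["check_interval_seconds"]
    # Event.wait doubles as an interruptible sleep: it returns early
    # (with True) as soon as the shutdown event is set.
    while not shutdown.wait(timeout=interval):
        check_fn(site["url"])

shutdown = threading.Event()
site = {"url": "https://example.com", "check_interval_seconds": 30}
worker = threading.Thread(target=monitor_loop, args=(site, print, shutdown), daemon=True)
worker.start()

shutdown.set()   # graceful shutdown: each loop exits after its current wait
worker.join(timeout=5)
```

A signal handler (e.g. for SIGINT/SIGTERM) that calls `shutdown.set()` is enough to let every loop finish its current check and exit cleanly.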
The application includes a utility for querying the database:
```bash
# Show monitoring summary for the last 24 hours
python query_db.py --summary

# Analyze website performance
python query_db.py --analyze --days 7

# Query specific tables
python query_db.py --query monitoring_results --limit 20

# Run custom SQL queries
python query_db.py --sql "SELECT * FROM monitoring_results WHERE success = false"

# List all database tables
python query_db.py --list-tables

# Describe table structure
python query_db.py --describe website_configs
```

For monitoring a large number of websites, the application supports Dask distributed computing:

- Enable Dask in the config.json file: `"use_dask": true`
- Adjust the number of workers: `"max_workers": 50`
When Dask is enabled, the application will display a link to the Dask dashboard for real-time monitoring of task execution.
SiteSentinel includes a comprehensive test suite that verifies functionality at different scales:
A battery of tests validates the system's performance and reliability with increasing workloads:
- **10 Websites Test**: Basic functionality test with 10 popular websites

  ```bash
  python -m unittest test.test_10_websites
  ```

- **100 Websites Test**: Medium-scale test with 100 dynamically generated domains

  ```bash
  python -m unittest test.test_100_websites
  ```

- **500 Websites Test**: Large-scale test with 500 domains from an external source

  ```bash
  python -m unittest test.test_500_websites
  ```

- **1000 Websites Test**: High-load test with 1000 randomly generated domains

  ```bash
  python -m unittest test.test_1000_websites
  ```

- **10000 Websites Test**: Extreme-scale stress test with 10000 domains

  ```bash
  python -m unittest test.test_10000_websites
  ```
Unit tests for individual components ensure reliability:
- **Database Tests**: Validates database operations and connection pooling

  ```bash
  python -m unittest test.test_database
  ```

- **Scheduler Tests**: Verifies task scheduling and execution in both thread and Dask modes

  ```bash
  python -m unittest test.test_scheduler
  ```

- **Website Regex Tests**: Tests content validation patterns

  ```bash
  python -m unittest test.test_website_regex
  ```
To run the complete test suite:

```bash
python -m unittest discover -s test
```

For performance reasons, you may want to run the larger-scale tests individually.
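As an illustration of the style of check a content-validation test might perform (this is not the project's actual test code):

```python
import re
import unittest

class RegexPatternTest(unittest.TestCase):
    """Illustrative content-validation checks in the spirit of test.test_website_regex."""

    def test_pattern_matches_expected_content(self):
        # A configured pattern should match the page body it was written for.
        self.assertIsNotNone(re.search("Google Search", "<title>Google Search</title>"))

    def test_pattern_rejects_unexpected_content(self):
        # The same pattern should fail against an error page.
        self.assertIsNone(re.search("Google Search", "<title>Service Unavailable</title>"))
```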
- `src/` - Core application code
  - `main.py` - Main entry point
  - `monitor.py` - Website monitoring logic
  - `database.py` - Database operations
  - `scheduler.py` - Task scheduling
  - `validators.py` - Input validation
- `schema.sql` - Database schema
- `setup_db.py` - Database initialization
- `query_db.py` - Database query utility
- `check_postgres.py` - PostgreSQL installation checker
- `setup_postgres.sh` / `setup_postgres.bat` - PostgreSQL setup scripts
- `config.json` - Application configuration
- `logs/` - Application logs
The application logs detailed information about monitoring activities to the `logs/` directory. The main log file is `website_monitor.log`.
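Log rotation of this kind is typically configured with `logging.handlers.RotatingFileHandler`; a minimal sketch follows, where the size limit and backup count are assumptions rather than the project's actual settings:

```python
import logging
import os
from logging.handlers import RotatingFileHandler

os.makedirs("logs", exist_ok=True)
handler = RotatingFileHandler(
    "logs/website_monitor.log",
    maxBytes=10 * 1024 * 1024,  # rotate after ~10 MB (assumed limit)
    backupCount=5,              # keep 5 rotated files (assumed)
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("sitesentinel")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("monitoring started")
```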
Contributions are welcome! Please feel free to submit a Pull Request.
See requirements.txt for dependencies.