Stream-Syntra is a comprehensive real-time analytics platform that demonstrates enterprise-grade streaming data architecture. Built with modern open-source technologies, it showcases end-to-end data ingestion, processing, storage, and visualization capabilities.
Stream-Syntra is designed as a production-ready template for building scalable real-time analytics systems. The platform seamlessly integrates multiple data sources, processes streaming events through Apache Kafka, persists them in a high-performance time-series database (QuestDB), and provides real-time visualization through interactive dashboards.
- Real-Time Data Ingestion: Multi-source data streaming with Apache Kafka
- High-Performance Storage: QuestDB time-series database with columnar storage
- Interactive Dashboards: Grafana-powered real-time visualization
- Data Science Ready: Integrated Jupyter notebooks for ML/AI workflows
- Monitoring & Observability: Built-in metrics collection and system monitoring
- Containerized Architecture: Full Docker Compose deployment
- Production Ready: High-availability, fault-tolerant design
Stream-Syntra implements a modern streaming data architecture following industry best practices:
```mermaid
graph TB
    subgraph "Data Sources"
        A[Cryptocurrency Trading APIs]
        B[GitHub Public Events]
        C[IoT Sensors & Smart Meters]
        D[Factory Plant Sensors]
        E[Transaction Systems]
    end
    subgraph "Message Streaming Layer"
        F[Apache Kafka Cluster]
        G[Schema Registry]
        H[Kafka Connect]
    end
    subgraph "Time-Series Database"
        I[QuestDB]
    end
    subgraph "Analytics & Visualization"
        J[Grafana Dashboards]
        K[Jupyter Notebooks]
        L[Data Science & ML Models]
    end
    subgraph "Monitoring & Observability"
        M[Telegraf Agent]
        N[System Metrics]
    end
    A --> F
    B --> F
    C --> I
    D --> I
    E --> F
    F --> G
    F --> H
    G --> H
    H --> I
    I --> J
    I --> K
    I --> L
    M --> I
    F --> M
    I --> N
```
| Component | Technology | Purpose |
|---|---|---|
| Message Broker | Apache Kafka | High-throughput distributed streaming |
| Database | QuestDB | High-performance time-series analytics |
| Visualization | Grafana | Real-time dashboards and monitoring |
| Data Science | Jupyter Notebook | Interactive analysis and ML workflows |
| Schema Management | Confluent Schema Registry | Data schema evolution and compatibility |
| Data Integration | Kafka Connect | Scalable connector framework |
| Monitoring | Telegraf | Metrics collection and aggregation |
| Containerization | Docker & Docker Compose | Consistent deployment environment |
Stream-Syntra provides several pre-configured dashboards for different data types:
- Cryptocurrency Trades Dashboard
  - URL: http://localhost:3000/d/trades-crypto-currency/trades-crypto-currency?orgId=1&refresh=250ms
  - Features: Real-time price movements, volume analysis, trading patterns
  - Refresh Rate: 250ms for ultra-low-latency visualization
- GitHub Events Dashboard
  - URL: http://localhost:3000/d/github-events-questdb/github-events-dashboard?orgId=1&refresh=5s
  - Features: Repository activity, commit patterns, developer insights
  - Data Source: Live GitHub public events API
- IoT Device Data Dashboard
  - URL: http://localhost:3000/d/qdb-iot-demo/device-data-questdb-demo?orgId=1&refresh=500ms
  - Features: Sensor readings, device health, environmental monitoring
  - Use Cases: Smart meters, factory sensors, environmental monitoring
Ensure you have the following installed:
- Docker (20.x or higher)
- Docker Compose (2.x or higher)
- Git (for cloning the repository)
- 8GB+ RAM (recommended for optimal performance)
- Clone Stream-Syntra

  ```bash
  git clone https://github.com/your-username/stream-syntra.git
  cd stream-syntra
  ```

- Configure GitHub Token (optional, for GitHub data ingestion)

  ```bash
  export GITHUB_TOKEN=your_github_personal_access_token
  ```

  Tip: Create a GitHub Personal Access Token with read access to public repositories.

- Fix Docker Permissions (Linux/macOS)

  ```bash
  export DOCKER_COMPOSE_USER_ID=$(id -u)
  ```
Start the entire platform with a single command:
```bash
docker-compose up
```

Initial startup time: 30-60 seconds (subsequent starts are much faster). Disk usage: ~1GB for Docker images.
Once all services are running, access these endpoints:
| Service | URL | Credentials |
|---|---|---|
| QuestDB Console | http://localhost:9000 | None required |
| Grafana Dashboards | http://localhost:3000 | admin / quest |
| Jupyter Notebooks | http://localhost:8888 | None required |
| Kafka Connect API | http://localhost:8083 | None required |
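You can also verify QuestDB programmatically. A minimal sketch using only the Python standard library and QuestDB's default HTTP port 9000; the `/exec` endpoint runs SQL and returns JSON:

```python
import json
import urllib.parse
import urllib.request

# QuestDB's REST API executes SQL via GET /exec and returns JSON.
query = urllib.parse.quote("SELECT 1;")
with urllib.request.urlopen(f"http://localhost:9000/exec?query={query}") as resp:
    result = json.load(resp)

# The response carries the result columns plus a 'dataset' of rows.
print(result["dataset"])  # [[1]] if QuestDB is healthy
```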
Stream-Syntra supports multiple data ingestion patterns:
Jupyter notebooks are perfect for interactive data exploration and prototyping:
- Cryptocurrency Trading Data
  - Navigate to: http://localhost:8888/notebooks/Send-Trades-To-Kafka.ipynb
  - Dataset: 1M+ real crypto trades (March 2024)
  - Format: AVRO via Kafka
  - Frequency: 50ms intervals (configurable)
- GitHub Events Data
  - Navigate to: http://localhost:8888/notebooks/Send-Github-Events-To-Kafka.ipynb
  - Source: GitHub Public Events API
  - Format: JSON via Kafka
  - Frequency: 10s intervals (API rate-limit compliance)
- IoT & Sensor Data
  - Navigate to: http://localhost:8888/notebooks/IoTEventsToQuestDB.ipynb
  - Pattern: Direct to QuestDB via the ILP protocol (see the sketch after this list)
  - Use Case: High-throughput sensor data
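ILP (InfluxDB Line Protocol) needs nothing more than a TCP socket, which is why it suits high-throughput ingestion. A minimal sketch, assuming QuestDB's default ILP TCP port 9009 and an illustrative `iot_data` table (QuestDB creates the table on first write):

```python
import socket
import time

# One ILP line: table name, symbol (tag) columns, field columns,
# and a nanosecond timestamp, newline-terminated.
line = f"iot_data,device_id=meter-1 temperature=21.5,humidity=40.2 {time.time_ns()}\n"

# QuestDB listens for ILP over TCP on port 9009 by default.
with socket.create_connection(("localhost", 9009)) as sock:
    sock.sendall(line.encode("utf-8"))
```

In practice the official QuestDB client libraries add batching, authentication, and error handling on top of this wire format.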
Production-ready implementations in multiple languages:
Python:

```bash
cd ingestion/python
pip install -r requirements.txt

# GitHub events
python github_events.py

# Smart meters
python smart_meters_send_to_kafka.py

# Trading data
python send_trades_to_kafka.py ../../notebooks/tradesMarch.csv trades
```

Go:

```bash
cd ingestion/go/github_events
go get
go run .

# For trading data
cd ../trades
go run trades_sender.go --topic="trades" --csv=../../../notebooks/tradesMarch.csv
```

Node.js:

```bash
cd ingestion/nodejs
npm install node-rdkafka @octokit/rest
node github_events.js
```

Java:

```bash
cd ingestion/java/github_events
mvn package
java -jar target/github-events-1.0-SNAPSHOT-jar-with-dependencies.jar
```

Rust:

```bash
cd ingestion/rust/github_events
cargo run
```

Stream-Syntra includes production-ready ML workflows for time-series forecasting:
- Trading Data Forecasting
  - Notebook: http://localhost:8888/notebooks/Time-Series-Forecasting-ML-trades.ipynb
  - Models: Prophet, Linear Regression
  - Use Case: Price prediction, volume forecasting (see the Prophet sketch after this list)
- GitHub Activity Forecasting
  - Notebook: http://localhost:8888/notebooks/Time-Series-Forecasting-ML.ipynb
  - Models: ARIMA, LSTM-ready architecture
  - Use Case: Repository activity prediction
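To illustrate the forecasting workflow, here is a minimal Prophet sketch. The DataFrame contents are placeholders rather than the notebook's exact code; Prophet only requires a `ds` (timestamp) column and a `y` (target) column:

```python
import pandas as pd
from prophet import Prophet

# Placeholder series; in practice, load e.g. 5-minute average trade
# prices from QuestDB into the ds/y columns Prophet expects.
df = pd.DataFrame({
    "ds": pd.date_range("2024-03-01", periods=288, freq="5min"),
    "y": [float(i) for i in range(288)],
})

model = Prophet()
model.fit(df)

# Forecast the next 24 five-minute intervals (two hours ahead).
future = model.make_future_dataframe(periods=24, freq="5min")
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())
```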
On the analytics side, QuestDB provides:
- Real-time Data Querying: PostgreSQL-compatible SQL interface
- Advanced Analytics: Time-series-specific SQL extensions
- Scalable Processing: Columnar storage for billion-row analytics
- ML Integration: Direct pandas/scikit-learn connectivity
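Because QuestDB speaks the Postgres wire protocol, query results drop straight into pandas. A minimal sketch, assuming QuestDB's default Postgres port 8812 and the stock admin/quest credentials:

```python
import pandas as pd
import psycopg2

# QuestDB exposes a PostgreSQL-compatible endpoint on port 8812.
conn = psycopg2.connect(
    host="localhost", port=8812,
    user="admin", password="quest", dbname="qdb",
)

# Any QuestDB SQL works here, including the SAMPLE BY extension.
df = pd.read_sql("SELECT timestamp, symbol, price FROM trades LIMIT 100", conn)
conn.close()
print(df.head())
```

The queries below show typical SAMPLE BY aggregations for each data stream.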
```sql
-- Trading volume analysis
SELECT timestamp, symbol, COUNT() as trades,
       AVG(price) as avg_price, SUM(volume) as total_volume
FROM trades
SAMPLE BY 5m;

-- GitHub activity patterns
SELECT timestamp, type, COUNT() as events
FROM github_events
SAMPLE BY 15m;

-- IoT sensor aggregations
SELECT timestamp, device_id,
       AVG(temperature) as avg_temp,
       MAX(humidity) as max_humidity
FROM iot_data
SAMPLE BY 1h;
```

Under the hood, QuestDB's storage engine provides the following (a partitioning DDL sketch follows the list):
- Columnar Storage: Optimized for analytical workloads
- Time Partitioning: Automatic data organization by time
- Parallel Processing: Multi-core query execution
- Compression: Efficient storage with fast decompression
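Time partitioning is declared when a table is created. A hedged DDL sketch issued over the Postgres interface; the table name and columns are illustrative:

```python
import psycopg2

conn = psycopg2.connect(host="localhost", port=8812,
                        user="admin", password="quest", dbname="qdb")
conn.autocommit = True
with conn.cursor() as cur:
    # The designated timestamp plus daily partitions drive QuestDB's
    # time-based storage layout and partition pruning.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sensor_demo (
            timestamp TIMESTAMP,
            device_id SYMBOL,
            value DOUBLE
        ) TIMESTAMP(timestamp) PARTITION BY DAY;
    """)
conn.close()
```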
On the Kafka side, the pipeline is designed for reliable delivery (see the producer sketch after this list):
- At-Least-Once Delivery: Guaranteed message processing
- Schema Evolution: Forward/backward-compatible data formats
- Exactly-Once Semantics: Available through Kafka transactions
- Backpressure Handling: Automatic flow control
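As one concrete knob, a minimal confluent-kafka producer sketch with idempotence enabled, which underpins Kafka's exactly-once guarantees. The broker address and topic name are illustrative:

```python
import json
from confluent_kafka import Producer

# enable.idempotence implies acks=all and safe retries, so each message
# is written to the partition log exactly once despite retransmissions.
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "enable.idempotence": True,
})

def on_delivery(err, msg):
    # Invoked from poll()/flush(); err is None on success.
    if err is not None:
        print(f"delivery failed: {err}")

event = {"symbol": "BTC-USD", "price": 64321.5, "volume": 0.25}
producer.produce("trades", value=json.dumps(event).encode("utf-8"),
                 on_delivery=on_delivery)
producer.flush()
```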
Observability is built in at every layer:
- System Metrics: CPU, memory, disk, and network monitoring via Telegraf
- Application Metrics: Kafka lag, QuestDB performance, ingestion rates
- Custom Dashboards: Grafana-based business intelligence
- Alerting: Configurable thresholds and notifications
| Service | Configuration Location | Key Settings |
|---|---|---|
| QuestDB | `questdb/` volume | Web console, ILP endpoint, Postgres port |
| Kafka | `broker-1/`, `broker-2/` | Replication, retention, partitions |
| Grafana | `dashboard/grafana/` | Datasources, dashboards, users |
| Jupyter | `notebooks/` | Pre-loaded notebooks and data |
```bash
# Docker user permissions
export DOCKER_COMPOSE_USER_ID=$(id -u)

# GitHub API access
export GITHUB_TOKEN=your_personal_access_token

# Custom configurations
export KAFKA_RETENTION_HOURS=168  # 7 days
export QUESTDB_HTTP_PORT=9000
export GRAFANA_PORT=3000
```

Stream-Syntra is designed for horizontal scaling:
- Kafka Brokers: Add more brokers in docker-compose.yml
- Kafka Connect Workers: Scale connect workers for higher throughput
- QuestDB Instances: Partition data across multiple QuestDB instances
- Monitoring: Add more Telegraf agents for distributed monitoring
For production deployments, also address:
- Security: Configure authentication, TLS/SSL, network policies
- Persistence: Use external volumes/persistent disks
- Monitoring: Set up external monitoring (Prometheus, DataDog, etc.)
- Backup: Implement database backup strategies
- Scaling: Configure auto-scaling policies
- Load Balancing: Add load balancers for high availability
Consider these enhancements for production:
- Kafka Schema Registry Clusters: Multi-region schema management
- QuestDB Enterprise: Advanced security and clustering features
- Grafana Enterprise: Advanced dashboarding and user management
- Confluent Platform: Enterprise Kafka features and support
- Kubernetes Deployment: Container orchestration at scale
Further reading:
- QuestDB Time-Series Guide - Advanced SQL for time-series
- Kafka Streaming Patterns - Stream processing architectures
- Grafana Dashboard Design - Effective visualization techniques
- Time-Series Forecasting - ML model implementation
Typical use cases include:
- Financial Trading: Real-time market data analysis and algorithmic trading
- IoT & Manufacturing: Industrial sensor monitoring and predictive maintenance
- DevOps & Monitoring: Application performance monitoring and log analytics
- Social Media Analytics: Real-time engagement tracking and sentiment analysis
- Supply Chain: Logistics optimization and inventory management
Common fixes:

```bash
# Fix Grafana volume permissions
sudo chown -R $(id -u):$(id -g) dashboard/grafana/home_dir/
export DOCKER_COMPOSE_USER_ID=$(id -u)
```

```bash
# Increase Docker memory allocation (8GB recommended)
docker system prune -a  # Clean up unused containers/images
```

```bash
# Check for port conflicts
netstat -tulpn | grep :3000
# Kill conflicting processes or modify docker-compose.yml ports
```

Monitor service health through these endpoints:
```bash
# QuestDB health
curl http://localhost:9003/status

# Kafka Connect status
curl http://localhost:8083/connectors

# Check running containers
docker-compose ps
```

```bash
# Graceful shutdown
docker-compose down

# Remove volumes (warning: deletes all data)
docker-compose down -v

# Complete cleanup (removes images)
docker-compose down -v --rmi all
```

```bash
# Backup QuestDB data
tar -czf questdb-backup-$(date +%Y%m%d).tar.gz questdb/questdb_root/

# Clean up old data (example for 30-day retention)
# Configure in QuestDB: ALTER TABLE trades DROP PARTITION WHERE timestamp < dateadd('d', -30, now())
```

Stream-Syntra delivers enterprise-grade performance:
| Metric | Performance | Configuration |
|---|---|---|
| Ingestion Rate | 1M+ events/sec | QuestDB ILP protocol |
| Query Latency | <100ms | Billion-row aggregations |
| Dashboard Refresh | 250ms | Real-time visualization |
| Storage Efficiency | 10:1 compression | Columnar time-series storage |
| Throughput | 10GB+/hour | Multi-broker Kafka cluster |
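Figures like these depend heavily on hardware, so treat them as targets rather than guarantees. A rough sketch for sanity-checking ILP send throughput on your own machine, assuming the default ILP TCP port 9009 and an illustrative `bench_data` table:

```python
import socket
import time

ROWS = 100_000

# Pre-build a batch of ILP lines so the timed section measures I/O only.
now_ns = time.time_ns()
batch = "".join(
    f"bench_data,device_id=meter-{i % 100} value={i * 0.5} {now_ns + i}\n"
    for i in range(ROWS)
).encode("utf-8")

start = time.perf_counter()
with socket.create_connection(("localhost", 9009)) as sock:
    sock.sendall(batch)
elapsed = time.perf_counter() - start

print(f"sent {ROWS} rows in {elapsed:.2f}s ({ROWS / elapsed:,.0f} rows/sec)")
```

Note this measures socket send throughput only; QuestDB commits asynchronously, so use it as an upper-bound smoke test.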
Stream-Syntra is designed as a learning platform and portfolio showcase. Contributions are welcome!
- Fork the repository
- Create a feature branch:

  ```bash
  git checkout -b feature/amazing-feature
  ```

- Make changes and test thoroughly
- Submit a pull request with a detailed description

Development guidelines:
- Follow existing code style and patterns
- Add comprehensive documentation
- Include unit tests for new features
- Update README.md for significant changes
This project is licensed under the MIT License - see the LICENSE file for details.
Stream-Syntra demonstrates expertise in:
- Real-Time Data Architecture - End-to-end streaming pipeline design
- Cloud-Native Technologies - Docker, microservices, container orchestration
- Big Data Engineering - High-throughput data processing and storage
- Data Visualization - Interactive dashboards and business intelligence
- Machine Learning Operations - ML model integration and time-series forecasting
- DevOps Practices - Infrastructure as code, monitoring, and observability
- Multi-Language Development - Python, Java, Go, Node.js, Rust implementations
- Database Expertise - Time-series databases, SQL optimization, data modeling
- API Integration - RESTful services, real-time APIs, schema management
- Production Readiness - Scalability, fault tolerance, security considerations
Stream-Syntra - Demonstrating Modern Data Engineering Excellence
LinkedIn: https://www.linkedin.com/in/michael-eniolade/ | Email: [email protected]
Built with passion for real-time analytics and modern data architecture
Keywords: Real-time Analytics, Apache Kafka, QuestDB, Time-Series Database, Stream Processing, Data Engineering, Docker, Grafana, Machine Learning, IoT Analytics, Financial Data, GitHub Analytics, Data Visualization, Business Intelligence