A real-time genetic risk prediction system built with Quarkus WebSockets, deployed on Azure Red Hat OpenShift with event-driven architecture and scale-to-zero capabilities.
This project implements a healthcare ML application that processes genetic data in real time using WebSocket connections, Kafka event streaming, and machine learning inference. The system is designed for cost-effective deployment on OpenShift, with comprehensive monitoring and security controls aimed at HIPAA compliance.
graph TB
Frontend[Web Frontend] --> WebSocket[Quarkus WebSocket Service]
WebSocket --> Kafka[Apache Kafka]
Kafka --> VEP[VEP Annotation Service]
VEP --> VEPAPI[Ensembl VEP API]
VEP --> Kafka
Kafka --> WebSocket
WebSocket --> Frontend
KEDA[KEDA Autoscaler] --> WebSocket
KEDA --> VEP
Insights[Red Hat Insights] --> Cost[Cost Management]
subgraph "OpenShift Cluster"
WebSocket
VEP
Kafka
KEDA
end
- Quarkus WebSocket Service: Real-time genetic data processing and session management
- 🔬 VEP Annotation Service: Genetic variant annotation using the Ensembl VEP API
- Apache Kafka: Event streaming backbone with multiple topics for different scaling modes
- ⚡ KEDA: Event-driven autoscaling for both pod and node scaling
- 💰 Red Hat Insights: Cost management and observability with chargeback capabilities
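The round trip in the diagram (browser to WebSocket service, through Kafka to the VEP service, and back) can be sketched in memory to show how a result finds its way back to the right client. This is a minimal illustration under stated assumptions, not the project's code: in-process queues stand in for Kafka topics, the VEP call is stubbed, and the results topic name `geneticDataAnnotated` and the session-id routing scheme are hypothetical.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/**
 * In-memory stand-in for the WebSocket -> Kafka -> VEP -> Kafka -> WebSocket loop.
 * Queues play the role of Kafka topics; nothing here talks to a real broker.
 */
class PipelineSketch {
    // Carries the session id so a result can be routed back to the client that sent it.
    record Event(String sessionId, String payload) {}

    // "genetic-data-raw" is named in the README; the results topic name is hypothetical.
    static final Queue<Event> geneticDataRaw = new ConcurrentLinkedQueue<>();
    static final Queue<Event> geneticDataAnnotated = new ConcurrentLinkedQueue<>();

    // Stand-in for open WebSocket sessions: session id -> last result pushed to that client.
    static final Map<String, String> delivered = new ConcurrentHashMap<>();

    // WebSocket service: accept a sequence from the browser and publish it.
    static void onClientMessage(String sessionId, String sequence) {
        geneticDataRaw.offer(new Event(sessionId, sequence));
    }

    // VEP service: consume one raw event and publish an annotated result.
    // The real service calls the Ensembl VEP API; this stub only tags the payload.
    static boolean vepStep() {
        Event raw = geneticDataRaw.poll();
        if (raw == null) {
            return false; // nothing to consume
        }
        geneticDataAnnotated.offer(new Event(raw.sessionId(), "annotated:" + raw.payload()));
        return true;
    }

    // WebSocket service: consume one result and push it to the originating session.
    static boolean deliverStep() {
        Event result = geneticDataAnnotated.poll();
        if (result == null) {
            return false;
        }
        delivered.put(result.sessionId(), result.payload());
        return true;
    }

    public static void main(String[] args) {
        onClientMessage("session-1", "ATCGATCGATCG");
        vepStep();
        deliverStep();
        System.out.println(delivered.get("session-1")); // annotated:ATCGATCGATCG
    }
}
```

Keeping the session id on every event is what lets the two services stay decoupled: the VEP side never needs to know which WebSocket connection a sequence came from.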
- Normal Mode: Pod scaling based on Kafka lag (genetic-data-raw topic)
  - KEDA Trigger: Kafka consumer lag threshold
  - Scaling: 0-10 pods based on message backlog
  - Use Case: Standard genetic sequence processing
- Big Data Mode: Memory-intensive processing (genetic-bigdata-raw topic)
  - KEDA Trigger: Higher partition count, memory-optimized scaling
  - Scaling: 0-5 pods with increased memory allocation
  - Use Case: Large genomic datasets, complex variant analysis
- ⚡ Node Scale Mode: Cluster autoscaler with dedicated compute nodes (genetic-nodescale-raw topic)
  - KEDA Trigger: Kafka lag + cluster autoscaler integration
  - Scaling: The cluster autoscaler adds compute-intensive nodes when scaled-out pods cannot be scheduled on existing capacity
  - Use Case: Massive workloads requiring additional cluster capacity
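To make the lag-based pod scaling concrete, here is a rough model of how a KEDA Kafka trigger turns consumer lag into a replica count. It is a sketch under assumptions, not KEDA's code: it assumes an AverageValue target (replicas = ceil(lag / lagThreshold)), the Kafka scaler's default of at most one consumer per partition (`allowIdleConsumers: false`), and `minReplicaCount: 0`. The `lagThreshold` values and partition counts used below are hypothetical, since the README does not state them; only the 0-10 and 0-5 pod ranges come from the modes above. HPA's tolerance, KEDA's `cooldownPeriod`, and the node provisioning of Node Scale Mode are not modeled.

```java
/**
 * Rough model of KEDA Kafka-lag scaling (assumptions listed in the text above).
 */
class KedaLagModel {
    static int desiredReplicas(long totalLag, long lagThreshold, long activationLagThreshold,
                               int partitions, int minReplicas, int maxReplicas) {
        if (totalLag <= activationLagThreshold) {
            return minReplicas; // trigger inactive: 0 replicas when minReplicas is 0
        }
        long byLag = (totalLag + lagThreshold - 1) / lagThreshold; // integer ceil(lag / threshold)
        // Never more consumers than partitions (default Kafka scaler behavior), nor above max.
        long capped = Math.min(Math.min(byLag, partitions), maxReplicas);
        return (int) Math.max(Math.max(capped, minReplicas), 1); // an active trigger keeps >= 1 pod
    }

    public static void main(String[] args) {
        // Normal Mode: 0-10 pods. lagThreshold=10 and 10 partitions are hypothetical values.
        System.out.println(desiredReplicas(0, 10, 0, 10, 0, 10));   // 0
        System.out.println(desiredReplicas(35, 10, 0, 10, 0, 10));  // 4
        System.out.println(desiredReplicas(500, 10, 0, 10, 0, 10)); // 10
        // Big Data Mode: 0-5 pods. lagThreshold=5 and 20 partitions are hypothetical values.
        System.out.println(desiredReplicas(100, 5, 0, 20, 0, 5));   // 5
    }
}
```

The partition cap is why the topic partition count matters as much as `maxReplicaCount`: with the default settings, extra pods beyond the partition count would sit idle, so KEDA does not request them.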
- Azure Red Hat OpenShift cluster with admin access
- OpenShift CLI (oc) installed and logged in
- Java 17 (hard requirement for local development)
- Podman (preferred over Docker for containerization)
- Git repository access
Start with the Getting Started Tutorial for a complete walkthrough.
For complete deployment instructions, see DEPLOYMENT.md.
🎯 Need help choosing? See the Deployment Decision Matrix.
✅ Comprehensive Deployment (Recommended):
# Clone repository
git clone https://github.com/tosin2013/healthcare-ml-genetic-predictor.git
cd healthcare-ml-genetic-predictor
# Run comprehensive enhanced deployment script (includes ALL components)
./scripts/deploy-clean-enhanced.sh
# Access application (get URL from script output)
⚡ Basic Deployment (Minimal Components):
# For basic deployment without KEDA scaling, OpenShift AI, or advanced features
./scripts/deploy-clean.sh
# Note: This deploys core components only. For full functionality, use the enhanced script above.
Manual Quick Start:
# 1. Clone repository
git clone https://github.com/tosin2013/healthcare-ml-genetic-predictor.git
cd healthcare-ml-genetic-predictor
# 2. Deploy operators
oc apply -k k8s/base/operators
# 3. Deploy infrastructure
oc apply -k k8s/base/infrastructure
# 4. Deploy applications
oc apply -k k8s/base/applications/quarkus-websocket -n healthcare-ml-demo
oc apply -k k8s/base/applications/vep-service -n healthcare-ml-demo
# 5. Grant permissions and start builds
oc policy add-role-to-user system:image-puller system:serviceaccount:healthcare-ml-demo:vep-service -n healthcare-ml-demo
oc start-build quarkus-websocket-service -n healthcare-ml-demo
oc start-build vep-service -n healthcare-ml-demo
# 6. Access application
oc get route quarkus-websocket-service -n healthcare-ml-demo
# Open: https://<route-url>/genetic-client.html
healthcare-ml-genetic-predictor/
├── quarkus-websocket-service/    # Quarkus WebSocket application
│   ├── src/main/java/            # Java source code
│   ├── src/main/resources/       # Application resources
│   └── pom.xml                   # Maven configuration
├── k8s/                          # OpenShift/Kubernetes manifests
│   ├── base/                     # Base Kustomize resources
│   │   ├── operators/            # Operator subscriptions
│   │   ├── infrastructure/       # Kafka, namespace
│   │   ├── applications/         # Application deployments
│   │   └── eventing/             # KEDA, Knative eventing
│   ├── overlays/                 # Environment-specific configs
│   │   ├── dev/                  # Development environment
│   │   ├── staging/              # Staging environment
│   │   └── prod/                 # Production environment
│   └── components/               # Reusable components
├── docs/                         # Documentation
├── research.md                   # Technical research notes
└── README.md                     # This file
- Quarkus 3.8.6: Cloud-native Java framework
- WebSockets: Real-time genetic data communication
- SmallRye Reactive Messaging: Kafka integration
- Micrometer: Metrics and monitoring
- Azure Red Hat OpenShift: Container orchestration
- AMQ Streams (Kafka): Event streaming platform
- OpenShift Serverless (Knative): Scale-to-zero services
- KEDA: Event-driven autoscaling
- OpenShift AI: ML model serving
- Kustomize: Configuration management
- OpenShift BuildConfig: Source-to-Image builds
- Red Hat Insights: Cost management
- Prometheus: Metrics collection
cd quarkus-websocket-service
./mvnw quarkus:dev
Open http://localhost:8080/genetic-client.html and test with sample genetic sequences:
- Basic DNA: ATCGATCGATCG
- Complex: ATGCGTACGTAGCTAGCTA
curl http://localhost:8080/q/health
curl http://localhost:8080/q/metrics
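The two curl checks can also be scripted from Java with the JDK's built-in `java.net.http.HttpClient`. The sketch below is hedged in two ways: it relies on SmallRye Health returning HTTP 200 when UP and 503 when DOWN, and its Prometheus parsing is deliberately naive (first sample only, no escaping rules, `+Inf` not handled). `DevEndpointProbe` and `firstSample` are hypothetical names, and `jvm_threads_live_threads` is used only as an example of a Micrometer JVM metric that is typically exported; substitute the metric you care about.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.OptionalDouble;

/** Minimal probe for the Quarkus /q/health and /q/metrics endpoints of a local dev instance. */
class DevEndpointProbe {
    // Naive reader for the Prometheus text format: returns the value of the first sample
    // of metricName, skipping comment lines and any label set or trailing timestamp.
    static OptionalDouble firstSample(String exposition, String metricName) {
        for (String line : exposition.split("\n")) {
            if (line.startsWith("#") || !line.startsWith(metricName)) {
                continue;
            }
            String rest = line.substring(metricName.length());
            if (rest.startsWith("{")) {
                int close = rest.lastIndexOf('}');
                if (close < 0) {
                    continue;
                }
                rest = rest.substring(close + 1);
            } else if (!rest.startsWith(" ")) {
                continue; // a different metric whose name merely starts with metricName
            }
            String[] parts = rest.trim().split("\\s+");
            try {
                return OptionalDouble.of(Double.parseDouble(parts[0]));
            } catch (NumberFormatException e) {
                // e.g. "+Inf", which Double.parseDouble does not accept; keep scanning
            }
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
        try {
            HttpResponse<String> health = client.send(
                    HttpRequest.newBuilder(URI.create("http://localhost:8080/q/health")).build(),
                    HttpResponse.BodyHandlers.ofString());
            // SmallRye Health signals overall status through the HTTP code (200 UP, 503 DOWN).
            System.out.println("health: " + (health.statusCode() == 200 ? "UP" : "DOWN"));

            HttpResponse<String> metrics = client.send(
                    HttpRequest.newBuilder(URI.create("http://localhost:8080/q/metrics")).build(),
                    HttpResponse.BodyHandlers.ofString());
            System.out.println("live threads: "
                    + firstSample(metrics.body(), "jvm_threads_live_threads"));
        } catch (IOException e) {
            System.out.println("Dev service not reachable: " + e.getMessage());
        }
    }
}
```

Run it while `./mvnw quarkus:dev` is up; if the service is not running, it reports that instead of failing with a stack trace.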
# Check KEDA ScaledObjects
oc get scaledobject -n healthcare-ml-demo
# Check HPA created by KEDA
oc get hpa -n healthcare-ml-demo
# Monitor scaling in action
watch oc get pods -n healthcare-ml-demo
# Check KEDA operator status
oc get pods -n openshift-keda | grep keda
- Red Hat Insights: Integrated cost tracking
- Cost Center: genomics-research
- Project: risk-predictor-v1
- Billing Model: Chargeback
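Chargeback amounts to attributing workload cost to whoever owns it, typically by grouping on labels such as the cost center and project above. The sketch below shows that roll-up in isolation; it is illustrative only. The label keys `cost-center` and `project`, the `ChargebackSketch` and `CostLine` names, and the cost figures are all hypothetical, and the real aggregation happens inside Red Hat Cost Management, not in application code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical chargeback roll-up: sum workload cost per value of one label key. */
class ChargebackSketch {
    // One costed workload and its labels (label keys here are assumptions).
    record CostLine(String namespace, Map<String, String> labels, double cost) {}

    static Map<String, Double> costByLabel(List<CostLine> lines, String labelKey) {
        Map<String, Double> totals = new TreeMap<>(); // sorted keys for stable output
        for (CostLine line : lines) {
            // Workloads missing the label land in an explicit bucket instead of vanishing.
            String owner = line.labels().getOrDefault(labelKey, "unlabeled");
            totals.merge(owner, line.cost(), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Illustrative numbers only.
        Map<String, String> labels = Map.of("cost-center", "genomics-research",
                "project", "risk-predictor-v1");
        List<CostLine> lines = List.of(
                new CostLine("healthcare-ml-demo", labels, 12.5),
                new CostLine("healthcare-ml-demo", labels, 7.5),
                new CostLine("other-team", Map.of(), 4.0));
        System.out.println(costByLabel(lines, "cost-center"));
        // {genomics-research=20.0, unlabeled=4.0}
    }
}
```

The "unlabeled" bucket is a design choice worth keeping in any real report: it makes missing labels visible rather than silently shifting cost to shared overhead.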
- Application metrics via Micrometer/Prometheus
- Kafka metrics for genetic data processing
- KEDA scaling metrics
- Custom healthcare ML metrics
- Non-root container execution
- Security Context Constraints (SCC)
- Network policies for traffic isolation
- Audit logging enabled
- TLS encryption for all communications
- RBAC for service account permissions
- Secure secrets management
- Container image scanning
Development environment:
- Minimal resource allocation
- Ephemeral storage
- Debug logging enabled
- Single replicas
Production environment:
- High availability setup
- Persistent storage with backup
- Strict resource limits
- Production monitoring
- KEDA: Kafka lag-based scaling
- Knative: HTTP traffic-based scaling
- Cold Start: <10 seconds
- Cost Optimization: No application pods run while idle, so scaled-to-zero services consume no pod resources
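Scale-to-zero is usually gated by a cooldown so a brief lull does not cause a cold start on the next message. The sketch below models a rule similar to KEDA's `cooldownPeriod`: zero replicas are allowed only after the trigger has been inactive (lag of 0 here) for the full cooldown. KEDA's default `cooldownPeriod` is 300 seconds; the README does not say which value this project configures, so 300 is only an example, and `ScaleToZeroSketch` is a hypothetical name.

```java
/** Sketch of a KEDA-style cooldown before scaling to zero (assumed semantics, see above). */
class ScaleToZeroSketch {
    private final long cooldownSeconds;
    private boolean everActive = false;
    private long lastActiveSeconds = 0;

    ScaleToZeroSketch(long cooldownSeconds) {
        this.cooldownSeconds = cooldownSeconds;
    }

    /** Records a lag sample taken at nowSeconds and reports whether zero replicas is allowed. */
    boolean observe(long nowSeconds, long lag) {
        if (lag > 0) {
            everActive = true;
            lastActiveSeconds = nowSeconds;
            return false; // backlog present: keep pods running
        }
        // Idle: allow zero once the cooldown has fully elapsed since the last active sample.
        return !everActive || nowSeconds - lastActiveSeconds >= cooldownSeconds;
    }

    public static void main(String[] args) {
        ScaleToZeroSketch s = new ScaleToZeroSketch(300);
        System.out.println(s.observe(0, 12));  // false: backlog present
        System.out.println(s.observe(120, 0)); // false: idle for 120 s only
        System.out.println(s.observe(300, 0)); // true: idle for the full 300 s cooldown
    }
}
```

A longer cooldown trades some idle cost for fewer cold starts; with a cold-start target under 10 seconds, a short cooldown is tolerable for bursty genetic workloads.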
- WebSocket connection: <100ms latency
- Genetic sequence processing: <500ms
- Kafka message throughput: 1000 msg/sec
- Concurrent connections: 100+
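As a rough sanity check on how these targets interact, Little's law relates throughput $\lambda$, time a message spends in the system $W$, and the average number of messages in flight $L$. This assumes steady state and treats the <500 ms processing target as the full time each message spends in the pipeline; because 500 ms is an upper bound, the result is a worst case:

```latex
L = \lambda W = 1000\,\tfrac{\text{msg}}{\text{s}} \times 0.5\,\text{s} = 500\ \text{messages in flight}
```

Spread across the 10-pod ceiling of Normal Mode, that is about 50 in-flight messages per pod, so each pod would need to process messages concurrently rather than one at a time (one at a time at 0.5 s each yields only 2 msg/s per pod) to sustain the target rate.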
We welcome contributions from the community! This project has many opportunities for enhancement and expansion.
- Code of Conduct - Our community standards
- Security Policy - How to report vulnerabilities
- Contributing Guide - Comprehensive contribution opportunities
- Red Hat Cost Management Console Access - Help validate console.redhat.com access
- Alternative Cost Visualization - Create local dashboards for cost monitoring
- Enhanced Security & Compliance - Implement healthcare-grade security features
- 🧬 Advanced ML Models - Expand genetic analysis capabilities
- Read: Contributing Guide for detailed opportunities
- Browse: Open Issues for current needs
- Start: Fork the repository and create a feature branch
- ✅ Test: Validate changes locally and on OpenShift
- Submit: Create a pull request with a clear description
- Cost Management: Console access validation and alternative dashboards
- Security: HIPAA compliance and healthcare-grade security features
- ML/AI: Advanced genetic analysis models and OpenShift AI integration
- Documentation: Tutorials, guides, and community resources
- Performance: Optimization and advanced scaling configurations
- Integration: Multi-cloud deployments and healthcare system integration
🎯 Complete Documentation Suite - Comprehensive Diátaxis framework documentation
- Tutorials: Getting Started | Local Development | Scaling Demo | Kafka Lag Scaling
- 🛠️ How-To Guides: Deploy to OpenShift | Advanced Troubleshooting
- Reference: API Reference | Configuration
- 💡 Explanation: System Architecture | Scaling Strategy
- Quarkus WebSocket Service - Threading validation and API endpoints
- OpenShift Deployment Guide - Kustomize-based deployment structure
- Technical Research - Research foundation and ML approaches
- Development Specification - Technical specifications and requirements
- Documentation: Start with the Complete Documentation Suite
- Advanced Troubleshooting: Use the Context-Aware Debugging Guide
- Augment Code: Leverage the AI-Assisted Development Guide
- System Logs: oc logs -f deployment/quarkus-websocket-service -n healthcare-ml-demo
- ⚙️ Configuration: kustomize build k8s/base
- Critical Issues: Follow Emergency Response Procedures
- System Recovery: Use Recovery Scripts for automated system restoration
- Cost Management: Monitor via Red Hat Insights Cost Dashboard
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Built with ❤️ for healthcare innovation on Azure Red Hat OpenShift