Healthcare ML Genetic Risk Predictor

A real-time genetic risk prediction system built with Quarkus WebSockets, deployed on Azure Red Hat OpenShift with event-driven architecture and scale-to-zero capabilities.

🧬 Overview

This project implements a healthcare ML application that processes genetic data in real time using WebSocket connections, Kafka event streaming, and machine learning inference. The system is designed for cost-effective deployment on OpenShift, with Prometheus-based monitoring and security controls that support HIPAA compliance.

🏗️ Architecture

graph TB
    Frontend[Web Frontend] --> WebSocket[Quarkus WebSocket Service]
    WebSocket --> Kafka[Apache Kafka]
    Kafka --> VEP[VEP Annotation Service]
    VEP --> VEPAPI[Ensembl VEP API]
    VEP --> Kafka
    Kafka --> WebSocket
    WebSocket --> Frontend

    KEDA[KEDA Autoscaler] --> WebSocket
    KEDA --> VEP

    Insights[Red Hat Insights] --> Cost[Cost Management]

    subgraph "OpenShift Cluster"
        WebSocket
        VEP
        Kafka
        KEDA
    end

Key Components

🌐 Quarkus WebSocket Service: Real-time genetic data processing and session management
🔬 VEP Annotation Service: Genetic variant annotation using Ensembl VEP API
📊 Apache Kafka: Event streaming backbone with multiple topics for different scaling modes
⚡ KEDA: Event-driven pod autoscaling from Kafka lag; node scaling follows through the cluster autoscaler when pending pods need more capacity
💰 Red Hat Insights: Cost management and observability with chargeback capabilities

Scaling Modes & KEDA Integration

📊 Normal Mode: Pod scaling based on Kafka lag (genetic-data-raw topic)
- KEDA Trigger: Kafka consumer lag threshold
- Scaling: 0-10 pods based on message backlog
- Use Case: Standard genetic sequence processing
🚀 Big Data Mode: Memory-intensive processing (genetic-bigdata-raw topic)
- KEDA Trigger: Kafka consumer lag on a topic with a higher partition count, with memory-optimized scaling
- Scaling: 0-5 pods with increased memory allocation
- Use Case: Large genomic datasets, complex variant analysis
⚡ Node Scale Mode: Cluster autoscaler with dedicated compute nodes (genetic-nodescale-raw topic)
- KEDA Trigger: Kafka lag + cluster autoscaler integration
- Scaling: Triggers new compute-intensive nodes when needed
- Use Case: Massive workloads requiring additional cluster capacity
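
The pod ranges above can be read as a simple rule: request roughly one pod per threshold's worth of consumer lag, never exceeding the mode's maximum, and drop to zero when there is no backlog. The sketch below illustrates that rule only. The class name, the method name, and the lag threshold of 5 are assumptions for illustration, not values taken from this repository's ScaledObjects. Real KEDA/HPA behavior also involves polling intervals, cooldown periods, and (by default) a cap at the topic's partition count, none of which is modeled here. Node Scale Mode is omitted because no pod range is stated for it.

```java
/** Hypothetical sketch: estimates the pod count a lag-based autoscaler would request. */
final class LagScalingSketch {

    /** Scaling modes with the topics and pod ceilings listed above. */
    enum Mode {
        NORMAL("genetic-data-raw", 10),
        BIG_DATA("genetic-bigdata-raw", 5);

        final String topic;
        final int maxReplicas;

        Mode(String topic, int maxReplicas) {
            this.topic = topic;
            this.maxReplicas = maxReplicas;
        }
    }

    /**
     * Approximates the replica count as ceil(totalLag / lagThreshold),
     * clamped to the range [0, mode.maxReplicas].
     * lagThreshold is an assumed tuning value, not one read from the cluster.
     */
    static int desiredReplicas(Mode mode, long totalLag, long lagThreshold) {
        if (lagThreshold <= 0) {
            throw new IllegalArgumentException("lagThreshold must be positive");
        }
        if (totalLag <= 0) {
            return 0; // no backlog: scale to zero
        }
        long wanted = (totalLag + lagThreshold - 1) / lagThreshold; // ceiling division
        return (int) Math.min(wanted, mode.maxReplicas);
    }

    public static void main(String[] args) {
        long threshold = 5; // assumed lag threshold per pod
        for (long lag : new long[] {0, 3, 12, 500}) {
            System.out.printf("lag=%d %s=%d %s=%d%n", lag,
                    Mode.NORMAL.topic, desiredReplicas(Mode.NORMAL, lag, threshold),
                    Mode.BIG_DATA.topic, desiredReplicas(Mode.BIG_DATA, lag, threshold));
        }
    }
}
```

With the assumed threshold of 5, a backlog of 12 messages asks for 3 pods in either mode, while a backlog of 500 is capped at 10 pods in Normal Mode and 5 in Big Data Mode.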

🚀 Quick Start

Prerequisites

Azure Red Hat OpenShift cluster with admin access
OpenShift CLI (oc) installed and logged in
Java 17 (hard requirement for local development)
Podman (preferred over Docker for containerization)
Git repository access

🎯 Choose Your Path

🎓 New to the System?

Start with the Getting Started Tutorial for a complete walkthrough.

🚀 Quick Deploy (Experienced Users)

📖 For complete deployment instructions, see DEPLOYMENT.md
🎯 Need help choosing? See Deployment Decision Matrix

✅ Comprehensive Deployment (Recommended):

# Clone repository
git clone https://github.com/tosin2013/healthcare-ml-genetic-predictor.git
cd healthcare-ml-genetic-predictor

# Run comprehensive enhanced deployment script (includes ALL components)
./scripts/deploy-clean-enhanced.sh

# Access application (get URL from script output)

⚡ Basic Deployment (Minimal Components):

# For basic deployment without KEDA scaling, OpenShift AI, or advanced features
./scripts/deploy-clean.sh

# Note: This deploys core components only. For full functionality, use enhanced script above.

Manual Quick Start:

# 1. Clone repository
git clone https://github.com/tosin2013/healthcare-ml-genetic-predictor.git
cd healthcare-ml-genetic-predictor

# 2. Deploy operators
oc apply -k k8s/base/operators

# 3. Deploy infrastructure
oc apply -k k8s/base/infrastructure

# 4. Deploy applications
oc apply -k k8s/base/applications/quarkus-websocket -n healthcare-ml-demo
oc apply -k k8s/base/applications/vep-service -n healthcare-ml-demo

# 5. Grant permissions and start builds
oc policy add-role-to-user system:image-puller system:serviceaccount:healthcare-ml-demo:vep-service -n healthcare-ml-demo
oc start-build quarkus-websocket-service -n healthcare-ml-demo
oc start-build vep-service -n healthcare-ml-demo

# 6. Access application
oc get route quarkus-websocket-service -n healthcare-ml-demo
# Open: https://<route-url>/genetic-client.html

📁 Project Structure

healthcare-ml-genetic-predictor/
├── quarkus-websocket-service/          # Quarkus WebSocket application
│   ├── src/main/java/                  # Java source code
│   ├── src/main/resources/             # Application resources
│   └── pom.xml                         # Maven configuration
├── k8s/                                # OpenShift/Kubernetes manifests
│   ├── base/                           # Base Kustomize resources
│   │   ├── operators/                  # Operator subscriptions
│   │   ├── infrastructure/             # Kafka, namespace
│   │   ├── applications/               # Application deployments
│   │   └── eventing/                   # KEDA, Knative eventing
│   ├── overlays/                       # Environment-specific configs
│   │   ├── dev/                        # Development environment
│   │   ├── staging/                    # Staging environment
│   │   └── prod/                       # Production environment
│   └── components/                     # Reusable components
├── docs/                               # Documentation
├── research.md                         # Technical research notes
└── README.md                           # This file

🔧 Technology Stack

Application Layer

Quarkus 3.8.6: Cloud-native Java framework
WebSockets: Real-time genetic data communication
SmallRye Reactive Messaging: Kafka integration
Micrometer: Metrics and monitoring

Infrastructure Layer

Azure Red Hat OpenShift: Container orchestration
AMQ Streams (Kafka): Event streaming platform
OpenShift Serverless (Knative): Scale-to-zero services
KEDA: Event-driven autoscaling
OpenShift AI: ML model serving

Deployment & Operations

Kustomize: Configuration management
OpenShift BuildConfig: Source-to-Image builds
Red Hat Insights: Cost management
Prometheus: Metrics collection

🧪 Testing

Local Development

cd quarkus-websocket-service
./mvnw quarkus:dev

WebSocket Testing

Open http://localhost:8080/genetic-client.html and test with sample genetic sequences:

Basic DNA: ATCGATCGATCG
Complex: ATGCGTACGTAGCTAGCTA
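
Before pasting a sequence into the client, it can help to confirm that it contains only the four unambiguous DNA bases. The following is a minimal pre-check sketch, assuming the service expects only A, C, G, and T. The validation rules the service actually applies are not described here, and `SequenceCheck` and `isValidDna` are hypothetical names.

```java
import java.util.Locale;

/** Hypothetical client-side pre-check for test sequences. */
final class SequenceCheck {

    /** Returns true if the input is non-blank and contains only A, C, G, or T (case-insensitive). */
    static boolean isValidDna(String sequence) {
        if (sequence == null || sequence.isBlank()) {
            return false;
        }
        String normalized = sequence.strip().toUpperCase(Locale.ROOT);
        for (int i = 0; i < normalized.length(); i++) {
            if ("ACGT".indexOf(normalized.charAt(i)) < 0) {
                return false; // found a character that is not a DNA base
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The two sample sequences listed above, plus one invalid input.
        for (String s : new String[] {"ATCGATCGATCG", "ATGCGTACGTAGCTAGCTA", "ATCGXTCG"}) {
            System.out.println(s + " -> " + isValidDna(s));
        }
    }
}
```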

Health Checks

curl http://localhost:8080/q/health
curl http://localhost:8080/q/metrics
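
The same health check can be scripted from Java, for example in a smoke test. This sketch relies on the MicroProfile Health convention that the endpoint answers HTTP 200 when the application is UP and 503 when it is DOWN. The class name `HealthProbe` and the timeout values are assumptions for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper that queries /q/health and interprets the HTTP status. */
final class HealthProbe {

    /** Maps the HTTP status code to a health state (200 = UP, 503 = DOWN). */
    static String interpret(int statusCode) {
        switch (statusCode) {
            case 200: return "UP";
            case 503: return "DOWN";
            default:  return "UNKNOWN";
        }
    }

    /** Sends GET {baseUrl}/q/health and returns the interpreted state. */
    static String check(String baseUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2)) // assumed timeout
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/q/health"))
                .timeout(Duration.ofSeconds(5)) // assumed timeout
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return interpret(response.statusCode());
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080";
        System.out.println(check(baseUrl));
    }
}
```

Run it while `./mvnw quarkus:dev` is active. Passing a different base URL as the first argument points the probe at a deployed route instead.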

KEDA Scaling Verification

# Check KEDA ScaledObjects
oc get scaledobject -n healthcare-ml-demo

# Check HPA created by KEDA
oc get hpa -n healthcare-ml-demo

# Monitor scaling in action
watch oc get pods -n healthcare-ml-demo

# Check KEDA operator status
oc get pods -n openshift-keda | grep keda

📊 Monitoring & Observability

Cost Management

Red Hat Insights: Integrated cost tracking
Cost Center: genomics-research
Project: risk-predictor-v1
Billing Model: Chargeback

Metrics

Application metrics via Micrometer/Prometheus
Kafka metrics for genetic data processing
KEDA scaling metrics
Custom healthcare ML metrics

🔒 Security & Compliance

HIPAA Compliance

Non-root container execution
Security Context Constraints (SCC)
Network policies for traffic isolation
Audit logging enabled

Security Features

TLS encryption for all communications
RBAC for service account permissions
Secure secrets management
Container image scanning

🌍 Environment Configuration

Development

Minimal resource allocation
Ephemeral storage
Debug logging enabled
Single replicas

Production

High availability setup
Persistent storage with backup
Strict resource limits
Production monitoring

📈 Scaling & Performance

Scale-to-Zero

KEDA: Kafka lag-based scaling
Knative: HTTP traffic-based scaling
Cold Start: <10 seconds
Cost Optimization: No application pods running when idle, so idle workloads consume no pod compute

Performance Targets

WebSocket connection: <100ms latency
Genetic sequence processing: <500ms
Kafka message throughput: 1000 msg/sec
Concurrent connections: 100+

🤝 Contributing

We welcome contributions from the community! This project has many opportunities for enhancement and expansion.

Community Guidelines

Code of Conduct - Our community standards
Security Policy - How to report vulnerabilities
Contributing Guide - Comprehensive contribution opportunities

High Priority Areas

🔥 Red Hat Cost Management Console Access - Help validate console.redhat.com access
📊 Alternative Cost Visualization - Create local dashboards for cost monitoring
🔒 Enhanced Security & Compliance - Implement healthcare-grade security features
🧬 Advanced ML Models - Expand genetic analysis capabilities

Quick Start for Contributors

📖 Read: Contributing Guide for detailed opportunities
🔍 Browse: Open Issues for current needs
🚀 Start: Fork the repository and create a feature branch
✅ Test: Validate changes locally and on OpenShift
📝 Submit: Create a pull request with clear description

Areas Seeking Contributions

Cost Management: Console access validation and alternative dashboards
Security: HIPAA compliance and healthcare-grade security features
ML/AI: Advanced genetic analysis models and OpenShift AI integration
Documentation: Tutorials, guides, and community resources
Performance: Optimization and advanced scaling configurations
Integration: Multi-cloud deployments and healthcare system integration

📚 Documentation

🎯 Complete Documentation Suite - Comprehensive Diátaxis framework documentation

Quick Access

🎓 Tutorials: Getting Started | Local Development | Scaling Demo | Kafka Lag Scaling
🛠️ How-To Guides: Deploy to OpenShift | Advanced Troubleshooting
📖 Reference: API Reference | Configuration
💡 Explanation: System Architecture | Scaling Strategy

Component Documentation

Quarkus WebSocket Service - Threading validation and API endpoints
OpenShift Deployment Guide - Kustomize-based deployment structure
Technical Research - Research foundation and ML approaches
Development Specification - Technical specifications and requirements

📞 Support

Getting Help

📚 Documentation: Start with the Complete Documentation Suite
🔧 Advanced Troubleshooting: Use Context-Aware Debugging Guide
🧠 Augment Code: Leverage AI-Assisted Development Guide
📊 System Logs: oc logs -f deployment/quarkus-websocket-service -n healthcare-ml-demo
⚙️ Configuration: kustomize build k8s/base

Emergency Response

Critical Issues: Follow Emergency Response Procedures
System Recovery: Use Recovery Scripts for automated system restoration
Cost Management: Monitor via Red Hat Insights Cost Dashboard

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Built with ❤️ for healthcare innovation on Azure Red Hat OpenShift
