A real-time genetic risk prediction system built with Quarkus WebSockets, deployed on Azure Red Hat OpenShift with event-driven architecture and scale-to-zero capabilities.

tosin2013/healthcare-ml-genetic-predictor

Healthcare ML Genetic Risk Predictor

🧬 Overview

This project implements a healthcare ML application that processes genetic data in real time using WebSocket connections, Kafka event streaming, and machine learning inference. The system is designed for cost-effective deployment on OpenShift, with comprehensive monitoring and security controls aimed at HIPAA compliance.

🏗️ Architecture

graph TB
    Frontend[Web Frontend] --> WebSocket[Quarkus WebSocket Service]
    WebSocket --> Kafka[Apache Kafka]
    Kafka --> VEP[VEP Annotation Service]
    VEP --> VEPAPI[Ensembl VEP API]
    VEP --> Kafka
    Kafka --> WebSocket
    WebSocket --> Frontend

    KEDA[KEDA Autoscaler] --> WebSocket
    KEDA --> VEP

    Insights[Red Hat Insights] --> Cost[Cost Management]

    subgraph "OpenShift Cluster"
        WebSocket
        VEP
        Kafka
        KEDA
    end

Key Components

  • 🌐 Quarkus WebSocket Service: Real-time genetic data processing and session management
  • 🔬 VEP Annotation Service: Genetic variant annotation using the Ensembl VEP API
  • 📊 Apache Kafka: Event streaming backbone with multiple topics for different scaling modes
  • ⚡ KEDA: Event-driven pod autoscaling; node scaling follows through the cluster autoscaler when scaled-out pods cannot be scheduled
  • 💰 Red Hat Insights: Cost management and observability with chargeback capabilities
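The Frontend → WebSocket → Kafka → VEP → Kafka → WebSocket round trip implies that the WebSocket service has to match each annotated result to the browser session that submitted the sequence. A minimal sketch of one way to do that is an in-memory registry keyed by session ID. The class name `SessionRouter`, its methods, and the assumption that every Kafka message carries a session ID are hypothetical and not taken from this repository.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not the project's actual class): routes annotated
 * results consumed from Kafka back to the WebSocket session that sent the
 * original sequence. Assumes every message carries a session ID.
 */
final class SessionRouter {
    // Concurrent map because WebSocket callbacks and Kafka consumers run on different threads.
    private final Map<String, Consumer<String>> senders = new ConcurrentHashMap<>();

    /** Called when a WebSocket session opens; `sender` pushes text to that client. */
    void register(String sessionId, Consumer<String> sender) {
        senders.put(sessionId, sender);
    }

    /** Called when a WebSocket session closes. */
    void unregister(String sessionId) {
        senders.remove(sessionId);
    }

    /**
     * Delivers a result to its session. Returns false when the session is gone
     * (e.g. the browser disconnected before annotation finished), so the caller
     * can log or drop the result instead of failing.
     */
    boolean deliver(String sessionId, String annotatedResult) {
        Consumer<String> sender = senders.get(sessionId);
        if (sender == null) {
            return false;
        }
        sender.accept(annotatedResult);
        return true;
    }
}
```

Because the registry lives in one pod's memory, a result consumed by a different WebSocket replica will not find its session; reporting a miss rather than throwing keeps that case visible without crashing the consumer.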

Scaling Modes & KEDA Integration

  1. 📊 Normal Mode: Pod scaling based on Kafka lag (genetic-data-raw topic)
    • KEDA Trigger: Kafka consumer lag threshold
    • Scaling: 0-10 pods based on message backlog
    • Use Case: Standard genetic sequence processing
  2. 🚀 Big Data Mode: Memory-intensive processing (genetic-bigdata-raw topic)
    • KEDA Trigger: Higher partition count, memory-optimized scaling
    • Scaling: 0-5 pods with increased memory allocation
    • Use Case: Large genomic datasets, complex variant analysis
  3. ⚡ Node Scale Mode: Cluster autoscaler with dedicated compute nodes (genetic-nodescale-raw topic)
    • KEDA Trigger: Kafka lag + cluster autoscaler integration
    • Scaling: Triggers new compute-intensive nodes when needed
    • Use Case: Massive workloads requiring additional cluster capacity
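Each mode is tied to its own Kafka topic, so choosing a mode amounts to choosing the topic a message is published to; the KEDA trigger on that topic then decides how consumers scale. A minimal sketch of that mapping, assuming the client sends the mode as a string: the enum, the `fromName` helper, and the accepted spellings are hypothetical, and only the three topic names come from the list above.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: maps the three scaling modes to their Kafka topics.
 * Each topic has its own KEDA trigger, so the topic choice selects the
 * scaling behaviour. The accepted mode names are assumptions.
 */
enum ScalingMode {
    NORMAL("genetic-data-raw"),
    BIG_DATA("genetic-bigdata-raw"),
    NODE_SCALE("genetic-nodescale-raw");

    private final String topic;

    ScalingMode(String topic) {
        this.topic = topic;
    }

    /** Kafka topic that messages for this mode are published to. */
    String topic() {
        return topic;
    }

    /** Parses a client-supplied mode name, ignoring case and surrounding whitespace. */
    static ScalingMode fromName(String name) {
        return switch (name.trim().toLowerCase(Locale.ROOT)) {
            case "normal" -> NORMAL;
            case "bigdata", "big-data" -> BIG_DATA;
            case "nodescale", "node-scale" -> NODE_SCALE;
            default -> throw new IllegalArgumentException("Unknown scaling mode: " + name);
        };
    }
}
```

Keeping the topic names in one place means the WebSocket endpoint and any tests agree on them; the actual service may instead read topic names from its Quarkus configuration.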

🚀 Quick Start

Prerequisites

  • Azure Red Hat OpenShift cluster with admin access
  • OpenShift CLI (oc) installed and logged in
  • Java 17 (hard requirement for local development)
  • Podman (preferred over Docker for containerization)
  • Git repository access

🎯 Choose Your Path

🎓 New to the System?

Start with the Getting Started Tutorial for a complete walkthrough.

🚀 Quick Deploy (Experienced Users)

📖 For complete deployment instructions, see DEPLOYMENT.md.

🎯 Need help choosing? See the Deployment Decision Matrix.

✅ Comprehensive Deployment (Recommended):

# Clone repository
git clone https://github.com/tosin2013/healthcare-ml-genetic-predictor.git
cd healthcare-ml-genetic-predictor

# Run comprehensive enhanced deployment script (includes ALL components)
./scripts/deploy-clean-enhanced.sh

# Access application (get URL from script output)

⚡ Basic Deployment (Minimal Components):

# For basic deployment without KEDA scaling, OpenShift AI, or advanced features
./scripts/deploy-clean.sh

# Note: This deploys core components only. For full functionality, use the enhanced script above.

Manual Quick Start:

# 1. Clone repository
git clone https://github.com/tosin2013/healthcare-ml-genetic-predictor.git
cd healthcare-ml-genetic-predictor

# 2. Deploy operators
oc apply -k k8s/base/operators

# 3. Deploy infrastructure
oc apply -k k8s/base/infrastructure

# 4. Deploy applications
oc apply -k k8s/base/applications/quarkus-websocket -n healthcare-ml-demo
oc apply -k k8s/base/applications/vep-service -n healthcare-ml-demo

# 5. Grant permissions and start builds
oc policy add-role-to-user system:image-puller system:serviceaccount:healthcare-ml-demo:vep-service -n healthcare-ml-demo
oc start-build quarkus-websocket-service -n healthcare-ml-demo
oc start-build vep-service -n healthcare-ml-demo

# 6. Access application
oc get route quarkus-websocket-service -n healthcare-ml-demo
# Open: https://<route-url>/genetic-client.html

📁 Project Structure

healthcare-ml-genetic-predictor/
├── quarkus-websocket-service/          # Quarkus WebSocket application
│   ├── src/main/java/                  # Java source code
│   ├── src/main/resources/             # Application resources
│   └── pom.xml                         # Maven configuration
├── k8s/                                # OpenShift/Kubernetes manifests
│   ├── base/                           # Base Kustomize resources
│   │   ├── operators/                  # Operator subscriptions
│   │   ├── infrastructure/             # Kafka, namespace
│   │   ├── applications/               # Application deployments
│   │   └── eventing/                   # KEDA, Knative eventing
│   ├── overlays/                       # Environment-specific configs
│   │   ├── dev/                        # Development environment
│   │   ├── staging/                    # Staging environment
│   │   └── prod/                       # Production environment
│   └── components/                     # Reusable components
├── docs/                               # Documentation
├── research.md                         # Technical research notes
└── README.md                           # This file

🔧 Technology Stack

Application Layer

  • Quarkus 3.8.6: Cloud-native Java framework
  • WebSockets: Real-time genetic data communication
  • SmallRye Reactive Messaging: Kafka integration
  • Micrometer: Metrics and monitoring

Infrastructure Layer

  • Azure Red Hat OpenShift: Container orchestration
  • AMQ Streams (Kafka): Event streaming platform
  • OpenShift Serverless (Knative): Scale-to-zero services
  • KEDA: Event-driven autoscaling
  • OpenShift AI: ML model serving

Deployment & Operations

  • Kustomize: Configuration management
  • OpenShift BuildConfig: Source-to-Image builds
  • Red Hat Insights: Cost management
  • Prometheus: Metrics collection

🧪 Testing

Local Development

cd quarkus-websocket-service
./mvnw quarkus:dev

WebSocket Testing

Open http://localhost:8080/genetic-client.html and test with sample genetic sequences:

  • Basic DNA: ATCGATCGATCG
  • Complex: ATGCGTACGTAGCTAGCTA
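Both sample inputs contain only the bases A, C, G and T. When scripting your own test inputs, a small normalizer can catch typos before they reach the WebSocket. The `SequenceInput` helper below is hypothetical and not part of the project, and its strict A/C/G/T-only rule is an assumption; the real service may accept other symbols such as the IUPAC ambiguity code N.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/**
 * Hypothetical client-side check: strips whitespace, uppercases, and accepts
 * only A, C, G and T (like the sample sequences). Not the service's own validation.
 */
final class SequenceInput {
    // One or more bases, nothing else.
    private static final Pattern DNA = Pattern.compile("[ACGT]+");

    static String normalize(String raw) {
        String cleaned = raw.replaceAll("\\s+", "").toUpperCase(Locale.ROOT);
        if (!DNA.matcher(cleaned).matches()) {
            throw new IllegalArgumentException("Not a DNA sequence: " + raw);
        }
        return cleaned;
    }
}
```

For example, `SequenceInput.normalize("atcg atcg atcg")` returns `ATCGATCGATCG`, the Basic DNA sample above.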

Health Checks

curl http://localhost:8080/q/health
curl http://localhost:8080/q/metrics
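Quarkus serves SmallRye Health at `/q/health`, which returns JSON with an overall `status` of `UP` or `DOWN` plus per-check statuses, and answers with HTTP 503 when the status is `DOWN`. The sketch below scripts the same check with the JDK's `java.net.http.HttpClient`. The `HealthProbe` class is hypothetical, and its regex-based parsing is a dependency-free shortcut rather than a full JSON parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for scripting the /q/health check shown above. */
final class HealthProbe {
    // Matches every "status": "UP"/"DOWN" pair: the overall status and each individual check.
    private static final Pattern STATUS = Pattern.compile("\"status\"\\s*:\\s*\"(UP|DOWN)\"");

    /** True only if at least one status is present and none of them is DOWN. */
    static boolean isUp(String body) {
        Matcher m = STATUS.matcher(body);
        boolean sawStatus = false;
        while (m.find()) {
            sawStatus = true;
            if (m.group(1).equals("DOWN")) {
                return false;
            }
        }
        return sawStatus;
    }

    /** Calls {baseUrl}/q/health, e.g. baseUrl = "http://localhost:8080". */
    static boolean check(String baseUrl) throws Exception {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(5)).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/q/health")).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // A DOWN result comes back as HTTP 503, so check the status code as well as the body.
        return response.statusCode() == 200 && isUp(response.body());
    }
}
```

Against the local dev server, `HealthProbe.check("http://localhost:8080")` should return true once every registered health check reports UP.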

KEDA Scaling Verification

# Check KEDA ScaledObjects
oc get scaledobject -n healthcare-ml-demo

# Check HPA created by KEDA
oc get hpa -n healthcare-ml-demo

# Monitor scaling in action
watch oc get pods -n healthcare-ml-demo

# Check KEDA operator status
oc get pods -n openshift-keda | grep keda

📊 Monitoring & Observability

Cost Management

  • Red Hat Insights: Integrated cost tracking
  • Cost Center: genomics-research
  • Project: risk-predictor-v1
  • Billing Model: Chargeback

Metrics

  • Application metrics via Micrometer/Prometheus
  • Kafka metrics for genetic data processing
  • KEDA scaling metrics
  • Custom healthcare ML metrics

🔒 Security & Compliance

HIPAA Compliance

  • Non-root container execution
  • Security Context Constraints (SCC)
  • Network policies for traffic isolation
  • Audit logging enabled

Security Features

  • TLS encryption for all communications
  • RBAC for service account permissions
  • Secure secrets management
  • Container image scanning

🌍 Environment Configuration

Development

  • Minimal resource allocation
  • Ephemeral storage
  • Debug logging enabled
  • Single replicas

Production

  • High availability setup
  • Persistent storage with backup
  • Strict resource limits
  • Production monitoring

📈 Scaling & Performance

Scale-to-Zero

  • KEDA: Kafka lag-based scaling
  • Knative: HTTP traffic-based scaling
  • Cold Start: <10 seconds
  • Cost Optimization: No application pods, and therefore no pod compute cost, while idle
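The replica count behind Kafka lag-based scaling can be estimated from the backlog. The sketch below is a simplified model, not KEDA's source code. It assumes the Kafka scaler's `AverageValue` target, so the HPA aims for roughly `ceil(totalLag / lagThreshold)` replicas, capped at the topic's partition count unless `allowIdleConsumers` is enabled, and bounded by `minReplicaCount`/`maxReplicaCount`. It ignores HPA tolerance, stabilization windows, the cooldown period, and non-zero activation thresholds; the class name and example values are hypothetical.

```java
/**
 * Simplified model (an assumption, not KEDA source code) of the replica count
 * KEDA's Kafka scaler plus the HPA converge on for a given consumer lag.
 */
final class KafkaLagScalingModel {

    static int desiredReplicas(long totalLag, long lagThreshold, int partitions,
                               int minReplicas, int maxReplicas, boolean allowIdleConsumers) {
        if (lagThreshold <= 0 || partitions <= 0 || minReplicas < 0 || maxReplicas < minReplicas) {
            throw new IllegalArgumentException("invalid scaling parameters");
        }
        // No backlog: the trigger is inactive and the workload falls back to
        // minReplicas (0 means scale-to-zero, applied after the cooldown period).
        if (totalLag <= 0) {
            return minReplicas;
        }
        // HPA with an AverageValue target: ceil(totalLag / lagThreshold).
        long replicas = (totalLag + lagThreshold - 1) / lagThreshold;
        // Consumers beyond the partition count would sit idle, so cap them by default.
        if (!allowIdleConsumers) {
            replicas = Math.min(replicas, partitions);
        }
        // Any backlog activates at least one replica; then clamp to the configured bounds.
        replicas = Math.max(replicas, Math.max(minReplicas, 1));
        return (int) Math.min(replicas, maxReplicas);
    }
}
```

For example, with the Normal-mode bounds of 0 to 10 pods, an assumed `lagThreshold` of 10, and 10 partitions, a backlog of 25 messages yields 3 replicas, while a backlog of 1,000 is capped at 10.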

Performance Targets

  • WebSocket connection: <100ms latency
  • Genetic sequence processing: <500ms
  • Kafka message throughput: 1000 msg/sec
  • Concurrent connections: 100+
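The throughput and latency targets are linked. If both hold at the same time, Little's law (average number of items in a system equals arrival rate times average time spent in it) gives a rough ceiling on how many messages are being processed at once. This is a back-of-the-envelope estimate that assumes steady state and treats the 500 ms processing target as an upper bound on the average time per message.

```latex
L = \lambda W \le 1000\ \text{msg/s} \times 0.5\ \text{s} = 500\ \text{messages in flight}
```

So the WebSocket, Kafka, and VEP stages together should be sized to hold up to roughly 500 in-flight messages when running at the target throughput.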

🀝 Contributing

We welcome contributions from the community! This project has many opportunities for enhancement and expansion.

Community Guidelines

High Priority Areas

  • 🔥 Red Hat Cost Management Console Access - Help validate console.redhat.com access
  • 📊 Alternative Cost Visualization - Create local dashboards for cost monitoring
  • 🔒 Enhanced Security & Compliance - Implement healthcare-grade security features
  • 🧬 Advanced ML Models - Expand genetic analysis capabilities

Quick Start for Contributors

  1. 📖 Read: the Contributing Guide for detailed opportunities
  2. 🔍 Browse: Open Issues for current needs
  3. 🚀 Start: Fork the repository and create a feature branch
  4. ✅ Test: Validate changes locally and on OpenShift
  5. 📝 Submit: Create a pull request with a clear description

Areas Seeking Contributions

  • Cost Management: Console access validation and alternative dashboards
  • Security: HIPAA compliance and healthcare-grade security features
  • ML/AI: Advanced genetic analysis models and OpenShift AI integration
  • Documentation: Tutorials, guides, and community resources
  • Performance: Optimization and advanced scaling configurations
  • Integration: Multi-cloud deployments and healthcare system integration

📚 Documentation

🎯 Complete Documentation Suite: comprehensive documentation organized with the Diátaxis framework

Quick Access

Component Documentation

📞 Support

Getting Help

  1. 📚 Documentation: Start with the Complete Documentation Suite
  2. 🔧 Advanced Troubleshooting: Use the Context-Aware Debugging Guide
  3. 🧠 Augment Code: Use the AI-Assisted Development Guide
  4. 📊 System Logs: oc logs -f deployment/quarkus-websocket-service -n healthcare-ml-demo
  5. ⚙️ Configuration: kustomize build k8s/base

Emergency Response

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Built with ❤️ for healthcare innovation on Azure Red Hat OpenShift
