
AI Talent Hub Hackathon 2023, Data Driven Life Science track

TSheyd/AITH-hack23


AITH-hack23


ML-Based Biomarker Discovery on Bulk RNA-Seq Data

This tool aims to provide researchers with a simpler and more effective workflow than conventional tools such as DESeq2 for detecting biomarkers: genes whose expression may indicate a specific biological condition (e.g. a disease).

Team:

  • Fedor Logvin (ITMO University, Saint Petersburg, Russia)
  • Anton Changailidi (ITMO University, Saint Petersburg, Russia)
  • Danil Trotsenko (ITMO University, Saint Petersburg, Russia)
  • Timur Sheydaev (ITMO University, Saint Petersburg, Russia)
  • Xenia Sukhanova (ITMO University, Saint Petersburg, Russia)

ML Pipeline

The analysis consists of the following steps:

  1. A maximum-based low-count filter (Rau et al.) removes genes with low counts.
  2. A Bayesian hyperparameter search finds the best hyperparameter values for the XGBoost and elastic-net logistic regression models.
  3. Each model is fitted, using the best hyperparameters found in step 2, on a specified number of random subsamples, each containing 80% of the rows (samples).
  4. For both models, the n_obs most important genes are retained from each iteration; at the end, all genes that occur in at least a specified number of iterations are kept.
  5. To determine whether the expression of the selected genes differs significantly between the defined groups, a Mann-Whitney U test is performed, with the false discovery rate (FDR) controlled at α = 0.05.
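The idea behind step 1 can be sketched as follows. This is a simplified, stdlib-only illustration, not the repository's C# filter: a gene is dropped when its maximum count across samples is below a threshold s, and s is picked from a list of candidates by maximizing the average Jaccard similarity between replicate pairs within each group, in the spirit of Rau et al. The function names, the dict-of-lists count layout and the candidate grid are assumptions; normalization and any smoothing used in the original method are omitted.

```python
from itertools import combinations
from typing import Dict, List, Sequence

# Hypothetical layout (assumption): gene name -> raw counts, one value per sample.
Counts = Dict[str, List[float]]


def max_filter(counts: Counts, threshold: float) -> Counts:
    """Keep genes whose maximum count across all samples reaches the threshold."""
    return {gene: row for gene, row in counts.items() if max(row) >= threshold}


def replicate_similarity(counts: Counts, groups: Dict[str, Sequence[int]], threshold: float) -> float:
    """Mean Jaccard index over all replicate pairs within each group.

    In each sample, a gene counts as 'expressed' if its count exceeds the
    threshold; only genes that pass max_filter are considered.
    """
    kept = max_filter(counts, threshold)
    scores = []
    for members in groups.values():
        for i, j in combinations(members, 2):
            expressed_i = {g for g, row in kept.items() if row[i] > threshold}
            expressed_j = {g for g, row in kept.items() if row[j] > threshold}
            union = expressed_i | expressed_j
            if union:  # skip replicate pairs where nothing is expressed
                scores.append(len(expressed_i & expressed_j) / len(union))
    return sum(scores) / len(scores) if scores else 0.0


def choose_threshold(counts: Counts, groups: Dict[str, Sequence[int]], candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the highest replicate similarity."""
    return max(candidates, key=lambda s: replicate_similarity(counts, groups, s))
```

For example, with two groups of two replicates, `choose_threshold(counts, {"A": [0, 1], "B": [2, 3]}, [0, 5, 10])` selects a threshold, and `max_filter(counts, threshold)` then returns the retained genes.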
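Steps 3 and 4 amount to a subsampling-based feature selection. The sketch below assumes a hypothetical `importance_fn` that fits one model (for example XGBoost or the elastic-net logistic regression, with the tuned hyperparameters) on the given sample indices and returns a gene → importance mapping; `n_iter`, `n_obs` and `min_occurrences` stand in for the "specified number" of subsamples, retained genes and required iterations. It would be run once per model; how the two resulting gene sets are combined is not specified here.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Set

# Hypothetical callback: fit a model on these sample indices, return gene -> importance.
ImportanceFn = Callable[[List[int]], Dict[str, float]]


def stable_genes(
    n_samples: int,
    importance_fn: ImportanceFn,
    n_iter: int,
    n_obs: int,
    min_occurrences: int,
    frac: float = 0.8,
    seed: int = 0,
) -> Set[str]:
    """Genes ranked in the top n_obs in at least min_occurrences random subsamples."""
    rng = random.Random(seed)  # fixed seed for reproducible subsamples
    size = max(1, min(n_samples, int(round(frac * n_samples))))
    hits: Counter = Counter()
    for _ in range(n_iter):
        rows = rng.sample(range(n_samples), size)  # 80% of samples, without replacement
        scores = importance_fn(rows)
        top = sorted(scores, key=scores.get, reverse=True)[:n_obs]
        hits.update(top)  # each gene counted at most once per iteration
    return {gene for gene, count in hits.items() if count >= min_occurrences}
```

Counting occurrences across subsamples favours genes that are important regardless of which samples the model happened to see, rather than genes that score highly in a single fit.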
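Step 5 can be sketched as a two-sided Mann-Whitney U test per gene, using the normal approximation with tie and continuity corrections, followed by a Benjamini-Hochberg adjustment. The FDR procedure is not named above, so Benjamini-Hochberg is an assumption, as are the function names and the boolean group labels. In practice `scipy.stats.mannwhitneyu` and `statsmodels.stats.multitest.multipletests` provide tested implementations; this stdlib version only shows the arithmetic.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _average_ranks(values: Sequence[float]) -> Tuple[List[float], List[int]]:
    """1-based ranks with ties averaged, plus the size of each tie group."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test (normal approximation). Returns (U of x, p)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one observation")
    n = n1 + n2
    ranks, tie_sizes = _average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var <= 0:  # all values tied: no evidence of a difference
        return u1, 1.0
    z = (abs(u1 - mu) - 0.5) / math.sqrt(var)  # 0.5 is the continuity correction
    return u1, min(1.0, math.erfc(z / math.sqrt(2)))  # erfc(z / sqrt(2)) = 2 * P(Z > z)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        k = order[rank - 1]
        running_min = min(running_min, pvalues[k] * m / rank)
        adjusted[k] = running_min
    return adjusted


def significant_genes(expr: Dict[str, Sequence[float]], in_group: Sequence[bool], alpha: float = 0.05) -> List[str]:
    """Genes whose expression differs between the two groups at FDR <= alpha."""
    genes = list(expr)
    pvals = []
    for gene in genes:
        x = [v for v, flag in zip(expr[gene], in_group) if flag]
        y = [v for v, flag in zip(expr[gene], in_group) if not flag]
        pvals.append(mann_whitney_u(x, y)[1])
    qvals = benjamini_hochberg(pvals)
    return [g for g, q in zip(genes, qvals) if q <= alpha]
```

The normal approximation is coarse for very small groups; for those, an exact test (as offered by SciPy) is the safer choice.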

System requirements

  • The custom Rau filter is written in C# and requires the .NET framework (available from Microsoft).
  • Python 3.9+ is recommended; older versions have not been tested.
  • Required Python packages are listed in requirements.txt. Note that scikit-optimize requires an older NumPy version (<= 1.23.5).
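Because of the scikit-optimize constraint above, it can help to check the installed NumPy version before running the pipeline. A minimal stdlib-only sketch; the helper names are hypothetical and not part of the repository, and the 1.23.5 upper bound is the one stated above.

```python
from importlib import metadata
from itertools import takewhile
from typing import Optional, Tuple

# Upper bound on NumPy stated in the README for scikit-optimize.
MAX_NUMPY_FOR_SKOPT = (1, 23, 5)


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn '1.23.5' (or '1.24.0rc1') into a comparable (major, minor, patch) tuple."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))  # drop suffixes like 'rc1'
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def numpy_ok_for_skopt(version: str) -> bool:
    """True if the NumPy release is at or below the stated bound."""
    return parse_release(version) <= MAX_NUMPY_FOR_SKOPT


def installed_numpy() -> Optional[str]:
    """Installed NumPy version string, or None if NumPy is missing."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_numpy()
    if found is None:
        print("NumPy is not installed")
    elif not numpy_ok_for_skopt(found):
        print(f"NumPy {found} is newer than 1.23.5; scikit-optimize may fail")
    else:
        print(f"NumPy {found} should work with scikit-optimize")
```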

Deployment

  • Specify the Telegram bot token and a logging directory in config.toml.
  • On Linux, install the systemd services for the Dash app and the Telegram bot by copying the service files to /lib/systemd/system/.
  • Start the services with systemctl.
  • Run a Redis server for the RQ job scheduler.
