AITH-hack23

AI Talent Hub Hackathon 2023, Data Driven Life Science track

ML-Based Biomarker Discovery on Bulk RNA-Seq Data

This tool aims to give researchers a simpler and more effective workflow than conventional tools such as DESeq2 for detecting biomarkers: genes that may indicate a specific biological condition (e.g. a disease).

Team:

Fedor Logvin (ITMO University, Saint Petersburg, Russia)
Anton Changailidi (ITMO University, Saint Petersburg, Russia)
Danil Trotsenko (ITMO University, Saint Petersburg, Russia)
Timur Sheydaev (ITMO University, Saint Petersburg, Russia)
Xenia Sukhanova (ITMO University, Saint Petersburg, Russia)

ML Pipeline

The analysis consists of the following steps:

A maximum-based low-count filter (Rau et al.) removes genes with low counts.
Bayesian search finds the best hyperparameter values for the XGBoost and elastic net logistic regression models.
Each model is fitted on a specified number of random subsamples, each drawn as 80% of the rows, using the best hyperparameters found in the previous step.
For both models, the (n_obs) most important genes are retained from each iteration; at the end, all genes that occur in a specified number of iterations are kept.
To determine whether expression of the selected genes differs significantly between the defined groups, a Mann-Whitney U test is performed. FDR is controlled at level α = 0.05.
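The filtering, subsampling and FDR steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this repository: the function names, the fixed filter threshold, the "at least min_hits iterations" rule and the mean-difference ranker (a stand-in for the XGBoost and elastic net importances) are all hypothetical, and Benjamini-Hochberg is assumed as the FDR procedure because the README does not name one.

```python
import random
from collections import Counter


def max_count_filter(counts, threshold):
    """Keep genes whose maximum count across samples reaches the threshold.

    counts maps gene name -> list of counts, one per sample. The threshold is
    passed in by the caller; deriving it from the data is outside this sketch.
    """
    return {gene: row for gene, row in counts.items() if max(row) >= threshold}


def mean_difference_ranker(samples, labels):
    """Stand-in ranker (assumption): order genes by the gap between group means."""
    def gap(gene):
        pos = [s[gene] for s, y in zip(samples, labels) if y == 1]
        neg = [s[gene] for s, y in zip(samples, labels) if y == 0]
        if not pos or not neg:
            return 0.0
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return sorted(samples[0].keys(), key=gap, reverse=True)


def stable_genes(samples, labels, rank_genes, n_iter=20, frac=0.8,
                 top_k=2, min_hits=15, seed=0):
    """Return genes that reach the top_k in at least min_hits subsamples.

    samples is a list of dicts (gene -> expression), labels a list of 0/1.
    rank_genes(samples, labels) returns gene names, most important first.
    """
    rng = random.Random(seed)
    n_sub = int(round(frac * len(samples)))
    hits = Counter()
    for _ in range(n_iter):
        idx = rng.sample(range(len(samples)), n_sub)  # rows without replacement
        ranked = rank_genes([samples[i] for i in idx], [labels[i] for i in idx])
        hits.update(ranked[:top_k])
    return sorted(gene for gene, count in hits.items() if count >= min_hits)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene would then be reported when its adjusted p-value from the Mann-Whitney U test is at most 0.05. In a real run, `rank_genes` would wrap a fitted model and return genes ordered by its feature importances or coefficients.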

System requirements

The custom Rau filter is written in C# and requires the .NET framework, which can be found here.
Python 3.9+ is recommended; older versions were not tested.
Required Python packages are listed in requirements.txt. Keep in mind that scikit-optimize requires an older NumPy version (<=1.23.5).
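The NumPy constraint can be checked before starting the app. The snippet below is a small sketch that uses only the standard library; the helper names are hypothetical and it is not part of this repository.

```python
from importlib import metadata

MAX_NUMPY = (1, 23, 5)  # upper bound quoted in this README


def parse_version(text):
    """Turn '1.23.5' into (1, 23, 5); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # suffix such as 'rc1' ends the numeric part
            break
    return tuple(parts)


def numpy_is_compatible(installed, limit=MAX_NUMPY):
    """True if the given version string does not exceed the limit."""
    return parse_version(installed) <= limit


def check_installed_numpy():
    """Return True/False for the installed NumPy, or None if it is absent."""
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None
    return numpy_is_compatible(installed)
```

Calling `check_installed_numpy()` at startup gives an early warning instead of a failure later inside the hyperparameter search.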

Deployment

Specify the Telegram bot token and a logging directory in config.toml.
On Linux, install the systemd services for the Dash app and the Telegram bot (copy the service config files to /lib/systemd/system/).
Start the services with systemctl.
Run a Redis server for the RQ job scheduler.
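Before starting the services, it can help to confirm that config.toml contains both settings. The sketch below assumes a layout with a `[telegram]` table holding `token` and a `[logging]` table holding `dir`; these key names are hypothetical, so adjust them to the actual file. `tomllib` is in the standard library from Python 3.11; on older interpreters the `tomli` backport is assumed to be installed.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # older interpreters need the tomli backport
    import tomli as tomllib

# Hypothetical key names; the real config.toml may use different ones.
REQUIRED = {"telegram": ["token"], "logging": ["dir"]}


def missing_settings(config_text):
    """Return the 'section.key' names that are absent or empty."""
    config = tomllib.loads(config_text)
    missing = []
    for section, keys in REQUIRED.items():
        table = config.get(section)
        if not isinstance(table, dict):
            table = {}
        for key in keys:
            if not table.get(key):
                missing.append(f"{section}.{key}")
    return missing
```

For example, `missing_settings(open("config.toml", encoding="utf-8").read())` returns an empty list when both values are set.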
