JulieBaker1 edited this page Aug 25, 2022 · 40 revisions

HighlyRegionalGenes - Guided Tutorial

Introduction

Selecting genes with biological significance is useful for downstream analysis. Here we introduce a feature selection method called HighlyRegionalGenes, which identifies genes with regional patterns. The method consists of three main steps:

  • construct the SNN (shared nearest neighbor) graph
  • calculate a score for each gene and rank the genes
  • select the regionally distributed genes

We iterate to make the SNN more precise. The workflow is shown below.

figure1.png
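
To make the first step of the workflow more concrete, here is a minimal base-R sketch of one common way to build an SNN: find each cell's k nearest neighbors, then weight each pair of cells by the Jaccard index of their neighbor sets. This is an illustration only and is not taken from the HighlyRegionalGenes source; the toy embedding emb, the value of k, and the Jaccard weighting are all assumptions.

```r
# Illustrative SNN construction in base R (not the package's implementation).
set.seed(1)

# Toy embedding (assumption): 6 cells x 2 dimensions, two well-separated groups of 3.
emb <- rbind(
  matrix(rnorm(6, mean = 0, sd = 0.1), ncol = 2),
  matrix(rnorm(6, mean = 5, sd = 0.1), ncol = 2)
)
k <- 3  # neighbors per cell, counting the cell itself

# Pairwise Euclidean distances between cells.
d <- as.matrix(dist(emb))

# kNN indicator matrix: knn[i, j] is TRUE if cell j is among the k nearest cells to i.
knn <- t(apply(d, 1, function(row) rank(row, ties.method = "first") <= k))

# Number of neighbors shared by each pair of cells.
shared <- (knn * 1) %*% t(knn * 1)

# SNN weight = Jaccard index of the two neighbor sets (each set has size k).
snn <- shared / (2 * k - shared)
```

In this toy example, cells in the same group share all of their neighbors and cells in different groups share none, so the SNN matrix separates the two groups cleanly.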

The code in this tutorial can be obtained from HighlyRegionalGenes. The method can be regarded as an alternative to the FindVariableFeatures() feature-selection method in Seurat.

Installation

The package can be installed directly from GitHub.

install.packages("devtools")
library(devtools)
install_github("JulieBaker1/HighlyRegionalGenes")

Quick start

Here, we provide an example PBMC dataset from 10X Genomics. Users can download it and run the following scripts to understand the workflow of HighlyRegionalGenes.

Step 1: Load libraries

library(Seurat)
library(HighlyRegionalGenes)

Step 2: Run PCA

Our method is based on the neighborhood relationships between cells. If a neighborhood relationship is not available in advance, first run PCA to construct one (an SNN in our method); otherwise, this step can be skipped. Notably, PCA does not have to be run on all genes: the genes found by Seurat's FindVariableFeatures() function, or any genes you believe are helpful for constructing the neighborhood relationship, are also recommended.

# load the data
pbmc=readRDS("pbmc.rds")
# specify the features used to run PCA (here, all genes)
all.genes=rownames(pbmc)
pbmc=ScaleData(pbmc,features=all.genes,verbose = FALSE)
pbmc=RunPCA(pbmc,features=all.genes,verbose = FALSE)
# inspect the elbow plot to choose the dimensions used to construct the SNN
ElbowPlot(pbmc)

elbow.png

Step 3: Identify highly regional features

To ensure that the neighborhood relationship is accurate, we repeatedly reconstruct the SNN from the newly found features until the overlap between the newly found features and the previous features reaches overlap_stop (0.75 by default). If you prioritize efficiency and do not want to iterate, set overlap_stop to 0.
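
The stopping rule described above can be sketched in a few lines of base R. This is only an illustration: the helper feature_overlap() and the gene names are hypothetical, and we assume the overlap is the fraction of newly found features that were also in the previous set, which may differ from how the package computes it.

```r
# Hypothetical helper: fraction of the new features that were also selected previously.
# Assumption: the package may define the overlap differently.
feature_overlap <- function(new_features, old_features) {
  length(intersect(new_features, old_features)) / length(new_features)
}

# Made-up feature sets from two consecutive iterations.
previous <- c("geneA", "geneB", "geneC", "geneD")
current  <- c("geneA", "geneB", "geneC", "geneE")
overlap_stop <- 0.75

# 3 of the 4 current genes were also in the previous set.
overlap <- feature_overlap(current, previous)

# Iteration stops once the overlap reaches the threshold.
stop_iterating <- overlap >= overlap_stop
```

Under this definition, setting overlap_stop to 0 makes the condition true at the first comparison, which matches the advice to use 0 when no iteration is wanted.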

Parameters that users can define:

  • obj: a Seurat object
  • dims: the PCA dimensions used to construct the SNN (10 by default)
  • nfeatures: the number of features selected for downstream analysis (2000 by default)
  • overlap_stop: iteration stops when the overlap reaches this value (0.75 by default)
  • snn: a neighborhood relationship calculated in advance
  • max_iteration: the maximum number of iterations
  • neigh_num: the number of neighbors
  • is.save: whether to save intermediate results
  • dir: the directory in which results are saved

Example:

    # find regionally distributed genes
    pbmc=FindRegionalGenes(pbmc,dims = 1:10,nfeatures = 2000,overlap_stop = 0.95)
    # feature plot of the 2 most highly regional genes
    FeaturePlot(pbmc,head(RegionalGenes(pbmc), 2))

featureplot.png

Step 4: Choose the number of genes

The number of genes can be set manually or detected automatically by finding the knee point of the score curve.

    # detect the gene number automatically by finding the knee point; skip this step if you set the gene number manually
    gene_num = HRG_elbowplot(pbmc)
    # output
    regional_gene = RegionalGenes(pbmc,nfeatures = gene_num)

score_elbow_plot.png
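
For readers curious about what knee-point detection involves, here is a small base-R sketch of one common heuristic: sort the scores in decreasing order and pick the point farthest from the straight line joining the first and last points of the curve. The helper find_knee() and the scores are hypothetical, and this is not necessarily how HRG_elbowplot() locates the knee.

```r
# Hypothetical helper: position of the point farthest from the chord that joins
# the two ends of the sorted score curve. Needs at least 2 non-identical scores.
find_knee <- function(scores) {
  y <- sort(scores, decreasing = TRUE)
  n <- length(y)
  # Rescale both axes to [0, 1] so that neither dominates the distance.
  xs <- (seq_len(n) - 1) / (n - 1)
  ys <- (y - min(y)) / (max(y) - min(y))
  # The chord runs from (0, 1) to (1, 0), i.e. the line xs + ys = 1,
  # so the distance to it is proportional to |xs + ys - 1|.
  which.max(abs(xs + ys - 1))
}

# Made-up gene scores: four high-scoring genes followed by a flat tail.
scores <- c(10, 9, 8, 7, 1, 0.9, 0.8, 0.7, 0.6, 0.5)
knee_index <- find_knee(scores)
```

Here the knee falls at position 5 of the sorted curve, the first gene after the sharp drop in score.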