Data analysis for the manuscript, "Site- and cell-type-specific miRNA and mRNA genes and networks across the cortex, striatum, and hypothalamus", Zacharias et al., Comm Biol, 2025

amzacharias/chronoCNS


Preface

Data analysis for “Site- and cell-type-specific miRNA and mRNA genes and networks across the cortex, striatum, and hypothalamus”.


Setup

Important:

  • Consider reading the README.html file which has a floating table of contents.

  • This project assumes you are using resources from the Centre for Advanced Computing (CAC).

    • The CAC uses SLURM to allocate jobs.
    • It is highly recommended that you use a cloud computing system. You may need to edit scripts to load dependencies in a manner compatible with your system.
  • Ensure all scripts and data are stored in an R project folder.

  • Script names are numbered so the order of execution is more obvious.

  • Set the R current working directory to the project working directory. Most scripts assume that the project directory is the current working directory.

  • Caution! Some scripts use absolute paths (especially bash scripts)

    • Run the following commands in the terminal to replace the absolutePath placeholder found in scripts with the absolute path to your project directory.
    find . -type f -name "*.sh" -exec sed -i'' -e 's#absolutePath#/my/custom/path#g' {} +
    find . -type f -name "*.R" -exec sed -i'' -e 's#absolutePath#/my/custom/path#g' {} +
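
To see which scripts still contain the placeholder, before or after running the sed commands above, a small helper like the following may be useful. This is only a sketch: the function name is made up for illustration, and it relies on grep's --include option (available in GNU and BSD grep).

```shell
# List every .sh/.R file under a directory that still contains the
# "absolutePath" placeholder. The function name is hypothetical.
files_with_placeholder() {
  # -r: recurse, -l: print file names only; sort keeps the output stable
  grep -rl --include='*.sh' --include='*.R' 'absolutePath' "${1:-.}" | sort
}
```

Running `files_with_placeholder .` from the project directory should print nothing once every placeholder has been replaced.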
    

Primary session info:

  • R version 4.4.0 (2024-04-24)
  • Platform: x86_64-redhat-linux-gnu (64-bit)
  • Running under: CentOS Linux 7 (Core)
  • Matrix products: default
  • BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

Packages:

Package Version
arrayQualityMetrics 3.60.0
Biobase 2.64.0
biomaRt 2.60.0
cividis 0.2.0
colorspace 2.1-0
ComplexHeatmap 2.20.0
ComplexUpset 1.3.3
cowplot 1.1.3
DESeq2 1.44.0
devtools 2.4.5
dplyr 1.1.4
DT 0.33
dynOmics 1.0
edgeR 4.2.0
GEOquery 2.72.0
ggplot2 3.5.1
ggpubr 0.6.0
gprofiler2 0.2.3
Hmisc 5.1-2
htmltools 0.5.8.1
IsoformSwitchAnalyzeR 1.18.0
knitr 1.46
limma 3.60.0
lmms 1.3.3
MetaCycle 1.2.0
miRBaseConverter 1.11.1
multiMiR 1.26.0
optparse 1.7.5
patchwork 1.2.0
pheatmap 1.0.12
purrr 1.0.2
rain 1.38.0
RColorBrewer 1.1-3
readxl 1.4.3
renv 1.0.7
rmarkdown 2.26
rsconnect 1.3.1
rtracklayer 1.64.0
scales 1.3.0
shiny 1.8.1.1
shinythemes 1.2.0
stringr 1.5.1
tibble 3.2.1
tidyr 1.3.1
tidyverse 2.0.0
UpSetR 1.4.0
VennDiagram 1.7.3
WGCNA 1.72-5

Main pipeline

Helpers

Notice the 0_helpers folder. This directory contains many R functions that minimize repetition of code and are generally helpful.

Download data

  1. Navigate to the 0_data folder.
    • R current working directory remains the project working directory
    • Terminal working directory becomes ./0_data by running cd ./0_data in the command line
  2. Make the following folders: seqreads and series.
  3. Manually download SRA Accession Lists and SRA metadata / run tables from…
    1. mRNA: BioProject PRJNA636378
    2. microRNA: BioProject PRJNA636377
  4. Run bash scripts beginning with download.
    1. Note that these files use absolute paths
    2. Dependencies: StdEnv/2020 gcc/9.3.0 sra-toolkit/2.10.8
    3. For every accession ID in a dataset, prefetch, fastq-dump, and gzip the relevant data
    4. Use .out logs to monitor download progress
    5. The scripts don’t parallelize the downloads, but I recommend that future users add multithreading!
  5. Prepare metadata/coldata. Note: coldata will be used interchangeably with metadata.
    1. Download GEO metadata.

      1. Run ./1_readSeriesMatrix.R, which downloads GEO series matrixes in txt format and converts them to csv.
      • If you prefer not to use the GEOquery package, it’s likely possible to use the txt files directly.
    2. Run 2_makeColdata.R to make coldata files by merging the SRA run tables and series matrixes. This file also makes new ctTime and ztTime columns. Finally, the script removes columns that aren’t directly needed for the project.

  6. Run ./3_timeDesign.R to inspect the number of samples per timepoint, tissue, and sequence type.
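
The per-accession loop from step 4 (prefetch, fastq-dump, gzip) can be sketched as a dry run that only prints the commands. This is not the repository's download script: the function name, the --split-files and --outdir flags, and the output layout are assumptions made for illustration.

```shell
# Print (do not run) the download commands for every accession in an
# SRA accession list with one accession per line.
# Hypothetical helper; the fastq-dump flags are assumptions.
print_download_cmds() {
  acc_list=$1
  outdir=${2:-seqreads}
  while IFS= read -r acc || [ -n "$acc" ]; do
    [ -z "$acc" ] && continue   # skip blank lines
    printf 'prefetch %s\n' "$acc"
    printf 'fastq-dump --split-files --outdir %s %s\n' "$outdir" "$acc"
    printf 'gzip %s/%s*.fastq\n' "$outdir" "$acc"
  done < "$acc_list"
}
```

Once the printed commands look right, they could be piped to bash, for example `print_download_cmds accessions.txt seqreads | bash`, where accessions.txt stands for whichever accession list you downloaded.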

QC of sequencing reads (1)

  1. Navigate to the 1_qcSeqReads/1_qcB4Trim folder.

    • R current working directory remains the project working directory
    • Terminal working directory becomes ./1_qcSeqReads/1_qcB4Trim by running cd ./1_qcSeqReads/1_qcB4Trim in the command line
  2. Run 1_writeFastqcScripts.R to generate individual FastQC scripts.

    • Rather than running quality control on every sample in a loop, run multiple scripts at once.
  3. Execute fastqc scripts. Do not execute all scripts at once! I recommend running 10 at a time. Use 2_checkSuccess.R and jobsToRun.sh to ensure all jobs have been run!

    # 1 cpu, max 10 gigabytes of memory
    module load StdEnv/2020
    module load nixpkgs/16.09
    module load fastqc/0.11.9
    
    fastqc -f fastq -o $OUTDIR $INDATAPATH
    
  4. Run 3_writeMultiqcScripts.R to generate a multiQC script for each tissue.

  5. Execute multiqc scripts.

    # 1 cpu, max 1 GB of memory
    module load StdEnv/2020 python/3.9.6
    #pip install --user multiqc
    #pip install --user --upgrade multiqc
    
    # Begin MultiQC
    multiqc \
      --outdir $OUTDIR \
      --filename $FILENAME \
      --force \
      --interactive \
      --cl_config "fastqc_config: { fastqc_theoretical_gc: mm10_txome }" \
      $FQPATHS
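
For step 3 above (running roughly 10 FastQC scripts at a time), one possible way to split the generated scripts into batches is sketched below. The function name and the directory name in the usage note are hypothetical; jobsToRun.sh in the repository remains the reference for tracking which jobs are left.

```shell
# Print the paths of the Nth batch of job scripts in a directory
# (10 per batch unless a third argument gives another size).
# Assumes the directory contains at least one .sh file.
nth_batch() {
  dir=$1
  batch=$2
  size=${3:-10}
  first=$(( (batch - 1) * size + 1 ))
  last=$(( batch * size ))
  # The shell expands the glob in sorted order, so batches are stable
  printf '%s\n' "$dir"/*.sh | sed -n "${first},${last}p"
}
```

On a SLURM system, a batch could then be submitted with something like `nth_batch fastqcScripts 1 | while read -r s; do sbatch "$s"; done`.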
    

Clean sequencing reads

  1. Clean mRNA reads with Trimmomatic only.
    1. Navigate to ./2_trimMRNA .

    2. Run 1_writeIndivScripts.R.

    3. Run individual scripts. Use 2_checkSuccess.R and jobsToRun.sh to ensure all jobs have been run.

      # 5 cpu, max 5 GB memory
      # Dependencies
      module load nixpkgs/16.09 trimmomatic/0.36 # trimmomatic
      
      # Begin Trimmomatic
      java -jar /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/trimmomatic/0.36/trimmomatic-0.36.jar PE \
      -threads 10 \
      $FWDPATH $REVPATH \
      ${OUTPATH}/${ID}_1.pair.trim.fastq.gz ${OUTPATH}/${ID}_1.unpair.trim.fastq.gz \
      ${OUTPATH}/${ID}_2.pair.trim.fastq.gz ${OUTPATH}/${ID}_2.unpair.trim.fastq.gz \
      SLIDINGWINDOW:4:20 MINLEN:36
      
      • SLIDINGWINDOW:4:20 = Scan the read with a sliding window of 4 bps and cut the read once the average phred quality score within the window drops below 20
      • MINLEN:36 = Drop a read if it’s below 36 bps long
  2. Clean microRNA reads with CutAdapt and Trimmomatic.
    1. Navigate to ./2_trimMiRNA.

    2. Run 1_writeIndivScripts.R.

    3. Run individual scripts. Use 2_checkSuccess.R and jobsToRun.sh to ensure all jobs have been run.

      # 5 cpu, max 5 GB memory
      # Begin CutAdapt
      cutadapt --cores 10 \
        --adapter TGGAATTCTCGGGTGCCAAGG \
        --error-rate 0.25 \
        --no-indels \
        --minimum-length 15 \
        --overlap 6 \
        --times 1 \
        --match-read-wildcards \
        --untrimmed-output ${OUTPATH}/cutAdapt/${ID}.NO3AD.fastq.gz \
        --too-short-output ${OUTPATH}/cutAdapt/${ID}.short.fastq.gz \
        --output ${OUTPATH}/cutAdapt/${ID}.cutClean.fastq.gz \
        $FWDPATH
      #
      # Begin Trimmomatic
      java -jar /cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/trimmomatic/0.36/trimmomatic-0.36.jar SE \
        -threads 10 \
        ${OUTPATH}/cutAdapt/${ID}.cutClean.fastq.gz \
        ${OUTPATH}/${ID}.trim.fastq.gz \
        SLIDINGWINDOW:4:20 MINLEN:15
      
      • CutAdapt parameters from the ENCODE project’s pipeline
      • --adapter (-a) = 3′ adapter sequence, from Illumina website
      • --error-rate (-e) = maximum allowed error rate when finding adapters
      • --no-indels = no indels when matching adapters
      • --minimum-length (-m) = minimum processed read length
      • --overlap (-O) = minimum overlap between adapter and read sequence, ignored for anchored adapters
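
As a quick sanity check on the minimum-length settings above (MINLEN:36 for mRNA, MINLEN:15 for miRNA), the number of reads shorter than a threshold can be counted directly from a trimmed FASTQ file. This is a minimal sketch with a hypothetical function name; it assumes standard four-line FASTQ records.

```shell
# Count reads shorter than a minimum length in a FASTQ file
# (plain or gzip-compressed, detected from the file extension).
count_short_reads() {
  fastq=$1
  minlen=$2
  case "$fastq" in
    *.gz) gzip -cd "$fastq" ;;
    *)    cat "$fastq" ;;
  esac | awk -v min="$minlen" '
    NR % 4 == 2 && length($0) < min { n++ }   # line 2 of each record is the sequence
    END { print n + 0 }'
}
```

After trimming, `count_short_reads sample_1.pair.trim.fastq.gz 36` would be expected to print 0 (the file name here is only an example).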

QC of sequencing reads (2)

  1. Repeat steps from QC of sequencing reads (1). For mRNA, navigate to ./1_qcSeqReads/2_qcAftTrim. For miRNA, navigate to ./1_qcSeqReads/2_qcAftCutTrim.

Download and index reference genome

  1. Navigate to the ./0_resources/gencode folder.
  2. Make the following folders: indexHisat2 and indexStar.
  3. Run the 1_downloadGencode.sh script to get the primary fasta and gtf files for GRCm39.
  4. Run the 2_subsetMiRNA.sh script to extract features whose transcript_type is “miRNA” from the gtf file.
  5. Run the 3_indexHisat2 script to prepare the reference genome for Hisat2 alignment. This script uses hisat2_extract_splice_sites.py and hisat2_extract_exons.py to improve Hisat2’s handling of splice sites.
  6. Run the 3_indexStar.sh script to prepare the reference genome for STAR alignment. The index is built with --sjdbOverhang 50 because the maximum read length is 51 (overhang = maximum read length - 1).
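
If your reads differ in length from this dataset, the overhang value can be derived from the data. The sketch below is an illustration with hypothetical function names, not part of the repository: it finds the longest read in a FASTQ file and subtracts one.

```shell
# Report the longest read in a FASTQ file (plain or gzip-compressed).
max_read_length() {
  case "$1" in
    *.gz) gzip -cd "$1" ;;
    *)    cat "$1" ;;
  esac | awk 'NR % 4 == 2 && length($0) > max { max = length($0) }
              END { print max + 0 }'
}

# Suggested --sjdbOverhang value: maximum read length minus 1.
sjdb_overhang() {
  echo $(( $(max_read_length "$1") - 1 ))
}
```

For a file whose longest read is 51 bps, `sjdb_overhang` prints 50, matching the value used here.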

Align and quantify microRNA reads

  1. Navigate to the 3_alignStar folder.

  2. Generate an individual script for each sample with 1_writeStarScripts.R.

  3. Run scripts. Use 2_checkSuccess.R and jobsToRun.sh to ensure all jobs have been run.

    # 5 cpus, 40 GB max memory. Each script takes ~12 minutes.
    module load StdEnv/2020 gcc/9.3.0 star/2.7.9a samtools/1.13
    # align and quantify
    # Collect the STAR options in a bash array: inside a single-quoted string,
    # the line-continuation backslashes would be passed to STAR literally.
    PARAMS=(--runThreadN 10 --alignEndsType EndToEnd
      --outFilterMismatchNmax 1 --outFilterMultimapScoreRange 0
      --quantMode TranscriptomeSAM GeneCounts --outReadsUnmapped Fastx
      --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 10
      --outSAMunmapped Within --outFilterScoreMinOverLread 0
      --outFilterMatchNminOverLread 0 --outFilterMatchNmin 16
      --alignSJDBoverhangMin 1000 --alignIntronMax 1
      --outWigType wiggle --outWigStrand Stranded --outWigNorm RPM)
    STAR --genomeDir $INDEX \
      --sjdbGTFfile $MIRNA_GTF \
      --readFilesCommand gunzip -c \
      --readFilesIn $SINGLE_END_1 \
      --outFileNamePrefix ${OUT_PATH}. \
      "${PARAMS[@]}"
    samtools index -@ 20 ${OUT_PATH}.Aligned.sortedByCoord.out.bam
    
    • STAR parameters from the ENCODE project’s pipeline
    • --alignEndsType EndToEnd = force end-to-end read alignment (no soft-clipping)
    • --outFilterMismatchNmax 1 = maximum number of mismatches in alignment
    • --outFilterMultimapScoreRange 0 = if a read maps to multiple regions, only alignments with a score matching the best alignment will be output
    • --quantMode TranscriptomeSAM GeneCounts = output sam/bam file with transcript alignments & output matrix with number of reads aligning to each “gene”.
    • --outReadsUnmapped Fastx = output unmapped or partially mapped reads to separate fasta/fastq files
    • --outSAMtype BAM SortedByCoordinate = output a bam file that is sorted by coordinate
    • --outFilterMultimapNmax 10 = the default, max # loci a read can align to
    • --outSAMunmapped Within = output unmapped reads within the main SAM file (i.e. Aligned.out.sam)
    • --outFilterScoreMinOverLread 0 = alignment will be output only if its score is higher than or equal to this value, normalized to read length
    • --outFilterMatchNminOverLread 0 = alignment will be output only if the number of matched bases is higher than or equal to this value, normalized to read length
    • --outFilterMatchNmin 16 = same as outFilterMatchNminOverLread, but not normalized. In other words, the minimum mapped read length is 16 bps long.
    • --alignSJDBoverhangMin 1000 = minimum overhang (i.e. block size) for annotated spliced alignments
    • --alignIntronMax 1 = maximum intron length
    • … remaining parameters are for generating wiggle files, which can be used to visualize results with the UCSC genome browser or Integrative Genomics Viewer.
  4. Run the 3_getRates.R script to get an overview of STAR’s alignment rates.
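
For a quick look at a single sample without R, the unique mapping rate can be pulled from STAR's Log.final.out file (written as ${OUT_PATH}.Log.final.out with the prefix used above). This is a rough sketch with a hypothetical function name, not a substitute for 3_getRates.R; it assumes the usual "label | value" layout of that log.

```shell
# Extract the "Uniquely mapped reads %" value from a STAR Log.final.out file.
star_unique_rate() {
  awk -F'|' '/Uniquely mapped reads %/ {
    gsub(/[ \t%]/, "", $2)   # strip whitespace and the percent sign
    print $2
  }' "$1"
}
```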

Align and quantify mRNA reads

Alignment

  1. Navigate to ./3_alignHisat2

  2. Generate an alignment script for each sample with 1_writeHisat2Scripts.R.

  3. Run scripts. Use 2_checkSuccess.R and jobsToRun.sh to ensure all jobs have been run.

    # 5 cpu, 15 GB max memory, each script takes ~1-2 hours
    module load StdEnv/2020 samtools/1.10 hisat2/2.2.1 
    echo Alignment started at $(date +'%T')
    hisat2 -p 7 -x $INDEX -1 $PAIRED_END_1 -2 $PAIRED_END_2 \
      --dta --sensitive --no-discordant --no-mixed \
      --summary-file $SUMMARY_PATH --time --verbose \
      -S ${ALIGN_PATH}.sam
    
    echo Samtools processing started at $(date +'%T')
    samtools view -b -@ 7 ${ALIGN_PATH}.sam > ${ALIGN_PATH}.bam
    rm ${ALIGN_PATH}.sam
    echo collate started at $(date +'%T')
    samtools collate -@ 7 -o ${ALIGN_PATH}.col.bam  ${ALIGN_PATH}.bam ${ALIGN_PATH}_tmpcol
    rm ${ALIGN_PATH}.bam
    echo fixmate started at $(date +'%T')
    samtools fixmate -m -@ 7 ${ALIGN_PATH}.col.bam  ${ALIGN_PATH}.fix.bam 
    rm ${ALIGN_PATH}.col.bam
    echo sort started at $(date +'%T')
    samtools sort -@ 7 -T ${ALIGN_PATH}_sort -o ${ALIGN_PATH}.sort.bam ${ALIGN_PATH}.fix.bam 
    rm ${ALIGN_PATH}.fix.bam
    echo markdup started at $(date +'%T')
    samtools markdup -@ 7 -T ${ALIGN_PATH}_tmpmrk -s ${ALIGN_PATH}.sort.bam ${ALIGN_PATH}.sort.mrkdup.bam 
    rm ${ALIGN_PATH}.sort.bam
    echo index started at $(date +'%T')
    samtools index -b -@ 7 ${ALIGN_PATH}.sort.mrkdup.bam  ${ALIGN_PATH}.sort.mrkdup.bam.bai
    
    • Hisat2 parameters adapted from the Beijing Genomics Institute’s arguments (example dataset 1, example dataset 2)
      • --dta = report alignments tailored for transcript assemblers like StringTie. Requires longer anchor lengths for novel splice sites
      • --sensitive = same as --bowtie2-dp 1 -k 30 --score-min L,0,-0.5
        • --bowtie2-dp 1 = use Bowtie2’s conditional dynamic programming.
        • -k 30 = search for at most 30 distinct primary alignments for each read. Default = 5 (linear index) or 10 (graph index).
        • --score-min = minimum score function for an alignment to be valid: f(x) = 0 + -0.5 * x, where x = read length. Default = L,0,-0.2.
      • --no-discordant = don’t report discordant alignments, i.e. alignments where both mates align uniquely but don’t satisfy the paired-end constraints
      • --no-mixed = don’t try to find alignments for individual mates after hisat fails to identify concordant/discordant alignments
    • Using samtools to 1) convert SAM to BAM, 2) mark duplicates and sort the BAM file, & 3) index the BAM file. For marking duplicates and sorting by coordinates, use the example workflow from the samtools-markdup manual; author = Andrew Whitwham from the Sanger Institute
  4. Run the 3_getRates.R script to get an overview of Hisat2’s alignment rates.
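
To make the --score-min setting above concrete, the linear function f(x) = intercept + slope * x can be evaluated for a given read length. The helper name and the read length of 100 in the usage note are illustrative assumptions only.

```shell
# Evaluate a linear minimum-score function L,<intercept>,<slope>
# for a read of the given length.
min_score() {
  awk -v len="$1" -v b="$2" -v m="$3" 'BEGIN { printf "%.1f\n", b + m * len }'
}
```

For a hypothetical 100 bp read, `min_score 100 0 -0.5` prints -50.0 (the --sensitive setting) while `min_score 100 0 -0.2` prints -20.0 (the default), so --sensitive accepts alignments with lower scores.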

Quantification

  1. Navigate to ./4_stringtie

  2. Run 1_writePass1IndivScripts.R to write individual scripts for pass 1. Execute scripts in the pass1IndivScripts directory. Use the 2_checkSuccess.R and jobsToRun.sh scripts to monitor progress.

    # 1 cpu, 5 GB memory
    # REF_GTF is the full GTF file from Gencode
    module load StdEnv/2020 stringtie/2.1.5
    stringtie $INPUT -p 5 -G $REF_GTF -o $OUT_GTF
    
  3. Run 3_writeGtfLists.R and 4.0_writeMergeScripts.R to prepare the merging of individual GTFs from pass 1. Tissues are kept separate!

  4. Run the *.sh files in the 4_merge folder to execute the merging of GTF files.

    # 5 cpu, 3 GB memory
    module load StdEnv/2020 stringtie/2.1.5
    stringtie --merge -p 20 -o $OUTPUT -G $REF_GTF $GTFS_LIST
    
  5. Evaluate StringTie performance with 5.1_writeGffCompareScripts.R and *.sh scripts in the 5_gffCompare folder.

  6. Run 1_writePass2IndivScripts.R to write individual scripts for pass 2. Execute scripts in the pass2IndivScripts directory. Use the 2_checkSuccess.R and jobsToRun.sh scripts to monitor progress.

    # 1 cpu, 5 GB memory
    # REF_GTF is the merged gtf that corresponds to this sample's tissue
    module load StdEnv/2020 stringtie/2.1.5
    stringtie $INPUT -b $BALL -e -p 5 -G $REF_GTF -o $OUT_GTF
    
  7. To generate gene count matrixes, switch your R version to 4.2.1 and run 7_isoformAnalyzeR.R

    • This script uses an absolute path! Edit the script to use your project directory.
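
The merge step takes a text file that lists the pass 1 GTF files ($GTFS_LIST above). In the repository this list is written by 3_writeGtfLists.R; a rough shell equivalent, assuming one directory of pass 1 GTFs per tissue (a hypothetical layout), might look like this:

```shell
# Write a sorted list of all .gtf files under one directory to a text file,
# one path per line. Hypothetical helper; the directory layout is assumed.
write_gtf_list() {
  gtf_dir=$1
  list_file=$2
  find "$gtf_dir" -type f -name '*.gtf' | sort > "$list_file"
}
```

Calling it once per tissue keeps the tissues separate, as the merge step requires.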

Data preparation

  1. Navigate to 5_dataPrep
  2. Clean miRNA count matrixes
    1. Navigate to the miRNA folder.
    2. Run 0_id2name.R to get a dataframe with Ensembl ID to gene name/symbol conversion information.
    3. Run 0_makeCountMats.R to merge gene count matrixes from STAR into single dataframes, one per tissue.
    4. Run 1_outlierRemoval.R to …
      1. Perform outlier detection with arrayQualityMetrics. A sample is considered an outlier if
        • it is marked as an outlier before and after normalization by the same outlier detection metrics, and/or,
        • it is marked as an outlier by multiple outlier detection metrics after normalization
      2. Normalize counts with the weighted trimmed mean of M-values (TMM) method
    5. Run 2_corrReps.R to get the Spearman correlation between samples, before and after outlier removal.
    6. Run 2_filtering.R to perform non-specific filtering to remove lowly expressed features (mean CPM < 1).
  3. Repeat steps from Clean miRNA count matrixes, except replace “miRNA” with “mRNA” in the folder name.
    • Caveats:
      1. No need to merge count matrixes for each tissue.
      2. Samples SRR11902345 and SRR11902411 were manually removed (hard coding) from the hypothalamus dataframes upon inspection of PCA.
  4. Inspect the number of samples per tissue and timepoint after sample removal with timeDesign.R
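
The non-specific filter (drop features with mean CPM < 1) can be illustrated outside R as follows. This is a simplified sketch, not 2_filtering.R: it uses raw library sizes rather than TMM-normalized ones, and it assumes a tab-separated matrix with a header row, feature IDs in the first column, and a non-zero total for every sample. The function name is hypothetical.

```shell
# Keep the header and every feature whose mean counts-per-million (CPM)
# across samples is at least min_cpm (default 1).
filter_low_cpm() {
  counts=$1
  min_cpm=${2:-1}
  # The file is read twice: pass 1 sums each sample column (library size),
  # pass 2 converts counts to CPM and filters on the row mean.
  awk -F'\t' -v OFS='\t' -v min="$min_cpm" '
    NR == FNR { if (FNR > 1) for (i = 2; i <= NF; i++) lib[i] += $i; next }
    FNR == 1  { print; next }
    {
      total = 0
      for (i = 2; i <= NF; i++) total += $i / lib[i] * 1e6
      if (total / (NF - 1) >= min) print
    }' "$counts" "$counts"
}
```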

Identify cycling genes

  1. Navigate to ./6_rhythmicity
  2. Identify cycling mRNAs
    1. Navigate to 1_mRNA24h

    2. Run the 1_metacycle script to identify 24-hour period cycling features. Cycling genes have a combined, Benjamini-Hochberg corrected p-value that is less than 0.05.

      meta2d(
          infile = inPath,
          outdir = outPath,
          filestyle = "csv",
          minper = 24,
          maxper = 24,
          timepoints = "line1",
          outputFile = TRUE,
          combinePvalue = "fisher",
          cycMethod = c("JTK", "LS"),
          nCores = 1
      )
      
    3. Summarize results with 2_summarizeRes.R

    4. To inspect/visualize results, run 3_make*.R scripts.

    5. Repeat steps 1-4 with RAIN in the 1_mRNAHarm directory. Cycling genes have a period of 4 to 30 hours (17 ± 13) and an adjusted p-value < 0.05.

      rain(
          x = df,
          deltat = 3, # sampling interval
          period = 17, # period to search for
          period.delta = 13, # period +/- delta to consider
          peak.border = c(0.3, 0.7), # default
          measure.sequence = sequenceLists[[set]],
          method = "independent",
          # MC options are bonferroni, benjamini-hochberg (BH), or adaptive BH (ABH).
          adjp.method = "ABH",
          verbose = TRUE
      )
      
    6. Repeat steps 1-4 with ARSER from MetaCycle in the 1_mRNAArser directory. Cycling genes have a period of 4 to 30 hours and an adjusted p-value < 0.05.

      meta2d(
          infile = inPath,
          outdir = outPath,
          filestyle = "csv",
          minper = 4,
          maxper = 30,
          timepoints = "line1",
          outputFile = TRUE,
          combinePvalue = "fisher",
          cycMethod = "ARS",
          nCores = 1
      )
      
      • Run 1.5_getDomCycle.R to select the cycle with the largest amplitude when one gene has multiple cycles. Run this script before running 2*.R and 3*.R scripts.
  3. Identify cycling miRNAs
    1. Repeat steps from Identify cycling mRNAs, except replace “mRNA” with “miRNA” in folder and filenames.
      • JTK cycle will not run for the Corpus Striatum because there are no samples from time 15.
  4. Compare MetaCycle vs RAIN vs ARSER
    1. Navigate to the 2_24hVsHarm directory to compare MetaCycle results to RAIN results.
      1. Run 1_compare24VsHarm.R to quantify the similarities and differences across results.
      2. Run 2*.R scripts to visualize the similarities and differences across results.
    2. Navigate to the 2_ArserVsHarm directory to compare ARSER results to RAIN results.
      1. Run 1_compareAvsR.R to quantify the similarities and differences across results.
      2. Run 2*.R to visualize the similarities and differences across results.
  5. Compare rhythmic features across tissues
    1. Navigate to the 2_compareTissues24h folder
    2. Run 1_differenceTissues to find genes that are unique to each tissue.
    3. Run 2_sharedTissues to find genes that are shared across multiple tissues.
    4. Run 3_sharedDiffParams.R to see how shared genes have different or similar cycling patterns across tissues.
    5. Run 4_plotCompareTissues.R to visualize shared and different genes across tissues (Venn diagrams and UpSet plots).
    6. Repeat for results from RAIN by navigating to the 2_compareTissuesHarm folder, and running scripts with the aforementioned names.
  6. Compare cycling genes to genes previously associated with chronotype.
    1. Navigate to the 2_compareGwasChrono24h folder.
    2. Run the 1_chronotypeGWAS.R script.
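
The significance cut-off used for the MetaCycle results (Benjamini-Hochberg corrected p-value < 0.05) can be applied to a meta2d output file from the command line. The sketch below makes several assumptions: the output is a simple comma-separated file without embedded commas, the feature ID is in the first column, and the adjusted p-value column is named meta2d_BH.Q. Check the header of your own output before relying on it. The function name is hypothetical.

```shell
# Print the IDs (first column) of features whose adjusted p-value is below
# alpha. Column name and alpha can be overridden by the 2nd/3rd arguments.
cycling_ids() {
  csv=$1
  col=${2:-meta2d_BH.Q}
  alpha=${3:-0.05}
  awk -F',' -v col="$col" -v alpha="$alpha" '
    NR == 1 {
      # Locate the requested column by its header name
      for (i = 1; i <= NF; i++) { h = $i; gsub(/"/, "", h); if (h == col) c = i }
      next
    }
    c && $c != "NA" && $c + 0 < alpha { id = $1; gsub(/"/, "", id); print id }
  ' "$csv"
}
```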

Pairwise association between microRNA and mRNAs

Find previously observed targets of cycling microRNAs

  1. Prepare an Ensembl ID to mature miRBase ID conversion table with 7_0_mirPrep/miRNAConvertTable24h.R
  2. Identify experimentally validated and predicted targets with multiMiR.
    1. Navigate to 7.1_multiMiR.
    2. Run 1_querymultiMiR24h.R to query the package.
    3. Run 2_processmultiMiR24h. to explore and format the results.

DynOmics

  1. Navigate to ./7.2_dynOmics
  2. Run the 1_runDynOmics.sh script to identify cycling mRNA-microRNA pairs in each tissue.
    • Note that the Corpus Striatum is skipped because there were no cycling microRNAs.
  3. Run 2_examineAssocs.R and 2_subsetMultiCycl.R to inspect/summarize the results.
  4. Run 3_compareTissueAssocs.R to compare results across tissues.
  5. Run 3_plotAssocs.R and 3_compareTissueAssocsUpset.R to plot the results.

Networks of mRNA and microRNAs

Create co-expression networks

  1. Navigate to ./8_wgcna
  2. Use 1_runWgcna.sh to run 1_makeNetworkMRNA.R and 1_makeNetworkMiRNA.R remotely.
    1. Input is the log2 TMM counts of features that remain after non-specific filtering
    2. Determine the optimal soft threshold for a signed network. We use a scale-free topology fit threshold of 0.85 for plotting.
    3. Calculate the similarity between features (similarity = (correlation + 1) / 2). What is the relationship between gene A and gene B?
    4. Calculate the adjacency (adjacency = similarity^soft threshold). Adds weighting to connectivity between genes.
    5. Calculate the signed TOM matrix. What is the relationship between the neighbors of gene A and gene B?
    6. Cluster genes using the dissimilarity TOM matrix (1 - TOM)
    7. Use dynamic tree cutting to identify modules. Relevant parameters: deepSplit = 4, pamRespectsDendro = FALSE, minClusterSize = 30
    8. Calculate eigengenes.
    9. Merge modules with high correlations between eigengenes. Maximum dissimilarity that qualifies merging is 0.3 (cutHeight = 0.3)
    10. Calculate module membership. Essentially, the Pearson correlation between module eigengenes and the feature’s expression.
    11. Get hub genes. In this case, the hub gene is the gene with the highest connectivity in a module.
    12. Annotate results with what we already know about the genes. I.e. make the geneInfo dataframes.
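
The similarity and adjacency formulas from the steps above can be checked by hand for a single pair of genes. The sketch below is purely illustrative: the function name is made up, and the soft threshold of 12 in the usage note is a hypothetical value, not the one chosen by the scripts.

```shell
# Signed-network adjacency for one gene pair:
# similarity = (correlation + 1) / 2, adjacency = similarity ^ soft threshold.
signed_adjacency() {
  awk -v r="$1" -v power="$2" 'BEGIN {
    s = (r + 1) / 2
    printf "%.4f\n", s ^ power
  }'
}
```

With a hypothetical soft threshold of 12, `signed_adjacency 0.8 12` prints 0.2824, whereas `signed_adjacency -0.8 12` prints 0.0000: in a signed network, strongly negative correlations end up with almost no connection weight.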

Investigate modules

  1. Run the 2.a_cyclEnrich*.R scripts to run a hypergeometric test between cycling genes in a tissue and modules in a tissue. These scripts also plot the results as bubble plots.
  2. Run the 2.b_cyclCompositionPlotting.R script to plot the composition of period categories and cycling genes in cycling modules.
  3. Run the 3.a_cellTypeMRNA.R script to approximate cell-types for mRNA cycling modules.
  4. Run the 3.b_mRNAmiRNAassoc.R script to find the correlation between mRNA-miRNA eigengenes.
  5. Run the 4.a_cytoscape.R script to prepare the files necessary for plotting modules as a graph network with Cytoscape.
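
For intuition about the hypergeometric test in step 1, the upper-tail probability can be computed directly. The sketch below assumes a one-sided enrichment test, P(X >= hits), which may differ from what the 2.a_cyclEnrich*.R scripts compute; the function and argument names are hypothetical.

```shell
# Upper-tail hypergeometric probability P(X >= hits), where
#   pop   = number of features in the tissue (background)
#   succ  = number of cycling features in the background
#   draws = number of features in the module
#   hits  = number of cycling features in the module
hyper_upper_tail() {
  awk -v pop="$1" -v succ="$2" -v draws="$3" -v hits="$4" '
    # log(x!) computed as a sum of logs to avoid overflow
    function lfact(x,   i, s) { s = 0; for (i = 2; i <= x; i++) s += log(i); return s }
    function lchoose(a, b) { return lfact(a) - lfact(b) - lfact(a - b) }
    BEGIN {
      upper = (succ < draws) ? succ : draws
      p = 0
      for (x = hits; x <= upper; x++) {
        if (draws - x > pop - succ) continue   # impossible outcome
        p += exp(lchoose(succ, x) + lchoose(pop - succ, draws - x) - lchoose(pop, draws))
      }
      printf "%.4f\n", p
    }'
}
```

As a toy example, with 10 features of which 5 cycle, a module of 4 features containing 3 cycling ones gives `hyper_upper_tail 10 5 4 3`, which prints 0.2619 (55/210).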

Pathway enrichment analysis

  1. Navigate to the ./9_enrichment directory.
  2. To perform pathway analysis of MetaCycle and RAIN results, navigate to the rhythmicity directory. Run 1_runGprof.sh to query g:Profiler and plot results. Run 2_compareTissues.R to compare tissues and plot results.
  3. To analyze DynOmics and WGCNA results, repeat the above steps, except with the dynOmics and wgcna folders.
  4. Navigate to the compareGwasChrono directory and run 1_gprofGwas.R to perform pathway analysis on genes that are both cycling and previously linked to human chronotype susceptibility.

Done!
