Credit goes to chaochaowong.github.io

1 Getting started

The peakable package provides convenient methods for quality control and downstream analysis of CUT&RUN and CUT&Tag peaksets. Its functionality falls into three parts:

  1. Peak data import/export
  2. QC tools including PCA and cosine similarity based on peak hits
  3. Finding consensus between replicates

1.1 Visit BED and GRanges

BED is the common file format for storing peak intervals. The Bioconductor package rtracklayer provides tools for importing and exporting standardized BED6 and BED12 files, while plyranges offers functions specifically for MACS2 narrow peaks (read_narrowpeaks() and write_narrowpeaks()).

The standardized BED6 columns are chromosome, start, end, name, score, and strand. MACS2 narrow and broad peaks follow this standard format and add the columns signalValue, pValue, qValue, and peak (narrow peaks only). SEACR peaks, on the other hand, are in the BED3 format with the columns chromosome, start, and end, followed by three additional columns: AUC, max.signal, and max.signal.region. The peakable package provides convenient tools for both custom BED formats.
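To make the column layout concrete, here is a minimal base-R sketch that parses a single narrowPeak-style record by hand. The record is illustrative only (its values are chosen for the example and the name `peak_1` is hypothetical); the column names follow the ENCODE narrowPeak convention. BED coordinates are 0-based and half-open, so the start is shifted by one to obtain the 1-based, closed coordinates used by GRanges.

```r
# One illustrative narrowPeak record (tab-separated, BED6+4).
record <- "chr2\t11303\t12188\tpeak_1\t959\t.\t37.3156\t99.6611\t95.9971\t503"

# Column names following the ENCODE narrowPeak convention;
# "start0" is a name chosen here to flag the 0-based BED start.
narrowpeak_cols <- c("chrom", "start0", "end", "name", "score", "strand",
                     "signalValue", "pValue", "qValue", "peak")

peaks <- read.table(text = record, sep = "\t", col.names = narrowpeak_cols,
                    stringsAsFactors = FALSE)

# BED is 0-based, half-open; convert to 1-based, closed coordinates.
peaks$start <- peaks$start0 + 1L
peaks$width <- peaks$end - peaks$start + 1L

peaks[, c("chrom", "start", "end", "width", "signalValue", "peak")]
```

In practice the import functions described in the next section take care of this conversion; the sketch only illustrates what the columns mean.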

In Bioconductor, imported peaks are represented as GRanges instances, the standard data structure for genomic intervals. The plyranges package provides a dplyr-like interface for arithmetic on GRanges. For more information, see the plyranges documentation.

library(rtracklayer)
library(plyranges)
library(dplyr)
library(ggplot2)

# if peakable is not installed yet:
# devtools::install_github('chaochaowong/peakable')
library(peakable)

2 Data import/export

The peakable package provides functions for importing and exporting MACS2 and SEACR custom BED files.

While the plyranges package offers several functions for importing and exporting custom BED files, the peakable package provides additional tools to clean and organize the imported peaks.


2.1 MACS2 narrow and broad peaks

read_macs2_narrow() and read_macs2_broad() are wrappers around rtracklayer::import.bed() that import MACS2 narrowPeak (BED6+4) and broadPeak (BED6+3) files. The imported intervals are returned as GRanges instances.

2.1.1 narrowPeaks

Import MACS2 narrowPeaks and exclude non-essential sequences, such as poorly annotated chromosomes (“chrUn_…”) and chrM:

# wrapper function of rtracklayer::import.bed()
narrow_file <- 
  system.file('extdata', 
              'chr2_Rep1_H1_CTCF_peaks.narrowPeak', 
              package='peakable')
gr <- read_macs2_narrow(narrow_file,
                        drop_chrM = TRUE,
                        keep_standard_chrom = TRUE,
                        species = 'Homo_sapiens')
gr 
## GRanges object with 5467 ranges and 6 metadata columns:
##          seqnames            ranges strand |                   name     score
##             <Rle>         <IRanges>  <Rle> |            <character> <numeric>
##      [1]     chr2       11304-12188      * | GSM3391651_Rep1_H1_C..       959
##      [2]     chr2       18466-19108      * | GSM3391651_Rep1_H1_C..       118
##      [3]     chr2     142229-142507      * | GSM3391651_Rep1_H1_C..        47
##      [4]     chr2     152119-152336      * | GSM3391651_Rep1_H1_C..        92
##      [5]     chr2     246399-246881      * | GSM3391651_Rep1_H1_C..       105
##      ...      ...               ...    ... .                    ...       ...
##   [5463]    chr21 46325309-46325711      * | GSM3391651_Rep1_H1_C..        24
##   [5464]    chr21 46423906-46424202      * | GSM3391651_Rep1_H1_C..        47
##   [5465]    chr21 46635486-46636005      * | GSM3391651_Rep1_H1_C..        69
##   [5466]    chr21 46639760-46640238      * | GSM3391651_Rep1_H1_C..       104
##   [5467]    chr21 46660714-46661481      * | GSM3391651_Rep1_H1_C..       305
##          signalValue    pValue    qValue      peak
##            <numeric> <numeric> <numeric> <integer>
##      [1]    37.31560  99.66110  95.99710       503
##      [2]     8.61128  14.36300  11.85760       384
##      [3]     5.16677   7.01562   4.76958        19
##      [4]     6.95545  11.63870   9.23085       124
##      [5]     8.03720  13.05580  10.59070       275
##      ...         ...       ...       ...       ...
##   [5463]     3.56330   4.50035   2.43234        88
##   [5464]     5.16677   7.01562   4.76958        66
##   [5465]     6.31494   9.32198   6.97675        77
##   [5466]     7.94158  12.89480  10.44870       202
##   [5467]    16.07440  33.42540  30.53640       476
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths

Alternatively, plyranges::read_narrowpeaks() can also import MACS2 narrowPeak files.

The extract_summit_macs2() function extracts the summit of each peak range:

# extract summit for MACS2 narrowPeaks
summit_macs2 <- extract_summit_macs2(gr)
summit_macs2
## GRanges object with 5467 ranges and 5 metadata columns:
##          seqnames    ranges strand |                   name     score
##             <Rle> <IRanges>  <Rle> |            <character> <numeric>
##      [1]     chr2     11807      * | GSM3391651_Rep1_H1_C..       959
##      [2]     chr2     18850      * | GSM3391651_Rep1_H1_C..       118
##      [3]     chr2    142248      * | GSM3391651_Rep1_H1_C..        47
##      [4]     chr2    152243      * | GSM3391651_Rep1_H1_C..        92
##      [5]     chr2    246674      * | GSM3391651_Rep1_H1_C..       105
##      ...      ...       ...    ... .                    ...       ...
##   [5463]    chr21  46325397      * | GSM3391651_Rep1_H1_C..        24
##   [5464]    chr21  46423972      * | GSM3391651_Rep1_H1_C..        47
##   [5465]    chr21  46635563      * | GSM3391651_Rep1_H1_C..        69
##   [5466]    chr21  46639962      * | GSM3391651_Rep1_H1_C..       104
##   [5467]    chr21  46661190      * | GSM3391651_Rep1_H1_C..       305
##          signalValue    pValue    qValue
##            <numeric> <numeric> <numeric>
##      [1]    37.31560  99.66110  95.99710
##      [2]     8.61128  14.36300  11.85760
##      [3]     5.16677   7.01562   4.76958
##      [4]     6.95545  11.63870   9.23085
##      [5]     8.03720  13.05580  10.59070
##      ...         ...       ...       ...
##   [5463]     3.56330   4.50035   2.43234
##   [5464]     5.16677   7.01562   4.76958
##   [5465]     6.31494   9.32198   6.97675
##   [5466]     7.94158  12.89480  10.44870
##   [5467]    16.07440  33.42540  30.53640
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths
# to export a summit GRanges to a bed file, use rtracklayer::export.bed(summit_macs2, con=...)

To export a summit GRanges to a BED file, use rtracklayer::export.bed(). To export MACS2 narrowPeaks while preserving the format, use plyranges::write_narrowpeaks().
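The summit positions printed above are consistent with adding the `peak` offset to the 1-based start of each range. The following base-R sketch checks this on the first three CTCF peaks; it is an observation about the displayed output, under the assumption that `peak` is the summit offset from the peak start as in the narrowPeak convention, and not a description of how extract_summit_macs2() is implemented.

```r
# First three CTCF peaks from the narrowPeak output above
# (1-based starts as shown in the GRanges, plus the `peak` offset column).
peaks <- data.frame(
  start = c(11304L, 18466L, 142229L),
  peak  = c(503L, 384L, 19L)
)

# Assumption: `peak` is the offset of the summit from the peak start,
# so adding it to the start gives the summit position.
peaks$summit <- peaks$start + peaks$peak
peaks$summit
```

The three values match the summit coordinates shown in the output of extract_summit_macs2() above.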

2.1.2 broadPeaks

read_macs2_broad() imports MACS2 broadPeaks files:

broad_file <- system.file('extdata', 
                          'chr2_Rep1_H1_H3K27me3_peaks.broadPeak',
                           package='peakable')
gr <- read_macs2_broad(broad_file,
                       drop_chrM = TRUE,
                       keep_standard_chrom = TRUE,
                       species = 'Homo_sapiens')
gr
## GRanges object with 1711 ranges and 5 metadata columns:
##          seqnames            ranges strand |                   name     score
##             <Rle>         <IRanges>  <Rle> |            <character> <numeric>
##      [1]     chr2       44716-45426      * | GSM3391653_Rep1_H1_H..        39
##      [2]     chr2     284603-291011      * | GSM3391653_Rep1_H1_H..       113
##      [3]     chr2     467135-470997      * | GSM3391653_Rep1_H1_H..       226
##      [4]     chr2     504630-509340      * | GSM3391653_Rep1_H1_H..        33
##      [5]     chr2     514199-517591      * | GSM3391653_Rep1_H1_H..        28
##      ...      ...               ...    ... .                    ...       ...
##   [1707]    chr21 45644352-45645933      * | GSM3391653_Rep1_H1_H..        18
##   [1708]    chr21 45759617-45759948      * | GSM3391653_Rep1_H1_H..        12
##   [1709]    chr21 45972489-45975189      * | GSM3391653_Rep1_H1_H..        29
##   [1710]    chr21 45982156-45982536      * | GSM3391653_Rep1_H1_H..        19
##   [1711]    chr21 46667096-46668833      * | GSM3391653_Rep1_H1_H..        56
##          signalValue    pValue    qValue
##            <numeric> <numeric> <numeric>
##      [1]     4.51428   5.91375   3.90930
##      [2]     7.39375  13.49210  11.36260
##      [3]    11.06400  24.87600  22.64710
##      [4]     4.10550   5.37188   3.38826
##      [5]     4.06385   4.84263   2.86215
##      ...         ...       ...       ...
##   [1707]     3.04040   3.78996   1.85959
##   [1708]     3.17063   3.12584   1.20869
##   [1709]     4.06586   4.97490   2.99319
##   [1710]     3.60220   3.86604   1.91452
##   [1711]     5.11228   7.73635   5.69934
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths

Note: a function for exporting broadPeaks (peakable::write_broadPeaks()) is still to be developed.

2.2 SEACR

The peakable package provides functions, such as read_seacr(), extract_summit_seacr(), and write_seacr(), for importing and exporting SEACR-specific peak files.

2.2.1 Import

The metadata columns of SEACR peaks are “AUC”, “max.signal”, and “max.signal.region”, corresponding to the area under the curve of the peak, the maximum signal within the peak, and the summit region of the peak, respectively.

seacr_file <- system.file('extdata',
                          'chr2_Rep1_H1_CTCF.stringent.bed',
                          package='peakable')
seacr_gr <- read_seacr(seacr_file,
                 drop_chrM = TRUE,
                 keep_standard_chrom = TRUE,
                 species = 'Homo_sapiens')
seacr_gr
## GRanges object with 3095 ranges and 3 metadata columns:
##          seqnames            ranges strand |       AUC max.signal
##             <Rle>         <IRanges>  <Rle> | <numeric>  <numeric>
##      [1]     chr2       11295-12358      * |   5313.70   13.04530
##      [2]     chr2     263941-265688      * |   1807.99    3.66898
##      [3]     chr2     500734-501990      * |   1489.81    3.05748
##      [4]     chr2     528753-530212      * |   2486.34    6.11496
##      [5]     chr2     636663-637657      * |   1635.96    3.87281
##      ...      ...               ...    ... .       ...        ...
##   [3091]    chr21 45879950-45881616      * |   1957.60    2.85365
##   [3092]    chr21 45970260-45971678      * |   1585.81    4.68814
##   [3093]    chr21 45972936-45974986      * |   1703.22    1.83449
##   [3094]    chr21 46228266-46229804      * |   2040.97    2.64982
##   [3095]    chr21 46660612-46661533      * |   2540.77    5.50347
##               max.signal.region
##                     <character>
##      [1]       chr2:11802-11811
##      [2]     chr2:264788-264851
##      [3]     chr2:501385-501389
##      [4]     chr2:529386-529390
##      [5]     chr2:637234-637239
##      ...                    ...
##   [3091] chr21:45880606-45880..
##   [3092] chr21:45970732-45970..
##   [3093] chr21:45973768-45973..
##   [3094] chr21:46228746-46228..
##   [3095] chr21:46661173-46661..
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths

The extract_summit_seacr() function extracts the summit region from each SEACR peak:

summit_seacr <- extract_summit_seacr(seacr_gr)
summit_seacr
## GRanges object with 3095 ranges and 4 metadata columns:
##          seqnames            ranges strand |          name       AUC max.signal
##             <Rle>         <IRanges>  <Rle> |   <character> <numeric>  <numeric>
##      [1]     chr2       11802-11811      * |    peakname_1   5313.70   13.04530
##      [2]     chr2     264788-264851      * |    peakname_2   1807.99    3.66898
##      [3]     chr2     501385-501389      * |    peakname_3   1489.81    3.05748
##      [4]     chr2     529386-529390      * |    peakname_4   2486.34    6.11496
##      [5]     chr2     637234-637239      * |    peakname_5   1635.96    3.87281
##      ...      ...               ...    ... .           ...       ...        ...
##   [3091]    chr21 45880606-45880843      * | peakname_3091   1957.60    2.85365
##   [3092]    chr21 45970732-45970740      * | peakname_3092   1585.81    4.68814
##   [3093]    chr21 45973768-45973855      * | peakname_3093   1703.22    1.83449
##   [3094]    chr21 46228746-46228762      * | peakname_3094   2040.97    2.64982
##   [3095]    chr21 46661173-46661272      * | peakname_3095   2540.77    5.50347
##              itemRgb
##          <character>
##      [1]     #0000FF
##      [2]     #0000FF
##      [3]     #0000FF
##      [4]     #0000FF
##      [5]     #0000FF
##      ...         ...
##   [3091]     #0000FF
##   [3092]     #0000FF
##   [3093]     #0000FF
##   [3094]     #0000FF
##   [3095]     #0000FF
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths
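The summit ranges above correspond to the `max.signal.region` strings of the imported peaks. As a rough illustration of that correspondence, the base-R sketch below splits a few of those strings into chromosome, start, and end. It is a hand-rolled parser written for this example, not the code used by extract_summit_seacr().

```r
# max.signal.region values of the first three SEACR peaks shown above.
regions <- c("chr2:11802-11811", "chr2:264788-264851", "chr2:501385-501389")

# Split "chrom:start-end" into its three components.
parts <- regmatches(regions, regexec("^(.+):([0-9]+)-([0-9]+)$", regions))

summits <- data.frame(
  chrom = vapply(parts, `[`, character(1), 2),
  start = as.integer(vapply(parts, `[`, character(1), 3)),
  end   = as.integer(vapply(parts, `[`, character(1), 4)),
  stringsAsFactors = FALSE
)

# Width of each summit region (closed interval).
summits$width <- summits$end - summits$start + 1L
summits
```

Unlike MACS2 summits, which are single positions, SEACR summit regions can span several bases.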

2.2.2 Export

The write_seacr() function exports the SEACR peaks in SEACR-specific bed format:

write_seacr(seacr_gr, file=file.path('./seacr_peaks.bed'))

3 Peakset QC

The peakable package provides two methods for evaluating the correlation of peak ranges across samples with different antibodies and the similarity between biological replicates.

Both methods involve constructing a peak-hit matrix from a collection of peaksets across the samples of interest:

Concepts:

  1. Constructing a collection of the peaksets of interest \(C\): Suppose sample \(s\) has a peakset \(P_s=\{p_1, p_2, ..., p_l\}\), and there are \(m\) peaksets of interest, \(P=\{P_1, P_2, ..., P_s, ..., P_m\}\). The collection \(C\) is formed by consolidating all ranges in \(P\) and merging nearby ones, resulting in a set of \(n\) distinct peak ranges.
  2. Building a binary overlap vector for each sample peakset: For \(P_s\), build an overlap vector of size \(n \times 1\) whose \(i\)th entry is 1 if any peak range in \(P_s\) hits the \(i\)th range in \(C\), and 0 otherwise.
  3. Building an \(n \times m\) peak-hit matrix: Concatenate the \(m\) binary vectors column by column to form an \(n \times m\) peak-hit matrix, where each column corresponds to a sample and each row represents a peak range in \(C\).

Figure 1: Peak-hit matrix
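The three steps above can be sketched in base R on toy data. This is a simplified illustration, not the package's implementation: it works on a single chromosome, uses plain data frames instead of GRanges, and merges only ranges that overlap (the exact rule peakable uses for "nearby" ranges is not specified here). The helper names `merge_ranges()` and `hit_vector()` are hypothetical.

```r
# Three toy peaksets on a single chromosome (closed intervals).
peaksets <- list(
  s1 = data.frame(start = c(100, 500, 900), end = c(200, 600, 950)),
  s2 = data.frame(start = c(150, 520),      end = c(250, 580)),
  s3 = data.frame(start = c(905, 2000),     end = c(1000, 2100))
)

# Step 1: pool all ranges and merge the overlapping ones.
merge_ranges <- function(df) {
  df <- df[order(df$start, df$end), ]
  run_end <- cummax(df$end)  # furthest end seen so far
  # a new merged range begins when a start lies beyond everything seen so far
  new_group <- c(TRUE, df$start[-1] > run_end[-nrow(df)])
  grp <- cumsum(new_group)
  data.frame(start = as.vector(tapply(df$start, grp, min)),
             end   = as.vector(tapply(df$end, grp, max)))
}
C_ranges <- merge_ranges(do.call(rbind, peaksets))

# Step 2: binary overlap vector of one peakset against the merged ranges.
hit_vector <- function(peaks, ref) {
  vapply(seq_len(nrow(ref)), function(i) {
    as.integer(any(peaks$start <= ref$end[i] & peaks$end >= ref$start[i]))
  }, integer(1))
}

# Step 3: bind the vectors column by column into an n x m peak-hit matrix.
hits <- sapply(peaksets, hit_vector, ref = C_ranges)
C_ranges
hits
```

Here the seven input ranges collapse into four merged ranges, so the resulting matrix has four rows (ranges) and three columns (samples).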

3.1 Peak-based PCA

Once the peak-hit matrix is constructed, we can apply PCA to observe the correlation across the peaksets of the samples of interest. First, let’s construct a data.frame holding the sample information and the paths to the peak BED files:

# construct a data.frame of sample information
narrow_pattern <- '_peaks\\.narrowPeak$'
sample_info <- data.frame(
  bed_file = list.files(
    system.file('extdata', package = 'peakable'),
    full.names = TRUE, pattern=narrow_pattern)) %>%
  dplyr::mutate(sample_id = 
                  stringr::str_replace(basename(bed_file),
                                       narrow_pattern, ''))   %>%
  dplyr::mutate(antibody = 
                  stringr::str_split(sample_id, '_', 
                                     simplify=TRUE)[, 4]) %>%
  dplyr::relocate(sample_id, antibody, .before='bed_file')

sample_info %>%
  knitr::kable(caption='sample information') %>%
  kableExtra::kable_styling('striped')
Table 1: sample information

sample_id             antibody  bed_file
chr2_Rep1_H1_CTCF     CTCF      /Users/cwo11/Library/R/arm64/4.4/library/peakable/extdata/chr2_Rep1_H1_CTCF_peaks.narrowPeak
chr2_Rep1_H1_H3K4me3  H3K4me3   /Users/cwo11/Library/R/arm64/4.4/library/peakable/extdata/chr2_Rep1_H1_H3K4me3_peaks.narrowPeak
chr2_Rep2_H1_CTCF     CTCF      /Users/cwo11/Library/R/arm64/4.4/library/peakable/extdata/chr2_Rep2_H1_CTCF_peaks.narrowPeak
chr2_Rep2_H1_H3K4me3  H3K4me3   /Users/cwo11/Library/R/arm64/4.4/library/peakable/extdata/chr2_Rep2_H1_H3K4me3_peaks.narrowPeak

Second, import the peak files as a list of GRanges:

# import bed files
grl <- lapply(sample_info$bed_file, read_macs2_narrow, 
              drop_chrM = TRUE,
              keep_standard_chrom = TRUE,
              species = 'Homo_sapiens')
names(grl) <- sample_info$sample_id

Finally, consolidate the peaks to build the peak-hit matrix with consolidated_peak_hits(), then compute and plot the principal components:

# construct hit matrix and calculate hit PCA
hits_mat <- peakable::consolidated_peak_hits(grl)
pcs <- peakable:::.getPCA(hits_mat, sample_info, n_pcs=2)
ggplot(pcs, aes(x=PC1, y=PC2, color=antibody)) +
  geom_point() + theme_minimal()
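The PCA step above relies on an internal helper whose implementation is not shown here. As a rough, self-contained illustration of the idea, the sketch below runs stats::prcomp() on a toy peak-hit matrix. The matrix, the sample names, and the choice of centering without scaling are all assumptions made for this example.

```r
# Toy 6 x 4 peak-hit matrix: rows are consolidated ranges, columns are samples.
hits <- cbind(
  A_rep1 = c(1, 1, 1, 0, 0, 0),
  A_rep2 = c(1, 1, 0, 0, 0, 0),
  B_rep1 = c(0, 0, 0, 1, 1, 1),
  B_rep2 = c(0, 0, 1, 1, 1, 1)
)

# prcomp() expects observations in rows, so transpose to samples x ranges.
pca <- prcomp(t(hits), center = TRUE, scale. = FALSE)

# Collect the first two principal components with sample annotations.
pcs <- data.frame(sample_id = rownames(pca$x),
                  antibody  = sub("_rep[0-9]+$", "", rownames(pca$x)),
                  PC1 = pca$x[, 1],
                  PC2 = pca$x[, 2])
pcs
```

In this toy case the two "A" samples fall on one side of PC1 and the two "B" samples on the other, which is the kind of separation by antibody the plot above is meant to reveal.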

3.2 Cosine similarity

The cos_similarity() function constructs a peak-hit matrix from two peaksets, with columns \(\vec{u}\) and \(\vec{v}\), and computes the cosine similarity between these two binary vectors, i.e., \(\frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\|_2 \|\vec{v}\|_2}\).

# CTCF replicates
ctcf_cos_sim <- cos_similarity(gr_x = grl[["chr2_Rep1_H1_CTCF"]],
                               gr_y = grl[["chr2_Rep2_H1_CTCF"]])

# H3K4me3 replicates
k4me3_cos_sim <- cos_similarity(gr_x = grl[["chr2_Rep1_H1_H3K4me3"]],
                                gr_y = grl[["chr2_Rep2_H1_H3K4me3"]])
# visualize by ggplot
data.frame(antibody = c('CTCF', 'H3K4me3'),
           cos_sim = c(ctcf_cos_sim, k4me3_cos_sim)) %>%
  ggplot(aes(x=cos_sim, y=antibody)) +
    geom_point() +
    geom_segment(aes(x=0, y=antibody, xend=cos_sim, 
                     yend=antibody), color='grey50') +
    theme_light() +
    labs(title='MACS2 narrow peaksets: cos similarity between replicates')

Figure 2: cos similarity of peak hits between replicates
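For intuition, the cosine similarity formula from this section can be evaluated by hand on two short binary hit vectors. The function name `cosine_similarity()` and the toy vectors are made up for this sketch; it does not call the package's cos_similarity(). For binary vectors, the numerator is simply the number of ranges hit by both peaksets, and each norm is the square root of the number of ranges hit by one peakset.

```r
# Cosine similarity between two numeric vectors of equal length.
cosine_similarity <- function(u, v) {
  sum(u * v) / (sqrt(sum(u^2)) * sqrt(sum(v^2)))
}

# Toy hit vectors over four consolidated ranges.
u <- c(1, 1, 1, 0)  # replicate 1 hits ranges 1-3
v <- c(1, 1, 0, 0)  # replicate 2 hits ranges 1-2

# Two shared hits, norms sqrt(3) and sqrt(2): 2 / sqrt(6)
cosine_similarity(u, v)
```

A value of 1 means the two peaksets hit exactly the same ranges, and 0 means they share none.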

4 Finding consensus

The plyranges package provides many tools, such as find_overlaps() and filter_by_overlaps(), for finding the overlaps between two peaksets (GRanges) while preserving the metadata columns of the peaks. For example:

x <- grl[['chr2_Rep1_H1_CTCF']]
y <- grl[['chr2_Rep2_H1_CTCF']]
x %>%
  plyranges::filter_by_overlaps(y, minoverlap = 40L)
## GRanges object with 4342 ranges and 6 metadata columns:
##          seqnames            ranges strand |                   name     score
##             <Rle>         <IRanges>  <Rle> |            <character> <numeric>
##      [1]     chr2       11304-12188      * | GSM3391651_Rep1_H1_C..       959
##      [2]     chr2       18466-19108      * | GSM3391651_Rep1_H1_C..       118
##      [3]     chr2     152119-152336      * | GSM3391651_Rep1_H1_C..        92
##      [4]     chr2     246399-246881      * | GSM3391651_Rep1_H1_C..       105
##      [5]     chr2     264623-265132      * | GSM3391651_Rep1_H1_C..       172
##      ...      ...               ...    ... .                    ...       ...
##   [4338]    chr21 46098165-46098365      * | GSM3391651_Rep1_H1_C..        37
##   [4339]    chr21 46225127-46225475      * | GSM3391651_Rep1_H1_C..       105
##   [4340]    chr21 46228502-46229535      * | GSM3391651_Rep1_H1_C..       105
##   [4341]    chr21 46635486-46636005      * | GSM3391651_Rep1_H1_C..        69
##   [4342]    chr21 46660714-46661481      * | GSM3391651_Rep1_H1_C..       305
##          signalValue    pValue    qValue      peak
##            <numeric> <numeric> <numeric> <integer>
##      [1]    37.31560   99.6611  95.99710       503
##      [2]     8.61128   14.3630  11.85760       384
##      [3]     6.95545   11.6387   9.23085       124
##      [4]     8.03720   13.0558  10.59070       275
##      [5]    10.90760   19.8543  17.22070       171
##      ...         ...       ...       ...       ...
##   [4338]     4.59268   5.92793   3.73807        99
##   [4339]     8.03720  13.05580  10.59070       164
##   [4340]     8.03720  13.05580  10.59070       253
##   [4341]     6.31494   9.32198   6.97675        77
##   [4342]    16.07440  33.42540  30.53640       476
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths
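The `minoverlap` argument sets the minimum number of positions two ranges must share to count as overlapping. The base-R sketch below shows the arithmetic on toy intervals; the helper `overlap_width()` is hypothetical and written only for this illustration.

```r
# Number of shared positions between closed intervals (0 if disjoint).
overlap_width <- function(start_x, end_x, start_y, end_y) {
  pmax(0L, pmin(end_x, end_y) - pmax(start_x, start_y) + 1L)
}

# One toy query peak (100-200) against three toy subject peaks.
w <- overlap_width(100L, 200L,
                   c(150L, 180L, 300L),
                   c(250L, 400L, 350L))
w

# Only overlaps of at least 40 positions would be kept with minoverlap = 40L.
w >= 40L
```

With these toy values only the first subject peak passes the threshold: the second overlaps by too few positions and the third does not overlap at all.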

4.1 Consensus for MACS2

find_consensus_macs2() returns the consensus peaks between two MACS2 peaksets:

consensus <- find_consensus_macs2(x, y, minoverlap = 40L)
consensus
## GRanges object with 4342 ranges and 6 metadata columns:
##          seqnames            ranges strand |                   name     score
##             <Rle>         <IRanges>  <Rle> |            <character> <numeric>
##      [1]     chr2       11304-12188      * | GSM3391651_Rep1_H1_C..       959
##      [2]     chr2       18466-19108      * | GSM3391651_Rep1_H1_C..       118
##      [3]     chr2     152119-152336      * | GSM3391651_Rep1_H1_C..        92
##      [4]     chr2     246399-246881      * | GSM3391651_Rep1_H1_C..       105
##      [5]     chr2     264623-265132      * | GSM3391651_Rep1_H1_C..       172
##      ...      ...               ...    ... .                    ...       ...
##   [4338]    chr21 46098165-46098365      * | GSM3391651_Rep1_H1_C..        37
##   [4339]    chr21 46225127-46225475      * | GSM3391651_Rep1_H1_C..       105
##   [4340]    chr21 46228502-46229535      * | GSM3391651_Rep1_H1_C..       105
##   [4341]    chr21 46635486-46636005      * | GSM3391651_Rep1_H1_C..        69
##   [4342]    chr21 46660714-46661481      * | GSM3391651_Rep1_H1_C..       305
##          signalValue    pValue    qValue      peak
##            <numeric> <numeric> <numeric> <integer>
##      [1]    37.31560   99.6611  95.99710       503
##      [2]     8.61128   14.3630  11.85760       384
##      [3]     6.95545   11.6387   9.23085       124
##      [4]     8.03720   13.0558  10.59070       275
##      [5]    10.90760   19.8543  17.22070       171
##      ...         ...       ...       ...       ...
##   [4338]     4.59268   5.92793   3.73807        99
##   [4339]     8.03720  13.05580  10.59070       164
##   [4340]     8.03720  13.05580  10.59070       253
##   [4341]     6.31494   9.32198   6.97675        77
##   [4342]    16.07440  33.42540  30.53640       476
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths

find_overlaps_venn() draws a Venn diagram of the overlaps:

find_overlaps_venn(x, y, 
                   label_x = 'chr2_Rep1_H1_CTCF',
                   label_y = 'chr2_Rep2_H1_CTCF',
                   minoverlap = 40L)

4.2 consensus_by

consensus_by() takes advantage of sample_info to robustly obtain the consensus between replicates:

# consensus_by() groups the grl by 'antibody' and returns a data.frame and a
# list of two consensus peaksets as GRanges instances
consensus <- 
  peakable:::consensus_by(sample_info, 
                          peaks_grl = grl,
                          consensus_group_by = 'antibody',
                          peak_caller = 'macs2')
consensus$df
##   sample_id number_of_peaks antibody
## 1      CTCF            4304     CTCF
## 2   H3K4me3            2612  H3K4me3
head(consensus$grl[['CTCF']])
## GRanges object with 6 ranges and 6 metadata columns:
##       seqnames        ranges strand |                   name     score
##          <Rle>     <IRanges>  <Rle> |            <character> <numeric>
##   [1]     chr2   11304-12188      * | GSM3391651_Rep1_H1_C..       959
##   [2]     chr2   18466-19108      * | GSM3391651_Rep1_H1_C..       118
##   [3]     chr2 152119-152336      * | GSM3391651_Rep1_H1_C..        92
##   [4]     chr2 246399-246881      * | GSM3391651_Rep1_H1_C..       105
##   [5]     chr2 264623-265132      * | GSM3391651_Rep1_H1_C..       172
##   [6]     chr2 501073-501578      * | GSM3391651_Rep1_H1_C..       129
##       signalValue    pValue    qValue      peak
##         <numeric> <numeric> <numeric> <integer>
##   [1]    37.31560   99.6611  95.99710       503
##   [2]     8.61128   14.3630  11.85760       384
##   [3]     6.95545   11.6387   9.23085       124
##   [4]     8.03720   13.0558  10.59070       275
##   [5]    10.90760   19.8543  17.22070       171
##   [6]     9.07609   15.5128  12.99140       315
##   -------
##   seqinfo: 3 sequences from an unspecified genome; no seqlengths
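The grouping idea behind `consensus_group_by` can be pictured with base R's split(). The sketch below only shows how replicates would be grouped by antibody using a toy copy of the sample sheet; how consensus_by() then combines the peaksets within each group is not reproduced here, and the name `sample_info_toy` is made up for the example.

```r
# Toy sample sheet mirroring the sample_id and antibody columns of sample_info.
sample_info_toy <- data.frame(
  sample_id = c("chr2_Rep1_H1_CTCF", "chr2_Rep1_H1_H3K4me3",
                "chr2_Rep2_H1_CTCF", "chr2_Rep2_H1_H3K4me3"),
  antibody  = c("CTCF", "H3K4me3", "CTCF", "H3K4me3"),
  stringsAsFactors = FALSE
)

# Group sample IDs by antibody; each element lists the replicates
# whose peaksets would be compared to form one consensus.
replicate_groups <- split(sample_info_toy$sample_id, sample_info_toy$antibody)
replicate_groups
```

Each group contains two replicates, matching the two consensus peaksets (CTCF and H3K4me3) returned above.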

4.3 Consensus for SEACR

find_consensus_seacr() returns the consensus peaks between two SEACR peaksets:

# wrapper of plyranges::find_overlaps(); query-based consensus peaks; carries over the metadata
seacr_file_x <- 
  system.file('extdata',
              'chr2_Rep1_H1_CTCF.stringent.bed',
              package='peakable')
seacr_file_y <- 
  system.file('extdata',
              'chr2_Rep2_H1_CTCF.stringent.bed',
              package='peakable')

x <- peakable::read_seacr(seacr_file_x)
y <- peakable::read_seacr(seacr_file_y)
minoverlap <- min(min(width(x)), min(width(y))) / 2
consensus <- find_consensus_seacr(x, y, minoverlap = minoverlap)
consensus
## GRanges object with 2755 ranges and 3 metadata columns:
##                      seqnames            ranges strand |       AUC max.signal
##                         <Rle>         <IRanges>  <Rle> | <numeric>  <numeric>
##      [1]                 chr2       11295-12358      * |   5313.70   13.04530
##      [2]                 chr2     500734-501990      * |   1489.81    3.05748
##      [3]                 chr2     528753-530212      * |   1699.19    6.38286
##      [4]                 chr2     636663-637657      * |   1635.96    3.87281
##      [5]                 chr2     713765-714944      * |   2465.91    8.31705
##      ...                  ...               ...    ... .       ...        ...
##   [2751]                chr21 45970260-45971678      * |   2605.95    7.93021
##   [2752]                chr21 46228266-46229804      * |   2040.97    2.64982
##   [2753]                chr21 46660612-46661533      * |   2540.77    5.50347
##   [2754] chr21_GL383578v2_alt       21494-22983      * |   2019.77    5.09580
##   [2755] chr21_GL383580v2_alt       63758-65400      * |   3199.14    7.94945
##               max.signal.region
##                     <character>
##      [1]       chr2:11802-11811
##      [2]     chr2:501385-501389
##      [3]     chr2:529481-529546
##      [4]     chr2:637234-637239
##      [5]     chr2:714368-714391
##      ...                    ...
##   [2751] chr21:45970621-45970..
##   [2752] chr21:46228746-46228..
##   [2753] chr21:46661173-46661..
##   [2754] chr21_GL383578v2_alt..
##   [2755] chr21_GL383580v2_alt..
##   -------
##   seqinfo: 5 sequences from an unspecified genome; no seqlengths

5 Workflow

peakable:::peakle_flow()