bed parsing is too rigid for 12 or 20 columns and files with 13-19 or 21+ columns are ignored #73

@pickettbd

Description

Currently, CrossMap (bed subcommand) treats all bed files with <12 columns the same, provided the first 3 columns are properly formatted as chrom/start/end: it uses the first three columns, searches the rest for orientation (+ or -), and then copies the remaining columns as-is to the output, modifying only the first three columns to reflect the new coordinate space. It (reasonably) expects a rigid format for 12 column (BED12) and 20 column (genePred) files. However, for files with 13-19 or 21+ columns, it writes an empty primary output file and puts every record in the unmap file.
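To make the <12 column handling described above concrete, here is a minimal sketch of that kind of parsing. It is not CrossMap's actual code: the function name `split_simple_bed_line`, the tab delimiter, and the default strand of `+` when no orientation column is found are all assumptions made for illustration.

```python
def split_simple_bed_line(line):
    """Split a bed-like line into (chrom, start, end, strand, extra_columns).

    Hypothetical helper, not part of CrossMap. Assumes tab-delimited input
    and assumes "+" when no column holds an orientation.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a bed-like line needs at least chrom, start, end")

    chrom = fields[0]
    start = int(fields[1])
    end = int(fields[2])

    # Search the remaining columns for an orientation marker.
    strand = "+"  # assumed default
    for value in fields[3:]:
        if value in ("+", "-"):
            strand = value
            break

    # Everything after the first three columns is carried through unchanged.
    return chrom, start, end, strand, fields[3:]
```

Nothing in this approach depends on the column count beyond the first three columns, which is why it could in principle apply to files of any width.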

I suggest the following two changes: (1) by default, treat files with 13-19 or 21+ columns like files with <12 columns, and (2) add an option to treat 12 and 20 column files like files with <12 columns. The first suggestion is important because silently producing an empty output file is not ideal. The second suggestion is important because there is a wide variety of bed-like files out there, including files that have the same number of columns as one of the formally defined formats but do not adhere to it.

If changing the default behavior for files with 13-19 or 21+ columns is undesirable for some reason, then one approach would be to add a flag (imagine something like --naive-bed-parsing) that makes CrossMap handle any bed-like file (assuming the first 3 columns are valid) the way it currently handles files with <12 columns, even if the file has 12 or 20 columns.
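The column-count dispatch proposed in this issue could look roughly like the sketch below. The function name `choose_parser`, the `naive` parameter (standing in for the imagined --naive-bed-parsing flag), and the returned labels are hypothetical; none of them exist in CrossMap today.

```python
def choose_parser(n_columns, naive=False):
    """Pick a parsing mode from the number of columns in a bed-like line.

    Hypothetical sketch of the proposed behavior, not CrossMap's code.
    `naive` mirrors the imagined --naive-bed-parsing flag.
    """
    if n_columns < 3:
        return "invalid"    # chrom/start/end are always required
    if naive:
        return "simple"     # the flag overrides the strict formats
    if n_columns == 12:
        return "bed12"      # strict BED12 handling
    if n_columns == 20:
        return "genepred"   # strict genePred handling
    # 3-11, 13-19, and 21+ columns all fall back to the simple handling
    # instead of being sent to the unmap file.
    return "simple"
```

With this layout, `choose_parser(15)` gives `"simple"` rather than no output, and `choose_parser(12, naive=True)` lets a 12 column file that is not true BED12 bypass the strict parser.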

See the crossmap_bed_file function in src/cmmodule/mapbed.py, especially lines 90 and 163.
