Tags: samtools/bcftools
bcftools release 1.23:
Changes affecting the whole of bcftools, or multiple commands:
* The `-i/-e` filtering expressions and `-f` formatting in `query`
- Add a new function `smpl_COUNT()/sCOUNT()` which returns the
number of elements (#2423)
Changes affecting specific commands:
* bcftools annotate
- Make dynamic variables read from a tab-delimited annotation
file (#2151) work also for regions. For example, while the
first command below was functional, the second was not (#2441)
bcftools annotate -a ann.tsv.gz -c CHROM,POS,-,SCORE,~STR \
-i'TAG={STR}' -k in.vcf
bcftools annotate -a ann.tsv.gz -c CHROM,BEG,END,SCORE,~STR \
-i'TAG={STR}' -k in.vcf
* bcftools consensus
- Fix a bug which prevented reading fasta files containing empty
lines in their entirety (#2424)
- Fix a bug which caused `--absent` to miss some absent positions
* bcftools csq
- Add support for complex substitutions, such as AC>TAA
* bcftools +fill-tags
- Fix header formatting error for INFO/F_MISSING which must be
Number=1 (#2442)
- Make `-t 'F_MISSING'` work with `-S groups.txt` (#2447)
* bcftools gtcheck
- The program is now able to process gVCF blocks. Monoallelic
sites are now excluded only when the site is monoallelic in
both the query and the genotype file. The new option
--keep-refs allows monoallelic sites to always be included.
- Fix an error in parsing -i/-e command line options where the
`qry:` and `gt:` prefix was not stripped (#2432)
* bcftools mpileup
- Make `-d, --max-depth 0` set the depth to unlimited (#2435)
* bcftools norm
- Make the -i/-e filtering option work for all options, such as
line merging and duplication removal (#2415)
* bcftools query
- Numerical functions, such as SUM(INFO/DP), would previously
return the value 0 when executed on missing values. This was
incorrect; a missing value is now printed.
* bcftools reheader
- Add options `--samples-list` and `--samples-file` to allow
renaming samples from a list of samples on command line,
rather than from a file of sample names (#2383)
* bcftools +split-vep
- Fix the option `-A, --all-fields`, it was not working properly
and could lead to a segfault (#2473)
bcftools release 1.22:
Changes affecting the whole of bcftools, or multiple commands:
* Add support for matching lines by ID via the --pair-logic and
--collapse options (#1739)
* The -i/-e filtering expressions
- The expressions now properly match the regex negation of missing
values, e.g. -i 'TAG!~"\."' (#2355)
- Added support for Fisher's exact test
* Add the option `-v, --verbosity INT` to all bcftools commands and
plugins. Verbosity values bigger than 3 are passed to the underlying
HTSlib library so that the user can investigate network issues and
other problems occurring at the library level.
Changes affecting specific commands:
* bcftools annotate
- Fix Number in the header definition of transferred FILTER and ID
tags (#2335)
* bcftools call
- The `-s, --samples` option was not working properly, now also
supporting sample negation as advertised in the manual page, e.g.
`-s ^sample1,sample2` to include all samples but sample1 and
sample2 (#2380)
* bcftools consensus
- Preserve entire missing gVCF blocks with --missing (#2350)
- Fixed a bug, the `-S, --samples-file` option is no longer ignored
(#2398)
* bcftools convert
- The command `convert --gvcf2vcf` was not filling the REF allele
when BCF was output (#243)
* bcftools csq
- Check the input GFF for features outside transcript boundaries and
extend the transcript to contain the feature fully (#2323)
- Add experimental support for alternative genetic code tables,
accessible via a new option `-C, --genetic-code` (#2368)
- Change in the `--unify-chr-names` option, no automatic sequence
name modification is attempted anymore, the prefixes to trim must
be given explicitly. For example, if run with
`--unify-chr-names chr,Chromosome,-`, the program will trim the
"chr" prefix in the VCF, "Chromosome" in the GFF, leaving the fasta
unchanged (#2378)
* bcftools +fill-tags
- Thanks to the extension of filtering expressions with Fisher's
exact test, the plugin can now be used to add FT annotation (#1582)
* bcftools merge
- Preserve phasing in half-missing genotypes (#2331)
- The option `--merge none` is expected to create no new
multiallelic sites, but it should allow to merge, say, A>C with
A>C,AT (#2333)
- Make `--merge both` work with indel-only records; for example, the
multiallelic site G>GT,T should be merged with G>GT (#2339)
- Do not merge symbolic alleles unless they have not just the same
type, e.g. <DEL>, but also length, i.e. the INFO/END coordinate
(#2362)
- Fix a bug where an incorrectly formatted gVCF file with overlapping
blocks would trigger an infinite loop in the program (#2410)
* bcftools mpileup
- The -r/-R options now merge overlapping regions, preventing the
output of duplicate sites
* bcftools norm
- Print the number of removed duplicate sites in the final
statistics (#2346)
- Preserve the original alleles in `--old-rec-tag` when
`--check-ref s` requested (#2357)
- Print a warning when INFO/SVLEN is not defined as Number=A (#2371)
* plot-vcfstats
- Make the option `-s, --sample-names` functional again (#2353)
* bcftools +prune
- New option to remove or annotate clusters of sites within a window
* bcftools query
- The functions used in -i/-e filtering expressions (such as SUM,
MEDIAN, etc) can be now used in formatting expressions (#2271). If
the VCF contains INFO/AD and FORMAT/AD, try:
bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t [ %sSUM(FMT/AD)]'
bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t [ %SUM(FMT/AD)]'
bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t %SUM(FMT/AD)'
bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t %SUM(INFO/AD)'
- Make it possible to refer to the ID column from the FORMAT
expression (#2337)
bcftools query test.vcf -f 'ID=%ID ID=[ %/ID] vs FMT_ID=[ %ID]'
* bcftools roh
- New visualization tool misc/roh-viz, see below
* bcftools +setGT
- Support for setting missing genotypes with arbitrary ploidy via
`-n c:./.` (#2303)
* bcftools +split-vep
- The `-s, --select` option was extended to print only one
consequence. Previously it was possible to select a single
transcript (e.g., the one with the worst consequence), and it was
possible to filter by consequence severity (e.g., missense or
worse), but in some cases multiple consequences are reported within
a single transcript (e.g., start_lost&splice_region). The extended
option allows to print the worst part, for example as
--select primary:missense+:worst
* bcftools +trio-dnm2
- Fix a problem with --strictly-novel option which would neglect the
presence of the apparent de novo allele in the father for male
offspring
- Fix a problem with uncalled mosaic chrX variants in males
* roh-viz
- HTML/JavaScript visualization of bcftools/roh output and
homozygosity rate.
* bcftools +vrfs
- New experimental plugin for scoring variants and assessing site
noisiness (variant read frequency profiles) from a large number of
unaffected parental samples
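The explicit prefix trimming described for `bcftools csq --unify-chr-names` in release 1.22 can be pictured with a short Python sketch. This is not bcftools code: the helper names `trim_prefix` and `unify_names` are hypothetical, and the sketch assumes, following the example in the notes, that the three comma-separated fields apply to the VCF, the GFF and the fasta in that order, with "-" meaning "trim nothing".

```python
def trim_prefix(name, prefix):
    """Remove `prefix` from the start of `name`; "-" leaves the name unchanged."""
    if prefix != "-" and name.startswith(prefix):
        return name[len(prefix):]
    return name


def unify_names(spec, vcf_name, gff_name, fasta_name):
    """Apply a 'VCF,GFF,FASTA' prefix specification to one name from each file."""
    vcf_prefix, gff_prefix, fasta_prefix = spec.split(",")
    return (
        trim_prefix(vcf_name, vcf_prefix),
        trim_prefix(gff_name, gff_prefix),
        trim_prefix(fasta_name, fasta_prefix),
    )


# The three naming conventions end up identical after trimming.
print(unify_names("chr,Chromosome,-", "chr1", "Chromosome1", "1"))
# -> ('1', '1', '1')
```

With this reading, a sequence name that does not carry the given prefix is simply passed through unchanged.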
bcftools release 1.21:
Changes affecting the whole of bcftools, or multiple commands:
* Support multiple semicolon-separated strings when filtering by ID
using -i/-e (#2190). For example, `-i 'ID="rs123"'` now correctly
matches `rs123;rs456`
* The filtering expression ILEN can be positive (insertion), negative
(deletion), zero (balanced substitutions), or set to missing value
(symbolic alleles).
* bcftools query
* bcftools +split-vep
- The column indices printed by default with `-H` (e.g.,
"#[1]CHROM") can be now suppressed by giving the option twice `-HH`
(#2152)
Changes affecting specific commands:
* bcftools annotate
- Support dynamic variables read from a tab-delimited annotation
file (#2151). For example, in the two cases below the field 'STR'
from the -a file is required to match the INFO/TAG in VCF. In the
first example the alleles REF,ALT must match, in the second example
they are ignored. The option -k is required to output also records
that were not annotated:
bcftools annotate -a ann.tsv.gz \
-c CHROM,POS,REF,ALT,SCORE,~STR -i'TAG={STR}' -k in.vcf
bcftools annotate -a ann.tsv.gz \
-c CHROM,POS,-,-,SCORE,~STR -i'TAG={STR}' -k in.vcf
- When adding Type=String annotations from a tab-delimited file,
encode characters with special meaning using percent encoding
(';', '=' in INFO and ':' in FORMAT) (#2202)
* bcftools consensus
- Allow to apply a reference allele which overlaps a previous
deletion, there is no need to complain about overlapping alleles in
such case
- Fix a bug which required `-s -` to be present even when there were
no samples in the VCF (#2260)
* bcftools csq
- Fix a rare bug where indel combined with a substitution ending at
exon boundary is incorrectly predicted to have 'inframe' rather
than 'frameshift' consequence (#2212)
* bcftools gtcheck
- Fix a segfault with --no-HWE-prob. The bug was introduced with the
output format change in 1.19 which replaced the DC section with
DCv2 (#2180)
- The number of matching genotypes in the DCv2 output was not
calculated correctly with non-zero `-E, --error-probability`.
Consequently, also the average HWE score was incorrect. The main
output, the discordance score, was not affected by the bug
* bcftools +mendelian2
- Include the number of good cases where at least one of the trio
genotypes has an alternate allele (#2204)
- Fix the error message which would report the wrong sample when
non-existent sample is given. Note that bug only affected the error
message, the program otherwise assigns the family members correctly
(#2242)
* bcftools merge
- Fix a severe bug in merging of FORMAT fields with Number=R and
Number=A values. For example, rows with high-coverage FORMAT/AD
values (bigger or equal to 128) could have been assigned to
incorrect samples. The bug was introduced in version 1.19. For
details see #2244.
* bcftools mpileup
- Return non-zero error code when the input BAM/CRAM file is
truncated (#2177)
- Add FORMAT/AD annotation by default, disable with `-a -AD`
* bcftools norm
- Support realignment of symbolic <DUP.*> alleles, similarly to
<DEL.*> added previously (#1919,#2145)
- Fix in reporting reference allele genotypes with
`--multi-overlaps .` (#2160)
- Support of duplicate removal of symbolic alleles of the same type
but different SVLEN (#2182)
- New `-S, --sort` switch to optionally sort output records by
allele (#1484)
- Add the `-i/-e` filtering options to select records for
normalization. Note duplicate removal ignores this option.
- Fix a bug where `--atomize` would not fill GT alleles for atomized
SNVs followed by an indel (#2239)
* bcftools +remove-overlaps
- Revamp the program to allow greater flexibility, with the
following new options:
-M, --mark-tag TAG     Mark -m sites with INFO/TAG
-m, --mark EXPR        Mark (if also -M is present) or remove sites [overlap]
    dup       .. all overlapping sites
    overlap   .. overlapping sites
    min(QUAL) .. mark sites with lowest QUAL until overlaps are resolved
--missing EXPR         Value to use for missing tags with -m 'min(QUAL)'
    0         .. the default
    DP        .. heuristics, scale maximum QUAL value proportionally to INFO/DP
--reverse              Apply the reverse logic, for example preserve duplicates instead of removing
-O, --output-type t    t: plain list of sites (chr,pos), tz: compressed list
* bcftools +tag2tag
- The conversions --LXX-to-XX, --XX-to-LXX were working but specific
cases such as --LAD-to-AD were not.
- Print more informative error message when source tag type violates
VCF specification
* bcftools +trio-dnm2
- Better handling of the --strictly-novel functionality, especially
with respect to chrX inheritance
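The four possible outcomes of the ILEN filtering expression listed under release 1.21 can be illustrated with a minimal Python sketch. It assumes that ILEN is the length of ALT minus the length of REF, which is consistent with the signs given in the notes but is not taken from the bcftools source; the function name `ilen` is hypothetical, and only alleles written as `<...>` are treated as symbolic here.

```python
def ilen(ref, alt):
    """Return len(ALT) - len(REF), or None (missing) for a symbolic allele."""
    if alt.startswith("<"):
        # Symbolic alleles such as <DEL> carry no sequence, so the value is missing.
        return None
    return len(alt) - len(ref)


print(ilen("A", "AT"))      # insertion             -> 1
print(ilen("ACG", "A"))     # deletion              -> -2
print(ilen("AC", "TG"))     # balanced substitution -> 0
print(ilen("A", "<DEL>"))   # symbolic allele       -> None
```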
bcftools release 1.20:
Changes affecting the whole of bcftools, or multiple commands:
* Add short option -W for --write-index. The option now accepts an
optional parameter which allows to choose between TBI and CSI index
format.
Changes affecting specific commands:
* bcftools consensus
- Add new --regions-overlap option which allows to take into
account overlapping deletions that start out of the fasta file
target region.
* bcftools isec
- Add new option `-l, --file-list` to read the list of file names
from a file
* bcftools merge
- Add new option `--force-single` to support single-file edge
case (#2100)
* bcftools mpileup
- Add new option --indels-cns for an alternative indel calling
model, which should increase the speed on long read data
(thanks to using edlib) and the precision (thanks to a number
of heuristics).
* bcftools norm
- Change the order of atomization and multiallelic splitting
(when both -a,-m are given) from "atomize first, then split"
to "split first, then atomize". This usually results in a
simpler VCF representation. The previous behaviour can be
achieved by explicitly streaming the output of the --atomize
command into the --multiallelics splitting command.
- Fix Type=String multiallelic splitting for Number=A,R,G tags
with incorrect number of values.
- Merging into multiallelic sites with `bcftools norm -m +indels`
did not work. This is now fixed and the merging is now more
strict about variant types, for example complex events, such as
AC>TGA, are not considered as indels anymore (#2084)
* bcftools reheader
- Allow reading the input file from a stream with --fai (#2088)
* bcftools +setGT
- Support for custom genotypes based on the allele with higher
depth, such as `--new-gt c:0/X` custom genotypes (#2065)
* bcftools +split-vep
- When only one of the tags is present, automatically choose
INFO/BCSQ (the default tag name produced by `bcftools csq`)
or INFO/CSQ (produced by VEP). When both tags are present,
use the default INFO/CSQ.
- Transcript selection by MANE, PICK, and user-defined
transcripts, for example
--select CANONICAL=YES
--select MANE_SELECT!=""
--select PolyPhen~probably_damaging
- Select all matching transcripts via --select, not just one
- Change automatic type parsing of VEP fields DNA_position,
CDS_position, and Protein_position from Integer to String,
as it can be of the form "8586-8599/9231". The type Integer
can be still enforced with
`-c cDNA_position:int,CDS_position:int,Protein_position:int`.
- Recognize `-c field:str`, not just `-c field:string`, as
advertised in the usage page
- Fix a bug which made filtering expression containing missing
values crash (#2098)
* bcftools stats
- When GT is missing but AD is present, the program determines
the alternate allele from AD. However, if the AD tag has
incorrect number of values, the program would exit with an
error printing "Requested allele outside valid range". This
is now fixed by taking into account the actual number of ALT
alleles.
* bcftools +tag2tag
- Support for conversion from tags using localized alleles (e.g.
LPL, LAD) to the family of standard tags (PL, AD)
* bcftools +trio-dnm2
- Extend --strictly-novel to exclude cases where the
non-Mendelian allele is the reference allele. The change is
motivated by the observation that this class of variants is
enriched for errors (especially for indels), and better
corresponds with the option name.
bcftools release 1.19:
Changes affecting the whole of bcftools, or multiple commands:
* Filtering expressions can be given a file with list of strings to
match, this was previously possible only for the ID column. For
example
ID=@file            .. selects lines with ID present in the file
INFO/TAG=@file.txt  .. selects lines where TAG has a string value listed in the file
INFO/TAG!=@file.txt .. TAG must not have a string value listed in the file
* Allow to query REF,ALT columns directly, for example -e 'REF="N"'
Changes affecting specific commands:
* bcftools annotate
- Fix `bcftools annotate --mark-sites`, VCF sites overlapping regions
in a BED file were not annotated (#1989)
- Add flexibility to FILTER column transfers and allow transfers
within the same file, across files, and in combination. For
examples see
http://samtools.github.io/bcftools/howtos/annotate.html#transfer_filter_to_info
* bcftools call
- Output MIN_DP rather than MinDP in gVCF mode
- New `-*, --keep-unseen-allele` option to output the unobserved
allele <*>, intended for gVCF.
* bcftools head
- New `-s, --samples` option to include the #CHROM header line with
samples.
* bcftools gtcheck
- Add output options `-o, --output` and `-O, --output-type`
- Add filtering options `-i, --include` and `-e, --exclude`
- Rename the short option `-e, --error-probability` from lower case
to upper case `-E, --error-probability`
- Changes to the output format, replace the DC section with DCv2:
- adds a new column for the number of matching genotypes
- The --error-probability is newly interpreted as the probability
of erroneous allele rather than genotype. In other words, the
calculation of the discordance score now considers the
probability of genotyping error to be different for HOM and HET
genotypes, i.e. P(0/1|dsg=0) > P(1/1|dsg=0).
- fixes in HWE score calculation plus output average HWE score
rather than absolute HWE score
- better description of fields
* bcftools merge
- Add `-m` modifiers to suppress the output of the unseen allele <*>
or <NON_REF> at variant sites (e.g. `-m both,*`) or all sites
(e.g. `-m both,**`)
* bcftools mpileup
- Output MIN_DP rather than MinDP in gVCF mode
* bcftools norm
- Add the number of joined lines to the summary output, for example
Lines total/split/joined/realigned/skipped: 6/0/3/0/0
- Allow combining -m and -a with --old-rec-tag (#2020)
- Symbolic <DEL> alleles caused norm to expand REF to the full
length of the deletion. This was not intended and problematic for
long deletions, the REF allele should list one base only (#2029)
* bcftools query
- Add new `-N, --disable-automatic-newline` option for pre-1.18
query formatting behavior when newline would not be added when
missing
- Make the automatic addition of the newline character more
predictable and, when missing, always put it at the end of the
expression. In version 1.18 it could be added at the end of the
expression (for per-site expressions) or inside the square brackets
(for per-sample expressions). The new behavior is:
- if the formatting expression contains a newline character, do
nothing
- if there is no newline character and -N,
--disable-automatic-newline is given, do nothing
- if there is no newline character and -N is not given, insert
newline at the end of the expression
See #1969 for details
- Add new `-F, --print-filtered` option to output a default string
for samples that would otherwise be filtered by `-i/-e`
expressions.
- Include sample name in the output header with `-H` whenever it
makes sense (#1992)
* bcftools +split-vep
- Fix on the fly filtering involving numeric subfields, e.g.
`-i 'MAX_AF<0.001'` (#2039)
- Interpret default column type names (--columns-types) as entire
strings, rather than substrings to avoid unexpected spurious
matches (i.e. internally add ^ and $ to all field names)
* bcftools +trio-dnm2
- Do not flag paternal genotyping errors as de novo mutations.
Specifically, when father's chrX genotype is 0/1 and mother's 0/0,
0/1 in the child will not be marked as DNM.
* bcftools view
- Add new `-A, --trim-unseen-allele` option to remove the unseen
allele <*> or <NON_REF> at variant sites (`-A`) or all sites
(`-AA`)
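The three newline rules given for `bcftools query` in release 1.19 amount to a small decision procedure, sketched below in Python. This illustrates the rules as stated in the notes and is not the bcftools implementation; the function name `add_newline` is hypothetical, and the sketch assumes the newline appears in the expression as the two-character escape `\n` typed on the command line.

```python
NEWLINE = r"\n"  # the two-character escape as typed in a -f expression


def add_newline(fmt, disable_automatic_newline=False):
    """Return the formatting expression after applying the 1.19 newline rules."""
    if NEWLINE in fmt:
        # Rule 1: the expression already contains a newline, do nothing.
        return fmt
    if disable_automatic_newline:
        # Rule 2: no newline, but -N was given, do nothing.
        return fmt
    # Rule 3: no newline and no -N, append it at the end of the expression.
    return fmt + NEWLINE


print(add_newline(r"%CHROM\t%POS"))        # -> %CHROM\t%POS\n
print(add_newline(r"[%SAMPLE\n]"))         # -> [%SAMPLE\n]
print(add_newline(r"%CHROM", True))        # -> %CHROM
```

Note that under these rules the newline is never placed inside the square brackets, which is the difference from version 1.18 that the notes describe.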
bcftools release 1.18:
Changes affecting the whole of bcftools, or multiple commands:
* Support auto indexing during writing BCF and VCF.gz via new
`--write-index` option
Changes affecting specific commands:
* bcftools annotate
- The `-m, --mark-sites` option can be now used to mark all sites
without the need to provide the `-a` file (#1861)
- Fix a bug where the `-m` function did not respect the
`--min-overlap` option (#1869)
- Fix a bug when update of INFO/END results in assertion error
(#1957)
* bcftools concat
- New option `--drop-genotypes`
* bcftools consensus
- Support higher-ploidy genotypes with `-H, --haplotype` (#1892)
- Allow `--mark-ins` and `--mark-snv` with a character, similarly
to `--mark-del`
* bcftools convert
- Support for conversion from tab-delimited files
(CHROM,POS,REF,ALT) to sites-only VCFs
* bcftools csq
- New `--unify-chr-names` option to automatically unify different
chromosome naming conventions in the input GFF, fasta and VCF
files (e.g. "chrX" vs "X")
- More versatility in parsing various flavors of GFF
- A new `--dump-gff` option to help with debugging and
investigating the internals of hGFF parsing
- When printing consequences in nonsense mediated decay
transcripts, include 'NMD_transcript' in the consequence part
of the annotation. This is to make filtering easier and
analogous to VEP annotations. For example the consequence
annotation 3_prime_utr|PCGF3|ENST00000430644|NMD is newly
printed as 3_prime_utr&NMD_transcript|PCGF3|ENST00000430644|NMD
* bcftools gtcheck
- Add stats for the number of sites matched in the GT-vs-GT,
GT-vs-PL, etc modes. This information is important for
interpretation of the discordance score, as only the
GT-vs-GT matching can be interpreted as the number of
mismatching genotypes.
* bcftools +mendelian2
- Fix in command line argument parsing, the `-p` and `-P` options
were not functioning (#1906)
* bcftools merge
- New `-M, --missing-rules` option to control the behavior of
merging of vector tags to prevent mixtures of known and missing
values in tags when desired
- Use values pertaining to the unknown allele (<*> or <NON_REF>)
when available to prevent mixtures of known and missing values
(#1888)
- Revamped line matching code to fix problems in gVCF merging
where split gVCF blocks would not update genotypes (#1891,
#1164).
* bcftools mpileup
- Fix a bug in --indels-v2.0 which caused an endless loop when
CIGAR operator 'H' or 'P' was encountered
* bcftools norm
- The `-m, --multiallelics +` mode now preserves phasing (#1893)
- Symbolic <DEL.*> alleles are now normalized too (#1919)
- New `-g, --gff-annot` option to right-align indels in forward
transcripts to follow HGVS 3' rule (#1929)
* bcftools query
- Force newline character in formatting expression when not given
explicitly
- Fix `-H` header output in formatting expressions containing
newlines
* bcftools reheader
- Make `-f, --fai` aware of long contigs not representable by
32-bit integer (#1959)
* bcftools +split-vep
- Prevent a segfault when `-i/-e` use a VEP subfield not included
in `-f` or `-c` (#1877)
- New `-X, --keep-sites` option complementing the existing `-x,
--drop-sites` options
- Force newline character in formatting expression when not given
explicitly
- Fix a subtle ambiguity: identical rows must be returned when
`-s` is applied regardless of `-f` containing the `-a` VEP tag
itself or not.
* bcftools stats
- Collect new VAF (variant allele frequency) statistics from
FORMAT/AD field
- When counting transitions/transversions, consider also
alternate het genotypes
* plot-vcfstats
- Add three new VAF plots
bcftools release 1.17:
Changes affecting the whole of bcftools, or multiple commands:
* The -i/-e filtering expressions
- Error checks were added to prevent incorrect use of vector
arithmetics. For example, when evaluating the sum of two
vectors A and B, the resulting vector could contain nonsense
values when the input vectors were not of the same length.
The fix introduces the following logic:
- evaluate to C_i = A_i + B_i when length(A)==length(B) and set
length(C)=length(A)
- evaluate to C_i = A_i + B_0 when length(B)=1 and set
length(C)=length(A)
- evaluate to C_i = A_0 + B_i when length(A)=1 and set
length(C)=length(B)
- throw an error when length(A)!=length(B) AND length(A)!=1
AND length(B)!=1
- Arrays in Number=R tags can be now subscripted by alleles found
in FORMAT/GT. For example,
FORMAT/AD[GT] > 10 .. require support of more than 10 reads for
each allele
FORMAT/AD[0:GT] > 10 .. same as above, but in the first sample
sSUM(FORMAT/AD[GT]) > 20 .. require total sample depth bigger than 20
* The commands `consensus -H` and `+split-vep -H`
- Drop unnecessary leading space in the first header column
and newly print `#[1]columnName` instead of the previous
`# [1]columnName` (#1856)
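The vector arithmetic logic introduced for the -i/-e filtering expressions in release 1.17 can be written out as a short Python sketch. It mirrors the four cases listed in the notes for the sum of two vectors A and B; the function name `add_vectors` is hypothetical and the code is an illustration, not the bcftools implementation.

```python
def add_vectors(a, b):
    """Element-wise sum of two vectors following the four cases from the notes."""
    if len(a) == len(b):
        # C_i = A_i + B_i, length(C) = length(A)
        return [x + y for x, y in zip(a, b)]
    if len(b) == 1:
        # C_i = A_i + B_0, length(C) = length(A)
        return [x + b[0] for x in a]
    if len(a) == 1:
        # C_i = A_0 + B_i, length(C) = length(B)
        return [a[0] + y for y in b]
    # Lengths differ and neither vector is a single value.
    raise ValueError("vectors of incompatible lengths: %d vs %d" % (len(a), len(b)))


print(add_vectors([1, 2, 3], [10, 20, 30]))  # -> [11, 22, 33]
print(add_vectors([1, 2, 3], [10]))          # -> [11, 12, 13]
print(add_vectors([5], [1, 2]))              # -> [6, 7]
```

Calling `add_vectors([1, 2, 3], [1, 2])` raises an error, which corresponds to the last case in the list.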
Changes affecting specific commands:
* bcftools +allele-length
- Fix overflow for indels longer than 512bp and aggregate alleles
equal or larger than that in the same bin (#1837)
* bcftools annotate
- Support sample reordering of annotation file (#1785)
- Restore lost functionality of the --pair-logic option (#1808)
* bcftools call
- Fix a bug where too many alleles passed to `-C alleles` via
`-T` caused memory corruption (#1790)
- Fix a bug where indels constrained with `-C alleles -T` would
sometimes be missed (#1706)
* bcftools consensus
- BREAKING CHANGE: the option `-I, --iupac-codes` newly outputs
IUPAC codes based on FORMAT/GT of all samples. The `-s,
--samples` and `-S, --samples-file` options can be used to
subset samples. In order to ignore samples and consider only
the REF and ALT columns (the original behavior prior to
1.17), run with `-s -` (#1828)
* bcftools convert
- Make variantkey conversion work for sites without an ALT allele
(#1806)
* bcftools csq
- Fix a bug where a MNV with multiple consequences (e.g. missense
+ stop_gained) would report only the less severe one (#1810)
- GFF file parsing was made slightly more flexible, newly ids can
be just 'XXX' rather than, for example, 'gene:XXX'
- New gff2gff perl script to fix GFF formatting differences
* bcftools +fill-tags
- More of the available annotations are now added by the `-t all`
option
* bcftools +fixref
- New INFO/FIXREF annotation
- New -m swap mode
* bcftools +mendelian
- The +mendelian plugin has been deprecated and replaced with
+mendelian2. The function of the plugin is the same but the
command line options and the output format have changed, and
for this reason it was introduced as a new plugin.
* bcftools mpileup
- Most of the annotations generated by mpileup are now optional
via the `-a, --annotate` option, and several new (mostly
experimental) annotations were added.
- New option `--indels-2.0` for an EXPERIMENTAL indel calling
model. This model aims to address some known deficiencies of
the current indel calling algorithm, specifically, it uses
diploid reference consensus sequence. Note that in the current
version it has the potential to increase sensitivity but at
the cost of decreased specificity.
- Make the FS annotation (Fisher exact test strand bias)
functional and remove it from the default annotations
* bcftools norm
- New --multi-overlaps option allows to set overlapping alleles
either to the ref allele (the current default) or to a missing
allele (#1764 and #1802)
- Fixed a bug in `-m -` which did not split missing FORMAT
values correctly and could lead to empty FORMAT fields such
as `::` instead of the correct `:.:` (#1818)
- The `--atomize` option previously would not split complex
indels such as C>GGG. Newly these will be split into two
records C>G and C>CGG (#1832)
* bcftools query
- Fix a rare bug where the printing of SAMPLE field with `query`
was incorrectly suppressed when the `-e` option contained a
sample expression while the formatting query did not. See #1783
for details.
* bcftools +setGT
- Add new `--new-gt X` option (#1800)
- Add new `--target-gt r:FLOAT` option to randomly select a
proportion of genotypes (#1850)
- Fix a bug where `-t ./x` mode was advertised as selecting both
phased and unphased half-missing genotypes, but was in fact
selecting only unphased genotypes (#1844)
* bcftools +split-vep
- New options `-g, --gene-list` and `--gene-list-fields` which
allow to prioritize consequences from a list of genes, or
restrict output to the listed genes
- New `-H, --print-header` option to print the header with `-f`
- Work around a bug in the LOFTEE VEP plugin used to annotate
gnomAD VCFs. There the LoF_info subfield contains commas
which, in general, makes it impossible to parse the VEP
subfields. The +split-vep plugin can now work with such files,
replacing the offending commas with slash (/) characters. See
also Ensembl/ensembl-vep#1351
- Newly the `-c, --columns` option can be omitted when a
subfield is used in `-i/-e` filtering expression. Note that
`-c` may still have to be given when it is not possible to
infer the type of the subfield. Note that this is an
experimental feature.
* bcftools stats
- The per-sample stats (PSC) would not be computed when `-i/-e`
filtering options and the `-s -` option were given but the
expression did not include sample columns (#1835)
* bcftools +tag2tag
- Revamp of the plugin to allow wider range of tag
conversions, specifically all combinations from
FORMAT/GL,PL,GP to FORMAT/GL,PL,GP,GT
* bcftools +trio-dnm2
- New `-n, --strictly-novel` option to downplay alleles which
violate Mendelian inheritance but are not novel
- Allow to set the `--pn` and `--pns` options separately for SNVs
and indels and make the indel settings more strict by default
- Output missing FORMAT/VAF values in non-trio samples, rather
than random nonsense values
* bcftools +variant-distance
- New option `-d, --direction` to choose the directionality:
forward, reverse, nearest (the default) or both (#1829)
bcftools release 1.16:
* New plugin `bcftools +variant-distance` to annotate records with
distance to the nearest variant (#1690)
Changes affecting the whole of bcftools, or multiple commands:
* The -i/-e filtering expressions
- Added support for querying of multiple filters, for example
`-i 'FILTER="A;B"'` can be used to select sites with two filters
"A" and "B" set. See the documentation for more examples.
- Added modulo arithmetic operator
Changes affecting specific commands:
* bcftools annotate
- A bug introduced in 1.14 caused that records with INFO/END
annotation would incorrectly trigger `-c ~INFO/END` mode of
comparison even when not explicitly requested, which would result
in not transferring the annotation from a tab-delimited file
(#1733)
* bcftools merge
- New `-m snp-ins-del` switch to merge SNVs, insertions and
deletions separately (#1704)
* bcftools mpileup
- New NMBZ annotation for Mann-Whitney U-z test on number of
mismatches within supporting reads
- Suppress the output of MQSBZ and FS annotations in absence of
alternate allele
* bcftools +scatter
- Fix erroneous addition of duplicate PG lines
* bcftools +setGT
- Custom genotypes (e.g. `-n c:1/1`) now correctly override ploidy
bcftools release 1.15.1:
* bcftools annotate
- New `-H, --header-line` convenience option to pass a header
line on command line, this complements the existing `-h,
--header-lines` option which requires a file with header lines
* bcftools csq
- A list of consequence types supported by `bcftools csq` has
been added to the manual page. (#1671)
* bcftools +fill-tags
- Extend generalized functions so that FORMAT tags can be filled
as well, for example:
bcftools +fill-tags in.bcf -o out.bcf -- \
-t 'FORMAT/DP:1=int(smpl_sum(FORMAT/AD))'
- Allow multiple custom functions in a single run. Previously the
program would silently go with the last one, assigning the same
values to all (#1684)
* bcftools norm
- Fix an assertion failure triggered when a faulty VCF file with
a '-' character in the REF allele was used with `bcftools norm
--atomize`. This option now checks that the REF allele only
includes the allowed characters A, C, G, T and N. (#1668)
- Fix the loss of phasing in half-missing genotypes in variant
atomization (#1689)
* bcftools roh
- Fix a bug that could result in an endless loop or incorrect
AF estimate when missing genotypes are present and the
`--estimate-AF -` option was used (#1687)
* bcftools +split-vep
- VEP fields with characters disallowed in VCF tag names by the
specification (such as '-' in 'M-CAP') couldn't be queried.
This has been fixed, the program now sanitizes the field names,
replacing invalid characters with underscore (#1686)
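The field-name sanitization described for `bcftools +split-vep` in release 1.15.1 can be sketched in Python as a single substitution. The exact set of characters that the plugin treats as invalid is an assumption here (anything other than ASCII letters, digits and underscore), and the function name `sanitize_field_name` is hypothetical; only the 'M-CAP' example comes from the notes.

```python
import re


def sanitize_field_name(name):
    """Replace characters assumed to be invalid in a tag name with underscores."""
    # Assumption: only ASCII letters, digits and '_' are kept as-is.
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


print(sanitize_field_name("M-CAP"))  # -> M_CAP
```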
bcftools release 1.15:
* New `bcftools head` subcommand for conveniently displaying the
headers of a VCF or BCF file. Without any options, this is
equivalent to `bcftools view --header-only --no-version` but more
succinct and memorable.
* The `-T, --targets-file` option had the following bug originating
in HTSlib code: when an uncompressed file with multiple columns
CHR,POS,REF was provided, the REF would be interpreted as 0
gigabases (#1598)
Changes affecting specific commands:
* bcftools annotate
- In addition to `--rename-annots`, which requires a file with name
mappings, it is now possible to do the same on the command line
`-c NEW_TAG:=OLD_TAG`
- Add new option --min-overlap which allows to specify the minimum
required overlap of intersecting regions
- Allow to transfer ALT from VCF with or without replacement using:
bcftools annotate -a annots.vcf.gz -c ALT file.vcf.gz
bcftools annotate -a annots.vcf.gz -c +ALT file.vcf.gz
* bcftools convert
- Revamp of `--gensample`, `--hapsample` and `--haplegendsample`
family of options which includes the following changes:
- New `--3N6` option to output/input the new version of the .gen
file format, see
https://www.cog-genomics.org/plink/2.0/formats#gen
- Deprecate the `--chrom` option in favor of `--3N6`. A simple `cut`
command can be used to convert from the new 3*M+6 column format to
the format printed with `--chrom` (`cut -d' ' -f1,3-`).
- The CHROM:POS_REF_ALT IDs which are used to detect strand swaps
are required and must appear either in the "SNP ID" column or the
"rsID" column. The column is autodetected for `--gensample2vcf`,
can be the first or the second for `--hapsample2vcf` (depending on
whether the `--vcf-ids` option is given), must be the first for
`--haplegendsample2vcf`.
* bcftools csq
- Allow GFF files with phase column unset
* bcftools filter
- New `--mask`, `--mask-file` and `--mask-overlap` options to soft
filter variants in regions (#1635)
* bcftools +fixref
- The `-m id` option now works also for non-dbSNP ids, i.e. not just
`rsINT`
- New `-m flip-all` mode for flipping all sites, including ambiguous
A/T and C/G sites
* bcftools isec
- Prevent segfault on sites filtered with -i/-e in all files (#1632)
* bcftools mpileup
- More flexible read filtering using the options:
--ls, --skip-all-set     skip reads with all of the FLAG bits set
--ns, --skip-any-set     skip reads with any of the FLAG bits set
--lu, --skip-all-unset   skip reads with all of the FLAG bits unset
--nu, --skip-any-unset   skip reads with any of the FLAG bits unset
The existing synonymous options will continue to function but their
use is discouraged:
--rf, --incl-flags       Required flags: skip reads with mask bits unset
--ff, --excl-flags       Filter flags: skip reads with mask bits set
* bcftools query
- Make the `--samples` and `--samples-file` options work also in the
`--list-samples` mode. Add a new `--force-samples` option which
allows to proceed even when some of the requested samples are not
present in the VCF (#1631)
* bcftools +setGT
- Fix a bug in `-t q -e EXPR` logic applied on FORMAT fields, sites
with all samples failing the expression EXPR were incorrectly
skipped. This problem affected only the use of `-e` logic, not the
`-i` expressions (#1607)
* bcftools sort
- make use of the TMPDIR environment variable when defined
* bcftools +trio-dnm2
- The --use-NAIVE mode now also adds the de novo allele in FORMAT/VA
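The four read-filtering options added to `bcftools mpileup` in release 1.15 differ only in how a FLAG bitmask is tested. The Python sketch below spells out the four tests as worded in the notes; it is not the mpileup implementation. The function name `keep_read` and its keyword arguments are hypothetical, and treating a mask of 0 as "filter not requested" is an assumption made for the sketch.

```python
def keep_read(flag, skip_all_set=0, skip_any_set=0,
              skip_all_unset=0, skip_any_unset=0):
    """Return False if the read's FLAG triggers any of the requested skip rules."""
    # A mask of 0 is taken to mean the rule was not requested (assumption).
    if skip_all_set and (flag & skip_all_set) == skip_all_set:
        return False  # all of the mask bits are set
    if skip_any_set and (flag & skip_any_set) != 0:
        return False  # at least one of the mask bits is set
    if skip_all_unset and (flag & skip_all_unset) == 0:
        return False  # none of the mask bits is set
    if skip_any_unset and (flag & skip_any_unset) != skip_any_unset:
        return False  # at least one of the mask bits is unset
    return True


# SAM FLAG bits: 0x4 unmapped, 0x100 secondary, 0x200 QC fail, 0x400 duplicate.
EXCLUDE = 0x4 | 0x100 | 0x200 | 0x400

print(keep_read(0x400, skip_any_set=EXCLUDE))  # duplicate read -> False
print(keep_read(0x1, skip_any_unset=0x1 | 0x2))  # paired but not proper pair -> False
print(keep_read(0x1 | 0x2, skip_any_unset=0x1 | 0x2))  # properly paired -> True
```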