Annotation Level Filter Options
--impact: specify one or more impacts (comma delimited) from HIGH, MODERATE, LOW or MODIFIER. All effects associated with each specified impact are included.
--effect: specify either multiple effects (comma delimited) or an effect file, which contains one effect per line and can mix INDEL and SNV effects.
New effect input format: impact:effect, Ex. HIGH:start_lost
New effect reference files are located at: /nfs/goldstein/software/atav_home/data/effect
Order of effects from most damaging to least damaging:
+----+----------+---------------------------------------------------+
| id | impact   | effect                                            |
+----+----------+---------------------------------------------------+
|  1 | HIGH     | chromosome_number_variation                       |
|  2 | HIGH     | exon_loss_variant                                 |
|  3 | HIGH     | frameshift_variant                                |
|  4 | HIGH     | rare_amino_acid_variant                           |
|  5 | HIGH     | splice_acceptor_variant                           |
|  6 | HIGH     | splice_donor_variant                              |
|  7 | HIGH     | start_lost                                        |
|  8 | HIGH     | stop_gained                                       |
|  9 | HIGH     | stop_lost                                         |
| 10 | HIGH     | transcript_ablation                               |
| 11 | HIGH     | gene_fusion                                       |
| 12 | HIGH     | bidirectional_gene_fusion                         |
| 13 | MODERATE | 3_prime_UTR_truncation+exon_loss_variant          |
| 14 | MODERATE | 5_prime_UTR_truncation+exon_loss_variant          |
| 15 | MODERATE | coding_sequence_variant                           |
| 16 | MODERATE | disruptive_inframe_deletion                       |
| 17 | MODERATE | disruptive_inframe_insertion                      |
| 18 | MODERATE | conservative_inframe_deletion                     |
| 19 | MODERATE | conservative_inframe_insertion                    |
| 20 | MODERATE | missense_variant+splice_region_variant            |
| 21 | MODERATE | missense_variant                                  |
| 22 | MODERATE | regulatory_region_ablation                        |
| 23 | MODERATE | splice_region_variant                             |
| 24 | MODERATE | TFBS_ablation                                     |
| 25 | LOW      | 5_prime_UTR_premature_start_codon_gain_variant    |
| 26 | LOW      | initiator_codon_variant                           |
| 27 | LOW      | initiator_codon_variant+non_canonical_start_codon |
| 28 | LOW      | splice_region_variant+synonymous_variant          |
| 29 | LOW      | splice_region_variant                             |
| 30 | LOW      | start_retained                                    |
| 31 | LOW      | stop_retained_variant                             |
| 32 | LOW      | synonymous_variant                                |
| 33 | MODIFIER | 3_prime_UTR_variant                               |
| 34 | MODIFIER | 5_prime_UTR_variant                               |
| 35 | MODIFIER | coding_sequence_variant                           |
| 36 | MODIFIER | conserved_intergenic_variant                      |
| 37 | MODIFIER | conserved_intron_variant                          |
| 38 | MODIFIER | downstream_gene_variant                           |
| 39 | MODIFIER | exon_variant                                      |
| 40 | MODIFIER | feature_elongation                                |
| 41 | MODIFIER | feature_truncation                                |
| 42 | MODIFIER | gene_variant                                      |
| 43 | MODIFIER | intergenic_region                                 |
| 44 | MODIFIER | intragenic_variant                                |
| 45 | MODIFIER | intron_variant                                    |
| 46 | MODIFIER | mature_miRNA_variant                              |
| 47 | MODIFIER | miRNA                                             |
| 48 | MODIFIER | NMD_transcript_variant                            |
| 49 | MODIFIER | non_coding_transcript_exon_variant                |
| 50 | MODIFIER | non_coding_transcript_variant                     |
| 51 | MODIFIER | regulatory_region_amplification                   |
| 52 | MODIFIER | regulatory_region_variant                         |
| 53 | MODIFIER | TF_binding_site_variant                           |
| 54 | MODIFIER | TFBS_amplification                                |
| 55 | MODIFIER | transcript_amplification                          |
| 56 | MODIFIER | transcript_variant                                |
| 57 | MODIFIER | upstream_gene_variant                             |
+----+----------+---------------------------------------------------+
--exclude-effect: exclude the specified effects from the analysis. Ex. use it together with the --impact option to drop individual effects from the selected impacts.
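How --impact and --exclude-effect interact can be sketched in Python. This is an illustration only, not ATAV's implementation: EFFECTS_BY_IMPACT holds a small hand-picked subset of the table above, the function names parse_effects and select_effects are hypothetical, and it assumes that --exclude-effect accepts the same impact:effect entries as --effect.

```python
# Illustrative subset of the impact/effect table; not the full list and
# not ATAV's own data structure.
EFFECTS_BY_IMPACT = {
    "HIGH": ["frameshift_variant", "start_lost", "stop_gained"],
    "MODERATE": ["missense_variant", "disruptive_inframe_deletion"],
    "LOW": ["synonymous_variant"],
    "MODIFIER": ["intron_variant"],
}


def parse_effects(value):
    """Split a comma-delimited list of impact:effect entries into a set."""
    entries = set()
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        impact, sep, effect = token.partition(":")
        if not sep or impact not in EFFECTS_BY_IMPACT:
            raise ValueError(f"expected impact:effect, got {token!r}")
        entries.add((impact, effect))
    return entries


def select_effects(impacts, excluded=""):
    """Expand each impact to its effects, then drop the excluded entries."""
    selected = set()
    for impact in impacts.split(","):
        impact = impact.strip()
        selected.update((impact, e) for e in EFFECTS_BY_IMPACT[impact])
    return selected - parse_effects(excluded)


# Hypothetical equivalent of: --impact HIGH,MODERATE
#                             --exclude-effect MODERATE:missense_variant
kept = select_effects("HIGH,MODERATE", excluded="MODERATE:missense_variant")
for impact, effect in sorted(kept):
    print(f"{impact}:{effect}")
```

Keying each entry on the (impact, effect) pair matters because some effect names, such as splice_region_variant and coding_sequence_variant, appear under more than one impact in the table.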
--gene: limit the analysis to a given list of genes.
Input can be either a comma-delimited list of gene names or a text file with one gene name per line.
--gene-boundary: specify a gene-boundary file (space delimited) to indicate which exonic regions you want to include in your analysis. This file is defined by a gene name followed by its region(exon) information. When the gene-boundary option is specified, a variant not only has to be in the regions from the gene-boundary file but also has to match the gene name or gene domain name to be output.
Ex:
gene name: AVPR1B 1 (206224439..206225382,206230806..206231144) 1283
gene domain name: CFHR5_-_0 1 (196946795..196946852,196952015..196952209,196953091..196953095) 258
Note: There are 4 columns in this format, separated by spaces. Column 1 is the gene name or gene domain name; column 2 is the chromosome (1,2,...X,Y); column 3 is a comma-separated list of regions (exons) used to define the gene, enclosed in parentheses, with each region in the format region_start..region_end; column 4 is the total count of sites from all regions in column 3. The start/stop positions in the gene-boundary file are one-based.
CCDS gene boundaries file directory: /nfs/goldstein/software/atav_home/data/ccds
Gene domain example file: /nfs/goldstein/software/atav_home/data/gene/dRVIS_domain_index_withoutUTR.txt
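The four-column gene-boundary format can be checked with a short Python sketch. This is not part of ATAV; the names parse_gene_boundary, site_count and contains are hypothetical, and the sketch assumes both ends of each region are inclusive, which agrees with the site counts in the two example lines above.

```python
import re

# name, chromosome, parenthesized region list, total site count
LINE_RE = re.compile(r"^(\S+) (\S+) \(([^)]*)\) (\d+)$")


def parse_gene_boundary(line):
    """Parse one gene-boundary line into (name, chrom, regions, total)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a gene-boundary line: {line!r}")
    name, chrom, region_text, total = match.groups()
    regions = []
    for part in region_text.split(","):
        start, end = part.split("..")
        regions.append((int(start), int(end)))
    return name, chrom, regions, int(total)


def site_count(regions):
    """Count sites over all regions (one-based, both ends inclusive)."""
    return sum(end - start + 1 for start, end in regions)


def contains(regions, position):
    """Return True if a one-based position falls inside any region."""
    return any(start <= position <= end for start, end in regions)


line = "AVPR1B 1 (206224439..206225382,206230806..206231144) 1283"
name, chrom, regions, total = parse_gene_boundary(line)
# Column 4 should equal the number of sites covered by column 3.
print(name, chrom, site_count(regions) == total)
```

Comparing the computed site count with column 4 is a cheap way to catch a malformed or hand-edited boundary file before running an analysis.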
--ccds-only: restrict the analysis to variants with annotations in CCDS genes only (CCDS r14).
Reference file: /nfs/goldstein/software/atav_home/data/ccds_transcript.txt
--canonical-only: restrict the analysis to variants with annotations in canonical genes only.
--polyphen-humdiv {probably,possibly,benign,unknown}: restrict output to the specified polyphen prediction categories. By default, all 4 categories (probably, possibly, benign and unknown) will be output.
- If only one prediction type is specified, only variants matching that type will be used in the analysis.
- If any combination of types is specified, only these types will be used in the analysis, and the most damaging score and prediction type will be reported.
- "_damaging" is appended to possibly and probably in the output. The option itself is not changed.
- The filter is applied at the annotation level, i.e. per transcript.
- quantitative --> qualitative translation of polyphen scores:
(-∞, 0) --> unknown
[0, 0.4335) --> benign
[0.4335, 0.9035) --> possibly
[0.9035, 1] --> probably
--polyphen-humvar {probably,possibly,benign,unknown}: same logic as above, but filter on polyphen humvar.
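The score translation and category filter described for the polyphen options can be sketched in Python. This is a minimal illustration, not ATAV's code: the function names are hypothetical, a negative value is assumed to stand for a missing score, and scores are assumed never to exceed 1.

```python
def polyphen_category(score):
    """Translate a polyphen score into its qualitative category."""
    if score < 0:
        return "unknown"
    if score < 0.4335:
        return "benign"
    if score < 0.9035:
        return "possibly"
    return "probably"


def most_damaging(scores, allowed):
    """Filter per-transcript scores by category and report the worst one.

    Returns (score, category) for the highest remaining score, or None
    if no transcript annotation passes the filter.
    """
    kept = []
    for score in scores:
        category = polyphen_category(score)
        if category in allowed:
            kept.append((score, category))
    if not kept:
        return None
    return max(kept, key=lambda item: item[0])


# Three transcript-level scores for one variant, filtered as if
# "possibly,benign" had been specified.
print(most_damaging([0.1, 0.95, 0.5], {"possibly", "benign"}))
```

Because the filter runs per transcript, the 0.95 annotation is dropped before the most damaging remaining score is picked.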