Annotation Level Filter Options

--effect: specify either multiple effects (comma delimited) or an effect file, which contains one effect per line and can mix INDEL and SNV effects.

New effect input format: impact:effect (e.g. HIGH:start_lost)
New effect reference files are located at: /nfs/goldstein/software/atav_home/data/effect
Order of effects from most damaging to least damaging:

+----+----------+---------------------------------------------------+
| id | impact   | effect                                            |
+----+----------+---------------------------------------------------+
|  1 | HIGH     | chromosome_number_variation                       |
|  2 | HIGH     | exon_loss_variant                                 |
|  3 | HIGH     | frameshift_variant                                |
|  4 | HIGH     | rare_amino_acid_variant                           |
|  5 | HIGH     | splice_acceptor_variant                           |
|  6 | HIGH     | splice_donor_variant                              |
|  7 | HIGH     | start_lost                                        |
|  8 | HIGH     | stop_gained                                       |
|  9 | HIGH     | stop_lost                                         |
| 10 | HIGH     | transcript_ablation                               |
| 11 | HIGH     | gene_fusion                                       |
| 12 | HIGH     | bidirectional_gene_fusion                         |
| 13 | MODERATE | 3_prime_UTR_truncation+exon_loss_variant          |
| 14 | MODERATE | 5_prime_UTR_truncation+exon_loss_variant          |
| 15 | MODERATE | coding_sequence_variant                           |
| 16 | MODERATE | disruptive_inframe_deletion                       |
| 17 | MODERATE | disruptive_inframe_insertion                      |
| 18 | MODERATE | conservative_inframe_deletion                     |
| 19 | MODERATE | conservative_inframe_insertion                    |
| 20 | MODERATE | missense_variant+splice_region_variant            |
| 21 | MODERATE | missense_variant                                  |
| 22 | MODERATE | regulatory_region_ablation                        |
| 23 | MODERATE | splice_region_variant                             |
| 24 | MODERATE | TFBS_ablation                                     |
| 25 | LOW      | 5_prime_UTR_premature_start_codon_gain_variant    |
| 26 | LOW      | initiator_codon_variant                           |
| 27 | LOW      | initiator_codon_variant+non_canonical_start_codon |
| 28 | LOW      | splice_region_variant+synonymous_variant          |
| 29 | LOW      | splice_region_variant                             |
| 30 | LOW      | start_retained                                    |
| 31 | LOW      | stop_retained_variant                             |
| 32 | LOW      | synonymous_variant                                |
| 33 | MODIFIER | 3_prime_UTR_variant                               |
| 34 | MODIFIER | 5_prime_UTR_variant                               |
| 35 | MODIFIER | coding_sequence_variant                           |
| 36 | MODIFIER | conserved_intergenic_variant                      |
| 37 | MODIFIER | conserved_intron_variant                          |
| 38 | MODIFIER | downstream_gene_variant                           |
| 39 | MODIFIER | exon_variant                                      |
| 40 | MODIFIER | feature_elongation                                |
| 41 | MODIFIER | feature_truncation                                |
| 42 | MODIFIER | gene_variant                                      |
| 43 | MODIFIER | intergenic_region                                 |
| 44 | MODIFIER | intragenic_variant                                |
| 45 | MODIFIER | intron_variant                                    |
| 46 | MODIFIER | mature_miRNA_variant                              |
| 47 | MODIFIER | miRNA                                             |
| 48 | MODIFIER | NMD_transcript_variant                            |
| 49 | MODIFIER | non_coding_transcript_exon_variant                |
| 50 | MODIFIER | non_coding_transcript_variant                     |
| 51 | MODIFIER | regulatory_region_amplification                   |
| 52 | MODIFIER | regulatory_region_variant                         |
| 53 | MODIFIER | TF_binding_site_variant                           |
| 54 | MODIFIER | TFBS_amplification                                |
| 55 | MODIFIER | transcript_amplification                          |
| 56 | MODIFIER | transcript_variant                                |
| 57 | MODIFIER | upstream_gene_variant                             |
+----+----------+---------------------------------------------------+
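
The impact:effect format above can be checked with a short script before a run. The following is a minimal Python sketch, not part of ATAV: the function names are hypothetical, and it assumes that lines in an effect file use the same impact:effect form as the comma-delimited input.

```python
# Hypothetical helpers for validating --effect style input; not part of ATAV.
IMPACTS = ("HIGH", "MODERATE", "LOW", "MODIFIER")


def parse_effect_items(items):
    """Turn an iterable of 'impact:effect' strings into (impact, effect) pairs."""
    pairs = []
    for item in items:
        item = item.strip()
        if not item:
            continue  # skip blank lines or empty fields
        impact, sep, effect = item.partition(":")
        if not sep or impact not in IMPACTS or not effect:
            raise ValueError(f"expected impact:effect, got {item!r}")
        pairs.append((impact, effect))
    return pairs


def parse_effect_string(value):
    """Parse a comma-delimited value such as 'HIGH:start_lost,HIGH:stop_gained'."""
    return parse_effect_items(value.split(","))


def parse_effect_lines(text):
    """Parse the text of an effect file with one effect per line (assumed format)."""
    return parse_effect_items(text.splitlines())
```

For example, parse_effect_string("HIGH:start_lost,MODERATE:missense_variant") yields two pairs, while an entry without an impact prefix raises ValueError.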

--gene: limit the analysis to a given list of genes.

Input can be either comma-delimited gene names or a text file with one gene name per line.

--gene-boundary: specify a gene-boundary file (space delimited) that indicates which exonic regions to include in the analysis. Each line of the file gives a gene name followed by its region (exon) information. When the gene-boundary option is specified, a variant is output only if it falls within a region from the gene-boundary file and also matches the gene name or gene domain name.

Ex:

gene name: AVPR1B 1 (206224439..206225382,206230806..206231144) 1283
gene domain name: CFHR5_-_0 1 (196946795..196946852,196952015..196952209,196953091..196953095) 258

Note: There are 4 columns in this format, separated by spaces. Column 1 is the gene name or gene domain name; column 2 is the chromosome (1, 2, ..., X, Y); column 3 is a comma-separated list of regions (exons) that define the gene, enclosed in parentheses, with each region in the format region_start..region_end; column 4 is the total count of sites across all regions in column 3. The start/stop positions in the gene-boundary file are one-based.
CCDS gene boundaries file directory: /nfs/goldstein/software/atav_home/data/ccds
Gene domain example file: /nfs/goldstein/software/atav_home/data/gene/dRVIS_domain_index_withoutUTR.txt
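
The four-column format can be sanity-checked by recomputing column 4 from the regions in column 3. The following is a minimal Python sketch, not part of ATAV; the function name and regular expression are hypothetical. It treats positions as one-based with inclusive ends, which is consistent with both examples above (for AVPR1B, 944 + 339 = 1283 sites).

```python
import re

# Hypothetical parser for one gene-boundary line; not part of ATAV.
# Columns: name, chromosome, (start..end,start..end,...), total site count.
LINE_RE = re.compile(r"^(\S+) (\S+) \(([^)]*)\) (\d+)$")


def parse_gene_boundary(line):
    """Return (name, chromosome, regions) and verify the site count in column 4."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a gene-boundary line: {line!r}")
    name, chrom, regions_text, total_text = match.groups()

    regions = []
    for part in regions_text.split(","):
        start_text, end_text = part.split("..")
        regions.append((int(start_text), int(end_text)))

    # One-based, inclusive coordinates: a region covers end - start + 1 sites.
    site_count = sum(end - start + 1 for start, end in regions)
    if site_count != int(total_text):
        raise ValueError(f"{name}: column 4 is {total_text}, regions give {site_count}")
    return name, chrom, regions


def in_regions(regions, position):
    """True if a one-based position falls inside any region."""
    return any(start <= position <= end for start, end in regions)
```

A line whose column 4 does not match its regions raises ValueError, which can help catch hand-edited boundary files before they are passed to --gene-boundary.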

--ccds-only: restrict the analysis to variants with annotations in CCDS genes only (CCDS r14).

Reference file: /nfs/goldstein/software/atav_home/data/ccds_transcript.txt

--canonical-only: restrict the analysis to variants with annotations in canonical genes only.

--polyphen-humdiv {probably,possibly,benign,unknown}: restrict output to the specified polyphen prediction categories. By default, all 4 categories (probably, possibly, benign and unknown) will be output.
  1. If only one prediction type is specified, only variants matching that type will be used in the analysis.
  2. If any combination of types is specified, only these types will be used in the analysis, and the most damaging score and prediction type will be reported.
  3. In the output, "_damaging" is appended to possibly and probably. The option values themselves are unchanged.
  4. The filter is applied at the annotation level (per transcript).
  • Quantitative --> qualitative translation of polyphen scores:

(-∞, 0) --> unknown
[0, 0.4335) --> benign
[0.4335, 0.9035) --> possibly
[0.9035, 1] --> probably
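
The interval table above can be written as a small function. The following is a minimal Python sketch, not ATAV's own code: the function name is hypothetical, and rejecting scores above 1 is an assumption, since the table does not cover them.

```python
# Hypothetical helper mirroring the score intervals above; not part of ATAV.
def polyphen_category(score):
    """Map a polyphen score to its qualitative prediction category."""
    if score < 0:
        return "unknown"   # negative scores
    if score < 0.4335:
        return "benign"    # [0, 0.4335)
    if score < 0.9035:
        return "possibly"  # [0.4335, 0.9035)
    if score <= 1:
        return "probably"  # [0.9035, 1]
    # Assumption: values above 1 are not valid polyphen scores.
    raise ValueError(f"polyphen score out of range: {score}")
```

Note that each lower bound is inclusive, so a score of exactly 0.4335 maps to possibly and exactly 0.9035 maps to probably.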

--polyphen-humvar {probably,possibly,benign,unknown}: same logic as above, but filter on polyphen humvar.