
AnnoDB

ATAV program launch file

From (old command):

/nfs/goldstein/software/sh/atav_annodb.sh

Command option:

--function: specify either multiple functions (comma-delimited) or a function file containing one function per line; the list can mix INDEL and SNV functions. (Note: you can still use '--exclude-snv' or '--exclude-indel' if desired.)
By default, all variants are listed, including INTERGENIC ones.
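The two accepted forms of --function can be illustrated with a small parser. This is a sketch only, not ATAV's code: it assumes that a value naming an existing file is read as one function per line, and that anything else is treated as a comma-delimited list. The function name parse_function_option is hypothetical.

```python
import os


def parse_function_option(value):
    """Return the list of functions passed to --function (hypothetical helper).

    Assumption: if `value` names an existing file, read one function per line;
    otherwise treat `value` as a comma-delimited list of function names.
    """
    if os.path.isfile(value):
        with open(value) as fh:
            return [line.strip() for line in fh if line.strip()]
    # Comma-delimited form: drop surrounding spaces and empty items.
    return [item.strip() for item in value.split(",") if item.strip()]


print(parse_function_option("STOP_GAINED, FRAME_SHIFT,"))
```

The same call would read a standard function file, such as functional.txt from the list below, one function per line.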

Standard function files:
/nfs/goldstein/software/atav_home/data/function/all.txt
/nfs/goldstein/software/atav_home/data/function/codingandsplice.txt
/nfs/goldstein/software/atav_home/data/function/functional.txt
/nfs/goldstein/software/atav_home/data/function/stopandstart.txt
/nfs/goldstein/software/atav_home/data/function/transcript.txt
Available SNV functions:
STOP_GAINED, STOP_LOST, START_LOST, NON_SYNONYMOUS_CODING, START_GAINED, NON_SYNONYMOUS_START,
SPLICE_SITE_ACCEPTOR, SPLICE_SITE_DONOR, INTRON_EXON_BOUNDARY, UTR_5_PRIME, UTR_3_PRIME,
SYNONYMOUS_CODING, SYNONYMOUS_START, SYNONYMOUS_STOP, EXON, UPSTREAM, DOWNSTREAM, INTRON, INTRAGENIC
Available INDEL functions:
STOP_GAINED, FRAME_SHIFT, STOP_LOST, START_LOST, CODON_CHANGE_PLUS_CODON_DELETION, CODON_CHANGE_PLUS_CODON_INSERTION,
EXON_DELETED, CODON_DELETION, CODON_INSERTION, SPLICE_SITE_ACCEPTOR, SPLICE_SITE_DONOR, INTRON_EXON_BOUNDARY,
UTR_3_DELETED, UTR_5_DELETED, UTR_5_PRIME, UTR_3_PRIME, EXON, UPSTREAM, DOWNSTREAM, INTRON, INTRAGENIC
Functions ranked from most to least damaging:

STOP_GAINED > FRAME_SHIFT > STOP_LOST > START_LOST > NON_SYNONYMOUS_CODING > START_GAINED > NON_SYNONYMOUS_START >
EXON_DELETED > CODON_CHANGE_PLUS_CODON_DELETION > CODON_CHANGE_PLUS_CODON_INSERTION > CODON_DELETION > CODON_INSERTION > 
SPLICE_SITE_ACCEPTOR > SPLICE_SITE_DONOR > INTRON_EXON_BOUNDARY > UTR_5_DELETED > UTR_3_DELETED > UTR_5_PRIME >
UTR_3_PRIME > SYNONYMOUS_CODING > SYNONYMOUS_START > SYNONYMOUS_STOP > EXON > UPSTREAM > DOWNSTREAM > INTRON > INTRAGENIC

When a variant has multiple annotations, the 'Function' column in the output lists the most damaging of that variant's functions
that are included in the function list given in your ATAV command.
The 'Gene Transcript' column lists all of the variant's functions that are included in that function list.
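The column logic described above can be sketched as follows. This is an illustration, not ATAV's implementation. The input shape (a list of (transcript, function) pairs), the helper name summarize_annotations, and the transcript IDs are hypothetical. The ranking is copied from the list above, and any function missing from that list (for example INTERGENIC) is assumed to rank last.

```python
# Ranking copied from the list above; index 0 is the most damaging function.
RANKING = [
    "STOP_GAINED", "FRAME_SHIFT", "STOP_LOST", "START_LOST", "NON_SYNONYMOUS_CODING",
    "START_GAINED", "NON_SYNONYMOUS_START", "EXON_DELETED",
    "CODON_CHANGE_PLUS_CODON_DELETION", "CODON_CHANGE_PLUS_CODON_INSERTION",
    "CODON_DELETION", "CODON_INSERTION", "SPLICE_SITE_ACCEPTOR", "SPLICE_SITE_DONOR",
    "INTRON_EXON_BOUNDARY", "UTR_5_DELETED", "UTR_3_DELETED", "UTR_5_PRIME",
    "UTR_3_PRIME", "SYNONYMOUS_CODING", "SYNONYMOUS_START", "SYNONYMOUS_STOP",
    "EXON", "UPSTREAM", "DOWNSTREAM", "INTRON", "INTRAGENIC",
]
RANK = {name: i for i, name in enumerate(RANKING)}


def summarize_annotations(annotations, function_list):
    """Return (Function column, Gene Transcript column) for one variant.

    annotations: list of (transcript_id, function) pairs (hypothetical shape).
    function_list: functions requested with --function.
    """
    allowed = set(function_list)
    # Gene Transcript column: every annotation whose function was requested.
    kept = [(tx, fn) for tx, fn in annotations if fn in allowed]
    if not kept:
        return None, []
    # Function column: the most damaging requested function (lowest rank index);
    # functions absent from RANKING sort last.
    most_damaging = min((fn for _, fn in kept), key=lambda fn: RANK.get(fn, len(RANKING)))
    return most_damaging, kept


anns = [("ENST_A", "INTRON"), ("ENST_B", "NON_SYNONYMOUS_CODING"), ("ENST_C", "STOP_GAINED")]
fn, listed = summarize_annotations(anns, ["NON_SYNONYMOUS_CODING", "INTRON"])
print(fn)      # NON_SYNONYMOUS_CODING
print(listed)  # [('ENST_A', 'INTRON'), ('ENST_B', 'NON_SYNONYMOUS_CODING')]
```

STOP_GAINED is the most damaging annotation on this hypothetical variant. It does not appear in either column, because it was not in the requested function list.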

--max-ctrl-maf: specify a maximum minor allele frequency (MAF) in controls. For example, with "--max-ctrl-maf 0.05", ATAV loads variants whose allele frequency in controls is either <= 0.05 or >= 0.95, i.e. the minor allele (reference or alternative) is rare.
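The two-sided rule in the example can be written as a single check. This is a minimal sketch, assuming the input is the alternative-allele frequency in controls. The name passes_max_ctrl_maf is hypothetical.

```python
def passes_max_ctrl_maf(ctrl_alt_freq, max_ctrl_maf):
    """Keep a variant whose control alt-allele frequency is <= max_ctrl_maf
    or >= 1 - max_ctrl_maf, i.e. whichever allele is minor is rare."""
    return ctrl_alt_freq <= max_ctrl_maf or ctrl_alt_freq >= 1.0 - max_ctrl_maf


print(passes_max_ctrl_maf(0.01, 0.05))  # True: rare alternative allele
print(passes_max_ctrl_maf(0.99, 0.05))  # True: rare reference allele
print(passes_max_ctrl_maf(0.50, 0.05))  # False: common variant
```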

--loo-maf: leave-one-out MAF; the MAF is calculated from all samples in the sample file, excluding the sample in which the variant was observed.

--exac-maf: specify a MAF threshold; a variant must pass this threshold in every population listed in --exac-pop.
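The "pass in all populations" requirement amounts to an all() over the selected populations. This is a sketch under stated assumptions. The dictionary shape, the population codes used as example values, and the name passes_exac_maf are hypothetical. A population with no ExAC data is treated as MAF 0, and "pass" is taken to mean MAF <= threshold.

```python
def passes_exac_maf(pop_mafs, exac_pops, threshold):
    """True if the variant's MAF is <= threshold in every population in exac_pops.

    pop_mafs: dict mapping population code -> MAF (hypothetical shape);
    a missing population is treated as MAF 0 (assumption).
    """
    return all(pop_mafs.get(pop, 0.0) <= threshold for pop in exac_pops)


mafs = {"afr": 0.002, "nfe": 0.0004, "eas": 0.02}
print(passes_exac_maf(mafs, ["afr", "nfe"], 0.01))         # True
print(passes_exac_maf(mafs, ["afr", "nfe", "eas"], 0.01))  # False: eas exceeds 0.01
```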

--var-status {all}: specify pass, pass+intermediate or all.

Pass/intermediate/fail status is based on VQSLOD scores/tranches from GATK’s VQSR module. VQSR uses known SNV sites (here from HapMap v3.3, dbSNP, and the 1000 Genomes Project Omni chip array) and known indel sites (from the Mills dataset and high-quality indels from the 1000 Genomes Project) to estimate the probability that each called variant is a true variant rather than an artifact.
For SNVs in genomes, PASS is a VQSLOD tranche below 99.9%, and FAIL is a tranche from 99.9-100%.
For SNVs in exomes, PASS is a VQSLOD tranche below 99%, INTERMEDIATE is a VQSLOD tranche from 99-99.9%, and FAIL is a tranche from 99.9-100%.
For indels in genomes, PASS is a VQSLOD tranche below 95%, INTERMEDIATE is a VQSLOD tranche from 95-99%, and FAIL is a tranche from 99-100%.
For indels in exomes, hard filters are used: PASS requires QD>2, FS<200, and ReadPosRankSum>-20; all else is FAIL.
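The four rules above can be collected into one classifier. This is a sketch, not ATAV's code. It assumes the tranche is available as a plain percentage (VQSR itself records tranche ranges in the FILTER field). It also assumes a value that falls exactly on a cutoff goes to the stricter class, and that a missing hard-filter annotation is treated as FAIL. The name vqsr_status is hypothetical.

```python
def vqsr_status(var_type, seq_type, tranche=None, qd=None, fs=None, read_pos_rank_sum=None):
    """Return "PASS", "INTERMEDIATE" or "FAIL" following the rules above.

    var_type: "snv" or "indel"; seq_type: "genome" or "exome".
    tranche: VQSLOD tranche as a percentage, e.g. 99.5 (assumed representation).
    """
    if var_type == "indel" and seq_type == "exome":
        # Exome indels use hard filters instead of VQSR tranches.
        if None not in (qd, fs, read_pos_rank_sum) and qd > 2 and fs < 200 and read_pos_rank_sum > -20:
            return "PASS"
        return "FAIL"
    # (PASS below this tranche, FAIL from this tranche); equal values mean no INTERMEDIATE band.
    cutoffs = {
        ("snv", "genome"): (99.9, 99.9),
        ("snv", "exome"): (99.0, 99.9),
        ("indel", "genome"): (95.0, 99.0),
    }
    pass_below, fail_from = cutoffs[(var_type, seq_type)]
    if tranche < pass_below:
        return "PASS"
    if tranche < fail_from:
        return "INTERMEDIATE"
    return "FAIL"


print(vqsr_status("snv", "exome", tranche=99.5))                         # INTERMEDIATE
print(vqsr_status("indel", "exome", qd=3.0, fs=10.0, read_pos_rank_sum=0.0))  # PASS
```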

--min-coverage: specify a minimum coverage (read depth). Only the values 3, 10, 20 and 201 are accepted.
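Because only a fixed set of values is accepted, a coverage filter can validate its argument before comparing depths. This is a sketch assuming that the threshold is inclusive (depth >= value) and that an unsupported value is an error. The name passes_min_coverage is hypothetical.

```python
ALLOWED_MIN_COVERAGE = (3, 10, 20, 201)  # values accepted by --min-coverage


def passes_min_coverage(depth, min_coverage):
    """True if depth meets the requested minimum (inclusive, assumed)."""
    if min_coverage not in ALLOWED_MIN_COVERAGE:
        raise ValueError(f"--min-coverage must be one of {ALLOWED_MIN_COVERAGE}, got {min_coverage}")
    return depth >= min_coverage


print(passes_min_coverage(12, 10))  # True
print(passes_min_coverage(9, 10))   # False
```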

--hap-score: specify a maximum haplotype_score.

--snv-hap-score: specify a minimum haplotype_score for SNVs.
--indel-hap-score: specify a minimum haplotype_score for indels.

Output field:

  • CADD Score Phred: PHRED-scaled Combined Annotation Dependent Depletion (CADD) score, v1.3 (PMID 24487276); a higher score indicates that the variant is more likely to be deleterious.
  • Polyphen Humdiv Score: PolyPhen-2 HumDiv score for missense variants from the Ensembl server (ATAV outputs the most damaging score among CCDS transcripts, so as not to artificially inflate the score based on unusual transcripts)
  • Polyphen Humdiv Prediction: PolyPhen-2 HumDiv classification for missense variants from the Ensembl server (ATAV outputs the most damaging classification among CCDS transcripts, so as not to artificially inflate it based on unusual transcripts)
  • Polyphen Humvar Score: PolyPhen-2 HumVar score for missense variants from the Ensembl server (ATAV outputs the most damaging score among CCDS transcripts, so as not to artificially inflate the score based on unusual transcripts)
  • Polyphen Humvar Prediction: PolyPhen-2 HumVar classification for missense variants from the Ensembl server (ATAV outputs the most damaging classification among CCDS transcripts, so as not to artificially inflate it based on unusual transcripts)
  • Function: Predicted effect of variant based on Ensembl (via SNPEff)
  • Genotype: hom ref, het, or hom
  • Samtools Raw Coverage: this is the coverage value for this site from samtools
  • Gatk Filtered Coverage: this is the coverage value for this site from gatk, with certain reads filtered out
  • Pass Fail Status: pass if highest quality, intermediate if less certain, fail if likely to be a false call; failed variants were not included in this analysis
  • Is Minor Ref: FALSE = alternative allele is less frequent; TRUE = reference allele is less frequent (based on your input control samples)
  • Haplotype Score: quality stat from GATK
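Two of the output fields above can be illustrated with short sketches: the CCDS-restricted PolyPhen-2 score and the Is Minor Ref flag. This is not ATAV's code. The tuple shape and both function names are hypothetical. PolyPhen-2 scores range from 0 to 1, with higher values more damaging, so the most damaging score is the maximum. An allele frequency of exactly 0.5 is assumed to give FALSE.

```python
def most_damaging_polyphen(transcript_scores):
    """Highest PolyPhen-2 score among CCDS transcripts, or None.

    transcript_scores: list of (transcript_id, is_ccds, score) tuples (hypothetical shape).
    Non-CCDS transcripts are ignored so unusual transcripts cannot inflate the score.
    """
    ccds_scores = [score for _, is_ccds, score in transcript_scores if is_ccds and score is not None]
    return max(ccds_scores) if ccds_scores else None


def is_minor_ref(ctrl_alt_freq):
    """TRUE when the reference allele is the less frequent allele in controls."""
    return ctrl_alt_freq > 0.5


scores = [("T1", True, 0.12), ("T2", False, 0.99), ("T3", True, 0.87)]
print(most_damaging_polyphen(scores))  # 0.87: T2 is not a CCDS transcript
print(is_minor_ref(0.97))              # True
```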

Submitting Jobs:

Submitting jobs is done using the qsub command.

[zr2180@igm-atav-qsub01 ~]$ qsub -V ...
Your job 8186300 ("atav") has been submitted
  • The -V option to qsub states that the job should have the same environment variables as the shell executing qsub (recommended)

Example Command:

qsub -V /nfs/goldstein/software/sh/atav.sh \
--list-var-geno \
--sample /nfs/goldstein/software/atav_home/data/sample/ALS_1424_DukeGr_ctrl.txt \
--function /nfs/goldstein/software/atav_home/data/function/functional.txt \
--gene TBK1 \
--max-ctrl-maf 0.01 \
--min-coverage 10 \
--out ~/hello_atav