Variant Level Filter Options

--rs-number: use a rs number file (one value per line) or a rs number list (comma delimited)

Ex. --rs-number /home/zr2180/atav_home/data/variant/rs-number.txt
Ex. --rs-number rs79585140,rs75454623,rs71252250
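
The two accepted input forms (file or comma-delimited list) can be sketched in Python as below. This is an illustration only, not ATAV's actual code: the function name read_id_argument is hypothetical, and deciding "file vs. list" by checking whether the argument is an existing path is an assumption.

```python
import os


def read_id_argument(arg):
    """Return a list of IDs from a file path or a comma-delimited list.

    Hypothetical helper: treating the argument as a file when it is an
    existing path is an assumption, not documented ATAV behaviour.
    """
    if os.path.isfile(arg):
        # File form: one value per line; blank lines are skipped.
        with open(arg) as handle:
            return [line.strip() for line in handle if line.strip()]
    # List form: values separated by commas.
    return [item.strip() for item in arg.split(",") if item.strip()]


print(read_id_argument("rs79585140,rs75454623,rs71252250"))
# ['rs79585140', 'rs75454623', 'rs71252250']
```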

--variant: use a variant id file (one id per line) or a variant id list (comma delimited)

Ex. --variant /home/zr2180/atav_home/data/variant/CHGV_trios_dnmArtifactProneSites.txt
Ex. --variant 1-11133956-A-G,1-13038679-G-A,1-15439098-C-T
variant id: chr-pos-ref-alt or rs#
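
A minimal Python sketch of how the two variant id forms (chr-pos-ref-alt or rs#) could be told apart. The function name and the returned dictionary layout are hypothetical and only serve the illustration.

```python
import re

# An rs number is "rs" followed by digits.
RS_PATTERN = re.compile(r"^rs\d+$")


def parse_variant_id(variant_id):
    """Split a variant id into its parts (hypothetical helper)."""
    if RS_PATTERN.match(variant_id):
        return {"type": "rs", "rs": variant_id}
    # Otherwise expect exactly four dash-separated fields: chr-pos-ref-alt.
    chrom, pos, ref, alt = variant_id.split("-")
    return {"type": "site", "chr": chrom, "pos": int(pos), "ref": ref, "alt": alt}


print(parse_variant_id("1-11133956-A-G"))
print(parse_variant_id("rs79585140"))
```

An id that is neither form (for example one with too few fields) raises a ValueError in this sketch.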

--exclude-variant: use a variant id file (one id per line)

standard artifact variant files:
/nfs/goldstein/software/atav_home/data/variant/CHGV_ESP_n654_REALCONTROLS_2x10-8.txt

This file lists protein-coding variants that reached a two-tailed Fisher's exact test (FET) p < 2x10^-8 (corrected for the number of ESP protein-coding variants) for allelic imbalance between 654 CHGV EA samples not ascertained for disease and ESP's 4300 EA samples.

/nfs/goldstein/software/atav_home/data/variant/CHGV_trios_dnmArtifactProneSites.txt

This file lists variants reported as a putative DNM in more than 5 unrelated probands from an initial set of 604 CHGV sequenced trios, after confirming family structure.

--exclude-artifacts: this filter is reserved for the most significant (high-confidence) artifacts, which everyone can then opt to exclude. These variants were identified by Slave and Quanli as showing a significant frequency imbalance between about 650 European controls (mostly unaffected parents) in annodb and 4300 European samples in EVS despite good coverage. It draws on the two standard variant files listed above and is recommended for all analyses.

Reference file: /nfs/goldstein/software/atav_home/data/variant/CHGV_SuggestedExcludeVariants.txt
Other artifact information can be found here: http://redmine.igm.cumc.columbia.edu/projects/bioinfo_tools/wiki/Artifacts

--exclude-multiallelic-variant: exclude a variant when its site has more than 1 variant.

--exclude-multiallelic-variant-2: exclude a variant when its site has more than 2 variants.
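
The two multiallelic filters differ only in the allowed number of variants per site. A rough Python sketch follows; it assumes a "site" means a chr-pos pair and that variants are counted within the given list. Both are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter


def site_of(variant_id):
    """Return the (chr, pos) part of a chr-pos-ref-alt id."""
    chrom, pos = variant_id.split("-")[:2]
    return chrom, pos


def exclude_multiallelic(variant_ids, max_variants_per_site=1):
    """Keep variants whose site carries at most max_variants_per_site variants."""
    counts = Counter(site_of(v) for v in variant_ids)
    return [v for v in variant_ids if counts[site_of(v)] <= max_variants_per_site]


# Made-up ids: two variants at 1-100, one at 1-200, three at 2-50.
ids = ["1-100-A-G", "1-100-A-T", "1-200-C-T", "2-50-G-A", "2-50-G-C", "2-50-G-T"]
print(exclude_multiallelic(ids, 1))  # ['1-200-C-T']
print(exclude_multiallelic(ids, 2))  # ['1-100-A-G', '1-100-A-T', '1-200-C-T']
```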

--exclude-snv: exclude all SNV variants.

--exclude-indel: exclude all INDEL variants.

--evs-maf: specify the MAF cutoff applied to evs ctrls.

--evs-pop {all}: specify which population (ea, aa or all) should be used for the --evs-maf cutoff.

--exclude-evs-qc-failed: exclude all evs qc failed variants.

--min-evs-all-average-coverage: specify a minimum evs all average coverage value.

--min-ctrl-average-coverage: specify a minimum IGM ctrl average coverage value.

--exac-pop: specify which population (global, afr, amr, eas, sas, fin, nfe or oth) should be used for the exac maf cutoff. You can require the cutoff to be met in multiple populations by putting a comma between them, for example --exac-pop afr,amr,nfe.

--exac-af: specify an allele frequency threshold; variants must pass the threshold in all populations from --exac-pop.

--exac-maf: specify a minor allele frequency threshold; variants must pass the threshold in all populations from --exac-pop.
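
The "pass in all populations" rule can be sketched in Python as follows. Assumptions, all labelled here because the text does not state them: "passing" means the frequency is at or below the threshold, MAF is min(AF, 1 - AF), and every requested population has a value. The names and the frequencies are made up for illustration.

```python
def minor_allele_frequency(af):
    """Assumed definition: the smaller of AF and 1 - AF."""
    return min(af, 1.0 - af)


def passes_in_all_populations(af_by_pop, populations, threshold, use_maf=False):
    """Return True only if every listed population is at or below the threshold."""
    for pop in populations:
        af = af_by_pop[pop]
        value = minor_allele_frequency(af) if use_maf else af
        if value > threshold:
            return False
    return True


# Made-up allele frequencies for one variant.
exac = {"afr": 0.002, "amr": 0.0005, "nfe": 0.0001}
print(passes_in_all_populations(exac, ["afr", "amr", "nfe"], 0.001))  # False
print(passes_in_all_populations(exac, ["amr", "nfe"], 0.001))  # True
```

The same logic would apply to the gnomAD exome and genome population options, which are described in the same terms.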

--min-exac-vqslod-snv: specify a threshold; ATAV will only keep an SNV if its exac vqslod score is larger than the threshold.

According to ExAC (ftp://ftp.broadinstitute.org/pub/ExAC_release/release0.1/README.known_issues): The VQSR 99.6% SNP Sensitivity is too conservative and filters ~17% of singletons. Our analysis of singleton TiTv, Doubleton transmission in Trios, validated de-novo mutations and comparison against PCR-Free WGS has shown filtering ~10% of singletons is a better trade off. This corresponds to VQSLOD > -2.632: we recommend using --min-exac-vqslod-snv -2.632 for all analyses.

--min-exac-vqslod-indel: specify a threshold; ATAV will only keep an indel if its exac vqslod score is larger than the threshold.

SNV and indel VQSLODs are calculated separately, so separate cutoffs are required. We recommend using --min-exac-vqslod-indel 1.2168 to fit with the 95% tranche cutoff we use for our genome indels, which is what ExAC used for their data.
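
Because the two cutoffs are separate, the check depends on the variant type. A small Python sketch using the recommended values from above; classifying an SNV as "ref and alt are both one base" is an assumption, the function names are hypothetical, and variants without an ExAC VQSLOD score are not handled here.

```python
SNV_CUTOFF = -2.632    # recommended --min-exac-vqslod-snv value
INDEL_CUTOFF = 1.2168  # recommended --min-exac-vqslod-indel value


def is_snv(ref, alt):
    """Assumed classification: single-base ref and alt means SNV."""
    return len(ref) == 1 and len(alt) == 1


def passes_exac_vqslod(ref, alt, vqslod,
                       snv_cutoff=SNV_CUTOFF, indel_cutoff=INDEL_CUTOFF):
    """Keep the variant only if its VQSLOD is larger than the cutoff for its type."""
    cutoff = snv_cutoff if is_snv(ref, alt) else indel_cutoff
    return vqslod > cutoff


print(passes_exac_vqslod("A", "G", -2.0))   # True: SNV above -2.632
print(passes_exac_vqslod("A", "AT", -2.0))  # False: indel below 1.2168
```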

--gnomad-exome-pop: specify which population (global, controls, non_neuro, afr, amr, asj, eas, sas, fin, nfe, controls_afr, controls_amr, controls_asj, controls_eas, controls_sas, controls_fin, controls_nfe, non_neuro_afr, non_neuro_amr, non_neuro_asj, non_neuro_eas, non_neuro_sas, non_neuro_fin, non_neuro_nfe) should be used for the gnomad-exome af cutoff. You can require the cutoff to be met in multiple populations by putting a comma between them, for example --gnomad-exome-pop afr,amr,nfe.

--gnomad-exome-af: specify an allele frequency threshold; variants must pass the threshold in all populations from --gnomad-exome-pop.

--gnomad-exome-maf: specify a minor allele frequency threshold; variants must pass the threshold in all populations from --gnomad-exome-pop.

--gnomad-exome-rf-tp-probability-snv: specify a threshold; ATAV will only keep an SNV if its rf_tp_probability is larger than the threshold.
Suggested cutoff: prob >= 0.1 (high-confidence SNVs)

--gnomad-exome-rf-tp-probability-indel: specify a threshold; ATAV will only keep an INDEL if its rf_tp_probability is larger than the threshold.
Suggested cutoff: prob >= 0.2 (high-confidence indels)

--gnomad-genome-pop: specify which population (global, controls, non_neuro, afr, amr, asj, eas, fin, nfe, controls_afr, controls_amr, controls_asj, controls_eas, controls_fin, controls_nfe, non_neuro_afr, non_neuro_amr, non_neuro_asj, non_neuro_eas, non_neuro_fin, non_neuro_nfe) should be used for the gnomad-genome af cutoff. You can require the cutoff to be met in multiple populations by putting a comma between them, for example --gnomad-genome-pop afr,amr,nfe.

--gnomad-genome-af: specify an allele frequency threshold; variants must pass the threshold in all populations from --gnomad-genome-pop.

--gnomad-genome-maf: specify a minor allele frequency threshold; variants must pass the threshold in all populations from --gnomad-genome-pop.

--gnomad-genome-rf-tp-probability-snv: specify a threshold; ATAV will only keep an SNV if its rf_tp_probability is larger than the threshold.
Suggested cutoff: prob >= 0.4 (high-confidence SNVs)

--gnomad-genome-rf-tp-probability-indel: specify a threshold; ATAV will only keep an INDEL if its rf_tp_probability is larger than the threshold.
Suggested cutoff: prob >= 0.4 (high-confidence indels)

If you are using any gnomad filter, please make sure to also use --exclude-igm-gnomad-sample.

--min-gerp-score: specify a minimum gerp score. (include NA)
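
The "(include NA)" note on this and the other score filters is read here as "variants with no score are kept". A minimal Python sketch of that reading; representing NA as None and treating a score equal to the threshold as passing are both assumptions, and the function names are hypothetical.

```python
def passes_min_score(score, minimum):
    """Keep the variant if its score is NA (None) or at least the minimum."""
    return score is None or score >= minimum


def passes_max_score(score, maximum):
    """Keep the variant if its score is NA (None) or at most the maximum."""
    return score is None or score <= maximum


print(passes_min_score(None, 2.0))  # True: NA is included
print(passes_min_score(1.5, 2.0))   # False: below the minimum
print(passes_max_score(40.0, 50))   # True: at or below the maximum
```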

--min-trap-score: specify a minimum trap score. (include NA)
  1. the trap filter applies to a missense annotation when it fails the polyphen filter
  2. the trap filter applies to a missense annotation when the polyphen filter is not applied

--min-trap-score-non-coding: specify a minimum trap score. (include NA)
  1. the trap filter applies to an annotation whose effect is less damaging than missense_variant and is not 5_prime_UTR_premature_start_codon_gain_variant

--min-revel-score: specify a minimum REVEL score. (include NA)

--min-primate-ai: specify a minimum PrimateAI score. (include NA)

--max-sub-rvis-domain-score-percentile: specify a maximum subRVIS Domain Score Percentile. (include NA)

--max-mtr-domain-percentile: specify a maximum MTR Domain Percentile. (include NA)

--max-sub-rvis-exon-score-percentile: specify a maximum subRVIS Exon Score Percentile. (include NA)

--max-mtr-exon-percentile: specify a maximum MTR Exon Percentile. (include NA)

--max-limbr-domain-percentile: specify a maximum LIMBR Domain Score Percentile. (include NA)

--max-limbr-exon-percentile: specify a maximum LIMBR Exon Score Percentile. (include NA)

--min-ccr-percentile: specify a minimum CCR Percentile. (include NA)

--discovehr-af: specify a maximum discovehr allele frequency. (include NA)

--mtr: specify a maximum MTR value as the cutoff. (include NA)

--mtr-fdr: specify a maximum MTR FDR value as the cutoff. (include NA)

--mtr-centile: specify a maximum MTR centile value as the cutoff. (include NA)

The suggested cutoff for collapsing analyses is --mtr-centile 50.

--min-mpc: specify a minimum MPC score. (include NA)

--min-pext-ratio: specify a minimum PEXT ratio. (include NA)

--known-var-only: use this option to restrict output to variants present in the ClinVar, HGMD or dbDSM variant sets.

--max-genome-asia-af: specify a maximum GenomeAsia allele frequency. (include NA)

--genome-asia-maf: specify a GenomeAsia minor allele frequency. (include NA)

--max-iranome-af: specify a maximum Iranome allele frequency. (include NA)

--iranome-maf: specify an Iranome minor allele frequency. (include NA)

--max-gme-af: specify a maximum GME allele frequency. (include NA)

--gme-maf: specify a GME minor allele frequency. (include NA)

--max-top-med-af: specify a maximum TOPMed allele frequency. (include NA)

--top-med-maf: specify a TOPMed minor allele frequency. (include NA)

--include-evs: use this option to include EVS data in variant based output file.

--include-exac: use this option to include ExAC data in variant based output file.

--include-gnomad-exome: use this option to include gnomAD Exome data in variant based output file.

--include-gnomad-genome: use this option to include gnomAD Genome data in variant based output file.

--include-gerp: use this option to include Gerp data in variant based output file.

--include-known-var: use this option to include KnownVar data in variant based output file.

--known-var-pathogenic-only: use this option to include only Pathogenic ClinVar variants and DM HGMD variants (those not present in ClinVar).

--include-rvis: use this option to include RVIS data in variant based output file.

--include-sub-rvis: use this option to include subRVIS data in variant based output file.

--include-trap: use this option to include TraP data in variant based output file.

--include-mgi: use this option to include MGI data in variant based output file.

--include-denovo-db: use this option to include DenovoDB data in variant based output file.

--include-limbr: use this option to include LIMBR data in variant based output file.

--include-discovehr: use this option to include DiscovEHR data in variant based output file.

--include-mtr: use this option to include MTR data in variant based output file.

--include-revel: use this option to include REVEL data in variant based output file.

--include-primate-ai: use this option to include PrimateAI data in variant based output file.

--include-ccr: use this option to include CCR data in variant based output file.

--include-loftee: use this option to include LOFTEE data in variant based output file.

--include-gnomad-gene-metrics: use this option to include gnomAD gene metrics data in variant based output file.

--include-mpc: use this option to include MPC data in variant based output file.

--include-pext: use this option to include PEXT data in variant based output file.

--include-genome-asia: use this option to include GenomeAsia data in variant based output file.

--include-iranome: use this option to include Iranome data in variant based output file.

--include-gme: use this option to include GME data in variant based output file.

--include-top-med: use this option to include TOPMed data in variant based output file.

Default filter rules for sex chromosome variants:

female & chr Y & outside Pseudoautosomal Regions --> excluded
male & Het & (chr X or chr Y) & outside Pseudoautosomal Regions --> excluded
variants on sex chromosomes in pseudoautosomal regions are treated the same as variants on autosomes.
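
The three default rules above can be written as one small decision function. This Python sketch is illustrative only: the function name and the string encodings for sex, chromosome and genotype are hypothetical, and whether a position lies in a pseudoautosomal region is passed in as a flag rather than computed, since the region coordinates are not given here.

```python
def excluded_by_sex_chromosome_rule(sex, chrom, genotype, in_par):
    """Return True if a genotype call is excluded by the default rules.

    sex: "female" or "male"; chrom: e.g. "1", "X", "Y";
    genotype: e.g. "het" or "hom"; in_par: True inside a pseudoautosomal region.
    """
    if in_par:
        # Pseudoautosomal regions are treated like autosomes.
        return False
    if sex == "female" and chrom == "Y":
        return True
    if sex == "male" and genotype == "het" and chrom in ("X", "Y"):
        return True
    return False


print(excluded_by_sex_chromosome_rule("female", "Y", "hom", in_par=False))  # True
print(excluded_by_sex_chromosome_rule("male", "X", "het", in_par=True))     # False
```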