Coverage Comparison

Command examples:

atav.sh --coverage-comparison --sample PATH_TO_SAMPLE_FILE --gene-boundary PATH_TO_GENE_BOUNDARY_FILE --min-coverage 10 --out PATH_TO_OUTPUT

Command options:

--coverage-comparison: trigger coverage comparison analysis.

--exon-max-percent-cov-difference: sets a user-defined cutoff (between 0 and 1) for the maximum average coverage difference between the case and control groups. If this option is not specified, ATAV uses an automatically calculated cutoff for the absolute mean coverage difference instead. An exon is retained when:

abs(case#/total#/(#sites_in_region) - ctrl#/total#/(#sites_in_region)) < cutoff

case# and ctrl# are the sums over all qualified samples (those that passed your --min-coverage cutoff) on one exon
total# is the total number of samples * #sites_in_region
#sites_in_region is the number of sites in one exon
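
As a rough illustration of the check above, the following Python sketch computes a per-exon covered fraction for each group and compares the absolute difference to the cutoff. It assumes one reading of the definitions: each group's value is the number of (sample, site) pairs meeting --min-coverage divided by samples * sites, with "meeting" taken as depth >= the threshold. The function names and the input layout (one list of per-site depths per sample) are hypothetical and are not part of ATAV.

```python
def covered_fraction(depths_by_sample, min_coverage):
    """Fraction of (sample, site) pairs with depth >= min_coverage.

    Assumption: ">=" is used for the threshold; ATAV may define it differently.
    """
    total = sum(len(depths) for depths in depths_by_sample)
    if total == 0:
        return 0.0
    covered = sum(
        1 for depths in depths_by_sample for d in depths if d >= min_coverage
    )
    return covered / total


def exon_passes(case_depths, ctrl_depths, min_coverage, cutoff):
    """Return (keep, abs_diff) for one exon under the assumed interpretation."""
    avg_case = covered_fraction(case_depths, min_coverage)
    avg_ctrl = covered_fraction(ctrl_depths, min_coverage)
    abs_diff = abs(avg_case - avg_ctrl)
    return abs_diff < cutoff, abs_diff


# Toy exon with 4 sites, 2 cases and 2 controls (made-up depths).
cases = [[12, 15, 9, 20], [11, 10, 10, 30]]
ctrls = [[5, 15, 9, 20], [11, 3, 10, 2]]
keep, diff = exon_passes(cases, ctrls, min_coverage=10, cutoff=0.2)
```

In this toy input the cases cover 7 of 8 sample-site pairs and the controls 4 of 8, so the difference is 0.375 and the exon would be dropped at a cutoff of 0.2.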

--quantitative: specify a quantitative file listing all samples of interest and their quantitative traits.

File format: sample name and quantitative trait (tab-delimited)
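
For orientation, here is a minimal Python sketch of reading a file in this layout. It assumes one sample per line with no header row and a numeric trait value; the sample names, the trait values, and the function name are hypothetical.

```python
import csv
import io

# Hypothetical file content: sample name <TAB> quantitative trait.
example = "sample_A\t1.72\nsample_B\t0.35\n"


def read_quantitative(handle):
    """Map each sample name to its trait value (assumes no header line)."""
    traits = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        name, value = row[0], row[1]
        traits[name] = float(value)
    return traits


traits = read_quantitative(io.StringIO(example))
```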

Output:

coverage.summary.by.exon.csv: This file provides the average coverage information at the exon level for both cases and controls.

  • Exon: The Gene_Exon identifier.
  • AvgCase: Average percentage of bases in cases per exon which pass the minimum coverage threshold.
  • AvgCtrl: Average percentage of bases in controls per exon which pass the minimum coverage threshold.
  • AbsDiff: Absolute difference between AvgCase and AvgCtrl at the exon level.
  • Length: The length of given exon.
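
To review which exons differ most between groups, the file can be ranked by AbsDiff. The sketch below assumes the file has a header row whose column names match the fields listed above and that the values are written as fractions; the rows shown are made up for illustration, and the function name is hypothetical.

```python
import csv
import io

# Made-up rows; a real run would open coverage.summary.by.exon.csv instead.
example = """Exon,AvgCase,AvgCtrl,AbsDiff,Length
GENE1_1,0.98,0.97,0.01,120
GENE1_2,0.60,0.95,0.35,85
GENE2_1,0.90,0.88,0.02,240
"""


def exons_by_abs_diff(handle):
    """Return rows sorted from largest to smallest AbsDiff."""
    rows = list(csv.DictReader(handle))
    return sorted(rows, key=lambda r: float(r["AbsDiff"]), reverse=True)


ranked = exons_by_abs_diff(io.StringIO(example))
worst = ranked[0]["Exon"]
```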

exon.clean.txt: a 'cleaned'/'filtered' version of the gene boundary file that can be used to run a Collapsing analysis.

Note: This filtered gene boundary file excludes exons whose coverage difference between cases and controls exceeds the maximum cutoff (either user specified or automatically calculated).

coverage.summary.csv: This file provides the average coverage information at the gene level for both cases and controls.

  • Gene: The gene name extracted from the input gene-boundaries file.
  • Chr: The chromosome of the gene.
  • AvgCase: Average percentage of bases in cases per gene which pass the minimum coverage threshold.
  • AvgCtrl: Average percentage of bases in controls per gene which pass the minimum coverage threshold.
  • AbsDiff: Absolute difference between AvgCase and AvgCtrl for the gene.
  • Length: Total number of bases extracted from the gene-boundaries file for the gene.

coverage.summary.clean.csv: This file provides the average coverage information at the gene level for both cases and controls after exons are 'cleaned'/'filtered'. The format is the same as that of coverage.summary.csv.

sample.summary.csv and coverage.details.csv are the same as those in Coverage Summary.

Note:
If a gene appears on more than one chromosome, only its first appearance is currently analyzed. As a result, genes in the pseudoautosomal region of X and Y currently have only their X chromosome data analyzed. This will eventually be resolved by combining the data for both copies of the gene into one analysis.