Site Coverage Comparison

Command examples:

atav.sh --site-coverage-comparison --sample PATH_TO_SAMPLE_FILE --gene-boundary PATH_TO_GENE_BOUNDARY_FILE --min-coverage 10 --out PATH_TO_OUTPUT

Command options:

--site-coverage-comparison: Triggers the site coverage comparison analysis.

--site-max-percent-cov-difference: Sets a user-defined cutoff between 0 and 1 for the maximum average coverage difference between the case and control groups. If this option is not specified, ATAV uses an automated cutoff for the absolute mean coverage difference instead. The condition applied to each site is:

abs(case#/total#/(#sites_in_region) - ctrl#/total#/(#sites_in_region)) < cutoff

case# and ctrl# are the numbers of qualified case and control samples (those that pass your --min-coverage cutoff) at one site.
total# is the total number of samples.
#sites_in_region is 1.
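
As a minimal sketch, the cutoff check above can be written out in Python. This is a literal reading of the formula, not ATAV's actual implementation; the function and parameter names are hypothetical, and the counts in the usage lines are made up for illustration.

```python
def site_coverage_difference(case_count, ctrl_count, total_samples, sites_in_region=1):
    """Absolute coverage difference for one site, following the formula literally.

    case_count / ctrl_count: qualified case / control samples at the site
    total_samples: total number of samples (total# in the formula)
    sites_in_region: 1 for a per-site comparison
    """
    if total_samples <= 0 or sites_in_region <= 0:
        raise ValueError("total_samples and sites_in_region must be positive")
    case_term = case_count / total_samples / sites_in_region
    ctrl_term = ctrl_count / total_samples / sites_in_region
    return abs(case_term - ctrl_term)


def site_passes_cutoff(case_count, ctrl_count, total_samples, cutoff):
    """True when the difference is strictly below the cutoff, as in the formula."""
    return site_coverage_difference(case_count, ctrl_count, total_samples) < cutoff


# Illustrative numbers only: 40 qualified cases, 50 qualified controls, 100 samples.
difference = site_coverage_difference(40, 50, 100)
kept = site_passes_cutoff(40, 50, 100, cutoff=0.2)
```

With these illustrative counts the difference is approximately 0.1, so the site passes a cutoff of 0.2 and would fail a cutoff of 0.05.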

Output:

site.summary.csv: Summary coverage information for each site, reported separately for the case and control groups.

  • Gene: The gene name the site is associated with.
  • Chr: The chromosome on which the site is located.
  • Pos: Site location within the chromosome (1-based).
  • Site coverage: Total number of samples with sufficient coverage.
  • Site coverage Case: Total number of case samples with sufficient coverage.
  • Site coverage Control: Total number of control samples with sufficient coverage.
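
The columns above could be read for downstream checks along the following lines. This is only a sketch: it assumes the CSV header names match the column names listed above exactly, which the real file may not, and the helper name and the example rows are hypothetical, not real ATAV output.

```python
import csv
import io


def read_site_summary(handle):
    """Yield one dict per site, converting position and coverage counts to int.

    Assumes header names identical to the column list in this section.
    """
    for row in csv.DictReader(handle):
        yield {
            "gene": row["Gene"],
            "chr": row["Chr"],
            "pos": int(row["Pos"]),
            "covered_total": int(row["Site coverage"]),
            "covered_case": int(row["Site coverage Case"]),
            "covered_ctrl": int(row["Site coverage Control"]),
        }


# Made-up rows for illustration only.
example = io.StringIO(
    "Gene,Chr,Pos,Site coverage,Site coverage Case,Site coverage Control\n"
    "GENE_A,1,1000,90,40,50\n"
    "GENE_A,1,1001,70,20,50\n"
)
sites = list(read_site_summary(example))
```

To read an actual output file, pass an open file handle instead of the in-memory example, for instance `open("site.summary.csv", newline="")`.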

site.clean.txt: A 'cleaned'/'filtered' version of the gene boundary file that can be used to run the Collapsing analysis.

Note: This cleaned gene boundary file excludes sites whose coverage difference between cases and controls exceeds the cutoff (either user-specified or automatically calculated).

coverage.summary.csv and coverage.summary.clean.csv are the same as those in Coverage Comparison.

sample.summary.csv and coverage.details.csv are the same as those in Coverage Summary.

Note:

If a gene appears on more than one chromosome, only its first appearance is currently analyzed. As a result, genes in the pseudoautosomal region on X and Y currently have only their X chromosome data analyzed. This will eventually be resolved by combining the data for both appearances of the gene into one analysis.