ExAC

Release 0.3 of the ExAC data (http://exac.broadinstitute.org) has been uploaded to AnnoDB and can be used in ATAV filtering.
The source data is located at /nfs/seqscratch10/ExAC/v0.3. Version 0.1 source data can be found at /nfs/seqscratch10/ExAC/v0.1.

The data was processed as follows:

  • The master VCF file was split into individual per-chromosome files, with ExAC’s annotations removed.
  • Each file was then annotated with SnpEff v3.3 (the same version used in our AnnoDB pipeline).
  • Each chromosome file was divided into separate SNV and indel files.
  • Indel files were then run through Yujun’s indel matching script (/nfs/goldstein/goldsteinlab/Bioinformatics/scripts/CHGV_variant_matcher1.7.pl).
  • All indels >100bp were excluded by Yujun’s script.
  • Each file was then processed by a custom Python script to convert the VCF into a flat file (/nfs/seqscratch10/ExAC/v0.3/convert_ExAC_v3_VCF_To_AnnoDB.py).
    • GTS fields (HmzRef/Het/HmzAlt) were calculated as follows:
      • Total = AN_Adj / 2
      • HmzRef = Total - HmzAlt - Het
        • For X: All HmzRef values for the X chromosome are “NA” due to a lack of information.
        • For Y: HmzRef = Total - Hemi
      • HmzAlt = Hom_[Ethnic Group]
      • Het = AC_[Ethnic Group] - 2 * HmzAlt
      • Hemi = Hemi_[Ethnic Group]
  • All SNV files were inserted into database table snv_maf_r03.
  • All indel files were inserted into database table indel_maf_r03.
  • Coverage data did not change for this release; however, the sample count was lowered to 60,706. The column covered_10x in the coverage_03 table was recalculated accordingly:
    • covered_10x = round(perc_covered_10x * 60706)
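
The GTS and coverage arithmetic in the list above can be sketched in Python as follows. This is an illustration only, not the contents of convert_ExAC_v3_VCF_To_AnnoDB.py: the function and argument names, the use of None for "NA", integer division for AN_Adj / 2, and the reading of perc_covered_10x as a fraction between 0 and 1 are all assumptions.

```python
from typing import Optional, Tuple

SAMPLE_COUNT = 60706  # ExAC r0.3 sample count, as stated above


def gts_counts(an_adj: int, ac: int, hom: int, hemi: int = 0,
               chrom: str = "1") -> Tuple[Optional[int], int, int, int]:
    """Return (HmzRef, Het, HmzAlt, Hemi) for one population (hypothetical helper).

    an_adj, ac, hom and hemi stand for the AN_Adj, AC_[Ethnic Group],
    Hom_[Ethnic Group] and Hemi_[Ethnic Group] values of one VCF record.
    """
    total = an_adj // 2            # Total = AN_Adj / 2 (integer division assumed)
    hmz_alt = hom                  # HmzAlt = Hom_[Ethnic Group]
    het = ac - 2 * hmz_alt         # Het = AC_[Ethnic Group] - 2 * HmzAlt
    if chrom == "X":
        hmz_ref = None             # "NA" on X; None is used here as a stand-in
    elif chrom == "Y":
        hmz_ref = total - hemi     # Y: HmzRef = Total - Hemi
    else:
        hmz_ref = total - hmz_alt - het
    return hmz_ref, het, hmz_alt, hemi


def covered_10x(perc_covered_10x: float, samples: int = SAMPLE_COUNT) -> int:
    """covered_10x = round(perc_covered_10x * 60706).

    Python's round() sends exact halves to the nearest even integer, which may
    differ from the rounding of the tool actually used for the recalculation.
    """
    return round(perc_covered_10x * samples)


if __name__ == "__main__":
    print(gts_counts(an_adj=100, ac=10, hom=2))  # made-up autosomal record
    print(covered_10x(0.5))
```

For the made-up autosomal record (AN_Adj = 100, AC = 10, Hom = 2), this gives Total = 50, HmzAlt = 2, Het = 6 and HmzRef = 42.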

After processing, the data was loaded into an ExAC logical database on each AnnoDB slave/query MySQL server: sva0, sva5, sva8, sva10, and sva12. Within the ExAC database, there is one table each for SNV allele frequencies, indel allele frequencies, and aggregate coverage summary data.

The table data is described below:

Summary of ExAC release 0.3
Total: 60,706 samples (121,412 chr); baseline MAF (singleton) = 0.00082%
AFR African & African American 5,203 (10,406 chr; baseline MAF = 0.0096%)
LAT Latino 5,789 (11,578 chr; baseline MAF = 0.0086%); stored in the AMR_* columns of the tables below
EAS East Asian 4,327 (8,654 chr; baseline MAF = 0.0116%)
FIN Finnish 3,307 (6,614 chr; baseline MAF = 0.0151%)
NFE Non-Finnish European 33,370 (66,740 chr; baseline MAF = 0.0015%)
SAS South Asian 8,256 (16,512 chr; baseline MAF = 0.0061%)
OTH Other 454 (908 chr; baseline MAF = 0.1101%)
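
The baseline MAF figures in the summary are the frequency of a single alternate allele among all chromosomes of a population, i.e. 1 / (2 * samples). A minimal sketch that reproduces them from the sample counts listed above; the dictionary and function names are illustrative only:

```python
# Sample counts per population, copied from the summary above.
SAMPLES = {
    "AFR": 5203, "LAT": 5789, "EAS": 4327, "FIN": 3307,
    "NFE": 33370, "SAS": 8256, "OTH": 454,
}


def baseline_maf_percent(samples: int) -> float:
    """MAF (in percent) of a singleton: 1 allele out of 2 * samples chromosomes."""
    return 100.0 / (2 * samples)


if __name__ == "__main__":
    total = sum(SAMPLES.values())
    print(f"ALL {total} ({2 * total} chr) {baseline_maf_percent(total):.5f}%")
    for pop, n in SAMPLES.items():
        print(f"{pop} {n} ({2 * n} chr) {baseline_maf_percent(n):.4f}%")
```

The per-population counts sum to 60,706, which matches the total sample count for this release.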

snv_maf_r03

+--------------------+-------------+------+-----+---------+-------+
| Field              | Type        | Null | Key | Default | Extra |
+--------------------+-------------+------+-----+---------+-------+
| chr                | varchar(10) | NO   | PRI | NULL    |       |
| pos                | int(9)      | NO   | PRI | NULL    |       |
| ref_allele         | varchar(1)  | NO   |     | NULL    |       |
| alt_allele         | varchar(1)  | NO   | PRI | NULL    |       |
| total_chr          | int(6)      | NO   |     | NULL    |       |
| unfiltered_af      | float       | NO   |     | NULL    |       |
| global_af          | float       | NO   |     | NULL    |       |
| global_gts         | varchar(25) | NO   |     | NULL    |       |
| AFR_af             | float       | NO   |     | NULL    |       |
| AFR_gts            | varchar(25) | NO   |     | NULL    |       |
| AMR_af             | float       | NO   |     | NULL    |       |
| AMR_gts            | varchar(25) | NO   |     | NULL    |       |
| EAS_af             | float       | NO   |     | NULL    |       |
| EAS_gts            | varchar(25) | NO   |     | NULL    |       |
| OTH_af             | float       | YES  |     | NULL    |       |
| OTH_gts            | varchar(25) | YES  |     | NULL    |       |
| SAS_af             | float       | NO   |     | NULL    |       |
| SAS_gts            | varchar(25) | NO   |     | NULL    |       |
| FIN_af             | float       | NO   |     | NULL    |       |
| FIN_gts            | varchar(25) | NO   |     | NULL    |       |
| NFE_af             | float       | NO   |     | NULL    |       |
| NFE_gts            | varchar(25) | NO   |     | NULL    |       |
| qual               | float       | YES  |     | NULL    |       |
| VQSLOD             | float       | YES  |     | NULL    |       |
| GQ_MEAN            | float       | YES  |     | NULL    |       |
| GQ_STDDEV          | float       | YES  |     | NULL    |       |
| qual_by_depth_QD   | float       | YES  |     | NULL    |       |
| strand_bias_FS     | float       | YES  |     | NULL    |       |
| rms_map_qual_MQ    | float       | YES  |     | NULL    |       |
| base_qual_rank_sum | float       | YES  |     | NULL    |       |
| map_qual_rank_sum  | float       | YES  |     | NULL    |       |
| read_pos_rank_sum  | float       | YES  |     | NULL    |       |
| inbreeding_coeff   | float       | YES  |     | NULL    |       |
| clipping_rank_sum  | float       | YES  |     | NULL    |       |
| culprit            | varchar(25) | YES  |     | NULL    |       |
+--------------------+-------------+------+-----+---------+-------+

indel_maf_r03

+--------------------+---------------+------+-----+---------+-------+
| Field              | Type          | Null | Key | Default | Extra |
+--------------------+---------------+------+-----+---------+-------+
| chr                | varchar(10)   | NO   | PRI | NULL    |       |
| pos                | int(9)        | NO   | PRI | NULL    |       |
| ref_allele         | varchar(1024) | NO   | PRI | NULL    |       |
| alt_allele         | varchar(1024) | NO   | PRI | NULL    |       |
| total_chr          | int(6)        | NO   |     | NULL    |       |
| unfiltered_af      | float         | NO   |     | NULL    |       |
| global_af          | float         | NO   |     | NULL    |       |
| global_gts         | varchar(25)   | NO   |     | NULL    |       |
| AFR_af             | float         | NO   |     | NULL    |       |
| AFR_gts            | varchar(25)   | NO   |     | NULL    |       |
| AMR_af             | float         | NO   |     | NULL    |       |
| AMR_gts            | varchar(25)   | NO   |     | NULL    |       |
| EAS_af             | float         | NO   |     | NULL    |       |
| EAS_gts            | varchar(25)   | NO   |     | NULL    |       |
| OTH_af             | varchar(25)   | YES  |     | NULL    |       |
| OTH_gts            | varchar(25)   | YES  |     | NULL    |       |
| SAS_af             | float         | NO   |     | NULL    |       |
| SAS_gts            | varchar(25)   | NO   |     | NULL    |       |
| FIN_af             | float         | NO   |     | NULL    |       |
| FIN_gts            | varchar(25)   | NO   |     | NULL    |       |
| NFE_af             | float         | NO   |     | NULL    |       |
| NFE_gts            | varchar(25)   | NO   |     | NULL    |       |
| qual               | float         | YES  |     | NULL    |       |
| VQSLOD             | float         | YES  |     | NULL    |       |
| GQ_MEAN            | float         | YES  |     | NULL    |       |
| GQ_STDDEV          | float         | YES  |     | NULL    |       |
| qual_by_depth_QD   | float         | YES  |     | NULL    |       |
| strand_bias_FS     | float         | YES  |     | NULL    |       |
| rms_map_qual_MQ    | float         | YES  |     | NULL    |       |
| base_qual_rank_sum | float         | YES  |     | NULL    |       |
| map_qual_rank_sum  | float         | YES  |     | NULL    |       |
| read_pos_rank_sum  | float         | YES  |     | NULL    |       |
| inbreeding_coeff   | float         | YES  |     | NULL    |       |
| clipping_rank_sum  | float         | YES  |     | NULL    |       |
| culprit            | varchar(25)   | YES  |     | NULL    |       |
+--------------------+---------------+------+-----+---------+-------+
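
The column widths of the two tables above suggest how a record is routed: snv_maf_r03 stores single-character alleles, while indel_maf_r03 allows up to 1024 characters. A rough sketch of that routing, assuming that a single-base ref/alt pair is an SNV and anything else is an indel; the helper name is hypothetical, and the actual pipeline may treat edge cases (for example multi-allelic sites) differently:

```python
MAX_INDEL_ALLELE_LEN = 1024  # varchar(1024) in indel_maf_r03


def target_table(ref_allele: str, alt_allele: str) -> str:
    """Pick the r0.3 table for one ref/alt pair (hypothetical helper)."""
    if not ref_allele or not alt_allele:
        raise ValueError("ref_allele and alt_allele must be non-empty")
    if len(ref_allele) == 1 and len(alt_allele) == 1:
        return "snv_maf_r03"       # both alleles fit varchar(1)
    if max(len(ref_allele), len(alt_allele)) > MAX_INDEL_ALLELE_LEN:
        raise ValueError("allele does not fit the indel table columns")
    return "indel_maf_r03"


if __name__ == "__main__":
    print(target_table("A", "G"))    # single-base substitution
    print(target_table("AT", "A"))   # one-base deletion
```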

coverage_03

+-------------+----------------------+------+-----+---------+-------+
| Field       | Type                 | Null | Key | Default | Extra |
+-------------+----------------------+------+-----+---------+-------+
| chr         | varchar(10)          | NO   | PRI | NULL    |       |
| pos         | int(9) unsigned      | NO   | PRI | NULL    |       |
| mean_cvg    | float                | NO   |     | NULL    |       |
| covered_10x | smallint(5) unsigned | NO   |     | NULL    |       |
+-------------+----------------------+------+-----+---------+-------+