EVS individual data

Individual EVS VCFs containing per-sample genotypes were downloaded for ~5,500 of the 6,503 samples available in the aggregate dataset. The remaining ~1,000 samples are not available for use as individual controls and cannot be downloaded.

Approval was granted for the following 15 dbGaP studies/accessions:

  1. JHS Heart Cohorts phs000402
  2. MESA Heart Cohorts phs000403
  3. Bronchiectasis phs000518
  4. Familial A-Fib phs000362
  5. ARIC Heart Cohorts phs000398
  6. CARDIA phs000399
  7. WHI phs000281
  8. PAH Lung Cohort phs000290
  9. Asthma Lung Cohort phs000422
  10. Aortic Disease phs000347
  11. FHS Heart Cohorts phs000401
  12. COPDGene ESP phs000296
  13. CHS Heart Cohorts phs000400
  14. LungGO LHS COPD phs000291
  15. Cystic Fibrosis Lung Cohort phs000254

In AnnoDB, the CHGVID for each sample from these studies takes the form:

evs_[aa or ea]_[study accession]_[subject ID]

Some examples:
evs_aa_phs000281_700117
evs_ea_phs000281_713590
evs_ea_phs000254_5862v
evs_ea_phs000401_bi_12785v

The 'v' at the end of the third and fourth examples indicates that the string following the third underscore ('_') in the CHGVID ('5862' and 'bi_12785') is the name of the sample in the VCF, not the dbGaP subject ID. This occurs when no subject ID could be determined because phenotype data or metadata were missing.
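The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the AnnoDB tooling: the function name parse_chgvid and the Chgvid tuple are hypothetical, and it assumes that accessions always look like 'phs' plus six digits (true of the 15 listed here) and that a genuine dbGaP subject ID never ends in 'v'.

```python
import re
from typing import NamedTuple

# Assumed pattern: evs_[aa or ea]_[study accession]_[identifier]
CHGVID_RE = re.compile(r"^evs_(aa|ea)_(phs\d{6})_(.+)$")


class Chgvid(NamedTuple):
    population: str    # 'aa' or 'ea'
    accession: str     # dbGaP study accession, e.g. 'phs000281'
    sample_id: str     # subject ID, or VCF sample name if is_vcf_name
    is_vcf_name: bool  # True when the CHGVID carried a trailing 'v'


def parse_chgvid(chgvid: str) -> Chgvid:
    """Split a CHGVID into its parts (hypothetical helper)."""
    match = CHGVID_RE.match(chgvid)
    if match is None:
        raise ValueError(f"not an EVS CHGVID: {chgvid!r}")
    population, accession, ident = match.groups()
    # Assumption: a trailing 'v' always marks a VCF sample name,
    # i.e. real subject IDs never end in 'v'.
    is_vcf_name = ident.endswith("v")
    if is_vcf_name:
        ident = ident[:-1]
    return Chgvid(population, accession, ident, is_vcf_name)
```

For example, parse_chgvid("evs_ea_phs000401_bi_12785v") yields the sample name 'bi_12785' with is_vcf_name set, while parse_chgvid("evs_aa_phs000281_700117") yields subject ID '700117' with the flag unset.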

The original dbGaP downloads for these studies are dispersed within the /nfs/seqscratch10/tx_temp/dbGaP-6013/ directory. The attached spreadsheet lists the sub-directories containing the data for each study, along with the number of samples, the number of VCF files, and the number of samples per VCF file within each study.

The working directory used to load all of the VCF genotype data into AnnoDB is /nfs/seqscratch10/ANNOTATION/GROUP_DATA/. It contains a sub-directory for each study accession, and each study accession directory contains a sub-directory for each VCF file in that study.
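The two-level layout described above can be summarised with a short script. This is only a sketch under stated assumptions: the function name vcf_dirs_by_study is hypothetical, and it assumes that each study directory is named after its accession (starting with 'phs'), which this page does not state explicitly.

```python
from pathlib import Path
from typing import Dict, List

# Working directory named on this page.
GROUP_DATA = Path("/nfs/seqscratch10/ANNOTATION/GROUP_DATA")


def vcf_dirs_by_study(root: Path = GROUP_DATA) -> Dict[str, List[str]]:
    """Map each study directory to the names of its per-VCF sub-directories."""
    layout: Dict[str, List[str]] = {}
    for study_dir in sorted(root.iterdir()):
        # Assumption: study directories are named by accession ('phs...');
        # anything else in the working directory is skipped.
        if not study_dir.is_dir() or not study_dir.name.startswith("phs"):
            continue
        layout[study_dir.name] = sorted(
            p.name for p in study_dir.iterdir() if p.is_dir()
        )
    return layout
```

Comparing the length of each list against the VCF-file counts in the attached spreadsheet would be one way to confirm that every VCF for a study has a loading directory.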

The files used to link the phenotype data and SraRunTable entries to the VCF files were produced by a combined team effort in the bioinformatics group. They are stored as *.csv files in the directory /nfs/seqscratch10/ANNOTATION/GROUP_DATA/evs_sample_lists/.