Preparing BAM and VCF files for transfer to dbGaP has been the responsibility of the Bioinformatics team. The IT team has managed the physical data transfer, but it must first be provided with a list of paths to the properly formatted data.
The BAM files produced by the alignment/genotyping pipeline need no modification prior to dbGaP transfer. The paths to the files can most easily be generated by querying the AlignSeqFileLoc column of the SequenceDB.seqdbClone database table and appending the following string to the path:
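As a rough illustration of the path-list step, the Python sketch below assumes the AlignSeqFileLoc values have already been fetched from the database and that the string to append is passed in by the caller. The function name `build_bam_paths` and the `suffix` parameter are hypothetical, introduced only for this example; the value used in the usage note is a placeholder, not the real string.

```python
def build_bam_paths(align_locations, suffix):
    """Append `suffix` to each AlignSeqFileLoc value.

    Blank or missing values are skipped and duplicates are dropped,
    keeping the order in which the locations were supplied.
    `suffix` is whatever string the procedure says to append; it is
    deliberately left as a parameter here (assumption of this sketch).
    """
    seen = set()
    paths = []
    for location in align_locations:
        location = (location or "").strip()
        if not location:
            continue  # NULL or empty column value
        path = location + suffix
        if path not in seen:
            seen.add(path)
            paths.append(path)
    return paths
```

Calling `build_bam_paths(rows, "<suffix>")` on the query results and writing one path per line, for example with `"\n".join(paths)`, would give a list in a form that can be handed to the IT team.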
Before a VCF can be uploaded, its header section must be modified and any SnpEff annotations must be removed. Two scripts exist to automate this process:
createReplaceVcfHeaderScript.pl creates a bash script called replace_header.sh, which can be run locally without any parameters or submitted to an SGE cluster.
remove_vcf_snpeff_annotations.pl is executed within the replace_header.sh bash script.
perl goldsteinlab/Bioinformatics/scripts/createReplaceVcfHeaderScript.pl -s [sample_name] -t [sequence_type] -d [scratch directory]
perl goldsteinlab/Bioinformatics/scripts/createReplaceVcfHeaderScript.pl -s otepi7775y1 -t genome -d /nfs/seqscratch10/ALIGNMENT/samples/otepi7775y1/
Running the example command above creates a directory structure similar to that of the alignment/genotyping pipeline, rooted at the specified scratch directory. The created sub-directories are combined/, Logs/, and Scripts/. The dbGaP-compatible VCF file and the intermediate output files are created in the combined/ directory, standard output and standard error logs are written to the Logs/ directory, and the generated script that does the actual work is created in the Scripts/ directory.
The final files produced will be called
[sample_name].analysisReady.vcf.gz (compressed by bgzip).
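When several samples are prepared at once, it can help to build the command for each sample and to check afterwards that the final VCF exists. The Python sketch below does both, assuming the final file sits in combined/ under the scratch directory, as the description of that directory suggests. The helper names `build_command`, `expected_final_vcf`, and `missing_outputs` are hypothetical and not part of the existing scripts.

```python
import os

# Script path as given in the usage line; adjust to the local checkout.
SCRIPT = "goldsteinlab/Bioinformatics/scripts/createReplaceVcfHeaderScript.pl"


def build_command(sample_name, sequence_type, scratch_dir):
    """Return the argument list for one sample, mirroring the usage line."""
    return ["perl", SCRIPT, "-s", sample_name, "-t", sequence_type, "-d", scratch_dir]


def expected_final_vcf(sample_name, scratch_dir):
    """Path where the bgzip-compressed final VCF is expected (assumed: combined/)."""
    return os.path.join(scratch_dir, "combined", sample_name + ".analysisReady.vcf.gz")


def missing_outputs(samples):
    """Given (sample_name, scratch_dir) pairs, list expected VCFs that do not exist."""
    expected = (expected_final_vcf(name, scratch) for name, scratch in samples)
    return [path for path in expected if not os.path.isfile(path)]
```

Each list returned by `build_command` can be passed to `subprocess.run(cmd, check=True)`. Once the generated bash script has finished, an empty result from `missing_outputs` suggests that every sample produced its final file.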
The final VCF will have had any SnpEff annotations removed, if they existed in the original VCF. This task is performed by the utility script remove_vcf_snpeff_annotations.pl, which reads a VCF from standard input and writes a VCF to standard output with the annotations removed.
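To make the filtering step concrete, here is a minimal Python sketch of what such a filter might do. It is not the Perl script's actual implementation, which is not shown here. It assumes the annotations live in the INFO keys that SnpEff writes (EFF in older releases, ANN in newer ones, plus LOF and NMD) and in the `##SnpEff...` header lines. The function names are hypothetical.

```python
import re

# INFO keys written by SnpEff (assumption: these are the ones to strip).
SNPEFF_INFO_KEYS = {"EFF", "ANN", "LOF", "NMD"}
_INFO_HEADER = re.compile(r"^##INFO=<ID=([^,>]+)")


def strip_snpeff_line(line):
    """Return the line without SnpEff content, or None to drop it entirely."""
    if line.startswith("##"):
        if line.startswith("##SnpEff"):
            return None  # ##SnpEffVersion / ##SnpEffCmd
        match = _INFO_HEADER.match(line)
        if match and match.group(1) in SNPEFF_INFO_KEYS:
            return None  # INFO definition for a SnpEff key
        return line
    if line.startswith("#"):
        return line  # the #CHROM column header
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return line  # not a complete VCF record; pass through unchanged
    kept = [entry for entry in fields[7].split(";")
            if entry.split("=", 1)[0] not in SNPEFF_INFO_KEYS]
    fields[7] = ";".join(kept) if kept else "."  # "." marks an empty INFO column
    return "\t".join(fields) + "\n"


def strip_snpeff(lines):
    """Yield the VCF lines that remain after removing SnpEff annotations."""
    for line in lines:
        cleaned = strip_snpeff_line(line)
        if cleaned is not None:
            yield cleaned
```

Used as a stream filter, `sys.stdout.writelines(strip_snpeff(sys.stdin))` gives the same standard-input to standard-output behaviour described for the Perl utility.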