ATAV (Analysis Tool for Annotated Variants) is a command-line tool designed to detect rare genetic variants associated with complex diseases. It performs association analysis on annotated variants derived from whole-genome or whole-exome sequencing data, all of which are stored in our centralized database, AnnoDB.
The AnnoDB database houses variants, variant calls, coverage depths, and sample metadata for high-throughput sequencing samples. The database is implemented as a fully normalized MySQL schema. The system consists of a master server and a set of slave servers. The primary ingest method for incrementally adding samples to the system is the AnnoDB pipeline, which annotates a single-sample VCF file, parses the VCF while inserting novel variants into the variant library on the master server, and prints tab-delimited files containing the variant call data from the VCF. These files are then bulk-loaded to the slave servers using "mysql import". Coverage data is parsed from a compressed (gzip) mpileup file generated by the main alignment/genotyping pipeline and transformed into a custom, optimized text-based format, which is also bulk-loaded to the slave servers.
The slave servers host single-sample and cohort-level analysis queries, primarily as implemented in ATAV.
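The VCF ingest step described above can be illustrated with a minimal Python sketch. It is not the actual AnnoDB pipeline code: the function name `split_vcf`, the in-memory `known_variants` set standing in for the variant library on the master server, and the column order of the tab-delimited output are all assumptions made for illustration. It only shows the general idea of separating novel variants from per-sample call rows that could later be bulk-loaded.

```python
import io


def split_vcf(lines, known_variants, sample_id):
    """Parse single-sample VCF lines (hypothetical helper, not AnnoDB code).

    known_variants: set of (chrom, pos, ref, alt) tuples, standing in for
                    the variant library on the master server (assumption).
    Returns (novel, rows): variants not seen before, and tab-delimited
    call rows in an assumed column order:
    sample_id, chrom, pos, ref, alt, GT, DP.
    """
    novel, rows = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines, and the header line
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        # FORMAT (column 9) names the keys of the sample column (column 10).
        call = dict(zip(fields[8].split(":"), fields[9].split(":")))
        for allele in alt.split(","):  # one row per alternate allele
            key = (chrom, int(pos), ref, allele)
            if key not in known_variants:
                known_variants.add(key)
                novel.append(key)
            rows.append("\t".join([
                str(sample_id), chrom, pos, ref, allele,
                call.get("GT", "."), call.get("DP", "."),
            ]))
    return novel, rows


# Tiny made-up example: two records, one of them already in the library.
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n"
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30\n"
    "1\t200\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1/1:12\n"
)
library = {("1", 100, "A", "G")}
novel, rows = split_vcf(io.StringIO(example_vcf), library, sample_id=7)
```

In this sketch, `novel` would correspond to the inserts made on the master server, and `rows` to the contents of a tab-delimited file destined for the slave servers. The real schema and file layout are likely to differ.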
Source code - https://github.com/igm-team/atav
For any questions, please contact Nick Ren (firstname.lastname@example.org).