SAGA (Scalable Analysis of Genotypes & Annotations) is the next generation of the IGM genotype database analysis system. SAGA will make possible the efficient analysis of cohorts too large for the current ATAV/AnnoDB approach, scaling to sample sizes in the tens of thousands of genomes and upward. Because SAGA is built on the Apache Hadoop framework and leverages modern high-performance ecosystem technologies such as Cloudera Impala and Apache Spark, there is no hard upper limit on the number of genotyped samples that can be stored and efficiently analyzed. Adding nodes to the cluster yields a nearly linear increase in performance and capacity.

SAGA Sample Ingest

Samples that have been processed by the AnnoDB Master Pipeline Script are eligible for loading into the SAGA system.