SubRVIS is a gene sub-region based score from the RVIS franchise intended to help in the interpretation of human sequence data. It provides users with a score denoting the degree of intolerance of the exon or protein domain in which a variant falls.
The protein domain sub-regions are defined as the regions within the gene that aligned to the Conserved Domain Database (http://www.ncbi.nlm.nih.gov/cdd/) and the unaligned regions between each conserved domain alignment.
In its current form, subRVIS is based upon allele frequency as represented in whole exome sequence data from the NHLBI-ESP6500 data set. The score is designed to rank the genic sub-regions based on their intolerance to functional variation. A lower score indicates a more intolerant sub-region, while a higher score indicates a more tolerant sub-region.
Note: subRVIS scores should be interpreted in conjunction with the EVS coverage of the assessed regions. Scores for low-coverage regions are based on less information and should be treated with corresponding caution. Coverage information can be found at www.subrvis.org.
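As a minimal sketch of how score and coverage might be read together, the Python below ranks a few sub-regions from most to least intolerant and flags those with low coverage. The records, field names, and the coverage cutoff are all invented for illustration; they are not real subRVIS values, and they do not reflect an ATAV or subrvis.org file format.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the source does not define what counts as low coverage.
MIN_COVERED_FRACTION = 0.8


@dataclass
class SubRegion:
    name: str                # illustrative label, not a real identifier
    subrvis: float           # lower = more intolerant
    covered_fraction: float  # made-up share of the region with adequate EVS coverage


def rank_by_intolerance(regions):
    """Return regions sorted from most intolerant (lowest score) to most tolerant."""
    return sorted(regions, key=lambda r: r.subrvis)


def is_low_coverage(region, min_fraction=MIN_COVERED_FRACTION):
    """Flag regions whose score rests on less information."""
    return region.covered_fraction < min_fraction


# Made-up example records.
regions = [
    SubRegion("GENE1:domain_1", -1.20, 0.95),
    SubRegion("GENE1:interdomain_1", 0.40, 0.55),
    SubRegion("GENE1:domain_2", -0.30, 0.90),
]

for region in rank_by_intolerance(regions):
    note = " (low coverage, treat with caution)" if is_low_coverage(region) else ""
    print(f"{region.name}\t{region.subrvis:+.2f}{note}")
```

Keeping the coverage flag separate from the ranking means a low-coverage region is still listed, just marked, which matches the advice to weigh such scores rather than discard them.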
ATAV has a function for listing subRVIS scores (List_subRVIS).
If you have questions about this dataset, email Ayal Gussow (email@example.com).
OEratio(CDD) and OEratio(exon)
OEratio is a complementary approach to subRVIS for cases where the sequence region of interest offers too little resolution for RVIS to perform well. OEratio scores (and percentiles) are now available for each of the same CDD sub-regions and exons used in Gussow et al. (2016). In the current formulation, the expected rate leverages the genic mutability, and the observed rate is the rate of non-synonymous variants identified in the sub-region of interest in the ExAC (release 0.3) standing variation data from 60,706 samples.
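To make the observed/expected idea concrete, here is a rough Python sketch. It assumes the expected count for a sub-region is the gene's total non-synonymous count scaled by the sub-region's share of the gene's mutability. That is only one simple way to use genic mutability and may differ from the actual formulation, which is unpublished. The percentile definition, the function names, and all numbers are likewise assumptions.

```python
def oe_ratio(observed, region_mutability, gene_mutability, gene_observed):
    """Observed/expected ratio for one sub-region.

    ASSUMPTION: expected = gene_observed * (region_mutability / gene_mutability).
    This is an illustrative model, not the published OEratio definition.
    """
    if region_mutability <= 0 or gene_mutability <= 0 or gene_observed <= 0:
        raise ValueError("mutabilities and gene_observed must be positive")
    expected = gene_observed * region_mutability / gene_mutability
    return observed / expected


def percentiles(scores):
    """Percent of scores less than or equal to each score (assumed definition)."""
    n = len(scores)
    return [100.0 * sum(other <= s for other in scores) / n for s in scores]


# Made-up gene with 40 observed non-synonymous variants split over three sub-regions.
ratios = [
    oe_ratio(observed=2, region_mutability=0.25, gene_mutability=1.0, gene_observed=40),
    oe_ratio(observed=12, region_mutability=0.25, gene_mutability=1.0, gene_observed=40),
    oe_ratio(observed=26, region_mutability=0.50, gene_mutability=1.0, gene_observed=40),
]
print(ratios)
print(percentiles(ratios))
```

Under this assumed model, a ratio below 1 means fewer non-synonymous variants were observed than the sub-region's mutability would predict, so the lowest percentiles correspond to the most depleted sub-regions.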
This work is unpublished. Refinements are underway, so the final scores are subject to change; a note will be posted if the scores are updated. For severe pediatric and neurological disorders, the recommendation, based on application to real-world datasets, is to focus on mutations affecting regions with a corresponding OEratio(CDD) percentile <10% to achieve high specificity. A 5% threshold can be used for the greatest specificity. For a guide, please see the attached brief slide set. The greatest benefit appears when OEratio is used in conjunction with "probably damaging" classifications by PolyPhen-2 (HVar); see slides.
Note: OEratio(CDD) and OEratio(exon) are currently recommended only for interpreting missense mutations. For LoF alleles, the current recommendation is to rely on the gene's FDR depletion score and pLI (see: https://redmine.igm.cumc.columbia.edu/projects/atav/wiki/List_RVIS).
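The recommendations above can be sketched as a simple filter: keep missense variants whose OEratio(CDD) percentile is below a chosen threshold and, optionally, require a "probably damaging" PolyPhen-2 (HVar) call. The record layout, field names, and string labels below are hypothetical and do not correspond to ATAV output columns.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str           # illustrative identifier
    effect: str               # hypothetical label, e.g. "missense" or "stop_gained"
    oe_cdd_percentile: float  # OEratio(CDD) percentile on a 0-100 scale
    polyphen_hvar: str        # hypothetical encoding of the PolyPhen-2 HVar class


def passes_oe_filter(variant, max_percentile=10.0, require_probably_damaging=True):
    """Apply the missense-only OEratio(CDD) percentile filter described above."""
    if variant.effect != "missense":
        return False  # OEratio is only recommended for missense mutations
    if variant.oe_cdd_percentile >= max_percentile:
        return False  # percentile must be strictly below the threshold
    if require_probably_damaging and variant.polyphen_hvar != "probably damaging":
        return False
    return True


# Made-up variants for illustration.
variants = [
    Variant("var_a", "missense", 3.0, "probably damaging"),
    Variant("var_b", "missense", 8.0, "benign"),
    Variant("var_c", "missense", 25.0, "probably damaging"),
    Variant("var_d", "stop_gained", 2.0, "probably damaging"),
]

default_hits = [v.variant_id for v in variants if passes_oe_filter(v)]
strict_hits = [v.variant_id for v in variants if passes_oe_filter(v, max_percentile=5.0)]
print(default_hits, strict_hits)
```

Passing `max_percentile=5.0` gives the stricter threshold mentioned above. LoF variants such as `var_d` are excluded by design, since the page recommends gene-level scores for them instead.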
If you have questions about the OEratio scores, e-mail Slavé Petrovski (firstname.lastname@example.org).