This tool performs kinship pruning (removing related individuals) on a cohort of interest, as described below. The protocol is to create a PED and MAP pair for the samples and variants of interest, run KING to estimate pairwise kinship coefficients, and then provide this script with the PED file (or, alternatively, our custom sample file format) along with the output file from KING. The script produces a new, pared-down PED file from which at least one individual in every pair of relatives has been removed.
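As a rough illustration of the last two steps of this protocol, the sketch below builds the two command lines in Python. The KING invocation and the run_kinship.py options are copied from the script's help text; the helper function names and the file names in the example call are hypothetical, and nothing here is part of the tool itself.

```python
import shlex

# Script location as given in this document.
RUN_KINSHIP = "/nfs/goldstein/software/atav_home/lib/run_kinship.py"


def king_command(bed_file):
    """Return the KING invocation recommended in the script's help text."""
    return ["king", "-b", bed_file, "--kinship", "--related", "--degree", "3"]


def run_kinship_command(ped_file, kinship_file, output=None, threshold=None,
                        seed=None, verbose=False):
    """Build the argument list for run_kinship.py from its documented options."""
    cmd = [RUN_KINSHIP]
    if threshold is not None:
        cmd += ["-r", str(threshold)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    if verbose:
        cmd.append("-v")
    if output is not None:
        cmd += ["-o", output]
    # The two positional arguments come last: PED_FILE, then KINSHIP_FILE.
    cmd += [ped_file, kinship_file]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(king_command("cohort.bed")))
    print(shlex.join(run_kinship_command(
        "cohort.ped", "cohort_king_output.txt",
        output="cohort.pruned.ped", seed=1, verbose=True)))
```

Passing --seed is worthwhile whenever the pruned sample set must be reproducible, since the help text says the seed guarantees the same results on each run.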
This method is fast, frees the analyst from manually removing relatives when building an initial sample set, and ensures consistency: a related pair cannot be included by mistake merely because the two samples are not annotated as related. Specifying -v/--verbose generates a log of the actions performed to arrive at the final sample set.
N.B. The analyst must choose a set of SNVs from which to generate the PED. We have built a set of SNPs that should be appropriate for most purposes, located at /nfs/goldstein/software/atav_home/data/variant/informative_snps.ld_pruned.37MB.txt. Briefly, it was generated by taking intermediate-MAF SNPs from a large set of exome samples of varied ancestry, restricting them to the targeted regions of the Nextera 37 MB kit (the smallest set of targeted regions among our samples), and finally LD-pruning them.
Additionally, the parameter -r RELATEDNESS_THRESHOLD defaults to 0.0884. This is the value the authors of KING recommend as the lower bound for second-degree relatives, so pairs with a kinship coefficient above it (second-degree or closer) are treated as related.
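To show where 0.0884 sits among the other cut-offs, here is a small sketch that classifies a kinship coefficient using the inference ranges given in the KING documentation (each boundary is a power of two, and 2 ** -3.5 is approximately 0.0884). The function names are hypothetical and this is not code taken from run_kinship.py.

```python
# Lower bounds of the KING relationship ranges, closest relationship first.
KING_BOUNDS = [
    (2 ** -1.5, "duplicate or monozygotic twin"),  # above ~0.354
    (2 ** -2.5, "first-degree"),                   # ~0.177 to ~0.354
    (2 ** -3.5, "second-degree"),                  # ~0.0884 to ~0.177
    (2 ** -4.5, "third-degree"),                   # ~0.0442 to ~0.0884
]


def relationship(kinship):
    """Return the closest relationship class whose lower bound is exceeded."""
    for lower_bound, label in KING_BOUNDS:
        if kinship > lower_bound:
            return label
    return "unrelated"


def is_related(kinship, threshold=0.0884):
    """Mirror the documented -r behaviour: coefficients above the threshold count as related."""
    return kinship > threshold


if __name__ == "__main__":
    for value in (0.25, 0.125, 0.0625, 0.01):
        print(value, relationship(value), is_related(value))
```

With the default threshold, a pair of first cousins (expected kinship 0.0625) is kept, while half-siblings (expected kinship 0.125) are pruned. Lowering -r to about 0.0442 would also treat third-degree relatives as related.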
/nfs/goldstein/software/atav_home/lib/run_kinship.py --help

usage: run_kinship.py [-h] [-r RELATEDNESS_THRESHOLD]
                      [--sample_coverage_summary SAMPLE_COVERAGE_SUMMARY]
                      [--seed SEED] [-v] [-o OUTPUT]
                      PED_FILE KINSHIP_FILE

Take a KING kinship file, PED/FAM/sample file (for getting phenotype), and
optional coverage summary file in order to generate a list of samples to
remove iteratively as follows:

1. remove affected that is related to the most other affecteds
   a. break ties by removing the affected distantly related to the most other affecteds
   b. break ties by removing the affected related to the most unaffecteds
   c. break ties by removing the affected distantly related to the most unaffecteds
   d. (optional) break ties by removing the affected with the least coverage
2. remove unaffected that is related to the most other unaffecteds
   a. break ties by removing the unaffected that is related to the most affecteds
   b. break ties by removing the unaffected that is distantly related to the most affecteds
   c. break ties by removing the unaffected that is distantly related to the most unaffecteds
   d. (optional) break ties by removing the unaffected with the least coverage
3. remove unaffected that is related the most affecteds
   a. break ties by removing the unaffected that is distantly related to the most affecteds
   b. break ties by removing the unaffected that is distantly related to the most unaffecteds
   c. (optional) break ties by removing the unaffected with the least coverage

KING should be run like:
    king -b <bed_infile> --kinship --related --degree 3

Written by Brett Copeland <firstname.lastname@example.org>

positional arguments:
  PED_FILE              a PED/MAP/ATAV sample file to get phenotype
  KINSHIP_FILE          the KING kinship output file to read

optional arguments:
  -h, --help            show this help message and exit
  -r RELATEDNESS_THRESHOLD, --relatedness_threshold RELATEDNESS_THRESHOLD
                        consider kinship coefficients above this value to be
                        related (default: 0.0884)
  --sample_coverage_summary SAMPLE_COVERAGE_SUMMARY
                        break ties by removing the sample with the lowest
                        coverage as indicated in this file (default: None)
  --seed SEED           set a random seed to guarantee the same results each
                        time (default: None)
  -v, --verbose         verbose mode (default: False)
  -o OUTPUT, --output OUTPUT
                        the output file (default: <open file '<stdout>', mode
                        'w' at 0x7f20474bc150>)
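The three numbered removal passes in the help text can be illustrated with a much-simplified greedy sketch. It is not the script's implementation: the "distantly related" and coverage tie-breakers are omitted, ties are broken by sample ID instead (an assumption made only to keep the example deterministic), and all function names and the input format are hypothetical.

```python
from collections import defaultdict


def prune_relatives(pairs, affected, threshold=0.0884):
    """Return the set of samples to remove so that no related pair remains.

    pairs: iterable of (sample_a, sample_b, kinship) tuples
    affected: set of sample IDs with affected status
    """
    # Symmetric map of each sample to the samples it is related to.
    related = defaultdict(set)
    for a, b, kinship in pairs:
        if kinship > threshold:
            related[a].add(b)
            related[b].add(a)

    removed = set()

    def drop(sample):
        # Remove the sample and erase it from its relatives' sets.
        for other in related.pop(sample):
            related[other].discard(sample)
        removed.add(sample)

    def count(sample, partner_affected):
        # Number of remaining relatives with the requested phenotype.
        return sum((o in affected) == partner_affected for o in related[sample])

    def prune(candidate_affected, partner_affected):
        while True:
            scored = [(count(s, partner_affected), s) for s in related
                      if (s in affected) == candidate_affected]
            scored = [(n, s) for n, s in scored if n > 0]
            if not scored:
                return
            # Most relatives first; ties fall back to sample ID (assumption).
            _, worst = min(scored, key=lambda t: (-t[0], t[1]))
            drop(worst)

    prune(True, True)    # pass 1: affecteds related to other affecteds
    prune(False, False)  # pass 2: unaffecteds related to other unaffecteds
    prune(False, True)   # pass 3: unaffecteds related to affecteds
    return removed


if __name__ == "__main__":
    example_pairs = [("A1", "A2", 0.25), ("A1", "A3", 0.2),
                     ("A2", "U1", 0.1), ("U1", "U2", 0.3)]
    print(sorted(prune_relatives(example_pairs, {"A1", "A2", "A3"})))
```

Running the passes in this order means affected samples are only ever removed because of other affected samples. Any affected/unaffected relationship left over is resolved in pass 3 by dropping the unaffected member, which preserves as many cases as possible.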