This article reports on a project that developed a globally applicable resource of 92 single nucleotide polymorphisms (SNPs) for individual identification (IISNPs), with extremely low probabilities of any two unrelated individuals from anywhere in the world having identical genotypes.
An efficient method for uniquely identifying every individual would have value in quality control and sample tracking of large collections of cell lines or DNA, such as those now common in whole genome association studies. Such a method would also be useful in forensics. SNPs represent the best markers for such purposes. In the current study, the SNPs were identified by screening over 500 candidate SNPs on samples from 44 populations representing the major regions of the world. All 92 IISNPs have an average heterozygosity >0.4 and Fst values <0.06 across the 44 populations, making them a universally applicable panel irrespective of ethnicity or ancestry. No significant linkage disequilibrium (LD) occurs among any of the unique pairings of 86 of the 92 IISNPs (median LD = 0.011) in any of the 44 populations. The remaining 6 IISNPs showed strong LD in most of the 44 populations for a small subset (7) of the unique pairings in which they occurred, due to close linkage. A total of 45 of the 86 SNPs are spread across the 22 human autosomes and show very loose or no genetic linkage with each other. These 45 IISNPs constitute an excellent panel for individual identification, including paternity testing, with associated probabilities of individual genotypes less than 10^-15, smaller than those achieved with current panels of forensic markers. This panel also improves on an interim panel of 40 IISNPs previously identified using 40 population samples. Because the 45 SNPs in this subset are unlinked, they are also useful in situations that involve close biological relationships. Comparisons with random sets of SNPs illustrate the greater discriminating power, efficiency, and more universal applicability of this IISNP panel to populations around the world. The full set of 86 IISNPs that do not show LD can be used to provide even smaller genotype match probabilities, in the range of 10^-31 to 10^-35, based on the 44 population samples studied.
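The link between per-SNP heterozygosity and the panel-level match probabilities quoted above can be illustrated with a minimal sketch. It is not the study's own calculation: it assumes biallelic SNPs, Hardy-Weinberg genotype proportions, independence across loci (no LD), and a single hypothetical allele frequency shared by every SNP in the panel. The function names are invented for illustration.

```python
import math


def allele_freq_from_heterozygosity(h: float) -> float:
    """Major-allele frequency p solving 2*p*(1-p) = h, for 0 <= h <= 0.5."""
    return (1.0 + math.sqrt(1.0 - 2.0 * h)) / 2.0


def locus_match_probability(p: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg proportions: the genotype frequencies are
    p^2, 2pq and q^2, and the match probability is the sum of their squares.
    """
    q = 1.0 - p
    return p**4 + (2.0 * p * q) ** 2 + q**4


def panel_match_probability(freqs: list[float]) -> float:
    """Product of per-locus match probabilities (assumes independent loci)."""
    return math.prod(locus_match_probability(p) for p in freqs)


# Hypothetical panels in which every SNP sits at the heterozygosity
# threshold of 0.4 mentioned in the abstract (illustrative input only).
p_at_threshold = allele_freq_from_heterozygosity(0.4)
panel_45 = panel_match_probability([p_at_threshold] * 45)
panel_86 = panel_match_probability([p_at_threshold] * 86)
print(f"45 SNPs: {panel_45:.2e}")
print(f"86 SNPs: {panel_86:.2e}")
```

Under these simplifying assumptions, a SNP with heterozygosity 0.4 has a per-locus match probability of 0.44, and multiplying across 45 such loci already gives a value below 10^-15. Real panel values depend on the observed allele frequencies in each population, which is why the abstract reports a range.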
18 references (publisher abstract modified)