A simple method was developed to detect signatures of ongoing selective sweeps in single nucleotide polymorphism (SNP) data. Based largely on the traditional site frequency spectrum (SFS), the method additionally incorporates linkage disequilibrium (LD) between pairs of SNP sites and uniquely represents both SFS and LD information as hierarchical “barcodes.” This barcode representation allows the identification of a hitchhiking genomic region surrounding a putative target site of positive selection, or core site. Sweep signals at linked neutral sites are then measured by the proportion (F_c) of derived alleles within the hitchhiking region that are linked in the derived allele group defined at the core site. Measuring F_c, or intra-allelic variability, in an informative way requires certain conditions on derived allele frequencies, as illustrated with the human ST8SIA2 locus. Coalescent simulators with and without positive selection were used to assess the false-positive and false-negative rates of the F_c statistic. To demonstrate its power, the method was further applied to the LCT, OCA2, EDAR, SLC24A5 and ASPM loci, which are known to have undergone positive selection in human populations. Overall, the method is powerful and can be used to identify core sites responsible for ongoing selective sweeps.
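As a rough illustration of the proportion described above, the following Python sketch computes one plausible reading of the statistic: among all derived-allele copies at linked sites in the hitchhiking region, the fraction carried on haplotypes that also carry the derived allele at the core site. This is an assumption based only on the one-sentence description here, not the authors' exact definition. The function name `fc_statistic`, the 0/1 haplotype encoding (0 = ancestral, 1 = derived), and the toy data are all hypothetical, and the sketch ignores the barcode construction and the derived-allele-frequency conditions mentioned in the text.

```python
import math
from typing import Iterable, Sequence


def fc_statistic(
    haplotypes: Sequence[Sequence[int]],
    core: int,
    region: Iterable[int],
) -> float:
    """Fraction of derived-allele copies in `region` that sit on haplotypes
    carrying the derived allele at the core site.

    Assumed encoding (hypothetical): each haplotype is a sequence of 0/1
    values, 0 = ancestral allele, 1 = derived allele. The core site itself
    is excluded from the count.
    """
    linked_sites = [s for s in region if s != core]
    total_derived = 0      # derived copies at linked sites, all haplotypes
    derived_in_group = 0   # those copies found in the core's derived group

    for hap in haplotypes:
        in_derived_group = hap[core] == 1
        for site in linked_sites:
            if hap[site] == 1:
                total_derived += 1
                if in_derived_group:
                    derived_in_group += 1

    if total_derived == 0:
        return math.nan  # no derived alleles at linked sites: undefined
    return derived_in_group / total_derived


# Toy data: 4 haplotypes x 4 sites, with site 0 as the putative core site.
example_haplotypes = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
example_fc = fc_statistic(example_haplotypes, core=0, region=range(4))
print(example_fc)
```

In the toy data, three of the five derived-allele copies at linked sites fall on haplotypes in the core site's derived group. Under this reading, a value near 1 would mean that almost all linked derived variation sits in the derived group of the core site.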