This paper reports a comprehensive analysis of the dispersion and insertion polymorphism of the youngest known L1 subfamily (Ta) within the human genome. The computational approach it describes is an efficient, high-throughput method for recovering Ta L1Hs elements from the human genome.
The Ta (transcribed, subset a) subfamily of L1 LINEs (long interspersed elements) is characterized by a 3-bp ACA sequence in the 3' untranslated region and contains approximately 520 members in the human genome. The current study extracted 468 Ta L1Hs (L1 human specific) elements from the draft human genomic sequence and screened individual elements using polymerase chain reaction (PCR) assays to determine their phylogenetic origin and levels of human genomic diversity. All of the Ta L1 elements analyzed by PCR were absent from the orthologous positions in nonhuman primate genomes, except for a single element (L1HS72), which was also present in the common and pygmy chimpanzee genomes. Sequence analysis revealed that this single exception is the product of a gene conversion event involving an older pre-existing L1. Sequence analysis of the Ta L1 elements showed a low level of nucleotide divergence with an estimated age of 1.99 million years, suggesting that expansion of the L1 Ta subfamily occurred after the divergence of humans and African apes. Forty-five percent (n = 115) of the Ta L1 elements analyzed by PCR were polymorphic for insertion presence or absence and will serve as identical-by-descent markers for the study of human evolution. 2 tables, 5 figures, and 69 references.