Discovering and Annotating Transposable Elements

Repetitive DNA, especially that derived from transposable elements (TEs), makes up a large fraction of most eukaryotic genomes. Over 58% of the human genome is derived from the transposition and duplication of these prototypically selfish sequences. While only 1 in 20 humans has a new germline TE insertion, somatic TE activity in tumors, during neural development in the brain, or during the creation of pluripotent stem cells has a dramatic impact on the genome and potentially on human health.

We have developed two of the leading software packages used in the study of TEs: RepeatMasker (http://www.repeatmasker.org) for the high-quality annotation of TE copies in a genomic sequence, and RepeatModeler for the de novo discovery of TE families in newly sequenced genomes.  In addition, we are developing the Dfam resource (https://www.dfam.org): a comprehensive database of TE families, sequence models, and genome annotations for eukaryotic genomes.

Current Project Leads:

Arian Smit, Robert Hubley

Image attributed to John Kauffman