TY - JOUR
TI - Modeling the feasibility of whole genome shotgun sequencing using a pairwise end strategy
AU - Siegel, A. F.
AU - van den Engh, G.
AU - Hood, L.
AU - Trask, B.
AU - Roach, J.
T2 - Genomics
AB - In pairwise end sequencing, sequences are determined from both ends of random subclones derived from a DNA target. Sufficiently similar overlapping end sequences are identified and grouped into contigs. When a clone’s paired end sequences fall in different contigs, the contigs are connected together to form scaffolds. Increasingly, the goals of pairwise strategies are large and highly repetitive genomic targets. Here, we consider large-scale pairwise strategies that employ mixtures of subclone sizes. We explore the properties of scaffold formation within a hybrid theory/simulation mathematical model of a genomic target that contains many repeat families. Using this model, we evaluate problems that may arise, such as falsely linked end sequences (due either to random matches or to homologous repeats) and scaffolds that terminate without extending the full length of the target. We illustrate our model with an exploration of a strategy for sequencing the human genome. Our results show that, for a strategy that generates 10-fold sequence coverage derived from the ends of clones ranging in length from 2 to 150 kb, using an appropriate rule for detecting overlaps, we expect few false links while obtaining a single scaffold extending the length of each chromosome.
DA - 2000/09/15/
PY - 2000
VL - 68
IS - 3
SP - 237
EP - 246
KW - Cloning, Molecular/methods
KW - Feasibility Studies
KW - Gene Library
KW - Genome, Human
KW - Genomics
KW - Humans
KW - Models, Genetic
KW - Models, Statistical
KW - Reproducibility of Results
ER -